AnyDBM_File - provide framework for multiple DBMs
NDBM_File, DB_File, GDBM_File, SDBM_File, ODBM_File - various DBM implementations
- use AnyDBM_File;
This module is a "pure virtual base class"--it has nothing of its own. It's just there to inherit from one of the various DBM packages. It prefers ndbm for compatibility reasons with Perl 4, then Berkeley DB (See DB_File), GDBM, SDBM (which is always there--it comes with Perl), and finally ODBM. This way old programs that used to use NDBM via dbmopen() can still do so, but new ones can reorder @ISA:
- BEGIN { @AnyDBM_File::ISA = qw(DB_File GDBM_File NDBM_File) }
- use AnyDBM_File;
Having multiple DBM implementations makes it trivial to copy database formats:
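For instance, a database in one format can be tied alongside a second tie in another format and copied with a single hash assignment. The sketch below uses AnyDBM_File (whichever default your perl selected) and SDBM_File (always available); the file names old_db and new_db are illustrative:

```perl
use Fcntl;         # supplies the O_CREAT and O_RDWR flags
use AnyDBM_File;   # ties using the platform's preferred DBM
use SDBM_File;     # the implementation that always ships with perl

# Create a small database in the default DBM format.
tie my %oldhash, 'AnyDBM_File', 'old_db', O_CREAT|O_RDWR, 0644
    or die "Cannot tie old_db: $!";
$oldhash{greeting} = 'hello';

# Tie a second hash in SDBM format and copy everything across.
tie my %newhash, 'SDBM_File', 'new_db', O_CREAT|O_RDWR, 0644
    or die "Cannot tie new_db: $!";
%newhash = %oldhash;   # one list assignment copies the whole database

untie %oldhash;        # flush and close both files
untie %newhash;
```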
Here's a partial table of features the different packages offer:
- odbm ndbm sdbm gdbm bsd-db
- ---- ---- ---- ---- ------
- Linkage comes w/ perl yes yes yes yes yes
- Src comes w/ perl no no yes no no
- Comes w/ many unix os yes yes[0] no no no
- Builds ok on !unix ? ? yes yes ?
- Code Size ? ? small big big
- Database Size ? ? small big? ok[1]
- Speed ? ? slow ok fast
- FTPable no no yes yes yes
- Easy to build N/A N/A yes yes ok[2]
- Size limits 1k 4k 1k[3] none none
- Byte-order independent no no no no yes
- Licensing restrictions ? ? no yes no
[0] On mixed universe machines, may be in the bsd compat library, which is often shunned.
[1] Can be trimmed if you compile for one access method.
[2] See DB_File. Requires symbolic links.
[3] By default, but can be redefined.
dbm(3), ndbm(3), DB_File(3), perldbmfilter
AutoLoader - load subroutines only on demand
The AutoLoader module works with the AutoSplit module and the __END__ token to defer the loading of some subroutines until they are used rather than loading them all at once.
To use AutoLoader, the author of a module has to place the definitions of subroutines to be autoloaded after an __END__ token. (See perldata.) The AutoSplit module can then be run manually to extract the definitions into individual files auto/funcname.al.
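A minimal module laid out for autoloading might look as follows (the package and subroutine names are illustrative); everything after __END__ is what AutoSplit extracts into .al files:

```perl
package Hypothetical::Module;
use strict;
use warnings;
use AutoLoader 'AUTOLOAD';

# Subroutines above __END__ are compiled when the module is loaded.
sub always_loaded { return "compiled at startup" }

1;
__END__

# Subroutines below __END__ are split by AutoSplit into files such as
# auto/Hypothetical/Module/deferred.al and compiled on first call.
sub deferred { return "loaded on demand" }
```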
AutoLoader implements an AUTOLOAD subroutine. When an undefined subroutine is called in a client module of AutoLoader, AutoLoader's AUTOLOAD subroutine attempts to locate the subroutine in a file with a name related to the location of the file from which the client module was read. As an example, if POSIX.pm is located in /usr/local/lib/perl5/POSIX.pm, AutoLoader will look for perl subroutines in /usr/local/lib/perl5/auto/POSIX/*.al, where the .al file has the same name as the subroutine, sans package. If such a file exists, AUTOLOAD will read and evaluate it, thus (presumably) defining the needed subroutine. AUTOLOAD will then goto the newly defined subroutine.
Once this process completes for a given function, it is defined, so future calls to the subroutine will bypass the AUTOLOAD mechanism.
In order for object method lookup and/or prototype checking to operate correctly even when methods have not yet been defined, it is necessary to "forward declare" each subroutine (as in sub NAME;). See SYNOPSIS in perlsub. Such forward declarations create "subroutine stubs", which are placeholders with no code.
The AutoSplit and AutoLoader modules automate the creation of forward declarations. The AutoSplit module creates an 'index' file containing forward declarations of all the AutoSplit subroutines. When the AutoLoader module is 'use'd it loads these declarations into its caller's package.
Because of this mechanism it is important that AutoLoader is always used and not required.
In order to use AutoLoader's AUTOLOAD subroutine you must explicitly import it:
- use AutoLoader 'AUTOLOAD';
Some modules, mainly extensions, provide their own AUTOLOAD subroutines. They typically need to check for some special cases (such as constants) and then fall back to AutoLoader's AUTOLOAD for the rest.
Such modules should not import AutoLoader's AUTOLOAD subroutine. Instead, they should define their own AUTOLOAD subroutines along these lines:
- use AutoLoader;
- use Carp;
- sub AUTOLOAD {
- my $sub = $AUTOLOAD;
- (my $constname = $sub) =~ s/.*:://;
- my $val = constant($constname, @_ ? $_[0] : 0);
- if ($! != 0) {
- if ($! =~ /Invalid/ || $!{EINVAL}) {
- $AutoLoader::AUTOLOAD = $sub;
- goto &AutoLoader::AUTOLOAD;
- }
- else {
- croak "Your vendor has not defined constant $constname";
- }
- }
- *$sub = sub { $val }; # same as: eval "sub $sub { $val }";
- goto &$sub;
- }
If any module's own AUTOLOAD subroutine has no need to fall back to the AutoLoader's AUTOLOAD subroutine (because it doesn't have any AutoSplit subroutines), then that module should not use AutoLoader at all.
Package lexicals declared with my in the main block of a package using AutoLoader will not be visible to auto-loaded subroutines, due to the fact that the given scope ends at the __END__ marker. A module using such variables as package globals will not work properly under the AutoLoader.

The vars pragma (see vars in perlmod) may be used in such situations as an alternative to explicitly qualifying all globals with the package namespace. Variables pre-declared with this pragma will be visible to any autoloaded routines (but will not be invisible outside the package, unfortunately).
You can stop using AutoLoader simply by saying:
- no AutoLoader;
The AutoLoader is similar in purpose to SelfLoader: both delay the loading of subroutines.
SelfLoader uses the __DATA__ marker rather than __END__. While this avoids the use of a hierarchy of disk files and the associated open/close for each routine loaded, SelfLoader suffers a startup speed disadvantage in the one-time parsing of the lines after __DATA__, after which routines are cached. SelfLoader can also handle multiple packages in a file.
AutoLoader only reads code as it is requested, and in many cases should be faster, but requires a mechanism like AutoSplit be used to create the individual files. ExtUtils::MakeMaker will invoke AutoSplit automatically if AutoLoader is used in a module source file.
Sometimes, it can be necessary or useful to make sure that a certain function is fully loaded by AutoLoader. This is the case, for example, when you need to wrap a function to inject debugging code. It is also helpful to force early loading of code before forking to make use of copy-on-write as much as possible.
Starting with AutoLoader 5.73, you can call the AutoLoader::autoload_sub function with the fully-qualified name of the function to load from its .al file. The behaviour is exactly the same as if you called the function, triggering the regular AUTOLOAD mechanism, but it does not actually execute the autoloaded function.
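Starting from that version, forcing a function to be loaded without running it looks like this; Hypothetical::Split::helper is a made-up name standing in for a real autosplit function, and the eval guards against it not existing on this system:

```perl
use AutoLoader ();

# Load auto/Hypothetical/Split/helper.al (if such a module were
# installed) without actually calling the function.
eval { AutoLoader::autoload_sub('Hypothetical::Split::helper') };
```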
AutoLoaders prior to Perl 5.002 had a slightly different interface. Any old modules which use AutoLoader should be changed to the new calling style. Typically this just means changing a require to a use, adding the explicit 'AUTOLOAD' import if needed, and removing AutoLoader from @ISA.
On systems with restrictions on file name length, the file corresponding to a subroutine may have a shorter name than the routine itself. This can lead to conflicting file names. The AutoSplit package warns of these potential conflicts when used to split a module.
AutoLoader may fail to find the autosplit files (or even find the wrong ones) in cases where @INC contains relative paths, and the program does chdir.
SelfLoader - an autoloader that doesn't use external files.
AutoLoader is maintained by the perl5-porters. Please direct any questions to the canonical mailing list. Anything that is applicable to the CPAN release can be sent to its maintainer, though.
Author and Maintainer: The Perl5-Porters <perl5-porters@perl.org>
Maintainer of the CPAN release: Steffen Mueller <smueller@cpan.org>
This package has been part of the perl core since the first release of perl5. It has been released separately to CPAN so older installations can benefit from bug fixes.
This package has the same copyright and license as the perl core:
- Copyright (C) 1993, 1994, 1995, 1996, 1997, 1998, 1999,
- 2000, 2001, 2002, 2003, 2004, 2005, 2006, 2007, 2008, 2009,
- 2011, 2012
- by Larry Wall and others
- All rights reserved.
- This program is free software; you can redistribute it and/or modify
- it under the terms of either:
- a) the GNU General Public License as published by the Free
- Software Foundation; either version 1, or (at your option) any
- later version, or
- b) the "Artistic License" which comes with this Kit.
- This program is distributed in the hope that it will be useful,
- but WITHOUT ANY WARRANTY; without even the implied warranty of
- MERCHANTABILITY or FITNESS FOR A PARTICULAR PURPOSE. See either
- the GNU General Public License or the Artistic License for more details.
- You should have received a copy of the Artistic License with this
- Kit, in the file named "Artistic". If not, I'll be glad to provide one.
- You should also have received a copy of the GNU General Public License
- along with this program in the file named "Copying". If not, write to the
- Free Software Foundation, Inc., 51 Franklin St, Fifth Floor, Boston,
- MA 02110-1301, USA or visit their web page on the internet at
- http://www.gnu.org/copyleft/gpl.html.
- For those of you that choose to use the GNU General Public License,
- my interpretation of the GNU General Public License is that no Perl
- script falls under the terms of the GPL unless you explicitly put
- said script under the terms of the GPL yourself. Furthermore, any
- object code linked with perl does not automatically fall under the
- terms of the GPL, provided such object code only adds definitions
- of subroutines and variables, and does not otherwise impair the
- resulting interpreter from executing any standard Perl script. I
- consider linking in C subroutines in this manner to be the moral
- equivalent of defining subroutines in the Perl language itself. You
- may sell such an object file as proprietary provided that you provide
- or offer to provide the Perl source, as specified by the GNU General
- Public License. (This is merely an alternate way of specifying input
- to the program.) You may also sell a binary produced by the dumping of
- a running Perl script that belongs to you, provided that you provide or
- offer to provide the Perl source as specified by the GPL. (The
- fact that a Perl interpreter and your code are in the same binary file
- is, in this case, a form of mere aggregation.) This is my interpretation
- of the GPL. If you still have concerns or difficulties understanding
- my intent, feel free to contact me. Of course, the Artistic License
- spells all this out for your protection, so you may prefer to use that.
AutoSplit - split a package for autoloading
- autosplit($file, $dir, $keep, $check, $modtime);
- autosplit_lib_modules(@modules);
This function will split up your program into files that the AutoLoader module can handle. It is used by both the standard perl libraries and by the MakeMaker utility, to automatically configure libraries for autoloading.
The autosplit interface splits the specified file into a hierarchy rooted at the directory $dir. It creates directories as needed to reflect class hierarchy, and creates the file autosplit.ix. This file acts as both a forward declaration of all package routines and as a timestamp for the last update of the hierarchy.
The remaining three arguments to autosplit govern other options to the autosplitter.

If the third argument, $keep, is false, then any pre-existing *.al files in the autoload directory are removed if they are no longer part of the module (obsoleted functions). $keep defaults to 0.

The fourth argument, $check, instructs autosplit to check the module currently being split to ensure that it includes a use specification for the AutoLoader module, and skips the module if AutoLoader is not detected. $check defaults to 1.

Lastly, the $modtime argument specifies that autosplit is to check the modification time of the module against that of the autosplit.ix file, and only split the module if it is newer. $modtime defaults to 1.
Typical use of AutoSplit in the perl MakeMaker utility is via the command-line with:
- perl -e 'use AutoSplit; autosplit($ARGV[0], $ARGV[1], 0, 1, 1)'
Defined as a Make macro, it is invoked with file and directory arguments; autosplit will split the specified file into the specified directory and delete obsolete .al files, after first checking that the module does use the AutoLoader and that the module is not already split in its current form (the modtime test).
The autosplit_lib_modules form is used in the building of perl. It takes as input a list of files (modules) that are assumed to reside in a directory lib relative to the current directory. Each file is sent to the autosplitter one at a time, to be split into the directory lib/auto.
In both usages of the autosplitter, only subroutines defined following the perl __END__ token are split out into separate files. Some routines may be placed prior to this marker to force their immediate loading and parsing.
As of version 1.01 of the AutoSplit module it is possible to have multiple packages within a single file. Both of the following cases are supported:
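One supported layout, sketched here with illustrative names (consult the AutoSplit source for the exact rules), declares further packages after the __END__ token; each package's subroutines are split into its own auto/ subtree:

```perl
package Hypothetical::Name;
use AutoLoader 'AUTOLOAD';
1;
__END__
# Split into auto/Hypothetical/Name/aaa.al
sub aaa { ... }

package Hypothetical::Name::option1;
# Split into auto/Hypothetical/Name/option1/bbb.al
sub bbb { ... }
```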
AutoSplit will inform the user if it is necessary to create the top-level directory specified in the invocation. It is preferred that the script or installation process that invokes AutoSplit have created the full directory path ahead of time. This warning may indicate that the module is being split into an incorrect path.
AutoSplit will warn the user of all subroutines whose name causes potential file naming conflicts on machines with drastically limited (8 characters or less) file name length. Since the subroutine name is used as the file name, these warnings can aid in portability to such systems.
Warnings are issued and the file skipped if AutoSplit cannot locate either the __END__ marker or a "package Name;"-style specification.
AutoSplit will also emit general diagnostics for inability to create directories or files.
AutoSplit is maintained by the perl5-porters. Please direct any questions to the canonical mailing list. Anything that is applicable to the CPAN release can be sent to its maintainer, though.
Author and Maintainer: The Perl5-Porters <perl5-porters@perl.org>
Maintainer of the CPAN release: Steffen Mueller <smueller@cpan.org>
This package has been part of the perl core since the first release of perl5. It has been released separately to CPAN so older installations can benefit from bug fixes.
This package has the same copyright and license as the perl core:
- Copyright (C) 1993, 1994, 1995, 1996, 1997, 1998, 1999,
- 2000, 2001, 2002, 2003, 2004, 2005, 2006, 2007, 2008
- by Larry Wall and others
- All rights reserved.
- This program is free software; you can redistribute it and/or modify
- it under the terms of either:
- a) the GNU General Public License as published by the Free
- Software Foundation; either version 1, or (at your option) any
- later version, or
- b) the "Artistic License" which comes with this Kit.
- This program is distributed in the hope that it will be useful,
- but WITHOUT ANY WARRANTY; without even the implied warranty of
- MERCHANTABILITY or FITNESS FOR A PARTICULAR PURPOSE. See either
- the GNU General Public License or the Artistic License for more details.
- You should have received a copy of the Artistic License with this
- Kit, in the file named "Artistic". If not, I'll be glad to provide one.
- You should also have received a copy of the GNU General Public License
- along with this program in the file named "Copying". If not, write to the
- Free Software Foundation, Inc., 59 Temple Place, Suite 330, Boston, MA
- 02111-1307, USA or visit their web page on the internet at
- http://www.gnu.org/copyleft/gpl.html.
- For those of you that choose to use the GNU General Public License,
- my interpretation of the GNU General Public License is that no Perl
- script falls under the terms of the GPL unless you explicitly put
- said script under the terms of the GPL yourself. Furthermore, any
- object code linked with perl does not automatically fall under the
- terms of the GPL, provided such object code only adds definitions
- of subroutines and variables, and does not otherwise impair the
- resulting interpreter from executing any standard Perl script. I
- consider linking in C subroutines in this manner to be the moral
- equivalent of defining subroutines in the Perl language itself. You
- may sell such an object file as proprietary provided that you provide
- or offer to provide the Perl source, as specified by the GNU General
- Public License. (This is merely an alternate way of specifying input
- to the program.) You may also sell a binary produced by the dumping of
- a running Perl script that belongs to you, provided that you provide or
- offer to provide the Perl source as specified by the GPL. (The
- fact that a Perl interpreter and your code are in the same binary file
- is, in this case, a form of mere aggregation.) This is my interpretation
- of the GPL. If you still have concerns or difficulties understanding
- my intent, feel free to contact me. Of course, the Artistic License
- spells all this out for your protection, so you may prefer to use that.
B - The Perl Compiler Backend
- use B;
The B module supplies classes which allow a Perl program to delve into its own innards. It is the module used to implement the "backends" of the Perl compiler. Usage of the compiler does not require knowledge of this module: see the O module for the user-visible part. The B module is of use to those who want to write new compiler backends. This documentation assumes that the reader knows a fair amount about perl's internals, including such things as SVs, OPs, and the internal symbol table and syntax tree of a program.
The B module contains a set of utility functions for querying the current state of the Perl interpreter; typically these functions return objects from the B::SV and B::OP classes, or their derived classes. These classes in turn define methods for querying the resulting objects about their own internal state.
The B module exports a variety of functions: some are simple utility functions, others provide a Perl program with a way to get an initial "handle" on an internal object.
B::SV, B::AV, B::HV, and B::CV objects: For descriptions of the class hierarchy of these objects and the methods that can be called on them, see below, OVERVIEW OF CLASSES and SV-RELATED CLASSES.
Returns the SV object corresponding to the C variable sv_undef.
Returns the SV object corresponding to the C variable sv_yes.
Returns the SV object corresponding to the C variable sv_no.
Takes a reference to any Perl value, and turns the referred-to value into an object in the appropriate B::OP-derived or B::SV-derived class. Apart from functions such as main_root, this is the primary way to get an initial "handle" on an internal perl data structure which can then be followed with the other access methods.
The returned object will only be valid as long as the underlying OPs and SVs continue to exist. Do not attempt to use the object after the underlying structures are freed.
Returns the SV object corresponding to the C variable amagic_generation. As of Perl 5.18, this is just an alias to PL_na, so its value is meaningless.
Returns the AV object (i.e. in class B::AV) representing INIT blocks.
Returns the AV object (i.e. in class B::AV) representing CHECK blocks.
Returns the AV object (i.e. in class B::AV) representing UNITCHECK blocks.
Returns the AV object (i.e. in class B::AV) representing BEGIN blocks.
Returns the AV object (i.e. in class B::AV) representing END blocks.
Returns the AV object (i.e. in class B::AV) of the global comppadlist.
Only when perl was compiled with ithreads.
Return the (faked) CV corresponding to the main part of the Perl program.
Walk the symbol table starting at SYMREF and call METHOD on each symbol (a B::GV object) visited. When the walk reaches package symbols (such as "Foo::") it invokes RECURSE, passing in the symbol name, and only recurses into the package if that sub returns true.
PREFIX is the name of the SYMREF you're walking.
For example:
print_subs() is a B::GV method you have declared. Also see B::GV Methods, below.
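A self-contained sketch (the package names and the print_subs body are illustrative, not part of the B API):

```perl
use B qw(walksymtable);

# Two tiny packages to walk (illustrative names).
package Demo        { sub aaa { my $x = shift } }
package Demo::Inner { sub bbb { my $x = shift } }

package main;

# walksymtable calls this as a method on each B::GV it visits,
# so it must be declared in the B::GV package.
sub B::GV::print_subs {
    my $gv = shift;
    my $cv = $gv->CV;
    # Only report symbols that actually have a CODE slot.
    print $gv->STASH->NAME, '::', $gv->NAME, "\n" if $cv->isa('B::CV');
}

# Walk Demo's symbol table; the sub { 1 } filter recurses into
# every nested package (here Demo::Inner::).
walksymtable(\%Demo::, 'print_subs', sub { 1 }, 'Demo::');
```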
B::OP objects or for walking op trees: For descriptions of the class hierarchy of these objects and the methods that can be called on them, see below, OVERVIEW OF CLASSES and OP-RELATED CLASSES.
Returns the root op (i.e. an object in the appropriate B::OP-derived class) of the main part of the Perl program.
Returns the starting op of the main part of the Perl program.
Does a tree-walk of the syntax tree based at OP and calls METHOD on each op it visits. Each node is visited before its children. If walkoptree_debug (see below) has been called to turn debugging on, then the method walkoptree_debug is called on each op before METHOD is called.
Returns the current debugging flag for walkoptree. If the optional DEBUG argument is non-zero, it sets the debugging flag to that. See the description of walkoptree above for what the debugging flag does.
Return the PP function name (e.g. "pp_add") of op number OPNUM.
Returns a string in the form "0x..." representing the value of the internal hash function used by perl on string STR.
Casts its argument to the internal I32 type used by that perl.
Does the equivalent of the -c command-line option. Obviously, this is only useful in a BEGIN block or else the flag is set too late.
Returns a double-quote-surrounded escaped version of STR which can be used as a string in C source code.
Returns a double-quote-surrounded escaped version of STR which can be used as a string in Perl source code.
Returns the class of an object without the part of the classname preceding the first "::". This is used to turn "B::UNOP" into "UNOP", for example.
In a perl compiled for threads, this returns a list of the special per-thread threadsv variables.
- my $op_type = $optype[$op_type_num];
A simple mapping of the op type number to its type (like 'COP' or 'BINOP').
- my $sv_name = $specialsv_name[$sv_index];
Certain SV types are considered 'special'. They're represented by B::SPECIAL and are referred to by a number from the specialsv_list. This array maps that number back to the name of the SV (like 'Nullsv' or '&PL_sv_undef').
The C structures used by Perl's internals to hold SV and OP information (PVIV, AV, HV, ..., OP, SVOP, UNOP, ...) are modelled on a class hierarchy and the B module gives access to them via a true object hierarchy. Structure fields which point to other objects (whether types of SV or types of OP) are represented by the B module as Perl objects of the appropriate class.

The bulk of the B module is the methods for accessing fields of these structures.
Note that all access is read-only. You cannot modify the internals by using this module. Also, note that the B::OP and B::SV objects created by this module are only valid for as long as the underlying objects exist; their creation doesn't increase the reference counts of the underlying objects. Trying to access the fields of a freed object will give incomprehensible results, or worse.
B::IV, B::NV, B::RV, B::PV, B::PVIV, B::PVNV, B::PVMG, B::BM (5.9.5 and earlier), B::PVLV, B::AV, B::HV, B::CV, B::GV, B::FM, B::IO. These classes correspond in the obvious way to the underlying C structures of similar names. The inheritance hierarchy mimics the underlying C "inheritance". For the 5.10.x branch (i.e. 5.10.0, 5.10.1, etc.) this is:
- B::SV
- |
- +------------+------------+------------+
- | | | |
- B::PV B::IV B::NV B::RV
- \ / /
- \ / /
- B::PVIV /
- \ /
- \ /
- \ /
- B::PVNV
- |
- |
- B::PVMG
- |
- +-----+-----+-----+-----+
- | | | | |
- B::AV B::GV B::HV B::CV B::IO
- | |
- | |
- B::PVLV B::FM
For 5.9.0 and earlier, PVLV is a direct subclass of PVMG, and BM is still present as a distinct type, so the base of this diagram is
- |
- |
- B::PVMG
- |
- +------+-----+-----+-----+-----+-----+
- | | | | | | |
- B::PVLV B::BM B::AV B::GV B::HV B::CV B::IO
- |
- |
- B::FM
For 5.11.0 and later, B::RV is abolished, and IVs can be used to store references, and a new type B::REGEXP is introduced, giving this structure:
- B::SV
- |
- +------------+------------+
- | | |
- B::PV B::IV B::NV
- \ / /
- \ / /
- B::PVIV /
- \ /
- \ /
- \ /
- B::PVNV
- |
- |
- B::PVMG
- |
- +-------+-------+---+---+-------+-------+
- | | | | | |
- B::AV B::GV B::HV B::CV B::IO B::REGEXP
- | |
- | |
- B::PVLV B::FM
Access methods correspond to the underlying C macros for field access, usually with the leading "class indication" prefix removed (Sv, Av, Hv, ...). The leading prefix is only left in cases where its removal would cause a clash in method name. For example, GvREFCNT stays as-is since its abbreviation would clash with the "superclass" method REFCNT (corresponding to the C function SvREFCNT).
Returns a reference to the regular scalar corresponding to this B::SV object. In other words, this method is the inverse operation to the svref_2object() subroutine. This scalar and other data it points at should be considered read-only: modifying them is neither safe nor guaranteed to have a sensible effect.
Returns the value of the IV, interpreted as a signed integer. This will be misleading if FLAGS & SVf_IVisUV. Perhaps you want the int_value method instead?

This method returns the value of the IV as an integer. It differs from IV in that it returns the correct value regardless of whether it's stored signed or unsigned.
This method is the one you usually want. It constructs a string using the length and offset information in the struct: for ordinary scalars it will return the string that you'd see from Perl, even if it contains null characters.
Same as B::RV::RV, except that it will die() if the PV isn't a reference.
This method is less often useful. It assumes that the string stored in the struct is null-terminated, and disregards the length information.
It is the appropriate method to use if you need to get the name of a lexical variable from a padname array. Lexical variable names are always stored with a null terminator, and the length field (CUR) is overloaded for other purposes and can't be relied on here.
This method returns the internal length field, which consists of the number of internal bytes, not necessarily the number of logical characters.
This method returns the number of bytes allocated (via malloc) for storing the string. This is 0 if the scalar does not "own" the string.
Only valid on r-magic, returns the string that generated the regexp.
Will die() if called on r-magic.
Only valid on r-magic, returns the integer value of the REGEX stored in the MAGIC.
This method returns TRUE if the GP field of the GV is NULL.
This method returns the name of the glob, but if the first character of the name is a control character, then it converts it to ^X first, so that *^G would return "^G" rather than "\cG".
It's useful if you want to print out the name of a variable. If you restrict yourself to globs which exist at compile-time then the result ought to be unambiguous, because code like ${"^G"} = 1 is compiled as two ops - a constant string and a dereference (rv2gv) - so that the glob is created at runtime.
If you're working with globs at runtime, and need to disambiguate *^G from *{"^G"}, then you should use the raw NAME method.
B::IO objects derive from IO objects and you will get more information from the IO object itself.
For example:
- $gvio = B::svref_2object(\*main::stdin)->IO;
- $IO = $gvio->object_2svref();
- $fd = $IO->fileno();
A character symbolizing the type of IO Handle.
Takes one argument ( 'stdin' | 'stdout' | 'stderr' ) and returns true if the IoIFP of the object is equal to the handle whose name was passed as argument; i.e., $io->IsSTD('stderr') is true if IoIFP($io) == PerlIO_stderr().
Like ARRAY, but takes an index as an argument to get only one element, rather than a list of all of them.
This method is deprecated if running under Perl 5.8, and is no longer present if running under Perl 5.9.
This method returns the AV specific flags. In Perl 5.9 these are now stored in with the main SV flags, so this method is no longer present.
For constant subroutines, returns the constant SV returned by the subroutine.
Returns the name of a lexical sub, otherwise undef.
This method is not present if running under Perl 5.9, as the PMROOT information is no longer stored directly in the hash.
B::OP, B::UNOP, B::BINOP, B::LOGOP, B::LISTOP, B::PMOP, B::SVOP, B::PADOP, B::PVOP, B::LOOP, B::COP.
These classes correspond in the obvious way to the underlying C structures of similar names. The inheritance hierarchy mimics the underlying C "inheritance":
- B::OP
- |
- +---------------+--------+--------+-------+
- | | | | |
- B::UNOP B::SVOP B::PADOP B::COP B::PVOP
- ,' `-.
- / `--.
- B::BINOP B::LOGOP
- |
- |
- B::LISTOP
- ,' `.
- / \
- B::LOOP B::PMOP
Access methods correspond to the underlying C structure field names, with the leading "class indication" prefix ("op_") removed.
These methods get the values of similarly named fields within the OP data structure. See the top of op.h for more info.
This returns the op name as a string (e.g. "add", "rv2av").
This returns the function name as a string (e.g. "PL_ppaddr[OP_ADD]", "PL_ppaddr[OP_RV2AV]").
This returns the op description from the global C PL_op_desc array (e.g. "addition", "array deref").
Only up to Perl 5.9.4
Since Perl 5.9.5
Only when perl was compiled with ithreads.
Since perl 5.17.1
Although the optree is read-only, there is an overlay facility that allows you to override what values the various B::*OP methods return for a particular op. $B::overlay should be set to reference a two-deep hash: indexed by OP address, then method name. Whenever an op method is called, the value in the hash is returned if it exists. This facility is used by B::Deparse to "undo" some optimisations. For example:
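A minimal sketch of the facility (the overridden value 'pretend_name' is arbitrary):

```perl
use B ();

my $code = sub { my ($x, $y) = @_; $x + $y };
my $root = B::svref_2object($code)->ROOT;   # root op of the sub

my $original = $root->name;                 # e.g. "leavesub"

# Overlay: the outer key is the op's address, the inner key the
# method name to override.
local $B::overlay = { $$root => { name => 'pretend_name' } };
print "was '$original', now '", $root->name, "'\n";
```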
Malcolm Beattie, mbeattie@sable.ox.ac.uk
Benchmark - benchmark running times of Perl code
- use Benchmark qw(:all) ;
- timethis ($count, "code");
- # Use Perl code in strings...
- timethese($count, {
- 'Name1' => '...code1...',
- 'Name2' => '...code2...',
- });
- # ... or use subroutine references.
- timethese($count, {
- 'Name1' => sub { ...code1... },
- 'Name2' => sub { ...code2... },
- });
- # cmpthese can be used both ways as well
- cmpthese($count, {
- 'Name1' => '...code1...',
- 'Name2' => '...code2...',
- });
- cmpthese($count, {
- 'Name1' => sub { ...code1... },
- 'Name2' => sub { ...code2... },
- });
- # ...or in two stages
- $results = timethese($count,
- {
- 'Name1' => sub { ...code1... },
- 'Name2' => sub { ...code2... },
- },
- 'none'
- );
- cmpthese( $results ) ;
- $t = timeit($count, '...other code...')
- print "$count loops of other code took:",timestr($t),"\n";
- $t = countit($time, '...other code...')
- $count = $t->iters ;
- print "$count loops of other code took:",timestr($t),"\n";
- # enable hires wallclock timing if possible
- use Benchmark ':hireswallclock';
The Benchmark module encapsulates a number of routines to help you figure out how long it takes to execute some code.
timethis - run a chunk of code several times
timethese - run several chunks of code several times
cmpthese - print results of timethese as a comparison chart
timeit - run a chunk of code and see how long it goes
countit - see how many times a chunk of code runs in a given time
Returns the current time. Example:
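The classic pattern (the loop body here is a trivial stand-in for real code):

```perl
use Benchmark;

my $t0 = Benchmark->new;      # snapshot before
my $x = 0;
$x += $_ for 1 .. 100_000;    # the code being measured
my $t1 = Benchmark->new;      # snapshot after

my $td = timediff($t1, $t0);
print "the code took: ", timestr($td), "\n";
```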
Enables or disables debugging by setting the $Benchmark::Debug flag:
- Benchmark->debug(1);
- $t = timeit(10, ' 5 ** $Global ');
- Benchmark->debug(0);
Returns the number of iterations.
The following routines will be exported into your namespace if you use the Benchmark module:
Arguments: COUNT is the number of times to run the loop, and CODE is the code to run. CODE may be either a code reference or a string to be eval'd; either way it will be run in the caller's package.
Returns: a Benchmark object.
Time COUNT iterations of CODE. CODE may be a string to eval or a code reference; either way the CODE will run in the caller's package. Results will be printed to STDOUT as TITLE followed by the times. TITLE defaults to "timethis COUNT" if none is provided. STYLE determines the format of the output, as described for timestr() below.
The COUNT can be zero or negative: this means the minimum number of CPU seconds to run. A zero signifies the default of 3 seconds. For example, to run for at least 10 seconds:
- timethis(-10, $code)
or to run two code tests for at least 3 seconds each:
- timethese(0, { test1 => '...', test2 => '...'})
CPU seconds is, in UNIX terms, the user time plus the system time of the process itself, as opposed to the real (wallclock) time and the time spent by the child processes. Less than 0.1 seconds is not accepted (-0.01 as the count, for example, will cause a fatal runtime exception).
Note that the CPU seconds figure is a minimum: CPU scheduling and other operating system factors may complicate the attempt so that a little more time is spent. The benchmark output will, however, also report the number of $code runs per second, which should be a more interesting number than the seconds actually spent.
Returns a Benchmark object.
The CODEHASHREF is a reference to a hash containing names as keys and either a string to eval or a code reference for each value. For each (KEY, VALUE) pair in the CODEHASHREF, this routine will call
- timethis(COUNT, VALUE, KEY, STYLE)
The routines are called in string comparison order of KEY.
The COUNT can be zero or negative, see timethis().
Returns a hash reference of Benchmark objects, keyed by name.
Returns the difference between two Benchmark times as a Benchmark object suitable for passing to timestr().
Returns a string that formats the times in the TIMEDIFF object in the requested STYLE. TIMEDIFF is expected to be a Benchmark object similar to that returned by timediff().
STYLE can be any of 'all', 'none', 'noc', 'nop' or 'auto'. 'all' shows each of the 5 times available ('wallclock' time, user time, system time, user time of children, and system time of children). 'noc' shows all except the two children times. 'nop' shows only wallclock and the two children times. 'auto' (the default) will act as 'all' unless the children times are both zero, in which case it acts as 'noc'. 'none' prevents output.
FORMAT is the printf(3)-style format specifier (without the leading '%') to use to print the times. It defaults to '5.2f'.
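As a sketch of the STYLE and FORMAT arguments described above (the code being timed here is arbitrary):

```perl
use Benchmark qw(timeit timestr);

my $t = timeit(10_000, sub { my @a = map { $_ * $_ } 1 .. 20 });

# default style ('auto') versus 'noc' with a wider '7.4f' format
print timestr($t), "\n";
print timestr($t, 'noc', '7.4f'), "\n";
```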
The following routines will be exported into your namespace if you specifically ask that they be imported:
Clear the cached time for COUNT rounds of the null loop.
Clear all cached times.
Optionally calls timethese(), then outputs a comparison chart. This:
- cmpthese( -1, { a => "++\$i", b => "\$i *= 2" } ) ;
outputs a chart like:
- Rate b a
- b 2831802/s -- -61%
- a 7208959/s 155% --
This chart is sorted from slowest to fastest, and shows the percent speed difference between each pair of tests.
cmpthese() can also be passed the data structure that timethese() returns:
- $results = timethese( -1, { a => "++\$i", b => "\$i *= 2" } ) ;
- cmpthese( $results );
in case you want to see both sets of results. If the first argument is an unblessed hash reference, it is treated as RESULTSHASHREF; otherwise it is treated as COUNT.
Returns a reference to an ARRAY of rows, where each row is an ARRAY of cells from the above chart, including labels. This:
- my $rows = cmpthese( -1, { a => '++$i', b => '$i *= 2' }, "none" );
returns a data structure like:
- [
- [ '', 'Rate', 'b', 'a' ],
- [ 'b', '2885232/s', '--', '-59%' ],
- [ 'a', '7099126/s', '146%', '--' ],
- ]
NOTE: This return value differs from previous versions, which returned the timethese() result structure. If you want that, just use the two-statement timethese...cmpthese idiom shown above.
Incidentally, note the variance in the result values between the two examples; this is typical of benchmarking. If this were a real benchmark, you would probably want to run a lot more iterations.
Arguments: TIME is the minimum length of time to run CODE for, and CODE is the code to run. CODE may be either a code reference or a string to be eval'd; either way it will be run in the caller's package.
TIME must not be negative. countit() will run the loop many times to calculate the speed of CODE before running it for TIME. The actual time run for will usually be greater than TIME due to system clock resolution, so it's best to look at the number of iterations divided by the time actually spent, not just the raw iteration count.
Returns: a Benchmark object.
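A sketch of countit() in use (the half-second limit and the code body are arbitrary):

```perl
use Benchmark qw(countit timestr);

# run the code for at least 0.5 CPU seconds, then ask how many
# iterations fit in that time
my $t = countit(0.5, sub { my $x = sqrt(2) });
my $count = $t->iters;
print "$count loops took: ", timestr($t), "\n";
```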
Disable caching of timings for the null loop. This will force Benchmark to recalculate these timings for each new piece of code timed.
Enable caching of timings for the null loop. The time taken for COUNT rounds of the null loop will be calculated only once for each different COUNT used.
Returns the sum of two Benchmark times as a Benchmark object suitable for passing to timestr().
If the Time::HiRes module has been installed, you can specify the special tag :hireswallclock for Benchmark (if Time::HiRes is not available, the tag will be silently ignored). This tag will cause the wallclock time to be measured in microseconds, instead of integer seconds. Note, though, that the speed computations are still conducted in CPU time, not wallclock time.
The data is stored as a list of values from the time and times functions:
- ($real, $user, $system, $children_user, $children_system, $iters)
in seconds for the whole loop (not divided by the number of rounds).
The timing is done using time(3) and times(3).
Code is executed in the caller's package.
The time of the null loop (a loop with the same number of rounds but empty loop body) is subtracted from the time of the real loop.
The null loop times can be cached, the key being the number of rounds. The caching can be controlled using calls like these:
- clearcache($key);
- clearallcache();
- disablecache();
- enablecache();
Caching is off by default, as it can (usually only slightly) decrease accuracy and usually does not noticeably affect runtimes.
For example, benchmarking two code chunks 'a' and 'b' with cmpthese() for at least 5 CPU seconds outputs something like this:
- Benchmark: running a, b, each for at least 5 CPU seconds...
- Rate b a
- b 1559428/s -- -62%
- a 4152037/s 166% --
while the two-stage timethese()...cmpthese() form outputs something like this:
- Benchmark: running a, b, each for at least 5 CPU seconds...
- a: 10 wallclock secs ( 5.14 usr + 0.13 sys = 5.27 CPU) @ 3835055.60/s (n=20210743)
- b: 5 wallclock secs ( 5.41 usr + 0.00 sys = 5.41 CPU) @ 1574944.92/s (n=8520452)
- Rate b a
- b 1574945/s -- -59%
- a 3835056/s 144% --
Benchmark inherits from no other class, except of course for Exporter.
Comparing eval'd strings with code references will give you inaccurate results: a code reference will show a slightly slower execution time than the equivalent eval'd string.
The real time timing is done using time(2) and the granularity is therefore only one second.
Short tests may produce negative figures because perl can appear to take longer to execute the empty loop than a short test; try:
- timethis(100,'1');
The system time of the null loop might be slightly more than the system time of the loop with the actual code and therefore the difference might end up being < 0.
Devel::NYTProf - a Perl code profiler
Jarkko Hietaniemi <jhi@iki.fi>, Tim Bunce <Tim.Bunce@ig.co.uk>
September 8th, 1994; by Tim Bunce.
March 28th, 1997; by Hugo van der Sanden: added support for code references and the already documented 'debug' method; revamped documentation.
April 04-07th, 1997: by Jarkko Hietaniemi, added the run-for-some-time functionality.
September, 1999; by Barrie Slaymaker: math fixes and accuracy and efficiency tweaks. Added cmpthese(). A result is now returned from timethese(). Exposed countit() (was runfor()).
December, 2001; by Nicholas Clark: make timestr() recognise the style 'none' and return an empty string. If cmpthese is calling timethese, make it pass the style in. (so that 'none' will suppress output). Make sub new dump its debugging output to STDERR, to be consistent with everything else. All bugs found while writing a regression test.
September, 2002; by Jarkko Hietaniemi: add ':hireswallclock' special tag.
February, 2004; by Chia-liang Kao: make cmpthese and timestr use time statistics for children instead of parent when the style is 'nop'.
November, 2007; by Christophe Grosjean: make cmpthese and timestr compute time consistently with style argument, default is 'all' not 'noc' any more.
CGI - Handle Common Gateway Interface requests and responses
- use CGI;
- my $q = CGI->new;
- # Process an HTTP request
- @values = $q->param('form_field');
- $fh = $q->upload('file_field');
- $riddle = $query->cookie('riddle_name');
- %answers = $query->cookie('answers');
- # Prepare various HTTP responses
- print $q->header();
- print $q->header('application/json');
- $cookie1 = $q->cookie(-name=>'riddle_name', -value=>"The Sphynx's Question");
- $cookie2 = $q->cookie(-name=>'answers', -value=>\%answers);
- print $q->header(
- -type => 'image/gif',
- -expires => '+3d',
- -cookie => [$cookie1,$cookie2]
- );
- print $q->redirect('http://somewhere.else/in/movie/land');
CGI.pm is a stable, complete and mature solution for processing and preparing HTTP requests and responses. Major features include processing form submissions, file uploads, reading and writing cookies, query string generation and manipulation, and processing and preparing HTTP headers. Some HTML generation utilities are included as well.
CGI.pm performs very well in a vanilla CGI environment and also comes with built-in support for mod_perl and mod_perl2 as well as FastCGI.
It has the benefit of having been developed and refined over 10 years with input from dozens of contributors and of being deployed on thousands of websites. CGI.pm has been included in the Perl distribution since Perl 5.4, and has become a de facto standard.
There are two styles of programming with CGI.pm, an object-oriented style and a function-oriented style. In the object-oriented style you create one or more CGI objects and then use object methods to create the various elements of the page. Each CGI object starts out with the list of named parameters that were passed to your CGI script by the server. You can modify the objects, save them to a file or database and recreate them. Because each object corresponds to the "state" of the CGI script, and because each object's parameter list is independent of the others, this allows you to save the state of the script and restore it later.
For example, using the object oriented style, here is how you create a simple "Hello World" HTML page:
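A minimal object-oriented sketch. The BEGIN block here is an addition for illustration only: it simulates an incoming request so the script can run standalone instead of waiting for offline input.

```perl
#!/usr/bin/perl
use strict;
use warnings;

# simulate a CGI request (illustrative; a real server sets these)
BEGIN { $ENV{REQUEST_METHOD} = 'GET'; $ENV{QUERY_STRING} = '' }
use CGI;

my $q = CGI->new;
my $page = $q->header
         . $q->start_html('Hello World')
         . $q->h1('Hello World')
         . $q->end_html;
print $page;
```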
In the function-oriented style, there is one default CGI object that you rarely deal with directly. Instead you just call functions to retrieve CGI parameters, create HTML tags, manage cookies, and so on. This provides you with a cleaner programming interface, but limits you to using one CGI object at a time. The following example prints the same page, but uses the function-oriented interface. The main differences are that we now need to import a set of functions into our name space (usually the "standard" functions), and we don't need to create the CGI object.
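A corresponding function-oriented sketch (again with a simulated request environment, which a real server would provide):

```perl
use strict;
use warnings;

# simulate a CGI request so the script runs standalone (illustrative)
BEGIN { $ENV{REQUEST_METHOD} = 'GET'; $ENV{QUERY_STRING} = '' }
use CGI qw(:standard);

# same page as the OO version, but with imported functions
# and no explicit CGI object
my $page = header() . start_html('Hello World') . h1('Hello World') . end_html();
print $page;
```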
The examples in this document mainly use the object-oriented style. See HOW TO IMPORT FUNCTIONS for important information on function-oriented programming in CGI.pm
Most CGI.pm routines accept several arguments, sometimes as many as 20 optional ones! To simplify this interface, all routines use a named argument calling style that looks like this:
- print $q->header(-type=>'image/gif',-expires=>'+3d');
Each argument name is preceded by a dash. Neither case nor order matters in the argument list. -type, -Type, and -TYPE are all acceptable. In fact, only the first argument needs to begin with a dash. If a dash is present in the first argument, CGI.pm assumes dashes for the subsequent ones.
Several routines are commonly called with just one argument. In the case of these routines you can provide the single argument without an argument name. header() happens to be one of these routines. In this case, the single argument is the document type.
- print $q->header('text/html');
Other such routines are documented below.
Sometimes named arguments expect a scalar, sometimes a reference to an array, and sometimes a reference to a hash. Often, you can pass any type of argument and the routine will do whatever is most appropriate. For example, the param() routine is used to set a CGI parameter to a single or a multi-valued value. The two cases are shown below:
- $q->param(-name=>'veggie',-value=>'tomato');
- $q->param(-name=>'veggie',-value=>['tomato','tomahto','potato','potahto']);
A large number of routines in CGI.pm actually aren't specifically defined in the module, but are generated automatically as needed. These are the "HTML shortcuts," routines that generate HTML tags for use in dynamically-generated pages. HTML tags have both attributes (the attribute="value" pairs within the tag itself) and contents (the part between the opening and closing pairs.) To distinguish between attributes and contents, CGI.pm uses the convention of passing HTML attributes as a hash reference as the first argument, and the contents, if any, as any subsequent arguments. It works out like this:
- Code Generated HTML
- ---- --------------
- h1() <h1>
- h1('some','contents'); <h1>some contents</h1>
- h1({-align=>left}); <h1 align="LEFT">
- h1({-align=>left},'contents'); <h1 align="LEFT">contents</h1>
HTML tags are described in more detail later.
Many newcomers to CGI.pm are puzzled by the difference between the calling conventions for the HTML shortcuts, which require curly braces around the HTML tag attributes, and the calling conventions for other routines, which manage to generate attributes without the curly brackets. Don't be confused. As a convenience the curly braces are optional in all but the HTML shortcuts. If you like, you can use curly braces when calling any routine that takes named arguments. For example:
- print $q->header( {-type=>'image/gif',-expires=>'+3d'} );
If you use the -w switch, you will be warned that some CGI.pm argument names conflict with built-in Perl functions. The most frequent of these is the -values argument, used to create multi-valued menus, radio button clusters and the like. To get around this warning, you have several choices:
Use another name for the argument, if one is available. For example, -value is an alias for -values.
Change the capitalization, e.g. -Values
Put quotes around the argument name, e.g. '-values'
Many routines will do something useful with a named argument that they don't recognize. For example, you can produce non-standard HTTP header fields by providing them as named arguments:
- print $q->header(-type => 'text/html',
- -cost => 'Three smackers',
- -annoyance_level => 'high',
- -complaints_to => 'bit bucket');
This will produce the following nonstandard HTTP header:
- HTTP/1.0 200 OK
- Cost: Three smackers
- Annoyance-level: high
- Complaints-to: bit bucket
- Content-type: text/html
Notice the way that underscores are translated automatically into hyphens. HTML-generating routines perform a different type of translation.
This feature allows you to keep up with the rapidly changing HTTP and HTML "standards".
- $query = CGI->new;
This will parse the input (from POST, GET and DELETE methods) and store it into a perl5 object called $query.
Any filehandles from file uploads will have their position reset to the beginning of the file.
- $query = CGI->new(INPUTFILE);
If you provide a file handle to the new() method, it will read parameters from the file (or STDIN, or whatever). The file can be in any of the forms described below under debugging (i.e. a series of newline-delimited TAG=VALUE pairs will work). Conveniently, this type of file is created by the save() method (see below). Multiple records can be saved and restored.
Perl purists will be pleased to know that this syntax accepts references to file handles, or even references to filehandle globs, which is the "official" way to pass a filehandle:
- $query = CGI->new(\*STDIN);
You can also initialize the CGI object with a FileHandle or IO::File object.
If you are using the function-oriented interface and want to initialize CGI state from a file handle, the way to do this is with restore_parameters(). This will (re)initialize the default CGI object from the indicated file handle.
You can also initialize the query object from a hash reference:
- $query = CGI->new( {'dinosaur'=>'barney',
- 'song'=>'I love you',
- 'friends'=>[qw/Jessica George Nancy/]}
- );
or from a properly formatted, URL-escaped query string:
- $query = CGI->new('dinosaur=barney&color=purple');
or from a previously existing CGI object (currently this clones the parameter list, but none of the other object-specific fields, such as autoescaping):
- $old_query = CGI->new;
- $new_query = CGI->new($old_query);
To create an empty query, initialize it from an empty string or hash:
- $empty_query = CGI->new("");
- -or-
- $empty_query = CGI->new({});
- @keywords = $query->keywords
If the script was invoked as the result of an <ISINDEX> search, the parsed keywords can be obtained as an array using the keywords() method.
- @names = $query->param
If the script was invoked with a parameter list (e.g. "name1=value1&name2=value2&name3=value3"), the param() method will return the parameter names as a list. If the script was invoked as an <ISINDEX> script and contains a string without ampersands (e.g. "value1+value2+value3") , there will be a single parameter named "keywords" containing the "+"-delimited keywords.
NOTE: As of version 1.5, the array of parameter names returned will be in the same order as they were submitted by the browser. Usually this order is the same as the order in which the parameters are defined in the form (however, this isn't part of the spec, and so isn't guaranteed).
- @values = $query->param('foo');
- -or-
- $value = $query->param('foo');
Pass the param() method a single argument to fetch the value of the named parameter. If the parameter is multivalued (e.g. from multiple selections in a scrolling list), you can ask to receive an array. Otherwise the method will return a single value.
If a value is not given in the query string, as in the queries "name1=&name2=", it will be returned as an empty string.
If the parameter does not exist at all, then param() will return undef in a scalar context, and the empty list in a list context.
- $query->param('foo','an','array','of','values');
This sets the value for the named parameter 'foo' to an array of values. This is one way to change the value of a field AFTER the script has been invoked once before. (Another way is with the -override parameter accepted by all methods that generate form elements.)
param() also recognizes a named parameter style of calling described in more detail later:
- $query->param(-name=>'foo',-values=>['an','array','of','values']);
- -or-
- $query->param(-name=>'foo',-value=>'the value');
- $query->append(-name=>'foo',-values=>['yet','more','values']);
This adds a value or list of values to the named parameter. The values are appended to the end of the parameter if it already exists. Otherwise the parameter is created. Note that this method only recognizes the named argument calling syntax.
- $query->import_names('R');
This creates a series of variables in the 'R' namespace. For example, $R::foo, @R::foo. For keyword lists, a variable @R::keywords will appear. If no namespace is given, this method will assume 'Q'. WARNING: don't import anything into 'main'; this is a major security risk!!!!
NOTE 1: Variable names are transformed as necessary into legal Perl variable names. All non-legal characters are transformed into underscores. If you need to keep the original names, you should use the param() method instead to access CGI variables by name.
NOTE 2: In older versions, this method was called import(). As of version 2.20, this name has been removed completely to avoid conflict with the built-in Perl module import operator.
- $query->delete('foo','bar','baz');
This completely clears a list of parameters. It is sometimes useful for resetting parameters that you don't want passed down between script invocations.
If you are using the function call interface, use "Delete()" instead to avoid conflicts with Perl's built-in delete operator.
- $query->delete_all();
This clears the CGI object completely. It might be useful to ensure that all the defaults are taken when you create a fill-out form.
Use Delete_all() instead if you are using the function call interface.
If POSTed data is not of type application/x-www-form-urlencoded or multipart/form-data, then the POSTed data will not be processed, but instead be returned as-is in a parameter named POSTDATA. To retrieve it, use code like this:
- my $data = $query->param('POSTDATA');
Likewise, PUTed data can be retrieved with code like this:
- my $data = $query->param('PUTDATA');
(If you don't know what the preceding means, don't worry about it. It only affects people trying to use CGI for XML processing and other specialized tasks.)
- $q->param_fetch('address')->[1] = '1313 Mockingbird Lane';
- unshift @{$q->param_fetch(-name=>'address')},'George Munster';
If you need access to the parameter list in a way that isn't covered by the methods given in the previous sections, you can obtain a direct reference to it by calling the param_fetch() method with the name of the parameter. This will return an array reference to the named parameter, which you then can manipulate in any way you like.
You can also use a named argument style using the -name argument.
Many people want to fetch the entire parameter list as a hash in which the keys are the names of the CGI parameters, and the values are the parameters' values. The Vars() method does this. Called in a scalar context, it returns the parameter list as a tied hash reference. Changing a key changes the value of the parameter in the underlying CGI parameter list. Called in a list context, it returns the parameter list as an ordinary hash. This allows you to read the contents of the parameter list, but not to change it.
When using this, the thing you must watch out for is multivalued CGI parameters. Because a hash cannot distinguish between scalar and list context, multivalued parameters will be returned as a packed string, separated by the "\0" (null) character. You must split this packed string in order to get at the individual values. This is the convention introduced long ago by Steve Brenner in his cgi-lib.pl module for Perl version 4.
If you wish to use Vars() as a function, import the :cgi-lib set of function calls (also see the section on CGI-LIB compatibility).
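A sketch of unpacking a multivalued parameter from Vars(); the parameter name and the simulated query string are illustrative:

```perl
use strict;
use warnings;

# simulate a request with a repeated parameter (illustrative)
BEGIN { $ENV{REQUEST_METHOD} = 'GET'; $ENV{QUERY_STRING} = 'tool=hammer&tool=saw' }
use CGI qw(:cgi-lib);

my %params = Vars();                      # list context: an ordinary hash
my @tools  = split /\0/, $params{tool};   # "hammer\0saw" -> ('hammer', 'saw')
print "@tools\n";
```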
- $query->save(\*FILEHANDLE)
This will write the current state of the form to the provided filehandle. You can read it back in by providing a filehandle to the new() method. Note that the filehandle can be a file, a pipe, or whatever!
The format of the saved file is:
- NAME1=VALUE1
- NAME1=VALUE1'
- NAME2=VALUE2
- NAME3=VALUE3
- =
Both name and value are URL escaped. Multi-valued CGI parameters are represented as repeated names. A session record is delimited by a single = symbol. You can write out multiple records and read them back in with several calls to new. You can do this across several sessions by opening the file in append mode, allowing you to create primitive guest books, or to keep a history of users' queries. Here's a short example of creating multiple session records:
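A sketch of appending several session records in one run; the filename and the parameter name are illustrative:

```perl
use strict;
use warnings;

BEGIN { $ENV{REQUEST_METHOD} = 'GET'; $ENV{QUERY_STRING} = '' }
use CGI;

# append five records to one file; each record ends with a lone "="
open my $out, '>>', 'guestbook.out' or die "can't append to guestbook.out: $!";
for my $i (0 .. 4) {
    my $q = CGI->new('');
    $q->param(-name => 'counter', -value => $i);
    $q->save($out);
}
close $out;
```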
The file format used for save/restore is identical to that used by the Whitehead Genome Center's data exchange format "Boulderio", and can be manipulated and even databased using Boulderio utilities. See
- http://stein.cshl.org/boulder/
for further details.
If you wish to use this method from the function-oriented (non-OO) interface, the exported name for this method is save_parameters().
Errors can occur while processing user input, particularly when processing uploaded files. When these errors occur, CGI will stop processing and return an empty parameter list. You can test for the existence and nature of errors using the cgi_error() function. The error messages are formatted as HTTP status codes. You can either incorporate the error text into an HTML page, or use it as the value of the HTTP status:
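A sketch of the usual cgi_error() check, using the error text as the HTTP status; with a well-formed request the branch is simply skipped (the simulated environment is an addition for standalone running):

```perl
use strict;
use warnings;

BEGIN { $ENV{REQUEST_METHOD} = 'GET'; $ENV{QUERY_STRING} = '' }
use CGI;

my $q = CGI->new;
if (my $error = $q->cgi_error) {
    # report the failure as both the status line and the page body
    print $q->header(-status => $error),
          $q->start_html('Problems'),
          $q->h2('Request not processed'),
          $q->strong($error);
    exit 0;
}
print $q->header('text/plain'), "request processed\n";
```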
When using the function-oriented interface (see the next section), errors may only occur the first time you call param(). Be ready for this!
To use the function-oriented interface, you must specify which CGI.pm routines or sets of routines to import into your script's namespace. There is a small overhead associated with this importation, but it isn't much.
- use CGI <list of methods>;
The listed methods will be imported into the current package; you can call them directly without creating a CGI object first. This example shows how to import the param() and header() methods, and then use them directly:
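A sketch of that import, with a simulated query string (the parameter names are illustrative):

```perl
use strict;
use warnings;

# simulate a request carrying two parameters (illustrative)
BEGIN { $ENV{REQUEST_METHOD} = 'GET'; $ENV{QUERY_STRING} = 'color=red&size=large' }
use CGI 'param', 'header';

# param() and header() are now callable without a CGI object
print header('text/plain');
print "CGI parameters are: ", join(', ', param()), "\n";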
More frequently, you'll import common sets of functions by referring to the groups by name. All function sets are preceded with a ":" character as in ":html3" (for tags defined in the HTML 3 standard).
Here is a list of the function sets you can import:
Import all CGI-handling methods, such as param(), path_info() and the like.
Import all fill-out form generating methods, such as textfield().
Import all methods that generate HTML 2.0 standard elements.
Import all methods that generate HTML 3.0 elements (such as <table>, <super> and <sub>).
Import all methods that generate HTML 4 elements (such as <abbrev>, <acronym> and <thead>).
Import the <blink>, <fontsize> and <center> tags.
Import all HTML-generating shortcuts (i.e. 'html2', 'html3', 'html4' and 'netscape')
Import "standard" features, 'html2', 'html3', 'html4', 'form' and 'cgi'.
Import all the available methods. For the full list, see the CGI.pm code, where the variable %EXPORT_TAGS is defined.
If you import a function name that is not part of CGI.pm, the module will treat it as a new HTML tag and generate the appropriate subroutine. You can then use it like any other HTML tag. This is to provide for the rapidly-evolving HTML "standard." For example, say Microsoft comes out with a new tag called <gradient> (which causes the user's desktop to be flooded with a rotating gradient fill until his machine reboots). You don't need to wait for a new version of CGI.pm to start using it immediately:
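Simply mention the tag in the import list and CGI.pm manufactures the function; the <gradient> tag and its attributes below are, of course, made up:

```perl
use strict;
use warnings;

BEGIN { $ENV{REQUEST_METHOD} = 'GET'; $ENV{QUERY_STRING} = '' }
# 'gradient' is not a real CGI.pm function; importing it creates the tag
use CGI qw(:standard gradient);

my $html = gradient({ -start => 'red', -end => 'blue' }, 'whee!');
print $html, "\n";
```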
Note that in the interests of execution speed CGI.pm does not use the standard Exporter syntax for specifying load symbols. This may change in the future.
If you import any of the state-maintaining CGI or form-generating methods, a default CGI object will be created and initialized automatically the first time you use any of the methods that require one to be present. This includes param(), textfield(), submit() and the like. (If you need direct access to the CGI object, you can find it in the global variable $CGI::Q). By importing CGI.pm methods, you can create visually elegant scripts:
- use CGI qw/:standard/;
- header,
- start_html('Simple Script'),
- h1('Simple Script'),
- start_form,
- "What's your name? ",textfield('name'),p,
- "What's the combination?",
- checkbox_group(-name=>'words',
- -values=>['eenie','meenie','minie','moe'],
- -defaults=>['eenie','moe']),p,
- "What's your favorite color?",
- popup_menu(-name=>'color',
- -values=>['red','green','blue','chartreuse']),p,
- submit,
- end_form,
- hr,"\n";
- if (param) {
- "Your name is ",em(param('name')),p,
- "The keywords are: ",em(join(", ",param('words'))),p,
- "Your favorite color is ",em(param('color')),".\n";
- }
- print end_html;
In addition to the function sets, there are a number of pragmas that you can import. Pragmas, which are always preceded by a hyphen, change the way that CGI.pm functions in various ways. Pragmas, function sets, and individual functions can all be imported in the same use() line. For example, the following use statement imports the standard set of functions and enables debugging mode (pragma -debug):
- use CGI qw/:standard -debug/;
The current list of pragmas is as follows:
When you use CGI -any, then any method that the query object doesn't recognize will be interpreted as a new HTML tag. This allows you to support the next ad hoc HTML extension. This lets you go wild with new and unsupported tags:
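For instance (the <gradient> tag and its attributes are invented on the spot; -any turns the unknown method into a tag):

```perl
use strict;
use warnings;

BEGIN { $ENV{REQUEST_METHOD} = 'GET'; $ENV{QUERY_STRING} = '' }
use CGI qw(-any);

my $q = CGI->new;
# gradient() is not defined anywhere; -any generates it on demand
my $tag = $q->gradient({ speed => 'fast', start => 'red', end => 'blue' });
print $tag, "\n";
```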
Since using -any causes any mistyped method name to be interpreted as an HTML tag, use it with care or not at all.
This causes the indicated autoloaded methods to be compiled up front, rather than deferred to later. This is useful for scripts that run for an extended period of time under FastCGI or mod_perl, and for those destined to be crunched by Malcolm Beattie's Perl compiler. Use it in conjunction with the methods or method families you plan to use.
- use CGI qw(-compile :standard :html3);
or even
- use CGI qw(-compile :all);
Note that using the -compile pragma in this way will always have the effect of importing the compiled functions into the current namespace. If you want to compile without importing use the compile() method instead:
- use CGI();
- CGI->compile();
This is particularly useful in a mod_perl environment, in which you might want to precompile all CGI routines in a startup script, and then import the functions individually in each mod_perl script.
By default the CGI module implements a state-preserving behavior called "sticky" fields. The way this works is that if you are regenerating a form, the methods that generate the form field values will interrogate param() to see if similarly-named parameters are present in the query string. If they find a like-named parameter, they will use it to set their default values.
Sometimes this isn't what you want. The -nosticky pragma prevents this behavior. You can also selectively change the sticky behavior in each element that you generate.
Automatically add tab index attributes to each form field. With this option turned off, you can still add tab indexes manually by passing a -tabindex option to each field-generating method.
This keeps CGI.pm from including undef params in the parameter list.
By default, CGI.pm versions 2.69 and higher emit XHTML (http://www.w3.org/TR/xhtml1/). The -no_xhtml pragma disables this feature. Thanks to Michalis Kabrianis <kabrianis@hellug.gr> for this feature.
If start_html()'s -dtd parameter specifies an HTML 2.0, 3.2, 4.0 or 4.01 DTD, XHTML will automatically be disabled without needing to use this pragma.
This makes CGI.pm treat all parameters as UTF-8 strings. Use this with care, as it will interfere with the processing of binary uploads. It is better to manually select which fields are expected to return utf-8 strings and convert them using code like this:
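A sketch of selectively decoding one field with the Encode module; the parameter name and its value are illustrative ('Ren%C3%A9' is the URL-escaped UTF-8 form of 'René'):

```perl
use strict;
use warnings;

# simulate a request carrying UTF-8 form data (illustrative)
BEGIN { $ENV{REQUEST_METHOD} = 'GET'; $ENV{QUERY_STRING} = 'name=Ren%C3%A9' }
use CGI;
use Encode qw(decode);

my $q     = CGI->new;
my $bytes = $q->param('name');          # raw UTF-8 bytes
my $text  = decode('UTF-8', $bytes);    # a Perl character string
printf "%d bytes, %d characters\n", length($bytes), length($text);
```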
This makes CGI.pm produce a header appropriate for an NPH (no parsed header) script. You may need to do other things as well to tell the server that the script is NPH. See the discussion of NPH scripts below.
Separate the name=value pairs in CGI parameter query strings with semicolons rather than ampersands. For example:
- ?name=fred;age=24;favorite_color=3
Semicolon-delimited query strings are always accepted, and will be emitted by self_url() and query_string(). newstyle_urls became the default in version 2.64.
Separate the name=value pairs in CGI parameter query strings with ampersands rather than semicolons. This is no longer the default.
This overrides the autoloader so that any function in your program that is not recognized is referred to CGI.pm for possible evaluation. This allows you to use all the CGI.pm functions without adding them to your symbol table, which is of concern for mod_perl users who are worried about memory consumption. Warning: when -autoload is in effect, you cannot use "poetry mode" (functions without the parenthesis). Use hr() rather than hr, or add something like use subs qw/hr p header/ to the top of your script.
This turns off the command-line processing features. If you want to run a CGI.pm script from the command line to produce HTML, and you don't want it to read CGI parameters from the command line or STDIN, then use this pragma:
- use CGI qw(-no_debug :standard);
This turns on full debugging. In addition to reading CGI arguments from the command line, CGI.pm will pause and try to read arguments from STDIN, producing the message "(offline mode: enter name=value pairs on standard input)".
See the section on debugging for more details.
CGI.pm can process uploaded files. Ordinarily it spools the uploaded file to a temporary directory, then deletes the file when done. However, this opens the risk of eavesdropping as described in the file upload section. Another CGI script author could peek at this data during the upload, even if it is confidential information. On Unix systems, the -private_tempfiles pragma will cause the temporary file to be unlinked as soon as it is opened and before any data is written into it, reducing, but not eliminating, the risk of eavesdropping (there is still a potential race condition). To make life harder for the attacker, the program chooses tempfile names by calculating a 32 bit checksum of the incoming HTTP headers.
To ensure that the temporary file cannot be read by other CGI scripts, use suEXEC or a CGI wrapper program to run your script. The temporary file is created with mode 0600 (neither world nor group readable).
The temporary directory is selected using the following algorithm:
Each candidate location is checked to confirm that it is a writable directory. If not, the algorithm tries the next choice.
Many of the methods generate HTML tags. As described below, tag functions automatically generate both the opening and closing tags. For example:
- print h1('Level 1 Header');
produces
- <h1>Level 1 Header</h1>
There will be some times when you want to produce the start and end tags yourself. In this case, you can use the form start_tag_name and end_tag_name, as in:
- print start_h1,'Level 1 Header',end_h1;
With a few exceptions (described below), start_tag_name and end_tag_name functions are not generated automatically when you use CGI. However, you can specify the tags you want to generate start/end functions for by putting an asterisk in front of their name, or, alternatively, requesting either "start_tag_name" or "end_tag_name" in the import list.
Example:
- use CGI qw/:standard *table start_ul/;
In this example, the following functions are generated in addition to the standard ones:
Most of CGI.pm's functions deal with creating documents on the fly. Generally you will produce the HTTP header first, followed by the document itself. CGI.pm provides functions for generating HTTP headers of various types as well as for generating HTML. For creating GIF images, see the GD.pm module.
Each of these functions produces a fragment of HTML or HTTP which you can print out directly so that it displays in the browser window, append to a string, or save to a file for later use.
Normally the first thing you will do in any CGI script is print out an HTTP header. This tells the browser what type of document to expect, and gives other optional information, such as the language, expiration date, and whether to cache the document. The header can also be manipulated for special purposes, such as server push and pay per view pages.
header() returns the Content-type: header. You can provide your own MIME type if you choose, otherwise it defaults to text/html. An optional second parameter specifies the status code and a human-readable message. For example, you can specify 204, "No response" to create a script that tells the browser to do nothing at all. Note that RFC 2616 expects the human-readable phrase to be there as well as the numeric status code.
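For illustration, typical positional calls look like this (a sketch, not an exhaustive list):

```perl
use CGI qw(header);

print header;                                 # defaults to text/html
print header('image/gif');                    # explicit MIME type
print header('text/html', '204 No response'); # MIME type plus status line
```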
You can also pass arguments to the CGI methods using named parameters. Recognized parameters are -type, -status, -expires, and -cookie. Any other named parameters will be stripped of their initial hyphens and turned into header fields, allowing you to specify any HTTP header you desire. Internal underscores will be turned into hyphens:
- print header(-Content_length=>3002);
Most browsers will not cache the output from CGI scripts. Every time the browser reloads the page, the script is invoked anew. You can change this behavior with the -expires parameter. When you specify an absolute or relative expiration interval with this parameter, some browsers and proxy servers will cache the script's output until the indicated expiration date. The following forms are all valid for the -expires field:
- +30s 30 seconds from now
- +10m ten minutes from now
- +1h one hour from now
- -1d yesterday (i.e. "ASAP!")
- now immediately
- +3M in three months
- +10y in ten years time
- Thursday, 25-Apr-1999 00:40:33 GMT at the indicated time & date
The -cookie parameter generates a header that tells the browser to provide a "magic cookie" during all subsequent transactions with your script. Some cookies have a special format that includes interesting attributes such as expiration time. Use the cookie() method to create and retrieve session cookies.
The -nph parameter, if set to a true value, will issue the correct headers to work with a NPH (no-parse-header) script. This is important to use with certain servers that expect all their scripts to be NPH.
The -charset parameter can be used to control the character set sent to the browser. If not provided, defaults to ISO-8859-1. As a side effect, this sets the charset() method as well.
The -attachment parameter can be used to turn the page into an attachment. Instead of displaying the page, some browsers will prompt the user to save it to disk. The value of the argument is the suggested name for the saved file. In order for this to work, you may have to set the -type to "application/octet-stream".
The -p3p parameter will add a P3P tag to the outgoing header. The parameter can be an arrayref or a space-delimited string of P3P tags. For example:
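Both of these forms produce the same result (the policy tags are illustrative):

```perl
use CGI qw(header);

print header(-p3p => [qw(CAO DSP LAW CURa)]);  # array reference
print header(-p3p => 'CAO DSP LAW CURa');      # space-delimited string
```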
In either case, the outgoing header will be formatted as:
- P3P: policyref="/w3c/p3p.xml" cp="CAO DSP LAW CURa"
CGI.pm will accept valid multi-line headers when each line is separated with a CRLF value ("\r\n" on most platforms) followed by at least one space. For example:
- print header( -ingredients => "ham\r\n\seggs\r\n\sbacon" );
Invalid multi-line header input will trigger an exception. When multi-line headers are received, CGI.pm will always output them back as a single line, according to the folding rules of RFC 2616: the newlines will be removed, while the white space remains.
- print $q->redirect('http://somewhere.else/in/movie/land');
Sometimes you don't want to produce a document yourself, but simply redirect the browser elsewhere, perhaps choosing a URL based on the time of day or the identity of the user.
The redirect() method redirects the browser to a different URL. If you use redirection like this, you should not print out a header as well.
You should always use full URLs (including the http: or ftp: part) in redirection requests. Relative URLs will not work correctly.
You can also use named arguments:
- print $q->redirect(
- -uri=>'http://somewhere.else/in/movie/land',
- -nph=>1,
- -status=>'301 Moved Permanently');
All named arguments recognized by header() are also recognized by redirect(). However, most HTTP headers, including those generated by -cookie and -target, are ignored by the browser.
The -nph parameter, if set to a true value, will issue the correct headers to work with a NPH (no-parse-header) script. This is important to use with certain servers, such as Microsoft IIS, which expect all their scripts to be NPH.
The -status parameter will set the status of the redirect. HTTP defines three different possible redirection status codes:
- 301 Moved Permanently
- 302 Found
- 303 See Other
The default if not specified is 302, which means "moved temporarily." You may change the status to another status code if you wish. Be advised that changing the status to anything other than 301, 302 or 303 will probably break redirection.
Note that the human-readable phrase is also expected to be present to conform with RFC 2616, section 6.1.
- print start_html(-title=>'Secrets of the Pyramids',
- -author=>'fred@capricorn.org',
- -base=>'true',
- -target=>'_blank',
- -meta=>{'keywords'=>'pharaoh secret mummy',
- 'copyright'=>'copyright 1996 King Tut'},
- -style=>{'src'=>'/styles/style1.css'},
- -BGCOLOR=>'blue');
The start_html() routine creates the top of the page, along with a lot of optional information that controls the page's appearance and behavior.
This method returns a canned HTML header and the opening <body> tag. All parameters are optional. In the named parameter form, recognized parameters are -title, -author, -base, -xbase, -dtd, -lang and -target (see below for the explanation). Any additional parameters you provide, such as the unofficial BGCOLOR attribute, are added to the <body> tag. Additional parameters must be preceded by a hyphen.
The argument -xbase allows you to provide an HREF for the <base> tag different from the current location, as in
- -xbase=>"http://home.mcom.com/"
All relative links will be interpreted relative to this tag.
The argument -target allows you to provide a default target frame for all the links and fill-out forms on the page. This is a non-standard HTTP feature which only works with some browsers!
- -target=>"answer_window"
You can add arbitrary meta information to the header with the -meta argument. This argument expects a reference to a hash containing name/value pairs of meta information. These will be turned into a series of header <meta> tags that look something like this:
- <meta name="keywords" content="pharaoh secret mummy">
- <meta name="description" content="copyright 1996 King Tut">
To create an HTTP-EQUIV type of <meta> tag, use -head, described below.
The -style argument is used to incorporate cascading stylesheets into your code. See the section on CASCADING STYLESHEETS for more information.
The -lang argument is used to incorporate a language attribute into the <html> tag. For example:
- print $q->start_html(-lang=>'fr-CA');
The default if not specified is "en-US" for US English, unless the -dtd parameter specifies an HTML 2.0 or 3.2 DTD, in which case the lang attribute is left off. You can force the lang attribute to be left off in other cases by passing an empty string (-lang=>'').
The -encoding argument can be used to specify the character set for XHTML. It defaults to iso-8859-1 if not specified.
The -dtd argument can be used to specify a public DTD identifier string. For example:
- -dtd => '-//W3C//DTD HTML 4.01 Transitional//EN')
Alternatively, it can take public and system DTD identifiers as an array:
- -dtd => [ '-//W3C//DTD HTML 4.01 Transitional//EN', 'http://www.w3.org/TR/html4/loose.dtd' ]
For the public DTD identifier to be considered, it must be valid. Otherwise it will be replaced by the default DTD. If the public DTD contains 'XHTML', CGI.pm will emit XML.
The -declare_xml argument, when used in conjunction with XHTML, will put a <?xml> declaration at the top of the HTML header. The sole purpose of this declaration is to declare the character set encoding. In the absence of -declare_xml, the output HTML will contain a <meta> tag that specifies the encoding, allowing the HTML to pass most validators. The default for -declare_xml is false.
You can place other arbitrary HTML elements in the <head> section with the -head tag. For example, to place a <link> element in the head section, use this:
- print start_html(-head=>Link({-rel=>'shortcut icon',
- -href=>'favicon.ico'}));
To incorporate multiple HTML elements into the <head> section, just pass an array reference:
- print start_html(-head=>[
- Link({-rel=>'next',
- -href=>'http://www.capricorn.com/s2.html'}),
- Link({-rel=>'previous',
- -href=>'http://www.capricorn.com/s1.html'})
- ]
- );
And here's how to create an HTTP-EQUIV <meta> tag:
- print start_html(-head=>meta({-http_equiv => 'Content-Type',
- -content => 'text/html'}))
JAVASCRIPTING: The -script, -noScript, -onLoad, -onMouseOver, -onMouseOut and -onUnload parameters are used to add JavaScript calls to your pages. -script should point to a block of text containing JavaScript function definitions. This block will be placed within a <script> block inside the HTML (not HTTP) header. The block is placed in the header in order to give your page a fighting chance of having all its JavaScript functions in place even if the user presses the stop button before the page has loaded completely. CGI.pm attempts to format the script in such a way that JavaScript-naive browsers will not choke on the code: unfortunately there are some browsers, such as Chimera for Unix, that get confused by it nevertheless.
The -onLoad and -onUnload parameters point to fragments of JavaScript code to execute when the page is respectively opened and closed by the browser. Usually these parameters are calls to functions defined in the -script field:
- $query = CGI->new;
- print header;
- $JSCRIPT=<<END;
- // Ask a silly question
- function riddle_me_this() {
- var r = prompt("What walks on four legs in the morning, " +
- "two legs in the afternoon, " +
- "and three legs in the evening?");
- response(r);
- }
- // Get a silly answer
- function response(answer) {
- if (answer == "man")
- alert("Right you are!");
- else
- alert("Wrong! Guess again.");
- }
- END
- print start_html(-title=>'The Riddle of the Sphinx',
- -script=>$JSCRIPT);
Use the -noScript parameter to pass some HTML text that will be displayed on browsers that do not have JavaScript (or browsers where JavaScript is turned off).
The <script> tag has several attributes, including "type", "charset" and "src". "src" allows you to keep JavaScript code in an external file. To use these attributes, pass a HASH reference in the -script parameter containing one or more of -type, -src, or -code:
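A sketch of the hash-reference form (the script URL is an assumption):

```perl
my $q = CGI->new;
print $q->start_html(
    -title  => 'The Riddle of the Sphinx',
    -script => { -type => 'text/javascript',
                 -src  => '/javascript/sphinx.js' },  # hypothetical external file
);
```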
A final feature allows you to incorporate multiple <script> sections into the header. Just pass the list of script sections as an array reference. This allows you to specify different source files for different dialects of JavaScript. Example:
- print $q->start_html(-title=>'The Riddle of the Sphinx',
- -script=>[
- { -type => 'text/javascript',
- -src => '/javascript/utilities10.js'
- },
- { -type => 'text/javascript',
- -src => '/javascript/utilities11.js'
- },
- { -type => 'text/jscript',
- -src => '/javascript/utilities12.js'
- },
- { -type => 'text/ecmascript',
- -src => '/javascript/utilities219.js'
- }
- ]
- );
The option "-language" is a synonym for -type, and is supported for backwards compatibility.
The old-style positional parameters are as follows:
The title
The author's e-mail address (will create a <link rev="MADE"> tag if present)
A 'true' flag if you want to include a <base> tag in the header. This helps resolve relative addresses to absolute ones when the document is moved, but makes the document hierarchy non-portable. Use with care!
Any other parameters you want to include in the <body> tag. This is a good place to put HTML extensions, such as colors and wallpaper patterns.
- print $q->end_html;
This ends an HTML document by printing the </body></html> tags.
- $myself = $q->self_url;
- print qq(<a href="$myself">I'm talking to myself.</a>);
self_url() will return a URL, that, when selected, will reinvoke this script with all its state information intact. This is most useful when you want to jump around within the document using internal anchors but you don't want to disrupt the current contents of the form(s). Something like this will do the trick.
If you want more control over what's returned, use the url() method instead.
You can also retrieve the unprocessed query string with query_string():
- $the_string = $q->query_string();
The behavior of calling query_string is currently undefined when the HTTP method is something other than GET.
- $full_url = url();
- $full_url = url(-full=>1); #alternative syntax
- $relative_url = url(-relative=>1);
- $absolute_url = url(-absolute=>1);
- $url_with_path = url(-path_info=>1);
- $url_with_path_and_query = url(-path_info=>1,-query=>1);
- $netloc = url(-base => 1);
url() returns the script's URL in a variety of formats. Called without any arguments, it returns the full form of the URL, including host name and port number
- http://your.host.com/path/to/script.cgi
You can modify this format with the following named arguments:
If true, produce an absolute URL, e.g.
- /path/to/script.cgi
Produce a relative URL. This is useful if you want to reinvoke your script with different parameters. For example:
- script.cgi
Produce the full URL, exactly as if called without any arguments. This overrides the -relative and -absolute arguments.
Append the additional path information to the URL. This can be combined with -full, -absolute or -relative. -path_info is provided as a synonym.
Append the query string to the URL. This can be combined with -full, -absolute or -relative. -query_string is provided as a synonym.
Generate just the protocol and net location, as in http://www.foo.com:8000
If Apache's mod_rewrite is turned on, then the script name and path info probably won't match the request that the user sent. Set -rewrite=>1 (default) to return URLs that match what the user sent (the original request URI). Set -rewrite=>0 to return URLs that match the URL after mod_rewrite's rules have run.
- $color = url_param('color');
It is possible for a script to receive CGI parameters in the URL as well as in the fill-out form by creating a form that POSTs to a URL containing a query string (a "?" mark followed by arguments). The param() method will always return the contents of the POSTed fill-out form, ignoring the URL's query string. To retrieve URL parameters, call the url_param() method. Use it in the same way as param(). The main difference is that it allows you to read the parameters, but not set them.
Under no circumstances will the contents of the URL query string interfere with similarly-named CGI parameters in POSTed forms. If you try to mix a URL query string with a form submitted with the GET method, the results will not be what you expect.
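For example, given a form that POSTs to script.cgi?source=sidebar (the parameter names are illustrative):

```perl
my $color  = $q->param('color');       # from the POSTed form body
my $source = $q->url_param('source');  # from the URL's query string
```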
CGI.pm defines general HTML shortcut methods for many HTML tags. HTML shortcuts are named after a single HTML element and return a fragment of HTML text. Example:
- print $q->blockquote(
- "Many years ago on the island of",
- $q->a({href=>"http://crete.org/"},"Crete"),
- "there lived a Minotaur named",
- $q->strong("Fred."),
- ),
- $q->hr;
This results in the following HTML code (extra newlines have been added for readability):
- <blockquote>
- Many years ago on the island of
- <a href="http://crete.org/">Crete</a> there lived
- a minotaur named <strong>Fred.</strong>
- </blockquote>
- <hr>
If you find the syntax for calling the HTML shortcuts awkward, you can import them into your namespace and dispense with the object syntax completely (see the next section for more details):
- use CGI ':standard';
- print blockquote(
- "Many years ago on the island of",
- a({href=>"http://crete.org/"},"Crete"),
- "there lived a minotaur named",
- strong("Fred."),
- ),
- hr;
The HTML methods will accept zero, one or multiple arguments. If you provide no arguments, you get a single tag:
- print hr; # <hr>
If you provide one or more string arguments, they are concatenated together with spaces and placed between opening and closing tags:
- print h1("Chapter","1"); # <h1>Chapter 1</h1>
If the first argument is a hash reference, then the keys and values of the hash become the HTML tag's attributes:
- print a({-href=>'fred.html',-target=>'_new'},
- "Open a new frame");
- <a href="fred.html" target="_new">Open a new frame</a>
You may dispense with the dashes in front of the attribute names if you prefer:
- print img {src=>'fred.gif',align=>'LEFT'};
- <img align="LEFT" src="fred.gif">
Sometimes an HTML tag attribute has no argument. For example, ordered lists can be marked as COMPACT. The syntax for this is an argument that points to an undef value:
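For example (a sketch; the exact rendering of the bare attribute depends on whether XHTML mode is active):

```perl
# undef produces an attribute with no value, e.g. <ol compact>
print ol({-compact => undef},
         li(['one', 'two', 'three']));
```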
Prior to CGI.pm version 2.41, providing an empty ('') string as an attribute argument was the same as providing undef. However, this has changed in order to accommodate those who want to create tags of the form <img alt="">. The difference is shown in these two pieces of code:
- CODE RESULT
- img({alt=>undef}) <img alt>
- img({alt=>''}) <img alt="">
One of the cool features of the HTML shortcuts is that they are distributive. If you give them an argument consisting of a reference to a list, the tag will be distributed across each element of the list. For example, here's one way to make an ordered list:
- print ul(
- li({-type=>'disc'},['Sneezy','Doc','Sleepy','Happy'])
- );
This example will result in HTML output that looks like this:
- <ul>
- <li type="disc">Sneezy</li>
- <li type="disc">Doc</li>
- <li type="disc">Sleepy</li>
- <li type="disc">Happy</li>
- </ul>
This is extremely useful for creating tables. For example:
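A sketch of a table built with distributive shortcuts (the data is illustrative):

```perl
print table({-border => 1},
    caption('When Should You Eat Your Vegetables?'),
    Tr({-align => 'CENTER', -valign => 'TOP'},
    [
        # Each array reference becomes one distributed row of cells
        th(['Vegetable', 'Breakfast', 'Lunch', 'Dinner']),
        td(['Tomatoes',  'no',        'yes',   'yes']),
        td(['Broccoli',  'no',        'no',    'yes']),
    ])
);
```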
Consider this bit of code:
- print blockquote(em('Hi'),'mom!');
It will ordinarily return the string that you probably expect, namely:
- <blockquote><em>Hi</em> mom!</blockquote>
Note the space between the element "Hi" and the element "mom!". CGI.pm puts the extra space there using array interpolation, which is controlled by the magic $" variable. Sometimes this extra space is not what you want, for example, when you are trying to align a series of images. In this case, you can simply change the value of $" to an empty string.
- {
- local($") = '';
- print blockquote(em('Hi'),'mom!');
- }
I suggest you put the code in a block as shown here. Otherwise the change to $" will affect all subsequent code until you explicitly reset it.
A few HTML tags don't follow the standard pattern for various reasons.
comment() generates an HTML comment (<!-- comment -->). Call it like
- print comment('here is my comment');
Because of conflicts with built-in Perl functions, the following functions begin with initial caps:
- Select
- Tr
- Link
- Delete
- Accept
- Sub
In addition, start_html(), end_html(), start_form(), end_form(), start_multipart_form() and all the fill-out form tags are special. See their respective sections.
By default, all HTML that is emitted by the form-generating functions is passed through a function called escapeHTML():
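For example (a sketch):

```perl
use CGI qw(escapeHTML);

my $escaped = escapeHTML('<CLICK ME> & "friends"');
# angle brackets, ampersands and quotes are now HTML entities
```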
Provided that you have specified a character set of ISO-8859-1 (the default), the standard HTML escaping rules will be used. The "<" character becomes "&lt;", ">" becomes "&gt;", "&" becomes "&amp;", and the quote character becomes "&quot;". In addition, the hexadecimal 0x8b and 0x9b characters, which some browsers incorrectly interpret as the left and right angle-bracket characters, are replaced by their numeric character entities ("&#8249;" and "&#8250;"). If you manually change the charset, either by calling the charset() method explicitly or by passing a -charset argument to header(), then all characters will be replaced by their numeric entities, since CGI.pm has no lookup table for all the possible encodings.
escapeHTML() expects the supplied string to be a character string. This means you should Encode::decode data received from "outside" and Encode::encode your strings before sending them back outside. If your source code is UTF-8 encoded and you want to upgrade string literals in your source to character strings, you can use "use utf8". See perlunitut, perlunifaq and perlunicode for more information on how Perl handles the difference between bytes and characters.
The automatic escaping does not apply to other shortcuts, such as h1(). You should call escapeHTML() yourself on untrusted data in order to protect your pages against nasty tricks that people may enter into guestbooks and the like. To change the character set, use charset(). To turn autoescaping off completely, use autoEscape(0):
Get or set the current character set.
Get or set the value of the autoescape flag.
By default, all the HTML produced by these functions comes out as one long line without carriage returns or indentation. This is yuck, but it does reduce the size of the documents by 10-20%. To get pretty-printed output, please use CGI::Pretty, a subclass contributed by Brian Paulsen.
General note The various form-creating methods all return strings to the caller, containing the tag or tags that will create the requested form element. You are responsible for actually printing out these strings. It's set up this way so that you can place formatting tags around the form elements.
Another note The default values that you specify for the forms are only used the first time the script is invoked (when there is no query string). On subsequent invocations of the script (when there is a query string), the former values are used even if they are blank.
If you want to change the value of a field from its previous value, you have two choices:
(1) call the param() method to set it.
(2) use the -override (alias -force) parameter (a new feature in version 2.15). This forces the default value to be used, regardless of the previous value:
- print textfield(-name=>'field_name',
- -default=>'starting value',
- -override=>1,
- -size=>50,
- -maxlength=>80);
Yet another note By default, the text and labels of form elements are escaped according to HTML rules. This means that you can safely use "<CLICK ME>" as the label for a button. However, it also interferes with your ability to incorporate special HTML character sequences, such as Á, into your fields. If you wish to turn off automatic escaping, call the autoEscape() method with a false value immediately after creating the CGI object:
- $query = CGI->new;
- $query->autoEscape(0);
Note that autoEscape() affects only how some CGI.pm HTML generation functions handle escaping. Calling escapeHTML() explicitly will always escape the HTML.
A Lurking Trap! Some of the form-element generating methods return multiple tags. In a scalar context, the tags will be concatenated together with spaces, or whatever is the current value of the $" global. In a list context, the methods will return a list of elements, allowing you to modify them if you wish. Usually you will not notice this behavior, but beware of this:
- printf("%s\n",end_form())
end_form() produces several tags, and only the first of them will be printed because the format only expects one value.
Prints out an <isindex> tag. Not very exciting. The parameter -action specifies the URL of the script to process the query. The default is to process the query with the current script.
start_form() will return a <form> tag with the optional method, action and form encoding that you specify. The defaults are:
- method: POST
- action: this script
- enctype: application/x-www-form-urlencoded for non-XHTML
- multipart/form-data for XHTML, see multipart/form-data below.
end_form() returns the closing </form> tag.
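A sketch of a complete form skeleton (the action URL is an assumption):

```perl
print start_form(-method  => 'POST',
                 -action  => '/cgi-bin/script.cgi',  # hypothetical action URL
                 -enctype => &CGI::URL_ENCODED);
# ... various form fields here ...
print end_form;
```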
start_form()'s enctype argument tells the browser how to package the various fields of the form before sending the form to the server. Two values are possible:
Note: These methods were previously named startform() and endform(). These methods are now DEPRECATED. Please use start_form() and end_form() instead.
This is the older type of encoding. It is compatible with many CGI scripts and is suitable for short fields containing text data. For your convenience, CGI.pm stores the name of this encoding type in &CGI::URL_ENCODED.
This is the newer type of encoding. It is suitable for forms that contain very large fields or that are intended for transferring binary data. Most importantly, it enables the "file upload" feature. For your convenience, CGI.pm stores the name of this encoding type in &CGI::MULTIPART.
Forms that use this type of encoding are not easily interpreted by CGI scripts unless they use CGI.pm or another library designed to handle them.
If XHTML is activated (the default), then forms will be automatically created using this type of encoding.
The start_form() method uses the older form of encoding by default unless XHTML is requested. If you want to use the newer form of encoding by default, you can call start_multipart_form() instead of start_form(). The method end_multipart_form() is an alias to end_form().
JAVASCRIPTING: The -name and -onSubmit parameters are provided for use with JavaScript. The -name parameter gives the form a name so that it can be identified and manipulated by JavaScript functions. -onSubmit should point to a JavaScript function that will be executed just before the form is submitted to your server. You can use this opportunity to check the contents of the form for consistency and completeness. If you find something wrong, you can put up an alert box or maybe fix things up yourself. You can abort the submission by returning false from this function.
Usually the bulk of JavaScript functions are defined in a <script> block in the HTML header and -onSubmit points to one of these function calls. See start_html() for details.
After starting a form, you will typically create one or more textfields, popup menus, radio groups and other form elements. Each of these elements takes a standard set of named arguments. Some elements also have optional arguments. The standard arguments are as follows:
The name of the field. After submission this name can be used to retrieve the field's value using the param() method.
The initial value of the field which will be returned to the script after form submission. Some form elements, such as text fields, take a single scalar -value argument. Others, such as popup menus, take a reference to an array of values. The -value and -values arguments are synonyms.
A numeric value that sets the order in which the form element receives focus when the user presses the tab key. Elements with lower values receive focus first.
A string identifier that can be used to identify this element to JavaScript and DHTML.
A boolean, which, if true, forces the element to take on the value specified by -value, overriding the sticky behavior described earlier for the -nosticky pragma.
These are used to assign JavaScript event handlers. See the JavaScripting section for more details.
Other common arguments are described in the next section. In addition to these, all attributes described in the HTML specifications are supported.
textfield() will return a text input field.
The first parameter is the required name for the field (-name).
The optional second parameter is the default starting value for the field contents (-value, formerly known as -default).
The optional third parameter is the size of the field in characters (-size).
The optional fourth parameter is the maximum number of characters the field will accept (-maxlength).
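Putting the named parameters together (the field name and values are illustrative):

```perl
print textfield(-name      => 'occupation',
                -value     => 'Sphinx wrangler',
                -size      => 50,
                -maxlength => 80);
```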
As with all these methods, the field will be initialized with its previous contents from earlier invocations of the script. When the form is processed, the value of the text field can be retrieved with:
- $value = param('foo');
If you want to reset it from its initial value after the script has been called once, you can do so like this:
- param('foo',"I'm taking over this value!");
textarea() is just like textfield, but it allows you to specify rows and columns for a multiline text entry box. You can provide a starting value for the field, which can be long and contain multiple lines.
password_field() is identical to textfield(), except that its contents will be starred out on the web page.
filefield() will return a file upload field. In order to take full advantage of this you must use the new multipart encoding scheme for the form. You can do this either by calling start_form() with an encoding type of &CGI::MULTIPART, or by calling the new method start_multipart_form() instead of vanilla start_form().
The first parameter is the required name for the field (-name).
The optional second parameter is the starting value for the field contents to be used as the default file name (-default).
For security reasons, browsers don't pay any attention to this field, and so the starting value will always be blank. Worse, the field loses its "sticky" behavior and forgets its previous contents. The starting value field is called for in the HTML specification, however, and possibly some browser will eventually provide support for it.
The optional third parameter is the size of the field in characters (-size).
The optional fourth parameter is the maximum number of characters the field will accept (-maxlength).
JAVASCRIPTING: The -onChange, -onFocus, -onBlur, -onMouseOver, -onMouseOut and -onSelect parameters are recognized. See textfield() for details.
When the form is processed, you can retrieve an IO::Handle compatible handle for a file upload field like this:
- $lightweight_fh = $q->upload('field_name');
- # undef may be returned if it's not a valid file handle
- if (defined $lightweight_fh) {
- # Upgrade the handle to one compatible with IO::Handle:
- my $io_handle = $lightweight_fh->handle;
- open (my $out_fh, '>>', '/usr/local/web/users/feedback')
-   or die "Couldn't open output file: $!";
- binmode $out_fh;  # see the note on binary mode below
- while ($bytesread = $io_handle->read($buffer,1024)) {
- print $out_fh $buffer;
- }
- close $out_fh;
- }
In a list context, upload() will return an array of filehandles. This makes it possible to process forms that use the same name for multiple upload fields.
If you want the entered file name for the file, you can just call param():
- $filename = $q->param('field_name');
Different browsers will return slightly different things for the name. Some browsers return the filename only. Others return the full path to the file, using the path conventions of the user's machine. Regardless, the name returned is always the name of the file on the user's machine, and is unrelated to the name of the temporary file that CGI.pm creates during upload spooling (see below).
When a file is uploaded, the browser usually sends some information along with it in the form of headers, including the MIME content type. To retrieve this information, call uploadInfo(). It returns a reference to a hash containing all the document headers.
- $filename = $q->param('uploaded_file');
- $type = $q->uploadInfo($filename)->{'Content-Type'};
- unless ($type eq 'text/html') {
- die "HTML FILES ONLY!";
- }
If you are using a machine that recognizes "text" and "binary" data modes, be sure to understand when and how to use them (see the Camel book). Otherwise you may find that binary files are corrupted during file uploads.
When processing an uploaded file, CGI.pm creates a temporary file on your hard disk and passes you a file handle to that file. After you are finished with the file handle, CGI.pm unlinks (deletes) the temporary file. If you need to you can access the temporary file directly. You can access the temp file for a file upload by passing the file name to the tmpFileName() method:
- $filename = $query->param('uploaded_file');
- $tmpfilename = $query->tmpFileName($filename);
The temporary file will be deleted automatically when your program exits unless you manually rename it. On some operating systems (such as Windows NT), you will need to close the temporary file's filehandle before your program exits. Otherwise the attempt to delete the temporary file will fail.
There are occasionally problems involving parsing the uploaded file. This usually happens when the user presses "Stop" before the upload is finished. In this case, CGI.pm will return undef for the name of the uploaded file and set cgi_error() to the string "400 Bad request (malformed multipart POST)". This error message is designed so that you can incorporate it into a status code to be sent to the browser. Example:
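A sketch of that pattern, assuming an upload field named "uploaded_file":

```perl
use CGI;
my $q = CGI->new;

my $file = $q->upload('uploaded_file');
if ( !$file && $q->cgi_error ) {
    # Relay the error (e.g. "400 Bad request (malformed multipart POST)")
    # to the browser as the response status, then stop.
    print $q->header( -status => $q->cgi_error );
    exit 0;
}
```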
You are free to create a custom HTML page to complain about the error, if you wish.
CGI.pm gives you low-level access to file upload management through a file upload hook. You can use this feature to completely turn off the temp file storage of file uploads, or potentially write your own file upload progress meter.
This is much like the UPLOAD_HOOK facility available in Apache::Request, with the exception that in Apache::Request the first argument to the callback is an Apache::Upload object, while here it's the remote filename.
The $data field is optional; it lets you pass configuration information (e.g. a database handle) to your hook callback.

The $use_tempfile field is a flag that lets you turn on and off CGI.pm's use of a temporary disk-based file during file upload. If you set this to a FALSE value (the default is true) then $q->param('uploaded_file') will no longer work, and the only way to get at the uploaded data is via the hook you provide.
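In the object-oriented interface, the hook is installed by passing it (plus the optional $data and $use_tempfile arguments) to CGI->new(). A sketch, assuming a callback that simply logs progress to STDERR:

```perl
use CGI;

# Called repeatedly as successive chunks of the upload arrive.
sub hook {
    my ( $filename, $buffer, $bytes_read, $data ) = @_;
    print STDERR "Read $bytes_read bytes of $filename\n";
}

my $q = CGI->new( \&hook, { dbh => undef }, 1 );  # hook, $data, $use_tempfile
```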
If using the function-oriented interface, call the CGI::upload_hook() method before calling param() or any other CGI functions:
- CGI::upload_hook(\&hook [,$data [,$use_tempfile]]);
This method is not exported by default. You will have to import it explicitly if you wish to use it without the CGI:: prefix.
If you are using CGI.pm on a Windows platform and find that binary files get slightly larger when uploaded but that text files remain the same, then you have forgotten to activate binary mode on the output filehandle. Be sure to call binmode() on any handle that you create to write the uploaded file to disk.
(This section is here for completeness. If you are building a new application with CGI.pm, you can skip it.)
The original way to process file uploads with CGI.pm was to use param(). The value it returns has a dual nature as both a file name and a lightweight filehandle. This dual nature is problematic if you follow the recommended practice of having use strict in your code: Perl will complain when you try to use a string as a filehandle. More seriously, it is possible for the remote user to type garbage into the upload field, in which case what you get from param() is not a filehandle at all, but a string.
To solve this problem the upload() method was added, which always returns a lightweight filehandle. This generally works well, but will have trouble interoperating with some other modules because the filehandle is not derived from IO::Handle. That brings us to the current recommendation given above, which is to call the handle() method on the filehandle returned by upload(). That upgrades the handle to an IO::Handle. It's a big win for compatibility at the small cost of loading IO::Handle the first time you call it.
- print popup_menu('menu_name',
- ['eenie','meenie','minie'],
- 'meenie');
- -or-
- %labels = ('eenie'=>'your first choice',
- 'meenie'=>'your second choice',
- 'minie'=>'your third choice');
- %attributes = ('eenie'=>{'class'=>'class of first choice'});
- print popup_menu('menu_name',
- ['eenie','meenie','minie'],
- 'meenie',\%labels,\%attributes);
- -or (named parameter style)-
- print popup_menu(-name=>'menu_name',
- -values=>['eenie','meenie','minie'],
- -default=>['meenie','minie'],
- -labels=>\%labels,
- -attributes=>\%attributes);
popup_menu() creates a menu.
The required first argument is the menu's name (-name).
The required second argument (-values) is an array reference containing the list of menu items in the menu. You can pass the method an anonymous array, as shown in the example, or a reference to a named array, such as "\@foo".
The optional third parameter (-default) is the name of the default menu choice. If not specified, the first item will be the default. The values of the previous choice will be maintained across queries. Pass an array reference to select multiple defaults.
The optional fourth parameter (-labels) is provided for people who want to use different values for the user-visible label inside the popup menu and the value returned to your script. It's a pointer to a hash relating menu values to user-visible labels. If you leave this parameter blank, the menu values will be displayed by default. (You can also leave a label undefined if you want to.)
The optional fifth parameter (-attributes) is provided to assign any of the common HTML attributes to an individual menu item. It's a pointer to a hash relating menu values to another hash with the attribute's name as the key and the attribute's value as the value.
When the form is processed, the selected value of the popup menu can be retrieved using:
- $popup_menu_value = param('menu_name');
Named parameter style
- print popup_menu(-name=>'menu_name',
- -values=>[qw/eenie meenie minie/,
- optgroup(-name=>'optgroup_name',
- -values => ['moe','catch'],
- -attributes=>{'catch'=>{'class'=>'red'}})],
- -labels=>{'eenie'=>'one',
- 'meenie'=>'two',
- 'minie'=>'three'},
- -default=>'meenie');
- Old style
- print popup_menu('menu_name',
- ['eenie','meenie','minie',
- optgroup('optgroup_name', ['moe', 'catch'],
- {'catch'=>{'class'=>'red'}})],'meenie',
- {'eenie'=>'one','meenie'=>'two','minie'=>'three'});
optgroup() creates an option group within a popup menu.
The required first argument (-name) is the label attribute of the optgroup and is not inserted in the parameter list of the query.
The required second argument (-values) is an array reference containing the list of menu items in the menu. You can pass the method an anonymous array, as shown in the example, or a reference to a named array, such as \@foo. If you pass a HASH reference, the keys will be used for the menu values, and the values will be used for the menu labels (see -labels below).
The optional third parameter (-labels) allows you to pass a reference to a hash containing user-visible labels for one or more of the menu items. You can use this when you want the user to see one menu string, but have the browser return your program a different one. If you don't specify this, the value string will be used instead ("eenie", "meenie" and "minie" in this example). This is equivalent to using a hash reference for the -values parameter.
An optional fourth parameter (-labeled) can be set to a true value and indicates that the values should be used as the label attribute for each option element within the optgroup.
An optional fifth parameter (-novals) can be set to a true value and indicates to suppress the val attribute in each option element within the optgroup.
See the discussion on optgroup at W3C (http://www.w3.org/TR/REC-html40/interact/forms.html#edef-OPTGROUP) for details.
An optional sixth parameter (-attributes) is provided to assign any of the common HTML attributes to an individual menu item. It's a pointer to a hash relating menu values to another hash with the attribute's name as the key and the attribute's value as the value.
- print scrolling_list('list_name',
- ['eenie','meenie','minie','moe'],
- ['eenie','moe'],5,'true',{'moe'=>{'class'=>'red'}});
- -or-
- print scrolling_list('list_name',
- ['eenie','meenie','minie','moe'],
- ['eenie','moe'],5,'true',
- \%labels,\%attributes);
- -or-
- print scrolling_list(-name=>'list_name',
- -values=>['eenie','meenie','minie','moe'],
- -default=>['eenie','moe'],
- -size=>5,
- -multiple=>'true',
- -labels=>\%labels,
- -attributes=>\%attributes);
scrolling_list() creates a scrolling list.
The first and second arguments are the list name (-name) and values (-values). As in the popup menu, the second argument should be an array reference.
The optional third argument (-default) can be either a reference to a list containing the values to be selected by default, or can be a single value to select. If this argument is missing or undefined, then nothing is selected when the list first appears. In the named parameter version, you can use the synonym "-defaults" for this parameter.
The optional fourth argument is the size of the list (-size).
The optional fifth argument can be set to true to allow multiple simultaneous selections (-multiple). Otherwise only one selection will be allowed at a time.
The optional sixth argument is a pointer to a hash containing long user-visible labels for the list items (-labels). If not provided, the values will be displayed.
The optional seventh parameter (-attributes) is provided to assign any of the common HTML attributes to an individual menu item. It's a pointer to a hash relating menu values to another hash with the attribute's name as the key and the attribute's value as the value.
When this form is processed, all selected list items will be returned as a list under the parameter name 'list_name'. The values of the selected items can be retrieved with:
- @selected = param('list_name');
- print checkbox_group(-name=>'group_name',
- -values=>['eenie','meenie','minie','moe'],
- -default=>['eenie','moe'],
- -linebreak=>'true',
- -disabled => ['moe'],
- -labels=>\%labels,
- -attributes=>\%attributes);
- print checkbox_group('group_name',
- ['eenie','meenie','minie','moe'],
- ['eenie','moe'],'true',\%labels,
- {'moe'=>{'class'=>'red'}});
- HTML3-COMPATIBLE BROWSERS ONLY:
- print checkbox_group(-name=>'group_name',
- -values=>['eenie','meenie','minie','moe'],
- -rows=>2,-columns=>2);
checkbox_group() creates a list of checkboxes that are related by the same name.
The first and second arguments are the checkbox name and values, respectively (-name and -values). As in the popup menu, the second argument should be an array reference. These values are used for the user-readable labels printed next to the checkboxes as well as for the values passed to your script in the query string.
The optional third argument (-default) can be either a reference to a list containing the values to be checked by default, or can be a single value to be checked. If this argument is missing or undefined, then nothing is selected when the list first appears.
The optional fourth argument (-linebreak) can be set to true to place line breaks between the checkboxes so that they appear as a vertical list. Otherwise, they will be strung together on a horizontal line.
The optional -labels argument is a pointer to a hash relating the checkbox values to the user-visible labels that will be printed next to them. If not provided, the values will be used as the default.
The optional parameters -rows and -columns cause checkbox_group() to return an HTML3-compatible table containing the checkbox group formatted with the specified number of rows and columns. You can provide just the -columns parameter if you wish; checkbox_group will calculate the correct number of rows for you.
The option -disabled takes an array of checkbox values and disables them by greying them out (this may not be supported by all browsers).
The optional -attributes argument is provided to assign any of the common HTML attributes to an individual menu item. It's a pointer to a hash relating menu values to another hash with the attribute's name as the key and the attribute's value as the value.
The optional -tabindex argument can be used to control the order in which checkboxes receive focus when the user presses the tab button. If passed a scalar numeric value, the first element in the group will receive this tab index and subsequent elements will be incremented by one. If given a reference to an array of checkbox values, then the indexes will be jiggered so that the order specified in the array will correspond to the tab order. You can also pass a reference to a hash in which the hash keys are the checkbox values and the values are the tab indexes of each box. Examples:
- -tabindex => 100 # this group starts at index 100 and counts up
- -tabindex => ['moe','minie','eenie','meenie'] # tab in this order
- -tabindex => {meenie=>100,moe=>101,minie=>102,eenie=>200} # tab in this order
The optional -labelattributes argument will contain attributes attached to the <label> element that surrounds each button.
When the form is processed, all checked boxes will be returned as a list under the parameter name 'group_name'. The values of the "on" checkboxes can be retrieved with:
- @turned_on = param('group_name');
The value returned by checkbox_group() is actually an array of button elements. You can capture them and use them within tables, lists, or in other creative ways:
- @h = checkbox_group(-name=>'group_name',-values=>\@values);
- &use_in_creative_way(@h);
checkbox() is used to create an isolated checkbox that isn't logically related to any others.
The first parameter is the required name for the checkbox (-name). It will also be used for the user-readable label printed next to the checkbox.
The optional second parameter (-checked) specifies that the checkbox is turned on by default. Synonyms are -selected and -on.
The optional third parameter (-value) specifies the value of the checkbox when it is checked. If not provided, the word "on" is assumed.
The optional fourth parameter (-label) is the user-readable label to be attached to the checkbox. If not provided, the checkbox name is used.
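Combined, a checkbox() call looks like this (the name, value, and label are illustrative):

```perl
use CGI qw/:standard/;

# An isolated checkbox, turned on by default.
print checkbox(
    -name    => 'checkbox_name',
    -checked => 1,
    -value   => 'TURNED ON',
    -label   => 'Turn me on',
);
```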
The value of the checkbox can be retrieved using:
- $turned_on = param('checkbox_name');
- print radio_group(-name=>'group_name',
- -values=>['eenie','meenie','minie'],
- -default=>'meenie',
- -linebreak=>'true',
- -labels=>\%labels,
- -attributes=>\%attributes);
- -or-
- print radio_group('group_name',['eenie','meenie','minie'],
- 'meenie','true',\%labels,\%attributes);
- HTML3-COMPATIBLE BROWSERS ONLY:
- print radio_group(-name=>'group_name',
- -values=>['eenie','meenie','minie','moe'],
- -rows=>2,-columns=>2);
radio_group() creates a set of logically related radio buttons (turning one member of the group on turns the others off).
The first argument is the name of the group and is required (-name).
The second argument (-values) is the list of values for the radio buttons. The values and the labels that appear on the page are identical. Pass an array reference in the second argument, either using an anonymous array, as shown, or by referencing a named array as in "\@foo".
The optional third parameter (-default) is the name of the default button to turn on. If not specified, the first item will be the default. You can provide a nonexistent button name, such as "-" to start up with no buttons selected.
The optional fourth parameter (-linebreak) can be set to 'true' to put line breaks between the buttons, creating a vertical list.
The optional fifth parameter (-labels) is a pointer to an associative array relating the radio button values to user-visible labels to be used in the display. If not provided, the values themselves are displayed.
All modern browsers can take advantage of the optional parameters -rows and -columns. These parameters cause radio_group() to return an HTML3-compatible table containing the radio group formatted with the specified number of rows and columns. You can provide just the -columns parameter if you wish; radio_group will calculate the correct number of rows for you.
To include row and column headings in the returned table, you can use the -rowheaders and -colheaders parameters. Both of these accept a pointer to an array of headings to use. The headings are just decorative. They don't reorganize the interpretation of the radio buttons -- they're still a single named unit.
The optional -tabindex argument can be used to control the order in which radio buttons receive focus when the user presses the tab button. If passed a scalar numeric value, the first element in the group will receive this tab index and subsequent elements will be incremented by one. If given a reference to an array of radio button values, then the indexes will be jiggered so that the order specified in the array will correspond to the tab order. You can also pass a reference to a hash in which the hash keys are the radio button values and the values are the tab indexes of each button. Examples:
- -tabindex => 100 # this group starts at index 100 and counts up
- -tabindex => ['moe','minie','eenie','meenie'] # tab in this order
- -tabindex => {meenie=>100,moe=>101,minie=>102,eenie=>200} # tab in this order
The optional -attributes argument is provided to assign any of the common HTML attributes to an individual menu item. It's a pointer to a hash relating menu values to another hash with the attribute's name as the key and the attribute's value as the value.
The optional -labelattributes argument will contain attributes attached to the <label> element that surrounds each button.
When the form is processed, the selected radio button can be retrieved using:
- $which_radio_button = param('group_name');
The value returned by radio_group() is actually an array of button elements. You can capture them and use them within tables, lists, or in other creative ways:
- @h = radio_group(-name=>'group_name',-values=>\@values);
- &use_in_creative_way(@h);
submit() will create the query submission button. Every form should have one of these.
The first argument (-name) is optional. You can give the button a name if you have several submission buttons in your form and you want to distinguish between them.
The second argument (-value) is also optional. This gives the button a value that will be passed to your script in the query string. The value will also be used as the user-visible label.
You can use -label as an alias for -value. I always get confused about which of -name and -value changes the user-visible label on the button.
You can figure out which button was pressed by using different values for each one:
- $which_one = param('button_name');
reset() creates the "reset" button. Note that it restores the form to its value from the last time the script was called, NOT necessarily to the defaults.
Note that this conflicts with the Perl reset() built-in. Use CORE::reset() to get the original reset function.
- print defaults('button_label')
defaults() creates a button that, when invoked, will cause the form to be completely reset to its defaults, wiping out all the changes the user ever made.
hidden() produces a text field that can't be seen by the user. It is useful for passing state variable information from one invocation of the script to the next.
The first argument is required and specifies the name of this field (-name).
The second argument is also required and specifies its value (-default). In the named parameter style of calling, you can provide a single value here or a reference to a whole list of values.
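For example (field names and values are illustrative; note the list reference in the second call):

```perl
use CGI qw/:standard/;

# A hidden field carrying a single value:
print hidden( -name => 'hidden_name', -default => 'secret value' );

# A hidden field carrying several values at once:
print hidden( -name => 'state', -default => [ 'value1', 'value2', 'value3' ] );
```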
Fetch the value of a hidden field this way:
- $hidden_value = param('hidden_name');
Note that, just like all the other form elements, the value of a hidden field is "sticky". If you want to replace a hidden field with some other values after the script has been called once, you'll have to do it manually:
- param('hidden_name','new','values','here');
image_button() produces a clickable image. When it is clicked, the position of the click is returned to your script as "button_name.x" and "button_name.y", where "button_name" is the name you've assigned to it.
The first argument (-name) is required and specifies the name of this field.
The second argument (-src) is also required and specifies the URL of the image to display.
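Together, an image_button() call might look like this (the image URL is illustrative):

```perl
use CGI qw/:standard/;

print image_button(
    -name => 'button_name',
    -src  => '/images/go.gif',
);
```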
Fetch the coordinates of the click this way:
- $x = param('button_name.x');
- $y = param('button_name.y');
button() produces an <input> tag with type="button". When it's pressed, the fragment of JavaScript code pointed to by the -onClick parameter will be executed.
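A sketch (doButtonPress() is a hypothetical JavaScript function you would define yourself):

```perl
use CGI qw/:standard/;

print button(
    -name    => 'button_name',
    -value   => 'Click Me',              # user-visible label
    -onClick => 'doButtonPress(this)',   # hypothetical handler
);
```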
Browsers support a so-called "cookie" designed to help maintain state within a browser session. CGI.pm has several methods that support cookies.
A cookie is a name=value pair much like the named parameters in a CGI query string. CGI scripts create one or more cookies and send them to the browser in the HTTP header. The browser maintains a list of cookies that belong to a particular Web server, and returns them to the CGI script during subsequent interactions.
In addition to the required name=value pair, each cookie has several optional attributes:
This is a time/date string (in a special GMT format) that indicates when a cookie expires. The cookie will be saved and returned to your script until this expiration date is reached, even if the user exits the browser and restarts it. If an expiration date isn't specified, the cookie will remain active until the user quits the browser.
This is a partial or complete domain name for which the cookie is valid. The browser will return the cookie to any host that matches the partial domain name. For example, if you specify a domain name of ".capricorn.com", then the browser will return the cookie to Web servers running on any of the machines "www.capricorn.com", "www2.capricorn.com", "feckless.capricorn.com", etc. Domain names must contain at least two periods to prevent attempts to match on top level domains like ".edu". If no domain is specified, then the browser will only return the cookie to servers on the host the cookie originated from.
If you provide a cookie path attribute, the browser will check it against your script's URL before returning the cookie. For example, if you specify the path "/cgi-bin", then the cookie will be returned to each of the scripts "/cgi-bin/tally.pl", "/cgi-bin/order.pl", and "/cgi-bin/customer_service/complain.pl", but not to the script "/cgi-private/site_admin.pl". By default, path is set to "/", which causes the cookie to be sent to any CGI script on your site.
If the "secure" attribute is set, the cookie will only be sent to your script if the CGI request is occurring on a secure channel, such as SSL.
The interface to HTTP cookies is the cookie() method:
- $cookie = cookie(-name=>'sessionID',
- -value=>'xyzzy',
- -expires=>'+1h',
- -path=>'/cgi-bin/database',
- -domain=>'.capricorn.org',
- -secure=>1);
- print header(-cookie=>$cookie);
cookie() creates a new cookie. Its parameters include:
The name of the cookie (required). This can be any string at all. Although browsers limit their cookie names to non-whitespace alphanumeric characters, CGI.pm removes this restriction by escaping and unescaping cookies behind the scenes.
The value of the cookie. This can be any scalar value, array reference, or even hash reference. For example, you can store an entire hash into a cookie this way:
- $cookie=cookie(-name=>'family information',
- -value=>\%childrens_ages);
The optional partial path for which this cookie will be valid, as described above.
The optional partial domain for which this cookie will be valid, as described above.
The optional expiration date for this cookie. The format is as described in the section on the header() method:
- "+1h" one hour from now
If set to true, this cookie will only be used within a secure SSL session.
The cookie created by cookie() must be incorporated into the HTTP header within the string returned by the header() method:
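For example, a minimal sketch:

```perl
use CGI qw/:standard/;

my $cookie = cookie( -name => 'sessionID', -value => 'xyzzy' );
print header( -cookie => $cookie );
```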
To create multiple cookies, give header() an array reference:
- $cookie1 = cookie(-name=>'riddle_name',
- -value=>"The Sphynx's Question");
- $cookie2 = cookie(-name=>'answers',
- -value=>\%answers);
- print header(-cookie=>[$cookie1,$cookie2]);
To retrieve a cookie, request it by name by calling the cookie() method without the -value parameter. This example uses the object-oriented form:
- use CGI;
- $query = CGI->new;
- $riddle = $query->cookie('riddle_name');
- %answers = $query->cookie('answers');
Cookies created with a single scalar value, such as the "riddle_name" cookie, will be returned in that form. Cookies with array and hash values can also be retrieved.
The cookie and CGI namespaces are separate. If you have a parameter named 'answers' and a cookie named 'answers', the values retrieved by param() and cookie() are independent of each other. However, it's simple to turn a CGI parameter into a cookie, and vice-versa:
- # turn a CGI parameter into a cookie
- $c=cookie(-name=>'answers',-value=>[param('answers')]);
- # vice-versa
- param(-name=>'answers',-value=>[cookie('answers')]);
If you call cookie() without any parameters, it will return a list of the names of all cookies passed to your script:
- @cookies = cookie();
See the cookie.cgi example script for some ideas on how to use cookies effectively.
It's possible for CGI.pm scripts to write into several browser panels and windows using the HTML 4 frame mechanism. There are three techniques for defining new frames programmatically:
After writing out the HTTP header, instead of creating a standard HTML document using the start_html() call, create a <frameset> document that defines the frames on the page. Specify your script(s) (with appropriate parameters) as the SRC for each of the frames.
There is no specific support for creating <frameset> sections in CGI.pm, but the HTML is very simple to write.
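A minimal sketch of such a document, with an illustrative script URL and frame names:

```perl
use CGI qw/:standard/;

# Emit the HTTP header, then a hand-written frameset document whose
# frames are each filled by an invocation of the same (hypothetical) script.
print header;
print <<'END_OF_FRAMESET';
<html>
<head><title>Frameset Example</title></head>
<frameset cols="50%,50%">
  <frame src="/cgi-bin/script.cgi?panel=form"    name="QueryWindow">
  <frame src="/cgi-bin/script.cgi?panel=results" name="ResultsWindow">
</frameset>
</html>
END_OF_FRAMESET
```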
You may provide a -target parameter to the header() method:
- print header(-target=>'ResultsWindow');
This will tell the browser to load the output of your script into the frame named "ResultsWindow". If a frame of that name doesn't already exist, the browser will pop up a new window and load your script's document into that. There are a number of magic names that you can use for targets. See the HTML <frame> documentation for details.
You can specify the frame to load in the FORM tag itself. With CGI.pm it looks like this:
- print start_form(-target=>'ResultsWindow');
When your script is reinvoked by the form, its output will be loaded into the frame named "ResultsWindow". If one doesn't already exist, a new window will be created.
The script "frameset.cgi" in the examples directory shows one way to create pages in which the fill-out form and the response live in side-by-side frames.
The usual way to use JavaScript is to define a set of functions in a <SCRIPT> block inside the HTML header and then to register event handlers in the various elements of the page. Events include such things as the mouse passing over a form element, a button being clicked, the contents of a text field changing, or a form being submitted. When an event occurs that involves an element that has registered an event handler, its associated JavaScript code gets called.
The elements that can register event handlers include the <BODY> of an HTML document, hypertext links, all the various elements of a fill-out form, and the form itself. There are a large number of events, and each applies only to the elements for which it is relevant. Here is a partial list:
The browser is loading the current document. Valid in:
- + The HTML <BODY> section only.
The browser is closing the current page or frame. Valid for:
- + The HTML <BODY> section only.
The user has pressed the submit button of a form. This event happens just before the form is submitted, and your function can return a value of false in order to abort the submission. Valid for:
- + Forms only.
The mouse has clicked on an item in a fill-out form. Valid for:
- + Buttons (including submit, reset, and image buttons)
- + Checkboxes
- + Radio buttons
The user has changed the contents of a field. Valid for:
- + Text fields
- + Text areas
- + Password fields
- + File fields
- + Popup Menus
- + Scrolling lists
The user has selected a field to work with. Valid for:
- + Text fields
- + Text areas
- + Password fields
- + File fields
- + Popup Menus
- + Scrolling lists
The user has deselected a field (gone to work somewhere else). Valid for:
- + Text fields
- + Text areas
- + Password fields
- + File fields
- + Popup Menus
- + Scrolling lists
The user has changed the part of a text field that is selected. Valid for:
- + Text fields
- + Text areas
- + Password fields
- + File fields
The mouse has moved over an element. Valid for:
- + Text fields
- + Text areas
- + Password fields
- + File fields
- + Popup Menus
- + Scrolling lists
The mouse has moved off an element. Valid for:
- + Text fields
- + Text areas
- + Password fields
- + File fields
- + Popup Menus
- + Scrolling lists
In order to register a JavaScript event handler with an HTML element, just use the event name as a parameter when you call the corresponding CGI method. For example, to have your validateAge() JavaScript code executed every time the textfield named "age" changes, generate the field like this:
- print textfield(-name=>'age',-onChange=>"validateAge(this)");
This example assumes that you've already declared the validateAge() function by incorporating it into a <SCRIPT> block. The CGI.pm start_html() method provides a convenient way to create this section.
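For example, the declaration can be passed to start_html() via its -script parameter (validateAge() stands in for your own code):

```perl
use CGI qw/:standard/;

# A hypothetical client-side validation function, embedded in the
# document head by start_html().
my $JSCRIPT = <<'END_JS';
function validateAge(field) {
    // illustrative check only
    if (field.value < 0) alert("Age cannot be negative");
}
END_JS

print start_html( -title => 'Personnel Form', -script => $JSCRIPT );
print textfield( -name => 'age', -onChange => 'validateAge(this)' );
```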
Similarly, you can create a form that checks itself over for consistency and alerts the user if some essential value is missing by creating it this way:
- print start_form(-onSubmit=>"validateMe(this)");
See the javascript.cgi script for a demonstration of how this all works.
CGI.pm has limited support for HTML3's cascading style sheets (CSS). To incorporate a stylesheet into your document, pass the start_html() method a -style parameter. The value of this parameter may be a scalar, in which case it is treated as the source URL for the stylesheet, or it may be a hash reference. In the latter case you should provide the hash with one or more of -src or -code. -src points to a URL where an externally-defined stylesheet can be found. -code points to a scalar value to be incorporated into a <style> section. Style definitions in -code override similarly-named ones in -src, hence the name "cascading."
You may also specify the type of the stylesheet by adding the optional -type parameter to the hash pointed to by -style. If not specified, the style defaults to 'text/css'.
To refer to a style within the body of your document, add the -class parameter to any HTML element:
- print h1({-class=>'Fancy'},'Welcome to the Party');
Or define styles on the fly with the -style parameter:
- print h1({-style=>'Color: red;'},'Welcome to Hell');
You may also use the new span() element to apply a style to a section of text:
- print span({-style=>'Color: red;'},
- h1('Welcome to Hell'),
- "Where did that handbasket get to?"
- );
Note that you must import the ":html3" definitions to have the span() method available. Here's a quick and dirty example of using CSS. See the CSS specification at http://www.w3.org/Style/CSS/ for more information.
- use CGI qw/:standard :html3/;
- #here's a stylesheet incorporated directly into the page
- $newStyle=<<END;
- <!--
- P.Tip {
- margin-right: 50pt;
- margin-left: 50pt;
- color: red;
- }
- P.Alert {
- font-size: 30pt;
- font-family: sans-serif;
- color: red;
- }
- -->
- END
- print header();
- print start_html( -title=>'CGI with Style',
- -style=>{-src=>'http://www.capricorn.com/style/st1.css',
- -code=>$newStyle}
- );
- print h1('CGI with Style'),
- p({-class=>'Tip'},
- "Better read the cascading style sheet spec before playing with this!"),
- span({-style=>'color: magenta'},
- "Look Mom, no hands!",
- p(),
- "Whooo wee!"
- );
- print end_html;
Pass an array reference to -code or -src in order to incorporate multiple stylesheets into your document.
Should you wish to incorporate a verbatim stylesheet that includes arbitrary formatting in the header, you may pass a -verbatim tag to the -style hash, as follows:
- print start_html (-style => {-verbatim => '@import url("/server-common/css/'.$cssFile.'");', -src => '/server-common/css/core.css'});
This will generate an HTML header that contains this:
- <link rel="stylesheet" type="text/css" href="/server-common/css/core.css">
- <style type="text/css">
- @import url("/server-common/css/main.css");
- </style>
Any additional arguments passed in the -style value will be incorporated into the <link> tag. For example:
- start_html(-style=>{-src=>['/styles/print.css','/styles/layout.css'],
- -media => 'all'});
This will give:
- <link rel="stylesheet" type="text/css" href="/styles/print.css" media="all"/>
- <link rel="stylesheet" type="text/css" href="/styles/layout.css" media="all"/>
To make more complicated <link> tags, use the Link() function and pass it to start_html() in the -head argument, as in:
- @h = (Link({-rel=>'stylesheet',-type=>'text/css',-src=>'/ss/ss.css',-media=>'all'}),
- Link({-rel=>'stylesheet',-type=>'text/css',-src=>'/ss/fred.css',-media=>'paper'}));
- print start_html({-head=>\@h})
To create primary and "alternate" stylesheets, use the -alternate option:
- start_html(-style=>{-src=>[
- {-src=>'/styles/print.css'},
- {-src=>'/styles/alt.css',-alternate=>1}
- ]
- });
If you are running the script from the command line or in the perl debugger, you can pass the script a list of keywords or parameter=value pairs on the command line or from standard input (you don't have to worry about tricking your script into reading from environment variables). You can pass keywords like this:
- your_script.pl keyword1 keyword2 keyword3
or this:
- your_script.pl keyword1+keyword2+keyword3
or this:
- your_script.pl name1=value1 name2=value2
or this:
- your_script.pl name1=value1&name2=value2
To turn off this feature, use the -no_debug pragma.
To test the POST method, you may enable full debugging with the -debug pragma. This will allow you to feed newline-delimited name=value pairs to the script on standard input.
When debugging, you can use quotes and backslashes to escape characters in the familiar shell manner, letting you place spaces and other funny characters in your parameter=value pairs:
- your_script.pl "name1='I am a long value'" "name2=two\ words"
Finally, you can set the path info for the script by prefixing the first name/value parameter with the path followed by a question mark (?):
- your_script.pl /your/path/here?name1=value1&name2=value2
The Dump() method produces a string consisting of all the query's name/value pairs formatted nicely as a nested list. This is useful for debugging purposes:
- print Dump
Produces something that looks like:
- <ul>
- <li>name1
- <ul>
- <li>value1
- <li>value2
- </ul>
- <li>name2
- <ul>
- <li>value1
- </ul>
- </ul>
As a shortcut, you can interpolate the entire CGI object into a string and it will be replaced with a nice HTML dump like the one shown above:
- $query=CGI->new;
- print "<h2>Current Values</h2> $query\n";
Some of the more useful environment variables can be fetched through this interface. The methods are as follows:
Return a list of MIME types that the remote browser accepts. If you give this method a single argument corresponding to a MIME type, as in Accept('text/html'), it will return a floating point value corresponding to the browser's preference for this type from 0.0 (don't want) to 1.0. Glob types (e.g. text/*) in the browser's accept list are handled correctly.
Note that the capitalization changed between version 2.43 and 2.44 in order to avoid conflict with Perl's accept() function.
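A minimal sketch of branching on the browser's HTML preference (the 0.5 threshold is illustrative):

```perl
use CGI qw/:standard/;

# Accept() with no argument lists the acceptable MIME types; with one
# argument it returns the preference weight for that type (0.0 - 1.0).
my $html_pref = Accept('text/html') || 0;
if ($html_pref > 0.5) {
    print header('text/html');
} else {
    print header('text/plain');
}
```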
Returns the HTTP_COOKIE variable. Cookies have a special format, and this method call just returns the raw form (?cookie dough). See cookie() for ways of setting and retrieving cooked cookies.
Called with no parameters, raw_cookie() returns the packed cookie structure. You can separate it into individual cookies by splitting on the character sequence "; ". Called with the name of a cookie, retrieves the unescaped form of the cookie. You can use the regular cookie() method to get the names, or use the raw_fetch() method from the CGI::Cookie module.
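A minimal sketch of splitting the packed cookie structure by hand, per the "; " separator described above (the supported route is cookie() or CGI::Cookie's raw_fetch()):

```perl
use CGI qw/:standard/;

# Split HTTP_COOKIE into individual name=value pairs without unescaping.
my %raw;
for my $pair (split /; /, raw_cookie()) {
    my ($name, $value) = split /=/, $pair, 2;
    $raw{$name} = $value;
}
```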
Returns the HTTP_USER_AGENT variable. If you give this method a single argument, it will attempt to pattern match on it, allowing you to do something like user_agent('Mozilla');
Returns additional path information from the script URL. E.g. fetching /cgi-bin/your_script/additional/stuff will result in path_info() returning "/additional/stuff".
NOTE: The Microsoft Internet Information Server is broken with respect to additional path information. If you use the Perl DLL library, the IIS server will attempt to execute the additional path information as a Perl script. If you use the ordinary file associations mapping, the path information will be present in the environment, but incorrect. The best thing to do is to avoid using additional path information in CGI scripts destined for use with IIS.
As per path_info() but returns the additional path information translated into a physical path, e.g. "/usr/local/etc/httpd/htdocs/additional/stuff".
The Microsoft IIS is broken with respect to the translated path as well.
Returns either the remote host name, or the IP address if the former is unavailable.
Returns the remote host IP address, or 127.0.0.1 if the address is unavailable.
Return the URL of the page the browser was viewing prior to fetching your script. Not available for all browsers.
Return the authorization/verification method in use for this script, if any.
Returns the name of the server, usually the machine's host name.
When using virtual hosts, returns the name of the host that the browser attempted to contact.
Return the port that the server is listening on.
Like server_port() except that it takes virtual hosts into account. Use this when running with virtual hosts.
Returns the server software and version number.
Return the authorization/verification name used for user verification, if this script is protected.
Attempt to obtain the remote user's name, using a variety of different techniques. This only works with older browsers such as Mosaic. Newer browsers do not report the user name for privacy reasons!
Returns the method used to access your script, usually one of 'POST', 'GET' or 'HEAD'.
Returns the content type of data submitted in a POST, generally multipart/form-data or application/x-www-form-urlencoded.
Called with no arguments returns the list of HTTP environment variables, including such things as HTTP_USER_AGENT, HTTP_ACCEPT_LANGUAGE, and HTTP_ACCEPT_CHARSET, corresponding to the like-named HTTP header fields in the request. Called with the name of an HTTP header field, returns its value. Capitalization and the use of hyphens versus underscores are not significant.
For example, all three of these examples are equivalent:
- $requested_language = http('Accept-language');
- $requested_language = http('Accept_language');
- $requested_language = http('HTTP_ACCEPT_LANGUAGE');
The same as http(), but operates on the HTTPS environment variables present when the SSL protocol is in effect. Can be used to determine whether SSL is turned on.
NPH, or "no-parsed-header", scripts bypass the server completely by sending the complete HTTP header directly to the browser. This has slight performance benefits, but is of most use for taking advantage of HTTP extensions that are not directly supported by your server, such as server push and PICS headers.
Servers use a variety of conventions for designating CGI scripts as NPH. Many Unix servers look at the beginning of the script's name for the prefix "nph-". The Macintosh WebSTAR server and Microsoft's Internet Information Server, in contrast, try to decide whether a program is an NPH script by examining the first line of script output.
CGI.pm supports NPH scripts with a special NPH mode. When in this mode, CGI.pm will output the necessary extra header information when the header() and redirect() methods are called.
The Microsoft Internet Information Server requires NPH mode. As of version 2.30, CGI.pm will automatically detect when the script is running under IIS and put itself into this mode. You do not need to do this manually, although it won't hurt anything if you do. However, note that if you have applied Service Pack 6, much of the functionality of NPH scripts, including the ability to redirect while setting a cookie, does not work at all on IIS without a special patch from Microsoft. See http://web.archive.org/web/20010812012030/http://support.microsoft.com/support/kb/articles/Q280/3/41.ASP Non-Parsed Headers Stripped From CGI Applications That Have nph- Prefix in Name.
Simply add the "-nph" pragma to the list of symbols to be imported into your script:
- use CGI qw(:standard -nph)
Call nph() with a non-zero parameter at any point after using CGI.pm in your program.
- CGI->nph(1)
in the header() and redirect() statements:
- print header(-nph=>1);
CGI.pm provides four simple functions for producing multipart documents of the type needed to implement server push. These functions were graciously provided by Ed Jordan <ed@fidalgo.net>. To import these into your namespace, you must import the ":push" set. You are also advised to put the script into NPH mode and to set $| to 1 to avoid buffering problems.
Here is a simple script that demonstrates server push:
This script initializes server push by calling multipart_init(). It then enters a loop in which it begins a new multipart section by calling multipart_start(), prints the current local time, and ends a multipart section with multipart_end(). It then sleeps a second, and begins again. On the final iteration, it ends the multipart section with multipart_final() rather than with multipart_end().
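The loop described above can be sketched as follows (the boundary string, iteration count, and sleep interval are illustrative):

```perl
#!/usr/bin/perl
use CGI qw/:push -nph/;
$| = 1;  # disable output buffering, as advised above

print multipart_init(-boundary => '----here we go!');
for my $i (1 .. 4) {
    # Begin a new part and print the current local time into it.
    print multipart_start(-type => 'text/plain'),
          "The current time is ", scalar(localtime), "\n";
    if ($i < 4) {
        print multipart_end;    # close an intermediate part
    } else {
        print multipart_final;  # the last part ends with multipart_final()
    }
    sleep 1;
}
```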
- multipart_init(-boundary=>$boundary);
Initialize the multipart system. The -boundary argument specifies what MIME boundary string to use to separate parts of the document. If not provided, CGI.pm chooses a reasonable boundary for you.
- multipart_start(-type=>$type)
Start a new part of the multipart document using the specified MIME type. If not specified, text/html is assumed.
- multipart_end()
End a part. You must remember to call multipart_end() once for each multipart_start(), except at the end of the last part of the multipart document when multipart_final() should be called instead of multipart_end().
- multipart_final()
End all parts. You should call multipart_final() rather than multipart_end() at the end of the last part of the multipart document.
Users interested in server push applications should also have a look at the CGI::Push module.
A potential problem with CGI.pm is that, by default, it attempts to process form POSTings no matter how large they are. A wily hacker could attack your site by sending a CGI script a huge POST of many megabytes. CGI.pm will attempt to read the entire POST into a variable, growing hugely in size until it runs out of memory. While the script attempts to allocate the memory the system may slow down dramatically. This is a form of denial of service attack.
Another possible attack is for the remote user to force CGI.pm to accept a huge file upload. CGI.pm will accept the upload and store it in a temporary directory even if your script doesn't expect to receive an uploaded file. CGI.pm will delete the file automatically when it terminates, but in the meantime the remote user may have filled up the server's disk space, causing problems for other programs.
The best way to avoid denial of service attacks is to limit the amount of memory, CPU time and disk space that CGI scripts can use. Some Web servers come with built-in facilities to accomplish this. In other cases, you can use the shell limit or ulimit commands to put ceilings on CGI resource usage.
CGI.pm also has some simple built-in protections against denial of service attacks, but you must activate them before you can use them. These take the form of two global variables in the CGI name space:
If set to a non-negative integer, this variable puts a ceiling on the size of POSTings, in bytes. If CGI.pm detects a POST that is greater than the ceiling, it will immediately exit with an error message. This value will affect both ordinary POSTs and multipart POSTs, meaning that it limits the maximum size of file uploads as well. You should set this to a reasonably high value, such as 1 megabyte.
If set to a non-zero value, this will disable file uploads completely. Other fill-out form values will work as usual.
You can use these variables in either of two ways.
Set the variable at the top of the script, right after the "use" statement:
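For example (the 100K ceiling is illustrative):

```perl
use CGI qw/:standard/;
$CGI::POST_MAX        = 1024 * 100;  # maximum 100K posts
$CGI::DISABLE_UPLOADS = 1;           # disable file uploads
```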
Open up CGI.pm, find the definitions for $POST_MAX and $DISABLE_UPLOADS, and set them to the desired values. You'll find them towards the top of the file in a subroutine named initialize_globals().
An attempt to send a POST larger than $POST_MAX bytes will cause param() to return an empty CGI parameter list. You can test for this event by checking cgi_error(), either after you create the CGI object or, if you are using the function-oriented interface, call <param()> for the first time. If the POST was intercepted, then cgi_error() will return the message "413 POST too large".
This error message is actually defined by the HTTP protocol, and is designed to be returned to the browser as the CGI script's status code. For example:
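A sketch of returning the error as the script's status code, following the cgi_error() check described above:

```perl
use CGI qw/:standard/;

my $q = CGI->new;
if (my $error = $q->cgi_error) {
    # Return the "413 POST too large" (or other) status to the browser.
    print $q->header(-status => $error),
          $q->start_html('Problems'),
          $q->h2('Request not processed'),
          $q->strong($error);
    exit 0;
}
```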
However it isn't clear that any browser currently knows what to do with this status code. It might be better just to create an HTML page that warns the user of the problem.
To make it easier to port existing programs that use cgi-lib.pl, the compatibility routine "ReadParse" is provided. Porting is simple:
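The port can be sketched as follows (the "antique" parameter name is illustrative):

```perl
# OLD VERSION
require "cgi-lib.pl";
&ReadParse;
print "The value of the antique is $in{antique}.\n";

# NEW VERSION
use CGI;
CGI::ReadParse();  # creates the tied %in hash
print "The value of the antique is $in{antique}.\n";
```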
CGI.pm's ReadParse() routine creates a tied variable named %in, which can be accessed to obtain the query variables. Like ReadParse, you can also provide your own variable. Infrequently used features of ReadParse, such as the creation of @in and $in variables, are not supported.
Once you use ReadParse, you can retrieve the query object itself this way:
- $q = $in{CGI};
- print $q->textfield(-name=>'wow',
- -value=>'does this really work?');
This allows you to start using the more interesting features of CGI.pm without rewriting your old scripts from scratch.
An even simpler way to mix cgi-lib calls with CGI.pm calls is to import both the :cgi-lib and :standard method sets:
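For example (the "price" parameter is illustrative):

```perl
use CGI qw(:cgi-lib :standard);

ReadParse();   # populates the tied %in hash
print header,
      "The price of your purchase is $in{price}.\n",
      textfield('price');
```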
In compatibility mode, the following cgi-lib.pl functions are available for your use:
- ReadParse()
- PrintHeader()
- HtmlTop()
- HtmlBot()
- SplitParam()
- MethGet()
- MethPost()
- * Extended form of ReadParse()
- The extended form of ReadParse() that provides for file upload
- spooling, is not available.
- * MyBaseURL()
- This function is not available. Use CGI.pm's url() method instead.
- * MyFullURL()
- This function is not available. Use CGI.pm's self_url() method
- instead.
- * CgiError(), CgiDie()
- These functions are not supported. Look at CGI::Carp for the way I
- prefer to handle error messages.
- * PrintVariables()
- This function is not available. To achieve the same effect,
- just print out the CGI object:
- use CGI qw(:standard);
- $q = CGI->new;
- print h1("The Variables Are"),$q;
- * PrintEnv()
- This function is not available. You'll have to roll your own if you really need it.
The CGI.pm distribution is copyright 1995-2007, Lincoln D. Stein. It is distributed under GPL and the Artistic License 2.0. It is currently maintained by Mark Stosberg with help from many contributors.
Address bug reports and comments to: https://rt.cpan.org/Public/Dist/Display.html?Queue=CGI.pm When sending bug reports, please provide the version of CGI.pm, the version of Perl, the name and version of your Web server, and the name and version of the operating system you are using. If the problem is even remotely browser dependent, please provide information about the affected browsers as well.
Thanks very much to:
for suggestions and bug fixes.
- #!/usr/local/bin/perl
- use CGI ':standard';
- print header;
- print start_html("Example CGI.pm Form");
- print "<h1> Example CGI.pm Form</h1>\n";
- print_prompt();
- do_work();
- print_tail();
- print end_html;
- sub print_prompt {
- print start_form;
- print "<em>What's your name?</em><br>";
- print textfield('name');
- print checkbox('Not my real name');
- print "<p><em>Where can you find English Sparrows?</em><br>";
- print checkbox_group(
- -name=>'Sparrow locations',
- -values=>['England','France','Spain','Asia','Hoboken'],
- -linebreak=>'yes',
- -defaults=>['England','Asia']);
- print "<p><em>How far can they fly?</em><br>",
- radio_group(
- -name=>'how far',
- -values=>['10 ft','1 mile','10 miles','real far'],
- -default=>'1 mile');
- print "<p><em>What's your favorite color?</em> ";
- print popup_menu(-name=>'Color',
- -values=>['black','brown','red','yellow'],
- -default=>'red');
- print hidden('Reference','Monty Python and the Holy Grail');
- print "<p><em>What have you got there?</em><br>";
- print scrolling_list(
- -name=>'possessions',
- -values=>['A Coconut','A Grail','An Icon',
- 'A Sword','A Ticket'],
- -size=>5,
- -multiple=>'true');
- print "<p><em>Any parting comments?</em><br>";
- print textarea(-name=>'Comments',
- -rows=>10,
- -columns=>50);
- print "<p>",reset;
- print submit('Action','Shout');
- print submit('Action','Scream');
- print end_form;
- print "<hr>\n";
- }
- sub do_work {
- print "<h2>Here are the current settings in this form</h2>";
- for my $key (param) {
- print "<strong>$key</strong> -> ";
- my @values = param($key);
- print join(", ",@values),"<br>\n";
- }
- }
- sub print_tail {
- print <<END;
- <hr>
- <address>Lincoln D. Stein</address><br>
- <a href="/">Home Page</a>
- END
- }
Please report them.
CGI::Carp - provides a Carp implementation tailored to the CGI environment.
CGI::Fast - supports running CGI applications under FastCGI
CGI::Pretty - pretty prints HTML generated by CGI.pm (with a performance penalty)
CORE - Namespace for Perl's core routines
The CORE namespace gives access to the original built-in functions of Perl. The CORE package is built into Perl, and therefore you do not need to use or require a hypothetical "CORE" module prior to accessing routines in this namespace.
A list of the built-in functions in Perl can be found in perlfunc.
For all Perl keywords, a CORE:: prefix will force the built-in function to be used, even if it has been overridden or would normally require the feature pragma. Despite appearances, this has nothing to do with the CORE package, but is part of Perl's syntax.
For many Perl functions, the CORE package contains real subroutines. This feature is new in Perl 5.16. You can take references to these and make aliases. However, some can only be called as barewords; i.e., you cannot use ampersand syntax (&foo) or call them through references. See the shove example above. These subroutines exist for all keywords except the following:
__DATA__, __END__, and, cmp, default, do, dump, else, elsif, eq, eval, for, foreach, format, ge, given, goto, grep, gt, if, last, le, local, lt, m, map, my, ne, next, no, or, our, package, print, printf, q, qq, qr, qw, qx, redo, require, return, s, say, sort, state, sub, tr, unless, until, use, when, while, x, xor, y
Calling with ampersand syntax and through references does not work for the following functions, as they have special syntax that cannot always be translated into a simple list (e.g., eof vs eof()):
chdir, chomp, chop, defined, delete, each, eof, exec, exists, keys, lstat, pop, push, shift, splice, split, stat, system, truncate, unlink, unshift, values
To override a Perl built-in routine with your own version, you need to import it at compile-time. This can be conveniently achieved with the subs pragma. This will affect only the package in which you've imported the said subroutine:
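A runnable sketch of the subs-pragma approach, with a hypothetical wrapper body that counts calls before delegating to the real built-in:

```perl
use strict;
use warnings;
use subs 'chdir';   # pre-declare, so the bareword resolves to our sub

our $chdir_calls = 0;

# Hypothetical override: count calls, then delegate to the built-in.
sub chdir {
    my ($dir) = @_;
    $chdir_calls++;
    return CORE::chdir($dir);  # CORE:: reaches the original built-in
}

chdir '/';   # calls our wrapper, not the built-in, in this package
```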
To override a built-in globally (that is, in all namespaces), you need to import your function into the CORE::GLOBAL pseudo-namespace at compile time:
- BEGIN {
- *CORE::GLOBAL::hex = sub {
- # ... your code here
- };
- }
The new routine will be called whenever a built-in function is called without a qualifying package:
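A runnable sketch of the pattern above, with a hypothetical override body that counts interceptions before delegating to the real hex():

```perl
use strict;
use warnings;

our $hex_calls = 0;

BEGIN {
    # Install the override before any call to hex() is compiled.
    *CORE::GLOBAL::hex = sub {
        $hex_calls++;               # illustrative: count interceptions
        return CORE::hex($_[0]);    # delegate to the real built-in
    };
}

print hex("0x50"), "\n";        # unqualified call goes to the override: 80
print CORE::hex("0x50"), "\n";  # CORE:: prefix bypasses it: also 80
```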
In both cases, if you want access to the original, unaltered routine, use the CORE:: prefix:
- print CORE::hex("0x50"),"\n"; # prints 80
This documentation provided by Tels <nospam-abuse@bloodgate.com> 2007.
CPAN - query, download and build perl modules from CPAN sites
Interactive mode:
- perl -MCPAN -e shell
--or--
- cpan
Basic commands:
- # Modules:
- cpan> install Acme::Meta # in the shell
- CPAN::Shell->install("Acme::Meta"); # in perl
- # Distributions:
- cpan> install NWCLARK/Acme-Meta-0.02.tar.gz # in the shell
- CPAN::Shell->
- install("NWCLARK/Acme-Meta-0.02.tar.gz"); # in perl
- # module objects:
- $mo = CPAN::Shell->expandany($mod);
- $mo = CPAN::Shell->expand("Module",$mod); # same thing
- # distribution objects:
- $do = CPAN::Shell->expand("Module",$mod)->distribution;
- $do = CPAN::Shell->expandany($distro); # same thing
- $do = CPAN::Shell->expand("Distribution",
- $distro); # same thing
The CPAN module automates or at least simplifies the make and install of perl modules and extensions. It includes some primitive searching capabilities and knows how to use LWP, HTTP::Tiny, Net::FTP and certain external download clients to fetch distributions from the net.
These are fetched from one or more mirrored CPAN (Comprehensive Perl Archive Network) sites and unpacked in a dedicated directory.
The CPAN module also supports named and versioned bundles of modules. Bundles simplify handling of sets of related modules. See Bundles below.
The package contains a session manager and a cache manager. The session manager keeps track of what has been fetched, built, and installed in the current session. The cache manager keeps track of the disk space occupied by the make processes and deletes excess space using a simple FIFO mechanism.
All methods provided are accessible in a programmer style and in an interactive shell style.
Enter interactive mode by running
- perl -MCPAN -e shell
or
- cpan
which puts you into a readline interface. If Term::ReadKey and either of Term::ReadLine::Perl or Term::ReadLine::Gnu are installed, history and command completion are supported.
Once at the command line, type h for a one-page help screen; the rest should be self-explanatory.
The function call shell takes two optional arguments: one the prompt, the second the default initial command line (the latter only works if a real ReadLine interface module is installed).
The most common uses of the interactive modes are
There are corresponding one-letter commands a, b, d, and m for each of the four categories and another, i, for any of the mentioned four. Each of the four entities is implemented as a class with slightly differing methods for displaying an object.
Arguments to these commands are either strings exactly matching the identification string of an object, or regular expressions matched case-insensitively against various attributes of the objects. The parser only recognizes a regular expression when you enclose it with slashes.
The principle is that the number of objects found influences how an item is displayed. If the search finds one item, the result is displayed with the rather verbose method as_string, but if more than one is found, each object is displayed with the terse method as_glimpse.
Examples:
- cpan> m Acme::MetaSyntactic
- Module id = Acme::MetaSyntactic
- CPAN_USERID BOOK (Philippe Bruhat (BooK) <[...]>)
- CPAN_VERSION 0.99
- CPAN_FILE B/BO/BOOK/Acme-MetaSyntactic-0.99.tar.gz
- UPLOAD_DATE 2006-11-06
- MANPAGE Acme::MetaSyntactic - Themed metasyntactic variables names
- INST_FILE /usr/local/lib/perl/5.10.0/Acme/MetaSyntactic.pm
- INST_VERSION 0.99
- cpan> a BOOK
- Author id = BOOK
- EMAIL [...]
- FULLNAME Philippe Bruhat (BooK)
- cpan> d BOOK/Acme-MetaSyntactic-0.99.tar.gz
- Distribution id = B/BO/BOOK/Acme-MetaSyntactic-0.99.tar.gz
- CPAN_USERID BOOK (Philippe Bruhat (BooK) <[...]>)
- CONTAINSMODS Acme::MetaSyntactic Acme::MetaSyntactic::Alias [...]
- UPLOAD_DATE 2006-11-06
- cpan> m /lorem/
- Module = Acme::MetaSyntactic::loremipsum (BOOK/Acme-MetaSyntactic-0.99.tar.gz)
- Module Text::Lorem (ADEOLA/Text-Lorem-0.3.tar.gz)
- Module Text::Lorem::More (RKRIMEN/Text-Lorem-More-0.12.tar.gz)
- Module Text::Lorem::More::Source (RKRIMEN/Text-Lorem-More-0.12.tar.gz)
- cpan> i /berlin/
- Distribution BEATNIK/Filter-NumberLines-0.02.tar.gz
- Module = DateTime::TimeZone::Europe::Berlin (DROLSKY/DateTime-TimeZone-0.7904.tar.gz)
- Module Filter::NumberLines (BEATNIK/Filter-NumberLines-0.02.tar.gz)
- Author [...]
The examples illustrate several aspects: the first three queries target modules, authors, or distros directly and yield exactly one result. The last two use regular expressions and yield several results. The last one targets all of bundles, modules, authors, and distros simultaneously. When more than one result is available, they are printed in one-line format.
get, make, test, install, clean modules or distributions
These commands take any number of arguments and investigate what is necessary to perform the action. Argument processing is as follows:
- known module name in format Foo/Bar.pm module
- other embedded slash distribution
- - with trailing slash dot directory
- enclosing slashes regexp
- known module name in format Foo::Bar module
If the argument is a distribution file name (recognized by embedded slashes), it is processed. If it is a module, CPAN determines the distribution file in which this module is included and processes that, following any dependencies named in the module's META.yml or Makefile.PL (this behavior is controlled by the configuration parameter prerequisites_policy). If an argument is enclosed in slashes it is treated as a regular expression: it is expanded and if the result is a single object (distribution, bundle or module), this object is processed.
Example:
- install Dummy::Perl # installs the module
- install AUXXX/Dummy-Perl-3.14.tar.gz # installs that distribution
- install /Dummy-Perl-3.14/ # same if the regexp is unambiguous
get downloads a distribution file and untars or unzips it, make builds it, test runs the test suite, and install installs it.
Any make or test is run unconditionally. An
- install <distribution_file>
is also run unconditionally. But for
- install <module>
CPAN checks whether an install is needed and prints module up to date if the distribution file containing the module doesn't need updating.
CPAN also keeps track of what it has done within the current session and doesn't try to build a package a second time regardless of whether it succeeded or not. It does not repeat a test run if the test has been run successfully before. Same for install runs.
The force pragma may precede another command (currently: get, make, test, or install) to execute the command from scratch and attempt to continue past certain errors. See the section below on the force and the fforce pragma.
The notest pragma skips the test part in the build process.
Example:
- cpan> notest install Tk
A clean command results in a
- make clean
being executed within the distribution file's working directory.
readme, perldoc, look module or distribution
readme displays the README file of the associated distribution. look gets and untars (if not yet done) the distribution file, changes to the appropriate directory and opens a subshell process in that directory. perldoc displays the module's pod documentation in html or plain text format.
ls author
ls globbing_expression
The first form lists all distribution files in and below an author's CPAN directory as stored in the CHECKSUMS files distributed on CPAN. The listing recurses into subdirectories.
The second form limits or expands the output with shell globbing as in the following examples:
- ls JV/make*
- ls GSAR/*make*
- ls */*make*
The last example is very slow and outputs extra progress indicators that break the alignment of the result.
Note that globbing only lists directories explicitly asked for, for example FOO/* will not list FOO/bar/Acme-Sthg-n.nn.tar.gz. This may be regarded as a bug that may be changed in some future version.
failed
The failed command reports all distributions that failed on one of make, test or install for some reason in the currently running shell session.
If the YAML or the YAML::Syck module is installed, a record of the internal state of all modules is written to disk after each step. The files contain a signature of the currently running perl version for later perusal.
If the configuration variable build_dir_reuse is set to a true value, then CPAN.pm reads the collected YAML files. If the stored signature matches the currently running perl, the stored state is loaded into memory such that persistence between sessions is effectively established.
force and the fforce pragma
To speed things up in complex installation scenarios, CPAN.pm keeps track of what it has already done and refuses to do some things a second time. A get, a make, and an install are not repeated. A test is repeated only if the previous test was unsuccessful. The diagnostic message when CPAN.pm refuses to do something a second time is one of Has already been unwrapped|made|tested successfully or something similar. Another situation where CPAN refuses to act is an install if the corresponding test was not successful.
In all these cases, the user can override this stubborn behaviour by prepending the command with the word force, for example:
- cpan> force get Foo
- cpan> force make AUTHOR/Bar-3.14.tar.gz
- cpan> force test Baz
- cpan> force install Acme::Meta
Each forced command is executed with the corresponding part of its memory erased.
The fforce pragma is a variant that emulates a force get which erases the entire memory, followed by the action specified, effectively restarting the whole get/make/test/install procedure from scratch.
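For example, to restart the whole procedure for a distribution even though every step had already succeeded (the module name is hypothetical):

```
cpan> fforce install Acme::Meta
```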
Interactive sessions maintain a lockfile, by default ~/.cpan/.lock. Batch jobs can run without a lockfile and not disturb each other.
The shell offers to run in downgraded mode when another process is holding the lockfile. This is an experimental feature that has not been tested very well yet. This second shell then does not write the history file, does not use the metadata file, and has a different prompt.
CPAN.pm installs signal handlers for SIGINT and SIGTERM. While you are
in the cpan-shell, it is intended that you can press ^C anytime and
return to the cpan-shell prompt. A SIGTERM will cause the cpan-shell
to clean up and leave the shell loop. You can emulate the effect of a
SIGTERM by sending two consecutive SIGINTs, which usually means by
pressing ^C twice.
CPAN.pm ignores SIGPIPE. If the user sets inactivity_timeout, a SIGALRM is used during the run of the perl Makefile.PL or perl Build.PL subprocess. A SIGALRM is also used during module version parsing, and is controlled by version_timeout.
The commands available in the shell interface are methods in the package CPAN::Shell. If you enter the shell command, your input is split by the Text::ParseWords::shellwords() routine, which acts like most shells do. The first word is interpreted as the method to be invoked, and the rest of the words are treated as the method's arguments. Continuation lines are supported by ending a line with a literal backslash.
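A sketch of a continuation line; the module names are hypothetical and the exact rendering of the secondary prompt may differ:

```
cpan> install Net::FTP \
Digest::SHA
```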
autobundle writes a bundle file into the $CPAN::Config->{cpan_home}/Bundle directory. The file contains a list of all modules that are both available from CPAN and currently installed within @INC. Duplicates of each distribution are suppressed. The name of the bundle file is based on the current date and a counter, e.g. Bundle/Snapshot_2012_05_21_00.pm. This is installed again by running cpan Bundle::Snapshot_2012_05_21_00, or by installing Bundle::Snapshot_2012_05_21_00 from the CPAN shell.
Return value: path to the written file.
Note: this feature is still in alpha state and may change in future versions of CPAN.pm
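A session sketch, using the bundle name from the example above:

```
cpan> autobundle
cpan> install Bundle::Snapshot_2012_05_21_00
```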
This command provides a statistical overview of recent download activities. The data for this is collected in the YAML file FTPstats.yml in your cpan_home directory. If no YAML module is configured or YAML is not installed, no stats are provided.
Install all distributions that have been tested successfully but have not yet been installed. See also is_tested.
List all build directories of distributions that have been tested successfully but have not yet been installed. See also install_tested.
mkmyconfig() writes your own CPAN::MyConfig file into your ~/.cpan/
directory so that you can save your own preferences instead of the
system-wide ones.
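A hand-written CPAN/MyConfig.pm has the same shape as CPAN/Config.pm: it fills the $CPAN::Config hash reference and ends with a true value. A minimal sketch, with keys taken from the configuration table below and all values purely illustrative:

```perl
# Sketch of $HOME/.cpan/CPAN/MyConfig.pm -- values are examples only
$CPAN::Config = {
  'cpan_home' => "$ENV{HOME}/.cpan",
  'build_dir' => "$ENV{HOME}/.cpan/build",
  'urllist'   => [q[http://www.cpan.org/]],
  'make'      => '/usr/bin/make',
};
1;
```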
Scans the current perl installation for modules that have a newer version available on CPAN and provides a list of them. If called without arguments, all potential upgrades are listed; if called with arguments, the list is filtered to the modules and regexps given as arguments.
The listing looks something like this:
- Package namespace installed latest in CPAN file
- CPAN 1.94_64 1.9600 ANDK/CPAN-1.9600.tar.gz
- CPAN::Reporter 1.1801 1.1902 DAGOLDEN/CPAN-Reporter-1.1902.tar.gz
- YAML 0.70 0.73 INGY/YAML-0.73.tar.gz
- YAML::Syck 1.14 1.17 AVAR/YAML-Syck-1.17.tar.gz
- YAML::Tiny 1.44 1.50 ADAMK/YAML-Tiny-1.50.tar.gz
- CGI 3.43 3.55 MARKSTOS/CGI.pm-3.55.tar.gz
- Module::Build::YAML 1.40 1.41 DAGOLDEN/Module-Build-0.3800.tar.gz
- TAP::Parser::Result::YAML 3.22 3.23 ANDYA/Test-Harness-3.23.tar.gz
- YAML::XS 0.34 0.35 INGY/YAML-LibYAML-0.35.tar.gz
It suppresses duplicates in the in CPAN file column, such that distributions with many upgradeable modules are listed only once. Note that the list is not sorted.
The recent command downloads a list of recent uploads to CPAN and displays them slowly. While the command is running, a $SIG{INT} exits the loop after displaying the current item.
Note: This command requires XML::LibXML to be installed.
Note: This whole command currently is just a hack and will probably change in future versions of CPAN.pm, but the general approach will likely remain.
Note: See also smoke
recompile() is a special command that takes no argument and runs the make/test/install cycle with brute force over all installed dynamically loadable extensions (a.k.a. XS modules) with 'force' in effect. The primary purpose of this command is to finish a network installation. Imagine you have a common source tree for two different architectures. You decide to do a completely independent fresh installation. You start on one architecture with the help of a Bundle file produced earlier. CPAN installs the whole Bundle for you, but when you try to repeat the job on the second architecture, CPAN responds with a "Foo up to date" message for all modules. So you invoke CPAN's recompile on the second architecture and you're done.
Another popular use for recompile is to act as a rescue in case your perl breaks binary compatibility. If one of the modules that CPAN uses in turn depends on binary compatibility (so you cannot run CPAN commands), then you should try the CPAN::Nox module for recovery.
The report command temporarily turns on the test_report config variable, then runs the force test command with the given arguments. The force pragma reruns the tests and repeats every step that might have failed before.
*** WARNING: this command downloads and executes software of completely unknown status from CPAN onto your computer. You should never do this with your normal account; better to have a dedicated, well-separated and secured machine for it. ***
The smoke command takes the list of recent uploads to CPAN as provided by the recent command and tests them all. While the command is running, $SIG{INT} is defined to mean that the current item shall be skipped.
Note: This whole command currently is just a hack and will probably change in future versions of CPAN.pm, but the general approach will likely remain.
Note: See also recent
The upgrade command first runs an r command with the given arguments and then installs the newest versions of all modules that were listed by it.
CPAN::* Classes: Author, Bundle, Module, Distribution
Although it may be considered internal, the class hierarchy does matter for both users and programmers. CPAN.pm deals with the four classes mentioned above, and those classes all share a set of methods. Classical single polymorphism is in effect. A metaclass object registers all objects of all kinds and indexes them with a string. The strings referencing objects have a separate namespace (well, not completely separated):
- Namespace Class
- words containing a "/" (slash) Distribution
- words starting with Bundle:: Bundle
- everything else Module or Author
Modules know their associated Distribution objects. They always refer to the most recent official release. Developers may mark their releases as unstable development versions (by inserting an underbar into the module version number which will also be reflected in the distribution name when you run 'make dist'), so the really hottest and newest distribution is not always the default. If a module Foo circulates on CPAN in both version 1.23 and 1.23_90, CPAN.pm offers a convenient way to install version 1.23 by saying
- install Foo
This would install the complete distribution file (say BAR/Foo-1.23.tar.gz) with all accompanying material. But if you would like to install version 1.23_90, you need to know where the distribution file resides on CPAN relative to the authors/id/ directory. If the author is BAR, this might be BAR/Foo-1.23_90.tar.gz; so you would have to say
- install BAR/Foo-1.23_90.tar.gz
The first example will be driven by an object of the class CPAN::Module, the second by an object of class CPAN::Distribution.
Note: this feature is still in alpha state and may change in future versions of CPAN.pm
Distribution objects are normally distributions from CPAN, but there is a slightly degenerate case of Distribution objects for projects held on the local disk, too. These distribution objects have the same name as the local directory and end with a dot. A dot by itself is also allowed for the current directory at the time CPAN.pm was used. All actions such as make, test, and install are applied directly to that directory. This gives the command cpan . an interesting touch: while the normal mantra of installing a CPAN module without CPAN.pm is one of
- perl Makefile.PL perl Build.PL
- ( go and get prerequisites )
- make ./Build
- make test ./Build test
- make install ./Build install
the command cpan . does all of this at once. It figures out which of the two mantras is appropriate, fetches and installs all prerequisites, takes care of them recursively, and finally finishes the installation of the module in the current directory, be it a CPAN module or not.
The typical usage case is for private modules or working copies of projects from remote repositories on the local disk.
The usual shell redirection symbols | and > are recognized by the cpan shell only when surrounded by whitespace. So piping to a pager or redirecting output into a file works somewhat as in a normal shell, with the stipulation that you must type extra spaces.
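For example (note the whitespace around | and >; the pager and the output file name are illustrative):

```
cpan> r /YAML/ | less
cpan> u > /tmp/uninstalled.txt
```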
When the CPAN module is used for the first time, a configuration dialogue tries to determine a couple of site-specific options. The result of the dialog is stored in a hash reference $CPAN::Config in a file CPAN/Config.pm.
Default values defined in the CPAN/Config.pm file can be overridden in a user-specific file: CPAN/MyConfig.pm. Such a file is best placed in $HOME/.cpan/CPAN/MyConfig.pm, because $HOME/.cpan is added to the search path of the CPAN module before the use() or require() statements. The mkmyconfig command writes this file for you.
The o conf command has various bells and whistles:
If you have a ReadLine module installed, you can hit TAB at any point of the command line, and o conf will offer you completion for the built-in subcommands and/or config variable names.
Displays a short help
Displays the current value(s) for this config variable. Without KEY, displays all subcommands and config variables.
Example:
- o conf shell
If KEY starts and ends with a slash, the string in between is treated as a regular expression and only keys matching this regexp are displayed
Example:
- o conf /color/
Sets the config variable KEY to VALUE. The empty string can be specified as usual in shells, with '' or "".
Example:
- o conf wget /usr/bin/wget
If a config variable name ends with list, it is a list. o conf KEY shift removes the first element of the list, o conf KEY pop removes the last element of the list. o conf KEY unshift LIST prepends a list of values to the list, o conf KEY push LIST appends a list of values to the list.
Likewise, o conf KEY splice LIST passes the LIST to the corresponding splice command.
Finally, any other list of arguments is taken as a new list value for the KEY variable discarding the previous value.
Examples:
- o conf urllist unshift http://cpan.dev.local/CPAN
- o conf urllist splice 3 1
- o conf urllist http://cpan1.local http://cpan2.local ftp://ftp.perl.org
Reverts all config variables to the state in the saved config file.
Saves all config variables to the current config file (CPAN/Config.pm or CPAN/MyConfig.pm that was loaded at start).
The configuration dialog can be started again any time later by issuing the command o conf init in the CPAN shell. A subset of the configuration dialog can be run by issuing o conf init WORD, where WORD is any valid config variable or a regular expression.
The following keys in the hash reference $CPAN::Config are currently defined:
- applypatch path to external prg
- auto_commit commit all changes to config variables to disk
- build_cache size of cache for directories to build modules
- build_dir locally accessible directory to build modules
- build_dir_reuse boolean if distros in build_dir are persistent
- build_requires_install_policy
- to install or not to install when a module is
- only needed for building. yes|no|ask/yes|ask/no
- bzip2 path to external prg
- cache_metadata use serializer to cache metadata
- check_sigs if signatures should be verified
- colorize_debug Term::ANSIColor attributes for debugging output
- colorize_output boolean if Term::ANSIColor should colorize output
- colorize_print Term::ANSIColor attributes for normal output
- colorize_warn Term::ANSIColor attributes for warnings
- commandnumber_in_prompt
- boolean if you want to see current command number
- commands_quote preferred character to use for quoting external
- commands when running them. Defaults to double
- quote on Windows, single tick everywhere else;
- can be set to space to disable quoting
- connect_to_internet_ok
- whether to ask if opening a connection is ok before
- urllist is specified
- cpan_home local directory reserved for this package
- curl path to external prg
- dontload_hash DEPRECATED
- dontload_list arrayref: modules in the list will not be
- loaded by the CPAN::has_inst() routine
- ftp path to external prg
- ftp_passive if set, the environment variable FTP_PASSIVE is set
- for downloads
- ftp_proxy proxy host for ftp requests
- ftpstats_period max number of days to keep download statistics
- ftpstats_size max number of items to keep in the download statistics
- getcwd see below
- gpg path to external prg
- gzip location of external program gzip
- halt_on_failure stop processing after the first failure of queued
- items or dependencies
- histfile file to maintain history between sessions
- histsize maximum number of lines to keep in histfile
- http_proxy proxy host for http requests
- inactivity_timeout breaks interactive Makefile.PLs or Build.PLs
- after this many seconds inactivity. Set to 0 to
- disable timeouts.
- index_expire refetch index files after this many days
- inhibit_startup_message
- if true, suppress the startup message
- keep_source_where directory in which to keep the source (if we do)
- load_module_verbosity
- report loading of optional modules used by CPAN.pm
- lynx path to external prg
- make location of external make program
- make_arg arguments that should always be passed to 'make'
- make_install_make_command
- the make command for running 'make install', for
- example 'sudo make'
- make_install_arg same as make_arg for 'make install'
- makepl_arg arguments passed to 'perl Makefile.PL'
- mbuild_arg arguments passed to './Build'
- mbuild_install_arg arguments passed to './Build install'
- mbuild_install_build_command
- command to use instead of './Build' when we are
- in the install stage, for example 'sudo ./Build'
- mbuildpl_arg arguments passed to 'perl Build.PL'
- ncftp path to external prg
- ncftpget path to external prg
- no_proxy don't proxy to these hosts/domains (comma separated list)
- pager location of external program more (or any pager)
- password your password if your CPAN server wants one
- patch path to external prg
- patches_dir local directory containing patch files
- perl5lib_verbosity verbosity level for PERL5LIB additions
- prefer_external_tar
- per default all untar operations are done with
- Archive::Tar; by setting this variable to true
- the external tar command is used if available
- prefer_installer legal values are MB and EUMM: if a module comes
- with both a Makefile.PL and a Build.PL, use the
- former (EUMM) or the latter (MB); if the module
- comes with only one of the two, that one will be
- used no matter the setting
- prerequisites_policy
- what to do if you are missing module prerequisites
- ('follow' automatically, 'ask' me, or 'ignore')
- For 'follow', also sets PERL_AUTOINSTALL and
- PERL_EXTUTILS_AUTOINSTALL for "--defaultdeps" if
- not already set
- prefs_dir local directory to store per-distro build options
- proxy_user username for accessing an authenticating proxy
- proxy_pass password for accessing an authenticating proxy
- randomize_urllist add some randomness to the sequence of the urllist
- scan_cache controls scanning of cache ('atstart', 'atexit' or 'never')
- shell your favorite shell
- show_unparsable_versions
- boolean if r command tells which modules are versionless
- show_upload_date boolean if commands should try to determine upload date
- show_zero_versions boolean if r command tells for which modules $version==0
- tar location of external program tar
- tar_verbosity verbosity level for the tar command
- term_is_latin deprecated: if true Unicode is translated to ISO-8859-1
- (and nonsense for characters outside latin range)
- term_ornaments boolean to turn ReadLine ornamenting on/off
- test_report email test reports (if CPAN::Reporter is installed)
- trust_test_report_history
- skip testing when previously tested ok (according to
- CPAN::Reporter history)
- unzip location of external program unzip
- urllist arrayref to nearby CPAN sites (or equivalent locations)
- use_sqlite use CPAN::SQLite for metadata storage (fast and lean)
- username your username if your CPAN server wants one
- version_timeout stops version parsing after this many seconds.
- Default is 15 secs. Set to 0 to disable.
- wait_list arrayref to a wait server to try (See CPAN::WAIT)
- wget path to external prg
- yaml_load_code enable YAML code deserialisation via CPAN::DeferredCode
- yaml_module which module to use to read/write YAML files
You can set and query each of these options interactively in the cpan shell with the o conf or the o conf init command as specified below.
o conf <scalar option>
prints the current value of the scalar option
o conf <scalar option> <value>
Sets the value of the scalar option to value
o conf <list option>
prints the current value of the list option in MakeMaker's neatvalue format.
o conf <list option> [shift|pop]
shifts or pops the array in the list option variable
o conf <list option> [unshift|push|splice] <list>
works like the corresponding perl commands.
Runs an interactive configuration dialog for matching variables. Without argument runs the dialog over all supported config variables. To specify a MATCH the argument must be enclosed by slashes.
Examples:
- o conf init ftp_passive ftp_proxy
- o conf init /color/
Note: this method of setting config variables often provides more explanation about the functioning of a variable than the manpage.
CPAN.pm changes the current working directory often and needs to determine its own current working directory. By default it uses Cwd::cwd, but if for some reason this doesn't work on your system, configure alternatives according to the following table:
cwd: calls Cwd::cwd
getcwd: calls Cwd::getcwd
fastcwd: calls Cwd::fastcwd
backtickcwd: calls the external command cwd
urllist parameters are URLs according to RFC 1738. We do a little guessing if your URL is not compliant, but if you have problems with file URLs, please try the correct format. Either:
- file://localhost/whatever/ftp/pub/CPAN/
or
- file:///home/ftp/pub/CPAN/
The urllist parameter of the configuration table contains a list of URLs used for downloading. If the list contains any file URLs, CPAN always tries those first. This feature is disabled for index files. So the recommendation for the owner of a CD-ROM with CPAN contents is: include your local, possibly outdated CD-ROM as a file URL at the end of urllist, e.g.
- o conf urllist push file://localhost/CDROM/CPAN
CPAN.pm will then fetch the index files from one of the CPAN sites that come at the beginning of urllist. It will later check for each module to see whether there is a local copy of the most recent version.
Another peculiarity of urllist is that the site that we could successfully fetch the last file from automatically gets a preference token and is tried as the first site for the next request. So if you add a new site at runtime it may happen that the previously preferred site will be tried another time. This means that if you want to disallow a site for the next transfer, it must be explicitly removed from urllist.
If you have YAML.pm (or some other YAML module configured in yaml_module) installed, CPAN.pm collects some statistical data about recent downloads. You can view the statistics with the hosts command or inspect them directly by looking into the FTPstats.yml file in your cpan_home directory.
To get some interesting statistics, it is recommended that randomize_urllist be set; this introduces some amount of randomness into the URL selection.
requires and build_requires dependency declarations
Since CPAN.pm version 1.88_51, modules declared as build_requires by a distribution are treated differently depending on the config variable build_requires_install_policy. By setting build_requires_install_policy to no, such a module is not installed. It is only built and tested, and then kept in the list of tested but uninstalled modules. As such, it is available during the build of the dependent module by integrating the path to the blib/arch and blib/lib directories in the environment variable PERL5LIB. If build_requires_install_policy is set to yes, then both modules declared as requires and those declared as build_requires are treated alike. By setting to ask/yes or ask/no, CPAN.pm asks the user and sets the default accordingly.
(Note: This feature has been introduced in CPAN.pm 1.8854 and is still considered beta quality)
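A sketch of setting the policy from the shell:

```
cpan> o conf build_requires_install_policy ask/yes
cpan> o conf commit
```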
Distributions on CPAN usually behave according to what we call the CPAN mantra. Or since the advent of Module::Build we should talk about two mantras:
- perl Makefile.PL perl Build.PL
- make ./Build
- make test ./Build test
- make install ./Build install
But some modules cannot be built with this mantra. They try to get some extra data from the user via the environment, extra arguments, or interactively--thus disturbing the installation of large bundles like Phalanx100 or modules with many dependencies like Plagger.
The distroprefs system of CPAN.pm addresses this problem by allowing the user to specify extra information and recipes in YAML files to either
pass additional arguments to one of the four commands,
set environment variables
instantiate an Expect object that reads from the console, waits for some regular expressions and enters some answers
temporarily override assorted CPAN.pm
configuration variables
specify dependencies the original maintainer forgot
disable the installation of an object altogether
See the YAML and Data::Dumper files that come with the CPAN.pm distribution in the distroprefs/ directory for examples.
The YAML files themselves must have the .yml extension; all other files are ignored (for two exceptions see Fallback Data::Dumper and Storable below). The containing directory can be specified in CPAN.pm in the prefs_dir config variable. Try o conf init prefs_dir in the CPAN shell to set and activate the distroprefs system.
Every YAML file may contain arbitrary documents according to the YAML specification, and every document is treated as an entity that can specify the treatment of a single distribution.
Filenames can be picked arbitrarily; CPAN.pm always reads all files (in alphabetical order) and takes the key match (see below in Language Specs) as a hashref containing match criteria that determine whether the current distribution matches the YAML document or not.
If neither your configured yaml_module nor YAML.pm is installed, CPAN.pm falls back to using Data::Dumper and Storable and looks for files with the extensions .dd or .st in the prefs_dir directory. These files are expected to contain one or more hashrefs. For Data::Dumper generated files, this is expected to be done by defining $VAR1, $VAR2, etc. The YAML shell would produce these with the command
- ysh < somefile.yml > somefile.dd
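Such a .dd file contains ordinary Data::Dumper output; a tiny sketch of one distropref document, with keys as in the example document below and purely illustrative values:

```perl
# Sketch of a prefs_dir/*.dd file -- one distropref hashref per $VARn
$VAR1 = {
  'comment'  => 'Demo',
  'match'    => { 'distribution' => '^CHACHACHA/Dancing-' },
  'disabled' => 1,
};
```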
For Storable files the rule is that they must be constructed such that Storable::retrieve(file) returns an array reference, and the array elements each represent one distropref object. The conversion from YAML would look like so:
- perl -MYAML=LoadFile -MStorable=nstore -e '
- @y=LoadFile(shift);
- nstore(\@y, shift)' somefile.yml somefile.st
In bootstrapping situations it is usually sufficient to translate only a few YAML files to Data::Dumper for crucial modules like YAML::Syck, YAML.pm and Expect.pm. If you prefer Storable over Data::Dumper, remember to pick a Storable version that writes an older format than all the other Storable versions that will need to read them.
The following example contains all supported keywords and structures, with the exception of eexpect, which can be used instead of expect.
- ---
- comment: "Demo"
- match:
- module: "Dancing::Queen"
- distribution: "^CHACHACHA/Dancing-"
- not_distribution: "\.zip$"
- perl: "/usr/local/cariba-perl/bin/perl"
- perlconfig:
- archname: "freebsd"
- not_cc: "gcc"
- env:
- DANCING_FLOOR: "Shubiduh"
- disabled: 1
- cpanconfig:
- make: gmake
- pl:
- args:
- - "--somearg=specialcase"
- env: {}
- expect:
- - "Which is your favorite fruit"
- - "apple\n"
- make:
- args:
- - all
- - extra-all
- env: {}
- expect: []
- commandline: "echo SKIPPING make"
- test:
- args: []
- env: {}
- expect: []
- install:
- args: []
- env:
- WANT_TO_INSTALL: YES
- expect:
- - "Do you really want to install"
- - "y\n"
- patches:
- - "ABCDE/Fedcba-3.14-ABCDE-01.patch"
- depends:
- configure_requires:
- LWP: 5.8
- build_requires:
- Test::Exception: 0.25
- requires:
- Spiffy: 0.30
Every YAML document represents a single hash reference. The valid keys in this hash are as follows:
A comment
Temporarily override assorted CPAN.pm configuration variables. Supported are: build_requires_install_policy, check_sigs, make, make_install_make_command, prefer_installer, test_report. Please report it as a bug when you need another one supported.
All three types, namely configure_requires, build_requires, and requires, are supported in the way specified in the META.yml specification. The current implementation merges the specified dependencies with those declared by the package maintainer. In a future implementation this may be changed to override the original declaration.
Specifies that this distribution shall not be processed at all.
Experimental implementation to deal with optional_features from META.yml. Still needs coordination with installer software and currently works only for META.yml declaring dynamic_config=0. Use with caution.
The canonical name of a delegate distribution to install instead. Useful when a new version, although it tests OK itself, breaks something else, or when a developer release or a fork that is better than the last released version is already uploaded.
Processing instructions for the make install or ./Build install phase of the CPAN mantra. See below under Processing Instructions.
Processing instructions for the make or ./Build phase of the CPAN mantra. See below under Processing Instructions.
A hashref with one or more of the keys distribution, module, perl, perlconfig, and env that specify whether a document is targeted at a specific CPAN distribution or installation. Keys prefixed with not_ negate the corresponding match.
The corresponding values are interpreted as regular expressions. The distribution related one will be matched against the canonical distribution name, e.g. "AUTHOR/Foo-Bar-3.14.tar.gz".
The module related one will be matched against all modules contained in the distribution until one module matches.
The perl related one will be matched against $^X (but with the absolute path).
The value associated with perlconfig is itself a hashref that is matched against corresponding values in the %Config::Config hash living in the Config.pm module. Keys prefixed with not_ negate the corresponding match.
The value associated with env is itself a hashref that is matched against corresponding values in the %ENV hash. Keys prefixed with not_ negate the corresponding match.
If more than one restriction of module, distribution, etc. is specified, the results of the separately computed match values must all match. If so, the hashref represented by the YAML document is returned as the preference structure for the current distribution.
An array of patches on CPAN or on the local disk to be applied in order via an external patch program. Whether the value for the -p parameter is 0 or 1 is determined by reading the patch beforehand. The path to each patch is either an absolute path on the local filesystem, relative to a patch directory specified in the patches_dir configuration variable, or in the format of a canonical distro name. For examples please consult the distroprefs/ directory in the CPAN.pm distribution (these examples are not installed by default).
Note: if the applypatch program is installed and CPAN::Config knows about it, and a patch was written by the makepatch program, then CPAN.pm lets applypatch apply the patch. Both makepatch and applypatch are available from CPAN in the JV/makepatch-* distribution.
Processing instructions for the perl Makefile.PL or perl Build.PL phase of the CPAN mantra. See below under Processing Instructions.
Processing instructions for the make test or ./Build test phase of the CPAN mantra. See below under Processing Instructions.
Arguments to be added to the command line
A full commandline to run via system(). During execution, the environment variable PERL is set to $^X (but with an absolute path). If commandline is specified, args is not used.
Extended expect. This is a hash reference with four allowed keys: mode, timeout, reuse, and talk.
You must install the Expect module to use eexpect. CPAN.pm does not install it for you.
mode may have the values deterministic, for the case where all questions come in the order written down, and anyorder, for the case where the questions may come in any order. The default mode is deterministic.
timeout denotes a timeout in seconds. Floating-point timeouts are OK. With mode=deterministic, the timeout denotes the timeout per question; with mode=anyorder it denotes the timeout per byte received from the stream of questions.
talk is a reference to an array that contains alternating questions and answers. Questions are regular expressions and answers are literal strings. The Expect module watches the stream from the execution of the external program (perl Makefile.PL, perl Build.PL, make, etc.).
For mode=deterministic, CPAN.pm injects the corresponding answer as soon as the stream matches the regular expression.
For mode=anyorder, CPAN.pm answers a question as soon as the timeout is reached for the next byte in the input stream. In this mode you can use the reuse parameter to decide what will happen with a question-answer pair after it has been used. In the default case (reuse=0) it is removed from the array, avoiding its being used again accidentally. If you want to answer the question Do you really want to do that several times, then it must be included in the array at least as often as you want this answer to be given. Setting the parameter reuse to 1 makes this repetition unnecessary.
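A sketch of an eexpect document in anyorder mode with a reusable answer; the question and answer shown are illustrative:

```yaml
eexpect:
  mode: anyorder
  timeout: 1.5
  reuse: 1
  talk:
    - "Do you really want to do that"
    - "yes\n"
```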
Environment variables to be set during the command
You must install the Expect module to use expect. CPAN.pm does not install it for you.
expect: <array> is a short notation for this eexpect:
- eexpect:
- mode: deterministic
- timeout: 15
- talk: <array>
Kwalify
If you have the Kwalify module installed (which is part of Bundle::CPANxxl), then all your distroprefs files are checked for syntactic correctness.
CPAN.pm comes with a collection of example YAML files. Note that these are really just examples and should not be used without care, because they cannot fit everybody's purpose. After all, the authors of the packages that ask questions had a need to ask, so you should watch their questions and adjust the examples to your environment and your needs. You have been warned :-)
If you do not enter the shell, shell commands are available both as methods (CPAN::Shell->install(...)) and as functions in the calling package (install(...)). Before calling low-level commands, it makes sense to initialize the components of CPAN you need, e.g.:
- CPAN::HandleConfig->load;
- CPAN::Shell::setup_output;
- CPAN::Index->reload;
High-level commands do such initializations automatically.
There's currently only one class with a stable interface: CPAN::Shell. All commands available in the CPAN shell are methods of the class CPAN::Shell. The arguments on the command line are passed as arguments to the method.
So if you take for example the shell command
- notest install A B C
the actually executed command is
- CPAN::Shell->notest("install","A","B","C");
Each of the commands that produce listings of modules (r, autobundle, u) also returns a list of the IDs of all modules within the list.
The IDs of all objects available within a program are strings that can be expanded to the corresponding real objects with the CPAN::Shell->expand("Module",@things) method. Expand returns a list of CPAN::Module objects according to the @things arguments given. In scalar context, it returns only the first element of the list.
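As a sketch (assuming a configured CPAN.pm with a loaded index), looking up a single module object might look like:

```perl
use CPAN;

# In scalar context expand() returns only the first matching object.
my $mod = CPAN::Shell->expand("Module", "Data::Dumper");
printf "%s: CPAN has version %s\n", $mod->id, $mod->cpan_version if $mod;
```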
Like expand, but returns objects of the appropriate type, i.e. CPAN::Bundle objects for bundles, CPAN::Module objects for modules, and CPAN::Distribution objects for distributions. Note: it does not expand to CPAN::Author objects.
This enables the programmer to do operations that combine functionalities that are available in the shell.
- # install everything that is outdated on my disk:
- perl -MCPAN -e 'CPAN::Shell->install(CPAN::Shell->r)'
- # install my favorite programs if necessary:
- for $mod (qw(Net::FTP Digest::SHA Data::Dumper)) {
- CPAN::Shell->install($mod);
- }
- # list all modules on my disk that have no VERSION number
- for $mod (CPAN::Shell->expand("Module","/./")) {
- next unless $mod->inst_file;
- # MakeMaker convention for undefined $VERSION:
- next unless $mod->inst_version eq "undef";
- print "No VERSION in ", $mod->id, "\n";
- }
- # find out which distribution on CPAN contains a module:
- print CPAN::Shell->expand("Module","Apache::Constants")->cpan_file
Or if you want to schedule a cron job to watch CPAN, you could list all modules that need updating. First a quick and dirty way:
- perl -e 'use CPAN; CPAN::Shell->r;'
If you don't want any output should all modules be up to date, parse the output of the above command for the regular expression /modules are up to date/ and decide to mail the output only if it doesn't match.
If you prefer to do it more in a programmerish style in one single process, something like this may better suit you:
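A sketch of such a single-process check, assuming a configured CPAN.pm (it combines the expand() and uptodate() methods used in the examples above):

```perl
# list all installed modules that have newer versions on CPAN
for $mod (CPAN::Shell->expand("Module","/./")) {
    next unless $mod->inst_file;   # skip modules not installed locally
    next if $mod->uptodate;        # skip modules that are already current
    printf "Module %s is installed as %s, could be updated to %s from CPAN\n",
        $mod->id, $mod->inst_version, $mod->cpan_version;
}
```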
If that gives too much output every day, you may want to watch only for three modules. You can write
- for $mod (CPAN::Shell->expand("Module","/Apache|LWP|CGI/")) {
as the first line instead. Or you can combine some of the above tricks:
- # watch only for a new mod_perl module
- $mod = CPAN::Shell->expand("Module","mod_perl");
- exit if $mod->uptodate;
- # new mod_perl arrived, let me know all update recommendations
- CPAN::Shell->r;
Returns a one-line description of the author
Returns a multi-line description of the author
Returns the author's email address
Returns the author's name
An alias for fullname
Returns a one-line description of the bundle
Returns a multi-line description of the bundle
Recursively runs the clean method on all items contained in the bundle.
Returns a list of objects' IDs contained in a bundle. The associated objects may be bundles, modules or distributions.
Forces CPAN to perform a task that it normally would have refused to do. Force takes as arguments a method name to be called and any number of additional arguments to be passed to that method. The internals of the object get the needed changes so that CPAN.pm does not refuse to take the action. The force is passed recursively to all contained objects. See also the section above on the force and the fforce pragma.
Recursively runs the get method on all items contained in the bundle.
Returns the highest installed version of the bundle in either @INC or $CPAN::Config->{cpan_home}. Note that this is different from CPAN::Module::inst_file.
Like CPAN::Bundle::inst_file, but returns the $VERSION
Returns 1 if the bundle itself and all its members are up-to-date.
Recursively runs the install method on all items contained in the bundle.
Recursively runs the make method on all items contained in the bundle.
Recursively runs the readme method on all items contained in the bundle.
Recursively runs the test method on all items contained in the bundle.
Returns a one-line description of the distribution
Returns a multi-line description of the distribution
Returns the CPAN::Author object of the maintainer who uploaded this distribution
Returns a string of the form "AUTHORID/TARBALL", where AUTHORID is the author's PAUSE ID and TARBALL is the distribution filename.
Returns the distribution filename without any archive suffix, e.g. "Foo-Bar-0.01".
Changes to the directory where the distribution has been unpacked and runs make clean there.
Returns a list of IDs of modules contained in a distribution file. Works only for distributions listed in the 02packages.details.txt.gz file. This typically means that only the most recent version of a distribution is covered.
Changes to the directory where the distribution has been unpacked and runs something like
- cvs -d $cvs_root import -m $cvs_log $cvs_dir $userid v$version
there.
Returns the directory into which this distribution has been unpacked.
Forces CPAN to perform a task that it normally would have refused to do. Force takes as arguments a method name to be called and any number of additional arguments to be passed to that method. The internals of the object get the needed changes so that CPAN.pm does not refuse to take the action. See also the section above on the force and the fforce pragma.
Downloads the distribution from CPAN and unpacks it. Does nothing if the distribution has already been downloaded and unpacked within the current session.
Changes to the directory where the distribution has been unpacked and runs the external command make install there. If make has not yet been run, it will be run first. A make test is issued in any case, and if this fails, the install is cancelled. The cancellation can be avoided by letting force run the install for you.
This install method only has the power to install the distribution if there are no dependencies in the way. To install an object along with all its dependencies, use CPAN::Shell->install.
Note that install() gives no meaningful return value. See uptodate().
Returns 1 if this distribution file seems to be a perl distribution. Normally this is derived from the file name only, but the index from CPAN can contain a hint to achieve a return value of true for other filenames too.
Changes to the directory where the distribution has been unpacked and opens a subshell there. Exiting the subshell returns.
First runs the get method to make sure the distribution is downloaded and unpacked. Changes to the directory where the distribution has been unpacked and runs the external commands perl Makefile.PL or perl Build.PL and make there.
Downloads the pod documentation of the file associated with a distribution (in HTML format) and runs it through the external command lynx specified in $CPAN::Config->{lynx}. If lynx isn't available, it converts it to plain text with the external command html2text and runs it through the pager specified in $CPAN::Config->{pager}.
Returns the hash reference from the first matching YAML file that the user has deposited in the prefs_dir/ directory. The first succeeding match wins. The files in the prefs_dir/ are processed alphabetically, and the canonical distro name (e.g. AUTHOR/Foo-Bar-3.14.tar.gz) is matched against the regular expressions stored in the $root->{match}{distribution} attribute value. Additionally all module names contained in a distribution are matched against the regular expressions in the $root->{match}{module} attribute value. The two match values are ANDed together. Each of the two attributes is optional.
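A minimal match section in such a YAML file might look like this sketch (the author and module names are made up; when both keys are present, a distribution must satisfy both regular expressions):

```yaml
---
match:
  distribution: "^EXAMPLEAUTHOR/Example-Dist-\\d"
  module: "^Example::Dist"
```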
Returns the hash reference that has been announced by a distribution as the requires and build_requires elements. These can be declared either by the META.yml (if authoritative) or can be deposited after the run of Build.PL in the file ./_build/prereqs or after the run of Makefile.PL written as the PREREQ_PM hash in a comment in the produced Makefile. Note: this method only works after an attempt has been made to make the distribution. Returns undef otherwise.
Downloads the README file associated with a distribution and runs it through the pager specified in $CPAN::Config->{pager}.
Downloads report data for this distribution from www.cpantesters.org and displays a subset of them.
Returns the content of the META.yml of this distro as a hashref. Note: works only after an attempt has been made to make the distribution. Returns undef otherwise. Also returns undef if the content of META.yml is not authoritative. (The rules about what exactly makes the content authoritative are still in flux.)
Changes to the directory where the distribution has been unpacked and runs make test there.
Returns 1 if all the modules contained in the distribution are up-to-date. Relies on containsmods.
Forces a reload of all indices.
Reloads all indices if they have not been read for more than $CPAN::Config->{index_expire} days.
CPAN::Author, CPAN::Bundle, CPAN::Module, and CPAN::Distribution inherit this method. It prints the data structure associated with an object. Useful for debugging. Note: the data structure is considered internal and thus subject to change without notice.
Returns a one-line description of the module in four columns: the first column contains the word Module, the second column consists of one character: an equals sign if this module is already installed and up-to-date, a less-than sign if this module is installed but can be upgraded, and a space if the module is not installed. The third column is the name of the module and the fourth column gives maintainer or distribution information.
Returns a multi-line description of the module
Runs a clean on the distribution associated with this module.
Returns the filename on CPAN that is associated with the module.
Returns the latest version of this module available on CPAN.
Runs a cvs_import on the distribution associated with this module.
Returns a 44 character description of this module. Only available for modules listed in The Module List (CPAN/modules/00modlist.long.html or 00modlist.long.txt.gz)
Returns the CPAN::Distribution object that contains the current version of this module.
Returns a hash reference. The keys of the hash are the letters D, S, L, I, and P, for development status, support level, language, interface and public license respectively. The data for the DSLIP status are collected by pause.perl.org when authors register their namespaces. The values of the 5 hash elements are one-character words whose meaning is described in the table below. There are also 5 hash elements DV, SV, LV, IV, and PV that carry a more verbose value of the 5 status variables.
Where the 'DSLIP' characters have the following meanings:
- D - Development Stage (Note: *NO IMPLIED TIMESCALES*):
- i - Idea, listed to gain consensus or as a placeholder
- c - under construction but pre-alpha (not yet released)
- a/b - Alpha/Beta testing
- R - Released
- M - Mature (no rigorous definition)
- S - Standard, supplied with Perl 5
- S - Support Level:
- m - Mailing-list
- d - Developer
- u - Usenet newsgroup comp.lang.perl.modules
- n - None known, try comp.lang.perl.modules
- a - abandoned; volunteers welcome to take over maintenance
- L - Language Used:
- p - Perl-only, no compiler needed, should be platform independent
- c - C and perl, a C compiler will be needed
- h - Hybrid, written in perl with optional C code, no compiler needed
- + - C++ and perl, a C++ compiler will be needed
- o - perl and another language other than C or C++
- I - Interface Style
- f - plain Functions, no references used
- h - hybrid, object and function interfaces available
- n - no interface at all (huh?)
- r - some use of unblessed References or ties
- O - Object oriented using blessed references and/or inheritance
- P - Public License
- p - Standard-Perl: user may choose between GPL and Artistic
- g - GPL: GNU General Public License
- l - LGPL: "GNU Lesser General Public License" (previously known as
- "GNU Library General Public License")
- b - BSD: The BSD License
- a - Artistic license alone
- 2 - Artistic license 2.0 or later
- o - open source: approved by www.opensource.org
- d - allows distribution without restrictions
- r - restricted distribution
- n - no license at all
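A sketch of reading the returned hash for one module (assuming a configured CPAN.pm; the module name is only an example):

```perl
use CPAN;

my $mod   = CPAN::Shell->expand("Module", "Data::Dumper");
my $dslip = $mod->dslip_status;

# One-character codes, per the table above
printf "D=%s S=%s L=%s I=%s P=%s\n", @{$dslip}{qw(D S L I P)};
# The *V keys carry the verbose descriptions
print $dslip->{DV}, "\n";
```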
Forces CPAN to perform a task it would normally refuse to do. Force takes as arguments a method name to be invoked and any number of additional arguments to pass to that method. The internals of the object get the needed changes so that CPAN.pm does not refuse to take the action. See also the section above on the force and the fforce pragma.
Runs a get on the distribution associated with this module.
Returns the filename of the module found in @INC. The first file found is reported, just as perl itself stops searching @INC once it finds a module.
Returns the filename of the module found in PERL5LIB or @INC. The first file found is reported. The advantage of this method over inst_file is that modules that have been tested but not yet installed are included, because PERL5LIB keeps track of tested modules.
Returns the version number of the installed module in readable format.
Returns the version number of the available module in readable format.
Runs an install on the distribution associated with this module.
Changes to the directory where the distribution associated with this module has been unpacked and opens a subshell there. Exiting the subshell returns.
Runs a make on the distribution associated with this module.
If module is installed, peeks into the module's manpage, reads the headline, and returns it. Moreover, if the module has been downloaded within this session, does the equivalent on the downloaded module even if it hasn't been installed yet.
Runs a perldoc on this module.
Runs a readme on the distribution associated with this module.
Calls the reports() method on the associated distribution object.
Runs a test on the distribution associated with this module.
Returns 1 if the module is installed and up-to-date.
Returns the author's ID of the module.
Currently the cache manager only keeps track of the build directory ($CPAN::Config->{build_dir}). It is a simple FIFO mechanism that deletes complete directories below build_dir as soon as the size of all directories there gets bigger than $CPAN::Config->{build_cache} (in MB). The contents of this cache may be used for later re-installations that you intend to do manually, but will never be trusted by CPAN itself. This is due to the fact that the user might use these directories for building modules on different architectures.
There is another directory ($CPAN::Config->{keep_source_where}) where the original distribution files are kept. This directory is not covered by the cache manager and must be controlled by the user. If you choose to have the same directory as build_dir and as keep_source_where directory, then your sources will be deleted with the same FIFO mechanism.
A bundle is just a perl module in the namespace Bundle:: that does not define any functions or methods. It usually only contains documentation.
It starts like a perl module with a package declaration and a $VERSION variable. After that the pod section looks like any other pod with the only difference being that one special pod section exists starting with (verbatim):
- =head1 CONTENTS
In this pod section each line obeys the format
- Module_Name [Version_String] [- optional text]
The only required part is the first field, the name of a module (e.g. Foo::Bar, i.e. not the name of the distribution file). The rest of the line is optional. The comment part is delimited by a dash just as in the man page header.
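Putting the pieces together, a minimal bundle file following this format could look like the following sketch (the bundle name and contents are only examples):

```perl
package Bundle::MyExample;

$VERSION = '0.01';

1;

__END__

=head1 NAME

Bundle::MyExample - modules I install everywhere

=head1 CONTENTS

Net::FTP - file transfers

Digest::SHA 5.40 - checksums, version 5.40 or later

=cut
```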
The distribution of a bundle should follow the same convention as other distributions.
Bundles are treated specially in the CPAN package. If you say 'install Bundle::Tkkit' (assuming such a bundle exists), CPAN will install all the modules in the CONTENTS section of the pod. You can install your own Bundles locally by placing a conformant Bundle file somewhere into your @INC path. The autobundle() command which is available in the shell interface does that for you by including all currently installed modules in a snapshot bundle file.
The CPAN program tries to depend on as little as possible so the user can use it in a hostile environment. It works better the more goodies the environment provides. For example, if you try in the CPAN shell
- install Bundle::CPAN
or
- install Bundle::CPANxxl
you will find the shell more convenient than the bare shell before.
If you have a local mirror of CPAN and can access all files with "file:" URLs, then you only need a perl later than perl5.003 to run this module. Otherwise Net::FTP is strongly recommended. LWP may be required for non-UNIX systems, or if your nearest CPAN site is associated with a URL that is not ftp:.
If you have neither Net::FTP nor LWP, there is a fallback mechanism implemented for an external ftp command or for an external lynx command.
This module presumes that all packages on CPAN declare their $VERSION variable in an easy to parse manner. This prerequisite can hardly be relaxed because it consumes far too much memory to load all packages into the running program just to determine the $VERSION variable. Currently all programs that are dealing with version use something like this
- perl -MExtUtils::MakeMaker -le \
- 'print MM->parse_version(shift)' filename
If you are author of a package and wonder if your $VERSION can be parsed, please try the above method.
come as compressed or gzipped tarfiles or as zip files and contain a Makefile.PL or Build.PL (well, we try to handle a bit more, but with little enthusiasm).
Debugging this module is more than a bit complex due to interference from the software producing the indices on CPAN, the mirroring process on CPAN, packaging, configuration, synchronicity, and even (gasp!) due to bugs within the CPAN.pm module itself.
For debugging the code of CPAN.pm itself in interactive mode, some debugging aid can be turned on for most packages within CPAN.pm with one of
sets debug mode for packages.
unsets debug mode for packages.
turns debugging on for all packages.
which sets the debugging packages directly. Note that o debug 0 turns debugging off.
What seems a successful strategy is the combination of reload cpan and the debugging switches. Add a new debug statement while running in the shell, then issue a reload cpan and see the new debugging messages immediately without losing the current context.
o debug without an argument lists the valid package names and the current set of packages in debugging mode. o debug has built-in completion support.
For debugging of CPAN data there is the dump command, which takes the same arguments as make/test/install and outputs each object's Data::Dumper dump. If an argument looks like a perl variable and contains one of $, @ or %, it is eval()ed and fed to Data::Dumper directly.
CPAN.pm works nicely without network access, too. If you maintain machines that are not networked at all, you should consider working with file: URLs. You'll have to collect your modules somewhere first. So you might use CPAN.pm to put together all you need on a networked machine. Then copy the $CPAN::Config->{keep_source_where} (but not $CPAN::Config->{build_dir}) directory on a floppy. This floppy is kind of a personal CPAN. CPAN.pm on the non-networked machines works nicely with this floppy. See also below the paragraph about CD-ROM support.
Returns true if the module is installed. Used to load all modules into the running CPAN.pm that are considered optional. The config variable dontload_list intercepts the has_inst() call such that an optional module is not loaded despite being available. For example, the following command will prevent YAML.pm from being loaded:
- cpan> o conf dontload_list push YAML
See the source for details.
Returns true if the module is installed and in a usable state. Only useful for a handful of modules that are used internally. See the source for details.
The constructor for all the singletons used to represent modules, distributions, authors, and bundles. If the object already exists, this method returns the object; otherwise, it calls the constructor.
There's no strong security layer in CPAN.pm. CPAN.pm helps you to install foreign, unmasked, unsigned code on your machine. We compare to a checksum that comes from the net just as the distribution file itself. But we try to make it easy to add security on demand:
Since release 1.77, CPAN.pm has been able to verify cryptographically signed module distributions using Module::Signature. The CPAN modules can be signed by their authors, thus giving more security. The simple unsigned MD5 checksums that were used before by CPAN protect mainly against accidental file corruption.
You will need to have Module::Signature installed, which in turn requires that you have at least one of Crypt::OpenPGP module or the command-line gpg tool installed.
You will also need to be able to connect over the Internet to the public key servers, like pgp.mit.edu, and their port 11371 (the HKP protocol).
The configuration parameter check_sigs is there to turn signature checking on or off.
Most functions in package CPAN are exported by default. The reason for this is that the primary use is intended for the cpan shell or for one-liners.
When the CPAN shell enters a subshell via the look command, it sets the environment CPAN_SHELL_LEVEL to 1, or increments that variable if it is already set.
When CPAN runs, it sets the environment variable PERL5_CPAN_IS_RUNNING to the ID of the running process. It also sets PERL5_CPANPLUS_IS_RUNNING to prevent runaway processes which could happen with older versions of Module::Install.
When running perl Makefile.PL, the environment variable PERL5_CPAN_IS_EXECUTING is set to the full path of the Makefile.PL that is being executed. This prevents runaway processes with newer versions of Module::Install.
When the config variable ftp_passive is set, all downloads will be run with the environment variable FTP_PASSIVE set to this value. This is in general a good idea as it influences both Net::FTP and LWP based connections. The same effect can be achieved by starting the cpan shell with this environment variable set. For Net::FTP alone, one can also always set passive mode by running libnetcfg.
Populating a freshly installed perl with one's favorite modules is pretty easy if you maintain a private bundle definition file. To get a useful blueprint of a bundle definition file, the command autobundle can be used on the CPAN shell command line. This command writes a bundle definition file for all modules installed for the current perl interpreter. It's recommended to run this command once only, and from then on maintain the file manually under a private name, say Bundle/my_bundle.pm. With a clever bundle file you can then simply say
- cpan> install Bundle::my_bundle
then answer a few questions and go out for coffee (possibly even in a different city).
Maintaining a bundle definition file means keeping track of two things: dependencies and interactivity. CPAN.pm sometimes fails on calculating dependencies because not all modules define all MakeMaker attributes correctly, so a bundle definition file should specify prerequisites as early as possible. On the other hand, it's annoying that so many distributions need some interactive configuring. So what you can try to accomplish in your private bundle file is to have the packages that need to be configured early in the file and the gentle ones later, so you can go out for coffee after a few minutes and leave CPAN.pm to churn away unattended.
Thanks to Graham Barr for contributing the following paragraphs about the interaction between perl and various firewall configurations. For further information on firewalls, it is recommended to consult the documentation that comes with the ncftp program. If you are unable to go through the firewall with a simple Perl setup, it is likely that you can configure ncftp so that it works through your firewall.
Firewalls can be categorized into three basic types.
This is when the firewall machine runs a web server, and to access the outside world, you must do so via that web server. If you set environment variables like http_proxy or ftp_proxy to values beginning with http://, or have proxy information set in your web browser, then you know you are running behind an http firewall.
To access servers outside these types of firewalls with perl (even for ftp), you need LWP or HTTP::Tiny.
This is when the firewall machine runs an ftp server. This kind of firewall will only let you access ftp servers outside the firewall. This is usually done by connecting to the firewall with ftp, then entering a username like "user@outside.host.com".
To access servers outside these types of firewalls with perl, you need Net::FTP.
One-way visibility means these firewalls try to make themselves invisible to users inside the firewall. An FTP data connection is normally created by sending your IP address to the remote server and then listening for the return connection. But the remote server will not be able to connect to you because of the firewall. For these types of firewall, FTP connections need to be done in a passive mode.
There are two that I can think of.
If you are using a SOCKS firewall, you will need to compile perl and link it with the SOCKS library. This is what is normally called a 'socksified' perl. With this executable you will be able to connect to servers outside the firewall as if it were not there.
This is when the firewall is implemented in the kernel (via NAT, or network address translation). It allows you to hide a complete network behind one IP address. With this firewall no special compiling is needed as you can access hosts directly.
For accessing ftp servers behind such firewalls you usually need to set the environment variable FTP_PASSIVE or the config variable ftp_passive to a true value.
If you can go through your firewall with e.g. lynx, presumably with a command such as
- /usr/local/bin/lynx -pscott:tiger
then you would configure CPAN.pm with the command
- o conf lynx "/usr/local/bin/lynx -pscott:tiger"
That's all. Similarly for ncftp or ftp, you would configure something like
- o conf ncftp "/usr/bin/ncftp -f /home/scott/ncftplogin.cfg"
Your mileage may vary...
I installed a new version of module X but CPAN keeps saying I have the old version installed
Probably you do have the old version installed. This can happen if a module installs itself into a different directory in the @INC path than the one it was previously installed in. This is not really a CPAN.pm problem; you would have the same problem when installing the module manually. The easiest way to prevent this behaviour is to add the argument UNINST=1 to the make install call, and that is why many people add this argument permanently by configuring
- o conf make_install_arg UNINST=1
So why is UNINST=1 not the default?
Because there are people who have precise expectations about who may install where in the @INC path and who uses which @INC array. In fine-tuned environments UNINST=1 can cause damage.
I want to clean up my mess, and install a new perl along with all modules I have. How do I go about it?
Run the autobundle command for your old perl and optionally rename the resulting bundle file (e.g. Bundle/mybundle.pm), install the new perl with the Configure option prefix, e.g.
- ./Configure -Dprefix=/usr/local/perl-5.6.78.9
Install the bundle file you produced in the first step with something like
- cpan> install Bundle::mybundle
and you're done.
When I install bundles or multiple modules with one command there is too much output to keep track of.
You may want to configure something like
- o conf make_arg "| tee -ai /root/.cpan/logs/make.out"
- o conf make_install_arg "| tee -ai /root/.cpan/logs/make_install.out"
so that STDOUT is captured in a file for later inspection.
I am not root, how can I install a module in a personal directory?
As of CPAN 1.9463, if you do not have permission to write the default perl library directories, CPAN's configuration process will ask you whether you want to bootstrap local::lib, which makes keeping a personal perl library directory easy.
Another thing you should bear in mind is that the UNINST parameter can be dangerous when you are installing into a private area because you might accidentally remove modules that other people depend on that are not using the private area.
How to get a package, unwrap it, and make a change before building it?
Have a look at the look (!) command.
I installed a Bundle and had a couple of fails. When I retried, everything resolved nicely. Can this be fixed to work on first try?
The reason for this is that CPAN does not know the dependencies of all modules when it starts out. To decide about the additional items to install, it just uses data found in the META.yml file or the generated Makefile. An undetected missing piece breaks the process. But it may well be that your Bundle installs some prerequisite later than some depending item, and thus your second try is able to resolve everything.
Please note, CPAN.pm does not know the dependency tree in advance and cannot sort the queue of things to install in a topologically correct order. It resolves perfectly well if all modules declare the prerequisites correctly with the PREREQ_PM attribute to MakeMaker or the requires stanza of Module::Build. For bundles which fail and which you need to install often, it is recommended to sort the Bundle definition file manually.
In our intranet, we have many modules for internal use. How can I integrate these modules with CPAN.pm but without uploading the modules to CPAN?
Have a look at the CPAN::Site module.
When I run CPAN's shell, I get an error message about things in my /etc/inputrc (or ~/.inputrc) file.
These are readline issues and can only be fixed by studying the readline configuration on your architecture and adjusting the referenced file accordingly. Please make a backup of the /etc/inputrc or ~/.inputrc and edit them. Quite often harmless changes like uppercasing or lowercasing some arguments solve the problem.
Some authors have strange characters in their names.
Internally CPAN.pm uses the UTF-8 charset. If your terminal is expecting ISO-8859-1 charset, a converter can be activated by setting term_is_latin to a true value in your config file. One way of doing so would be
- cpan> o conf term_is_latin 1
If other charset support is needed, please file a bug report against CPAN.pm at rt.cpan.org and describe your needs. Maybe we can extend the support or maybe UTF-8 terminals become widely available.
Note: this config variable is deprecated and will be removed in a future version of CPAN.pm. It will be replaced with the conventions around the family of $LANG and $LC_* environment variables.
When an install fails for some reason and then I correct the error condition and retry, CPAN.pm refuses to install the module, saying Already tried without success.
Use the force pragma like so:
- force install Foo::Bar
Or you can use
- look Foo::Bar
and then make install directly in the subshell.
How do I install a "DEVELOPER RELEASE" of a module?
By default, CPAN will install the latest non-developer release of a module. If you want to install a dev release, you have to specify the partial path starting with the author id to the tarball you wish to install, like so:
- cpan> install KWILLIAMS/Module-Build-0.27_07.tar.gz
Note that you can use the ls command to get this path listed.
How do I install a module and all its dependencies from the commandline, without being prompted for anything, despite my CPAN configuration (or lack thereof)?
CPAN uses ExtUtils::MakeMaker's prompt() function to ask its questions, so if you set the PERL_MM_USE_DEFAULT environment variable, you shouldn't be asked any questions at all (assuming the modules you are installing are nice about obeying that variable as well):
- % PERL_MM_USE_DEFAULT=1 perl -MCPAN -e 'install My::Module'
How do I create a Module::Build based Build.PL derived from an ExtUtils::MakeMaker focused Makefile.PL?
I'm frequently irritated with the CPAN shell's inability to help me select a good mirror.
CPAN can now help you select a "good" mirror, based on which ones have the lowest 'ping' round-trip times. From the shell, use the command 'o conf init urllist' and allow CPAN to automatically select mirrors for you.
Beyond that help, the urllist config parameter is yours. You can add and remove sites at will. You should find out which sites have the best up-to-dateness, bandwidth, reliability, etc. and are topologically close to you. Some people prefer fast downloads, others up-to-dateness, others reliability. You decide which to try in which order.
Henk P. Penning maintains a site that collects data about CPAN sites:
- http://www.cs.uu.nl/people/henkp/mirmon/cpan.html
Also, feel free to play with experimental features. Run
- o conf init randomize_urllist ftpstats_period ftpstats_size
and choose your favorite parameters. After a few downloads, running the hosts command will probably assist you in choosing the best mirror sites.
Why do I get asked the same questions every time I start the shell?
You can make your configuration changes permanent by calling the command o conf commit. Alternatively, set the auto_commit variable to true by running o conf init auto_commit and answering the following question with yes.
Older versions of CPAN.pm had the original root directory of all tarballs in the build directory. Now there are always random characters appended to these directory names. Why was this done?
The random characters are provided by File::Temp and ensure that each module's individual build directory is unique. This makes running CPAN.pm in concurrent processes simultaneously safe.
Speaking of the build directory. Do I have to clean it up myself?
You have the choice to set the config variable scan_cache to never. Then you must clean it up yourself. The other possible values, atstart and atexit, clean up the build directory when you start or exit the CPAN shell, respectively. If you never start up the CPAN shell, you probably also have to clean up the build directory yourself.
CPAN.pm is regularly tested to run under 5.005 and assorted newer versions. It is getting more and more difficult to get the minimal prerequisites working on older perls. It is close to impossible to get the whole Bundle::CPAN working there. If you're in the position to have only these old versions, be advised that CPAN is designed to work fine without the Bundle::CPAN installed.
To get things going, note that GBARR/Scalar-List-Utils-1.18.tar.gz is compatible with ancient perls and that File::Temp is listed as a prerequisite but CPAN has reasonable workarounds if it is missing.
This module and its competitor, the CPANPLUS module, are both much cooler than the other. CPAN.pm is older. CPANPLUS was designed to be more modular, but it was never intended to be compatible with CPAN.pm.
In the year 2010 App::cpanminus was launched as a new approach to a cpan shell with a considerably smaller footprint. Very cool stuff.
This software enables you to upgrade software on your computer and so is inherently dangerous because the newly installed software may contain bugs and may alter the way your computer works or even make it unusable. Please consider backing up your data before every upgrade.
Please report bugs via http://rt.cpan.org/
Before submitting a bug, please make sure that the traditional method of building a Perl module package from a shell by following the installation instructions of that package still works in your environment.
Andreas Koenig <andk@cpan.org>
This program is free software; you can redistribute it and/or modify it under the same terms as Perl itself.
See http://www.perl.com/perl/misc/Artistic.html
Kawai,Takanori provides a Japanese translation of a very old version of this manpage at http://homepage3.nifty.com/hippo2000/perltips/CPAN.htm
Many people enter the CPAN shell by running the cpan utility program which is installed in the same directory as perl itself. So if you have this directory in your PATH variable (or some equivalent in your operating system) then typing cpan in a console window will work for you as well. Beyond that, the utility provides several commandline shortcuts.
melezhik (Alexey) sent me a link where he published a chef recipe to work with CPAN.pm: http://community.opscode.com/cookbooks/cpan.
CPANPLUS - API & CLI access to the CPAN mirrors
- ### standard invocation from the command line
- $ cpanp
- $ cpanp -i Some::Module
- $ perl -MCPANPLUS -eshell
- $ perl -MCPANPLUS -e'fetch Some::Module'
The CPANPLUS library is an API to the CPAN mirrors and a collection of interactive shells, commandline programs, etc., that use this API.
This is the document you are currently reading. It describes basic usage and background information. Its main purpose is to assist the user who wants to learn how to invoke CPANPLUS and install modules from the commandline and to point you to more indepth reading if required.
The CPANPLUS API is meant to let you programmatically interact with the CPAN mirrors. The documentation in CPANPLUS::Backend shows you how to create an object capable of interacting with those mirrors, letting you create & retrieve module objects. CPANPLUS::Module shows you how you can use these module objects to perform actions like installing and testing.
The default shell, documented in CPANPLUS::Shell::Default is also scriptable. You can use its API to dispatch calls from your script to the CPANPLUS Shell.
You can start an interactive shell by running either of the two following commands:
- $ cpanp
- $ perl -MCPANPLUS -eshell
All commands available are listed in the interactive shell's help menu. See cpanp -h or CPANPLUS::Shell::Default for instructions on using the default shell.
By running cpanp without arguments, you will start up the shell specified in your config, which defaults to CPANPLUS::Shell::Default. There are more shells available. CPANPLUS itself ships with an emulation shell called CPANPLUS::Shell::Classic that looks and feels just like the old CPAN.pm shell.
You can start this shell by typing:
- $ perl -MCPANPLUS -e'shell Classic'
Even more shells may be available from CPAN.
Note that if you have changed your default shell in your configuration, that shell will be used instead. If for some reason there was an error with your specified shell, you will be given the default shell.
cpan2dist is a commandline tool to convert any distribution from CPAN into a package in the format of your choice, for example .deb or FreeBSD ports. See cpan2dist -h for details.
For quick access to common commands, you may use this module, CPANPLUS, rather than the full programmatic API situated in CPANPLUS::Backend. This module offers the following functions:
This function requires the full name of the module, which is case sensitive. The module name can also be provided as a fully qualified file name, beginning with a /, relative to the /authors/id directory on a CPAN mirror.
It will download, extract and install the module.
Like install, fetch needs the full name of a module or the fully qualified file name, and is case sensitive.
It will download the specified module to the current directory.
Get is provided as an alias for fetch for compatibility with CPAN.pm.
Shell starts the default CPAN shell. You can also start the shell by using the cpanp command, which will be installed in your perl bin.
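A hedged sketch of using these functions from a script (assumes CPANPLUS is installed and configured; Text::Levenshtein is only an example module name):

```perl
#!/usr/bin/perl
use strict;
use warnings;
# CPANPLUS exports its convenience functions on plain import.
use CPANPLUS;

# Download the tarball into the current directory
# (the module name is case sensitive, as noted above).
fetch('Text::Levenshtein');

# Download, extract, build and install the module.
install('Text::Levenshtein');
```

Note that both calls hit the configured CPAN mirror over the network, so such a script is best run interactively rather than from build tooling.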
For frequently asked questions and answers, please consult the
CPANPLUS::FAQ
manual.
Please report bugs or other issues to <bug-cpanplus@rt.cpan.org>.
This module is by Jos Boumans <kane@cpan.org>.
The CPAN++ interface (of which this module is a part) is copyright (c) 2001 - 2007, Jos Boumans <kane@cpan.org>. All rights reserved.
This library is free software; you may redistribute and/or modify it under the same terms as Perl itself.
CPANPLUS::Shell::Default, CPANPLUS::FAQ, CPANPLUS::Backend, CPANPLUS::Module, cpanp, cpan2dist
Carp - alternative warn and die for modules
- use Carp;
- # warn user (from perspective of caller)
- carp "string trimmed to 80 chars";
- # die of errors (from perspective of caller)
- croak "We're outta here!";
- # die of errors with stack backtrace
- confess "not implemented";
- # cluck, longmess and shortmess not exported by default
- use Carp qw(cluck longmess shortmess);
- cluck "This is how we got here!";
- $long_message = longmess( "message from cluck() or confess()" );
- $short_message = shortmess( "message from carp() or croak()" );
The Carp routines are useful in your own modules because they act like die() or warn(), but with a message which is more likely to be useful to a user of your module. In the case of cluck() and confess(), that context is a summary of every call in the call-stack; longmess() returns the contents of the error message.
For a shorter message you can use carp() or croak() which report the error as being from where your module was called. shortmess() returns the contents of this error message. There is no guarantee that that is where the error was, but it is a good educated guess.
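A minimal sketch of that caller-perspective reporting (My::Math and its inverse sub are invented for illustration):

```perl
#!/usr/bin/perl
use strict;
use warnings;

# A tiny invented module: croak blames the caller, not this package.
package My::Math;
use Carp;

sub inverse {
    my ($x) = @_;
    croak "cannot invert zero" if $x == 0;
    return 1 / $x;
}

package main;

# The error message names this call site, not a line inside My::Math.
eval { My::Math::inverse(0) };
print $@;
```

Had inverse() used die instead, the message would point at the line inside My::Math, which is rarely what the module's user needs to see.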
You can also alter the way the output and logic of Carp works, by changing some global variables in the Carp namespace. See the section on GLOBAL VARIABLES below.
Here is a more complete description of how carp and croak work. What they do is search the call-stack for a call for which they have not been told that there shouldn't be an error. If every call is marked safe, they give up and give a full stack backtrace instead. In other words, they presume that the first likely looking potential suspect is guilty. Their rules for telling whether a call shouldn't generate errors work as follows:
Any call from a package to itself is safe.
Packages claim that there won't be errors on calls to or from packages explicitly marked as safe by inclusion in @CARP_NOT, or (if that array is empty) @ISA. The ability to override what @ISA says is new in 5.8.
The trust in item 2 is transitive. If A trusts B, and B trusts C, then A trusts C. So if you do not override @ISA with @CARP_NOT, then this trust relationship is identical to "inherits from".
Any call from an internal Perl module is safe. (Nothing keeps user modules from marking themselves as internal to Perl, but this practice is discouraged.)
Any call to Perl's warning system (e.g. Carp itself) is safe. (This rule is what keeps it from reporting the error at the point where you call carp or croak.)
$Carp::CarpLevel can be set to skip a fixed number of additional call levels. Using this is not recommended because it is very difficult to get it to behave correctly.
As a debugging aid, you can force Carp to treat a croak as a confess and a carp as a cluck across all modules. In other words, force a detailed stack trace to be given. This can be very helpful when trying to understand why, or from where, a warning or error is being generated.
This feature is enabled by 'importing' the non-existent symbol 'verbose'. You would typically enable it by saying
- perl -MCarp=verbose script.pl
or by including the string -MCarp=verbose in the PERL5OPT environment variable.
Alternately, you can set the global variable $Carp::Verbose to true. See the GLOBAL VARIABLES section below.
$Carp::MaxEvalLen
This variable determines how many characters of a string-eval are to be shown in the output. Use a value of 0 to show all text.
Defaults to 0.
$Carp::MaxArgLen
This variable determines how many characters of each argument to a function to print. Use a value of 0 to show the full length of the argument.
Defaults to 64.
$Carp::MaxArgNums
This variable determines how many arguments to each function to show. Use a value of 0 to show all arguments to a function call.
Defaults to 8.
$Carp::Verbose
This variable makes carp() and croak() generate stack backtraces just like cluck() and confess(). This is how use Carp 'verbose' is implemented internally.
Defaults to 0.
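For instance (a sketch; the inner/outer subs are invented), setting the variable upgrades a croak into a full backtrace:

```perl
#!/usr/bin/perl
use strict;
use warnings;
use Carp;

sub inner { croak "something failed" }
sub outer { inner() }

# With $Carp::Verbose true, croak behaves like confess:
# the message carries the whole call chain, not one line.
$Carp::Verbose = 1;

eval { outer() };
print $@;    # multi-line trace mentioning both inner and outer
```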
@CARP_NOT
This variable, in your package, says which packages are not to be considered as the location of an error. The carp() and cluck() functions will skip over callers when reporting where an error occurred.
NB: This variable must be in the package's symbol table (declared with our or use vars, not my).
Example of use:
- package My::Carping::Package;
- use Carp;
- our @CARP_NOT;
- sub bar { .... or _error('Wrong input') }
- sub _error {
- # temporary control of where'ness, __PACKAGE__ is implicit
- local @CARP_NOT = qw(My::Friendly::Caller);
- carp(@_)
- }
This would make Carp report the error as coming from a caller not in My::Carping::Package, nor from My::Friendly::Caller.
Also read the DESCRIPTION section above, about how Carp decides where the error is reported from.
Use @CARP_NOT, instead of $Carp::CarpLevel.
Overrides Carp's use of @ISA.
%Carp::Internal
This says what packages are internal to Perl. Carp will never report an error as being from a line in a package that is internal to Perl. For example:
- $Carp::Internal{ (__PACKAGE__) }++;
would give a full stack backtrace starting from the first caller outside of __PACKAGE__. (Unless that package was also internal to Perl.)
%Carp::CarpInternal
This says which packages are internal to Perl's warning system. For generating a full stack backtrace this is the same as being internal to Perl; the stack backtrace will not start inside packages that are listed in %Carp::CarpInternal. But it is slightly different for the summary message generated by carp or croak. There, errors will not be reported on any lines that are calling packages in %Carp::CarpInternal.
For example Carp itself is listed in %Carp::CarpInternal. Therefore the full stack backtrace from confess will not start inside of Carp, and the short message from calling croak is not placed on the line where croak was called.
$Carp::CarpLevel
This variable determines how many additional call frames are to be skipped that would not otherwise be, when reporting where an error occurred on a call to one of Carp's functions. It is fairly easy to count these call frames on calls that generate a full stack backtrace. However it is much harder to do this accounting for calls that generate a short message. Usually people skip too many call frames. If they are lucky, they skip enough that Carp goes all of the way through the call stack, realizes that something is wrong, and then generates a full stack backtrace. If they are unlucky, the error is reported from somewhere misleading very high in the call stack.
Therefore it is best to avoid $Carp::CarpLevel. Instead use @CARP_NOT, %Carp::Internal and %Carp::CarpInternal.
Defaults to 0.
The Carp routines don't handle exception objects currently. If called with a first argument that is a reference, they simply call die() or warn(), as appropriate.
The Carp module first appeared in Larry Wall's perl 5.000 distribution. Since then it has been modified by several of the perl 5 porters. Andrew Main (Zefram) <zefram@fysh.org> divested Carp into an independent distribution.
Copyright (C) 1994-2012 Larry Wall
Copyright (C) 2011, 2012 Andrew Main (Zefram) <zefram@fysh.org>
This module is free software; you can redistribute it and/or modify it under the same terms as Perl itself.
Config - access Perl configuration information
The Config module contains all the information that was available to the Configure program at Perl build time (over 900 values).
Shell variables from the config.sh file (written by Configure) are stored in the readonly-variable %Config, indexed by their names.
Values stored in config.sh as 'undef' are returned as undefined values. The perl exists function can be used to check if a named variable exists.
For a description of the variables, please have a look at the Glossary file, as written in the Porting folder, or use the url: http://perl5.git.perl.org/perl.git/blob/HEAD:/Porting/Glossary
myconfig()
Returns a textual summary of the major perl configuration values. See also -V in Command Switches in perlrun.
config_sh()
Returns the entire perl configuration information in the form of the original config.sh shell variable assignment script.
config_re($regex)
Like config_sh() but returns, as a list, only the config entries whose names match the $regex.
config_vars(@names)
Prints to STDOUT the values of the named configuration variables. Each is printed on a separate line in the form:
- name='value';
Names which are unknown are output as name='UNKNOWN';.
See also -V:name in Command Switches in perlrun.
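A short sketch tying these together, reading values from %Config and printing a couple with config_vars():

```perl
#!/usr/bin/perl
use strict;
use warnings;
# %Config is exported by default; config_vars() must be requested.
use Config qw(%Config config_vars);

print "built for $Config{archname} on $Config{osname}\n";

# Prints lines of the form: osname='linux';
config_vars(qw(osname archname));
```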
bincompat_options()
Returns a list of C pre-processor options used when compiling this perl binary, which affect its binary compatibility with extensions. bincompat_options() and non_bincompat_options() are shown together in the output of perl -V as Compile-time options.
non_bincompat_options()
Returns a list of C pre-processor options used when compiling this perl binary, which do not affect binary compatibility with extensions.
compile_date()
Returns the compile date (as a string), equivalent to what is shown by perl -V.
local_patches()
Returns a list of the names of locally applied patches, equivalent to what is shown by perl -V.
header_files()
Returns a list of the header files that should be used as dependencies for XS code, for this version of Perl on this platform.
Here's a more sophisticated example of using %Config:
- use Config;
- use strict;
- my %sig_num;
- my @sig_name;
- unless($Config{sig_name} && $Config{sig_num}) {
- die "No sigs?";
- } else {
- my @names = split ' ', $Config{sig_name};
- @sig_num{@names} = split ' ', $Config{sig_num};
- foreach (@names) {
- $sig_name[$sig_num{$_}] ||= $_;
- }
- }
- print "signal #17 = $sig_name[17]\n";
- if ($sig_num{ALRM}) {
- print "SIGALRM is $sig_num{ALRM}\n";
- }
Because this information is not stored within the perl executable itself it is possible (but unlikely) that the information does not relate to the actual perl binary which is being used to access it.
The Config module is installed into the architecture and version specific library directory ($Config{installarchlib}) and it checks the perl version number when loaded.
The values stored in config.sh may be either single-quoted or double-quoted. Double-quoted strings are handy for those cases where you need to include escape sequences in the strings. To avoid runtime variable interpolation, any $ and @ characters are replaced by \$ and \@, respectively. This isn't foolproof, of course, so don't embed \$ or \@ in double-quoted strings unless you're willing to deal with the consequences. (The slashes will end up escaped and the $ or @ will trigger variable interpolation.)
Most Config variables are determined by the Configure script on platforms supported by it (which is most UNIX platforms). Some platforms have custom-made Config variables, and may thus not have some of the variables described below, or may have extraneous variables specific to that particular port. See the port specific documentation in such cases.
_a
From Unix.U:
This variable defines the extension used for ordinary library files. For unix, it is .a. The . is included. Other possible values include .lib.
_exe
From Unix.U:
This variable defines the extension used for executable files. DJGPP, Cygwin and OS/2 use .exe. Stratus VOS uses .pm. On operating systems which do not require a specific extension for executable files, this variable is empty.
_o
From Unix.U:
This variable defines the extension used for object files. For unix, it is .o. The . is included. Other possible values include .obj.
afs
From afs.U:
This variable is set to true if AFS (Andrew File System) is used on the system, false otherwise. It is possible to override this with a hint value or command line option, but you'd better know what you are doing.
afsroot
From afs.U:
This variable is by default set to /afs. In the unlikely case this is not the correct root, it is possible to override this with a hint value or command line option. This will be used in subsequent tests for AFSness in the configure and test process.
alignbytes
From alignbytes.U:
This variable holds the number of bytes required to align a double-- or a long double when applicable. Usual values are 2, 4 and 8. The default is eight, for safety.
ansi2knr
From ansi2knr.U:
This variable is set if the user needs to run ansi2knr. Currently, this is not supported, so we just abort.
aphostname
From d_gethname.U:
This variable contains the command which can be used to compute the host name. The command is fully qualified by its absolute path, to make it safe when used by a process with super-user privileges.
api_revision
From patchlevel.U:
The three variables, api_revision, api_version, and api_subversion, specify the version of the oldest perl binary compatible with the present perl. In a full version string such as 5.6.1, api_revision is the 5.
Prior to 5.5.640, the format was a floating point number,
like 5.00563.
perl.c:incpush() and lib/lib.pm will automatically search in $sitelib/.. for older directories back to the limit specified by these api_ variables. This is only useful if you have a perl library directory tree structured like the default one. See INSTALL for how this works. The versioned site_perl directory was introduced in 5.005, so that is the lowest possible value. The version list appropriate for the current system is determined in inc_version_list.U.
XXX To do: Since compatibility can depend on compile time options (such as bincompat, longlong, etc.) it should (perhaps) be set by Configure, but currently it isn't. Currently, we read a hard-wired value from patchlevel.h. Perhaps what we ought to do is take the hard-wired value from patchlevel.h but then modify it if the current Configure options warrant. patchlevel.h then would use an #ifdef guard.
api_subversion
From patchlevel.U:
The three variables, api_revision, api_version, and api_subversion, specify the version of the oldest perl binary compatible with the present perl. In a full version string such as 5.6.1, api_subversion is the 1. See api_revision for full details.
api_version
From patchlevel.U:
The three variables, api_revision, api_version, and api_subversion, specify the version of the oldest perl binary compatible with the present perl. In a full version string such as 5.6.1, api_version is the 6. See api_revision for full details. As a special case, 5.5.0 is rendered in the old-style as 5.005. (In the 5.005_0x maintenance series, this was the only versioned directory in $sitelib.)
api_versionstring
From patchlevel.U:
This variable combines api_revision, api_version, and api_subversion in a format such as 5.6.1 (or 5_6_1) suitable for use as a directory name. This is filesystem dependent.
ar
From Loc.U:
This variable is used internally by Configure to determine the full pathname (if any) of the ar program. After Configure runs, the value is reset to a plain ar and is not useful.
archlib
From archlib.U:
This variable holds the name of the directory in which the user wants to put architecture-dependent public library files for $package. It is most often a local directory such as /usr/local/lib. Programs using this variable must be prepared to deal with filename expansion.
archlibexp
From archlib.U:
This variable is the same as the archlib variable, but is filename expanded at configuration time, for convenient use.
archname
From archname.U:
This variable is a short name to characterize the current architecture. It is used mainly to construct the default archlib.
archname64
From use64bits.U:
This variable is used for the 64-bitness part of $archname.
archobjs
From Unix.U:
This variable defines any additional objects that must be linked in with the program on this architecture. On unix, it is usually empty. It is typically used to include emulations of unix calls or other facilities. For perl on OS/2, for example, this would include os2/os2.obj.
asctime_r_proto
From d_asctime_r.U:
This variable encodes the prototype of asctime_r. It is zero if d_asctime_r is undef, and one of the REENTRANT_PROTO_T_ABC macros of reentr.h if d_asctime_r is defined.
awk
From Loc.U:
This variable is used internally by Configure to determine the full pathname (if any) of the awk program. After Configure runs, the value is reset to a plain awk and is not useful.
baserev
From baserev.U:
The base revision level of this package, from the .package file.
bash
From Loc.U:
This variable is defined but not used by Configure. The value is the empty string and is not useful.
bin
From bin.U:
This variable holds the name of the directory in which the user wants to put publicly executable images for the package in question. It is most often a local directory such as /usr/local/bin. Programs using this variable must be prepared to deal with ~name substitution.
bin_ELF
From dlsrc.U:
This variable saves the result from configure if generated binaries are in ELF format. Only set to defined when the test has actually been performed, and the result was positive.
binexp
From bin.U:
This is the same as the bin variable, but is filename expanded at configuration time, for use in your makefiles.
bison
From Loc.U:
This variable is used internally by Configure to determine the full pathname (if any) of the bison program. After Configure runs, the value is reset to a plain bison and is not useful.
bootstrap_charset
From ebcdic.U:
This variable conditionally defines BOOTSTRAP_CHARSET if this system uses non-ASCII encoding.
byacc
From Loc.U:
This variable is used internally by Configure to determine the full pathname (if any) of the byacc program. After Configure runs, the value is reset to a plain byacc and is not useful.
byteorder
From byteorder.U:
This variable holds the byte order in a UV. In the following, larger digits indicate more significance. The variable byteorder is either 4321 on a big-endian machine, or 1234 on a little-endian, or 87654321 on a Cray ... or 3412 with weird order!
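For example, a quick endianness check against byteorder (an illustrative sketch using the values described above):

```perl
#!/usr/bin/perl
use strict;
use warnings;
use Config;

my $order  = $Config{byteorder};
# 1234 / 12345678 mean little-endian; 4321 / 87654321 mean big-endian.
my $little = $order =~ /^1234/;
printf "this perl is %s-endian (byteorder=%s)\n",
    $little ? 'little' : 'big', $order;
```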
c
From n.U:
This variable contains the \c string if that is what causes the echo command to suppress newline. Otherwise it is null. Correct usage is $echo $n "prompt for a question: $c".
castflags
From d_castneg.U:
This variable contains a flag that describes difficulties the compiler has casting odd floating values to unsigned long: 0 = ok, 1 = couldn't cast < 0, 2 = couldn't cast >= 0x80000000, 4 = couldn't cast in argument expression list.
cat
From Loc.U:
This variable is used internally by Configure to determine the full pathname (if any) of the cat program. After Configure runs, the value is reset to a plain cat and is not useful.
cc
From cc.U:
This variable holds the name of a command to execute a C compiler which can resolve multiple global references that happen to have the same name. Usual values are cc and gcc. Fervent ANSI compilers may be called c89. AIX has xlc.
cccdlflags
From dlsrc.U:
This variable contains any special flags that might need to be passed with cc -c to compile modules to be used to create a shared library that will be used for dynamic loading. For hpux, this should be +z. It is up to the makefile to use it.
ccdlflags
From dlsrc.U:
This variable contains any special flags that might need to be passed to cc to link with a shared library for dynamic loading. It is up to the makefile to use it. For sunos 4.1, it should be empty.
ccflags
From ccflags.U:
This variable contains any additional C compiler flags desired by the user. It is up to the Makefile to use this.
ccflags_uselargefiles
From uselfs.U:
This variable contains the compiler flags needed by large file builds and added to ccflags by hints files.
ccname
From Checkcc.U:
This can be set either by hints files or by Configure. If using gcc, this is gcc, and if not, usually equal to cc; unimpressive, no? Some platforms, however, make good use of this by storing the flavor of the C compiler being used here. For example, if using the Sun WorkShop suite, ccname will be workshop.
ccsymbols
From Cppsym.U:
The variable contains the symbols defined by the C compiler alone. The symbols defined by cpp or by cc when it calls cpp are not in this list, see cppsymbols and cppccsymbols. The list is a space-separated list of symbol=value tokens.
ccversion
From Checkcc.U:
This can be set either by hints files or by Configure. If using a (non-gcc) vendor cc, this variable may contain a version for the compiler.
cf_by
From cf_who.U:
Login name of the person who ran the Configure script and answered the questions. This is used to tag both config.sh and config_h.SH.
cf_email
From cf_email.U:
Electronic mail address of the person who ran Configure. This can be used by units that require the user's e-mail, like MailList.U.
cf_time
From cf_who.U:
Holds the output of the date command when the configuration file was produced. This is used to tag both config.sh and config_h.SH.
charbits
From charsize.U:
This variable contains the value of the CHARBITS symbol, which indicates to the C program how many bits there are in a character.
charsize
From charsize.U:
This variable contains the value of the CHARSIZE symbol, which indicates to the C program how many bytes there are in a character.
chgrp
From Loc.U:
This variable is defined but not used by Configure. The value is the empty string and is not useful.
chmod
From Loc.U:
This variable is used internally by Configure to determine the
full pathname (if any) of the chmod program. After Configure runs,
the value is reset to a plain chmod and is not useful.
chown
From Loc.U:
This variable is defined but not used by Configure. The value is the empty string and is not useful.
clocktype
From d_times.U:
This variable holds the type returned by times(). It can be long, or clock_t on BSD sites (in which case <sys/types.h> should be included).
comm
From Loc.U:
This variable is used internally by Configure to determine the full pathname (if any) of the comm program. After Configure runs, the value is reset to a plain comm and is not useful.
compress
From Loc.U:
This variable is defined but not used by Configure. The value is the empty string and is not useful.
config_arg0
From Options.U:
This variable contains the string used to invoke the Configure command, as reported by the shell in the $0 variable.
config_argc
From Options.U:
This variable contains the number of command-line arguments passed to Configure, as reported by the shell in the $# variable. The individual arguments are stored as variables config_arg1, config_arg2, etc.
config_args
From Options.U:
This variable contains a single string giving the command-line arguments passed to Configure. Spaces within arguments, quotes, and escaped characters are not correctly preserved. To reconstruct the command line, you must assemble the individual command line pieces, given in config_arg[0-9]*.
contains
From contains.U:
This variable holds the command to do a grep with a proper return
status. On most sane systems it is simply grep. On insane systems
it is a grep followed by a cat followed by a test. This variable
is primarily for the use of other Configure units.
cp
From Loc.U:
This variable is used internally by Configure to determine the full pathname (if any) of the cp program. After Configure runs, the value is reset to a plain cp and is not useful.
cpio
From Loc.U:
This variable is defined but not used by Configure. The value is the empty string and is not useful.
cpp
From Loc.U:
This variable is used internally by Configure to determine the full pathname (if any) of the cpp program. After Configure runs, the value is reset to a plain cpp and is not useful.
cpp_stuff
From cpp_stuff.U:
This variable contains an identification of the concatenation mechanism used by the C preprocessor.
cppccsymbols
From Cppsym.U:
The variable contains the symbols defined by the C compiler when it calls cpp. The symbols defined by the cc alone or cpp alone are not in this list, see ccsymbols and cppsymbols. The list is a space-separated list of symbol=value tokens.
cppflags
From ccflags.U:
This variable holds the flags that will be passed to the C preprocessor. It is up to the Makefile to use it.
cpplast
From cppstdin.U:
This variable has the same functionality as cppminus, only it applies to cpprun and not cppstdin.
cppminus
From cppstdin.U:
This variable contains the second part of the string which will invoke
the C preprocessor on standard input and write to standard output.
This variable will have the value '-' if cppstdin needs a minus to
specify standard input; otherwise the value is "".
cpprun
From cppstdin.U:
This variable contains the command which will invoke a C preprocessor on standard input and put the output to stdout. It is guaranteed not to be a wrapper and may be a null string if no preprocessor can be made directly available. This preprocessor might be different from the one used by the C compiler. Don't forget to append cpplast after the preprocessor options.
cppstdin
From cppstdin.U:
This variable contains the command which will invoke the C preprocessor on standard input and put the output to stdout. It is primarily used by other Configure units that ask about preprocessor symbols.
cppsymbols
From Cppsym.U:
The variable contains the symbols defined by the C preprocessor alone. The symbols defined by cc or by cc when it calls cpp are not in this list, see ccsymbols and cppccsymbols. The list is a space-separated list of symbol=value tokens.
crypt_r_proto
From d_crypt_r.U:
This variable encodes the prototype of crypt_r.
It is zero if d_crypt_r is undef, and one of the
REENTRANT_PROTO_T_ABC
macros of reentr.h if d_crypt_r
is defined.
cryptlib
From d_crypt.U:
This variable holds -lcrypt or the path to a libcrypt.a archive if the crypt() function is not defined in the standard C library. It is up to the Makefile to use this.
csh
From Loc.U:
This variable is used internally by Configure to determine the
full pathname (if any) of the csh program. After Configure runs,
the value is reset to a plain 'csh' and is not useful.
ctermid_r_proto
From d_ctermid_r.U:
This variable encodes the prototype of ctermid_r.
It is zero if d_ctermid_r is undef, and one of the
REENTRANT_PROTO_T_ABC
macros of reentr.h if d_ctermid_r
is defined.
ctime_r_proto
From d_ctime_r.U:
This variable encodes the prototype of ctime_r.
It is zero if d_ctime_r is undef, and one of the
REENTRANT_PROTO_T_ABC
macros of reentr.h if d_ctime_r
is defined.
d__fwalk
From d__fwalk.U:
This variable conditionally defines HAS__FWALK
if _fwalk() is
available to apply a function to all the file handles.
d_access
From d_access.U:
This variable conditionally defines HAS_ACCESS
if the access() system
call is available to check for access permissions using real IDs.
d_accessx
From d_accessx.U:
This variable conditionally defines the HAS_ACCESSX
symbol, which
indicates to the C program that the accessx() routine is available.
d_aintl
From d_aintl.U:
This variable conditionally defines the HAS_AINTL
symbol, which
indicates to the C program that the aintl() routine is available.
If copysignl is also present we can emulate modfl.
d_alarm
From d_alarm.U:
This variable conditionally defines the HAS_ALARM
symbol, which
indicates to the C program that the alarm() routine is available.
d_archlib
From archlib.U:
This variable conditionally defines ARCHLIB
to hold the pathname
of architecture-dependent library files for $package. If
$archlib is the same as $privlib, then this is set to undef.
d_asctime64
From d_timefuncs64.U:
This variable conditionally defines the HAS_ASCTIME64 symbol, which indicates to the C program that the asctime64() routine is available.
d_asctime_r
From d_asctime_r.U:
This variable conditionally defines the HAS_ASCTIME_R
symbol,
which indicates to the C program that the asctime_r()
routine is available.
d_atolf
From atolf.U:
This variable conditionally defines the HAS_ATOLF
symbol, which
indicates to the C program that the atolf() routine is available.
d_atoll
From atoll.U:
This variable conditionally defines the HAS_ATOLL
symbol, which
indicates to the C program that the atoll() routine is available.
d_attribute_deprecated
From d_attribut.U:
This variable conditionally defines HASATTRIBUTE_DEPRECATED
, which
indicates that GCC can handle the attribute for marking deprecated
APIs.
d_attribute_format
From d_attribut.U:
This variable conditionally defines HASATTRIBUTE_FORMAT
, which
indicates the C compiler can check for printf-like formats.
d_attribute_malloc
From d_attribut.U:
This variable conditionally defines HASATTRIBUTE_MALLOC
, which
indicates the C compiler can understand functions as having
malloc-like semantics.
d_attribute_nonnull
From d_attribut.U:
This variable conditionally defines HASATTRIBUTE_NONNULL
, which
indicates that the C compiler can know that certain arguments
must not be NULL
, and will check accordingly at compile time.
d_attribute_noreturn
From d_attribut.U:
This variable conditionally defines HASATTRIBUTE_NORETURN
, which
indicates that the C compiler can know that certain functions
are guaranteed never to return.
d_attribute_pure
From d_attribut.U:
This variable conditionally defines HASATTRIBUTE_PURE
, which
indicates that the C compiler can know that certain functions
are pure
functions, meaning that they have no side effects, and
only rely on function input and/or global data for their results.
d_attribute_unused
From d_attribut.U:
This variable conditionally defines HASATTRIBUTE_UNUSED
, which
indicates that the C compiler can know that certain variables
and arguments may not always be used, and to not throw warnings
if they don't get used.
d_attribute_warn_unused_result
From d_attribut.U:
This variable conditionally defines
HASATTRIBUTE_WARN_UNUSED_RESULT
, which indicates that the C
compiler can know that certain functions have return values
that must not be ignored, such as malloc() or open().
d_bcmp
From d_bcmp.U:
This variable conditionally defines the HAS_BCMP
symbol if
the bcmp() routine is available to compare strings.
d_bcopy
From d_bcopy.U:
This variable conditionally defines the HAS_BCOPY
symbol if
the bcopy() routine is available to copy strings.
d_bsd
From Guess.U:
This variable conditionally defines the symbol BSD
when running on a
BSD
system.
d_bsdgetpgrp
From d_getpgrp.U:
This variable conditionally defines USE_BSD_GETPGRP
if
getpgrp() needs one argument whereas the USG
one needs none.
d_bsdsetpgrp
From d_setpgrp.U:
This variable conditionally defines USE_BSD_SETPGRP
if
setpgrp() needs two arguments whereas the USG
one needs none.
See also d_setpgid for a POSIX
interface.
d_builtin_choose_expr
From d_builtin.U:
This conditionally defines HAS_BUILTIN_CHOOSE_EXPR
, which
indicates that the compiler supports __builtin_choose_expr(x,y,z).
This built-in function is analogous to the x?y:z operator in C,
except that the expression returned has its type unaltered by
promotion rules. Also, the built-in function does not evaluate
the expression that was not chosen.
d_builtin_expect
From d_builtin.U:
This conditionally defines HAS_BUILTIN_EXPECT
, which indicates
that the compiler supports __builtin_expect(exp,c). You may use
__builtin_expect to provide the compiler with branch prediction
information.
d_bzero
From d_bzero.U:
This variable conditionally defines the HAS_BZERO
symbol if
the bzero() routine is available to set memory to 0.
d_c99_variadic_macros
From d_c99_variadic.U:
This variable conditionally defines the HAS_C99_VARIADIC_MACROS symbol, which indicates to the C program that C99 variadic macros are available.
d_casti32
From d_casti32.U:
This variable conditionally defines CASTI32, which indicates whether the C compiler can cast large floats to 32-bit ints.
d_castneg
From d_castneg.U:
This variable conditionally defines CASTNEG
, which indicates
whether the C compiler can cast negative float to unsigned.
d_charvspr
From d_vprintf.U:
This variable conditionally defines CHARVSPRINTF
if this system
has vsprintf returning type (char*). The trend seems to be to
declare it as "int vsprintf()".
d_chown
From d_chown.U:
This variable conditionally defines the HAS_CHOWN
symbol, which
indicates to the C program that the chown() routine is available.
d_chroot
From d_chroot.U:
This variable conditionally defines the HAS_CHROOT
symbol, which
indicates to the C program that the chroot() routine is available.
d_chsize
From d_chsize.U:
This variable conditionally defines the CHSIZE
symbol, which
indicates to the C program that the chsize() routine is available
to truncate files. You might need a -lx to get this routine.
d_class
From d_class.U:
This variable conditionally defines the HAS_CLASS
symbol, which
indicates to the C program that the class() routine is available.
d_clearenv
From d_clearenv.U:
This variable conditionally defines the HAS_CLEARENV
symbol, which
indicates to the C program that the clearenv() routine is available.
d_closedir
From d_closedir.U:
This variable conditionally defines HAS_CLOSEDIR
if closedir() is
available.
d_cmsghdr_s
From d_cmsghdr_s.U:
This variable conditionally defines the HAS_STRUCT_CMSGHDR
symbol,
which indicates that the struct cmsghdr is supported.
d_const
From d_const.U:
This variable conditionally defines the HASCONST
symbol, which
indicates to the C program that this C compiler knows about the
const type.
d_copysignl
From d_copysignl.U:
This variable conditionally defines the HAS_COPYSIGNL
symbol, which
indicates to the C program that the copysignl() routine is available.
If aintl is also present we can emulate modfl.
d_cplusplus
From d_cplusplus.U:
This variable conditionally defines the USE_CPLUSPLUS
symbol, which
indicates that a C++ compiler was used to compile Perl and will be
used to compile extensions.
d_crypt
From d_crypt.U:
This variable conditionally defines the CRYPT
symbol, which
indicates to the C program that the crypt() routine is available
to encrypt passwords and the like.
d_crypt_r
From d_crypt_r.U:
This variable conditionally defines the HAS_CRYPT_R
symbol,
which indicates to the C program that the crypt_r()
routine is available.
d_csh
From d_csh.U:
This variable conditionally defines the CSH
symbol, which
indicates to the C program that the C-shell exists.
d_ctermid
From d_ctermid.U:
This variable conditionally defines CTERMID
if ctermid() is
available to generate filename for terminal.
d_ctermid_r
From d_ctermid_r.U:
This variable conditionally defines the HAS_CTERMID_R
symbol,
which indicates to the C program that the ctermid_r()
routine is available.
d_ctime64
From d_timefuncs64.U:
This variable conditionally defines the HAS_CTIME64 symbol, which indicates to the C program that the ctime64() routine is available.
d_ctime_r
From d_ctime_r.U:
This variable conditionally defines the HAS_CTIME_R
symbol,
which indicates to the C program that the ctime_r()
routine is available.
d_cuserid
From d_cuserid.U:
This variable conditionally defines the HAS_CUSERID
symbol, which
indicates to the C program that the cuserid() routine is available
to get character login names.
d_dbl_dig
From d_dbl_dig.U:
This variable conditionally defines d_dbl_dig if this system's
header files provide DBL_DIG
, which is the number of significant
digits in a double precision number.
d_dbminitproto
From d_dbminitproto.U:
This variable conditionally defines the HAS_DBMINIT_PROTO
symbol,
which indicates to the C program that the system provides
a prototype for the dbminit() function. Otherwise, it is
up to the program to supply one.
d_difftime
From d_difftime.U:
This variable conditionally defines the HAS_DIFFTIME
symbol, which
indicates to the C program that the difftime() routine is available.
d_difftime64
From d_timefuncs64.U:
This variable conditionally defines the HAS_DIFFTIME64 symbol, which indicates to the C program that the difftime64() routine is available.
d_dir_dd_fd
From d_dir_dd_fd.U:
This variable conditionally defines the HAS_DIR_DD_FD
symbol, which
indicates that the DIR
directory stream type contains a member
variable called dd_fd.
d_dirfd
From d_dirfd.U:
This variable conditionally defines the HAS_DIRFD
constant,
which indicates to the C program that dirfd() is available
to return the file descriptor of a directory stream.
d_dirnamlen
From i_dirent.U:
This variable conditionally defines DIRNAMLEN
, which indicates
to the C program that the length of directory entry names is
provided by a d_namelen field.
d_dlerror
From d_dlerror.U:
This variable conditionally defines the HAS_DLERROR
symbol, which
indicates to the C program that the dlerror() routine is available.
d_dlopen
From d_dlopen.U:
This variable conditionally defines the HAS_DLOPEN
symbol, which
indicates to the C program that the dlopen() routine is available.
d_dlsymun
From d_dlsymun.U:
This variable conditionally defines DLSYM_NEEDS_UNDERSCORE
, which
indicates that we need to prepend an underscore to the symbol
name before calling dlsym().
d_dosuid
From d_dosuid.U:
This variable conditionally defines the symbol DOSUID
, which
tells the C program that it should insert setuid emulation code
on hosts which have setuid #! scripts disabled.
d_drand48_r
From d_drand48_r.U:
This variable conditionally defines the HAS_DRAND48_R symbol, which indicates to the C program that the drand48_r() routine is available.
d_drand48proto
From d_drand48proto.U:
This variable conditionally defines the HAS_DRAND48_PROTO symbol, which indicates to the C program that the system provides a prototype for the drand48() function. Otherwise, it is up to the program to supply one.
d_dup2
From d_dup2.U:
This variable conditionally defines HAS_DUP2 if dup2() is available to duplicate file descriptors.
d_eaccess
From d_eaccess.U:
This variable conditionally defines the HAS_EACCESS
symbol, which
indicates to the C program that the eaccess() routine is available.
d_endgrent
From d_endgrent.U:
This variable conditionally defines the HAS_ENDGRENT
symbol, which
indicates to the C program that the endgrent() routine is available
for sequential access of the group database.
d_endgrent_r
From d_endgrent_r.U:
This variable conditionally defines the HAS_ENDGRENT_R
symbol,
which indicates to the C program that the endgrent_r()
routine is available.
d_endhent
From d_endhent.U:
This variable conditionally defines HAS_ENDHOSTENT
if endhostent() is
available to close whatever was being used for host queries.
d_endhostent_r
From d_endhostent_r.U:
This variable conditionally defines the HAS_ENDHOSTENT_R
symbol,
which indicates to the C program that the endhostent_r()
routine is available.
d_endnent
From d_endnent.U:
This variable conditionally defines HAS_ENDNETENT
if endnetent() is
available to close whatever was being used for network queries.
d_endnetent_r
From d_endnetent_r.U:
This variable conditionally defines the HAS_ENDNETENT_R
symbol,
which indicates to the C program that the endnetent_r()
routine is available.
d_endpent
From d_endpent.U:
This variable conditionally defines HAS_ENDPROTOENT
if endprotoent() is
available to close whatever was being used for protocol queries.
d_endprotoent_r
From d_endprotoent_r.U:
This variable conditionally defines the HAS_ENDPROTOENT_R
symbol,
which indicates to the C program that the endprotoent_r()
routine is available.
d_endpwent
From d_endpwent.U:
This variable conditionally defines the HAS_ENDPWENT
symbol, which
indicates to the C program that the endpwent() routine is available
for sequential access of the passwd database.
d_endpwent_r
From d_endpwent_r.U:
This variable conditionally defines the HAS_ENDPWENT_R
symbol,
which indicates to the C program that the endpwent_r()
routine is available.
d_endsent
From d_endsent.U:
This variable conditionally defines HAS_ENDSERVENT
if endservent() is
available to close whatever was being used for service queries.
d_endservent_r
From d_endservent_r.U:
This variable conditionally defines the HAS_ENDSERVENT_R
symbol,
which indicates to the C program that the endservent_r()
routine is available.
d_eofnblk
From nblock_io.U:
This variable conditionally defines EOF_NONBLOCK
if EOF
can be seen
when reading from a non-blocking I/O source.
d_eunice
From Guess.U:
This variable conditionally defines the symbols EUNICE and VAX, which
alert the C program that it must deal with the idiosyncrasies of VMS.
d_faststdio
From d_faststdio.U:
This variable conditionally defines the HAS_FAST_STDIO
symbol,
which indicates to the C program that the "fast stdio" is available
to manipulate the stdio buffers directly.
d_fchdir
From d_fchdir.U:
This variable conditionally defines the HAS_FCHDIR
symbol, which
indicates to the C program that the fchdir() routine is available.
d_fchmod
From d_fchmod.U:
This variable conditionally defines the HAS_FCHMOD
symbol, which
indicates to the C program that the fchmod() routine is available
to change mode of opened files.
d_fchown
From d_fchown.U:
This variable conditionally defines the HAS_FCHOWN
symbol, which
indicates to the C program that the fchown() routine is available
to change ownership of opened files.
d_fcntl
From d_fcntl.U:
This variable conditionally defines the HAS_FCNTL
symbol, and indicates
whether the fcntl() function exists.
d_fcntl_can_lock
From d_fcntl_can_lock.U:
This variable conditionally defines the FCNTL_CAN_LOCK
symbol
and indicates whether file locking with fcntl() works.
d_fd_macros
From d_fd_set.U:
This variable contains the eventual value of the HAS_FD_MACROS
symbol,
which indicates if your C compiler knows about the macros which
manipulate an fd_set.
d_fd_set
From d_fd_set.U:
This variable contains the eventual value of the HAS_FD_SET
symbol,
which indicates if your C compiler knows about the fd_set typedef.
d_fds_bits
From d_fd_set.U:
This variable contains the eventual value of the HAS_FDS_BITS
symbol,
which indicates if your fd_set typedef contains the fds_bits member.
If you have an fd_set typedef, but the dweebs who installed it did
a half-fast job and neglected to provide the macros to manipulate
an fd_set, HAS_FDS_BITS
will let us know how to fix the gaffe.
d_fgetpos
From d_fgetpos.U:
This variable conditionally defines HAS_FGETPOS
if fgetpos() is
available to get the file position indicator.
d_finite
From d_finite.U:
This variable conditionally defines the HAS_FINITE
symbol, which
indicates to the C program that the finite() routine is available.
d_finitel
From d_finitel.U:
This variable conditionally defines the HAS_FINITEL
symbol, which
indicates to the C program that the finitel() routine is available.
d_flexfnam
From d_flexfnam.U:
This variable conditionally defines the FLEXFILENAMES
symbol, which
indicates that the system supports filenames longer than 14 characters.
d_flock
From d_flock.U:
This variable conditionally defines HAS_FLOCK
if flock() is
available to do file locking.
d_flockproto
From d_flockproto.U:
This variable conditionally defines the HAS_FLOCK_PROTO
symbol,
which indicates to the C program that the system provides
a prototype for the flock() function. Otherwise, it is
up to the program to supply one.
d_fork
From d_fork.U:
This variable conditionally defines the HAS_FORK
symbol, which
indicates to the C program that the fork() routine is available.
d_fp_class
From d_fp_class.U:
This variable conditionally defines the HAS_FP_CLASS
symbol, which
indicates to the C program that the fp_class() routine is available.
d_fpathconf
From d_pathconf.U:
This variable conditionally defines the HAS_FPATHCONF
symbol, which
indicates to the C program that the pathconf() routine is available
to determine file-system related limits and options associated
with a given open file descriptor.
d_fpclass
From d_fpclass.U:
This variable conditionally defines the HAS_FPCLASS
symbol, which
indicates to the C program that the fpclass() routine is available.
d_fpclassify
From d_fpclassify.U:
This variable conditionally defines the HAS_FPCLASSIFY
symbol, which
indicates to the C program that the fpclassify() routine is available.
d_fpclassl
From d_fpclassl.U:
This variable conditionally defines the HAS_FPCLASSL
symbol, which
indicates to the C program that the fpclassl() routine is available.
d_fpos64_t
From d_fpos64_t.U:
This symbol will be defined if the C compiler supports fpos64_t.
d_frexpl
From d_frexpl.U:
This variable conditionally defines the HAS_FREXPL
symbol, which
indicates to the C program that the frexpl() routine is available.
d_fs_data_s
From d_fs_data_s.U:
This variable conditionally defines the HAS_STRUCT_FS_DATA
symbol,
which indicates that the struct fs_data is supported.
d_fseeko
From d_fseeko.U:
This variable conditionally defines the HAS_FSEEKO
symbol, which
indicates to the C program that the fseeko() routine is available.
d_fsetpos
From d_fsetpos.U:
This variable conditionally defines HAS_FSETPOS
if fsetpos() is
available to set the file position indicator.
d_fstatfs
From d_fstatfs.U:
This variable conditionally defines the HAS_FSTATFS
symbol, which
indicates to the C program that the fstatfs() routine is available.
d_fstatvfs
From d_statvfs.U:
This variable conditionally defines the HAS_FSTATVFS
symbol, which
indicates to the C program that the fstatvfs() routine is available.
d_fsync
From d_fsync.U:
This variable conditionally defines the HAS_FSYNC
symbol, which
indicates to the C program that the fsync() routine is available.
d_ftello
From d_ftello.U:
This variable conditionally defines the HAS_FTELLO
symbol, which
indicates to the C program that the ftello() routine is available.
d_ftime
From d_ftime.U:
This variable conditionally defines the HAS_FTIME
symbol, which indicates
that the ftime() routine exists. The ftime() routine is basically
a sub-second accuracy clock.
d_futimes
From d_futimes.U:
This variable conditionally defines the HAS_FUTIMES
symbol, which
indicates to the C program that the futimes() routine is available.
d_Gconvert
From d_gconvert.U:
This variable holds what Gconvert is defined as to convert
floating point numbers into strings. By default, Configure
sets this
macro to use the first of gconvert, gcvt, or sprintf
that pass sprintf-%g-like behavior tests. If perl is using
long doubles, the macro uses the first of the following
functions that pass Configure's tests: qgcvt, sprintf (if
Configure knows how to make sprintf format long doubles--see
sPRIgldbl), gconvert, gcvt, and sprintf (casting to double).
The gconvert_preference and gconvert_ld_preference variables
can be used to alter Configure's preferences, for doubles and
long doubles, respectively. If present, they contain a
space-separated list of one or more of the above function
names in the order they should be tried.
d_Gconvert may be set to override Configure with a platform-specific function. If this function expects a double, a different value may need to be set by the uselongdouble.cbu call-back unit so that long doubles can be formatted without loss of precision.
d_gdbm_ndbm_h_uses_prototypes
From i_ndbm.U:
This variable conditionally defines the NDBM_H_USES_PROTOTYPES
symbol,
which indicates that the gdbm-ndbm.h include file uses real ANSI
C
prototypes instead of K&R style function declarations. K&R style
declarations are unsupported in C++, so the include file requires
special handling when using a C++ compiler and this variable is
undefined. Consult the different d_*ndbm_h_uses_prototypes variables
to get the same information for alternative ndbm.h include files.
d_gdbmndbm_h_uses_prototypes
From i_ndbm.U:
This variable conditionally defines the NDBM_H_USES_PROTOTYPES
symbol,
which indicates that the gdbm/ndbm.h include file uses real ANSI
C
prototypes instead of K&R style function declarations. K&R style
declarations are unsupported in C++, so the include file requires
special handling when using a C++ compiler and this variable is
undefined. Consult the different d_*ndbm_h_uses_prototypes variables
to get the same information for alternative ndbm.h include files.
d_getaddrinfo
From d_getaddrinfo.U:
This variable conditionally defines the HAS_GETADDRINFO
symbol,
which indicates to the C program that the getaddrinfo() function
is available.
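A minimal sketch of the getaddrinfo() call being probed; AI_NUMERICHOST restricts it to numeric addresses so the example needs no DNS:

```c
#include <assert.h>
#include <netdb.h>
#include <string.h>
#include <sys/socket.h>
#include <netinet/in.h>
#include <arpa/inet.h>

static int demo(void) {
    struct addrinfo hints, *res = NULL;
    memset(&hints, 0, sizeof hints);
    hints.ai_family = AF_INET;
    hints.ai_socktype = SOCK_STREAM;
    hints.ai_flags = AI_NUMERICHOST;   /* numeric host only: no resolver lookup */

    if (getaddrinfo("127.0.0.1", "80", &hints, &res) != 0) return -1;

    /* the result list carries ready-to-use sockaddr structures */
    struct sockaddr_in *sin = (struct sockaddr_in *)res->ai_addr;
    int ok = (ntohs(sin->sin_port) == 80);
    freeaddrinfo(res);
    return ok ? 0 : -1;
}

int main(void) {
    assert(demo() == 0);
    return 0;
}
```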
d_getcwd
From d_getcwd.U:
This variable conditionally defines the HAS_GETCWD
symbol, which
indicates to the C program that the getcwd() routine is available
to get the current working directory.
d_getespwnam
From d_getespwnam.U:
This variable conditionally defines HAS_GETESPWNAM
if getespwnam() is
available to retrieve enhanced (shadow) password entries by name.
d_getfsstat
From d_getfsstat.U:
This variable conditionally defines the HAS_GETFSSTAT
symbol, which
indicates to the C program that the getfsstat() routine is available.
d_getgrent
From d_getgrent.U:
This variable conditionally defines the HAS_GETGRENT
symbol, which
indicates to the C program that the getgrent() routine is available
for sequential access of the group database.
d_getgrent_r
From d_getgrent_r.U:
This variable conditionally defines the HAS_GETGRENT_R
symbol,
which indicates to the C program that the getgrent_r()
routine is available.
d_getgrgid_r
From d_getgrgid_r.U:
This variable conditionally defines the HAS_GETGRGID_R
symbol,
which indicates to the C program that the getgrgid_r()
routine is available.
d_getgrnam_r
From d_getgrnam_r.U:
This variable conditionally defines the HAS_GETGRNAM_R
symbol,
which indicates to the C program that the getgrnam_r()
routine is available.
d_getgrps
From d_getgrps.U:
This variable conditionally defines the HAS_GETGROUPS
symbol, which
indicates to the C program that the getgroups() routine is available
to get the list of process groups.
d_gethbyaddr
From d_gethbyad.U:
This variable conditionally defines the HAS_GETHOSTBYADDR
symbol, which
indicates to the C program that the gethostbyaddr() routine is available
to look up hosts by their IP
addresses.
d_gethbyname
From d_gethbynm.U:
This variable conditionally defines the HAS_GETHOSTBYNAME
symbol, which
indicates to the C program that the gethostbyname() routine is available
to look up host names in some data base or other.
d_gethent
From d_gethent.U:
This variable conditionally defines HAS_GETHOSTENT
if gethostent() is
available to look up host names in some data base or another.
d_gethname
From d_gethname.U:
This variable conditionally defines the HAS_GETHOSTNAME
symbol, which
indicates to the C program that the gethostname() routine may be
used to derive the host name.
d_gethostbyaddr_r
From d_gethostbyaddr_r.U:
This variable conditionally defines the HAS_GETHOSTBYADDR_R
symbol,
which indicates to the C program that the gethostbyaddr_r()
routine is available.
d_gethostbyname_r
From d_gethostbyname_r.U:
This variable conditionally defines the HAS_GETHOSTBYNAME_R
symbol,
which indicates to the C program that the gethostbyname_r()
routine is available.
d_gethostent_r
From d_gethostent_r.U:
This variable conditionally defines the HAS_GETHOSTENT_R
symbol,
which indicates to the C program that the gethostent_r()
routine is available.
d_gethostprotos
From d_gethostprotos.U:
This variable conditionally defines the HAS_GETHOST_PROTOS
symbol,
which indicates to the C program that <netdb.h> supplies
prototypes for the various gethost*() functions.
See also netdbtype.U for probing for various netdb types.
d_getitimer
From d_getitimer.U:
This variable conditionally defines the HAS_GETITIMER
symbol, which
indicates to the C program that the getitimer() routine is available.
d_getlogin
From d_getlogin.U:
This variable conditionally defines the HAS_GETLOGIN
symbol, which
indicates to the C program that the getlogin() routine is available
to get the login name.
d_getlogin_r
From d_getlogin_r.U:
This variable conditionally defines the HAS_GETLOGIN_R
symbol,
which indicates to the C program that the getlogin_r()
routine is available.
d_getmnt
From d_getmnt.U:
This variable conditionally defines the HAS_GETMNT
symbol, which
indicates to the C program that the getmnt() routine is available
to retrieve one or more mount info blocks by filename.
d_getmntent
From d_getmntent.U:
This variable conditionally defines the HAS_GETMNTENT
symbol, which
indicates to the C program that the getmntent() routine is available
to iterate through mounted files to get their mount info.
d_getnameinfo
From d_getnameinfo.U:
This variable conditionally defines the HAS_GETNAMEINFO
symbol,
which indicates to the C program that the getnameinfo() function
is available.
d_getnbyaddr
From d_getnbyad.U:
This variable conditionally defines the HAS_GETNETBYADDR
symbol, which
indicates to the C program that the getnetbyaddr() routine is available
to look up networks by their IP
addresses.
d_getnbyname
From d_getnbynm.U:
This variable conditionally defines the HAS_GETNETBYNAME
symbol, which
indicates to the C program that the getnetbyname() routine is available
to look up networks by their names.
d_getnent
From d_getnent.U:
This variable conditionally defines HAS_GETNETENT
if getnetent() is
available to look up network names in some data base or another.
d_getnetbyaddr_r
From d_getnetbyaddr_r.U:
This variable conditionally defines the HAS_GETNETBYADDR_R
symbol,
which indicates to the C program that the getnetbyaddr_r()
routine is available.
d_getnetbyname_r
From d_getnetbyname_r.U:
This variable conditionally defines the HAS_GETNETBYNAME_R
symbol,
which indicates to the C program that the getnetbyname_r()
routine is available.
d_getnetent_r
From d_getnetent_r.U:
This variable conditionally defines the HAS_GETNETENT_R
symbol,
which indicates to the C program that the getnetent_r()
routine is available.
d_getnetprotos
From d_getnetprotos.U:
This variable conditionally defines the HAS_GETNET_PROTOS
symbol,
which indicates to the C program that <netdb.h> supplies
prototypes for the various getnet*() functions.
See also netdbtype.U for probing for various netdb types.
d_getpagsz
From d_getpagsz.U:
This variable conditionally defines HAS_GETPAGESIZE
if getpagesize()
is available to get the system page size.
d_getpbyname
From d_getprotby.U:
This variable conditionally defines the HAS_GETPROTOBYNAME
symbol, which indicates to the C program that the
getprotobyname() routine is available to look up protocols
by their name.
d_getpbynumber
From d_getprotby.U:
This variable conditionally defines the HAS_GETPROTOBYNUMBER
symbol, which indicates to the C program that the
getprotobynumber() routine is available to look up protocols
by their number.
d_getpent
From d_getpent.U:
This variable conditionally defines HAS_GETPROTOENT
if getprotoent() is
available to look up protocols in some data base or another.
d_getpgid
From d_getpgid.U:
This variable conditionally defines the HAS_GETPGID
symbol, which
indicates to the C program that the getpgid(pid) function
is available to get the process group id.
d_getpgrp
From d_getpgrp.U:
This variable conditionally defines HAS_GETPGRP
if getpgrp() is
available to get the current process group.
d_getpgrp2
From d_getpgrp2.U:
This variable conditionally defines the HAS_GETPGRP2 symbol, which
indicates to the C program that the getpgrp2() routine (as in DG/UX)
is available to get the current process group.
d_getppid
From d_getppid.U:
This variable conditionally defines the HAS_GETPPID
symbol, which
indicates to the C program that the getppid() routine is available
to get the parent process ID.
d_getprior
From d_getprior.U:
This variable conditionally defines HAS_GETPRIORITY
if getpriority()
is available to get a process's priority.
d_getprotobyname_r
From d_getprotobyname_r.U:
This variable conditionally defines the HAS_GETPROTOBYNAME_R
symbol,
which indicates to the C program that the getprotobyname_r()
routine is available.
d_getprotobynumber_r
From d_getprotobynumber_r.U:
This variable conditionally defines the HAS_GETPROTOBYNUMBER_R
symbol,
which indicates to the C program that the getprotobynumber_r()
routine is available.
d_getprotoent_r
From d_getprotoent_r.U:
This variable conditionally defines the HAS_GETPROTOENT_R
symbol,
which indicates to the C program that the getprotoent_r()
routine is available.
d_getprotoprotos
From d_getprotoprotos.U:
This variable conditionally defines the HAS_GETPROTO_PROTOS
symbol,
which indicates to the C program that <netdb.h> supplies
prototypes for the various getproto*() functions.
See also netdbtype.U for probing for various netdb types.
d_getprpwnam
From d_getprpwnam.U:
This variable conditionally defines HAS_GETPRPWNAM
if getprpwnam() is
available to retrieve protected (shadow) password entries by name.
d_getpwent
From d_getpwent.U:
This variable conditionally defines the HAS_GETPWENT
symbol, which
indicates to the C program that the getpwent() routine is available
for sequential access of the passwd database.
d_getpwent_r
From d_getpwent_r.U:
This variable conditionally defines the HAS_GETPWENT_R
symbol,
which indicates to the C program that the getpwent_r()
routine is available.
d_getpwnam_r
From d_getpwnam_r.U:
This variable conditionally defines the HAS_GETPWNAM_R
symbol,
which indicates to the C program that the getpwnam_r()
routine is available.
d_getpwuid_r
From d_getpwuid_r.U:
This variable conditionally defines the HAS_GETPWUID_R
symbol,
which indicates to the C program that the getpwuid_r()
routine is available.
d_getsbyname
From d_getsrvby.U:
This variable conditionally defines the HAS_GETSERVBYNAME
symbol, which indicates to the C program that the
getservbyname() routine is available to look up services
by their name.
d_getsbyport
From d_getsrvby.U:
This variable conditionally defines the HAS_GETSERVBYPORT
symbol, which indicates to the C program that the
getservbyport() routine is available to look up services
by their port.
d_getsent
From d_getsent.U:
This variable conditionally defines HAS_GETSERVENT
if getservent() is
available to look up network services in some data base or another.
d_getservbyname_r
From d_getservbyname_r.U:
This variable conditionally defines the HAS_GETSERVBYNAME_R
symbol,
which indicates to the C program that the getservbyname_r()
routine is available.
d_getservbyport_r
From d_getservbyport_r.U:
This variable conditionally defines the HAS_GETSERVBYPORT_R
symbol,
which indicates to the C program that the getservbyport_r()
routine is available.
d_getservent_r
From d_getservent_r.U:
This variable conditionally defines the HAS_GETSERVENT_R
symbol,
which indicates to the C program that the getservent_r()
routine is available.
d_getservprotos
From d_getservprotos.U:
This variable conditionally defines the HAS_GETSERV_PROTOS
symbol,
which indicates to the C program that <netdb.h> supplies
prototypes for the various getserv*() functions.
See also netdbtype.U for probing for various netdb types.
d_getspnam
From d_getspnam.U:
This variable conditionally defines HAS_GETSPNAM
if getspnam() is
available to retrieve SysV shadow password entries by name.
d_getspnam_r
From d_getspnam_r.U:
This variable conditionally defines the HAS_GETSPNAM_R
symbol,
which indicates to the C program that the getspnam_r()
routine is available.
d_gettimeod
From d_ftime.U:
This variable conditionally defines the HAS_GETTIMEOFDAY
symbol, which
indicates that the gettimeofday() system call exists (to obtain a
sub-second accuracy clock). You should probably include <sys/resource.h>.
d_gmtime64
From d_timefuncs64.U:
This variable conditionally defines the HAS_GMTIME64 symbol, which indicates to the C program that the gmtime64() routine is available.
d_gmtime_r
From d_gmtime_r.U:
This variable conditionally defines the HAS_GMTIME_R
symbol,
which indicates to the C program that the gmtime_r()
routine is available.
d_gnulibc
From d_gnulibc.U:
Defined if we're dealing with the GNU
C Library.
d_grpasswd
From i_grp.U:
This variable conditionally defines GRPASSWD, which indicates
that struct group in <grp.h> contains gr_passwd.
d_hasmntopt
From d_hasmntopt.U:
This variable conditionally defines the HAS_HASMNTOPT
symbol, which
indicates to the C program that the hasmntopt() routine is available
to query the mount options of file systems.
d_htonl
From d_htonl.U:
This variable conditionally defines HAS_HTONL
if htonl() and its
friends are available to do network order byte swapping.
d_ilogbl
From d_ilogbl.U:
This variable conditionally defines the HAS_ILOGBL
symbol, which
indicates to the C program that the ilogbl() routine is available.
If scalbnl is also present we can emulate frexpl.
d_inc_version_list
From inc_version_list.U:
This variable conditionally defines PERL_INC_VERSION_LIST.
It is set to undef when PERL_INC_VERSION_LIST is empty.
d_index
From d_strchr.U:
This variable conditionally defines HAS_INDEX
if index() and
rindex() are available for string searching.
d_inetaton
From d_inetaton.U:
This variable conditionally defines the HAS_INET_ATON
symbol, which
indicates to the C program that the inet_aton() function is available
to parse IP address dotted-quad strings.
d_inetntop
From d_inetntop.U:
This variable conditionally defines the HAS_INETNTOP
symbol,
which indicates to the C program that the inet_ntop() function
is available.
d_inetpton
From d_inetpton.U:
This variable conditionally defines the HAS_INETPTON
symbol,
which indicates to the C program that the inet_pton() function
is available.
d_int64_t
From d_int64_t.U:
This symbol will be defined if the C compiler supports int64_t.
d_ip_mreq
From d_socket.U:
This variable conditionally defines the HAS_IP_MREQ
symbol, which
indicates the availability of a struct ip_mreq.
d_ip_mreq_source
From d_socket.U:
This variable conditionally defines the HAS_IP_MREQ_SOURCE
symbol,
which indicates the availability of a struct ip_mreq_source.
d_ipv6_mreq
From d_socket.U:
This variable conditionally defines the HAS_IPV6_MREQ symbol, which indicates the availability of a struct ipv6_mreq.
d_ipv6_mreq_source
From d_socket.U:
This variable conditionally defines the HAS_IPV6_MREQ_SOURCE symbol, which indicates the availability of a struct ipv6_mreq_source.
d_isascii
From d_isascii.U:
This variable conditionally defines the HAS_ISASCII
constant,
which indicates to the C program that isascii() is available.
d_isblank
From d_isblank.U:
This variable conditionally defines the HAS_ISBLANK
constant,
which indicates to the C program that isblank() is available.
d_isfinite
From d_isfinite.U:
This variable conditionally defines the HAS_ISFINITE
symbol, which
indicates to the C program that the isfinite() routine is available.
d_isinf
From d_isinf.U:
This variable conditionally defines the HAS_ISINF
symbol, which
indicates to the C program that the isinf() routine is available.
d_isnan
From d_isnan.U:
This variable conditionally defines the HAS_ISNAN
symbol, which
indicates to the C program that the isnan() routine is available.
d_isnanl
From d_isnanl.U:
This variable conditionally defines the HAS_ISNANL
symbol, which
indicates to the C program that the isnanl() routine is available.
d_killpg
From d_killpg.U:
This variable conditionally defines the HAS_KILLPG
symbol, which
indicates to the C program that the killpg() routine is available
to kill process groups.
d_lchown
From d_lchown.U:
This variable conditionally defines the HAS_LCHOWN
symbol, which
indicates to the C program that the lchown() routine is available
to operate on a symbolic link (instead of following the link).
d_ldbl_dig
From d_ldbl_dig.U:
This variable conditionally defines d_ldbl_dig if this system's
header files provide LDBL_DIG
, which is the number of significant
digits in a long double precision number.
d_libm_lib_version
From d_libm_lib_version.U:
This variable conditionally defines the LIBM_LIB_VERSION
symbol,
which indicates to the C program that math.h defines _LIB_VERSION,
which is available in libm.
d_link
From d_link.U:
This variable conditionally defines HAS_LINK
if link() is
available to create hard links.
d_localtime64
From d_timefuncs64.U:
This variable conditionally defines the HAS_LOCALTIME64 symbol, which indicates to the C program that the localtime64() routine is available.
d_localtime_r
From d_localtime_r.U:
This variable conditionally defines the HAS_LOCALTIME_R
symbol,
which indicates to the C program that the localtime_r()
routine is available.
d_localtime_r_needs_tzset
From d_localtime_r.U:
This variable conditionally defines the LOCALTIME_R_NEEDS_TZSET
symbol, which makes us call tzset() before localtime_r().
d_locconv
From d_locconv.U:
This variable conditionally defines HAS_LOCALECONV
if localeconv() is
available for numeric and monetary formatting conventions.
d_lockf
From d_lockf.U:
This variable conditionally defines HAS_LOCKF
if lockf() is
available to do file locking.
d_longdbl
From d_longdbl.U:
This variable conditionally defines HAS_LONG_DOUBLE
if
the long double type is supported.
d_longlong
From d_longlong.U:
This variable conditionally defines HAS_LONG_LONG
if
the long long type is supported.
d_lseekproto
From d_lseekproto.U:
This variable conditionally defines the HAS_LSEEK_PROTO
symbol,
which indicates to the C program that the system provides
a prototype for the lseek() function. Otherwise, it is
up to the program to supply one.
d_lstat
From d_lstat.U:
This variable conditionally defines HAS_LSTAT
if lstat() is
available to do file stats on symbolic links.
d_madvise
From d_madvise.U:
This variable conditionally defines HAS_MADVISE
if madvise() is
available to advise the system about expected memory usage patterns.
d_malloc_good_size
From d_malloc_size.U:
This symbol, if defined, indicates that the malloc_good_size routine is available for use.
d_malloc_size
From d_malloc_size.U:
This symbol, if defined, indicates that the malloc_size routine is available for use.
d_mblen
From d_mblen.U:
This variable conditionally defines the HAS_MBLEN
symbol, which
indicates to the C program that the mblen() routine is available
to find the number of bytes in a multibyte character.
d_mbstowcs
From d_mbstowcs.U:
This variable conditionally defines the HAS_MBSTOWCS
symbol, which
indicates to the C program that the mbstowcs() routine is available
to convert a multibyte string into a wide character string.
d_mbtowc
From d_mbtowc.U:
This variable conditionally defines the HAS_MBTOWC
symbol, which
indicates to the C program that the mbtowc() routine is available
to convert multibyte to a wide character.
d_memchr
From d_memchr.U:
This variable conditionally defines the HAS_MEMCHR
symbol, which
indicates to the C program that the memchr() routine is available
to locate characters within a C string.
d_memcmp
From d_memcmp.U:
This variable conditionally defines the HAS_MEMCMP
symbol, which
indicates to the C program that the memcmp() routine is available
to compare blocks of memory.
d_memcpy
From d_memcpy.U:
This variable conditionally defines the HAS_MEMCPY
symbol, which
indicates to the C program that the memcpy() routine is available
to copy blocks of memory.
d_memmove
From d_memmove.U:
This variable conditionally defines the HAS_MEMMOVE
symbol, which
indicates to the C program that the memmove() routine is available
to copy potentially overlapping blocks of memory.
d_memset
From d_memset.U:
This variable conditionally defines the HAS_MEMSET
symbol, which
indicates to the C program that the memset() routine is available
to set blocks of memory.
d_mkdir
From d_mkdir.U:
This variable conditionally defines the HAS_MKDIR
symbol, which
indicates to the C program that the mkdir() routine is available
to create directories.
d_mkdtemp
From d_mkdtemp.U:
This variable conditionally defines the HAS_MKDTEMP
symbol, which
indicates to the C program that the mkdtemp() routine is available
to exclusively create a uniquely named temporary directory.
d_mkfifo
From d_mkfifo.U:
This variable conditionally defines the HAS_MKFIFO
symbol, which
indicates to the C program that the mkfifo() routine is available.
d_mkstemp
From d_mkstemp.U:
This variable conditionally defines the HAS_MKSTEMP
symbol, which
indicates to the C program that the mkstemp() routine is available
to exclusively create and open a uniquely named temporary file.
d_mkstemps
From d_mkstemps.U:
This variable conditionally defines the HAS_MKSTEMPS
symbol, which
indicates to the C program that the mkstemps() routine is available
to exclusively create and open a uniquely named (with a suffix)
temporary file.
d_mktime
From d_mktime.U:
This variable conditionally defines the HAS_MKTIME
symbol, which
indicates to the C program that the mktime() routine is available.
d_mktime64
From d_timefuncs64.U:
This variable conditionally defines the HAS_MKTIME64 symbol, which indicates to the C program that the mktime64() routine is available.
d_mmap
From d_mmap.U:
This variable conditionally defines HAS_MMAP
if mmap() is
available to map a file into memory.
d_modfl
From d_modfl.U:
This variable conditionally defines the HAS_MODFL
symbol, which
indicates to the C program that the modfl() routine is available.
d_modfl_pow32_bug
From d_modfl.U:
This variable conditionally defines the HAS_MODFL_POW32_BUG symbol, which indicates that modfl() is broken for long doubles >= pow(2, 32). For example from 4294967303.150000 one would get 4294967302.000000 and 1.150000. The bug has been seen in certain versions of glibc, release 2.2.2 is known to be okay.
d_modflproto
From d_modfl.U:
This symbol, if defined, indicates that the system provides a prototype for the modfl() function. Otherwise, it is up to the program to supply one. C99 says it should be long double modfl(long double, long double *);
d_mprotect
From d_mprotect.U:
This variable conditionally defines HAS_MPROTECT
if mprotect() is
available to modify the access protection of a memory mapped file.
d_msg
From d_msg.U:
This variable conditionally defines the HAS_MSG
symbol, which
indicates that the entire msg*(2) library is present.
d_msg_ctrunc
From d_socket.U:
This variable conditionally defines the HAS_MSG_CTRUNC
symbol,
which indicates that the MSG_CTRUNC
is available. #ifdef is
not enough because it may be an enum, glibc has been known to do this.
d_msg_dontroute
From d_socket.U:
This variable conditionally defines the HAS_MSG_DONTROUTE
symbol,
which indicates that the MSG_DONTROUTE
is available. #ifdef is
not enough because it may be an enum, glibc has been known to do this.
d_msg_oob
From d_socket.U:
This variable conditionally defines the HAS_MSG_OOB
symbol,
which indicates that the MSG_OOB
is available. #ifdef is
not enough because it may be an enum, glibc has been known to do this.
d_msg_peek
From d_socket.U:
This variable conditionally defines the HAS_MSG_PEEK
symbol,
which indicates that the MSG_PEEK
is available. #ifdef is
not enough because it may be an enum, glibc has been known to do this.
d_msg_proxy
From d_socket.U:
This variable conditionally defines the HAS_MSG_PROXY
symbol,
which indicates that the MSG_PROXY
is available. #ifdef is
not enough because it may be an enum, glibc has been known to do this.
d_msgctl
From d_msgctl.U:
This variable conditionally defines the HAS_MSGCTL
symbol, which
indicates to the C program that the msgctl() routine is available.
d_msgget
From d_msgget.U:
This variable conditionally defines the HAS_MSGGET
symbol, which
indicates to the C program that the msgget() routine is available.
d_msghdr_s
From d_msghdr_s.U:
This variable conditionally defines the HAS_STRUCT_MSGHDR
symbol,
which indicates that the struct msghdr is supported.
d_msgrcv
From d_msgrcv.U:
This variable conditionally defines the HAS_MSGRCV
symbol, which
indicates to the C program that the msgrcv() routine is available.
d_msgsnd
From d_msgsnd.U:
This variable conditionally defines the HAS_MSGSND
symbol, which
indicates to the C program that the msgsnd() routine is available.
d_msync
From d_msync.U:
This variable conditionally defines HAS_MSYNC
if msync() is
available to synchronize a mapped file.
d_munmap
From d_munmap.U:
This variable conditionally defines HAS_MUNMAP
if munmap() is
available to unmap a region mapped by mmap().
d_mymalloc
From mallocsrc.U:
This variable conditionally defines MYMALLOC
in case other parts
of the source want to take special action if MYMALLOC
is used.
This may include different sorts of profiling or error detection.
d_ndbm
From i_ndbm.U:
This variable conditionally defines the HAS_NDBM
symbol, which
indicates that both the ndbm.h include file and an appropriate ndbm
library exist. Consult the different i_*ndbm variables
to find out the actual include location. Sometimes, a system has the
header file but not the library. This variable will only be set if
the system has both.
d_ndbm_h_uses_prototypes
From i_ndbm.U:
This variable conditionally defines the NDBM_H_USES_PROTOTYPES
symbol,
which indicates that the ndbm.h include file uses real ANSI C
prototypes instead of K&R style function declarations. K&R style
declarations are unsupported in C++, so the include file requires
special handling when using a C++ compiler and this variable is
undefined. Consult the different d_*ndbm_h_uses_prototypes variables
to get the same information for alternative ndbm.h include files.
d_nice
From d_nice.U:
This variable conditionally defines the HAS_NICE
symbol, which
indicates to the C program that the nice() routine is available.
d_nl_langinfo
From d_nl_langinfo.U:
This variable conditionally defines the HAS_NL_LANGINFO
symbol, which
indicates to the C program that the nl_langinfo() routine is available.
d_nv_preserves_uv
From perlxv.U:
This variable indicates whether a variable of type nvtype can preserve all the bits of a variable of type uvtype.
d_nv_zero_is_allbits_zero
From perlxv.U:
This variable indicates whether a variable of type nvtype stores 0.0 in memory as all bits zero.
d_off64_t
From d_off64_t.U:
This symbol will be defined if the C compiler supports off64_t.
d_old_pthread_create_joinable
From d_pthrattrj.U:
This variable conditionally defines pthread_create_joinable.
It is undef if pthread.h defines PTHREAD_CREATE_JOINABLE.
d_oldpthreads
From usethreads.U:
This variable conditionally defines the OLD_PTHREADS_API
symbol,
and indicates that Perl should be built to use the old
draft POSIX threads API. This is only potentially meaningful if
usethreads is set.
d_oldsock
From d_socket.U:
This variable conditionally defines the OLDSOCKET
symbol, which
indicates that the BSD socket interface is based on 4.1c and not 4.2.
d_open3
From d_open3.U:
This variable conditionally defines the HAS_OPEN3 manifest constant, which indicates to the C program that the 3 argument version of the open(2) function is available.
d_pathconf
From d_pathconf.U:
This variable conditionally defines the HAS_PATHCONF
symbol, which
indicates to the C program that the pathconf() routine is available
to determine file-system related limits and options associated
with a given filename.
d_pause
From d_pause.U:
This variable conditionally defines the HAS_PAUSE
symbol, which
indicates to the C program that the pause() routine is available
to suspend a process until a signal is received.
d_perl_otherlibdirs
From otherlibdirs.U:
This variable conditionally defines PERL_OTHERLIBDIRS, which
contains a colon-separated set of paths for the perl binary to
include in @INC. See also otherlibdirs.
d_phostname
From d_gethname.U:
This variable conditionally defines the HAS_PHOSTNAME
symbol, which
contains the shell command which, when fed to popen(), may be
used to derive the host name.
d_pipe
From d_pipe.U:
This variable conditionally defines the HAS_PIPE
symbol, which
indicates to the C program that the pipe() routine is available
to create an inter-process channel.
d_poll
From d_poll.U:
This variable conditionally defines the HAS_POLL
symbol, which
indicates to the C program that the poll() routine is available
to poll active file descriptors.
d_portable
From d_portable.U:
This variable conditionally defines the PORTABLE
symbol, which
indicates to the C program that it should not assume that it is
running on the machine it was compiled on.
d_prctl
From d_prctl.U:
This variable conditionally defines the HAS_PRCTL
symbol, which
indicates to the C program that the prctl() routine is available.
d_prctl_set_name
From d_prctl.U:
This variable conditionally defines the HAS_PRCTL_SET_NAME
symbol,
which indicates to the C program that the prctl() routine supports
the PR_SET_NAME option.
d_PRId64
From quadfio.U:
This variable conditionally defines the PERL_PRId64 symbol, which indicates that stdio has a symbol to print 64-bit decimal numbers.
d_PRIeldbl
From longdblfio.U:
This variable conditionally defines the PERL_PRIfldbl symbol, which indicates that stdio has a symbol to print long doubles.
d_PRIEUldbl
From longdblfio.U:
This variable conditionally defines the PERL_PRIfldbl symbol, which
indicates that stdio has a symbol to print long doubles.
The U
in the name is to separate this from d_PRIeldbl so that even
case-blind systems can see the difference.
d_PRIfldbl
From longdblfio.U:
This variable conditionally defines the PERL_PRIfldbl symbol, which indicates that stdio has a symbol to print long doubles.
d_PRIFUldbl
From longdblfio.U:
This variable conditionally defines the PERL_PRIfldbl symbol, which
indicates that stdio has a symbol to print long doubles.
The U
in the name is to separate this from d_PRIfldbl so that even
case-blind systems can see the difference.
d_PRIgldbl
From longdblfio.U:
This variable conditionally defines the PERL_PRIfldbl symbol, which indicates that stdio has a symbol to print long doubles.
d_PRIGUldbl
From longdblfio.U:
This variable conditionally defines the PERL_PRIfldbl symbol, which
indicates that stdio has a symbol to print long doubles.
The U
in the name is to separate this from d_PRIgldbl so that even
case-blind systems can see the difference.
d_PRIi64
From quadfio.U:
This variable conditionally defines the PERL_PRIi64 symbol, which indicates that stdio has a symbol to print 64-bit decimal numbers.
d_printf_format_null
From d_attribut.U:
This variable conditionally defines PRINTF_FORMAT_NULL_OK
, which
indicates the C compiler allows printf-like formats to be null.
d_PRIo64
From quadfio.U:
This variable conditionally defines the PERL_PRIo64 symbol, which indicates that stdio has a symbol to print 64-bit octal numbers.
d_PRIu64
From quadfio.U:
This variable conditionally defines the PERL_PRIu64 symbol, which indicates that stdio has a symbol to print 64-bit unsigned decimal numbers.
d_PRIx64
From quadfio.U:
This variable conditionally defines the PERL_PRIx64 symbol, which indicates that stdio has a symbol to print 64-bit hexadecimal numbers.
d_PRIXU64
From quadfio.U:
This variable conditionally defines the PERL_PRIXU64 symbol, which
indicates that stdio has a symbol to print 64-bit hExADECimAl numbers.
The U
in the name is to separate this from d_PRIx64 so that even
case-blind systems can see the difference.
d_procselfexe
From d_procselfexe.U:
Defined if $procselfexe is a symlink to the absolute pathname of the executing program.
d_pseudofork
From d_vfork.U:
This variable conditionally defines the HAS_PSEUDOFORK
symbol,
which indicates that an emulation of the fork routine is available.
d_pthread_atfork
From d_pthread_atfork.U:
This variable conditionally defines the HAS_PTHREAD_ATFORK
symbol,
which indicates to the C program that the pthread_atfork()
routine is available.
d_pthread_attr_setscope
From d_pthread_attr_ss.U:
This variable conditionally defines HAS_PTHREAD_ATTR_SETSCOPE
if
pthread_attr_setscope() is available to set the contention scope
attribute of a thread attribute object.
d_pthread_yield
From d_pthread_y.U:
This variable conditionally defines the HAS_PTHREAD_YIELD
symbol if the pthread_yield routine is available to yield
the execution of the current thread.
d_pwage
From i_pwd.U:
This variable conditionally defines PWAGE, which indicates
that struct passwd contains pw_age.
d_pwchange
From i_pwd.U:
This variable conditionally defines PWCHANGE, which indicates
that struct passwd contains pw_change.
d_pwclass
From i_pwd.U:
This variable conditionally defines PWCLASS, which indicates
that struct passwd contains pw_class.
d_pwcomment
From i_pwd.U:
This variable conditionally defines PWCOMMENT, which indicates
that struct passwd contains pw_comment.
d_pwexpire
From i_pwd.U:
This variable conditionally defines PWEXPIRE, which indicates
that struct passwd contains pw_expire.
d_pwgecos
From i_pwd.U:
This variable conditionally defines PWGECOS, which indicates
that struct passwd contains pw_gecos.
d_pwpasswd
From i_pwd.U:
This variable conditionally defines PWPASSWD, which indicates
that struct passwd contains pw_passwd.
d_pwquota
From i_pwd.U:
This variable conditionally defines PWQUOTA, which indicates
that struct passwd contains pw_quota.
d_qgcvt
From d_qgcvt.U:
This variable conditionally defines the HAS_QGCVT
symbol, which
indicates to the C program that the qgcvt() routine is available.
d_quad
From quadtype.U:
This variable, if defined, tells that there's a 64-bit integer type, quadtype.
d_random_r
From d_random_r.U:
This variable conditionally defines the HAS_RANDOM_R
symbol,
which indicates to the C program that the random_r()
routine is available.
d_readdir
From d_readdir.U:
This variable conditionally defines HAS_READDIR
if readdir() is
available to read directory entries.
d_readdir64_r
From d_readdir64_r.U:
This variable conditionally defines the HAS_READDIR64_R symbol, which indicates to the C program that the readdir64_r() routine is available.
d_readdir_r
From d_readdir_r.U:
This variable conditionally defines the HAS_READDIR_R
symbol,
which indicates to the C program that the readdir_r()
routine is available.
d_readlink
From d_readlink.U:
This variable conditionally defines the HAS_READLINK
symbol, which
indicates to the C program that the readlink() routine is available
to read the value of a symbolic link.
d_readv
From d_readv.U:
This variable conditionally defines the HAS_READV
symbol, which
indicates to the C program that the readv() routine is available.
d_recvmsg
From d_recvmsg.U:
This variable conditionally defines the HAS_RECVMSG
symbol, which
indicates to the C program that the recvmsg() routine is available.
d_rename
From d_rename.U:
This variable conditionally defines the HAS_RENAME
symbol, which
indicates to the C program that the rename() routine is available
to rename files.
d_rewinddir
From d_readdir.U:
This variable conditionally defines HAS_REWINDDIR
if rewinddir() is
available.
d_rmdir
From d_rmdir.U:
This variable conditionally defines HAS_RMDIR
if rmdir() is
available to remove directories.
d_safebcpy
From d_safebcpy.U:
This variable conditionally defines the HAS_SAFE_BCOPY
symbol if
the bcopy() routine can do overlapping copies. Normally, you
should probably use memmove().
d_safemcpy
From d_safemcpy.U:
This variable conditionally defines the HAS_SAFE_MEMCPY
symbol if
the memcpy() routine can do overlapping copies.
For overlapping copies, memmove() should be used, if available.
d_sanemcmp
From d_sanemcmp.U:
This variable conditionally defines the HAS_SANE_MEMCMP
symbol if
the memcmp() routine is available and can be used to compare relative
magnitudes of chars with their high bits set.
d_sbrkproto
From d_sbrkproto.U:
This variable conditionally defines the HAS_SBRK_PROTO
symbol,
which indicates to the C program that the system provides
a prototype for the sbrk() function. Otherwise, it is
up to the program to supply one.
d_scalbnl
From d_scalbnl.U:
This variable conditionally defines the HAS_SCALBNL
symbol, which
indicates to the C program that the scalbnl() routine is available.
If ilogbl is also present we can emulate frexpl.
d_sched_yield
From d_pthread_y.U:
This variable conditionally defines the HAS_SCHED_YIELD
symbol if the sched_yield routine is available to yield
the execution of the current thread.
d_scm_rights
From d_socket.U:
This variable conditionally defines the HAS_SCM_RIGHTS
symbol,
which indicates that the SCM_RIGHTS
is available. #ifdef is
not enough because it may be an enum, glibc has been known to do this.
d_SCNfldbl
From longdblfio.U:
This variable conditionally defines the PERL_SCNfldbl symbol, which indicates that stdio has a symbol to scan long doubles.
d_seekdir
From d_readdir.U:
This variable conditionally defines HAS_SEEKDIR
if seekdir() is
available.
d_select
From d_select.U:
This variable conditionally defines HAS_SELECT
if select() is
available to select active file descriptors. A <sys/time.h>
inclusion may be necessary for the timeout field.
d_sem
From d_sem.U:
This variable conditionally defines the HAS_SEM
symbol, which
indicates that the entire sem*(2) library is present.
d_semctl
From d_semctl.U:
This variable conditionally defines the HAS_SEMCTL
symbol, which
indicates to the C program that the semctl() routine is available.
d_semctl_semid_ds
From d_union_semun.U:
This variable conditionally defines USE_SEMCTL_SEMID_DS, which
indicates that struct semid_ds * is to be used for semctl IPC_STAT.
d_semctl_semun
From d_union_semun.U:
This variable conditionally defines USE_SEMCTL_SEMUN, which
indicates that union semun is to be used for semctl IPC_STAT.
d_semget
From d_semget.U:
This variable conditionally defines the HAS_SEMGET
symbol, which
indicates to the C program that the semget() routine is available.
d_semop
From d_semop.U:
This variable conditionally defines the HAS_SEMOP
symbol, which
indicates to the C program that the semop() routine is available.
d_sendmsg
From d_sendmsg.U:
This variable conditionally defines the HAS_SENDMSG
symbol, which
indicates to the C program that the sendmsg() routine is available.
d_setegid
From d_setegid.U:
This variable conditionally defines the HAS_SETEGID
symbol, which
indicates to the C program that the setegid() routine is available
to change the effective gid of the current program.
d_seteuid
From d_seteuid.U:
This variable conditionally defines the HAS_SETEUID
symbol, which
indicates to the C program that the seteuid() routine is available
to change the effective uid of the current program.
d_setgrent
From d_setgrent.U:
This variable conditionally defines the HAS_SETGRENT
symbol, which
indicates to the C program that the setgrent() routine is available
for initializing sequential access to the group database.
d_setgrent_r
From d_setgrent_r.U:
This variable conditionally defines the HAS_SETGRENT_R
symbol,
which indicates to the C program that the setgrent_r()
routine is available.
d_setgrps
From d_setgrps.U:
This variable conditionally defines the HAS_SETGROUPS
symbol, which
indicates to the C program that the setgroups() routine is available
to set the list of process groups.
d_sethent
From d_sethent.U:
This variable conditionally defines HAS_SETHOSTENT
if sethostent() is
available.
d_sethostent_r
From d_sethostent_r.U:
This variable conditionally defines the HAS_SETHOSTENT_R
symbol,
which indicates to the C program that the sethostent_r()
routine is available.
d_setitimer
From d_setitimer.U:
This variable conditionally defines the HAS_SETITIMER
symbol, which
indicates to the C program that the setitimer() routine is available.
d_setlinebuf
From d_setlnbuf.U:
This variable conditionally defines the HAS_SETLINEBUF
symbol, which
indicates to the C program that the setlinebuf() routine is available
to change stderr or stdout from block-buffered or unbuffered to a
line-buffered mode.
d_setlocale
From d_setlocale.U:
This variable conditionally defines HAS_SETLOCALE
if setlocale() is
available to handle locale-specific ctype implementations.
d_setlocale_r
From d_setlocale_r.U:
This variable conditionally defines the HAS_SETLOCALE_R
symbol,
which indicates to the C program that the setlocale_r()
routine is available.
d_setnent
From d_setnent.U:
This variable conditionally defines HAS_SETNETENT
if setnetent() is
available.
d_setnetent_r
From d_setnetent_r.U:
This variable conditionally defines the HAS_SETNETENT_R
symbol,
which indicates to the C program that the setnetent_r()
routine is available.
d_setpent
From d_setpent.U:
This variable conditionally defines HAS_SETPROTOENT
if setprotoent() is
available.
d_setpgid
From d_setpgid.U:
This variable conditionally defines the HAS_SETPGID
symbol if the
setpgid(pid, gpid) function is available to set the process group ID.
d_setpgrp
From d_setpgrp.U:
This variable conditionally defines HAS_SETPGRP
if setpgrp() is
available to set the current process group.
d_setpgrp2
From d_setpgrp2.U:
This variable conditionally defines the HAS_SETPGRP2 symbol, which
indicates to the C program that the setpgrp2() (as in DG/UX
) routine
is available to set the current process group.
d_setprior
From d_setprior.U:
This variable conditionally defines HAS_SETPRIORITY
if setpriority()
is available to set a process's priority.
d_setproctitle
From d_setproctitle.U:
This variable conditionally defines the HAS_SETPROCTITLE
symbol,
which indicates to the C program that the setproctitle() routine
is available.
d_setprotoent_r
From d_setprotoent_r.U:
This variable conditionally defines the HAS_SETPROTOENT_R
symbol,
which indicates to the C program that the setprotoent_r()
routine is available.
d_setpwent
From d_setpwent.U:
This variable conditionally defines the HAS_SETPWENT
symbol, which
indicates to the C program that the setpwent() routine is available
for initializing sequential access to the passwd database.
d_setpwent_r
From d_setpwent_r.U:
This variable conditionally defines the HAS_SETPWENT_R
symbol,
which indicates to the C program that the setpwent_r()
routine is available.
d_setregid
From d_setregid.U:
This variable conditionally defines HAS_SETREGID
if setregid() is
available to change the real and effective gid of the current
process.
d_setresgid
From d_setregid.U:
This variable conditionally defines HAS_SETRESGID
if setresgid() is
available to change the real, effective and saved gid of the current
process.
d_setresuid
From d_setreuid.U:
This variable conditionally defines HAS_SETRESUID
if setresuid() is
available to change the real, effective and saved uid of the current
process.
d_setreuid
From d_setreuid.U:
This variable conditionally defines HAS_SETREUID
if setreuid() is
available to change the real and effective uid of the current
process.
d_setrgid
From d_setrgid.U:
This variable conditionally defines the HAS_SETRGID
symbol, which
indicates to the C program that the setrgid() routine is available
to change the real gid of the current program.
d_setruid
From d_setruid.U:
This variable conditionally defines the HAS_SETRUID
symbol, which
indicates to the C program that the setruid() routine is available
to change the real uid of the current program.
d_setsent
From d_setsent.U:
This variable conditionally defines HAS_SETSERVENT
if setservent() is
available.
d_setservent_r
From d_setservent_r.U:
This variable conditionally defines the HAS_SETSERVENT_R
symbol,
which indicates to the C program that the setservent_r()
routine is available.
d_setsid
From d_setsid.U:
This variable conditionally defines HAS_SETSID
if setsid() is
available to set the process group ID.
d_setvbuf
From d_setvbuf.U:
This variable conditionally defines the HAS_SETVBUF
symbol, which
indicates to the C program that the setvbuf() routine is available
to change buffering on an open stdio stream.
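As a sketch of what this probed capability enables (HAS_SETVBUF is the symbol above; the helper name here is illustrative), a freshly opened stream can be switched to line buffering:

```c
#include <stdio.h>

/* Sketch: switch a stream to line buffering, as HAS_SETVBUF
   advertises.  Returns 0 on success, nonzero on failure. */
int line_buffer_stream(FILE *fp)
{
    /* setvbuf() must be called after the stream is opened but
       before any other operation is performed on it. */
    return setvbuf(fp, NULL, _IOLBF, BUFSIZ);
}
```
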
d_sfio
From d_sfio.U:
This variable conditionally defines the USE_SFIO
symbol,
and indicates whether sfio is available (and should be used).
d_shm
From d_shm.U:
This variable conditionally defines the HAS_SHM
symbol, which
indicates that the entire shm*(2) library is present.
d_shmat
From d_shmat.U:
This variable conditionally defines the HAS_SHMAT
symbol, which
indicates to the C program that the shmat() routine is available.
d_shmatprototype
From d_shmat.U:
This variable conditionally defines the HAS_SHMAT_PROTOTYPE
symbol, which indicates that sys/shm.h has a prototype for
shmat.
d_shmctl
From d_shmctl.U:
This variable conditionally defines the HAS_SHMCTL
symbol, which
indicates to the C program that the shmctl() routine is available.
d_shmdt
From d_shmdt.U:
This variable conditionally defines the HAS_SHMDT
symbol, which
indicates to the C program that the shmdt() routine is available.
d_shmget
From d_shmget.U:
This variable conditionally defines the HAS_SHMGET
symbol, which
indicates to the C program that the shmget() routine is available.
d_sigaction
From d_sigaction.U:
This variable conditionally defines the HAS_SIGACTION
symbol, which
indicates that the Vr4 sigaction() routine is available.
d_signbit
From d_signbit.U:
This variable conditionally defines the HAS_SIGNBIT
symbol, which
indicates to the C program that the signbit() routine is available
and safe to use with perl's internal NV
type.
d_sigprocmask
From d_sigprocmask.U:
This variable conditionally defines HAS_SIGPROCMASK
if sigprocmask() is available to examine or change the signal mask
of the calling process.
d_sigsetjmp
From d_sigsetjmp.U:
This variable conditionally defines the HAS_SIGSETJMP
symbol,
which indicates that the sigsetjmp() routine is available to
call setjmp() and optionally save the process's signal mask.
d_sin6_scope_id
From d_socket.U:
This variable conditionally defines the HAS_SIN6_SCOPE_ID symbol, which indicates that a struct sockaddr_in6 structure has the sin6_scope_id member.
d_sitearch
From sitearch.U:
This variable conditionally defines SITEARCH
to hold the pathname
of architecture-dependent library files for $package. If
$sitearch is the same as $archlib, then this is set to undef.
d_snprintf
From d_snprintf.U:
This variable conditionally defines the HAS_SNPRINTF
symbol, which
indicates to the C program that the snprintf () library function
is available.
d_sockaddr_in6
From d_socket.U:
This variable conditionally defines the HAS_SOCKADDR_IN6 symbol, which indicates the availability of a struct sockaddr_in6.
d_sockaddr_sa_len
From d_socket.U:
This variable conditionally defines the HAS_SOCKADDR_SA_LEN
symbol,
which indicates that a struct sockaddr structure has the sa_len
member.
d_sockatmark
From d_sockatmark.U:
This variable conditionally defines the HAS_SOCKATMARK
symbol, which
indicates to the C program that the sockatmark() routine is available.
d_sockatmarkproto
From d_sockatmarkproto.U:
This variable conditionally defines the HAS_SOCKATMARK_PROTO
symbol,
which indicates to the C program that the system provides
a prototype for the sockatmark() function. Otherwise, it is
up to the program to supply one.
d_socket
From d_socket.U:
This variable conditionally defines HAS_SOCKET
, which indicates
that the BSD
socket interface is supported.
d_socklen_t
From d_socklen_t.U:
This symbol will be defined if the C compiler supports socklen_t.
d_sockpair
From d_socket.U:
This variable conditionally defines the HAS_SOCKETPAIR
symbol, which
indicates that the BSD
socketpair() is supported.
d_socks5_init
From d_socks5_init.U:
This variable conditionally defines the HAS_SOCKS5_INIT symbol, which indicates to the C program that the socks5_init() routine is available.
d_sprintf_returns_strlen
From d_sprintf_len.U:
This variable defines whether sprintf returns the length of the string
(as per the ANSI
spec). Some C libraries retain compatibility with
pre-ANSI
C and return a pointer to the passed in buffer; for these
this variable will be undef.
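The property being probed can be sketched as a one-line check (on an ANSI-conforming library sprintf() returns the character count, so the two values below agree; a pre-ANSI library returning a buffer pointer would fail the probe):

```c
#include <stdio.h>
#include <string.h>

/* Returns 1 when sprintf() reports the formatted length,
   as the ANSI spec requires. */
int sprintf_is_ansi(void)
{
    char buf[32];
    int n = sprintf(buf, "%d-%s", 42, "ok");
    return n == (int)strlen(buf);
}
```
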
d_sqrtl
From d_sqrtl.U:
This variable conditionally defines the HAS_SQRTL
symbol, which
indicates to the C program that the sqrtl() routine is available.
d_srand48_r
From d_srand48_r.U:
This variable conditionally defines the HAS_SRAND48_R symbol, which indicates to the C program that the srand48_r() routine is available.
d_srandom_r
From d_srandom_r.U:
This variable conditionally defines the HAS_SRANDOM_R
symbol,
which indicates to the C program that the srandom_r()
routine is available.
d_sresgproto
From d_sresgproto.U:
This variable conditionally defines the HAS_SETRESGID_PROTO
symbol,
which indicates to the C program that the system provides
a prototype for the setresgid() function. Otherwise, it is
up to the program to supply one.
d_sresuproto
From d_sresuproto.U:
This variable conditionally defines the HAS_SETRESUID_PROTO
symbol,
which indicates to the C program that the system provides
a prototype for the setresuid() function. Otherwise, it is
up to the program to supply one.
d_statblks
From d_statblks.U:
This variable conditionally defines USE_STAT_BLOCKS
if this system has a stat structure declaring
st_blksize and st_blocks.
d_statfs_f_flags
From d_statfs_f_flags.U:
This variable conditionally defines the HAS_STRUCT_STATFS_F_FLAGS
symbol, which indicates that struct statfs has an f_flags member.
This kind of struct statfs is coming from sys/mount.h (BSD
),
not from sys/statfs.h (SYSV
).
d_statfs_s
From d_statfs_s.U:
This variable conditionally defines the HAS_STRUCT_STATFS
symbol,
which indicates that the struct statfs is supported.
d_static_inline
From d_static_inline.U:
This variable conditionally defines the HAS_STATIC_INLINE
symbol,
which indicates that the C compiler supports C99-style static
inline. That is, the function can't be called from another
translation unit.
d_statvfs
From d_statvfs.U:
This variable conditionally defines the HAS_STATVFS
symbol, which
indicates to the C program that the statvfs() routine is available.
d_stdio_cnt_lval
From d_stdstdio.U:
This variable conditionally defines STDIO_CNT_LVALUE
if the
FILE_cnt
macro can be used as an lvalue.
d_stdio_ptr_lval
From d_stdstdio.U:
This variable conditionally defines STDIO_PTR_LVALUE
if the
FILE_ptr
macro can be used as an lvalue.
d_stdio_ptr_lval_nochange_cnt
From d_stdstdio.U:
This symbol is defined if using the FILE_ptr
macro as an lvalue
to increase the pointer by n leaves File_cnt(fp) unchanged.
d_stdio_ptr_lval_sets_cnt
From d_stdstdio.U:
This symbol is defined if using the FILE_ptr
macro as an lvalue
to increase the pointer by n has the side effect of decreasing the
value of File_cnt(fp) by n.
d_stdio_stream_array
From stdio_streams.U:
This variable tells whether there is an array holding the stdio streams.
d_stdiobase
From d_stdstdio.U:
This variable conditionally defines USE_STDIO_BASE
if this system
has a FILE
structure declaring a usable _base field (or equivalent)
in stdio.h.
d_stdstdio
From d_stdstdio.U:
This variable conditionally defines USE_STDIO_PTR
if this system
has a FILE
structure declaring usable _ptr and _cnt fields (or
equivalent) in stdio.h.
d_strchr
From d_strchr.U:
This variable conditionally defines HAS_STRCHR
if strchr() and
strrchr() are available for string searching.
d_strcoll
From d_strcoll.U:
This variable conditionally defines HAS_STRCOLL
if strcoll() is
available to compare strings using collating information.
d_strctcpy
From d_strctcpy.U:
This variable conditionally defines the USE_STRUCT_COPY
symbol, which
indicates to the C program that this C compiler knows how to copy
structures.
d_strerrm
From d_strerror.U:
This variable holds what Strerror is defined as to translate an error
code condition into an error message string. It could be strerror
or a more complex
macro emulating strerror with sys_errlist[], or the
unknown
string when both strerror and sys_errlist are missing.
d_strerror
From d_strerror.U:
This variable conditionally defines HAS_STRERROR
if strerror() is
available to translate error numbers to strings.
d_strerror_r
From d_strerror_r.U:
This variable conditionally defines the HAS_STRERROR_R
symbol,
which indicates to the C program that the strerror_r()
routine is available.
d_strftime
From d_strftime.U:
This variable conditionally defines the HAS_STRFTIME
symbol, which
indicates to the C program that the strftime() routine is available.
d_strlcat
From d_strlcat.U:
This variable conditionally defines the HAS_STRLCAT
symbol, which
indicates to the C program that the strlcat () routine is available.
d_strlcpy
From d_strlcpy.U:
This variable conditionally defines the HAS_STRLCPY
symbol, which
indicates to the C program that the strlcpy () routine is available.
d_strtod
From d_strtod.U:
This variable conditionally defines the HAS_STRTOD
symbol, which
indicates to the C program that the strtod() routine is available
to provide better numeric string conversion than atof().
d_strtol
From d_strtol.U:
This variable conditionally defines the HAS_STRTOL
symbol, which
indicates to the C program that the strtol() routine is available
to provide better numeric string conversion than atoi() and friends.
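A sketch of why strtol() is "better conversion than atoi()": it reports overflow through errno and trailing garbage through the end pointer, neither of which atoi() can do. The wrapper name is illustrative.

```c
#include <stdlib.h>
#include <errno.h>

/* Parse a base-10 long with full error reporting.
   Returns 0 on success, -1 on malformed or out-of-range input. */
int parse_long(const char *s, long *out)
{
    char *end;
    errno = 0;
    long v = strtol(s, &end, 10);
    if (errno == ERANGE || end == s || *end != '\0')
        return -1;
    *out = v;
    return 0;
}
```
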
d_strtold
From d_strtold.U:
This variable conditionally defines the HAS_STRTOLD
symbol, which
indicates to the C program that the strtold() routine is available.
d_strtoll
From d_strtoll.U:
This variable conditionally defines the HAS_STRTOLL
symbol, which
indicates to the C program that the strtoll() routine is available.
d_strtoq
From d_strtoq.U:
This variable conditionally defines the HAS_STRTOQ
symbol, which
indicates to the C program that the strtoq() routine is available.
d_strtoul
From d_strtoul.U:
This variable conditionally defines the HAS_STRTOUL
symbol, which
indicates to the C program that the strtoul() routine is available
to provide conversion of strings to unsigned long.
d_strtoull
From d_strtoull.U:
This variable conditionally defines the HAS_STRTOULL
symbol, which
indicates to the C program that the strtoull() routine is available.
d_strtouq
From d_strtouq.U:
This variable conditionally defines the HAS_STRTOUQ
symbol, which
indicates to the C program that the strtouq() routine is available.
d_strxfrm
From d_strxfrm.U:
This variable conditionally defines HAS_STRXFRM
if strxfrm() is
available to transform strings.
d_suidsafe
From d_dosuid.U:
This variable conditionally defines SETUID_SCRIPTS_ARE_SECURE_NOW
if setuid scripts can be secure. This test looks in /dev/fd/.
d_symlink
From d_symlink.U:
This variable conditionally defines the HAS_SYMLINK
symbol, which
indicates to the C program that the symlink() routine is available
to create symbolic links.
d_syscall
From d_syscall.U:
This variable conditionally defines HAS_SYSCALL
if syscall() is
available to call arbitrary system calls.
d_syscallproto
From d_syscallproto.U:
This variable conditionally defines the HAS_SYSCALL_PROTO
symbol,
which indicates to the C program that the system provides
a prototype for the syscall() function. Otherwise, it is
up to the program to supply one.
d_sysconf
From d_sysconf.U:
This variable conditionally defines the HAS_SYSCONF
symbol, which
indicates to the C program that the sysconf() routine is available
to determine system related limits and options.
d_sysernlst
From d_strerror.U:
This variable conditionally defines HAS_SYS_ERRNOLIST
if sys_errnolist[]
is available to translate error numbers to the symbolic name.
d_syserrlst
From d_strerror.U:
This variable conditionally defines HAS_SYS_ERRLIST
if sys_errlist[] is
available to translate error numbers to strings.
d_system
From d_system.U:
This variable conditionally defines HAS_SYSTEM
if system() is
available to issue a shell command.
d_tcgetpgrp
From d_tcgtpgrp.U:
This variable conditionally defines the HAS_TCGETPGRP
symbol, which
indicates to the C program that the tcgetpgrp() routine is available
to get the foreground process group ID.
d_tcsetpgrp
From d_tcstpgrp.U:
This variable conditionally defines the HAS_TCSETPGRP
symbol, which
indicates to the C program that the tcsetpgrp() routine is available
to set the foreground process group ID.
d_telldir
From d_readdir.U:
This variable conditionally defines HAS_TELLDIR
if telldir() is
available.
d_telldirproto
From d_telldirproto.U:
This variable conditionally defines the HAS_TELLDIR_PROTO
symbol,
which indicates to the C program that the system provides
a prototype for the telldir() function. Otherwise, it is
up to the program to supply one.
d_time
From d_time.U:
This variable conditionally defines the HAS_TIME
symbol, which indicates
that the time() routine exists. The time() routine is normally
provided on UNIX
systems.
d_timegm
From d_timegm.U:
This variable conditionally defines the HAS_TIMEGM
symbol, which
indicates to the C program that the timegm () routine is available.
d_times
From d_times.U:
This variable conditionally defines the HAS_TIMES
symbol, which indicates
that the times() routine exists. The times() routine is normally
provided on UNIX
systems. You may have to include <sys/times.h>.
d_tm_tm_gmtoff
From i_time.U:
This variable conditionally defines HAS_TM_TM_GMTOFF
, which indicates
to the C program that the struct tm has the tm_gmtoff field.
d_tm_tm_zone
From i_time.U:
This variable conditionally defines HAS_TM_TM_ZONE
, which indicates
to the C program that the struct tm has the tm_zone field.
d_tmpnam_r
From d_tmpnam_r.U:
This variable conditionally defines the HAS_TMPNAM_R
symbol,
which indicates to the C program that the tmpnam_r()
routine is available.
d_truncate
From d_truncate.U:
This variable conditionally defines HAS_TRUNCATE
if truncate() is
available to truncate files.
d_ttyname_r
From d_ttyname_r.U:
This variable conditionally defines the HAS_TTYNAME_R
symbol,
which indicates to the C program that the ttyname_r()
routine is available.
d_tzname
From d_tzname.U:
This variable conditionally defines HAS_TZNAME
if tzname[] is
available to access timezone names.
d_u32align
From d_u32align.U:
This variable tells whether you must access character data through U32-aligned pointers.
d_ualarm
From d_ualarm.U:
This variable conditionally defines the HAS_UALARM
symbol, which
indicates to the C program that the ualarm() routine is available.
d_umask
From d_umask.U:
This variable conditionally defines the HAS_UMASK
symbol, which
indicates to the C program that the umask() routine is available
to set and get the value of the file creation mask.
d_uname
From d_gethname.U:
This variable conditionally defines the HAS_UNAME
symbol, which
indicates to the C program that the uname() routine may be
used to derive the host name.
d_union_semun
From d_union_semun.U:
This variable conditionally defines HAS_UNION_SEMUN
if the
union semun is defined by including <sys/sem.h>.
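The usual consumer-side pattern this probe supports (HAS_UNION_SEMUN is the symbol above; the fallback members are the ones SUSv3 prescribes) is to supply the union yourself only when the header does not:

```c
#include <sys/types.h>
#include <sys/sem.h>

/* SUSv3 says the caller must define union semun, and many systems
   (e.g. glibc) leave it out of <sys/sem.h>, so code defines it only
   when the Configure probe says the header did not provide it. */
#ifndef HAS_UNION_SEMUN
union semun {
    int              val;    /* value for SETVAL */
    struct semid_ds *buf;    /* buffer for IPC_STAT, IPC_SET */
    unsigned short  *array;  /* array for GETALL, SETALL */
};
#endif

unsigned long semun_size(void)
{
    return (unsigned long)sizeof(union semun);
}
```
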
d_unordered
From d_unordered.U:
This variable conditionally defines the HAS_UNORDERED
symbol, which
indicates to the C program that the unordered() routine is available.
d_unsetenv
From d_unsetenv.U:
This variable conditionally defines the HAS_UNSETENV
symbol, which
indicates to the C program that the unsetenv () routine is available.
d_usleep
From d_usleep.U:
This variable conditionally defines HAS_USLEEP
if usleep() is
available to do high granularity sleeps.
d_usleepproto
From d_usleepproto.U:
This variable conditionally defines the HAS_USLEEP_PROTO
symbol,
which indicates to the C program that the system provides
a prototype for the usleep() function. Otherwise, it is
up to the program to supply one.
d_ustat
From d_ustat.U:
This variable conditionally defines HAS_USTAT
if ustat() is
available to query file system statistics by dev_t.
d_vendorarch
From vendorarch.U:
This variable conditionally defines PERL_VENDORARCH.
d_vendorbin
From vendorbin.U:
This variable conditionally defines PERL_VENDORBIN.
d_vendorlib
From vendorlib.U:
This variable conditionally defines PERL_VENDORLIB.
d_vendorscript
From vendorscript.U:
This variable conditionally defines PERL_VENDORSCRIPT.
d_vfork
From d_vfork.U:
This variable conditionally defines the HAS_VFORK
symbol, which
indicates the vfork() routine is available.
d_void_closedir
From d_closedir.U:
This variable conditionally defines VOID_CLOSEDIR
if closedir()
does not return a value.
d_voidsig
From d_voidsig.U:
This variable conditionally defines VOIDSIG
if this system
declares "void (*signal(...))()" in signal.h. The old way was to
declare it as "int (*signal(...))()".
d_voidtty
From i_sysioctl.U:
This variable conditionally defines USE_IOCNOTTY
to indicate that the
ioctl() call with TIOCNOTTY
should be used to void tty association.
Otherwise (on USG
probably), it is enough to close the standard file
descriptors and do a setpgrp().
d_volatile
From d_volatile.U:
This variable conditionally defines the HASVOLATILE
symbol, which
indicates to the C program that this C compiler knows about the
volatile declaration.
d_vprintf
From d_vprintf.U:
This variable conditionally defines the HAS_VPRINTF
symbol, which
indicates to the C program that the vprintf() routine is available
to printf with a pointer to an argument list.
d_vsnprintf
From d_snprintf.U:
This variable conditionally defines the HAS_VSNPRINTF
symbol, which
indicates to the C program that the vsnprintf () library function
is available.
d_wait4
From d_wait4.U:
This variable conditionally defines the HAS_WAIT4 symbol, which indicates the wait4() routine is available.
d_waitpid
From d_waitpid.U:
This variable conditionally defines HAS_WAITPID
if waitpid() is
available to wait for a child process.
d_wcstombs
From d_wcstombs.U:
This variable conditionally defines the HAS_WCSTOMBS
symbol, which
indicates to the C program that the wcstombs() routine is available
to convert wide character strings to multibyte strings.
d_wctomb
From d_wctomb.U:
This variable conditionally defines the HAS_WCTOMB
symbol, which
indicates to the C program that the wctomb() routine is available
to convert a wide character to a multibyte character.
d_writev
From d_writev.U:
This variable conditionally defines the HAS_WRITEV
symbol, which
indicates to the C program that the writev() routine is available.
d_xenix
From Guess.U:
This variable conditionally defines the symbol XENIX
, which alerts
the C program that it runs under Xenix.
date
From Loc.U:
This variable is used internally by Configure to determine the
full pathname (if any) of the date program. After Configure runs,
the value is reset to a plain date
and is not useful.
db_hashtype
From i_db.U:
This variable contains the type of the hash structure element
in the <db.h> header file. In older versions of DB
, it was
int, while in newer ones it is u_int32_t.
db_prefixtype
From i_db.U:
This variable contains the type of the prefix structure element
in the <db.h> header file. In older versions of DB
, it was
int, while in newer ones it is size_t.
db_version_major
From i_db.U:
This variable contains the major version number of
Berkeley DB
found in the <db.h> header file.
db_version_minor
From i_db.U:
This variable contains the minor version number of
Berkeley DB
found in the <db.h> header file.
For DB
version 1 this is always 0.
db_version_patch
From i_db.U:
This variable contains the patch version number of
Berkeley DB
found in the <db.h> header file.
For DB
version 1 this is always 0.
defvoidused
From voidflags.U:
This variable contains the default value of the VOIDUSED
symbol (15).
direntrytype
From i_dirent.U:
This symbol is set to struct direct
or struct dirent
depending on
whether dirent is available or not. You should use this pseudo type to
portably declare your directory entries.
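The portability point can be sketched as follows; on a system with <dirent.h> the entry type is struct dirent, while on an old <sys/dir.h> system the Configure-chosen pseudo type (Direntry_t in perl's sources) would be struct direct instead:

```c
#include <stdio.h>
#include <dirent.h>

/* Count the entries in a directory, declaring them with the
   directory-entry type the system provides.  Returns -1 if the
   directory cannot be opened. */
int count_dir_entries(const char *path)
{
    DIR *d = opendir(path);
    if (d == NULL)
        return -1;
    int n = 0;
    struct dirent *entry;
    while ((entry = readdir(d)) != NULL)
        n++;                 /* "." and ".." are included in the count */
    closedir(d);
    return n;
}
```
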
dlext
From dlext.U:
This variable contains the extension that is to be used for the dynamically loaded modules that perl generates.
dlsrc
From dlsrc.U:
This variable contains the name of the dynamic loading file that will be used with the package.
doublesize
From doublesize.U:
This variable contains the value of the DOUBLESIZE
symbol, which
indicates to the C program how many bytes there are in a double.
drand01
From randfunc.U:
Indicates the macro to be used to generate normalized
random numbers. Uses randfunc, often divided by
(double) (((unsigned long) 1 << randbits)) in order to
normalize the result.
In C programs, the macro Drand01
is mapped to drand01.
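The normalization above can be sketched in C, with rand() standing in for randfunc and randbits hard-coded to 31 (Configure detects the real values on each system; these are illustrative assumptions):

```c
#include <stdlib.h>

#define SKETCH_RANDBITS 31

/* Map the raw generator output into the half-open interval [0, 1)
   by dividing by 2^randbits, as the Drand01 macro does.  Assumes
   RAND_MAX < 2^31, which holds wherever int is 32 bits. */
double drand01_sketch(void)
{
    return rand() / (double)((unsigned long)1 << SKETCH_RANDBITS);
}
```
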
drand48_r_proto
From d_drand48_r.U:
This variable encodes the prototype of drand48_r.
It is zero if d_drand48_r is undef, and one of the
REENTRANT_PROTO_T_ABC
macros of reentr.h if d_drand48_r
is defined.
dtrace
From usedtrace.U:
This variable holds the location of the dtrace executable.
dynamic_ext
From Extensions.U:
This variable holds a list of XS
extension files we want to
link dynamically into the package. It is used by Makefile.
eagain
From nblock_io.U:
This variable bears the symbolic errno code set by read() when no data is present on the file and non-blocking I/O was enabled (otherwise, read() blocks naturally).
ebcdic
From ebcdic.U:
This variable conditionally defines EBCDIC
if this
system uses EBCDIC
encoding.
echo
From Loc.U:
This variable is used internally by Configure to determine the
full pathname (if any) of the echo program. After Configure runs,
the value is reset to a plain echo
and is not useful.
egrep
From Loc.U:
This variable is used internally by Configure to determine the
full pathname (if any) of the egrep program. After Configure runs,
the value is reset to a plain egrep
and is not useful.
emacs
From Loc.U:
This variable is defined but not used by Configure. The value is the empty string and is not useful.
endgrent_r_proto
From d_endgrent_r.U:
This variable encodes the prototype of endgrent_r.
It is zero if d_endgrent_r is undef, and one of the
REENTRANT_PROTO_T_ABC
macros of reentr.h if d_endgrent_r
is defined.
endhostent_r_proto
From d_endhostent_r.U:
This variable encodes the prototype of endhostent_r.
It is zero if d_endhostent_r is undef, and one of the
REENTRANT_PROTO_T_ABC
macros of reentr.h if d_endhostent_r
is defined.
endnetent_r_proto
From d_endnetent_r.U:
This variable encodes the prototype of endnetent_r.
It is zero if d_endnetent_r is undef, and one of the
REENTRANT_PROTO_T_ABC
macros of reentr.h if d_endnetent_r
is defined.
endprotoent_r_proto
From d_endprotoent_r.U:
This variable encodes the prototype of endprotoent_r.
It is zero if d_endprotoent_r is undef, and one of the
REENTRANT_PROTO_T_ABC
macros of reentr.h if d_endprotoent_r
is defined.
endpwent_r_proto
From d_endpwent_r.U:
This variable encodes the prototype of endpwent_r.
It is zero if d_endpwent_r is undef, and one of the
REENTRANT_PROTO_T_ABC
macros of reentr.h if d_endpwent_r
is defined.
endservent_r_proto
From d_endservent_r.U:
This variable encodes the prototype of endservent_r.
It is zero if d_endservent_r is undef, and one of the
REENTRANT_PROTO_T_ABC
macros of reentr.h if d_endservent_r
is defined.
eunicefix
From Init.U:
When running under Eunice this variable contains a command which will convert a shell script to the proper form of text file for it to be executable by the shell. On other systems it is a no-op.
exe_ext
From Unix.U:
This is an old synonym for _exe.
expr
From Loc.U:
This variable is used internally by Configure to determine the
full pathname (if any) of the expr program. After Configure runs,
the value is reset to a plain expr
and is not useful.
extensions
From Extensions.U:
This variable holds a list of all extension files (both XS
and
non-xs) linked into the package. It is propagated to Config.pm
and is typically used to test whether a particular extension
is available.
extern_C
From Csym.U:
ANSI
C requires 'extern' where C++ requires 'extern "C"'. This
variable can be used in Configure to do the right thing.
extras
From Extras.U:
This variable holds a list of extra modules to install.
fflushall
From fflushall.U:
This symbol, if defined, tells that to flush all pending stdio output one must loop through all the stdio file handles stored in an array and fflush them. Note that if fflushNULL is defined, fflushall will not even be probed for and will be left undefined.
fflushNULL
From fflushall.U:
This symbol, if defined, tells that fflush(NULL
) does flush
all pending stdio output.
find
From Loc.U:
This variable is defined but not used by Configure. The value is the empty string and is not useful.
firstmakefile
From Unix.U:
This variable defines the first file searched by make. On unix, it is makefile (then Makefile). On case-insensitive systems, it might be something else. This is only used to deal with convoluted make depend tricks.
flex
From Loc.U:
This variable is defined but not used by Configure. The value is the empty string and is not useful.
fpossize
From fpossize.U:
This variable contains the size of a fpostype in bytes.
fpostype
From fpostype.U:
This variable defines Fpos_t to be something like fpos_t, long, uint, or whatever type is used to declare file positions in libc.
freetype
From mallocsrc.U:
This variable contains the return type of free(). It is usually void, but occasionally int.
from
From Cross.U:
This variable contains the command used by Configure
to copy files from the target host. Useful and available
only during Perl build.
The string :
if not cross-compiling.
full_ar
From Loc_ar.U:
This variable contains the full pathname to ar
, whether or
not the user has specified portability
. This is only used
in the Makefile.SH.
full_csh
From d_csh.U:
This variable contains the full pathname to csh
, whether or
not the user has specified portability
. This is only used
in the compiled C program, and we assume that all systems which
can share this executable will have the same full pathname to
csh.
full_sed
From Loc_sed.U:
This variable contains the full pathname to sed
, whether or
not the user has specified portability
. This is only used
in the compiled C program, and we assume that all systems which
can share this executable will have the same full pathname to
sed.
gccansipedantic
From gccvers.U:
If GNU
cc (gcc) is used, this variable will enable (if set) the
-ansi and -pedantic ccflags for building core files (through
cflags script). (See Porting/pumpkin.pod for full description).
gccosandvers
From gccvers.U:
If GNU
cc (gcc) is used, this variable holds the operating system
and version used to compile gcc. It is set to '' if not gcc,
or if nothing useful can be parsed as the os version.
gccversion
From gccvers.U:
If GNU
cc (gcc) is used, this variable holds 1
or 2
to
indicate whether the compiler is version 1 or 2. This is used in
setting some of the default cflags. It is set to '' if not gcc.
getgrent_r_proto
From d_getgrent_r.U:
This variable encodes the prototype of getgrent_r.
It is zero if d_getgrent_r is undef, and one of the
REENTRANT_PROTO_T_ABC
macros of reentr.h if d_getgrent_r
is defined.
getgrgid_r_proto
From d_getgrgid_r.U:
This variable encodes the prototype of getgrgid_r.
It is zero if d_getgrgid_r is undef, and one of the
REENTRANT_PROTO_T_ABC
macros of reentr.h if d_getgrgid_r
is defined.
getgrnam_r_proto
From d_getgrnam_r.U:
This variable encodes the prototype of getgrnam_r.
It is zero if d_getgrnam_r is undef, and one of the
REENTRANT_PROTO_T_ABC
macros of reentr.h if d_getgrnam_r
is defined.
gethostbyaddr_r_proto
From d_gethostbyaddr_r.U:
This variable encodes the prototype of gethostbyaddr_r.
It is zero if d_gethostbyaddr_r is undef, and one of the
REENTRANT_PROTO_T_ABC
macros of reentr.h if d_gethostbyaddr_r
is defined.
gethostbyname_r_proto
From d_gethostbyname_r.U:
This variable encodes the prototype of gethostbyname_r.
It is zero if d_gethostbyname_r is undef, and one of the
REENTRANT_PROTO_T_ABC
macros of reentr.h if d_gethostbyname_r
is defined.
gethostent_r_proto
From d_gethostent_r.U:
This variable encodes the prototype of gethostent_r.
It is zero if d_gethostent_r is undef, and one of the
REENTRANT_PROTO_T_ABC
macros of reentr.h if d_gethostent_r
is defined.
getlogin_r_proto
From d_getlogin_r.U:
This variable encodes the prototype of getlogin_r.
It is zero if d_getlogin_r is undef, and one of the
REENTRANT_PROTO_T_ABC
macros of reentr.h if d_getlogin_r
is defined.
getnetbyaddr_r_proto
From d_getnetbyaddr_r.U:
This variable encodes the prototype of getnetbyaddr_r.
It is zero if d_getnetbyaddr_r is undef, and one of the
REENTRANT_PROTO_T_ABC
macros of reentr.h if d_getnetbyaddr_r
is defined.
getnetbyname_r_proto
From d_getnetbyname_r.U:
This variable encodes the prototype of getnetbyname_r.
It is zero if d_getnetbyname_r is undef, and one of the
REENTRANT_PROTO_T_ABC
macros of reentr.h if d_getnetbyname_r
is defined.
getnetent_r_proto
From d_getnetent_r.U:
This variable encodes the prototype of getnetent_r.
It is zero if d_getnetent_r is undef, and one of the
REENTRANT_PROTO_T_ABC
macros of reentr.h if d_getnetent_r
is defined.
getprotobyname_r_proto
From d_getprotobyname_r.U:
This variable encodes the prototype of getprotobyname_r.
It is zero if d_getprotobyname_r is undef, and one of the
REENTRANT_PROTO_T_ABC
macros of reentr.h if d_getprotobyname_r
is defined.
getprotobynumber_r_proto
From d_getprotobynumber_r.U:
This variable encodes the prototype of getprotobynumber_r.
It is zero if d_getprotobynumber_r is undef, and one of the
REENTRANT_PROTO_T_ABC
macros of reentr.h if d_getprotobynumber_r
is defined.
getprotoent_r_proto
From d_getprotoent_r.U:
This variable encodes the prototype of getprotoent_r.
It is zero if d_getprotoent_r is undef, and one of the
REENTRANT_PROTO_T_ABC
macros of reentr.h if d_getprotoent_r
is defined.
getpwent_r_proto
From d_getpwent_r.U:
This variable encodes the prototype of getpwent_r.
It is zero if d_getpwent_r is undef, and one of the
REENTRANT_PROTO_T_ABC
macros of reentr.h if d_getpwent_r
is defined.
getpwnam_r_proto
From d_getpwnam_r.U:
This variable encodes the prototype of getpwnam_r.
It is zero if d_getpwnam_r is undef, and one of the
REENTRANT_PROTO_T_ABC
macros of reentr.h if d_getpwnam_r
is defined.
getpwuid_r_proto
From d_getpwuid_r.U:
This variable encodes the prototype of getpwuid_r.
It is zero if d_getpwuid_r is undef, and one of the
REENTRANT_PROTO_T_ABC
macros of reentr.h if d_getpwuid_r
is defined.
getservbyname_r_proto
From d_getservbyname_r.U:
This variable encodes the prototype of getservbyname_r.
It is zero if d_getservbyname_r is undef, and one of the
REENTRANT_PROTO_T_ABC
macros of reentr.h if d_getservbyname_r
is defined.
getservbyport_r_proto
From d_getservbyport_r.U:
This variable encodes the prototype of getservbyport_r.
It is zero if d_getservbyport_r is undef, and one of the
REENTRANT_PROTO_T_ABC
macros of reentr.h if d_getservbyport_r
is defined.
getservent_r_proto
From d_getservent_r.U:
This variable encodes the prototype of getservent_r.
It is zero if d_getservent_r is undef, and one of the
REENTRANT_PROTO_T_ABC
macros of reentr.h if d_getservent_r
is defined.
getspnam_r_proto
From d_getspnam_r.U:
This variable encodes the prototype of getspnam_r.
It is zero if d_getspnam_r is undef, and one of the
REENTRANT_PROTO_T_ABC
macros of reentr.h if d_getspnam_r
is defined.
gidformat
From gidf.U:
This variable contains the format string used for printing a Gid_t.
gidsign
From gidsign.U:
This variable contains the signedness of a gidtype. 1 for unsigned, -1 for signed.
gidsize
From gidsize.U:
This variable contains the size of a gidtype in bytes.
gidtype
From gidtype.U:
This variable defines Gid_t to be something like gid_t, int, ushort, or whatever type is used to declare the return type of getgid(). Typically, it is the type of group ids in the kernel.
glibpth
From libpth.U:
This variable holds the general path (space-separated) used to find libraries. It may contain directories that do not exist on this platform; libpth is the cleaned-up version.
gmake
From Loc.U:
This variable is used internally by Configure to determine the
full pathname (if any) of the gmake program. After Configure runs,
the value is reset to a plain gmake and is not useful.
gmtime_r_proto
From d_gmtime_r.U:
This variable encodes the prototype of gmtime_r.
It is zero if d_gmtime_r is undef, and one of the
REENTRANT_PROTO_T_ABC
macros of reentr.h if d_gmtime_r
is defined.
gnulibc_version
From d_gnulibc.U:
This variable contains the version number of the GNU
C library.
It is usually something like 2.2.5. It is a plain '' if this
is not the GNU
C library, or if the version is unknown.
grep
From Loc.U:
This variable is used internally by Configure to determine the
full pathname (if any) of the grep program. After Configure runs,
the value is reset to a plain grep and is not useful.
groupcat
From nis.U:
This variable contains a command that produces the text of the
/etc/group file. This is normally "cat /etc/group", but can be
"ypcat group" when NIS
is used.
On some systems, such as os390, there may be no equivalent
command, in which case this variable is unset.
groupstype
From groupstype.U:
This variable defines Groups_t to be something like gid_t, int, ushort, or whatever type is used for the second argument to getgroups() and setgroups(). Usually, this is the same as gidtype (gid_t), but sometimes it isn't.
gzip
From Loc.U:
This variable is used internally by Configure to determine the
full pathname (if any) of the gzip program. After Configure runs,
the value is reset to a plain gzip and is not useful.
h_fcntl
From h_fcntl.U:
This variable gets set in various places to tell i_fcntl that <fcntl.h> should be included.
h_sysfile
From h_sysfile.U:
This variable gets set in various places to tell i_sys_file that <sys/file.h> should be included.
hint
From Oldconfig.U:
Gives the type of hints used for previous answers. May be one of
default, recommended, or previous.
hostcat
From nis.U:
This variable contains a command that produces the text of the
/etc/hosts file. This is normally "cat /etc/hosts", but can be
"ypcat hosts" when NIS
is used.
On some systems, such as os390, there may be no equivalent
command, in which case this variable is unset.
html1dir
From html1dir.U:
This variable contains the name of the directory in which html source pages are to be put. This directory is for pages that describe whole programs, not libraries or modules. It is intended to correspond roughly to section 1 of the Unix manuals.
html1direxp
From html1dir.U:
This variable is the same as the html1dir variable, but is filename expanded at configuration time, for convenient use in makefiles.
html3dir
From html3dir.U:
This variable contains the name of the directory in which html source pages are to be put. This directory is for pages that describe libraries or modules. It is intended to correspond roughly to section 3 of the Unix manuals.
html3direxp
From html3dir.U:
This variable is the same as the html3dir variable, but is filename expanded at configuration time, for convenient use in makefiles.
i16size
From perlxv.U:
This variable is the size of an I16 in bytes.
i16type
From perlxv.U:
This variable contains the C type used for Perl's I16.
i32size
From perlxv.U:
This variable is the size of an I32 in bytes.
i32type
From perlxv.U:
This variable contains the C type used for Perl's I32.
i64size
From perlxv.U:
This variable is the size of an I64 in bytes.
i64type
From perlxv.U:
This variable contains the C type used for Perl's I64.
i8size
From perlxv.U:
This variable is the size of an I8 in bytes.
i8type
From perlxv.U:
This variable contains the C type used for Perl's I8.
i_arpainet
From i_arpainet.U:
This variable conditionally defines the I_ARPA_INET
symbol,
and indicates whether a C program should include <arpa/inet.h>.
i_assert
From i_assert.U:
This variable conditionally defines the I_ASSERT
symbol, which
indicates to the C program that <assert.h> exists and could be
included.
i_bsdioctl
From i_sysioctl.U:
This variable conditionally defines the I_SYS_BSDIOCTL
symbol, which
indicates to the C program that <sys/bsdioctl.h> exists and should
be included.
i_crypt
From i_crypt.U:
This variable conditionally defines the I_CRYPT
symbol, and indicates
whether a C program should include <crypt.h>.
i_db
From i_db.U:
This variable conditionally defines the I_DB
symbol, and indicates
whether a C program may include Berkeley's DB
include file <db.h>.
i_dbm
From i_dbm.U:
This variable conditionally defines the I_DBM
symbol, which
indicates to the C program that <dbm.h> exists and should
be included.
i_dirent
From i_dirent.U:
This variable conditionally defines I_DIRENT, which indicates
to the C program that it should include <dirent.h>.
i_dld
From i_dld.U:
This variable conditionally defines the I_DLD
symbol, which
indicates to the C program that <dld.h> (GNU
dynamic loading)
exists and should be included.
i_dlfcn
From i_dlfcn.U:
This variable conditionally defines the I_DLFCN
symbol, which
indicates to the C program that <dlfcn.h> exists and should
be included.
i_fcntl
From i_fcntl.U:
This variable controls the value of I_FCNTL
(which tells
the C program to include <fcntl.h>).
i_float
From i_float.U:
This variable conditionally defines the I_FLOAT symbol, and indicates
whether a C program may include <float.h> to get symbols like DBL_MAX
or DBL_MIN, i.e. machine-dependent floating point values.
i_fp
From i_fp.U:
This variable conditionally defines the I_FP
symbol, and indicates
whether a C program should include <fp.h>.
i_fp_class
From i_fp_class.U:
This variable conditionally defines the I_FP_CLASS
symbol, and indicates
whether a C program should include <fp_class.h>.
i_gdbm
From i_gdbm.U:
This variable conditionally defines the I_GDBM
symbol, which
indicates to the C program that <gdbm.h> exists and should
be included.
i_gdbm_ndbm
From i_ndbm.U:
This variable conditionally defines the I_GDBM_NDBM
symbol, which
indicates to the C program that <gdbm-ndbm.h> exists and should
be included. This is the location of the ndbm.h compatibility file
in Debian 4.0.
i_gdbmndbm
From i_ndbm.U:
This variable conditionally defines the I_GDBMNDBM
symbol, which
indicates to the C program that <gdbm/ndbm.h> exists and should
be included. This was the location of the ndbm.h compatibility file
in RedHat 7.1.
i_grp
From i_grp.U:
This variable conditionally defines the I_GRP
symbol, and indicates
whether a C program should include <grp.h>.
i_ieeefp
From i_ieeefp.U:
This variable conditionally defines the I_IEEEFP
symbol, and indicates
whether a C program should include <ieeefp.h>.
i_inttypes
From i_inttypes.U:
This variable conditionally defines the I_INTTYPES
symbol,
and indicates whether a C program should include <inttypes.h>.
i_langinfo
From i_langinfo.U:
This variable conditionally defines the I_LANGINFO
symbol,
and indicates whether a C program should include <langinfo.h>.
i_libutil
From i_libutil.U:
This variable conditionally defines the I_LIBUTIL
symbol, and indicates
whether a C program should include <libutil.h>.
i_limits
From i_limits.U:
This variable conditionally defines the I_LIMITS
symbol, and indicates
whether a C program may include <limits.h> to get symbols like WORD_BIT
and friends.
i_locale
From i_locale.U:
This variable conditionally defines the I_LOCALE
symbol,
and indicates whether a C program should include <locale.h>.
i_machcthr
From i_machcthr.U:
This variable conditionally defines the I_MACH_CTHREADS
symbol,
and indicates whether a C program should include <mach/cthreads.h>.
i_malloc
From i_malloc.U:
This variable conditionally defines the I_MALLOC
symbol, and indicates
whether a C program should include <malloc.h>.
i_mallocmalloc
From i_mallocmalloc.U:
This variable conditionally defines the I_MALLOCMALLOC
symbol,
and indicates whether a C program should include <malloc/malloc.h>.
i_math
From i_math.U:
This variable conditionally defines the I_MATH
symbol, and indicates
whether a C program may include <math.h>.
i_memory
From i_memory.U:
This variable conditionally defines the I_MEMORY
symbol, and indicates
whether a C program should include <memory.h>.
i_mntent
From i_mntent.U:
This variable conditionally defines the I_MNTENT
symbol, and indicates
whether a C program should include <mntent.h>.
i_ndbm
From i_ndbm.U:
This variable conditionally defines the I_NDBM
symbol, which
indicates to the C program that <ndbm.h> exists and should
be included.
i_netdb
From i_netdb.U:
This variable conditionally defines the I_NETDB
symbol, and indicates
whether a C program should include <netdb.h>.
i_neterrno
From i_neterrno.U:
This variable conditionally defines the I_NET_ERRNO
symbol, which
indicates to the C program that <net/errno.h> exists and should
be included.
i_netinettcp
From i_netinettcp.U:
This variable conditionally defines the I_NETINET_TCP
symbol,
and indicates whether a C program should include <netinet/tcp.h>.
i_niin
From i_niin.U:
This variable conditionally defines I_NETINET_IN, which indicates
to the C program that it should include <netinet/in.h>. Otherwise,
you may try <sys/in.h>.
i_poll
From i_poll.U:
This variable conditionally defines the I_POLL
symbol, and indicates
whether a C program should include <poll.h>.
i_prot
From i_prot.U:
This variable conditionally defines the I_PROT
symbol, and indicates
whether a C program should include <prot.h>.
i_pthread
From i_pthread.U:
This variable conditionally defines the I_PTHREAD
symbol,
and indicates whether a C program should include <pthread.h>.
i_pwd
From i_pwd.U:
This variable conditionally defines I_PWD, which indicates
to the C program that it should include <pwd.h>.
i_rpcsvcdbm
From i_dbm.U:
This variable conditionally defines the I_RPCSVC_DBM
symbol, which
indicates to the C program that <rpcsvc/dbm.h> exists and should
be included. Some System V systems might need this instead of <dbm.h>.
i_sfio
From i_sfio.U:
This variable conditionally defines the I_SFIO
symbol,
and indicates whether a C program should include <sfio.h>.
i_sgtty
From i_termio.U:
This variable conditionally defines the I_SGTTY
symbol, which
indicates to the C program that it should include <sgtty.h> rather
than <termio.h>.
i_shadow
From i_shadow.U:
This variable conditionally defines the I_SHADOW
symbol, and indicates
whether a C program should include <shadow.h>.
i_socks
From i_socks.U:
This variable conditionally defines the I_SOCKS
symbol, and indicates
whether a C program should include <socks.h>.
i_stdarg
From i_varhdr.U:
This variable conditionally defines the I_STDARG
symbol, which
indicates to the C program that <stdarg.h> exists and should
be included.
i_stdbool
From i_stdbool.U:
This variable conditionally defines the I_STDBOOL
symbol, which
indicates to the C program that <stdbool.h> exists and should
be included.
i_stddef
From i_stddef.U:
This variable conditionally defines the I_STDDEF
symbol, which
indicates to the C program that <stddef.h> exists and should
be included.
i_stdlib
From i_stdlib.U:
This variable conditionally defines the I_STDLIB
symbol, which
indicates to the C program that <stdlib.h> exists and should
be included.
i_string
From i_string.U:
This variable conditionally defines the I_STRING
symbol, which
indicates that <string.h> should be included rather than <strings.h>.
i_sunmath
From i_sunmath.U:
This variable conditionally defines the I_SUNMATH
symbol, and indicates
whether a C program should include <sunmath.h>.
i_sysaccess
From i_sysaccess.U:
This variable conditionally defines the I_SYS_ACCESS
symbol,
and indicates whether a C program should include <sys/access.h>.
i_sysdir
From i_sysdir.U:
This variable conditionally defines the I_SYS_DIR
symbol, and indicates
whether a C program should include <sys/dir.h>.
i_sysfile
From i_sysfile.U:
This variable conditionally defines the I_SYS_FILE
symbol, and indicates
whether a C program should include <sys/file.h> to get R_OK
and friends.
i_sysfilio
From i_sysioctl.U:
This variable conditionally defines the I_SYS_FILIO
symbol, which
indicates to the C program that <sys/filio.h> exists and should
be included in preference to <sys/ioctl.h>.
i_sysin
From i_niin.U:
This variable conditionally defines I_SYS_IN, which indicates
to the C program that it should include <sys/in.h> instead of
<netinet/in.h>.
i_sysioctl
From i_sysioctl.U:
This variable conditionally defines the I_SYS_IOCTL
symbol, which
indicates to the C program that <sys/ioctl.h> exists and should
be included.
i_syslog
From i_syslog.U:
This variable conditionally defines the I_SYSLOG
symbol,
and indicates whether a C program should include <syslog.h>.
i_sysmman
From i_sysmman.U:
This variable conditionally defines the I_SYS_MMAN
symbol, and
indicates whether a C program should include <sys/mman.h>.
i_sysmode
From i_sysmode.U:
This variable conditionally defines the I_SYSMODE
symbol,
and indicates whether a C program should include <sys/mode.h>.
i_sysmount
From i_sysmount.U:
This variable conditionally defines the I_SYSMOUNT
symbol,
and indicates whether a C program should include <sys/mount.h>.
i_sysndir
From i_sysndir.U:
This variable conditionally defines the I_SYS_NDIR
symbol, and indicates
whether a C program should include <sys/ndir.h>.
i_sysparam
From i_sysparam.U:
This variable conditionally defines the I_SYS_PARAM
symbol, and indicates
whether a C program should include <sys/param.h>.
i_syspoll
From i_syspoll.U:
This variable conditionally defines the I_SYS_POLL
symbol, which
indicates to the C program that it should include <sys/poll.h>.
i_sysresrc
From i_sysresrc.U:
This variable conditionally defines the I_SYS_RESOURCE
symbol,
and indicates whether a C program should include <sys/resource.h>.
i_syssecrt
From i_syssecrt.U:
This variable conditionally defines the I_SYS_SECURITY
symbol,
and indicates whether a C program should include <sys/security.h>.
i_sysselct
From i_sysselct.U:
This variable conditionally defines I_SYS_SELECT, which indicates
to the C program that it should include <sys/select.h> in order to
get the definition of struct timeval.
i_syssockio
From i_sysioctl.U:
This variable conditionally defines I_SYS_SOCKIO
to indicate to the
C program that socket ioctl codes may be found in <sys/sockio.h>
instead of <sys/ioctl.h>.
i_sysstat
From i_sysstat.U:
This variable conditionally defines the I_SYS_STAT
symbol,
and indicates whether a C program should include <sys/stat.h>.
i_sysstatfs
From i_sysstatfs.U:
This variable conditionally defines the I_SYSSTATFS
symbol,
and indicates whether a C program should include <sys/statfs.h>.
i_sysstatvfs
From i_sysstatvfs.U:
This variable conditionally defines the I_SYSSTATVFS
symbol,
and indicates whether a C program should include <sys/statvfs.h>.
i_systime
From i_time.U:
This variable conditionally defines I_SYS_TIME, which indicates
to the C program that it should include <sys/time.h>.
i_systimek
From i_time.U:
This variable conditionally defines I_SYS_TIME_KERNEL, which
indicates to the C program that it should include <sys/time.h>
with KERNEL defined.
i_systimes
From i_systimes.U:
This variable conditionally defines the I_SYS_TIMES
symbol, and indicates
whether a C program should include <sys/times.h>.
i_systypes
From i_systypes.U:
This variable conditionally defines the I_SYS_TYPES
symbol,
and indicates whether a C program should include <sys/types.h>.
i_sysuio
From i_sysuio.U:
This variable conditionally defines the I_SYSUIO
symbol, and indicates
whether a C program should include <sys/uio.h>.
i_sysun
From i_sysun.U:
This variable conditionally defines I_SYS_UN, which indicates
to the C program that it should include <sys/un.h> to get UNIX
domain socket definitions.
i_sysutsname
From i_sysutsname.U:
This variable conditionally defines the I_SYSUTSNAME
symbol,
and indicates whether a C program should include <sys/utsname.h>.
i_sysvfs
From i_sysvfs.U:
This variable conditionally defines the I_SYSVFS
symbol,
and indicates whether a C program should include <sys/vfs.h>.
i_syswait
From i_syswait.U:
This variable conditionally defines I_SYS_WAIT, which indicates
to the C program that it should include <sys/wait.h>.
i_termio
From i_termio.U:
This variable conditionally defines the I_TERMIO
symbol, which
indicates to the C program that it should include <termio.h> rather
than <sgtty.h>.
i_termios
From i_termio.U:
This variable conditionally defines the I_TERMIOS
symbol, which
indicates to the C program that the POSIX
<termios.h> file is
to be included.
i_time
From i_time.U:
This variable conditionally defines I_TIME, which indicates
to the C program that it should include <time.h>.
i_unistd
From i_unistd.U:
This variable conditionally defines the I_UNISTD
symbol, and indicates
whether a C program should include <unistd.h>.
i_ustat
From i_ustat.U:
This variable conditionally defines the I_USTAT
symbol, and indicates
whether a C program should include <ustat.h>.
i_utime
From i_utime.U:
This variable conditionally defines the I_UTIME
symbol, and indicates
whether a C program should include <utime.h>.
i_values
From i_values.U:
This variable conditionally defines the I_VALUES
symbol, and indicates
whether a C program may include <values.h> to get symbols like MAXLONG
and friends.
i_varargs
From i_varhdr.U:
This variable conditionally defines I_VARARGS, which indicates
to the C program that it should include <varargs.h>.
i_varhdr
From i_varhdr.U:
Contains the name of the header to be included to get va_dcl definition. Typically one of varargs.h or stdarg.h.
i_vfork
From i_vfork.U:
This variable conditionally defines the I_VFORK
symbol, and indicates
whether a C program should include vfork.h.
ignore_versioned_solibs
From libs.U:
This variable should be non-empty if versioned shared libraries (libfoo.so.x.y) are to be ignored (because they cannot be linked against).
inc_version_list
From inc_version_list.U:
This variable specifies the list of subdirectories in which
perl.c:incpush() and lib/lib.pm will automatically
search when adding directories to @INC. The elements in
the list are separated by spaces. This is only useful
if you have a perl library directory tree structured like the
default one. See INSTALL for how this works. The versioned
site_perl directory was introduced in 5.005, so that is the
lowest possible value.
This list includes architecture-dependent directories back to version $api_versionstring (e.g. 5.5.640) and architecture-independent directories all the way back to 5.005.
inc_version_list_init
From inc_version_list.U:
This variable holds the same list as inc_version_list, but
each item is enclosed in double quotes and separated by commas,
suitable for use in the PERL_INC_VERSION_LIST
initialization.
incpath
From usrinc.U:
This variable must precede the normal include path to get the right one, as in $incpath/usr/include or $incpath/usr/lib. Value can be "" or /bsd43 on mips.
inews
From Loc.U:
This variable is defined but not used by Configure. The value is the empty string and is not useful.
initialinstalllocation
From bin.U:
When userelocatableinc is true, this variable holds the location that make install should copy the perl binary to, with all the run-time relocatable paths calculated from this at install time. When used, it is initialized to the original value of binexp, and then binexp is set to .../, as the other binaries are found relative to the perl binary.
installarchlib
From archlib.U:
This variable is really the same as archlibexp but may differ on
those systems using AFS. For extra portability, only this variable
should be used in makefiles.
installbin
From bin.U:
This variable is the same as binexp unless AFS is running, in which case
the user is explicitly prompted for it. This variable should always
be used in your makefiles for maximum portability.
installhtml1dir
From html1dir.U:
This variable is really the same as html1direxp, unless you are using a different installprefix. For extra portability, you should only use this variable within your makefiles.
installhtml3dir
From html3dir.U:
This variable is really the same as html3direxp, unless you are using a different installprefix. For extra portability, you should only use this variable within your makefiles.
installman1dir
From man1dir.U:
This variable is really the same as man1direxp, unless you are using
AFS, in which case it points to the read/write location whereas
man1direxp only points to the read-only access location. For extra
portability, you should only use this variable within your makefiles.
installman3dir
From man3dir.U:
This variable is really the same as man3direxp, unless you are using
AFS, in which case it points to the read/write location whereas
man3direxp only points to the read-only access location. For extra
portability, you should only use this variable within your makefiles.
installprefix
From installprefix.U:
This variable holds the name of the directory below which "make install" will install the package. For most users, this is the same as prefix. However, it is useful for installing the software into a different (usually temporary) location after which it can be bundled up and moved somehow to the final location specified by prefix.
installprefixexp
From installprefix.U:
This variable holds the full absolute path of installprefix with all ~-expansion done.
installprivlib
From privlib.U:
This variable is really the same as privlibexp but may differ on
those systems using AFS. For extra portability, only this variable
should be used in makefiles.
installscript
From scriptdir.U:
This variable is usually the same as scriptdirexp, unless you are on
a system running AFS, in which case they may differ slightly. You
should always use this variable within your makefiles for portability.
installsitearch
From sitearch.U:
This variable is really the same as sitearchexp but may differ on
those systems using AFS. For extra portability, only this variable
should be used in makefiles.
installsitebin
From sitebin.U:
This variable is usually the same as sitebinexp, unless you are on
a system running AFS, in which case they may differ slightly. You
should always use this variable within your makefiles for portability.
installsitehtml1dir
From sitehtml1dir.U:
This variable is really the same as sitehtml1direxp, unless you are using
AFS, in which case it points to the read/write location whereas
sitehtml1direxp only points to the read-only access location. For extra
portability, you should only use this variable within your makefiles.
installsitehtml3dir
From sitehtml3dir.U:
This variable is really the same as sitehtml3direxp, unless you are using
AFS, in which case it points to the read/write location whereas
sitehtml3direxp only points to the read-only access location. For extra
portability, you should only use this variable within your makefiles.
installsitelib
From sitelib.U:
This variable is really the same as sitelibexp but may differ on
those systems using AFS. For extra portability, only this variable
should be used in makefiles.
installsiteman1dir
From siteman1dir.U:
This variable is really the same as siteman1direxp, unless you are using
AFS, in which case it points to the read/write location whereas
siteman1direxp only points to the read-only access location. For extra
portability, you should only use this variable within your makefiles.
installsiteman3dir
From siteman3dir.U:
This variable is really the same as siteman3direxp, unless you are using
AFS, in which case it points to the read/write location whereas
siteman3direxp only points to the read-only access location. For extra
portability, you should only use this variable within your makefiles.
installsitescript
From sitescript.U:
This variable is usually the same as sitescriptexp, unless you are on
a system running AFS, in which case they may differ slightly. You
should always use this variable within your makefiles for portability.
installstyle
From installstyle.U:
This variable describes the style
of the perl installation.
This is intended to be useful for tools that need to
manipulate entire perl distributions. Perl itself doesn't use
this to find its libraries -- the library directories are
stored directly in Config.pm. Currently, there are only two
styles: lib
and lib/perl5. The default library locations
(e.g. privlib, sitelib) are either $prefix/lib or
$prefix/lib/perl5. The former is useful if $prefix is a
directory dedicated to perl (e.g. /opt/perl), while the latter
is useful if $prefix is shared by many packages, e.g. if
$prefix=/usr/local.
Unfortunately, while this style
variable is used to set
defaults for all three directory hierarchies (core, vendor, and
site), there is no guarantee that the same style is actually
appropriate for all those directories. For example, $prefix
might be /opt/perl, but $siteprefix might be /usr/local.
(Perhaps, in retrospect, the lib
style should never have been
supported, but it did seem like a nice idea at the time.)
The situation is even less clear for tools such as MakeMaker
that can be used to install additional modules into
non-standard places. For example, if a user intends to install
a module into a private directory (perhaps by setting PREFIX
on
the Makefile.PL command line), then there is no reason to
assume that the Configure-time $installstyle setting will be
relevant for that PREFIX
.
This may later be extended to include other information, so be careful with pattern-matching on the results.
For compatibility with perl5.005 and earlier, the default
setting is based on whether or not $prefix contains the string
perl.
installusrbinperl
From instubperl.U:
This variable tells whether Perl should also be installed as /usr/bin/perl in addition to $installbin/perl.
installvendorarch
From vendorarch.U:
This variable is really the same as vendorarchexp but may differ on
those systems using AFS. For extra portability, only this variable
should be used in makefiles.
installvendorbin
From vendorbin.U:
This variable is really the same as vendorbinexp but may differ on
those systems using AFS. For extra portability, only this variable
should be used in makefiles.
installvendorhtml1dir
From vendorhtml1dir.U:
This variable is really the same as vendorhtml1direxp but may differ on
those systems using AFS. For extra portability, only this variable
should be used in makefiles.
installvendorhtml3dir
From vendorhtml3dir.U:
This variable is really the same as vendorhtml3direxp but may differ on
those systems using AFS. For extra portability, only this variable
should be used in makefiles.
installvendorlib
From vendorlib.U:
This variable is really the same as vendorlibexp but may differ on
those systems using AFS. For extra portability, only this variable
should be used in makefiles.
installvendorman1dir
From vendorman1dir.U:
This variable is really the same as vendorman1direxp but may differ on
those systems using AFS. For extra portability, only this variable
should be used in makefiles.
installvendorman3dir
From vendorman3dir.U:
This variable is really the same as vendorman3direxp but may differ on
those systems using AFS. For extra portability, only this variable
should be used in makefiles.
installvendorscript
From vendorscript.U:
This variable is really the same as vendorscriptexp but may differ on
those systems using AFS. For extra portability, only this variable
should be used in makefiles.
intsize
From intsize.U:
This variable contains the value of the INTSIZE
symbol, which
indicates to the C program how many bytes there are in an int.
issymlink
From issymlink.U:
This variable holds the test command to test for a symbolic link
(if they are supported). Typical values include test -h and test -L.
ivdformat
From perlxvf.U:
This variable contains the format string used for printing
a Perl IV as a signed decimal integer.
ivsize
From perlxv.U:
This variable is the size of an IV in bytes.
ivtype
From perlxv.U:
This variable contains the C type used for Perl's IV.
known_extensions
From Extensions.U:
This variable holds a list of all XS
extensions included in
the package.
ksh
From Loc.U:
This variable is defined but not used by Configure. The value is the empty string and is not useful.
ld
From dlsrc.U:
This variable indicates the program to be used to link
libraries for dynamic loading. On some systems, it is ld. On ELF
systems, it should be $cc. Mostly, we'll try to respect
the hint file setting.
ld_can_script
From dlsrc.U:
This variable shows if the loader accepts scripts in the form of
-Wl,--version-script=ld.script. This is currently only supported
for GNU
ld on ELF
in dynamic loading builds.
lddlflags
From dlsrc.U:
This variable contains any special flags that might need to be
passed to $ld to create a shared library suitable for dynamic
loading. It is up to the makefile to use it. For hpux, it
should be -b. For sunos 4.1, it is empty.
ldflags
From ccflags.U:
This variable contains any additional C loader flags desired by the user. It is up to the Makefile to use this.
ldflags_uselargefiles
From uselfs.U:
This variable contains the loader flags needed by large file builds and added to ldflags by hints files.
ldlibpthname
From libperl.U:
This variable holds the name of the shared library
search path, often LD_LIBRARY_PATH. To get an empty
string, the hints file must set this to none.
less
From Loc.U:
This variable is used internally by Configure to determine the
full pathname (if any) of the less program. After Configure runs,
the value is reset to a plain less and is not useful.
lib_ext
From Unix.U:
This is an old synonym for _a.
libc
From libc.U:
This variable contains the location of the C library.
libperl
From libperl.U:
The perl executable is obtained by linking perlmain.c with libperl, any static extensions (usually just DynaLoader), and any other libraries needed on this system. libperl is usually libperl.a, but can also be libperl.so.xxx if the user wishes to build a perl executable with a shared library.
libpth
From libpth.U:
This variable holds the general path (space-separated) used to find libraries. It is intended to be used by other units.
libs
From libs.U:
This variable holds the additional libraries we want to use. It is up to the Makefile to deal with it. The list can be empty.
libsdirs
From libs.U:
This variable holds the directory names (aka dirnames) of the libraries we found and accepted; duplicates are removed.
libsfiles
From libs.U:
This variable holds the filenames (aka basenames) of the libraries we found and accepted.
libsfound
From libs.U:
This variable holds the full pathnames of the libraries we found and accepted.
libspath
From libs.U:
This variable holds the directory names probed for libraries.
libswanted
From Myinit.U:
This variable holds a list of all the libraries we want to search. The order is chosen to pick up the c library ahead of ucb or bsd libraries for SVR4.
libswanted_uselargefiles
From uselfs.U:
This variable contains the libraries needed by large file builds and added to ldflags by hints files. It is a space-separated list of the library names without the lib prefix or any suffix, just like libswanted.
line
From Loc.U:
This variable is defined but not used by Configure. The value is the empty string and is not useful.
lint
From Loc.U:
This variable is defined but not used by Configure. The value is the empty string and is not useful.
lkflags
From ccflags.U:
This variable contains any additional C partial linker flags desired by the user. It is up to the Makefile to use this.
ln
From Loc.U:
This variable is used internally by Configure to determine the
full pathname (if any) of the ln program. After Configure runs,
the value is reset to a plain ln and is not useful.
lns
From lns.U:
This variable holds the name of the command to make symbolic links (if they are supported). It can be used in the Makefile. It is either ln -s or ln.
localtime_r_proto
From d_localtime_r.U:
This variable encodes the prototype of localtime_r. It is zero if d_localtime_r is undef, and one of the REENTRANT_PROTO_T_ABC macros of reentr.h if d_localtime_r is defined.
locincpth
From ccflags.U:
This variable contains a list of additional directories to be searched by the compiler. The appropriate -I directives will be added to ccflags. This is intended to simplify setting local directories from the Configure command line. It's not much, but it parallels the loclibpth stuff in libpth.U.
loclibpth
From libpth.U:
This variable holds the paths (space-separated) used to find local libraries. It is prepended to libpth, and is intended to be easily set from the command line.
longdblsize
From d_longdbl.U:
This variable contains the value of the LONG_DOUBLESIZE symbol, which indicates to the C program how many bytes there are in a long double, if this system supports long doubles.
longlongsize
From d_longlong.U:
This variable contains the value of the LONGLONGSIZE symbol, which indicates to the C program how many bytes there are in a long long, if this system supports long long.
longsize
From intsize.U:
This variable contains the value of the LONGSIZE symbol, which indicates to the C program how many bytes there are in a long.
lp
From Loc.U:
This variable is defined but not used by Configure. The value is the empty string and is not useful.
lpr
From Loc.U:
This variable is defined but not used by Configure. The value is the empty string and is not useful.
ls
From Loc.U:
This variable is used internally by Configure to determine the
full pathname (if any) of the ls program. After Configure runs,
the value is reset to a plain ls and is not useful.
lseeksize
From lseektype.U:
This variable defines lseeksize to be the size, in bytes, of lseektype, the type used to declare lseek offsets.
lseektype
From lseektype.U:
This variable defines lseektype to be something like off_t, long, or whatever type is used to declare lseek offset's type in the kernel (which also appears to be lseek's return type).
mad
From mad.U:
This variable indicates that the Misc Attribute Definition code is to be compiled.
madlyh
From mad.U:
If the Misc Attribute Decoration is to be compiled, this variable is set to the name of the extra header files to be used, else it is ''.
madlyobj
From mad.U:
If the Misc Attribute Decoration is to be compiled, this variable is set to the name of the extra object files to be used, else it is ''.
madlysrc
From mad.U:
If the Misc Attribute Decoration is to be compiled, this variable is set to the name of the extra C source files to be used, else it is ''.
mail
From Loc.U:
This variable is defined but not used by Configure. The value is the empty string and is not useful.
mailx
From Loc.U:
This variable is defined but not used by Configure. The value is the empty string and is not useful.
make
From Loc.U:
This variable is used internally by Configure to determine the
full pathname (if any) of the make program. After Configure runs,
the value is reset to a plain make and is not useful.
make_set_make
From make.U:
Some versions of make set the variable MAKE. Others do not. This variable contains the string to be included in Makefile.SH so that MAKE is set if needed, and not if not needed. Possible values are:
make_set_make='#'           # If your make program handles this for you,
make_set_make="MAKE=$make"  # if it doesn't.
This uses a comment character so that we can distinguish a set value (from a previous config.sh or Configure -D option) from an uncomputed value.
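As a sketch of the mechanism (with hypothetical values), this is how a Makefile.SH-style template splices $make_set_make into the generated Makefile, so that MAKE is defined only when the local make program needs it:

```shell
#!/bin/sh
# Hypothetical value; Configure would instead set '#' if this
# make program already defines MAKE by itself.
make_set_make='MAKE=make'
# A Makefile.SH-style heredoc expands the variable while writing
# the Makefile text to stdout.
cat <<EOF
$make_set_make
all:
	@echo done
EOF
```

When $make_set_make is '#', the emitted line is just a comment, which is how the "already handled" case stays harmless in the Makefile.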
mallocobj
From mallocsrc.U:
This variable contains the name of the malloc.o that this package generates, if that malloc.o is preferred over the system malloc. Otherwise the value is null. This variable is intended for generating Makefiles. See mallocsrc.
mallocsrc
From mallocsrc.U:
This variable contains the name of the malloc.c that comes with the package, if that malloc.c is preferred over the system malloc. Otherwise the value is null. This variable is intended for generating Makefiles.
malloctype
From mallocsrc.U:
This variable contains the kind of ptr returned by malloc and realloc.
man1dir
From man1dir.U:
This variable contains the name of the directory in which manual source pages are to be put. It is the responsibility of the Makefile.SH to get the value of this into the proper command. You must be prepared to do the ~name expansion yourself.
man1direxp
From man1dir.U:
This variable is the same as the man1dir variable, but is filename expanded at configuration time, for convenient use in makefiles.
man1ext
From man1dir.U:
This variable contains the extension that the manual page should have: one of n, l, or 1. The Makefile must supply the dot (.). See man1dir.
man3dir
From man3dir.U:
This variable contains the name of the directory in which manual source pages are to be put. It is the responsibility of the Makefile.SH to get the value of this into the proper command. You must be prepared to do the ~name expansion yourself.
man3direxp
From man3dir.U:
This variable is the same as the man3dir variable, but is filename expanded at configuration time, for convenient use in makefiles.
man3ext
From man3dir.U:
This variable contains the extension that the manual page should have: one of n, l, or 3. The Makefile must supply the dot (.). See man3dir.
mips_type
From usrinc.U:
This variable holds the environment type for the mips system. Possible values are "BSD 4.3" and "System V".
mistrustnm
From Csym.U:
This variable can be used to establish a fallthrough for the cases where nm fails to find a symbol. If usenm is false, or usenm is true and mistrustnm is false, this variable has no effect. If usenm is true and mistrustnm is compile, a test program will be compiled to try to find any symbol that can't be located via nm lookup. If mistrustnm is run, the test program will be run as well as being compiled.
mkdir
From Loc.U:
This variable is used internally by Configure to determine the
full pathname (if any) of the mkdir program. After Configure runs,
the value is reset to a plain mkdir and is not useful.
mmaptype
From d_mmap.U:
This symbol contains the type of pointer returned by mmap() (and simultaneously the type of the first argument). It can be void * or caddr_t.
modetype
From modetype.U:
This variable defines modetype to be something like mode_t, int, unsigned short, or whatever type is used to declare file modes for system calls.
more
From Loc.U:
This variable is used internally by Configure to determine the
full pathname (if any) of the more program. After Configure runs,
the value is reset to a plain more and is not useful.
multiarch
From multiarch.U:
This variable conditionally defines the MULTIARCH symbol, which signifies the presence of multiplatform files. This is normally set by hints files.
mv
From Loc.U:
This variable is defined but not used by Configure. The value is the empty string and is not useful.
myarchname
From archname.U:
This variable holds the architecture name computed by Configure in a previous run. It is not intended to be perused by any user and should never be set in a hint file.
mydomain
From myhostname.U:
This variable contains the eventual value of the MYDOMAIN symbol, which is the domain of the host the program is going to run on. The domain must be appended to myhostname to form a complete host name. The dot comes with mydomain, and need not be supplied by the program.
myhostname
From myhostname.U:
This variable contains the eventual value of the MYHOSTNAME symbol, which is the name of the host the program is going to run on. The domain is not kept with hostname, but must be gotten from mydomain. The dot comes with mydomain, and need not be supplied by the program.
myuname
From Oldconfig.U:
The output of uname -a if available, otherwise the hostname. The whole thing is then lower-cased and slashes and single quotes are removed.
n
From n.U:
This variable contains the -n flag if that is what causes the echo command to suppress newline. Otherwise it is null. Correct usage is $echo $n "prompt for a question: $c".
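A minimal sketch of the portable no-newline prompt idiom this entry describes (the detection logic below is illustrative, not Configure's actual n.U test):

```shell
#!/bin/sh
# BSD-style echo suppresses the newline when given -n; SysV-style echo
# instead prints -n literally and wants a trailing \c in the string.
if [ "`echo -n x`" = "-n x" ]; then
    n='' ; c='\c'     # SysV-style: echo printed -n literally
else
    n='-n' ; c=''     # BSD/POSIX-style: -n suppressed the newline
fi
echo $n "Proceed? [y/n] $c"
```

With $n and $c set this way, the same `$echo $n "...: $c"` line prompts without a trailing newline on either flavor of echo.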
need_va_copy
From need_va_copy.U:
This symbol, if defined, indicates that the system stores the variable argument list datatype, va_list, in a format that cannot be copied by simple assignment, so that some other means must be used when copying is required. As such systems vary in their provision (or non-provision) of copying mechanisms, handy.h defines a platform-independent macro, Perl_va_copy(src, dst), to do the job.
netdb_hlen_type
From netdbtype.U:
This variable holds the type used for the 2nd argument to gethostbyaddr(). Usually, this is int or size_t or unsigned. This is only useful if you have gethostbyaddr(), naturally.
netdb_host_type
From netdbtype.U:
This variable holds the type used for the 1st argument to gethostbyaddr(). Usually, this is char * or void *, possibly with or without a const prefix. This is only useful if you have gethostbyaddr(), naturally.
netdb_name_type
From netdbtype.U:
This variable holds the type used for the argument to gethostbyname(). Usually, this is char * or const char *. This is only useful if you have gethostbyname(), naturally.
netdb_net_type
From netdbtype.U:
This variable holds the type used for the 1st argument to getnetbyaddr(). Usually, this is int or long. This is only useful if you have getnetbyaddr(), naturally.
nm
From Loc.U:
This variable is used internally by Configure to determine the
full pathname (if any) of the nm program. After Configure runs,
the value is reset to a plain nm and is not useful.
nm_opt
From usenm.U:
This variable holds the options that may be necessary for nm.
nm_so_opt
From usenm.U:
This variable holds the options that may be necessary for nm to work on a shared library but that can not be used on an archive library. Currently, this is only used by Linux, where nm --dynamic is *required* to get symbols from an ELF library which has been stripped, but nm --dynamic is *fatal* on an archive library. Maybe Linux should just always set usenm=false.
nonxs_ext
From Extensions.U:
This variable holds a list of all non-xs extensions included in the package. All of them will be built.
nroff
From Loc.U:
This variable is used internally by Configure to determine the
full pathname (if any) of the nroff program. After Configure runs,
the value is reset to a plain nroff and is not useful.
nv_overflows_integers_at
From perlxv.U:
This variable gives the largest integer value that NVs can hold as a constant floating point expression. If it could not be determined, it holds the value 0.
nv_preserves_uv_bits
From perlxv.U:
This variable indicates how many bits of the uvtype a variable of type nvtype can preserve.
nveformat
From perlxvf.U:
This variable contains the format string used for printing a Perl NV using %e-ish floating point format.
nvEUformat
From perlxvf.U:
This variable contains the format string used for printing a Perl NV using %E-ish floating point format.
nvfformat
From perlxvf.U:
This variable contains the format string used for printing a Perl NV using %f-ish floating point format.
nvFUformat
From perlxvf.U:
This variable contains the format string used for printing a Perl NV using %F-ish floating point format.
nvgformat
From perlxvf.U:
This variable contains the format string used for printing a Perl NV using %g-ish floating point format.
nvGUformat
From perlxvf.U:
This variable contains the format string used for printing a Perl NV using %G-ish floating point format.
nvsize
From perlxv.U:
This variable is the size of an NV in bytes.
nvtype
From perlxv.U:
This variable contains the C type used for Perl's NV.
o_nonblock
From nblock_io.U:
This variable bears the symbol value to be used during open() or fcntl() to turn on non-blocking I/O for a file descriptor. If you wish to switch between blocking and non-blocking, you may try ioctl(FIOSNBIO) instead, but that is only supported by some devices.
obj_ext
From Unix.U:
This is an old synonym for _o.
old_pthread_create_joinable
From d_pthrattrj.U:
This variable defines the constant to use for creating joinable (aka undetached) pthreads. Unused if pthread.h defines PTHREAD_CREATE_JOINABLE. If used, possible values are PTHREAD_CREATE_UNDETACHED and __UNDETACHED.
optimize
From ccflags.U:
This variable contains any optimizer/debugger flag that should be used. It is up to the Makefile to use it.
orderlib
From orderlib.U:
This variable is true if the components of libraries must be ordered (with `lorder $* | tsort`) before placing them in an archive. Set to false if ranlib or ar can generate random libraries.
osname
From Oldconfig.U:
This variable contains the operating system name (e.g. sunos, solaris, hpux, etc.). It can be useful later on for setting defaults. Any spaces are replaced with underscores. It is set to a null string if we can't figure it out.
osvers
From Oldconfig.U:
This variable contains the operating system version (e.g. 4.1.3, 5.2, etc.). It is primarily used for helping select an appropriate hints file, but might be useful elsewhere for setting defaults. It is set to '' if we can't figure it out. We try to be flexible about how much of the version number to keep, e.g. if 4.1.1, 4.1.2, and 4.1.3 are essentially the same for this package, hints files might just be os_4.0 or os_4.1, etc., not keeping separate files for each little release.
otherlibdirs
From otherlibdirs.U:
This variable contains a colon-separated set of paths for the perl binary to search for additional library files or modules. These directories will be tacked to the end of @INC. Perl will automatically search below each path for version- and architecture-specific directories. See inc_version_list for more details. A value of ' ' means 'none' and is used to preserve this value for the next run through Configure.
package
From package.U:
This variable contains the name of the package being constructed. It is primarily intended for the use of later Configure units.
pager
From pager.U:
This variable contains the name of the preferred pager on the system. Usual values are (the full pathnames of) more, less, pg, or cat.
passcat
From nis.U:
This variable contains a command that produces the text of the /etc/passwd file. This is normally "cat /etc/passwd", but can be "ypcat passwd" when NIS is used. On some systems, such as os390, there may be no equivalent command, in which case this variable is unset.
patchlevel
From patchlevel.U:
The patch level of this package. The value of patchlevel comes from the patchlevel.h file. In a version number such as 5.6.1, this is the 6. In patchlevel.h, this is referred to as PERL_VERSION.
path_sep
From Unix.U:
This is an old synonym for p_ in Head.U, the character used to separate elements in the command shell search PATH.
perl
From Loc.U:
This variable is used internally by Configure to determine the
full pathname (if any) of the perl program. After Configure runs,
the value is reset to a plain perl and is not useful.
perl5
From perl5.U:
This variable contains the full path (if any) to a previously installed perl5.005 or later suitable for running the script to determine inc_version_list.
PERL_API_REVISION
From patchlevel.h:
This number describes the earliest compatible PERL_REVISION of Perl (compatibility here being defined as sufficient binary/API compatibility to run XS code built with the older version). Normally this does not change across maintenance releases. Please read the comment in patchlevel.h.
PERL_API_SUBVERSION
From patchlevel.h:
This number describes the earliest compatible PERL_SUBVERSION of Perl (compatibility here being defined as sufficient binary/API compatibility to run XS code built with the older version). Normally this does not change across maintenance releases. Please read the comment in patchlevel.h.
PERL_API_VERSION
From patchlevel.h:
This number describes the earliest compatible PERL_VERSION of Perl (compatibility here being defined as sufficient binary/API compatibility to run XS code built with the older version). Normally this does not change across maintenance releases. Please read the comment in patchlevel.h.
PERL_CONFIG_SH
From Oldsyms.U:
This is set to true in config.sh so that a shell script sourcing config.sh can tell if it has been sourced already.
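A minimal sketch of the re-sourcing guard this variable enables (the excerpt of config.sh below is hypothetical):

```shell
#!/bin/sh
cfg=/tmp/cfg_sample.$$
# Hypothetical excerpt of config.sh; the real file sets
# PERL_CONFIG_SH=true among its first assignments.
cat > "$cfg" <<'EOF'
PERL_CONFIG_SH=true
osname='linux'
EOF
# Source it only if this shell has not already done so.
if [ "$PERL_CONFIG_SH" != true ]; then
    . "$cfg"
fi
echo "osname=$osname"
rm -f "$cfg"
```

On a second pass through the same shell, the test is false and the file is not re-sourced, so the guard protects against redundant (and potentially slow) re-reads.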
PERL_PATCHLEVEL
From Oldsyms.U:
This symbol reflects the patchlevel, if available. Will usually come from the .patch file, which is available when the perl source tree was fetched with rsync.
perl_patchlevel
From patchlevel.U:
This is the Perl patch level, a numeric change identifier, as defined by whichever source code maintenance system is used to maintain the patches; currently Perforce. It does not correlate with the Perl version numbers or the maintenance versus development dichotomy except by also being increasing.
PERL_REVISION
From Oldsyms.U:
In a Perl version number such as 5.6.2, this is the 5. This value is manually set in patchlevel.h.
perl_static_inline
From d_static_inline.U:
This variable defines the PERL_STATIC_INLINE symbol to the best-guess incantation to use for static inline functions. Possibilities include:
static inline (c99)
static __inline__ (gcc -ansi)
static __inline (MSVC)
static _inline (older MSVC)
static (c89 compilers)
PERL_SUBVERSION
From Oldsyms.U:
In a Perl version number such as 5.6.2, this is the 2. Values greater than 50 represent potentially unstable development subversions. This value is manually set in patchlevel.h.
PERL_VERSION
From Oldsyms.U:
In a Perl version number such as 5.6.2, this is the 6. This value is manually set in patchlevel.h.
perladmin
From perladmin.U:
Electronic mail address of the perl5 administrator.
perllibs
From End.U:
The list of libraries needed by Perl only (any libraries needed by extensions only will be dropped, if using dynamic loading).
perlpath
From perlpath.U:
This variable contains the eventual value of the PERLPATH symbol, which contains the name of the perl interpreter to be used in shell scripts and in the "eval exec" idiom. This variable is not necessarily the pathname of the file containing the perl interpreter; you must append the executable extension (_exe) if it is not already present. Note that Perl code that runs during the Perl build process cannot reference this variable, as Perl may not have been installed, or even if installed, may be a different version of Perl.
pg
From Loc.U:
This variable is used internally by Configure to determine the
full pathname (if any) of the pg program. After Configure runs,
the value is reset to a plain pg and is not useful.
phostname
From myhostname.U:
This variable contains the eventual value of the PHOSTNAME symbol, which is a command that can be fed to popen() to get the host name. The program should probably not presume that the domain is or isn't there already.
pidtype
From pidtype.U:
This variable defines PIDTYPE to be something like pid_t, int, ushort, or whatever type is used to declare process ids in the kernel.
plibpth
From libpth.U:
Holds the private path used by Configure to find out the libraries. Its value is prepended to libpth. This variable takes care of special machines, like the mips. Usually, it should be empty.
pmake
From Loc.U:
This variable is defined but not used by Configure. The value is the empty string and is not useful.
pr
From Loc.U:
This variable is defined but not used by Configure. The value is the empty string and is not useful.
prefix
From prefix.U:
This variable holds the name of the directory below which the user will install the package. Usually, this is /usr/local, and executables go in /usr/local/bin, library stuff in /usr/local/lib, man pages in /usr/local/man, etc. It is only used to set defaults for things in bin.U, mansrc.U, privlib.U, or scriptdir.U.
prefixexp
From prefix.U:
This variable holds the full absolute path of the directory below which the user will install the package. Derived from prefix.
privlib
From privlib.U:
This variable contains the eventual value of the PRIVLIB symbol, which is the name of the private library for this package. It may have a ~ on the front. It is up to the makefile to eventually create this directory while performing installation (with ~ substitution).
privlibexp
From privlib.U:
This variable is the ~name expanded version of privlib, so that you may use it directly in Makefiles or shell scripts.
procselfexe
From d_procselfexe.U:
If d_procselfexe is defined, $procselfexe is the filename of the symbolic link pointing to the absolute pathname of the executing program.
prototype
From prototype.U:
This variable holds the eventual value of CAN_PROTOTYPE, which indicates the C compiler can handle function prototypes.
ptrsize
From ptrsize.U:
This variable contains the value of the PTRSIZE symbol, which indicates to the C program how many bytes there are in a pointer.
quadkind
From quadtype.U:
This variable, if defined, encodes the type of a quad: 1 = int, 2 = long, 3 = long long, 4 = int64_t.
quadtype
From quadtype.U:
This variable defines Quad_t to be something like long, int, long long, int64_t, or whatever type is used for 64-bit integers.
randbits
From randfunc.U:
Indicates how many bits are produced by the function used to generate normalized random numbers.
randfunc
From randfunc.U:
Indicates the name of the random number function to use.
Values include drand48, random, and rand. In C programs,
the Drand01
macro is defined to generate uniformly distributed
random numbers over the range [0., 1.[ (see drand01 and nrand).
random_r_proto
From d_random_r.U:
This variable encodes the prototype of random_r. It is zero if d_random_r is undef, and one of the REENTRANT_PROTO_T_ABC macros of reentr.h if d_random_r is defined.
randseedtype
From randfunc.U:
Indicates the type of the argument of the seedfunc.
ranlib
From orderlib.U:
This variable is set to the pathname of the ranlib program, if it is needed to generate random libraries. Set to : if ar can generate random libraries or if random libraries are not supported.
rd_nodata
From nblock_io.U:
This variable holds the return code from read() when no data is present. It should be -1, but some systems return 0 when O_NDELAY is used, which is a shame because you cannot tell the difference between no data and an EOF. Sigh!
readdir64_r_proto
From d_readdir64_r.U:
This variable encodes the prototype of readdir64_r. It is zero if d_readdir64_r is undef, and one of the REENTRANT_PROTO_T_ABC macros of reentr.h if d_readdir64_r is defined.
readdir_r_proto
From d_readdir_r.U:
This variable encodes the prototype of readdir_r. It is zero if d_readdir_r is undef, and one of the REENTRANT_PROTO_T_ABC macros of reentr.h if d_readdir_r is defined.
revision
From patchlevel.U:
The value of revision comes from the patchlevel.h file. In a version number such as 5.6.1, this is the 5. In patchlevel.h, this is referred to as PERL_REVISION.
rm
From Loc.U:
This variable is used internally by Configure to determine the
full pathname (if any) of the rm program. After Configure runs,
the value is reset to a plain rm and is not useful.
rm_try
From Unix.U:
This is a cleanup variable for try test programs. Internal Configure use only.
rmail
From Loc.U:
This variable is defined but not used by Configure. The value is the empty string and is not useful.
run
From Cross.U:
This variable contains the command used by Configure to copy and execute a cross-compiled executable in the target host. Useful and available only during Perl build. Empty string '' if not cross-compiling.
runnm
From usenm.U:
This variable contains true or false depending on whether the nm extraction should be performed or not, according to the value of usenm and the flags on the Configure command line.
sched_yield
From d_pthread_y.U:
This variable defines the way to yield the execution of the current thread.
scriptdir
From scriptdir.U:
This variable holds the name of the directory in which the user wants to put publicly executable scripts for the package in question. It is either the same directory as for binaries, or a special one that can be mounted across different architectures, like /usr/share. Programs must be prepared to deal with ~name expansion.
scriptdirexp
From scriptdir.U:
This variable is the same as scriptdir, but is filename expanded at configuration time, for programs not wanting to bother with it.
sed
From Loc.U:
This variable is used internally by Configure to determine the
full pathname (if any) of the sed program. After Configure runs,
the value is reset to a plain sed and is not useful.
seedfunc
From randfunc.U:
Indicates the random number generating seed function. Values include srand48, srandom, and srand.
selectminbits
From selectminbits.U:
This variable holds the minimum number of bits operated on by select. That is, if you do select(n, ...), how many bits at least will be cleared in the masks if some activity is detected. Usually this is either n or 32*ceil(n/32); many little-endian systems in particular do the latter. This is only useful if you have select(), naturally.
selecttype
From selecttype.U:
This variable holds the type used for the 2nd, 3rd, and 4th arguments to select. Usually, this is fd_set *, if HAS_FD_SET is defined, and int * otherwise. This is only useful if you have select(), naturally.
sendmail
From Loc.U:
This variable is defined but not used by Configure. The value is the empty string and is not useful.
setgrent_r_proto
From d_setgrent_r.U:
This variable encodes the prototype of setgrent_r. It is zero if d_setgrent_r is undef, and one of the REENTRANT_PROTO_T_ABC macros of reentr.h if d_setgrent_r is defined.
sethostent_r_proto
From d_sethostent_r.U:
This variable encodes the prototype of sethostent_r. It is zero if d_sethostent_r is undef, and one of the REENTRANT_PROTO_T_ABC macros of reentr.h if d_sethostent_r is defined.
setlocale_r_proto
From d_setlocale_r.U:
This variable encodes the prototype of setlocale_r. It is zero if d_setlocale_r is undef, and one of the REENTRANT_PROTO_T_ABC macros of reentr.h if d_setlocale_r is defined.
setnetent_r_proto
From d_setnetent_r.U:
This variable encodes the prototype of setnetent_r. It is zero if d_setnetent_r is undef, and one of the REENTRANT_PROTO_T_ABC macros of reentr.h if d_setnetent_r is defined.
setprotoent_r_proto
From d_setprotoent_r.U:
This variable encodes the prototype of setprotoent_r. It is zero if d_setprotoent_r is undef, and one of the REENTRANT_PROTO_T_ABC macros of reentr.h if d_setprotoent_r is defined.
setpwent_r_proto
From d_setpwent_r.U:
This variable encodes the prototype of setpwent_r. It is zero if d_setpwent_r is undef, and one of the REENTRANT_PROTO_T_ABC macros of reentr.h if d_setpwent_r is defined.
setservent_r_proto
From d_setservent_r.U:
This variable encodes the prototype of setservent_r. It is zero if d_setservent_r is undef, and one of the REENTRANT_PROTO_T_ABC macros of reentr.h if d_setservent_r is defined.
sGMTIME_max
From time_size.U:
This variable defines the maximum value of the time_t offset that the system function gmtime() accepts.
sGMTIME_min
From time_size.U:
This variable defines the minimum value of the time_t offset that the system function gmtime() accepts.
sh
From sh.U:
This variable contains the full pathname of the shell used
on this system to execute Bourne shell scripts. Usually, this will be
/bin/sh, though it's possible that some systems will have /bin/ksh,
/bin/pdksh, /bin/ash, /bin/bash, or even something such as
D:/bin/sh.exe.
This unit comes before Options.U, so you can't set sh with a -D option, though you can override this (and startsh) with -O -Dsh=/bin/whatever -Dstartsh=whatever.
shar
From Loc.U:
This variable is defined but not used by Configure. The value is the empty string and is not useful.
sharpbang
From spitshell.U:
This variable contains the string #! if this system supports that construct.
shmattype
From d_shmat.U:
This symbol contains the type of pointer returned by shmat().
It can be void * or char *.
shortsize
From intsize.U:
This variable contains the value of the SHORTSIZE symbol, which indicates to the C program how many bytes there are in a short.
shrpenv
From libperl.U:
If the user builds a shared libperl.so, then we need to tell the perl executable where it will be able to find the installed libperl.so. One way to do this on some systems is to set the environment variable LD_RUN_PATH to the directory that will be the final location of the shared libperl.so. The makefile can use this with something like
$shrpenv $(CC) -o perl perlmain.o $libperl $libs
Typical values are
shrpenv="env LD_RUN_PATH=$archlibexp/CORE"
or
shrpenv=''
See the main perl Makefile.SH for actual working usage. Alternatively, we might be able to use a command line option such as -R $archlibexp/CORE (Solaris) or -Wl,-rpath $archlibexp/CORE (Linux).
shsharp
From spitshell.U:
This variable tells further Configure units whether your sh can handle # comments.
sig_count
From sig_name.U:
This variable holds a number larger than the largest valid signal number. This is usually the same as the NSIG macro.
sig_name
From sig_name.U:
This variable holds the signal names, space separated. The leading SIG in signal names is removed. A ZERO is prepended to the list. This is currently not used; sig_name_init is used instead.
sig_name_init
From sig_name.U:
This variable holds the signal names, enclosed in double quotes and separated by commas, suitable for use in the SIG_NAME definition below. A ZERO is prepended to the list, and the list is terminated with a plain 0. The leading SIG in signal names is removed. See sig_num.
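The quoted, comma-separated form is designed to drop straight into a C array initializer; here is a sketch using a hypothetical subset of the list:

```shell
#!/bin/sh
# Hypothetical subset; the real sig_name_init covers every signal
# the platform defines, in signal-number order.
sig_name_init='"ZERO", "HUP", "INT", "QUIT", 0'
# Generated C code (e.g. via config.h/Makefile.SH) would embed it like:
cat <<EOF
static const char * const sig_name[] = { $sig_name_init };
EOF
```

The trailing plain 0 gives the C array a NULL terminator, so code can walk the list without knowing sig_size in advance.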
sig_num
From sig_name.U:
This variable holds the signal numbers, space separated. A ZERO is prepended to the list (corresponding to the fake SIGZERO). Those numbers correspond to the value of the signal listed in the same place within the sig_name list. This is currently not used; sig_num_init is used instead.
sig_num_init
From sig_name.U:
This variable holds the signal numbers, enclosed in double quotes and separated by commas, suitable for use in the SIG_NUM definition below. A ZERO is prepended to the list, and the list is terminated with a plain 0.
sig_size
From sig_name.U:
This variable contains the number of elements of the sig_name and sig_num arrays.
signal_t
From d_voidsig.U:
This variable holds the type of the signal handler (void or int).
sitearch
From sitearch.U:
This variable contains the eventual value of the SITEARCH symbol, which is the name of the private library for this package. It may have a ~ on the front. It is up to the makefile to eventually create this directory while performing installation (with ~ substitution). The standard distribution will put nothing in this directory. After perl has been installed, users may install their own local architecture-dependent modules in this directory with MakeMaker Makefile.PL or equivalent. See INSTALL for details.
sitearchexp
From sitearch.U:
This variable is the ~name expanded version of sitearch, so that you may use it directly in Makefiles or shell scripts.
sitebin
From sitebin.U:
This variable holds the name of the directory in which the user wants
to put add-on publicly executable files for the package in question. It
is most often a local directory such as /usr/local/bin. Programs using
this variable must be prepared to deal with ~name substitution.
The standard distribution will put nothing in this directory.
After perl has been installed, users may install their own local executables in this directory with MakeMaker Makefile.PL or equivalent. See INSTALL for details.
sitebinexp
From sitebin.U:
This is the same as the sitebin variable, but is filename expanded at configuration time, for use in your makefiles.
sitehtml1dir
From sitehtml1dir.U:
This variable contains the name of the directory in which site-specific
html source pages are to be put. It is the responsibility of the
Makefile.SH to get the value of this into the proper command.
You must be prepared to do the ~name expansion yourself.
The standard distribution will put nothing in this directory.
After perl has been installed, users may install their own local html pages in this directory with MakeMaker Makefile.PL or equivalent. See INSTALL for details.
sitehtml1direxp
From sitehtml1dir.U:
This variable is the same as the sitehtml1dir variable, but is filename expanded at configuration time, for convenient use in makefiles.
sitehtml3dir
From sitehtml3dir.U:
This variable contains the name of the directory in which site-specific
library html source pages are to be put. It is the responsibility of the
Makefile.SH to get the value of this into the proper command.
You must be prepared to do the ~name expansion yourself.
The standard distribution will put nothing in this directory.
After perl has been installed, users may install their own local library html pages in this directory with MakeMaker Makefile.PL or equivalent. See INSTALL for details.
sitehtml3direxp
From sitehtml3dir.U:
This variable is the same as the sitehtml3dir variable, but is filename expanded at configuration time, for convenient use in makefiles.
sitelib
From sitelib.U:
This variable contains the eventual value of the SITELIB symbol, which is the name of the private library for this package. It may have a ~ on the front. It is up to the makefile to eventually create this directory while performing installation (with ~ substitution). The standard distribution will put nothing in this directory. After perl has been installed, users may install their own local architecture-independent modules in this directory with MakeMaker Makefile.PL or equivalent. See INSTALL for details.
sitelib_stem
From sitelib.U:
This variable is $sitelibexp with any trailing version-specific component removed. The elements in inc_version_list (inc_version_list.U) can be tacked onto this variable to generate a list of directories to search.
sitelibexp
From sitelib.U:
This variable is the ~name expanded version of sitelib, so that you may use it directly in Makefiles or shell scripts.
siteman1dir
From siteman1dir.U:
This variable contains the name of the directory in which site-specific
manual source pages are to be put. It is the responsibility of the
Makefile.SH to get the value of this into the proper command.
You must be prepared to do the ~name expansion yourself.
The standard distribution will put nothing in this directory.
After perl has been installed, users may install their own local man1 pages in this directory with MakeMaker Makefile.PL or equivalent. See INSTALL for details.
siteman1direxp
From siteman1dir.U:
This variable is the same as the siteman1dir variable, but is filename expanded at configuration time, for convenient use in makefiles.
siteman3dir
From siteman3dir.U:
This variable contains the name of the directory in which site-specific
library man source pages are to be put. It is the responsibility of the
Makefile.SH to get the value of this into the proper command.
You must be prepared to do the ~name expansion yourself.
The standard distribution will put nothing in this directory.
After perl has been installed, users may install their own local man3 pages in this directory with MakeMaker Makefile.PL or equivalent. See INSTALL for details.
siteman3direxp
From siteman3dir.U:
This variable is the same as the siteman3dir variable, but is filename expanded at configuration time, for convenient use in makefiles.
siteprefix
From siteprefix.U:
This variable holds the full absolute path of the directory below which the user will install add-on packages. See INSTALL for usage and examples.
siteprefixexp
From siteprefix.U:
This variable holds the full absolute path of the directory below which the user will install add-on packages. Derived from siteprefix.
sitescript
From sitescript.U:
This variable holds the name of the directory in which the user wants
to put add-on publicly executable files for the package in question. It
is most often a local directory such as /usr/local/bin. Programs using
this variable must be prepared to deal with ~name substitution.
The standard distribution will put nothing in this directory.
After perl has been installed, users may install their own local scripts in this directory with MakeMaker Makefile.PL or equivalent. See INSTALL for details.
sitescriptexp
From sitescript.U:
This is the same as the sitescript variable, but is filename expanded at configuration time, for use in your makefiles.
sizesize
From sizesize.U:
This variable contains the size of a sizetype in bytes.
sizetype
From sizetype.U:
This variable defines sizetype to be something like size_t, unsigned long, or whatever type is used to declare length parameters for string functions.
sleep
From Loc.U:
This variable is defined but not used by Configure. The value is the empty string and is not useful.
sLOCALTIME_max
From time_size.U:
This variable defines the maximum value of the time_t offset that the system function localtime() accepts.
sLOCALTIME_min
From time_size.U:
This variable defines the minimum value of the time_t offset that the system function localtime() accepts.
smail
From Loc.U:
This variable is defined but not used by Configure. The value is the empty string and is not useful.
so
From so.U:
This variable holds the extension used to identify shared libraries (also known as shared objects) on the system. Usually set to so.
sockethdr
From d_socket.U:
This variable has any cpp -I flags needed for socket support.
socketlib
From d_socket.U:
This variable has the names of any libraries needed for socket support.
socksizetype
From socksizetype.U:
This variable holds the type used for the size argument for various socket calls like accept. Usual values include socklen_t, size_t, and int.
sort
From Loc.U:
This variable is used internally by Configure to determine the
full pathname (if any) of the sort program. After Configure runs,
the value is reset to a plain sort and is not useful.
spackage
From package.U:
This variable contains the name of the package being constructed, with the first letter uppercased, i.e. suitable for starting sentences.
spitshell
From spitshell.U:
This variable contains the command necessary to spit out a runnable shell on this system. It is either cat or a grep -v for # comments.
sPRId64
From quadfio.U:
This variable, if defined, contains the string used by stdio to format 64-bit decimal numbers (format d) for output.
sPRIeldbl
From longdblfio.U:
This variable, if defined, contains the string used by stdio to format long doubles (format e) for output.
sPRIEUldbl
From longdblfio.U:
This variable, if defined, contains the string used by stdio to format long doubles (format E) for output. The U in the name is to separate this from sPRIeldbl so that even case-blind systems can see the difference.
sPRIfldbl
From longdblfio.U:
This variable, if defined, contains the string used by stdio to format long doubles (format f) for output.
sPRIFUldbl
From longdblfio.U:
This variable, if defined, contains the string used by stdio to format long doubles (format F) for output. The U in the name is to separate this from sPRIfldbl so that even case-blind systems can see the difference.
sPRIgldbl
From longdblfio.U:
This variable, if defined, contains the string used by stdio to format long doubles (format g) for output.
sPRIGUldbl
From longdblfio.U:
This variable, if defined, contains the string used by stdio to format long doubles (format G) for output. The U in the name is to separate this from sPRIgldbl so that even case-blind systems can see the difference.
sPRIi64
From quadfio.U:
This variable, if defined, contains the string used by stdio to format 64-bit decimal numbers (format i) for output.
sPRIo64
From quadfio.U:
This variable, if defined, contains the string used by stdio to format 64-bit octal numbers (format o) for output.
sPRIu64
From quadfio.U:
This variable, if defined, contains the string used by stdio to format 64-bit unsigned decimal numbers (format u) for output.
sPRIx64
From quadfio.U:
This variable, if defined, contains the string used by stdio to format 64-bit hexadecimal numbers (format x) for output.
sPRIXU64
From quadfio.U:
This variable, if defined, contains the string used by stdio to format 64-bit hExADECimAl numbers (format X) for output. The U in the name is to separate this from sPRIx64 so that even case-blind systems can see the difference.
srand48_r_proto
From d_srand48_r.U:
This variable encodes the prototype of srand48_r. It is zero if d_srand48_r is undef, and one of the REENTRANT_PROTO_T_ABC macros of reentr.h if d_srand48_r is defined.
srandom_r_proto
From d_srandom_r.U:
This variable encodes the prototype of srandom_r. It is zero if d_srandom_r is undef, and one of the REENTRANT_PROTO_T_ABC macros of reentr.h if d_srandom_r is defined.
src
From src.U:
This variable holds the (possibly relative) path of the package source. It is up to the Makefile to use this variable and set VPATH accordingly to find the sources remotely. Use $pkgsrc to have an absolute path.
sSCNfldbl
From longdblfio.U:
This variable, if defined, contains the string used by stdio to format long doubles (format f) for input.
ssizetype
From ssizetype.U:
This variable defines ssizetype to be something like ssize_t, long or int. It is used by functions that return a count of bytes or an error condition. It must be a signed type. We will pick a type such that sizeof(SSize_t) == sizeof(Size_t).
st_ino_sign
From st_ino_def.U:
This variable contains the signedness of struct stat's st_ino. 1 for unsigned, -1 for signed.
st_ino_size
From st_ino_def.U:
This variable contains the size of struct stat's st_ino in bytes.
startperl
From startperl.U:
This variable contains the string to put on the front of a perl script to make sure (hopefully) that it runs with perl and not some shell. Of course, that leading line must be followed by the classical perl idiom:
- eval 'exec perl -S $0 ${1+"$@"}'
- if $running_under_some_shell;
to guarantee perl startup should the shell execute the script. Note that this magic incantation is not understood by csh.
startsh
From startsh.U:
This variable contains the string to put on the front of a shell script to make sure (hopefully) that it runs with sh and not some other shell.
static_ext
From Extensions.U:
This variable holds a list of XS extension files we want to link statically into the package. It is used by Makefile.
stdchar
From stdchar.U:
This variable conditionally defines STDCHAR to be the type of char used in stdio.h. It has the values "unsigned char" or char.
stdio_base
From d_stdstdio.U:
This variable defines how, given a FILE pointer, fp, to access the _base field (or equivalent) of stdio.h's FILE structure. This will be used to define the macro FILE_base(fp).
stdio_bufsiz
From d_stdstdio.U:
This variable defines how, given a FILE pointer, fp, to determine the number of bytes stored in the I/O buffer pointed to by the _base field (or equivalent) of stdio.h's FILE structure. This will be used to define the macro FILE_bufsiz(fp).
stdio_cnt
From d_stdstdio.U:
This variable defines how, given a FILE pointer, fp, to access the _cnt field (or equivalent) of stdio.h's FILE structure. This will be used to define the macro FILE_cnt(fp).
stdio_filbuf
From d_stdstdio.U:
This variable defines how, given a FILE pointer, fp, to tell stdio to refill its internal buffers (?). This will be used to define the macro FILE_filbuf(fp).
stdio_ptr
From d_stdstdio.U:
This variable defines how, given a FILE pointer, fp, to access the _ptr field (or equivalent) of stdio.h's FILE structure. This will be used to define the macro FILE_ptr(fp).
stdio_stream_array
From stdio_streams.U:
This variable tells the name of the array holding the stdio streams. Usual values include _iob, __iob, and __sF.
strerror_r_proto
From d_strerror_r.U:
This variable encodes the prototype of strerror_r. It is zero if d_strerror_r is undef, and one of the REENTRANT_PROTO_T_ABC macros of reentr.h if d_strerror_r is defined.
strings
From i_string.U:
This variable holds the full path of the string header that will be used. Typically /usr/include/string.h or /usr/include/strings.h.
submit
From Loc.U:
This variable is defined but not used by Configure. The value is the empty string and is not useful.
subversion
From patchlevel.U:
The subversion level of this package.
The value of subversion comes from the patchlevel.h file.
In a version number such as 5.6.1, this is the 1. In patchlevel.h, this is referred to as PERL_SUBVERSION.
This is unique to perl.
sysman
From sysman.U:
This variable holds the place where the manual is located on this system. It is not the place where the user wants to put his manual pages. Rather, it is the place where Configure may look to find manual pages for unix commands (usually section 1 of the manual). See mansrc.
tail
From Loc.U:
This variable is defined but not used by Configure. The value is the empty string and is not useful.
tar
From Loc.U:
This variable is defined but not used by Configure. The value is the empty string and is not useful.
targetarch
From Cross.U:
If cross-compiling, this variable contains the target architecture. If not, this will be empty.
tbl
From Loc.U:
This variable is defined but not used by Configure. The value is the empty string and is not useful.
tee
From Loc.U:
This variable is defined but not used by Configure. The value is the empty string and is not useful.
test
From Loc.U:
This variable is used internally by Configure to determine the full pathname (if any) of the test program. After Configure runs, the value is reset to a plain test and is not useful.
timeincl
From i_time.U:
This variable holds the full path of the included time header(s).
timetype
From d_time.U:
This variable holds the type returned by time(). It can be long, or time_t on BSD sites (in which case <sys/types.h> should be included). Anyway, the type Time_t should be used.
tmpnam_r_proto
From d_tmpnam_r.U:
This variable encodes the prototype of tmpnam_r. It is zero if d_tmpnam_r is undef, and one of the REENTRANT_PROTO_T_ABC macros of reentr.h if d_tmpnam_r is defined.
to
From Cross.U:
This variable contains the command used by Configure to copy files to the target host. Useful and available only during the Perl build. The string : if not cross-compiling.
touch
From Loc.U:
This variable is used internally by Configure to determine the full pathname (if any) of the touch program. After Configure runs, the value is reset to a plain touch and is not useful.
tr
From Loc.U:
This variable is used internally by Configure to determine the
full pathname (if any) of the tr program. After Configure runs,
the value is reset to a plain tr and is not useful.
trnl
From trnl.U:
This variable contains the value to be passed to the tr(1) command to transliterate a newline. Typical values are \012 and \n. This is needed for EBCDIC systems where newline is not necessarily \012.
troff
From Loc.U:
This variable is defined but not used by Configure. The value is the empty string and is not useful.
ttyname_r_proto
From d_ttyname_r.U:
This variable encodes the prototype of ttyname_r. It is zero if d_ttyname_r is undef, and one of the REENTRANT_PROTO_T_ABC macros of reentr.h if d_ttyname_r is defined.
u16size
From perlxv.U:
This variable is the size of an U16 in bytes.
u16type
From perlxv.U:
This variable contains the C type used for Perl's U16.
u32size
From perlxv.U:
This variable is the size of an U32 in bytes.
u32type
From perlxv.U:
This variable contains the C type used for Perl's U32.
u64size
From perlxv.U:
This variable is the size of an U64 in bytes.
u64type
From perlxv.U:
This variable contains the C type used for Perl's U64.
u8size
From perlxv.U:
This variable is the size of an U8 in bytes.
u8type
From perlxv.U:
This variable contains the C type used for Perl's U8.
uidformat
From uidf.U:
This variable contains the format string used for printing a Uid_t.
uidsign
From uidsign.U:
This variable contains the signedness of a uidtype. 1 for unsigned, -1 for signed.
uidsize
From uidsize.U:
This variable contains the size of a uidtype in bytes.
uidtype
From uidtype.U:
This variable defines Uid_t to be something like uid_t, int, ushort, or whatever type is used to declare user ids in the kernel.
uname
From Loc.U:
This variable is used internally by Configure to determine the full pathname (if any) of the uname program. After Configure runs, the value is reset to a plain uname and is not useful.
uniq
From Loc.U:
This variable is used internally by Configure to determine the full pathname (if any) of the uniq program. After Configure runs, the value is reset to a plain uniq and is not useful.
uquadtype
From quadtype.U:
This variable defines Uquad_t to be something like unsigned long, unsigned int, unsigned long long, uint64_t, or whatever type is used for 64-bit integers.
use5005threads
From usethreads.U:
This variable conditionally defines the USE_5005THREADS symbol, and indicates that Perl should be built to use the 5.005-based threading implementation. Only valid up to 5.8.x.
use64bitall
From use64bits.U:
This variable conditionally defines the USE_64_BIT_ALL symbol, and indicates that 64-bit integer types should be used when available. The maximal possible 64-bitness is employed: LP64 or ILP64, meaning that you will be able to use more than 2 gigabytes of memory. This mode is even more binary incompatible than USE_64_BIT_INT. You may not be able to run the resulting executable on a 32-bit CPU at all, or you may need at least to reboot your OS to 64-bit mode.
use64bitint
From use64bits.U:
This variable conditionally defines the USE_64_BIT_INT symbol, and indicates that 64-bit integer types should be used when available. The minimal possible 64-bitness is employed, just enough to get 64-bit integers into Perl. This may mean using for example "long longs", while your memory may still be limited to 2 gigabytes.
usecrosscompile
From Cross.U:
This variable conditionally defines the USE_CROSS_COMPILE symbol, and indicates that Perl has been cross-compiled.
usedevel
From Devel.U:
This variable indicates that Perl was configured with development features enabled. This should not be done for production builds.
usedl
From dlsrc.U:
This variable indicates if the system supports dynamic loading of some sort. See also dlsrc and dlobj.
usedtrace
From usedtrace.U:
This variable indicates whether we are compiling with dtrace support. See also dtrace.
usefaststdio
From usefaststdio.U:
This variable conditionally defines the USE_FAST_STDIO symbol, and indicates that Perl should be built to use fast stdio. Defaults to define in Perls 5.8 and earlier, to undef later.
useithreads
From usethreads.U:
This variable conditionally defines the USE_ITHREADS symbol, and indicates that Perl should be built to use the interpreter-based threading implementation.
usekernprocpathname
From usekernprocpathname.U:
This variable indicates that we can use sysctl with KERN_PROC_PATHNAME to get a full path for the executable, and hence convert $^X to an absolute path.
uselargefiles
From uselfs.U:
This variable conditionally defines the USE_LARGE_FILES symbol, and indicates that large file interfaces should be used when available.
uselongdouble
From uselongdbl.U:
This variable conditionally defines the USE_LONG_DOUBLE symbol, and indicates that long doubles should be used when available.
usemallocwrap
From mallocsrc.U:
This variable contains y if we are wrapping malloc to prevent integer overflow during size calculations.
usemorebits
From usemorebits.U:
This variable conditionally defines the USE_MORE_BITS symbol, and indicates that explicit 64-bit interfaces and long doubles should be used when available.
usemultiplicity
From usemultiplicity.U:
This variable conditionally defines the MULTIPLICITY symbol, and indicates that Perl should be built to use multiplicity.
usemymalloc
From mallocsrc.U:
This variable contains y if the malloc that comes with this package
is desired over the system's version of malloc. People often include
special versions of malloc for efficiency, but such versions are often
less portable. See also mallocsrc and mallocobj.
If this is y, then -lmalloc is removed from $libs.
usenm
From usenm.U:
This variable contains true or false depending on whether the nm extraction is wanted or not.
usensgetexecutablepath
From usensgetexecutablepath.U:
This symbol, if defined, indicates that we can use _NSGetExecutablePath and realpath to get a full path for the executable, and hence convert $^X to an absolute path.
useopcode
From Extensions.U:
This variable holds either true or false to indicate whether the Opcode extension should be used. The sole use for this currently is to allow an easy mechanism for users to skip the Opcode extension from the Configure command line.
useperlio
From useperlio.U:
This variable conditionally defines the USE_PERLIO symbol, and indicates that the PerlIO abstraction should be used throughout.
useposix
From Extensions.U:
This variable holds either true or false to indicate whether the POSIX extension should be used. The sole use for this currently is to allow an easy mechanism for hints files to indicate that POSIX will not compile on a particular system.
usereentrant
From usethreads.U:
This variable conditionally defines the USE_REENTRANT_API symbol, which indicates that the thread code may try to use the various _r versions of library functions. This is only potentially meaningful if usethreads is set, and is very experimental; it is not even prompted for.
userelocatableinc
From bin.U:
This variable is set to true to indicate that perl should relocate @INC entries at runtime based on the path to the perl binary. Any @INC paths starting with .../ are relocated relative to the directory containing the perl binary, and a logical cleanup of the path is then made around the join point (removing dir/../ pairs).
usesfio
From d_sfio.U:
This variable is set to true when the user agrees to use sfio. It is set to false when sfio is not available or when the user explicitly requests not to use sfio. It is here primarily so that command-line settings can override the auto-detection of d_sfio without running into a "WHOA THERE".
useshrplib
From libperl.U:
This variable is set to true if the user wishes to build a shared libperl, and false otherwise.
usesitecustomize
From d_sitecustomize.U:
This variable is set to true when the user requires a mechanism that allows the sysadmin to add entries to @INC at runtime. When set, it makes perl run $sitelib/sitecustomize.pl at startup.
usesocks
From usesocks.U:
This variable conditionally defines the USE_SOCKS symbol, and indicates that Perl should be built to use SOCKS.
usethreads
From usethreads.U:
This variable conditionally defines the USE_THREADS symbol, and indicates that Perl should be built to use threads.
usevendorprefix
From vendorprefix.U:
This variable tells whether the vendorprefix and consequently other vendor* paths are in use.
useversionedarchname
From archname.U:
This variable indicates whether to include the $api_versionstring as a component of the $archname.
usevfork
From d_vfork.U:
This variable is set to true when the user accepts to use vfork. It is set to false when no vfork is available or when the user explicitly requests not to use vfork.
usrinc
From usrinc.U:
This variable holds the path of the include files, which is usually /usr/include. It is mainly used by other Configure units.
uuname
From Loc.U:
This variable is defined but not used by Configure. The value is the empty string and is not useful.
uvoformat
From perlxvf.U:
This variable contains the format string used for printing a Perl UV as an unsigned octal integer.
uvsize
From perlxv.U:
This variable is the size of a UV
in bytes.
uvtype
From perlxv.U:
This variable contains the C type used for Perl's UV
.
uvuformat
From perlxvf.U:
This variable contains the format string used for printing
a Perl UV
as an unsigned decimal integer.
uvxformat
From perlxvf.U:
This variable contains the format string used for printing
a Perl UV
as an unsigned hexadecimal integer in lowercase abcdef.
uvXUformat
From perlxvf.U:
This variable contains the format string used for printing
a Perl UV
as an unsigned hexadecimal integer in uppercase ABCDEF
.
vaproto
From vaproto.U:
This variable conditionally defines CAN_VAPROTO on systems supporting prototype declaration of functions with a variable number of arguments. See also prototype.
vendorarch
From vendorarch.U:
This variable contains the value of the PERL_VENDORARCH symbol. It may have a ~ on the front. The standard distribution will put nothing in this directory. Vendors who distribute perl may wish to place their own architecture-dependent modules and extensions in this directory with MakeMaker Makefile.PL INSTALLDIRS=vendor or equivalent. See INSTALL for details.
vendorarchexp
From vendorarch.U:
This variable is the ~name expanded version of vendorarch, so that you may use it directly in Makefiles or shell scripts.
vendorbin
From vendorbin.U:
This variable contains the eventual value of the VENDORBIN symbol. It may have a ~ on the front. The standard distribution will put nothing in this directory. Vendors who distribute perl may wish to place additional binaries in this directory with MakeMaker Makefile.PL INSTALLDIRS=vendor or equivalent. See INSTALL for details.
vendorbinexp
From vendorbin.U:
This variable is the ~name expanded version of vendorbin, so that you may use it directly in Makefiles or shell scripts.
vendorhtml1dir
From vendorhtml1dir.U:
This variable contains the name of the directory for html
pages. It may have a ~ on the front.
The standard distribution will put nothing in this directory.
Vendors who distribute perl may wish to place their own
html pages in this directory with MakeMaker Makefile.PL INSTALLDIRS=vendor or equivalent. See INSTALL for details.
vendorhtml1direxp
From vendorhtml1dir.U:
This variable is the ~name expanded version of vendorhtml1dir, so that you may use it directly in Makefiles or shell scripts.
vendorhtml3dir
From vendorhtml3dir.U:
This variable contains the name of the directory for html
library pages. It may have a ~ on the front.
The standard distribution will put nothing in this directory.
Vendors who distribute perl may wish to place their own
html pages for modules and extensions in this directory with MakeMaker Makefile.PL INSTALLDIRS=vendor or equivalent. See INSTALL for details.
vendorhtml3direxp
From vendorhtml3dir.U:
This variable is the ~name expanded version of vendorhtml3dir, so that you may use it directly in Makefiles or shell scripts.
vendorlib
From vendorlib.U:
This variable contains the eventual value of the VENDORLIB symbol, which is the name of the private library for this package. The standard distribution will put nothing in this directory. Vendors who distribute perl may wish to place their own modules in this directory with MakeMaker Makefile.PL INSTALLDIRS=vendor or equivalent. See INSTALL for details.
vendorlib_stem
From vendorlib.U:
This variable is $vendorlibexp with any trailing version-specific component removed. The elements in inc_version_list (inc_version_list.U) can be tacked onto this variable to generate a list of directories to search.
vendorlibexp
From vendorlib.U:
This variable is the ~name expanded version of vendorlib, so that you may use it directly in Makefiles or shell scripts.
vendorman1dir
From vendorman1dir.U:
This variable contains the name of the directory for man1
pages. It may have a ~ on the front.
The standard distribution will put nothing in this directory.
Vendors who distribute perl may wish to place their own
man1 pages in this directory with MakeMaker Makefile.PL INSTALLDIRS=vendor or equivalent. See INSTALL for details.
vendorman1direxp
From vendorman1dir.U:
This variable is the ~name expanded version of vendorman1dir, so that you may use it directly in Makefiles or shell scripts.
vendorman3dir
From vendorman3dir.U:
This variable contains the name of the directory for man3
pages. It may have a ~ on the front.
The standard distribution will put nothing in this directory.
Vendors who distribute perl may wish to place their own
man3 pages in this directory with MakeMaker Makefile.PL INSTALLDIRS=vendor or equivalent. See INSTALL for details.
vendorman3direxp
From vendorman3dir.U:
This variable is the ~name expanded version of vendorman3dir, so that you may use it directly in Makefiles or shell scripts.
vendorprefix
From vendorprefix.U:
This variable holds the full absolute path of the directory below which the vendor will install add-on packages. See INSTALL for usage and examples.
vendorprefixexp
From vendorprefix.U:
This variable holds the full absolute path of the directory below which the vendor will install add-on packages. Derived from vendorprefix.
vendorscript
From vendorscript.U:
This variable contains the eventual value of the VENDORSCRIPT symbol. It may have a ~ on the front. The standard distribution will put nothing in this directory. Vendors who distribute perl may wish to place additional executable scripts in this directory with MakeMaker Makefile.PL INSTALLDIRS=vendor or equivalent. See INSTALL for details.
vendorscriptexp
From vendorscript.U:
This variable is the ~name expanded version of vendorscript, so that you may use it directly in Makefiles or shell scripts.
version
From patchlevel.U:
The full version number of this package, such as 5.6.1 (or 5_6_1). This combines revision, patchlevel, and subversion to get the full version number, including any possible subversions. This is suitable for use as a directory name, and hence is filesystem dependent.
version_patchlevel_string
From patchlevel.U:
This is a string combining version, subversion and perl_patchlevel (if perl_patchlevel is non-zero). It is typically something like 'version 7 subversion 1' or 'version 7 subversion 1 patchlevel 11224' It is computed here to avoid duplication of code in myconfig.SH and lib/Config.pm.
versiononly
From versiononly.U:
If set, this symbol indicates that only the version-specific components of a perl installation should be installed. This may be useful for making a test installation of a new version without disturbing the existing installation. Setting versiononly is equivalent to setting installperl's -v option. In particular, the non-versioned scripts and programs such as a2p, c2ph, h2xs, pod2*, and perldoc are not installed (see INSTALL for a more complete list), nor are the man pages installed. Usually, this is undef.
vi
From Loc.U:
This variable is defined but not used by Configure. The value is the empty string and is not useful.
voidflags
From voidflags.U:
This variable contains the eventual value of the VOIDFLAGS symbol, which indicates how much support of the void type is given by this compiler. See VOIDFLAGS for more info.
xlibpth
From libpth.U:
This variable holds extra path (space-separated) used to find libraries on this platform; for example, CPU-specific libraries (on multi-CPU platforms) may be listed here.
yacc
From yacc.U:
This variable holds the name of the compiler compiler we want to use in the Makefile. It can be yacc, byacc, or bison -y.
yaccflags
From yacc.U:
This variable contains any additional yacc flags desired by the user. It is up to the Makefile to use this.
zcat
From Loc.U:
This variable is defined but not used by Configure. The value is the empty string and is not useful.
zip
From Loc.U:
This variable is used internally by Configure to determine the full pathname (if any) of the zip program. After Configure runs, the value is reset to a plain zip and is not useful.
Information on the git commit from which the current perl binary was compiled can be found in the variable $Config::Git_Data. The variable is a structured string that looks something like this:
- git_commit_id='ea0c2dbd5f5ac6845ecc7ec6696415bf8e27bd52'
- git_describe='GitLive-blead-1076-gea0c2db'
- git_branch='smartmatch'
- git_uncommitted_changes=''
- git_commit_id_title='Commit id:'
- git_commit_date='2009-05-09 17:47:31 +0200'
Its format is not guaranteed to remain stable over time.
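As a sketch of how that string might be consumed (the parsing is illustrative; the variable is only populated when perl was built from a git checkout, and the format may change):

```perl
use warnings;
use strict;
use Config;

# Pull the key='value' pairs out of $Config::Git_Data into a hash.
# Guard against builds where the variable is empty or undefined.
if (defined $Config::Git_Data && length $Config::Git_Data) {
    my %git;
    while ($Config::Git_Data =~ /(\w+)='([^']*)'/g) {
        $git{$1} = $2;
    }
    print "Built from commit $git{git_commit_id}\n"
        if exists $git{git_commit_id};
}
```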
This module contains a good example of how to use tie to implement a cache and an example of how to make a tied variable readonly to those outside of it.
Cwd - get pathname of current working directory
This module provides functions for determining the pathname of the current working directory. It is recommended that getcwd (or another *cwd() function) be used in all code to ensure portability.
By default, it exports the functions cwd(), getcwd(), fastcwd(), and fastgetcwd() (and, on Win32, getdcwd()) into the caller's namespace.
Each of these functions is called without arguments and returns the absolute path of the current working directory.
- my $cwd = getcwd();
Returns the current working directory.
Exposes the POSIX function getcwd(3) or re-implements it if it's not available.
- my $cwd = cwd();
cwd() is the most natural form for the current architecture. For most systems it is identical to `pwd` (but without the trailing line terminator).
- my $cwd = fastcwd();
A more dangerous version of getcwd(), but potentially faster. It might conceivably chdir() you out of a directory that it can't chdir() you back into. If fastcwd encounters a problem it will return undef but will probably leave you in a different directory. For a measure of extra security, if everything appears to have worked, the fastcwd() function will check that it leaves you in the same directory that it started in. If it has changed it will die with the message "Unstable directory path, current directory changed unexpectedly". That should never happen.
- my $cwd = fastgetcwd();
The fastgetcwd() function is provided as a synonym for cwd().
The getdcwd() function is also provided on Win32 to get the current working directory on the specified drive, since Windows maintains a separate current working directory for each drive. If no drive is specified then the current drive is assumed.
This function simply calls the Microsoft C library _getdcwd() function.
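For example (a sketch; getdcwd() is only available on Win32 builds of Perl):

```perl
use warnings;
use strict;
use Cwd;

if ($^O eq 'MSWin32') {
    # Working directory on drive D:, which Windows tracks separately
    # from the working directory of the current drive.
    my $d_cwd = Cwd::getdcwd('D:');

    # With no argument, the current drive is assumed.
    my $cur = Cwd::getdcwd();
    print "$cur\n$d_cwd\n";
}
```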
These functions are exported only on request. They each take a single argument and return the absolute pathname for it. If no argument is given they'll use the current working directory.
- my $abs_path = abs_path($file);
Uses the same algorithm as getcwd(). Symbolic links and relative-path components ("." and "..") are resolved to return the canonical pathname, just like realpath(3).
- my $abs_path = realpath($file);
A synonym for abs_path().
- my $abs_path = fast_abs_path($file);
A more dangerous, but potentially faster version of abs_path.
If you ask to override your chdir() built-in function,
- use Cwd qw(chdir);
then your PWD environment variable will be kept up to date. Note that it will only be kept up to date if all packages which use chdir import it from Cwd.
Since the path separators are different on some operating systems ('/' on Unix, ':' on MacPerl, etc...) we recommend you use the File::Spec modules wherever portability is a concern.
Actually, on Mac OS, the getcwd(), fastgetcwd() and fastcwd() functions are all aliases for the cwd() function, which, on Mac OS, calls `pwd`. Likewise, the abs_path() function is an alias for fast_abs_path().
Originally by the perl5-porters.
Maintained by Ken Williams <KWILLIAMS@cpan.org>
Copyright (c) 2004 by the Perl 5 Porters. All rights reserved.
This program is free software; you can redistribute it and/or modify it under the same terms as Perl itself.
Portions of the C code in this library are copyright (c) 1994 by the Regents of the University of California. All rights reserved. The license on this code is compatible with the licensing of the rest of the distribution - please see the source code in Cwd.xs for the details.
DB - programmatic interface to the Perl debugging API
- package CLIENT;
- use DB;
- @ISA = qw(DB);
- # these (inherited) methods can be called by the client
- CLIENT->register() # register a client package name
- CLIENT->done() # de-register from the debugging API
- CLIENT->skippkg('hide::hide') # ask DB not to stop in this package
- CLIENT->cont([WHERE]) # run some more (until BREAK or another breakpt)
- CLIENT->step() # single step
- CLIENT->next() # step over
- CLIENT->ret() # return from current subroutine
- CLIENT->backtrace() # return the call stack description
- CLIENT->ready() # call when client setup is done
- CLIENT->trace_toggle() # toggle subroutine call trace mode
- CLIENT->subs([SUBS]) # return subroutine information
- CLIENT->files() # return list of all files known to DB
- CLIENT->lines() # return lines in currently loaded file
- CLIENT->loadfile(FILE,LINE) # load a file and let other clients know
- CLIENT->lineevents() # return info on lines with actions
- CLIENT->set_break([WHERE],[COND])
- CLIENT->set_tbreak([WHERE])
- CLIENT->clr_breaks([LIST])
- CLIENT->set_action(WHERE,ACTION)
- CLIENT->clr_actions([LIST])
- CLIENT->evalcode(STRING) # eval STRING in executing code's context
- CLIENT->prestop([STRING]) # execute in code context before stopping
- CLIENT->poststop([STRING])# execute in code context before resuming
- # These methods will be called at the appropriate times.
- # Stub versions provided do nothing.
- # None of these can block.
- CLIENT->init() # called when debug API inits itself
- CLIENT->stop(FILE,LINE) # when execution stops
- CLIENT->idle() # while stopped (can be a client event loop)
- CLIENT->cleanup() # just before exit
- CLIENT->output(LIST) # called to print any output that API must show
Perl debug information is frequently required not just by debuggers, but also by modules that need some "special" information to do their job properly, like profilers.
This module abstracts and provides all of the hooks into Perl internal debugging functionality, so that various implementations of Perl debuggers (or packages that want to simply get at the "privileged" debugging data) can all benefit from the development of this common code. Currently used by Swat, the perl/Tk GUI debugger.
Note that multiple "front-ends" can latch into this debugging API simultaneously. This is intended to facilitate things like debugging with a command line and GUI at the same time, debugging debuggers etc. [Sounds nice, but this needs some serious support -- GSAR]
In particular, this API does not provide the following functions:
data display
command processing
command alias management
user interface (tty or graphical)
These are intended to be services performed by the clients of this API.
This module attempts to be squeaky clean w.r.t. use strict; and when warnings are enabled.
The following "public" global names can be read by clients of this API. Beware that these should be considered "readonly".
Name of current executing subroutine.
The keys of this hash are the names of all the known subroutines. Each value is an encoded string that has the sprintf(3) format ("%s:%d-%d", filename, fromline, toline).
Single-step flag. Will be true if the API will stop at the next statement.
Signal flag. Will be set to a true value if a signal was caught. Clients may check for this flag to abort time-consuming operations.
This flag is set to true if the API is tracing through subroutine calls.
Contains the arguments of the current subroutine, or the @ARGV array if in the toplevel context.
List of lines in currently loaded file.
Actions in current file (keys are line numbers). The values are strings that have the sprintf(3) format ("%s\000%s", breakcondition, actioncode).
Package namespace of currently executing code.
Currently loaded filename.
Fully qualified name of currently executing subroutine.
Line number that will be executed next.
The following are methods in the DB base class. A client must access these methods by inheritance (*not* by calling them directly), since the API keeps track of clients through the inheritance mechanism.
register a client object/package
eval STRING in executing code context
ask DB not to stop in these packages
run some more (until a breakpt is reached)
single step
step over
de-register from the debugging API
The following "virtual" methods can be defined by the client. They will be called by the API at appropriate points. Note that unless specified otherwise, the debug API only defines empty, non-functional default versions of these methods.
Called after debug API inits itself.
Usually inherited from DB package. If no arguments are passed, returns the prestop action string.
Called when execution stops (w/ args file, line).
Called while stopped (can be a client event loop).
Usually inherited from DB package. If no arguments are passed, returns the poststop action string.
Usually inherited from DB package. Ask for a STRING to be eval-ed in executing code context.
Called just before exit.
Called when API must show a message (warnings, errors etc.).
The interface defined by this module is missing some of the later additions to perl's debugging functionality. As such, this interface should be considered highly experimental and subject to change.
Gurusamy Sarathy gsar@activestate.com
This code heavily adapted from an early version of perl5db.pl attributable to Larry Wall and the Perl Porters.
DBM_Filter -- Filter DBM keys/values
- use DBM_Filter ;
- use SDBM_File; # or DB_File, or GDBM_File, or NDBM_File, or ODBM_File
- $db = tie %hash, ...
- $db->Filter_Push(Fetch => sub {...},
- Store => sub {...});
- $db->Filter_Push('my_filter1');
- $db->Filter_Push('my_filter2', params...);
- $db->Filter_Key_Push(...) ;
- $db->Filter_Value_Push(...) ;
- $db->Filter_Pop();
- $db->Filtered();
- package DBM_Filter::my_filter1;
- sub Store { ... }
- sub Fetch { ... }
- 1;
- package DBM_Filter::my_filter2;
- sub Filter
- {
- my @opts = @_;
- ...
- return (
- sub Store { ... },
- sub Fetch { ... } );
- }
- 1;
This module provides an interface that allows filters to be applied to tied Hashes associated with DBM files. It builds on the DBM Filter hooks that are present in all the *DB*_File modules included with the standard Perl source distribution from version 5.6.1 onwards. In addition to the *DB*_File modules distributed with Perl, the BerkeleyDB module, available on CPAN, supports the DBM Filter hooks. See perldbmfilter for more details on the DBM Filter hooks.
A DBM Filter allows the keys and/or values in a tied hash to be modified by some user-defined code just before it is written to the DBM file and just after it is read back from the DBM file. For example, this snippet of code
- $some_hash{"abc"} = 42;
could potentially trigger two filters, one for the writing of the key "abc" and another for writing the value 42. Similarly, a snippet that reads from the hash, such as
- my $value = $some_hash{"abc"} ;
will trigger two filters, one for the reading of the key and one for the reading of the value.
Like the existing DBM Filter functionality, this module arranges for the $_ variable to be populated with the key or value that a filter will check. This usually means that most DBM filters tend to be very short.
The main enhancements over the standard DBM Filter hooks are:
A cleaner interface.
The ability to easily apply multiple filters to a single DBM file.
The ability to create "canned" filters. These allow commonly used filters to be packaged into a stand-alone module.
This module will arrange for the following methods to be available via the object returned from the tie call.
Add a filter to the filter stack for the database $db. The three formats vary only in whether they apply to the DBM key, the DBM value, or both.
The filter is applied to both keys and values.
The filter is applied to the key only.
The filter is applied to the value only.
Removes the last filter that was applied to the DBM file associated with $db, if present.
Returns TRUE if there are any filters applied to the DBM associated with $db. Otherwise returns FALSE.
Filters can be created in two main ways:
An immediate filter allows you to specify the filter code to be used at the point where the filter is applied to a dbm. In this mode the Filter_*_Push methods expect to receive exactly two parameters.
The code reference associated with Store will be called before any key/value is written to the database, and the code reference associated with Fetch will be called after any key/value is read from the database.
For example, here is a sample filter that adds a trailing NULL character to all strings before they are written to the DBM file, and removes the trailing NULL when they are read from the DBM file
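A minimal sketch of such a filter (the filename "filt" and the choice of SDBM_File are illustrative, not part of the original example):

```perl
use warnings;
use strict;
use DBM_Filter;
use SDBM_File;   # any *DB*_File module with DBM Filter support will do
use Fcntl;

my $db = tie my %hash, 'SDBM_File', 'filt', O_CREAT | O_RDWR, 0666
    or die "Cannot open filt: $!\n";

$db->Filter_Push(
    Store => sub { $_ .= "\x00" },   # append a NUL before writing
    Fetch => sub { s/\x00$// });     # strip the NUL after reading
```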
Points to note:
Both the Store and Fetch filters manipulate $_.
Immediate filters are useful for one-off situations. For more generic problems it can be useful to package the filter up in its own module.
The usage for a canned filter is:
- $db->Filter_Push("name", params)
where "name" is the name of the module to load. If the string specified does not contain the package separator characters "::", it is assumed to refer to the full module name "DBM_Filter::name". This means that the full names for the canned filters, "null" and "utf8", included with this module are:
- DBM_Filter::null
- DBM_Filter::utf8
params: any optional parameters that need to be sent to the filter. See the encode filter for an example of a module that uses parameters.
The module that implements the canned filter can take one of two forms. Here is a template for the first
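Following the synopsis above, the first form might be sketched like this (the uppercase/lowercase transformation is purely illustrative):

```perl
package DBM_Filter::my_filter1;

# Store is called with the key/value in $_ just before it is written;
# Fetch is called with $_ just after it is read back.
# Each must be the exact inverse of the other.
sub Store { $_ = uc $_ }
sub Fetch { $_ = lc $_ }

1;
```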
Notes:
The package name uses the DBM_Filter:: prefix.
The module must have both a Store and a Fetch method. If only one is present, or neither is present, a fatal error will be thrown.
The second form allows the filter to hold state information using a closure, thus:
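A sketch of the closure form, assuming (as with the canned filters shipped with the module) that Filter returns a hash reference holding the Store and Fetch code references; the terminator parameter is illustrative:

```perl
package DBM_Filter::my_filter2;

sub Filter
{
    # Parameters given to Filter_Push('my_filter2', ...) arrive here
    # and are captured by the closures as per-filter state.
    my $terminator = shift;
    $terminator = "\x00" unless defined $terminator;
    return {
        Store => sub { $_ .= $terminator },
        Fetch => sub { s/\Q$terminator\E\z// },
    };
}

1;
```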
In this instance the "Store" and "Fetch" methods are encapsulated inside a "Filter" method.
A number of canned filters are provided with this module. They cover a number of the main areas where filters are needed when interfacing with DBM files. They also act as templates for your own filters.
The filters included are:
This module will ensure that all data written to the DBM will be encoded in UTF-8.
This module needs the Encode module.
Allows you to choose the character encoding that will be stored in the DBM file.
This filter will compress all data before it is written to the database and uncompress it when it is read back.
This module needs Compress::Zlib.
This module is used when interoperating with a C/C++ application that uses a C int as the key and/or value in the DBM file.
This module ensures that all data written to the DBM file is null terminated. This is useful when you have a perl script that needs to interoperate with a DBM file that a C program also uses. A fairly common issue is for the C application to include the terminating null in a string when it writes to the DBM file. This filter will ensure that all data written to the DBM file can be read by the C application.
When writing a DBM filter it is very important to ensure that it is possible to retrieve all data that you have written when the DBM filter is in place. In practice, this means that whatever transformation is applied to the data in the Store method, the exact inverse operation should be applied in the Fetch method.
If you don't provide an exact inverse transformation, you will find that code like this will not behave as you expect.
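For example, a round-trip loop such as this depends on Fetch being the exact inverse of Store:

```perl
# Each key returned by "each" has passed through the Fetch filter;
# the DBM layer then re-applies the Store filter to locate the next
# record, so a non-invertible filter can derail the iteration.
while (my ($key, $value) = each %hash) {
    print "$key -> $value\n";
}
```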
Depending on the transformation, you will find that one or more of the following will happen
The loop will never terminate.
Too few records will be retrieved.
Too many will be retrieved.
The loop will do the right thing for a while, but it will unexpectedly fail.
This is just a restatement of the previous section. Unless you are completely certain you know what you are doing, avoid mixing filtered & non-filtered data.
Say you need to interoperate with a legacy C application that stores keys as C ints and the values as null-terminated UTF-8 strings. Here is how you would set that up:
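A sketch using the canned filters described above (the tied file and the choice of SDBM_File are illustrative):

```perl
use warnings;
use strict;
use DBM_Filter;
use SDBM_File;
use Fcntl;

my $db = tie my %hash, 'SDBM_File', 'legacy_db', O_CREAT | O_RDWR, 0666
    or die "Cannot open legacy_db: $!\n";

# Keys cross the boundary as native C ints; values as UTF-8 strings
# carrying the trailing NUL the C application expects.
$db->Filter_Key_Push('int32');
$db->Filter_Value_Push('utf8');
$db->Filter_Value_Push('null');
```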
DB_File, GDBM_File, NDBM_File, ODBM_File, SDBM_File, perldbmfilter
Paul Marquess <pmqs@cpan.org>
DB_File - Perl5 access to Berkeley DB version 1.x
- use DB_File;
- [$X =] tie %hash, 'DB_File', [$filename, $flags, $mode, $DB_HASH] ;
- [$X =] tie %hash, 'DB_File', $filename, $flags, $mode, $DB_BTREE ;
- [$X =] tie @array, 'DB_File', $filename, $flags, $mode, $DB_RECNO ;
- $status = $X->del($key [, $flags]) ;
- $status = $X->put($key, $value [, $flags]) ;
- $status = $X->get($key, $value [, $flags]) ;
- $status = $X->seq($key, $value, $flags) ;
- $status = $X->sync([$flags]) ;
- $status = $X->fd ;
- # BTREE only
- $count = $X->get_dup($key) ;
- @list = $X->get_dup($key) ;
- %list = $X->get_dup($key, 1) ;
- $status = $X->find_dup($key, $value) ;
- $status = $X->del_dup($key, $value) ;
- # RECNO only
- $a = $X->length;
- $a = $X->pop ;
- $X->push(list);
- $a = $X->shift;
- $X->unshift(list);
- @r = $X->splice(offset, length, elements);
- # DBM Filters
- $old_filter = $db->filter_store_key ( sub { ... } ) ;
- $old_filter = $db->filter_store_value( sub { ... } ) ;
- $old_filter = $db->filter_fetch_key ( sub { ... } ) ;
- $old_filter = $db->filter_fetch_value( sub { ... } ) ;
- untie %hash ;
- untie @array ;
DB_File is a module which allows Perl programs to make use of the facilities provided by Berkeley DB version 1.x (if you have a newer version of DB, see Using DB_File with Berkeley DB version 2 or greater). It is assumed that you have a copy of the Berkeley DB manual pages at hand when reading this documentation. The interface defined here mirrors the Berkeley DB interface closely.
Berkeley DB is a C library which provides a consistent interface to a number of database formats. DB_File provides an interface to all three of the database types currently supported by Berkeley DB.
The file types are:
This database type allows arbitrary key/value pairs to be stored in data files. This is equivalent to the functionality provided by other hashing packages like DBM, NDBM, ODBM, GDBM, and SDBM. Remember though, the files created using DB_HASH are not compatible with any of the other packages mentioned.
A default hashing algorithm, which will be adequate for most applications, is built into Berkeley DB. If you do need to use your own hashing algorithm it is possible to write your own in Perl and have DB_File use it instead.
The btree format allows arbitrary key/value pairs to be stored in a sorted, balanced binary tree.
As with the DB_HASH format, it is possible to provide a user defined Perl routine to perform the comparison of keys. By default, though, the keys are stored in lexical order.
DB_RECNO allows both fixed-length and variable-length flat text files to be manipulated using the same key/value pair interface as in DB_HASH and DB_BTREE. In this case the key will consist of a record (line) number.
Although DB_File is intended to be used with Berkeley DB version 1, it can also be used with version 2, 3 or 4. In this case the interface is limited to the functionality provided by Berkeley DB 1.x. Anywhere the version 2 or greater interface differs, DB_File arranges for it to work like version 1. This feature allows DB_File scripts that were built with version 1 to be migrated to version 2 or greater without any changes.
If you want to make use of the new features available in Berkeley DB 2.x or greater, use the Perl module BerkeleyDB instead.
Note: The database file format has changed multiple times in Berkeley DB versions 2, 3 and 4. If you cannot recreate your databases, you must dump any existing databases with either the db_dump or the db_dump185 utility that comes with Berkeley DB.
Once you have rebuilt DB_File to use Berkeley DB version 2 or greater, your databases can be recreated using db_load. Refer to the Berkeley DB documentation for further details.
Please read COPYRIGHT before using version 2.x or greater of Berkeley DB with DB_File.
DB_File allows access to Berkeley DB files using the tie() mechanism in Perl 5 (for full details, see tie()). This facility allows DB_File to access Berkeley DB files using either an associative array (for DB_HASH & DB_BTREE file types) or an ordinary array (for the DB_RECNO file type).
In addition to the tie() interface, it is also possible to access most of the functions provided in the Berkeley DB API directly. See THE API INTERFACE.
Berkeley DB uses the function dbopen() to open or create a database. Here is the C prototype for dbopen():
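The prototype, roughly as it appears in the Berkeley DB 1.x header:

```c
DB *dbopen(const char *file, int flags, int mode,
           DBTYPE type, const void *openinfo);
```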
The parameter type is an enumeration which specifies which of the 3 interface methods (DB_HASH, DB_BTREE or DB_RECNO) is to be used. Depending on which of these is actually chosen, the final parameter, openinfo, points to a data structure which allows tailoring of the specific interface method.
This interface is handled slightly differently in DB_File. Here is an equivalent call using DB_File:
- tie %array, 'DB_File', $filename, $flags, $mode, $DB_HASH ;
The filename, flags and mode parameters are the direct equivalent of their dbopen() counterparts. The final parameter $DB_HASH performs the function of both the type and openinfo parameters in dbopen().
In the example above $DB_HASH is actually a pre-defined reference to a hash object. DB_File has three of these pre-defined references. Apart from $DB_HASH, there is also $DB_BTREE and $DB_RECNO.
The keys allowed in each of these pre-defined references are limited to the names used in the equivalent C structure. So, for example, the $DB_HASH reference will only allow keys called bsize, cachesize, ffactor, hash, lorder and nelem.
To change one of these elements, just assign to it like this:
- $DB_HASH->{'cachesize'} = 10000 ;
The three predefined variables $DB_HASH, $DB_BTREE and $DB_RECNO are usually adequate for most applications. If you do need to create extra instances of these objects, constructors are available for each file type.
Here are examples of the constructors and the valid options available for DB_HASH, DB_BTREE and DB_RECNO respectively.
- $a = new DB_File::HASHINFO ;
- $a->{'bsize'} ;
- $a->{'cachesize'} ;
- $a->{'ffactor'};
- $a->{'hash'} ;
- $a->{'lorder'} ;
- $a->{'nelem'} ;
- $b = new DB_File::BTREEINFO ;
- $b->{'flags'} ;
- $b->{'cachesize'} ;
- $b->{'maxkeypage'} ;
- $b->{'minkeypage'} ;
- $b->{'psize'} ;
- $b->{'compare'} ;
- $b->{'prefix'} ;
- $b->{'lorder'} ;
- $c = new DB_File::RECNOINFO ;
- $c->{'bval'} ;
- $c->{'cachesize'} ;
- $c->{'psize'} ;
- $c->{'flags'} ;
- $c->{'lorder'} ;
- $c->{'reclen'} ;
- $c->{'bfname'} ;
The values stored in the hashes above are mostly the direct equivalent of their C counterpart. Like their C counterparts, all are set to default values - that means you don't have to set all of the values when you only want to change one. Here is an example:
- $a = new DB_File::HASHINFO ;
- $a->{'cachesize'} = 12345 ;
- tie %y, 'DB_File', "filename", $flags, 0777, $a ;
A few of the options need extra discussion here. When used, the C equivalent of the keys hash, compare and prefix store pointers to C functions. In DB_File these keys are used to store references to Perl subs. Below are templates for each of the subs:
- sub hash
- {
- my ($data) = @_ ;
- ...
- # return the hash value for $data
- return $hash ;
- }
- sub compare
- {
- my ($key1, $key2) = @_ ;
- ...
- # return 0 if $key1 eq $key2
- # -1 if $key1 lt $key2
- # 1 if $key1 gt $key2
- return (-1 , 0 or 1) ;
- }
- sub prefix
- {
- my ($key1, $key2) = @_ ;
- ...
- # return number of bytes of $key2 which are
- # necessary to determine that it is greater than $key1
- return $bytes ;
- }
See Changing the BTREE sort order for an example of using the compare template.
If you are using the DB_RECNO interface and you intend making use of bval, you should check out The 'bval' Option.
It is possible to omit some or all of the final 4 parameters in the call to tie and let them take default values. As DB_HASH is the most common file format used, the call:
- tie %A, "DB_File", "filename" ;
is equivalent to:
- tie %A, "DB_File", "filename", O_CREAT|O_RDWR, 0666, $DB_HASH ;
It is also possible to omit the filename parameter as well, so the call:
- tie %A, "DB_File" ;
is equivalent to:
- tie %A, "DB_File", undef, O_CREAT|O_RDWR, 0666, $DB_HASH ;
See In Memory Databases for a discussion on the use of undef in place of a filename.
Berkeley DB allows the creation of in-memory databases by using NULL (that is, a (char *)0 in C) in place of the filename. DB_File uses undef instead of NULL to provide this functionality.
The DB_HASH file format is probably the most commonly used of the three file formats that DB_File supports. It is also very straightforward to use.
This example shows how to create a database, add key/value pairs to the database, delete keys/value pairs and finally how to enumerate the contents of the database.
- use warnings ;
- use strict ;
- use DB_File ;
- our (%h, $k, $v) ;
- unlink "fruit" ;
- tie %h, "DB_File", "fruit", O_RDWR|O_CREAT, 0666, $DB_HASH
- or die "Cannot open file 'fruit': $!\n";
- # Add a few key/value pairs to the file
- $h{"apple"} = "red" ;
- $h{"orange"} = "orange" ;
- $h{"banana"} = "yellow" ;
- $h{"tomato"} = "red" ;
- # Check for existence of a key
- print "Banana Exists\n\n" if $h{"banana"} ;
- # Delete a key/value pair.
- delete $h{"apple"} ;
- # print the contents of the file
- while (($k, $v) = each %h)
- { print "$k -> $v\n" }
- untie %h ;
Here is the output:
- Banana Exists
- orange -> orange
- tomato -> red
- banana -> yellow
Note that, like ordinary associative arrays, the order of the keys retrieved is apparently random.
The DB_BTREE format is useful when you want to store data in a given order. By default the keys will be stored in lexical order, but as you will see from the example shown in the next section, it is very easy to define your own sorting function.
This script shows how to override the default sorting algorithm that BTREE uses. Instead of using the normal lexical ordering, a case insensitive compare function will be used.
- use warnings ;
- use strict ;
- use DB_File ;
- my %h ;
- sub Compare
- {
- my ($key1, $key2) = @_ ;
- "\L$key1" cmp "\L$key2" ;
- }
- # specify the Perl sub that will do the comparison
- $DB_BTREE->{'compare'} = \&Compare ;
- unlink "tree" ;
- tie %h, "DB_File", "tree", O_RDWR|O_CREAT, 0666, $DB_BTREE
- or die "Cannot open file 'tree': $!\n" ;
- # Add a key/value pair to the file
- $h{'Wall'} = 'Larry' ;
- $h{'Smith'} = 'John' ;
- $h{'mouse'} = 'mickey' ;
- $h{'duck'} = 'donald' ;
- # Delete
- delete $h{"duck"} ;
- # Cycle through the keys printing them in order.
- # Note it is not necessary to sort the keys as
- # the btree will have kept them in order automatically.
- foreach (keys %h)
- { print "$_\n" }
- untie %h ;
Here is the output from the code above.
- mouse
- Smith
- Wall
There are a few points to bear in mind if you want to change the ordering in a BTREE database:
The new compare function must be specified when you create the database.
You cannot change the ordering once the database has been created. Thus you must use the same compare function every time you access the database.
Duplicate keys are entirely defined by the comparison function. In the case-insensitive example above, the keys: 'KEY' and 'key' would be considered duplicates, and assigning to the second one would overwrite the first. If duplicates are allowed for (with the R_DUP flag discussed below), only a single copy of duplicate keys is stored in the database --- so (again with example above) assigning three values to the keys: 'KEY', 'Key', and 'key' would leave just the first key: 'KEY' in the database with three values. For some situations this results in information loss, so care should be taken to provide fully qualified comparison functions when necessary. For example, the above comparison routine could be modified to additionally compare case-sensitively if two keys are equal in the case insensitive comparison:
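One way to write such a tie-breaking comparison, as a sketch:

```perl
# Compare case-insensitively first; only when the folded keys are
# equal, fall back to a case-sensitive comparison, so 'KEY' and 'key'
# sort together but are no longer treated as duplicates.
$DB_BTREE->{'compare'} = sub {
    my ($key1, $key2) = @_;
    lc($key1) cmp lc($key2) || $key1 cmp $key2;
};
```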
And now you will only have duplicates when the keys themselves are truly the same. (Note: in versions of the db library prior to about November 1996, such duplicate keys were retained, so it was possible to recover the original keys in sets of keys that compared as equal.)
The BTREE file type optionally allows a single key to be associated with an arbitrary number of values. This option is enabled by setting the flags element of $DB_BTREE to R_DUP when creating the database.
There are some difficulties in using the tied hash interface if you want to manipulate a BTREE database with duplicate keys. Consider this code:
- use warnings ;
- use strict ;
- use DB_File ;
- my ($filename, %h) ;
- $filename = "tree" ;
- unlink $filename ;
- # Enable duplicate records
- $DB_BTREE->{'flags'} = R_DUP ;
- tie %h, "DB_File", $filename, O_RDWR|O_CREAT, 0666, $DB_BTREE
- or die "Cannot open $filename: $!\n";
- # Add some key/value pairs to the file
- $h{'Wall'} = 'Larry' ;
- $h{'Wall'} = 'Brick' ; # Note the duplicate key
- $h{'Wall'} = 'Brick' ; # Note the duplicate key and value
- $h{'Smith'} = 'John' ;
- $h{'mouse'} = 'mickey' ;
- # iterate through the associative array
- # and print each key/value pair.
- foreach (sort keys %h)
- { print "$_ -> $h{$_}\n" }
- untie %h ;
Here is the output:
- Smith -> John
- Wall -> Larry
- Wall -> Larry
- Wall -> Larry
- mouse -> mickey
As you can see, 3 records have been successfully created with key Wall - the only thing is, when they are retrieved from the database they seem to have the same value, namely Larry. The problem is caused by the way that the associative array interface works. Basically, when the associative array interface is used to fetch the value associated with a given key, it will only ever retrieve the first value.
Although it may not be immediately obvious from the code above, the associative array interface can be used to write values with duplicate keys, but it cannot be used to read them back from the database.
The way to get around this problem is to use the Berkeley DB API method called seq. This method allows sequential access to key/value pairs. See THE API INTERFACE for details of both the seq method and the API in general.
Here is the script above rewritten using the seq API method.
- use warnings ;
- use strict ;
- use DB_File ;
- my ($filename, $x, %h, $status, $key, $value) ;
- $filename = "tree" ;
- unlink $filename ;
- # Enable duplicate records
- $DB_BTREE->{'flags'} = R_DUP ;
- $x = tie %h, "DB_File", $filename, O_RDWR|O_CREAT, 0666, $DB_BTREE
- or die "Cannot open $filename: $!\n";
- # Add some key/value pairs to the file
- $h{'Wall'} = 'Larry' ;
- $h{'Wall'} = 'Brick' ; # Note the duplicate key
- $h{'Wall'} = 'Brick' ; # Note the duplicate key and value
- $h{'Smith'} = 'John' ;
- $h{'mouse'} = 'mickey' ;
- # iterate through the btree using seq
- # and print each key/value pair.
- $key = $value = 0 ;
- for ($status = $x->seq($key, $value, R_FIRST) ;
- $status == 0 ;
- $status = $x->seq($key, $value, R_NEXT) )
- { print "$key -> $value\n" }
- undef $x ;
- untie %h ;
that prints:
- Smith -> John
- Wall -> Brick
- Wall -> Brick
- Wall -> Larry
- mouse -> mickey
This time we have got all the key/value pairs, including the multiple values associated with the key Wall.
To make life easier when dealing with duplicate keys, DB_File comes with a few utility methods.
The get_dup method assists in reading duplicate values from BTREE databases. The method can take the following forms:
- $count = $x->get_dup($key) ;
- @list = $x->get_dup($key) ;
- %list = $x->get_dup($key, 1) ;
In a scalar context the method returns the number of values associated with the key, $key.
In list context, it returns all the values which match $key. Note that the values will be returned in an apparently random order.
In list context, if the second parameter is present and evaluates TRUE, the method returns an associative array. The keys of the associative array correspond to the values that matched in the BTREE and the values of the array are a count of the number of times that particular value occurred in the BTREE.
So assuming the database created above, we can use get_dup like this:
- use warnings ;
- use strict ;
- use DB_File ;
- my ($filename, $x, %h) ;
- $filename = "tree" ;
- # Enable duplicate records
- $DB_BTREE->{'flags'} = R_DUP ;
- $x = tie %h, "DB_File", $filename, O_RDWR|O_CREAT, 0666, $DB_BTREE
- or die "Cannot open $filename: $!\n";
- my $cnt = $x->get_dup("Wall") ;
- print "Wall occurred $cnt times\n" ;
- my %hash = $x->get_dup("Wall", 1) ;
- print "Larry is there\n" if $hash{'Larry'} ;
- print "There are $hash{'Brick'} Brick Walls\n" ;
- my @list = sort $x->get_dup("Wall") ;
- print "Wall => [@list]\n" ;
- @list = $x->get_dup("Smith") ;
- print "Smith => [@list]\n" ;
- @list = $x->get_dup("Dog") ;
- print "Dog => [@list]\n" ;
and it will print:
- Wall occurred 3 times
- Larry is there
- There are 2 Brick Walls
- Wall => [Brick Brick Larry]
- Smith => [John]
- Dog => []
- $status = $X->find_dup($key, $value) ;
This method checks for the existence of a specific key/value pair. If the pair exists, the cursor is left pointing to the pair and the method returns 0. Otherwise the method returns a non-zero value.
Assuming the database from the previous example:
- use warnings ;
- use strict ;
- use DB_File ;
- my ($filename, $x, %h, $found) ;
- $filename = "tree" ;
- # Enable duplicate records
- $DB_BTREE->{'flags'} = R_DUP ;
- $x = tie %h, "DB_File", $filename, O_RDWR|O_CREAT, 0666, $DB_BTREE
- or die "Cannot open $filename: $!\n";
- $found = ( $x->find_dup("Wall", "Larry") == 0 ? "" : "not") ;
- print "Larry Wall is $found there\n" ;
- $found = ( $x->find_dup("Wall", "Harry") == 0 ? "" : "not") ;
- print "Harry Wall is $found there\n" ;
- undef $x ;
- untie %h ;
prints this:
- Larry Wall is there
- Harry Wall is not there
- $status = $X->del_dup($key, $value) ;
This method deletes a specific key/value pair. It returns 0 if they exist and have been deleted successfully. Otherwise the method returns a non-zero value.
Again assuming the existence of the tree database:
- use warnings ;
- use strict ;
- use DB_File ;
- my ($filename, $x, %h, $found) ;
- $filename = "tree" ;
- # Enable duplicate records
- $DB_BTREE->{'flags'} = R_DUP ;
- $x = tie %h, "DB_File", $filename, O_RDWR|O_CREAT, 0666, $DB_BTREE
- or die "Cannot open $filename: $!\n";
- $x->del_dup("Wall", "Larry") ;
- $found = ( $x->find_dup("Wall", "Larry") == 0 ? "" : "not") ;
- print "Larry Wall is $found there\n" ;
- undef $x ;
- untie %h ;
prints this:
- Larry Wall is not there
The BTREE interface has a feature which allows partial keys to be matched. This functionality is only available when the seq method is used along with the R_CURSOR flag.
- $x->seq($key, $value, R_CURSOR) ;
Here is the relevant quote from the dbopen man page where it defines the use of the R_CURSOR flag with seq:
- Note, for the DB_BTREE access method, the returned key is not
- necessarily an exact match for the specified key. The returned key
- is the smallest key greater than or equal to the specified key,
- permitting partial key matches and range searches.
In the example script below, the match sub uses this feature to find and print the first matching key/value pair given a partial key.
- use warnings ;
- use strict ;
- use DB_File ;
- use Fcntl ;
- my ($filename, $x, %h, $st, $key, $value) ;
- sub match
- {
- my $key = shift ;
- my $value = 0;
- my $orig_key = $key ;
- $x->seq($key, $value, R_CURSOR) ;
- print "$orig_key\t-> $key\t-> $value\n" ;
- }
- $filename = "tree" ;
- unlink $filename ;
- $x = tie %h, "DB_File", $filename, O_RDWR|O_CREAT, 0666, $DB_BTREE
- or die "Cannot open $filename: $!\n";
- # Add some key/value pairs to the file
- $h{'mouse'} = 'mickey' ;
- $h{'Wall'} = 'Larry' ;
- $h{'Walls'} = 'Brick' ;
- $h{'Smith'} = 'John' ;
- $key = $value = 0 ;
- print "IN ORDER\n" ;
- for ($st = $x->seq($key, $value, R_FIRST) ;
- $st == 0 ;
- $st = $x->seq($key, $value, R_NEXT) )
- { print "$key -> $value\n" }
- print "\nPARTIAL MATCH\n" ;
- match "Wa" ;
- match "A" ;
- match "a" ;
- undef $x ;
- untie %h ;
Here is the output:
- IN ORDER
- Smith -> John
- Wall -> Larry
- Walls -> Brick
- mouse -> mickey
- PARTIAL MATCH
- Wa -> Wall -> Larry
- A -> Smith -> John
- a -> mouse -> mickey
DB_RECNO provides an interface to flat text files. Both variable and fixed length records are supported.
In order to make RECNO more compatible with Perl, the array offset for all RECNO arrays begins at 0 rather than 1 as in Berkeley DB.
As with normal Perl arrays, a RECNO array can be accessed using negative indexes. The index -1 refers to the last element of the array, -2 the second last, and so on. Attempting to access an element before the start of the array will raise a fatal run-time error.
The operation of the bval option warrants some discussion. Here is the definition of bval from the Berkeley DB 1.85 recno manual page:
- The delimiting byte to be used to mark the end of a
- record for variable-length records, and the pad charac-
- ter for fixed-length records. If no value is speci-
- fied, newlines (``\n'') are used to mark the end of
- variable-length records and fixed-length records are
- padded with spaces.
The second sentence is wrong. In actual fact bval will only default to "\n" when the openinfo parameter in dbopen is NULL. If a non-NULL openinfo parameter is used at all, the value that happens to be in bval will be used. That means you always have to specify bval when making use of any of the options in the openinfo parameter. This documentation error will be fixed in the next release of Berkeley DB.
That clarifies the situation with regard to Berkeley DB itself. What about DB_File? Well, the behavior defined in the quote above is quite useful, so DB_File conforms to it.
That means that you can specify other options (e.g. cachesize) and still have bval default to "\n" for variable length records, and space for fixed length records.
Also note that the bval option only allows you to specify a single byte as a delimiter.
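For example, to tie a RECNO file whose records are separated by a colon instead of a newline. This is a sketch only: the filename colons is arbitrary, and note that because another openinfo option (cachesize) is being set, bval must be given explicitly, as discussed above.

```perl
use strict;
use warnings;
use DB_File;

# Assumption for this sketch: ':' as the record delimiter.
# Supplying any openinfo option means bval no longer defaults,
# so it must be set explicitly alongside cachesize.
$DB_RECNO->{'bval'}      = ":";
$DB_RECNO->{'cachesize'} = 10_000;

my @lines;
tie @lines, 'DB_File', 'colons', O_RDWR|O_CREAT, 0666, $DB_RECNO
    or die "Cannot open colons: $!\n";

push @lines, "one", "two";
untie @lines;
```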
Here is a simple example that uses RECNO (if you are using a version of Perl earlier than 5.004_57 this example won't work -- see Extra RECNO Methods for a workaround).
- use warnings ;
- use strict ;
- use DB_File ;
- my $filename = "text" ;
- unlink $filename ;
- my @h ;
- tie @h, "DB_File", $filename, O_RDWR|O_CREAT, 0666, $DB_RECNO
- or die "Cannot open file 'text': $!\n" ;
- # Add a few key/value pairs to the file
- $h[0] = "orange" ;
- $h[1] = "blue" ;
- $h[2] = "yellow" ;
- push @h, "green", "black" ;
- my $elements = scalar @h ;
- print "The array contains $elements entries\n" ;
- my $last = pop @h ;
- print "popped $last\n" ;
- unshift @h, "white" ;
- my $first = shift @h ;
- print "shifted $first\n" ;
- # Check for existence of a key
- print "Element 1 Exists with value $h[1]\n" if $h[1] ;
- # use a negative index
- print "The last element is $h[-1]\n" ;
- print "The 2nd last element is $h[-2]\n" ;
- untie @h ;
Here is the output from the script:
- The array contains 5 entries
- popped black
- shifted white
- Element 1 Exists with value blue
- The last element is green
- The 2nd last element is yellow
If you are using a version of Perl earlier than 5.004_57, the tied array interface is quite limited. In the example script above, push, pop, shift, unshift and determining the array length will not work with a tied array.
To make the interface more useful for older versions of Perl, a number of methods are supplied with DB_File to simulate the missing array operations. All these methods are accessed via the object returned from the tie call.
Here are the methods:
push(list): Pushes the elements of list to the end of the array.
pop: Removes and returns the last element of the array.
shift: Removes and returns the first element of the array.
unshift(list): Pushes the elements of list to the start of the array.
length: Returns the number of elements in the array.
splice(offset, length, elements): Returns a splice of the array.
Here is a more complete example that makes use of some of the methods described above. It also makes use of the API interface directly (see THE API INTERFACE).
- use warnings ;
- use strict ;
- my (@h, $H, $file, $i) ;
- use DB_File ;
- use Fcntl ;
- $file = "text" ;
- unlink $file ;
- $H = tie @h, "DB_File", $file, O_RDWR|O_CREAT, 0666, $DB_RECNO
- or die "Cannot open file $file: $!\n" ;
- # first create a text file to play with
- $h[0] = "zero" ;
- $h[1] = "one" ;
- $h[2] = "two" ;
- $h[3] = "three" ;
- $h[4] = "four" ;
- # Print the records in order.
- #
- # The length method is needed here because evaluating a tied
- # array in a scalar context does not return the number of
- # elements in the array.
- print "\nORIGINAL\n" ;
- foreach $i (0 .. $H->length - 1) {
- print "$i: $h[$i]\n" ;
- }
- # use the push & pop methods
- $a = $H->pop ;
- $H->push("last") ;
- print "\nThe last record was [$a]\n" ;
- # and the shift & unshift methods
- $a = $H->shift ;
- $H->unshift("first") ;
- print "The first record was [$a]\n" ;
- # Use the API to add a new record after record 2.
- $i = 2 ;
- $H->put($i, "Newbie", R_IAFTER) ;
- # and a new record before record 1.
- $i = 1 ;
- $H->put($i, "New One", R_IBEFORE) ;
- # delete record 3
- $H->del(3) ;
- # now print the records in reverse order
- print "\nREVERSE\n" ;
- for ($i = $H->length - 1 ; $i >= 0 ; -- $i)
- { print "$i: $h[$i]\n" }
- # same again, but use the API functions instead
- print "\nREVERSE again\n" ;
- my ($s, $k, $v) = (0, 0, 0) ;
- for ($s = $H->seq($k, $v, R_LAST) ;
- $s == 0 ;
- $s = $H->seq($k, $v, R_PREV))
- { print "$k: $v\n" }
- undef $H ;
- untie @h ;
and this is what it outputs:
- ORIGINAL
- 0: zero
- 1: one
- 2: two
- 3: three
- 4: four
- The last record was [four]
- The first record was [zero]
- REVERSE
- 5: last
- 4: three
- 3: Newbie
- 2: one
- 1: New One
- 0: first
- REVERSE again
- 5: last
- 4: three
- 3: Newbie
- 2: one
- 1: New One
- 0: first
Notes:
Rather than iterating through the array, @h, like this:
- foreach $i (@h)
it is necessary to use either this:
- foreach $i (0 .. $H->length - 1)
or this:
- for ($a = $H->get($k, $v, R_FIRST) ;
- $a == 0 ;
- $a = $H->get($k, $v, R_NEXT) )
Notice that both times the put method was used the record index was specified using a variable, $i, rather than the literal value itself. This is because put will return the record number of the inserted line via that parameter.
As well as accessing Berkeley DB using a tied hash or array, it is also possible to make direct use of most of the API functions defined in the Berkeley DB documentation.
To do this you need to store a copy of the object returned from the tie.
- $db = tie %hash, "DB_File", "filename" ;
Once you have done that, you can access the Berkeley DB API functions as DB_File methods directly like this:
- $db->put($key, $value, R_NOOVERWRITE) ;
Important: If you have saved a copy of the object returned from
tie, the underlying database file will not be closed until both
the tied variable is untied and all copies of the saved object are
destroyed.
See The untie() Gotcha for more details.
All the functions defined in dbopen are available except for close() and dbopen() itself. The DB_File method interface to the supported functions has been implemented to mirror the way Berkeley DB works whenever possible. In particular note that:
The methods return a status value. All return 0 on success.
All return -1 to signify an error and set $! to the exact error code. The return code 1 generally (but not always) means that the key specified did not exist in the database.
Other return codes are defined. See below and in the Berkeley DB documentation for details. The Berkeley DB documentation should be used as the definitive source.
Whenever a Berkeley DB function returns data via one of its parameters, the equivalent DB_File method does exactly the same.
If you are careful, it is possible to mix API calls with the tied hash/array interface in the same piece of code. Although only a few of the methods used to implement the tied interface currently make use of the cursor, you should always assume that the cursor has been changed any time the tied hash/array interface is used. As an example, this code will probably not do what you expect:
- $X = tie %x, 'DB_File', $filename, O_RDWR|O_CREAT, 0777, $DB_BTREE
- or die "Cannot tie $filename: $!" ;
- # Get the first key/value pair and set the cursor
- $X->seq($key, $value, R_FIRST) ;
- # this line will modify the cursor
- $count = scalar keys %x ;
- # Get the second key/value pair.
- # oops, it didn't, it got the last key/value pair!
- $X->seq($key, $value, R_NEXT) ;
The code above can be rearranged to get around the problem, like this:
- $X = tie %x, 'DB_File', $filename, O_RDWR|O_CREAT, 0777, $DB_BTREE
- or die "Cannot tie $filename: $!" ;
- # this line will modify the cursor
- $count = scalar keys %x ;
- # Get the first key/value pair and set the cursor
- $X->seq($key, $value, R_FIRST) ;
- # Get the second key/value pair.
- # worked this time.
- $X->seq($key, $value, R_NEXT) ;
All the constants defined in dbopen for use in the flags parameters in the methods defined below are also available. Refer to the Berkeley DB documentation for the precise meaning of the flags values.
Below is a list of the methods available.
Given a key ($key) this method reads the value associated with it from the database. The value read from the database is returned in the $value parameter.
If the key does not exist the method returns 1.
No flags are currently defined for this method.
Stores the key/value pair in the database.
If you use either the R_IAFTER or R_IBEFORE flags, the $key parameter will have the record number of the inserted key/value pair set.
Valid flags are R_CURSOR, R_IAFTER, R_IBEFORE, R_NOOVERWRITE and R_SETCURSOR.
Removes all key/value pairs with key $key from the database.
A return code of 1 means that the requested key was not in the database.
R_CURSOR is the only valid flag at present.
Returns the file descriptor for the underlying database.
See Locking: The Trouble with fd for an explanation of why you should not use fd to lock your database.
This interface allows sequential retrieval from the database. See dbopen for full details.
Both the $key and $value parameters will be set to the key/value pair read from the database.
The flags parameter is mandatory. The valid flag values are R_CURSOR, R_FIRST, R_LAST, R_NEXT and R_PREV.
Flushes any cached buffers to disk.
R_RECNOSYNC is the only valid flag at present.
A DBM Filter is a piece of code that is used when you always want to make the same transformation to all keys and/or values in a DBM database.
There are four methods associated with DBM Filters. All work identically, and each is used to install (or uninstall) a single DBM Filter. Each expects a single parameter, namely a reference to a sub. The only difference between them is the place that the filter is installed.
To summarise:
If a filter has been installed with this method, it will be invoked every time you write a key to a DBM database.
If a filter has been installed with this method, it will be invoked every time you write a value to a DBM database.
If a filter has been installed with this method, it will be invoked every time you read a key from a DBM database.
If a filter has been installed with this method, it will be invoked every time you read a value from a DBM database.
You can use any combination of the methods, from none, to all four.
All filter methods return the existing filter, if present, or undef if not.
To delete a filter, pass undef to it.
When each filter is called by Perl, a local copy of $_ will contain the key or value to be filtered. Filtering is achieved by modifying the contents of $_. The return code from the filter is ignored.
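This calling convention can be seen without any database at all. In the sketch below, apply_filter is a made-up helper that mimics what DB_File does around each filter call: it puts the data in $_, runs the sub, and keeps whatever the sub left in $_.

```perl
use strict;
use warnings;

# Made-up stand-in for DB_File's internal filter invocation.
sub apply_filter {
    my ($filter, $data) = @_;
    local $_ = $data;    # the filter sees the data in $_
    $filter->();         # its return value is ignored
    return $_;           # only the modified $_ matters
}

my $store = sub { $_ .= "\0" };    # add a trailing NUL on store
my $fetch = sub { s/\0$// };       # strip it again on fetch

my $on_disk = apply_filter($store, "apple");    # "apple\0"
my $in_perl = apply_filter($fetch, $on_disk);   # back to "apple"
print length($on_disk), " ", length($in_perl), "\n";
```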
Consider the following scenario. You have a DBM database that you need to share with a third-party C application. The C application assumes that all keys and values are NULL terminated. Unfortunately when Perl writes to DBM databases it doesn't use NULL termination, so your Perl application will have to manage NULL termination itself. When you write to the database you will have to use something like this:
- $hash{"$key\0"} = "$value\0" ;
Similarly the NULL needs to be taken into account when you are considering the length of existing keys/values.
It would be much better if you could ignore the NULL terminations issue in the main application code and have a mechanism that automatically added the terminating NULL to all keys and values whenever you write to the database and have them removed when you read from the database. As I'm sure you have already guessed, this is a problem that DBM Filters can fix very easily.
- use warnings ;
- use strict ;
- use DB_File ;
- my %hash ;
- my $filename = "filt" ;
- unlink $filename ;
- my $db = tie %hash, 'DB_File', $filename, O_CREAT|O_RDWR, 0666, $DB_HASH
- or die "Cannot open $filename: $!\n" ;
- # Install DBM Filters
- $db->filter_fetch_key ( sub { s/\0$// } ) ;
- $db->filter_store_key ( sub { $_ .= "\0" } ) ;
- $db->filter_fetch_value( sub { s/\0$// } ) ;
- $db->filter_store_value( sub { $_ .= "\0" } ) ;
- $hash{"abc"} = "def" ;
- my $a = $hash{"ABC"} ;
- # ...
- undef $db ;
- untie %hash ;
Hopefully the contents of each of the filters should be self-explanatory. Both "fetch" filters remove the terminating NULL, and both "store" filters add a terminating NULL.
Here is another real-life example. By default, whenever Perl writes to a DBM database it always writes the key and value as strings. So when you use this:
- $hash{12345} = "something" ;
the key 12345 will get stored in the DBM database as the 5 byte string "12345". If you actually want the key to be stored in the DBM database as a C int, you will have to use pack when writing, and unpack when reading.
Here is a DBM Filter that does it:
- use warnings ;
- use strict ;
- use DB_File ;
- my %hash ;
- my $filename = "filt" ;
- unlink $filename ;
- my $db = tie %hash, 'DB_File', $filename, O_CREAT|O_RDWR, 0666, $DB_HASH
- or die "Cannot open $filename: $!\n" ;
- $db->filter_fetch_key ( sub { $_ = unpack("i", $_) } ) ;
- $db->filter_store_key ( sub { $_ = pack ("i", $_) } ) ;
- $hash{123} = "def" ;
- # ...
- undef $db ;
- untie %hash ;
This time only two filters have been used -- we only need to manipulate the contents of the key, so it wasn't necessary to install any value filters.
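The effect of those two key filters can be checked with pack and unpack alone, independent of any database:

```perl
use strict;
use warnings;

my $key    = 123;
my $packed = pack("i", $key);       # native C int (4 bytes on most platforms)
my $back   = unpack("i", $packed);  # the original number again

printf "string: %d bytes, packed: %d bytes\n",
       length("$key"), length($packed);
```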
Until version 1.72 of this module, the recommended technique for locking DB_File databases was to flock the filehandle returned from the "fd" function. Unfortunately this technique has been shown to be fundamentally flawed (Kudos to David Harris for tracking this down). Use it at your own peril!
The locking technique went like this.
In simple terms, this is what happens:
Use "tie" to open the database.
Lock the database with fd & flock.
Read & Write to the database.
Unlock and close the database.
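In code, the discredited technique looked roughly like this. This is a sketch only; the filename mydb and the error handling are illustrative, and as the discussion that follows explains, this does NOT lock safely:

```perl
use strict;
use warnings;
use DB_File;
use Fcntl qw(:flock);

my %db;
my $x = tie %db, 'DB_File', 'mydb', O_RDWR|O_CREAT, 0666, $DB_HASH
    or die "Cannot tie mydb: $!\n";

# Dup the database's file descriptor and flock it. By this point the
# tie above has already cached an initial block from the database,
# which is exactly the flaw described below.
my $fd = $x->fd;
open my $fh, "+<&=$fd" or die "Cannot dup file descriptor: $!\n";
flock $fh, LOCK_EX or die "Cannot lock mydb: $!\n";

$db{'key'} = "value";    # read & write while "locked"

flock $fh, LOCK_UN;
undef $x;
untie %db;
```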
Here is the crux of the problem. A side-effect of opening the DB_File database in step 2 is that an initial block from the database will get read from disk and cached in memory.
To see why this is a problem, consider what can happen when two processes, say "A" and "B", both want to update the same DB_File database using the locking steps outlined above. Assume process "A" has already opened the database and has a write lock, but it hasn't actually updated the database yet (it has finished step 2, but not started step 3 yet). Now process "B" tries to open the same database - step 1 will succeed, but it will block on step 2 until process "A" releases the lock. The important thing to notice here is that at this point in time both processes will have cached identical initial blocks from the database.
Now process "A" updates the database and happens to change some of the data held in the initial buffer. Process "A" terminates, flushing all cached data to disk and releasing the database lock. At this point the database on disk will correctly reflect the changes made by process "A".
With the lock released, process "B" can now continue. It also updates the database and unfortunately it too modifies the data that was in its initial buffer. Once that data gets flushed to disk it will overwrite some/all of the changes process "A" made to the database.
The result of this scenario is at best a database that doesn't contain what you expect. At worst the database will corrupt.
The above won't happen every time competing processes update the same DB_File database, but it does illustrate why the technique should not be used.
Starting with version 2.x, Berkeley DB has internal support for locking. The companion module to this one, BerkeleyDB, provides an interface to this locking functionality. If you are serious about locking Berkeley DB databases, I strongly recommend using BerkeleyDB.
If using BerkeleyDB isn't an option, there are a number of modules available on CPAN that can be used to implement locking. Each one implements locking differently and has different goals in mind. It is therefore worth knowing the difference, so that you can pick the right one for your application. Here are the three locking wrappers:
A DB_File wrapper which creates copies of the database file for read access, so that you have a kind of multiversioning concurrent read system. However, updates are still serial. Use for databases where reads may be lengthy and consistency problems may occur.
A DB_File wrapper that has the ability to lock and unlock the database while it is being used. Avoids the tie-before-flock problem by simply re-tie-ing the database when you get or drop a lock. Because of the flexibility in dropping and re-acquiring the lock in the middle of a session, this can be massaged into a system that will work with long updates and/or reads if the application follows the hints in the POD documentation.
An extremely lightweight DB_File wrapper that simply flocks a lockfile before tie-ing the database and drops the lock after the untie. Allows one to use the same lockfile for multiple databases to avoid deadlock problems, if desired. Use for databases where updates and reads are quick and simple flock locking semantics are enough.
There is no technical reason why a Berkeley DB database cannot be shared by both a Perl and a C application.
The vast majority of problems that are reported in this area boil down to the fact that C strings are NULL terminated, whilst Perl strings are not. See DBM FILTERS for a generic way to work around this problem.
Here is a real example. Netscape 2.0 keeps a record of the locations you visit along with the time you last visited them in a DB_HASH database. This is usually stored in the file ~/.netscape/history.db. The key field in the database is the location string and the value field is the time the location was last visited stored as a 4 byte binary value.
If you haven't already guessed, the location string is stored with a terminating NULL. This means you need to be careful when accessing the database.
Here is a snippet of code that is loosely based on Tom Christiansen's ggh script (available from your nearest CPAN archive in authors/id/TOMC/scripts/nshist.gz).
- use warnings ;
- use strict ;
- use DB_File ;
- use Fcntl ;
- my ($dotdir, $HISTORY, %hist_db, $href, $binary_time, $date) ;
- $dotdir = $ENV{HOME} || $ENV{LOGNAME};
- $HISTORY = "$dotdir/.netscape/history.db";
- tie %hist_db, 'DB_File', $HISTORY
- or die "Cannot open $HISTORY: $!\n" ;
- # Dump the complete database
- while ( ($href, $binary_time) = each %hist_db ) {
- # remove the terminating NULL
- $href =~ s/\x00$// ;
- # convert the binary time into a user friendly string
- $date = localtime unpack("V", $binary_time);
- print "$date $href\n" ;
- }
- # check for the existence of a specific key
- # remember to add the NULL
- if ( $binary_time = $hist_db{"http://mox.perl.com/\x00"} ) {
- $date = localtime unpack("V", $binary_time) ;
- print "Last visited mox.perl.com on $date\n" ;
- }
- else {
- print "Never visited mox.perl.com\n"
- }
- untie %hist_db ;
If you make use of the Berkeley DB API, it is very strongly recommended that you read The untie Gotcha in perltie.
Even if you don't currently make use of the API interface, it is still worth reading it.
Here is an example which illustrates the problem from a DB_File perspective:
When run, the script will produce this error message:
- Cannot tie second time: Invalid argument at bad.file line 14.
Although the error message above refers to the second tie() statement in the script, the source of the problem is really with the untie() statement that precedes it.
Having read perltie you will probably have already guessed that the error is caused by the extra copy of the tied object stored in $X.
If you haven't, then the problem boils down to the fact that the DB_File destructor, DESTROY, will not be called until all references to the tied object are destroyed. Both the tied variable, %x, and $X above hold a reference to the object. The call to untie() will destroy the first, but $X still holds a valid reference, so the destructor will not get called and the database file tst.fil will remain open. The fact that Berkeley DB then reports the attempt to open a database that is already open via the catch-all "Invalid argument" doesn't help.
If you run the script with the -w flag the error message becomes:
- untie attempted while 1 inner references still exist at bad.file line 12.
- Cannot tie second time: Invalid argument at bad.file line 14.
which pinpoints the real problem. Finally the script can now be modified to fix the original problem by destroying the API object before the untie:
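The fix amounts to a one-line reordering. A reconstruction is shown below; the filename tst.fil matches the one mentioned in the discussion above, but the rest is illustrative rather than the original script:

```perl
use strict;
use warnings;
use DB_File;

my %x;
my $X = tie %x, 'DB_File', 'tst.fil', O_RDWR|O_CREAT, 0666, $DB_HASH
    or die "Cannot tie first time: $!";
$x{123} = 456;

undef $X;    # destroy the API object *before* the untie
untie %x;

# The file really is closed now, so a second tie succeeds.
tie %x, 'DB_File', 'tst.fil', O_RDWR|O_CREAT, 0666, $DB_HASH
    or die "Cannot tie second time: $!";
untie %x;
```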
If you look at the contents of a database file created by DB_File, there can sometimes be part of a Perl script included in it.
This happens because Berkeley DB uses dynamic memory to allocate buffers which will subsequently be written to the database file. Being dynamic, the memory could have been used for anything before DB malloced it. As Berkeley DB doesn't clear the memory once it has been allocated, the unused portions will contain random junk. In the case where a Perl script gets written to the database, the random junk will correspond to an area of dynamic memory that happened to be used during the compilation of the script.
Unless you don't like the possibility of there being part of your Perl scripts embedded in a database file, this is nothing to worry about.
Although DB_File cannot do this directly, there is a module which can layer transparently over DB_File to accomplish this feat.
Check out the MLDBM module, available on CPAN in the directory modules/by-module/MLDBM.
You will get this error message when one of the parameters in the
tie call is wrong. Unfortunately there are quite a few parameters to
get wrong, so it can be difficult to figure out which one it is.
Here are a couple of possibilities:
Attempting to reopen a database without closing it.
Using the O_WRONLY flag.
You will encounter this particular error message when you have the strict 'subs' pragma (or the full strict pragma) in your script.
Consider this script:
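A reconstruction of the failing construct (not necessarily the original example), wrapped in a string eval so the compile-time error can be captured at run time:

```perl
use strict;
use warnings;

# The offending line: DB_File appears as an unquoted bareword,
# which strict 'subs' rejects at compile time.
my $ok = eval q{
    my %x;
    tie %x, DB_File, "filename";
    1;
};
print $@ unless $ok;
```

Run as-is, without the eval, such a script dies at compile time with the message shown next.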
Running it produces the error in question:
- Bareword "DB_File" not allowed while "strict subs" in use
To get around the error, place the word DB_File in either single or double quotes, like this:
- tie %x, "DB_File", "filename" ;
Although it might seem like a real pain, it is really worth the effort of having a use strict in all your scripts.
Articles that are either about DB_File or make use of it.
Full-Text Searching in Perl, Tim Kientzle (tkientzle@ddj.com), Dr. Dobb's Journal, Issue 295, January 1999, pp 34-41
Moved to the Changes file.
Some older versions of Berkeley DB had problems with fixed length records using the RECNO file format. This problem has been fixed since version 1.85 of Berkeley DB.
I am sure there are bugs in the code. If you do find any, or can suggest any enhancements, I would welcome your comments.
DB_File comes with the standard Perl source distribution. Look in the directory ext/DB_File. Given the amount of time between releases of Perl the version that ships with Perl is quite likely to be out of date, so the most recent version can always be found on CPAN (see CPAN in perlmodlib for details), in the directory modules/by-module/DB_File.
This version of DB_File will work with either version 1.x, 2.x or 3.x of Berkeley DB, but is limited to the functionality provided by version 1.
The official web site for Berkeley DB is http://www.oracle.com/technology/products/berkeley-db/db/index.html. All versions of Berkeley DB are available there.
Alternatively, Berkeley DB version 1 is available at your nearest CPAN archive in src/misc/db.1.85.tar.gz.
Copyright (c) 1995-2012 Paul Marquess. All rights reserved. This program is free software; you can redistribute it and/or modify it under the same terms as Perl itself.
Although DB_File is covered by the Perl license, the library it makes use of, namely Berkeley DB, is not. Berkeley DB has its own copyright and its own license. Please take the time to read it.
Here are a few words taken from the Berkeley DB FAQ (at http://www.oracle.com/technology/products/berkeley-db/db/index.html) regarding the license:
- Do I have to license DB to use it in Perl scripts?
- No. The Berkeley DB license requires that software that uses
- Berkeley DB be freely redistributable. In the case of Perl, that
- software is Perl, and not your scripts. Any Perl scripts that you
- write are your property, including scripts that make use of
- Berkeley DB. Neither the Perl license nor the Berkeley DB license
- place any restriction on what you may do with them.
If you are in any doubt about the license situation, contact either the Berkeley DB authors or the author of DB_File. See AUTHOR for details.
perl, dbopen(3), hash(3), recno(3), btree(3), perldbmfilter
The DB_File interface was written by Paul Marquess <pmqs@cpan.org>.
Digest - Modules that calculate message digests
- $md5 = Digest->new("MD5");
- $sha1 = Digest->new("SHA-1");
- $sha256 = Digest->new("SHA-256");
- $sha384 = Digest->new("SHA-384");
- $sha512 = Digest->new("SHA-512");
- $hmac = Digest->HMAC_MD5($key);
The Digest:: modules calculate digests, also called "fingerprints" or "hashes", of some data, called a message. The digest is (usually) some small/fixed size string. The actual size of the digest depends on the algorithm used. The message is simply a sequence of arbitrary bytes or bits.
An important property of the digest algorithms is that the digest is likely to change if the message changes in some way. Another property is that digest functions are one-way functions, that is, it should be hard to find a message that corresponds to some given digest. Algorithms differ in how "likely" and how "hard", as well as how efficient they are to compute.
Note that the properties of the algorithms change over time, as the algorithms are analyzed and machines grow faster. If your application for instance depends on it being "impossible" to generate the same digest for a different message, it is wise to make it easy to plug in stronger algorithms as the one in use grows weaker. Using the interface documented here should make it easy to change algorithms later.
All Digest::
modules provide the same programming interface: a
functional interface for simple use, as well as an object-oriented
interface that can handle messages of arbitrary length and can
read files directly.
The digest can be delivered in three formats:
This is the most compact form, but it is not well suited for printing or embedding in places that can't handle arbitrary data.
A twice as long string of lowercase hexadecimal digits.
A string of portable printable characters. This is the base64 encoded representation of the digest with any trailing padding removed. The string will be about 30% longer than the binary version. MIME::Base64 tells you more about this encoding.
The functional interface is simply importable functions with the same name as the algorithm. The functions take the message as argument and return the digest. Example:
- use Digest::MD5 qw(md5);
- $digest = md5($message);
There are also versions of the functions with "_hex" or "_base64" appended to the name, which return the digest in the indicated form.
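For instance, the three output forms of a digest of the same message can be obtained like this, a minimal sketch using Digest::MD5, which comes with Perl:

```perl
use Digest::MD5 qw(md5 md5_hex md5_base64);

my $message = "abc";

my $binary = md5($message);        # 16 raw bytes, may contain any octet
my $hex    = md5_hex($message);    # 32 lowercase hexadecimal digits
my $b64    = md5_base64($message); # base64, trailing padding removed

print "$hex\n";   # 900150983cd24fb0d6963f7d28e17f72
```

The hex form is twice as long as the binary one because each byte becomes two hex digits.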
The following methods are available for all Digest::
modules:
The constructor returns some object that encapsulates the state of the message-digest algorithm. You can add data to the object and finally ask for the digest. The "XXX" should of course be replaced by the proper name of the digest algorithm you want to use.
The first two forms are simply syntactic sugar which automatically loads the right module on first use. The second form allows you to use algorithm names which contain letters that are not legal in Perl identifiers, e.g. "SHA-1". If no implementation for the given algorithm can be found, an exception is raised.
If new() is called as an instance method (i.e. $ctx->new) it will just reset the state of the object to the state of a newly created object. No new object is created in this case, and the return value is the reference to the object (i.e. $ctx).
The clone method creates a copy of the digest state object and returns a reference to the copy.
This is just an alias for $ctx->new.
The string value of the $data provided as argument is appended to the message we calculate the digest for. The return value is the $ctx object itself.
If more arguments are provided then they are all appended to the message, thus all these lines will have the same effect on the state of the $ctx object:
- $ctx->add("a"); $ctx->add("b"); $ctx->add("c");
- $ctx->add("a")->add("b")->add("c");
- $ctx->add("a", "b", "c");
- $ctx->add("abc");
Most algorithms are only defined for strings of bytes and this method might therefore croak if the provided arguments contain chars with ordinal number above 255.
The $io_handle is read until EOF and the content is appended to the message we calculate the digest for. The return value is the $ctx object itself.
The addfile() method will croak() if it fails to read data for some reason. If it croaks, the state of the $ctx object is unpredictable: addfile() might have read the file partially before it failed. It is probably wise to discard or reset the $ctx object if this occurs.
In most cases you want to make sure that the $io_handle is in "binmode" before you pass it as argument to the addfile() method.
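A sketch of the usual addfile() pattern follows; it writes a small scratch file via File::Temp purely so the example is self-contained:

```perl
use Digest::MD5;
use File::Temp qw(tempfile);

# Create a small scratch file to digest (illustration only).
my ($tmp, $file) = tempfile();
binmode($tmp);
print {$tmp} "abc";
close($tmp);

open(my $in, "<", $file) or die "Can't open $file: $!";
binmode($in);                 # raw bytes, as recommended above

my $ctx = Digest::MD5->new;
$ctx->addfile($in);           # reads until EOF, croaks on read errors
close($in);

my $hex = $ctx->hexdigest;    # same result as md5_hex("abc")
```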
The add_bits() method is an alternative to add() that allows partial bytes to be appended to the message. Most users should just ignore this method, as partial bytes are very unlikely to be of any practical use.
The two argument form of add_bits() will add the first $nbits bits
from $data. For the last potentially partial byte only the high order
$nbits % 8
bits are used. If $nbits is greater than length($data) * 8
, then this method would do the same as $ctx->add($data)
.
The one argument form of add_bits() takes a $bitstring of "1" and "0"
chars as argument. It's a shorthand for $ctx->add_bits(pack("B*",
$bitstring), length($bitstring))
.
The return value is the $ctx object itself.
This example shows two calls that should have the same effect:
- $ctx->add_bits("111100001010");
- $ctx->add_bits("\xF0\xA0", 12);
Most digest algorithms are byte based and for these it is not possible to add bits that are not a multiple of 8, and the add_bits() method will croak if you try.
Return the binary digest for the message.
Note that the digest
operation is effectively a destructive,
read-once operation. Once it has been performed, the $ctx object is
automatically reset and can be used to calculate another digest
value. Call $ctx->clone->digest if you want to calculate the digest
without resetting the digest state.
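The clone-before-digest idiom looks like this:

```perl
use Digest::MD5;

my $ctx = Digest::MD5->new;
$ctx->add("Hello, ");

# Peek at the digest so far without destroying the accumulated state:
my $partial = $ctx->clone->hexdigest;

$ctx->add("world");
my $full = $ctx->hexdigest;   # reads and resets $ctx as a side effect
```

Here $partial is the digest of "Hello, " alone, while $full is the digest of the whole "Hello, world" message; without the clone(), the first hexdigest call would have reset the state.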
Same as $ctx->digest, but will return the digest in hexadecimal form.
Same as $ctx->digest, but will return the digest as a base64 encoded string.
This table should give some indication of the relative speed of different algorithms. It is sorted by throughput, based on a benchmark of some implementations of this API:
- Algorithm Size Implementation MB/s
- MD4 128 Digest::MD4 v1.3 165.0
- MD5 128 Digest::MD5 v2.33 98.8
- SHA-256 256 Digest::SHA2 v1.1.0 66.7
- SHA-1 160 Digest::SHA v4.3.1 58.9
- SHA-1 160 Digest::SHA1 v2.10 48.8
- SHA-256 256 Digest::SHA v4.3.1 41.3
- Haval-256 256 Digest::Haval256 v1.0.4 39.8
- SHA-384 384 Digest::SHA2 v1.1.0 19.6
- SHA-512 512 Digest::SHA2 v1.1.0 19.3
- SHA-384 384 Digest::SHA v4.3.1 19.2
- SHA-512 512 Digest::SHA v4.3.1 19.2
- Whirlpool 512 Digest::Whirlpool v1.0.2 13.0
- MD2 128 Digest::MD2 v2.03 9.5
- Adler-32 32 Digest::Adler32 v0.03 1.3
- CRC-16 16 Digest::CRC v0.05 1.1
- CRC-32 32 Digest::CRC v0.05 1.1
- MD5 128 Digest::Perl::MD5 v1.5 1.0
- CRC-CCITT 16 Digest::CRC v0.05 0.8
These numbers were achieved in April 2004 with ActivePerl-5.8.3 running under Linux on a P4 2.8 GHz CPU. The last 5 entries differ by being pure-Perl implementations of the algorithms, which explains why they are so slow.
Digest::Adler32, Digest::CRC, Digest::Haval256, Digest::HMAC, Digest::MD2, Digest::MD4, Digest::MD5, Digest::SHA, Digest::SHA1, Digest::SHA2, Digest::Whirlpool
New digest implementations should consider subclassing from Digest::base.
http://en.wikipedia.org/wiki/Cryptographic_hash_function
Gisle Aas <gisle@aas.no>
The Digest::
interface is based on the interface originally
developed by Neil Winton for his MD5
module.
This library is free software; you can redistribute it and/or modify it under the same terms as Perl itself.
- Copyright 1998-2006 Gisle Aas.
- Copyright 1995,1996 Neil Winton.
DirHandle - supply object methods for directory handles
The DirHandle
class provides an alternative interface to the
opendir(), closedir(), readdir(), and rewinddir() functions.
The only objective benefit to using DirHandle
is that it avoids
namespace pollution by creating globs to hold directory handles.
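A minimal sketch of the object interface, reading the current directory:

```perl
use DirHandle;

# Open a directory handle object; new() fails like opendir() does.
my $dh = DirHandle->new(".") or die "Can't opendir .: $!";

my @entries = $dh->read;   # like readdir() in list context
$dh->rewind;               # like rewinddir()
my @again   = $dh->read;   # same listing, read a second time
$dh->close;                # like closedir()
```

Because $dh is an ordinary lexical object rather than a glob, the handle is closed automatically when $dh goes out of scope.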
Dumpvalue - provides screen dump of Perl data.
A new dumper is created by a call
- $d = Dumpvalue->new(option1 => value1, option2 => value2)
Recognized options:
arrayDepth
, hashDepth
Print only first N elements of arrays and hashes. If false, prints all the elements.
compactDump
, veryCompact
Change the style of array and hash dumps. If true, short arrays may be printed on one line.
globPrint
Whether to print contents of globs.
dumpDBFiles
Dump arrays holding contents of debugged files.
dumpPackages
Dump symbol tables of packages.
dumpReused
Dump contents of "reused" addresses.
tick
, quoteHighBit
, printUndef
Change style of string dump. Default value of tick
is auto
, one
can enable either double-quotish dump, or single-quotish by setting it
to " or '. By default, characters with high bit set are printed
as is. If quoteHighBit
is set, they will be quoted.
usageOnly
rudimentary per-package memory usage dump. If set,
dumpvars
calculates total size of strings in variables in the package.
Changes the style of printout of strings. Possible values are
unctrl
and quote
.
Whether to try to find the subroutine name given the reference.
Whether to write the non-overloaded form of the stringify-overloaded objects.
Whether to print chars with high bit set in binary or "as is".
Whether to abort printing if debugger signal flag is raised.
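Putting a few of the options above together, a minimal sketch of constructing and using a dumper:

```perl
use Dumpvalue;

# Option names follow the list above.
my $d = Dumpvalue->new(
    arrayDepth  => 3,     # show at most the first 3 elements of arrays
    compactDump => 1,     # short arrays may go on one line
    tick        => '"',   # double-quotish string dumps
);

# Prints a dump of the structure to the currently selected filehandle.
$d->dumpValue([1, 2, 3, 4, 5]);
</imports>
```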
Later in the life of the object the options may be queried with the get() method and changed with the set() method (both accept multiple arguments).
- $dumper->dumpValue($value);
- $dumper->dumpValue([$value1, $value2]);
Prints a dump to the currently selected filehandle.
- $dumper->dumpValues($value1, $value2);
Same as $dumper->dumpValue([$value1, $value2]);
.
- my $dump = $dumper->stringify($value [,$noticks] );
Returns the dump of a single scalar without printing. If the second argument is true, the return value does not contain enclosing ticks. Does not handle data structures.
- $dumper->dumpvars('my_package');
- $dumper->dumpvars('my_package', 'foo', '~bar$', '!......');
The optional arguments are considered literal strings unless they
start with ~
or !
, in which case they are interpreted as regular
expressions (the latter negated).
The second example prints entries with the name foo
, entries
with names which end in bar
, and entries with names shorter than 5 chars.
- $d->set_quote('"');
Sets tick
and unctrl
options to suitable values for printout with the
given quote char. Possible values are auto
, ' and ".
- $d->set_unctrl('unctrl');
Sets unctrl
option with checking for an invalid argument.
Possible values are unctrl
and quote
.
- $d->compactDump(1);
Sets the compactDump
option. If the value is 1, it is reset to a reasonably
big number.
- $d->veryCompact(1);
Sets compactDump
and veryCompact
options simultaneously.
- $d->set(option1 => value1, option2 => value2);
- @values = $d->get('option1', 'option2');
DynaLoader - Dynamically load C libraries into Perl code
- package YourPackage;
- require DynaLoader;
- @ISA = qw(... DynaLoader ...);
- bootstrap YourPackage;
- # optional method for 'global' loading
- sub dl_load_flags { 0x01 }
This document defines a standard generic interface to the dynamic linking mechanisms available on many platforms. Its primary purpose is to implement automatic dynamic loading of Perl modules.
This document serves as both a specification for anyone wishing to implement the DynaLoader for a new platform and as a guide for anyone wishing to use the DynaLoader directly in an application.
The DynaLoader is designed to be a very simple high-level interface that is sufficiently general to cover the requirements of SunOS, HP-UX, NeXT, Linux, VMS and other platforms.
It is also hoped that the interface will cover the needs of OS/2, NT
etc and also allow pseudo-dynamic linking (using ld -A
at runtime).
It must be stressed that the DynaLoader, by itself, is practically useless for accessing non-Perl libraries because it provides almost no Perl-to-C 'glue'. There is, for example, no mechanism for calling a C library function or supplying arguments. A C::DynaLib module is available from CPAN sites which performs that function for some common system types. And since the year 2000, there's also Inline::C, a module that allows you to write Perl subroutines in C. Also available from your local CPAN site.
DynaLoader Interface Summary
- @dl_library_path
- @dl_resolve_using
- @dl_require_symbols
- $dl_debug
- @dl_librefs
- @dl_modules
- @dl_shared_objects
- Implemented in:
- bootstrap($modulename) Perl
- @filepaths = dl_findfile(@names) Perl
- $flags = $modulename->dl_load_flags Perl
- $symref = dl_find_symbol_anywhere($symbol) Perl
- $libref = dl_load_file($filename, $flags) C
- $status = dl_unload_file($libref) C
- $symref = dl_find_symbol($libref, $symbol) C
- @symbols = dl_undef_symbols() C
- dl_install_xsub($name, $symref [, $filename]) C
- $message = dl_error C
The standard/default list of directories in which dl_findfile() will search for libraries etc. Directories are searched in order: $dl_library_path[0], [1], ... etc
@dl_library_path is initialised to hold the list of 'normal' directories
(/usr/lib, etc) determined by Configure ($Config{'libpth'}
). This should
ensure portability across a wide range of platforms.
@dl_library_path should also be initialised with any other directories that can be determined from the environment at runtime (such as LD_LIBRARY_PATH for SunOS).
After initialisation @dl_library_path can be manipulated by an application using push and unshift before calling dl_findfile(). Unshift can be used to add directories to the front of the search order either to save search time or to override libraries with the same name in the 'normal' directories.
The load function that dl_load_file() calls may require an absolute pathname. The dl_findfile() function and @dl_library_path can be used to search for and return the absolute pathname for the library/object that you wish to load.
A list of additional libraries or other shared objects which can be used to resolve any undefined symbols that might be generated by a later call to load_file().
This is only required on some platforms which do not handle dependent libraries automatically. For example the Socket Perl extension library (auto/Socket/Socket.so) contains references to many socket functions which need to be resolved when it's loaded. Most platforms will automatically know where to find the 'dependent' library (e.g., /usr/lib/libsocket.so). A few platforms need to be told the location of the dependent library explicitly. Use @dl_resolve_using for this.
Example usage:
- @dl_resolve_using = dl_findfile('-lsocket');
A list of one or more symbol names that are in the library/object file to be dynamically loaded. This is only required on some platforms.
An array of the handles returned by successful calls to dl_load_file(), made by bootstrap, in the order in which they were loaded. Can be used with dl_find_symbol() to look for a symbol in any of the loaded files.
An array of module (package) names that have been bootstrap'ed.
An array of file names for the shared objects that were loaded.
Syntax:
- $message = dl_error();
Error message text from the last failed DynaLoader function. Note that, similar to errno in unix, a successful function call does not reset this message.
Implementations should detect the error as soon as it occurs in any of the other functions and save the corresponding message for later retrieval. This will avoid problems on some platforms (such as SunOS) where the error message is very temporary (e.g., dlerror()).
Internal debugging messages are enabled when $dl_debug is set true. Currently setting $dl_debug only affects the Perl side of the DynaLoader. These messages should help an application developer to resolve any DynaLoader usage problems.
$dl_debug is set to $ENV{'PERL_DL_DEBUG'}
if defined.
For the DynaLoader developer/porter there is a similar debugging variable added to the C code (see dlutils.c) and enabled if Perl was built with the -DDEBUGGING flag. This can also be set via the PERL_DL_DEBUG environment variable. Set to 1 for minimal information or higher for more.
Syntax:
- @filepaths = dl_findfile(@names)
Determine the full paths (including file suffix) of one or more loadable files given their generic names and optionally one or more directories. Searches directories in @dl_library_path by default and returns an empty list if no files were found.
Names can be specified in a variety of platform independent forms. Any names in the form -lname are converted into libname.*, where .* is an appropriate suffix for the platform.
If a name does not already have a suitable prefix and/or suffix then the corresponding file will be searched for by trying combinations of prefix and suffix appropriate to the platform: "$name.o", "lib$name.*" and "$name".
If any directories are included in @names they are searched before @dl_library_path. Directories may be specified as -Ldir. Any other names are treated as filenames to be searched for.
Using arguments of the form -Ldir
and -lname
is recommended.
Example:
- @dl_resolve_using = dl_findfile(qw(-L/usr/5lib -lposix));
Syntax:
- $filepath = dl_expandspec($spec)
Some unusual systems, such as VMS, require special filename handling in order to deal with symbolic names for files (i.e., VMS's Logical Names).
To support these systems a dl_expandspec() function can be implemented either in the dl_*.xs file or code can be added to the dl_expandspec() function in DynaLoader.pm. See DynaLoader_pm.PL for more information.
Syntax:
- $libref = dl_load_file($filename, $flags)
Dynamically load $filename, which must be the path to a shared object or library. An opaque 'library reference' is returned as a handle for the loaded object. Returns undef on error.
The $flags argument alters the behaviour of dl_load_file(). Assigned bits:
- 0x01 make symbols available for linking later dl_load_file's.
- (only known to work on Solaris 2 using dlopen(RTLD_GLOBAL))
- (ignored under VMS; this is a normal part of image linking)
(On systems that provide a handle for the loaded object such as SunOS and HPUX, $libref will be that handle. On other systems $libref will typically be $filename or a pointer to a buffer containing $filename. The application should not examine or alter $libref in any way.)
This is the function that does the real work. It should use the current values of @dl_require_symbols and @dl_resolve_using if required.
- SunOS: dlopen($filename)
- HP-UX: shl_load($filename)
- Linux: dld_create_reference(@dl_require_symbols); dld_link($filename)
- NeXT: rld_load($filename, @dl_resolve_using)
- VMS: lib$find_image_symbol($filename,$dl_require_symbols[0])
(The dlopen() function is also used by Solaris and some versions of Linux, and is a common choice when providing a "wrapper" on other mechanisms as is done in the OS/2 port.)
Syntax:
- $status = dl_unload_file($libref)
Dynamically unload $libref, which must be an opaque 'library reference' as returned from dl_load_file. Returns one on success and zero on failure.
This function is optional and may not necessarily be provided on all platforms. If it is defined, it is called automatically when the interpreter exits for every shared object or library loaded by DynaLoader::bootstrap. All such library references are stored in @dl_librefs by DynaLoader::bootstrap as it loads the libraries. The files are unloaded in last-in, first-out order.
This unloading is usually necessary when embedding a shared-object perl (e.g. one configured with -Duseshrplib) within a larger application, and the perl interpreter is created and destroyed several times within the lifetime of the application. In this case it is possible that the system dynamic linker will unload and then subsequently reload the shared libperl without relocating any references to it from any files DynaLoaded by the previous incarnation of the interpreter. As a result, any shared objects opened by DynaLoader may point to a now invalid 'ghost' of the libperl shared object, causing apparently random memory corruption and crashes. This behaviour is most commonly seen when using Apache and mod_perl built with the APXS mechanism.
- SunOS: dlclose($libref)
- HP-UX: ???
- Linux: ???
- NeXT: ???
- VMS: ???
(The dlclose() function is also used by Solaris and some versions of Linux, and is a common choice when providing a "wrapper" on other mechanisms as is done in the OS/2 port.)
Syntax:
- $flags = dl_load_flags $modulename;
Designed to be a method call, and to be overridden by a derived class (i.e. a class which has DynaLoader in its @ISA). The definition in DynaLoader itself returns 0, which produces standard behavior from dl_load_file().
Syntax:
- $symref = dl_find_symbol($libref, $symbol)
Return the address of the symbol $symbol or undef if not found. If the
target system has separate functions to search for symbols of different
types then dl_find_symbol() should search for function symbols first and
then other types.
The exact manner in which the address is returned in $symref is not currently defined. The only initial requirement is that $symref can be passed to, and understood by, dl_install_xsub().
- SunOS: dlsym($libref, $symbol)
- HP-UX: shl_findsym($libref, $symbol)
- Linux: dld_get_func($symbol) and/or dld_get_symbol($symbol)
- NeXT: rld_lookup("_$symbol")
- VMS: lib$find_image_symbol($libref,$symbol)
Syntax:
- $symref = dl_find_symbol_anywhere($symbol)
Applies dl_find_symbol() to the members of @dl_librefs and returns the first match found.
Example
- @symbols = dl_undef_symbols()
Return a list of symbol names which remain undefined after load_file().
Returns ()
if not known. Don't worry if your platform does not provide
a mechanism for this. Most do not need it and hence do not provide it,
they just return an empty list.
Syntax:
- dl_install_xsub($perl_name, $symref [, $filename])
Create a new Perl external subroutine named $perl_name using $symref as a pointer to the function which implements the routine. This is simply a direct call to newXSUB(). Returns a reference to the installed function.
The $filename parameter is used by Perl to identify the source file for the function if required by die(), caller() or the debugger. If $filename is not defined then "DynaLoader" will be used.
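The low-level functions above can be combined for manual loading. The following is only a sketch: the library name -lmylib and the symbol XS_MyLib_hello are hypothetical, so the lookup is guarded and simply skips the install step when nothing is found:

```perl
use DynaLoader;

# Search @dl_library_path for a hypothetical library (illustration only).
my ($file) = DynaLoader::dl_findfile('-lmylib');

if (defined $file) {
    my $libref = DynaLoader::dl_load_file($file, 0)
        or die "load failed: ", DynaLoader::dl_error();

    # Hypothetical boot/XS symbol name.
    my $symref = DynaLoader::dl_find_symbol($libref, "XS_MyLib_hello");

    # Install the C function as Perl sub MyLib::hello(), if found.
    DynaLoader::dl_install_xsub("MyLib::hello", $symref, $file) if $symref;
}
```

This mirrors what bootstrap() does for you automatically, minus the .bs file handling and version checks.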
Syntax:
bootstrap($module [...])
This is the normal entry point for automatic dynamic loading in Perl.
It performs the following actions:
locates an auto/$module directory by searching @INC
uses dl_findfile() to determine the filename to load
sets @dl_require_symbols to ("boot_$module")
executes an auto/$module/$module.bs file if it exists (typically used to add to @dl_resolve_using any files which are required to load the module on the current platform)
calls dl_load_flags() to determine how to load the file.
calls dl_load_file() to load the file
calls dl_undef_symbols() and warns if any symbols are undefined
calls dl_find_symbol() for "boot_$module"
calls dl_install_xsub() to install it as "${module}::bootstrap"
calls &{"${module}::bootstrap"} to bootstrap the module (actually it uses the function reference returned by dl_install_xsub for speed)
All arguments to bootstrap() are passed to the module's bootstrap function.
The default code generated by xsubpp expects $module [, $version].
If the optional $version argument is not given, it defaults to
$XS_VERSION // $VERSION
in the module's symbol table. The default code
compares the Perl-space version with the version of the compiled XS code,
and croaks with an error if they do not match.
Tim Bunce, 11 August 1994.
This interface is based on the work and comments of (in no particular order): Larry Wall, Robert Sanders, Dean Roehrich, Jeff Okamoto, Anno Siegel, Thomas Neumann, Paul Marquess, Charles Bailey, myself and others.
Larry Wall designed the elegant inherited bootstrap mechanism and implemented the first Perl 5 dynamic loader using it.
Solaris global loading added by Nick Ing-Simmons with design/coding assistance from Tim Bunce, January 1996.
Encode - character encodings in Perl
- use Encode qw(decode encode);
- $characters = decode('UTF-8', $octets, Encode::FB_CROAK);
- $octets = encode('UTF-8', $characters, Encode::FB_CROAK);
Encode consists of a collection of modules whose details are too extensive to fit in one document. This one itself explains the top-level APIs and general topics at a glance. For other topics and more details, see the documentation for these modules:
The Encode
module provides the interface between Perl strings
and the rest of the system. Perl strings are sequences of
characters.
The repertoire of characters that Perl can represent is a superset of those
defined by the Unicode Consortium. On most platforms the ordinal
values of a character as returned by ord(S) is the Unicode
codepoint for that character. The exceptions are platforms where
the legacy encoding is some variant of EBCDIC rather than a superset
of ASCII; see perlebcdic.
Throughout recent history, data has been moved around computers in 8-bit chunks, often called "bytes" but also known as "octets" in standards documents. Perl is widely used to manipulate data of many types: not only strings of characters representing human or computer languages, but also "binary" data, being the machine's representation of numbers, pixels in an image, or just about anything.
When Perl is processing "binary data", the programmer wants Perl to process "sequences of bytes". This is not a problem for Perl: because a byte has 256 possible values, it easily fits in Perl's much larger "logical character".
This document mostly explains the how. perlunitut and perlunifaq explain the why.
A character in the range 0 .. 2**32-1 (or more); what Perl's strings are made of.
A character in the range 0..255; a special case of a Perl character.
8 bits of data, with ordinal values 0..255; term for bytes passed to or from a non-Perl context, such as a disk file, standard I/O stream, database, command-line argument, environment variable, socket etc.
- $octets = encode(ENCODING, STRING[, CHECK])
Encodes the scalar value STRING from Perl's internal form into ENCODING and returns a sequence of octets. ENCODING can be either a canonical name or an alias. For encoding names and aliases, see Defining Aliases. For CHECK, see Handling Malformed Data.
For example, to convert a string from Perl's internal format into ISO-8859-1, also known as Latin1:
- $octets = encode("iso-8859-1", $string);
CAVEAT: When you run $octets = encode("utf8", $string)
, then
$octets might not be equal to $string. Though both contain the
same data, the UTF8 flag for $octets is always off. When you
encode anything, the UTF8 flag on the result is always off, even when it
contains a completely valid utf8 string. See The UTF8 flag below.
If the $string is undef, then undef is returned.
- $string = decode(ENCODING, OCTETS[, CHECK])
This function returns the string that results from decoding the scalar value OCTETS, assumed to be a sequence of octets in ENCODING, into Perl's internal form. As with encode(), ENCODING can be either a canonical name or an alias. For encoding names and aliases, see Defining Aliases; for CHECK, see Handling Malformed Data.
For example, to convert ISO-8859-1 data into a string in Perl's internal format:
- $string = decode("iso-8859-1", $octets);
CAVEAT: When you run $string = decode("utf8", $octets)
, then $string
might not be equal to $octets. Though both contain the same data, the
UTF8 flag for $string is on unless $octets consists entirely of ASCII data
on ASCII machines or EBCDIC on EBCDIC machines. See The UTF8 flag
below.
If the $string is undef, then undef is returned.
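A round trip through encode() and decode() makes the character/octet distinction concrete. Here "caf\x{e9}" is four characters; as Latin1 it is four octets, but as UTF-8 it is five, because the e-acute takes two bytes:

```perl
use Encode qw(encode decode);

my $characters = "caf\x{e9}";                     # 4 characters, last is U+00E9

my $octets = encode("iso-8859-1", $characters);   # 4 octets in Latin1
my $back   = decode("iso-8859-1", $octets);       # back to 4 characters

my $utf8   = encode("UTF-8", $characters);        # 5 octets: U+00E9 is 2 bytes
```

length() counts characters on a decoded string but octets on an encoded one, which is why $characters and $utf8 report different lengths for the "same" text.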
- [$obj =] find_encoding(ENCODING)
Returns the encoding object corresponding to ENCODING. Returns
undef if no matching ENCODING is found. The returned object is
what does the actual encoding or decoding.
- $utf8 = decode($name, $bytes);
is in fact
- $utf8 = find_encoding($name)->decode($bytes);
with more error checking.
You can therefore save time by reusing this object as follows:
- my $enc = find_encoding("iso-8859-1");
- while (<>) {
-     my $utf8 = $enc->decode($_);
-     # ... now do something with $utf8;
- }
Besides decode and encode, other methods are
available as well. For instance, name()
returns the canonical
name of the encoding object.
- find_encoding("latin1")->name; # iso-8859-1
See Encode::Encoding for details.
- [$length =] from_to($octets, FROM_ENC, TO_ENC [, CHECK])
Converts in-place data between two encodings. The data in $octets must be encoded as octets and not as characters in Perl's internal format. For example, to convert ISO-8859-1 data into Microsoft's CP1250 encoding:
- from_to($octets, "iso-8859-1", "cp1250");
and to convert it back:
- from_to($octets, "cp1250", "iso-8859-1");
Because the conversion happens in place, the data to be converted cannot be a string constant: it must be a scalar variable.
from_to()
returns the length of the converted string in octets on success,
and undef on error.
CAVEAT: The following operations may look the same, but are not:
- from_to($data, "iso-8859-1", "utf8"); #1
- $data = decode("iso-8859-1", $data); #2
Both #1 and #2 make $data consist of a completely valid UTF-8 string, but only #2 turns the UTF8 flag on. #1 is equivalent to:
- $data = encode("utf8", decode("iso-8859-1", $data));
See The UTF8 flag below.
Also note that:
- from_to($octets, $from, $to, $check);
is equivalent to:
- $octets = encode($to, decode($from, $octets), $check);
Yes, it does not respect the $check during decoding. It is
deliberately done that way. If you need minute control, use decode
followed by encode
as follows:
- $octets = encode($to, decode($from, $octets, $check_from), $check_to);
- $octets = encode_utf8($string);
Equivalent to $octets = encode("utf8", $string)
. The characters in
$string are encoded in Perl's internal format, and the result is returned
as a sequence of octets. Because all possible characters in Perl have a
(loose, not strict) UTF-8 representation, this function cannot fail.
- $string = decode_utf8($octets [, CHECK]);
Equivalent to $string = decode("utf8", $octets [, CHECK])
.
The sequence of octets represented by $octets is decoded
from UTF-8 into a sequence of logical characters.
Because not all sequences of octets are valid UTF-8,
it is quite possible for this function to fail.
For CHECK, see Handling Malformed Data.
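The contrast between the two functions, including a failure trapped with eval {} as recommended for Encode::FB_CROAK:

```perl
use Encode qw(encode_utf8 decode_utf8);

my $octets = encode_utf8("\x{263A}");   # U+263A (smiling face): 3 octets
my $string = decode_utf8($octets);      # back to 1 character

# "\xFF" can never start a complete UTF-8 sequence here, so with
# FB_CROAK the decode dies; trap it with eval {}:
my $ok = eval { decode_utf8("\xFF\xFE", Encode::FB_CROAK); 1 };
# $ok remains undef and $@ holds the error message
```

encode_utf8() cannot fail, but decode_utf8() can, which is why only the decoding direction takes a CHECK argument worth trapping.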
- use Encode;
- @list = Encode->encodings();
Returns a list of canonical names of available encodings that have already been loaded. To get a list of all available encodings including those that have not yet been loaded, say:
- @all_encodings = Encode->encodings(":all");
Or you can give the name of a specific module:
- @with_jp = Encode->encodings("Encode::JP");
When "::
" is not in the name, "Encode::
" is assumed.
- @ebcdic = Encode->encodings("EBCDIC");
To find out in detail which encodings are supported by this package, see Encode::Supported.
To add a new alias to a given encoding, use:
- use Encode;
- use Encode::Alias;
- define_alias(NEWNAME => ENCODING);
After that, NEWNAME can be used as an alias for ENCODING. ENCODING may be either the name of an encoding or an encoding object.
Before you do that, first make sure the alias is nonexistent using
resolve_alias()
, which returns the canonical name thereof.
For example:
- Encode::resolve_alias("latin1") eq "iso-8859-1" # true
- Encode::resolve_alias("iso-8859-12") # false; nonexistent
- Encode::resolve_alias($name) eq $name # true if $name is canonical
resolve_alias()
does not need use Encode::Alias
; it can be
imported via use Encode qw(resolve_alias)
.
See Encode::Alias for details.
The canonical name of a given encoding does not necessarily agree with
the IANA Character Set Registry, commonly seen as Content-Type:
text/plain; charset=WHATEVER. For most cases the canonical name
works, but sometimes it does not, most notably with "utf-8-strict".
As of Encode
version 2.21, a new method mime_name()
is therefore added.
See also: Encode::Encoding
If your perl supports PerlIO
(which is the default), you can use a
PerlIO
layer to decode and encode directly via a filehandle. The
following two examples are fully identical in functionality:
- ### Version 1 via PerlIO
- open(INPUT, "< :encoding(shiftjis)", $infile)
- || die "Can't open < $infile for reading: $!";
- open(OUTPUT, "> :encoding(euc-jp)", $outfile)
- || die "Can't open > $output for writing: $!";
- while (<INPUT>) { # auto decodes $_
- print OUTPUT; # auto encodes $_
- }
- close(INPUT) || die "can't close $infile: $!";
- close(OUTPUT) || die "can't close $outfile: $!";
- ### Version 2 via from_to()
- open(INPUT, "< :raw", $infile)
- || die "Can't open < $infile for reading: $!";
- open(OUTPUT, "> :raw", $outfile)
- || die "Can't open > $output for writing: $!";
- while (<INPUT>) {
- from_to($_, "shiftjis", "euc-jp", 1); # switch encoding
- print OUTPUT; # emit raw (but properly encoded) data
- }
- close(INPUT) || die "can't close $infile: $!";
- close(OUTPUT) || die "can't close $outfile: $!";
In the first version above, you let the appropriate encoding layer handle the conversion. In the second, you explicitly translate from one encoding to the other.
Unfortunately, not all encodings are PerlIO-savvy. You can check whether your encoding is supported by PerlIO by invoking the perlio_ok method on it:
- Encode::perlio_ok("hz"); # false
- find_encoding("euc-cn")->perlio_ok; # true wherever PerlIO is available
- use Encode qw(perlio_ok); # imported upon request
- perlio_ok("euc-jp")
Fortunately, all encodings that come with the Encode core are PerlIO-savvy except for hz and ISO-2022-kr. For the gory details, see Encode::Encoding and Encode::PerlIO.
The optional CHECK argument tells Encode what to do when encountering malformed data. Without CHECK, Encode::FB_DEFAULT (== 0) is assumed. As of version 2.12, Encode supports coderef values for CHECK; see below.
NOTE: Not all encodings support this feature. Some encodings ignore the CHECK argument. For example, Encode::Unicode ignores CHECK and it always croaks on error.
- CHECK = Encode::FB_DEFAULT (== 0)
If CHECK is 0, encoding and decoding replace any malformed character with a substitution character. When you encode, SUBCHAR is used. When you decode, the Unicode REPLACEMENT CHARACTER, code point U+FFFD, is used. If the data is supposed to be UTF-8, an optional lexical warning of warning category "utf8" is given.
- CHECK = Encode::FB_CROAK (== 1)
If CHECK is 1, methods immediately die with an error message. Therefore, when CHECK is 1, you should trap exceptions with eval{}, unless you really want to let it die.
- CHECK = Encode::FB_QUIET
If CHECK is set to Encode::FB_QUIET, encoding and decoding immediately return the portion of the data that has been processed so far when an error occurs. The data argument is overwritten with everything after that point; that is, the unprocessed portion of the data. This is handy when you have to call decode repeatedly because your source data may contain partial multi-byte character sequences (that is, you are reading with a fixed-width buffer). Here's some sample code to do exactly that:
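A minimal sketch of that pattern; the file name, encoding, and buffer size here are illustrative:

```perl
use Encode qw(decode);

# Read fixed-size chunks; FB_QUIET decodes what it can and leaves any
# trailing partial multi-byte sequence in $buffer for the next read.
my ($buffer, $string) = ("", "");
open my $fh, "<:raw", "input.euc" or die "Can't open input.euc: $!";
while (read($fh, $buffer, 256, length $buffer)) {
    $string .= decode("euc-jp", $buffer, Encode::FB_QUIET);
    # $buffer now holds only the unprocessed trailing octets, if any
}
close $fh;
```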
- CHECK = Encode::FB_WARN
This is the same as FB_QUIET above, except that instead of being silent on errors, it issues a warning. This is handy when you are debugging.
For encodings that are implemented by the Encode::XS module, CHECK == Encode::FB_PERLQQ puts encode and decode into perlqq fallback mode. When you decode, \xHH is inserted for a malformed character, where HH is the hex representation of the octet that could not be decoded to utf8. When you encode, \x{HHHH} will be inserted, where HHHH is the Unicode code point (in any number of hex digits) of the character that cannot be found in the character repertoire of the encoding.
The HTML/XML character reference modes are about the same. In place of
\x{HHHH}, HTML uses &#NNN; where NNN is a decimal number, and
XML uses &#xHHHH; where HHHH is the hexadecimal number.
In Encode 2.10 or later, LEAVE_SRC is also implied.
These modes are all actually set via a bitmask. Here is how the FB_XXX constants are laid out. You can import the FB_XXX constants via use Encode qw(:fallbacks), and you can import the generic bitmask constants via use Encode qw(:fallback_all).
- FB_DEFAULT FB_CROAK FB_QUIET FB_WARN FB_PERLQQ
- DIE_ON_ERR 0x0001 X
- WARN_ON_ERR 0x0002 X
- RETURN_ON_ERR 0x0004 X X
- LEAVE_SRC 0x0008 X
- PERLQQ 0x0100 X
- HTMLCREF 0x0200
- XMLCREF 0x0400
- Encode::LEAVE_SRC
If the Encode::LEAVE_SRC bit is not set but CHECK is set, then the source string to encode() or decode() will be overwritten in place. If you're not interested in this, then bitwise-OR it with the bitmask.
As of Encode 2.12, CHECK can also be a code reference which takes the ordinal value of the unmapped character as an argument and returns a string that represents the fallback character. For instance:
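A hedged sketch; the `<U+XXXX>` format is just one choice of fallback text:

```perl
use Encode qw(encode);

# The coderef receives the ordinal of each unmappable character and
# returns the replacement string to splice into the output.
my $ascii = encode("ascii", "pi = \x{03C0}",
                   sub { sprintf "<U+%04X>", shift });
print $ascii, "\n";   # pi = <U+03C0>
```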
Acts like FB_PERLQQ, but U+XXXX is used instead of \x{XXXX}.
To define a new encoding, use:
- use Encode qw(define_encoding);
- define_encoding($object, CANONICAL_NAME [, alias...]);
CANONICAL_NAME will be associated with $object. The object should provide the interface described in Encode::Encoding. If more than two arguments are provided, additional arguments are considered aliases for $object.
See Encode::Encoding for details.
Before the introduction of Unicode support in Perl, the eq operator just compared the strings represented by two scalars. Beginning with Perl 5.8, eq compares two strings with simultaneous consideration of the UTF8 flag. To explain why we made it so, I quote from page 402 of Programming Perl, 3rd ed.
Old byte-oriented programs should not spontaneously break on the old byte-oriented data they used to work on.
Old byte-oriented programs should magically start working on the new character-oriented data when appropriate.
Programs should run just as fast in the new character-oriented mode as in the old byte-oriented mode.
Perl should remain one language, rather than forking into a byte-oriented Perl and a character-oriented Perl.
When Programming Perl, 3rd ed. was written, not even Perl 5.6.0 had been born yet, and many features documented in the book remained unimplemented for a long time. Perl 5.8 corrected much of this, and the introduction of the UTF8 flag is one of those corrections. You can think of there being two fundamentally different kinds of strings and string operations in Perl: a byte-oriented mode for when the internal UTF8 flag is off, and a character-oriented mode for when the internal UTF8 flag is on.
Here is how Encode
handles the UTF8 flag.
When you encode, the resulting UTF8 flag is always off.
When you decode, the resulting UTF8 flag is on--unless you can unambiguously represent the data. Here is what we mean by "unambiguously": after $utf8 = decode("foo", $octet),
- When $octet is... The UTF8 flag in $utf8 is
- ---------------------------------------------
- In ASCII only (or EBCDIC only) OFF
- In ISO-8859-1 ON
- In any other Encoding ON
- ---------------------------------------------
As you can see, there is one exception: data that is purely ASCII (or EBCDIC). That way you can assume Goal #1. And with Encode, Goal #2 is assumed, but you still have to be careful in the cases mentioned in the CAVEAT paragraphs above.
This UTF8 flag is not visible in Perl scripts, exactly for the same reason you cannot (or rather, you don't have to) see whether a scalar contains a string, an integer, or a floating-point number. But you can still peek and poke these if you will. See the next section.
The following API uses parts of Perl's internals in the current implementation. As such, they are efficient but may change in a future release.
- is_utf8(STRING [, CHECK])
[INTERNAL] Tests whether the UTF8 flag is turned on in the STRING. If CHECK is true, also checks whether STRING contains well-formed UTF-8. Returns true if successful, false otherwise.
As of Perl 5.8.1, utf8 also has the utf8::is_utf8 function.
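A small illustration of the table above and of is_utf8():

```perl
use Encode qw(encode decode is_utf8);

# Encoding always returns bytes with the UTF8 flag off:
my $bytes = encode("UTF-8", "caf\x{E9}");
print is_utf8($bytes) ? "on" : "off", "\n";   # off

# Decoding non-ASCII ISO-8859-1 data turns the flag on:
my $chars = decode("iso-8859-1", "caf\xE9");
print is_utf8($chars) ? "on" : "off", "\n";   # on
```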
- _utf8_on(STRING)
[INTERNAL] Turns the STRING's internal UTF8 flag on. The STRING
is not checked for containing only well-formed UTF-8. Do not use this
unless you know with absolute certainty that the STRING holds only
well-formed UTF-8. Returns the previous state of the UTF8 flag (so please
don't treat the return value as indicating success or failure), or undef
if STRING is not a string.
NOTE: For security reasons, this function does not work on tainted values.
- _utf8_off(STRING)
[INTERNAL] Turns the STRING's internal UTF8 flag off. Do not use
frivolously. Returns the previous state of the UTF8 flag, or undef if
STRING is not a string. Do not treat the return value as indicative of
success or failure, because that isn't what it means: it is only the
previous setting.
NOTE: For security reasons, this function does not work on tainted values.
- ....We now view strings not as sequences of bytes, but as sequences
- of numbers in the range 0 .. 2**32-1 (or in the case of 64-bit
- computers, 0 .. 2**64-1) -- Programming Perl, 3rd ed.
That has historically been Perl's notion of UTF-8, as that is how UTF-8 was first conceived by Ken Thompson when he invented it. However, thanks to later revisions to the applicable standards, official UTF-8 is now rather stricter than that. For example, its range is much narrower (0 .. 0x10_FFFF to cover only 21 bits instead of 32 or 64 bits) and some sequences are not allowed, like those used in surrogate pairs, the 32 non-character code points 0xFDD0 .. 0xFDEF, the last two code points in any plane (0xXX_FFFE and 0xXX_FFFF), all non-shortest encodings, etc.
The former default in which Perl would always use a loose interpretation of UTF-8 has now been overruled:
- From: Larry Wall <larry@wall.org>
- Date: December 04, 2004 11:51:58 JST
- To: perl-unicode@perl.org
- Subject: Re: Make Encode.pm support the real UTF-8
- Message-Id: <20041204025158.GA28754@wall.org>
- On Fri, Dec 03, 2004 at 10:12:12PM +0000, Tim Bunce wrote:
- : I've no problem with 'utf8' being perl's unrestricted uft8 encoding,
- : but "UTF-8" is the name of the standard and should give the
- : corresponding behaviour.
- For what it's worth, that's how I've always kept them straight in my
- head.
- Also for what it's worth, Perl 6 will mostly default to strict but
- make it easy to switch back to lax.
- Larry
Got that? As of Perl 5.8.7, "UTF-8" means UTF-8 in its current sense, which is conservative and strict and security-conscious, whereas "utf8" means UTF-8 in its former sense, which was liberal and loose and lax. Encode version 2.10 or later thus groks this subtle but critically important distinction between "UTF-8" and "utf8".
- encode("utf8", "\x{FFFF_FFFF}", 1); # okay
- encode("UTF-8", "\x{FFFF_FFFF}", 1); # croaks
In the Encode module, "UTF-8" is actually a canonical name for "utf-8-strict". That hyphen between the "UTF" and the "8" is critical; without it, Encode goes "liberal" and (perhaps overly-)permissive:
- find_encoding("UTF-8")->name # is 'utf-8-strict'
- find_encoding("utf-8")->name # ditto. names are case insensitive
- find_encoding("utf_8")->name # ditto. "_" are treated as "-"
- find_encoding("UTF8")->name # is 'utf8'.
Perl's internal UTF8 flag is called "UTF8", without a hyphen. It indicates whether a string is internally encoded as "utf8", also without a hyphen.
Encode::Encoding, Encode::Supported, Encode::PerlIO, encoding, perlebcdic, open, perlunicode, perluniintro, perlunifaq, perlunitut, utf8, the Perl Unicode Mailing List http://lists.perl.org/list/perl-unicode.html
This project was originated by the late Nick Ing-Simmons and later maintained by Dan Kogai <dankogai@cpan.org>. See AUTHORS for a full list of people involved. For any questions, send mail to <perl-unicode@perl.org> so that we can all share.
While Dan Kogai retains the copyright as a maintainer, credit should go to all those involved. See AUTHORS for a list of those who submitted code to the project.
Copyright 2002-2012 Dan Kogai <dankogai@cpan.org>.
This library is free software; you can redistribute it and/or modify it under the same terms as Perl itself.
English - use nice English (or awk) names for ugly punctuation variables
This module provides aliases for the built-in variables whose names no one seems to like to read. Variables with side-effects which get triggered just by accessing them (like $0) will still be affected.
For those variables that have an awk version, both long and short English alternatives are provided. For example, the $/ variable can be referred to as either $RS or $INPUT_RECORD_SEPARATOR if you are using the English module. See perlvar for a complete list of these.
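A brief sketch of the aliases in action:

```perl
use English qw( -no_match_vars );   # skip $PREMATCH and friends (pre-5.20 speed)

# $RS, $INPUT_RECORD_SEPARATOR and $/ are all the same variable:
print "separators agree\n" if $RS eq $/ and $INPUT_RECORD_SEPARATOR eq $/;

# $OSNAME is the readable alias for $^O:
print "running on $OSNAME\n";
```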
NOTE: This was fixed in perl 5.20. Mentioning these three variables no longer makes a speed difference. This section still applies if your code is to run on perl 5.18 or earlier.
This module can provoke sizeable inefficiencies for regular expressions, due to unfortunate implementation details. If performance matters in your application and you don't need $PREMATCH, $MATCH, or $POSTMATCH, try doing
- use English qw( -no_match_vars );
It is especially important to do this in modules to avoid penalizing all applications which use them.
Env - perl module that imports environment variables as scalars or arrays
Perl maintains environment variables in a special hash named %ENV. For when this access method is inconvenient, the Perl module Env allows environment variables to be treated as scalar or array variables.
The Env::import() function ties environment variables with suitable names to global Perl variables with the same names. By default it ties all existing environment variables (keys %ENV) to scalars. If the import function receives arguments, it takes them to be a list of variables to tie; it's okay if they don't yet exist. The scalar type prefix '$' is inferred for any element of this list not prefixed by '$' or '@'. Arrays are implemented in terms of split and join, using $Config::Config{path_sep} as the delimiter.
After an environment variable is tied, merely use it like a normal variable. You may access its value
- @path = split(/:/, $PATH);
- print join("\n", @path), "\n";
or modify it
- $PATH .= ":.";
- push @LD_LIBRARY_PATH, $dir;
however you'd like. Bear in mind, however, that each access to a tied array variable requires splitting the environment variable's string anew.
The code:
- use Env qw(@PATH);
- push @PATH, '.';
is equivalent to:
- use Env qw(PATH);
- $PATH .= ":.";
except that if $ENV{PATH} started out empty, the second approach leaves it with the (odd) value ":.", but the first approach leaves it with ".".
To remove a tied environment variable from the environment, assign it the undefined value:
- undef $PATH;
On VMS systems, arrays tied to environment variables are read-only. Attempting to change anything will cause a warning.
Chip Salzenberg <chip@fin.uucp> and Gregor N. Purdy <gregor@focusresearch.com>
Errno - System errno constants
- use Errno qw(EINTR EIO :POSIX);
Errno defines and conditionally exports all the error constants defined in your system errno.h include file. It has a single export tag, :POSIX, which will export all POSIX defined error numbers.
Errno also makes %! magic such that each element of %! has a non-zero value only if $! is set to that value. For example:
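For instance (the path here is just an assumed-nonexistent file):

```perl
use Errno;

# %! has a true value only for the error currently in $!:
unless (open my $fh, "<", "/no/such/dir/no_such_file") {
    if ($!{ENOENT}) {
        warn "No such file -- as expected\n";
    } else {
        warn "Failed some other way: $!\n";
    }
}
```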
If a specified constant EFOO does not exist on the system, $!{EFOO} returns "". You may use exists $!{EFOO} to check whether the constant is available on the system.
Importing a particular constant may not be very portable, because the import will fail on platforms that do not have that constant. A more portable way to set $! to a valid value is to use:
- if (exists &Errno::EFOO) {
- $! = &Errno::EFOO;
- }
Graham Barr <gbarr@pobox.com>
Copyright (c) 1997-8 Graham Barr. All rights reserved. This program is free software; you can redistribute it and/or modify it under the same terms as Perl itself.
Exporter - Implements default import method for modules
In module YourModule.pm:
- package YourModule;
- require Exporter;
- @ISA = qw(Exporter);
- @EXPORT_OK = qw(munge frobnicate); # symbols to export on request
or
- package YourModule;
- use Exporter 'import'; # gives you Exporter's import() method directly
- @EXPORT_OK = qw(munge frobnicate); # symbols to export on request
In other files which wish to use YourModule:
- use YourModule qw(frobnicate); # import listed symbols
- frobnicate($left, $right); # calls YourModule::frobnicate
Take a look at Good Practices for some variants you may like to use in modern Perl code.
The Exporter module implements an import method which allows a module
to export functions and variables to its users' namespaces. Many modules
use Exporter rather than implementing their own import method because
Exporter provides a highly flexible interface, with an implementation optimised
for the common case.
Perl automatically calls the import method when processing a
use statement for a module. Modules and use are documented
in perlfunc and perlmod. Understanding the concept of
modules and how the use statement operates is important to
understanding the Exporter.
The arrays @EXPORT and @EXPORT_OK in a module hold lists of symbols that are going to be exported into the users' namespace by default, or which they can request to be exported, respectively. The symbols can represent functions, scalars, arrays, hashes, or typeglobs. The symbols must be given by full name with the exception that the ampersand in front of a function is optional, e.g.
- @EXPORT = qw(afunc $scalar @array); # afunc is a function
- @EXPORT_OK = qw(&bfunc %hash *typeglob); # explicit prefix on &bfunc
If you are only exporting function names it is recommended to omit the ampersand, as the implementation is faster this way.
Do not export method names!
Do not export anything else by default without a good reason!
Exports pollute the namespace of the module user. If you must export, try to use @EXPORT_OK in preference to @EXPORT and avoid short or common symbol names to reduce the risk of name clashes.
Generally, anything not exported is still accessible from outside the module using the YourModule::item_name (or $blessed_ref->method) syntax. By convention you can use a leading underscore on names to informally indicate that they are 'internal' and not for public use.
(It is actually possible to get private functions by saying:
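The snippet the parenthesis refers to is elided above; a common sketch is a lexical code reference, which never enters the symbol table:

```perl
# A "private" function: a lexical coderef that is never installed
# in the package's symbol table, so it cannot be imported or called
# by name from outside the enclosing scope.
my $secret = sub { my ($n) = @_; return $n * 2 };
print $secret->(21), "\n";   # 42
```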
However if you use them for methods it is up to you to figure out how to make inheritance work.)
As a general rule, if the module is trying to be object oriented then export nothing. If it's just a collection of functions then @EXPORT_OK anything, but use @EXPORT with caution. For function and method names, use barewords in preference to names prefixed with ampersands for the export lists.
Other module design guidelines can be found in perlmod.
In other files which wish to use your module there are three basic ways for them to load your module and import its symbols:
use YourModule;
This imports all the symbols from YourModule's @EXPORT
into the namespace
of the use statement.
use YourModule ();
This causes perl to load your module but does not import any symbols.
use YourModule qw(...);
This imports only the symbols listed by the caller into their namespace.
All listed symbols must be in your @EXPORT or @EXPORT_OK, or else an error occurs. The advanced export features of Exporter are accessed like this, but with list entries that are syntactically distinct from symbol names.
Unless you want to use its advanced features, this is probably all you need to know to use Exporter.
If any of the entries in an import list begins with !, : or / then the list is treated as a series of specifications which either add to or delete from the list of names to import. They are processed left to right. Specifications are in the form:
- [!]name This name only
- [!]:DEFAULT All names in @EXPORT
- [!]:tag All names in $EXPORT_TAGS{tag} anonymous list
- [!]/pattern/ All names in @EXPORT and @EXPORT_OK which match
A leading ! indicates that matching names should be deleted from the list of names to import. If the first specification is a deletion it is treated as though preceded by :DEFAULT. If you just want to import extra names in addition to the default set you will still need to include :DEFAULT explicitly.
e.g., Module.pm defines:
- @EXPORT = qw(A1 A2 A3 A4 A5);
- @EXPORT_OK = qw(B1 B2 B3 B4 B5);
- %EXPORT_TAGS = (T1 => [qw(A1 A2 B1 B2)], T2 => [qw(A1 A2 B3 B4)]);
Note that you cannot use tags in @EXPORT or @EXPORT_OK.
Names in EXPORT_TAGS must also appear in @EXPORT or @EXPORT_OK.
An application using Module can say something like:
- use Module qw(:DEFAULT :T2 !B3 A3);
Other examples include:
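A self-contained sketch of the hypothetical Module above, showing what that import list leaves in the caller's namespace:

```perl
package Module;
use Exporter 'import';
our @EXPORT      = qw(A1 A2 A3 A4 A5);
our @EXPORT_OK   = qw(B1 B2 B3 B4 B5);
our %EXPORT_TAGS = (T1 => [qw(A1 A2 B1 B2)], T2 => [qw(A1 A2 B3 B4)]);
sub A1 {"A1"} sub A2 {"A2"} sub A3 {"A3"} sub A4 {"A4"} sub A5 {"A5"}
sub B1 {"B1"} sub B2 {"B2"} sub B3 {"B3"} sub B4 {"B4"} sub B5 {"B5"}

package main;
# :DEFAULT gives A1..A5, :T2 adds B3 and B4, !B3 removes B3, A3 is redundant:
Module->import(qw(:DEFAULT :T2 !B3 A3));
print defined &main::B4 ? "B4 imported\n" : "";
print defined &main::B3 ? "" : "B3 excluded\n";
```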
Remember that most patterns (using //) will need to be anchored with a leading ^, e.g., /^EXIT/ rather than /EXIT/.
You can say BEGIN { $Exporter::Verbose=1 } to see how the specifications are being processed and what is actually being imported into modules.
Exporter has a special method, 'export_to_level' which is used in situations where you can't directly call Exporter's import method. The export_to_level method looks like:
- MyPackage->export_to_level(
- $where_to_export, $package, @what_to_export
- );
where $where_to_export is an integer telling how far up the calling stack to export your symbols, and @what_to_export is an array telling what symbols *to* export (usually this is @_). The $package argument is currently unused.
For example, suppose that you have a module, A, which already has an import function:
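Such a module might look like this (a sketch; the body of import is illustrative):

```perl
package A;
require Exporter;
our @ISA       = qw(Exporter);
our @EXPORT_OK = qw($b);

sub import {
    my $pkg = shift;
    $A::b = 1;   # own initialization work -- but Exporter's
                 # import() is now shadowed and never runs
}
```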
and you want to export the symbol $A::b back to the module that called package A. Since Exporter relies on the import method to work, via inheritance, as it stands Exporter::import() will never get called.
Instead, say the following:
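Continuing the sketch above, the fixed import does its own work and then hands off to export_to_level():

```perl
package A;
require Exporter;
our @ISA       = qw(Exporter);
our @EXPORT_OK = qw($b);

sub import {
    my $pkg = shift;
    $A::b = 1;                           # own initialization work
    $pkg->export_to_level(1, $pkg, @_);  # then run Exporter's export logic
}
```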
This will export the symbols one level 'above' the current package - ie: to the program or module that used package A.
Note: Be careful not to modify @_ at all before you call export_to_level, or people using your package will get very unexpected results!
By including Exporter in your @ISA you inherit an Exporter's import() method but you also inherit several other helper methods which you probably don't want. To avoid this you can do
- package YourModule;
- use Exporter qw( import );
which will export Exporter's own import() method into YourModule. Everything will work as before, but you won't need to include Exporter in @YourModule::ISA.
Note: This feature was introduced in version 5.57 of Exporter, released with perl 5.8.3.
The Exporter module will convert an attempt to import a number from a
module into a call to $module_name->VERSION($value)
. This can
be used to validate that the version of the module being used is
greater than or equal to the required version.
For historical reasons, Exporter supplies a require_version
method that
simply delegates to VERSION
. Originally, before UNIVERSAL::VERSION
existed, Exporter would call require_version
.
Since the UNIVERSAL::VERSION method treats the $VERSION number as a simple numeric value, it will regard version 1.10 as lower than 1.9. For this reason it is strongly recommended that you use numbers with at least two decimal places, e.g., 1.09.
In some situations you may want to prevent certain symbols from being exported. Typically this applies to extensions which have functions or constants that may not exist on some systems.
The names of any symbols that cannot be exported should be listed
in the @EXPORT_FAIL
array.
If a module attempts to import any of these symbols the Exporter will give the module an opportunity to handle the situation before generating an error. The Exporter will call an export_fail method with a list of the failed symbols:
- @failed_symbols = $module_name->export_fail(@failed_symbols);
If the export_fail
method returns an empty list then no error is
recorded and all the requested symbols are exported. If the returned
list is not empty then an error is generated for each symbol and the
export fails. The Exporter provides a default export_fail
method which
simply returns the list unchanged.
Uses for the export_fail
method include giving better error messages
for some symbols and performing lazy architectural checks (put more
symbols into @EXPORT_FAIL
by default and then take them out if someone
actually tries to use them and an expensive check shows that they are
usable on that platform).
Since the symbols listed within %EXPORT_TAGS
must also appear in either
@EXPORT
or @EXPORT_OK
, two utility functions are provided which allow
you to easily add tagged sets of symbols to @EXPORT
or @EXPORT_OK
:
- %EXPORT_TAGS = (foo => [qw(aa bb cc)], bar => [qw(aa cc dd)]);
- Exporter::export_tags('foo'); # add aa, bb and cc to @EXPORT
- Exporter::export_ok_tags('bar'); # add aa, cc and dd to @EXPORT_OK
Any names which are not tags are added to @EXPORT or @EXPORT_OK unchanged, but will trigger a warning (with -w) to avoid misspelt tag names being silently added to @EXPORT or @EXPORT_OK. Future versions may make this a fatal error.
If several symbol categories exist in %EXPORT_TAGS, it's usually useful to create the utility ":all" tag to simplify "use" statements. The simplest way to do this is:
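One hedged sketch, collecting every tag's names and removing duplicates:

```perl
our %EXPORT_TAGS = (foo => [qw(aa bb cc)], bar => [qw(aa cc dd)]);

# Build ":all" from every other tag, deleting duplicates:
{
    my %seen;
    my @tags = keys %EXPORT_TAGS;   # snapshot before adding 'all'
    push @{ $EXPORT_TAGS{all} },
        grep { !$seen{$_}++ } map { @{ $EXPORT_TAGS{$_} } } @tags;
}
```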
CGI.pm creates an ":all" tag which contains some (but not really all) of its categories. That could be done with one small change:
Note that the tag names in %EXPORT_TAGS don't have the leading ':'.
AUTOLOADed Constants
Many modules make use of AUTOLOADing for constant subroutines to avoid having to compile and waste memory on rarely used values (see perlsub for details on constant subroutines). Calls to such constant subroutines are not optimized away at compile time because they can't be checked at compile time for constancy.
Even if a prototype is available at compile time, the body of the subroutine is not (it hasn't been AUTOLOADed yet). perl needs to examine both the () prototype and the body of a subroutine at compile time to detect that it can safely replace calls to that subroutine with the constant value.
A workaround for this is to call the constants once in a BEGIN block:
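A sketch of the workaround (SO_LINGER comes from Socket; foo is a stand-in for any user function):

```perl
package My;
use Socket;

sub foo { return $_[0] }

foo( SO_LINGER );     # historically not inlined: resolved via AUTOLOAD at runtime
BEGIN { SO_LINGER }   # force the AUTOLOAD during compilation...
foo( SO_LINGER );     # ...so this call can use the now-known constant
```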
This forces the AUTOLOAD for SO_LINGER to take place before SO_LINGER is encountered later in the My package.
If you are writing a package that AUTOLOADs, consider forcing an AUTOLOAD for any constants explicitly imported by other packages or which are usually used when your package is used.
@EXPORT_OK and Friends
When using Exporter with the standard strict and warnings pragmas, the our keyword is needed to declare the package variables @EXPORT_OK, @EXPORT, @ISA, etc.
- our @ISA = qw(Exporter);
- our @EXPORT_OK = qw(munge frobnicate);
If backward compatibility for Perls under 5.6 is important, one must write instead a use vars statement.
- use vars qw(@ISA @EXPORT_OK);
- @ISA = qw(Exporter);
- @EXPORT_OK = qw(munge frobnicate);
There are some caveats with the use of runtime statements like require Exporter and the assignment to package variables, which can be very subtle for the unaware programmer. This may happen for instance with mutually recursive modules, which are affected by the time the relevant constructions are executed.
The ideal (but a bit ugly) way to never have to think about that is to use BEGIN blocks. So the first part of the SYNOPSIS code could be rewritten as:
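For example (using the SYNOPSIS names munge and frobnicate):

```perl
package YourModule;
use strict;
use warnings;

our (@ISA, @EXPORT_OK);
BEGIN {
    require Exporter;
    @ISA       = qw(Exporter);          # set up inheritance at compile time
    @EXPORT_OK = qw(munge frobnicate);  # symbols to export on request
}

sub munge      { return "munged: @_" }
sub frobnicate { return "frobnicated: @_" }
```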
The BEGIN will assure that the loading of Exporter.pm and the assignments to @ISA and @EXPORT_OK happen immediately, leaving no room for something to go awry or just plain wrong.
With respect to loading Exporter and inheriting, there are alternatives with the use of modules like base and parent.
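For instance:

```perl
package YourModule;
use base qw(Exporter);      # loads Exporter and pushes it onto @ISA
our @EXPORT_OK = qw(munge frobnicate);

# ...or, with the streamlined module:
# use parent qw(Exporter);
```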
Any of these statements are nice replacements for BEGIN { require Exporter; @ISA = qw(Exporter); } with the same compile-time effect. The basic difference is that base code interacts with declared fields, while parent is a streamlined version of the older base code that just establishes the IS-A relationship.
For more details, see the documentation and code of base and parent.
Another thorough remedy to that runtime vs. compile-time trap is to use Exporter::Easy, a wrapper of Exporter that allows all the boilerplate code in a single gulp in the use statement.
- use Exporter::Easy (
- OK => [ qw(munge frobnicate) ],
- );
- # @ISA setup is automatic
- # all assignments happen at compile time
You have been warned already in Selecting What to Export not to export:
method names (because you don't need to, and that's likely to not do what you want),
anything by default (because you don't want to surprise your users... badly),
anything you don't need to (because less is more).
There's one more item to add to this list. Do not export variable names. Just because Exporter lets you do that, it does not mean you should.
- @EXPORT_OK = qw( $svar @avar %hvar ); # DON'T!
Exporting variables is not a good idea. They can change under the hood, provoking horrible effects at a distance that are too hard to track and to fix. Trust me: they are not worth it.
To provide the capability to set/get class-wide settings, it is best to provide accessors as subroutines or class methods instead.
Exporter is definitely not the only module with symbol-exporting capabilities. On CPAN, you may find a bunch of them. Some are lighter. Some provide improved APIs and features. Pick the one that fits your needs. The following is a sample list of such modules.
- Exporter::Easy
- Exporter::Lite
- Exporter::Renaming
- Exporter::Tidy
- Sub::Exporter / Sub::Installer
- Perl6::Export / Perl6::Export::Attrs
This library is free software. You can redistribute it and/or modify it under the same terms as Perl itself.
Fatal - Replace functions with equivalents which succeed or die
- use Fatal qw(open close);
- open(my $fh, "<", $filename); # No need to check errors!
- use File::Copy qw(move);
- use Fatal qw(move);
- move($file1, $file2); # No need to check errors!
- sub juggle { . . . }
- Fatal->import('juggle');
Fatal has been obsoleted by the new autodie pragma. Please use autodie in preference to Fatal. autodie supports lexical scoping, throws real exception objects, and provides much nicer error messages. The use of :void with Fatal is discouraged.
Fatal provides a way to conveniently replace functions which normally return a false value when they fail with equivalents which raise exceptions if they are not successful. This lets you use these functions without having to test their return values explicitly on each call. Exceptions can be caught using eval{}. See perlfunc and perlvar for details.
The do-or-die equivalents are set up simply by calling Fatal's
import routine, passing it the names of the functions to be
replaced. You may wrap both user-defined functions and overridable
CORE operators (except exec, system, print, or any other
built-in that cannot be expressed via prototypes) in this way.
If the symbol :void appears in the import list, then functions named later in that import list raise an exception only when they are called in void context--that is, when their return values are ignored. For example:
- use Fatal qw/:void open close/;
- # properly checked, so no exception raised on error
- if (not open(my $fh, '<', '/bogotic')) {
- warn "Can't open /bogotic: $!";
- }
- # not checked, so error raises an exception
- close FH;
The use of :void is discouraged, as it can result in exceptions not being thrown if you accidentally call a method without void context. Use autodie instead if you need to be able to disable autodying/Fatal behaviour for a small block of code.
You've called Fatal
with an argument that doesn't look like
a subroutine name, nor a switch that this version of Fatal
understands.
You've asked Fatal
to try and replace a subroutine which does not
exist, or has not yet been defined.
You've asked Fatal
to replace a subroutine, but it's not a Perl
built-in, and Fatal
couldn't find it as a regular subroutine.
It either doesn't exist or has not yet been defined.
You've tried to use Fatal
on a Perl built-in that can't be
overridden, such as print or system, which means that
Fatal
can't help you, although some other modules might.
See the SEE ALSO section of this documentation.
You've found a bug in Fatal
. Please report it using
the perlbug
command.
Fatal
clobbers the context in which a function is called and always
makes it a scalar context, except when the :void
tag is used.
This problem does not exist in autodie.
"Used only once" warnings can be generated when autodie
or Fatal
is used with package filehandles (eg, FILE
). It's strongly recommended
you use scalar filehandles instead.
Original module by Lionel Cons (CERN).
Prototype updates by Ilya Zakharevich <ilya@math.ohio-state.edu>.
autodie support, bugfixes, extended diagnostics, system
support, and major overhauling by Paul Fenwick <pjf@perltraining.com.au>
This module is free software, you may distribute it under the same terms as Perl itself.
autodie for a nicer way to use lexical Fatal.
IPC::System::Simple for a similar idea for calls to system()
and backticks.
Fcntl - load the C Fcntl.h defines
This module is just a translation of the C fcntl.h file. Unlike the old mechanism of requiring a translated fcntl.ph file, this uses the h2xs program (see the Perl source distribution) and your native C compiler. This means that it has a far more likely chance of getting the numbers right.
Only #define
symbols get translated; you must still correctly
pack up your own arguments to pass as args for locking functions, etc.
By default your system's F_* and O_* constants (eg, F_DUPFD and O_CREAT) and the FD_CLOEXEC constant are exported into your namespace.
You can request that the flock() constants (LOCK_SH, LOCK_EX, LOCK_NB and LOCK_UN) be provided by using the tag :flock. See Exporter.
You can request that the old constants (FAPPEND, FASYNC, FCREAT, FDEFER, FEXCL, FNDELAY, FNONBLOCK, FSYNC, FTRUNC) be provided for compatibility reasons by using the tag :Fcompat. For new applications the newer versions of these constants are suggested (O_APPEND, O_ASYNC, O_CREAT, O_DEFER, O_EXCL, O_NDELAY, O_NONBLOCK, O_SYNC, O_TRUNC).
For ease of use, the SEEK_* constants (for seek() and sysseek(), e.g. SEEK_END) and the S_I* constants (for chmod() and stat()) are also available for import. They can be imported either separately or using the tags :seek and :mode.
Please refer to your native fcntl(2), open(2), fseek(3), lseek(2) (equal to Perl's seek() and sysseek(), respectively), and chmod(2) documentation to see what constants are implemented in your system.
See perlopentut to learn about the uses of the O_* constants with sysopen().
See seek and sysseek about the SEEK_* constants.
See stat about the S_I* constants.
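A short sketch of how these exports are typically combined with sysopen() and flock() (the file path is created with File::Temp purely for illustration):

```perl
use strict;
use warnings;
use Fcntl qw(O_WRONLY O_CREAT O_EXCL :flock);
use File::Temp qw(tempdir);

my $path = tempdir(CLEANUP => 1) . '/data.log';

# The O_* constants are passed to sysopen(); :flock supplies LOCK_EX/LOCK_UN
sysopen(my $fh, $path, O_WRONLY | O_CREAT | O_EXCL, 0644)
    or die "Can't create $path: $!";
flock($fh, LOCK_EX) or die "Can't lock $path: $!";
print {$fh} "hello\n";
flock($fh, LOCK_UN);
close $fh;
```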
FileCache - keep more files open than the system permits
The cacheout
function will make sure that there's a filehandle open
for reading or writing available as the pathname you give it. It
automatically closes and re-opens files if you exceed your system's
maximum number of file descriptors, or the suggested maximum maxopen.
The 1-argument form of cacheout will open a file for writing ('>') on its first use, and for appending ('>>') thereafter.
Returns EXPR on success for convenience. You may neglect the return value and manipulate EXPR as the filehandle directly if you prefer.
The 2-argument form of cacheout will use the supplied mode for the initial and subsequent openings. Most valid modes for 3-argument open are supported, namely '>', '+>', '<', '<+', '>>', '|-' and '-|'. To pass supplemental arguments to a program opened with '|-' or '-|', append them to the command string as you would for system EXPR.
Returns EXPR on success for convenience. You may neglect the return value and manipulate EXPR as the filehandle directly if you prefer.
While it is permissible to close a FileCache-managed file, do not do so if you are calling FileCache::cacheout from a package other than the one into which it was imported, or together with another module which overrides close. If you must, use FileCache::cacheout_close.
Although FileCache can be used with piped opens ('-|' or '|-') doing so is strongly discouraged. If FileCache finds it necessary to close and then reopen a pipe, the command at the far end of the pipe will be reexecuted - the results of performing IO on FileCache'd pipes is unlikely to be what you expect. The ability to use FileCache on pipes may be removed in a future release.
FileCache does not store the current file offset if it finds it necessary to
close a file. When the file is reopened, the offset will be as specified by the
original open file mode. This could be construed to be a bug.
The module functionality relies on symbolic references, so things will break under 'use strict' unless 'no strict "refs"' is also specified.
sys/param.h lies with its NOFILE
define on some systems,
so you may have to set maxopen yourself.
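A minimal sketch of the module in use (the path is illustrative; note the no strict 'refs', since cacheout uses the pathname string itself as the filehandle name):

```perl
use strict;
use warnings;
use FileCache maxopen => 16;      # optional: suggested maximum open handles
use File::Temp qw(tempdir);

my $dir  = tempdir(CLEANUP => 1);
my $path = "$dir/report.txt";

no strict 'refs';                 # cacheout uses $path as a symbolic filehandle
cacheout $path;                   # first use: opened for writing ('>')
print $path "line 1\n";
cacheout $path;                   # later uses reopen for appending ('>>') if closed
print $path "line 2\n";
close $path;
```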
FileHandle - supply object methods for filehandles
- use FileHandle;
- $fh = FileHandle->new;
- if ($fh->open("< file")) {
- print <$fh>;
- $fh->close;
- }
- $fh = FileHandle->new("> FOO");
- if (defined $fh) {
- print $fh "bar\n";
- $fh->close;
- }
- $fh = FileHandle->new("file", "r");
- if (defined $fh) {
- print <$fh>;
- undef $fh; # automatically closes the file
- }
- $fh = FileHandle->new("file", O_WRONLY|O_APPEND);
- if (defined $fh) {
- print $fh "corge\n";
- undef $fh; # automatically closes the file
- }
- $pos = $fh->getpos;
- $fh->setpos($pos);
- $fh->setvbuf($buffer_var, _IOLBF, 1024);
- ($readfh, $writefh) = FileHandle::pipe;
- autoflush STDOUT 1;
NOTE: This class is now a front-end to the IO::* classes.
FileHandle::new creates a FileHandle, which is a reference to a newly created symbol (see the Symbol package). If it receives any parameters, they are passed to FileHandle::open; if the open fails, the FileHandle object is destroyed. Otherwise, it is returned to the caller.
FileHandle::new_from_fd creates a FileHandle like new does. It requires two parameters, which are passed to FileHandle::fdopen; if the fdopen fails, the FileHandle object is destroyed. Otherwise, it is returned to the caller.
FileHandle::open
accepts one parameter or two. With one parameter,
it is just a front end for the built-in open function. With two
parameters, the first parameter is a filename that may include
whitespace or other special characters, and the second parameter is
the open mode, optionally followed by a file permission value.
If FileHandle::open
receives a Perl mode string (">", "+<", etc.)
or a POSIX fopen() mode string ("w", "r+", etc.), it uses the basic
Perl open operator.
If FileHandle::open
is given a numeric mode, it passes that mode
and the optional permissions value to the Perl sysopen operator.
For convenience, FileHandle::import
tries to import the O_XXX
constants from the Fcntl module. If dynamic loading is not available,
this may fail, but the rest of FileHandle will still work.
FileHandle::fdopen
is like open except that its first parameter
is not a filename but rather a file handle name, a FileHandle object,
or a file descriptor number.
If the C functions fgetpos() and fsetpos() are available, then
FileHandle::getpos
returns an opaque value that represents the
current position of the FileHandle, and FileHandle::setpos
uses
that value to return to a previously visited position.
If the C function setvbuf() is available, then FileHandle::setvbuf
sets the buffering policy for the FileHandle. The calling sequence
for the Perl function is the same as its C counterpart, including the
macros _IOFBF, _IOLBF, and _IONBF, except that the buffer
parameter specifies a scalar variable to use as a buffer. WARNING: A
variable used as a buffer by FileHandle::setvbuf
must not be
modified in any way until the FileHandle is closed or until
FileHandle::setvbuf
is called again, or memory corruption may
result!
See perlfunc for complete descriptions of each of the following
supported FileHandle
methods, which are just front ends for the
corresponding built-in functions:
See perlvar for complete descriptions of each of the following
supported FileHandle
methods:
- autoflush
- output_field_separator
- output_record_separator
- input_record_separator
- input_line_number
- format_page_number
- format_lines_per_page
- format_lines_left
- format_name
- format_top_name
- format_line_break_characters
- format_formfeed
Furthermore, for doing normal I/O you might need these:
See print.
See printf.
This works like <$fh> described in I/O Operators in perlop except that it's more readable and can be safely called in a list context but still returns just one line.
This works like <$fh> when called in a list context to read all the remaining lines in a file, except that it's more readable. It will also croak() if accidentally called in a scalar context.
There are many other functions available since FileHandle is descended from IO::File, IO::Seekable, and IO::Handle. Please see those respective pages for documentation on more functions.
The IO extension, perlfunc, I/O Operators in perlop.
FindBin - Locate directory of original perl script
Locates the full path to the script bin directory to allow the use of paths relative to the bin directory.
This allows a user to set up a directory tree for some software with directories <root>/bin and <root>/lib, and then the above example will allow the use of modules in the lib directory without knowing where the software tree is installed.
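Such a setup typically looks like this (a sketch; the ../lib path relative to the script is the assumption):

```perl
#!/usr/bin/perl
# <root>/bin/myscript - find <root>/lib relative to this script
use strict;
use warnings;
use FindBin;
use lib "$FindBin::Bin/../lib";   # now 'use Some::Module' also searches <root>/lib
```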
If perl is invoked using the -e option or the perl script is read from STDIN, then FindBin sets both $Bin and $RealBin to the current directory.
- $Bin - path to bin directory from where script was invoked
- $Script - basename of script from which perl was invoked
- $RealBin - $Bin with all links resolved
- $RealScript - $Script with all links resolved
If there are two modules using FindBin from different directories under the same interpreter, this won't work. Since FindBin uses a BEGIN block, it'll be executed only once, and only the first caller will get it right. This is a problem under mod_perl and other persistent Perl environments, where you shouldn't use this module. It also means that you should avoid using FindBin in modules that you plan to put on CPAN. To make sure that FindBin will work, call the again function:
- use FindBin;
- FindBin::again(); # or FindBin->again;
In former versions of FindBin there was no again
function. The
workaround was to force the BEGIN
block to be executed again:
FindBin is supported as part of the core perl distribution. Please send bug reports to <perlbug@perl.org> using the perlbug program included with perl.
Graham Barr <gbarr@pobox.com> Nick Ing-Simmons <nik@tiuk.ti.com>
Copyright (c) 1995 Graham Barr & Nick Ing-Simmons. All rights reserved. This program is free software; you can redistribute it and/or modify it under the same terms as Perl itself.
IO - load various IO modules
IO
provides a simple mechanism to load several of the IO modules
in one go. The IO modules belonging to the core are:
- IO::Handle
- IO::Seekable
- IO::File
- IO::Pipe
- IO::Socket
- IO::Dir
- IO::Select
- IO::Poll
Some other IO modules don't belong to the perl core but can be loaded as well if they have been installed from CPAN. You can discover which ones exist by searching for "^IO::" on http://search.cpan.org.
For more information on any of these modules, please see its respective documentation.
- use IO; # loads all the modules listed below
The loaded modules are IO::Handle, IO::Seekable, IO::File, IO::Pipe, IO::Socket, IO::Dir. You should instead explicitly import the IO modules you want.
Memoize - Make functions faster by trading space for time
- # This is the documentation for Memoize 1.03
- use Memoize;
- memoize('slow_function');
- slow_function(arguments); # Is faster than it was before
This is normally all you need to know. However, many options are available:
- memoize(function, options...);
Options include:
- NORMALIZER => function
- INSTALL => new_name
- SCALAR_CACHE => 'MEMORY'
- SCALAR_CACHE => ['HASH', \%cache_hash ]
- SCALAR_CACHE => 'FAULT'
- SCALAR_CACHE => 'MERGE'
- LIST_CACHE => 'MEMORY'
- LIST_CACHE => ['HASH', \%cache_hash ]
- LIST_CACHE => 'FAULT'
- LIST_CACHE => 'MERGE'
`Memoizing' a function makes it faster by trading space for time. It
does this by caching the return values of the function in a table.
If you call the function again with the same arguments, memoize
jumps in and gives you the value out of the table, instead of letting
the function compute the value all over again.
Here is an extreme example. Consider the Fibonacci sequence, defined by the following function:
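A sketch of the usual doubly recursive definition:

```perl
# Naive Fibonacci: recomputes the same values over and over
sub fib {
    my $n = shift;
    return $n if $n < 2;
    return fib($n - 1) + fib($n - 2);
}
```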
This function is very slow. Why? To compute fib(14), it first wants to compute fib(13) and fib(12), and add the results. But to compute fib(13), it first has to compute fib(12) and fib(11), and then it comes back and computes fib(12) all over again even though the answer is the same. And both of the times that it wants to compute fib(12), it has to compute fib(11) from scratch, and then it has to do it again each time it wants to compute fib(13). This function does so much recomputing of old results that it takes a really long time to run---fib(14) makes 1,200 extra recursive calls to itself, to compute and recompute things that it already computed.
This function is a good candidate for memoization. If you memoize the `fib' function above, it will compute fib(14) exactly once, the first time it needs to, and then save the result in a table. Then if you ask for fib(14) again, it gives you the result out of the table. While computing fib(14), instead of computing fib(12) twice, it does it once; the second time it needs the value it gets it from the table. It doesn't compute fib(11) four times; it computes it once, getting it from the table the next three times. Instead of making 1,200 recursive calls to `fib', it makes 15. This makes the function about 150 times faster.
You could do the memoization yourself, by rewriting the function, like this:
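Hand-rolled memoization might look like this (a sketch):

```perl
my %fib_cache;                    # results we have already computed
sub fib {
    my $n = shift;
    return $fib_cache{$n} if exists $fib_cache{$n};
    my $result = $n < 2 ? $n : fib($n - 1) + fib($n - 2);
    return $fib_cache{$n} = $result;
}
```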
Or you could use this module, like this:
- use Memoize;
- memoize('fib');
- # Rest of the fib function just like the original version.
This makes it easy to turn memoizing on and off.
Here's an even simpler example: I wrote a simple ray tracer; the program would look in a certain direction, figure out what it was looking at, and then convert the `color' value (typically a string like `red') of that object to a red, green, and blue pixel value, like this:
- for ($direction = 0; $direction < 300; $direction++) {
- # Figure out which object is in direction $direction
- $color = $object->{color};
- ($r, $g, $b) = @{&ColorToRGB($color)};
- ...
- }
Since there are relatively few objects in a picture, there are only a
few colors, which get looked up over and over again. Memoizing
ColorToRGB
sped up the program by several percent.
This module exports exactly one function, memoize. The rest of the functions in this package are None of Your Business.
You should say
- memoize(function)
where function
is the name of the function you want to memoize, or
a reference to it. memoize
returns a reference to the new,
memoized version of the function, or undef on a non-fatal error.
At present, there are no non-fatal errors, but there might be some in
the future.
If function
was the name of a function, then memoize
hides the
old version and installs the new memoized version under the old name,
so that &function(...)
actually invokes the memoized version.
There are some optional options you can pass to memoize
to change
the way it behaves a little. To supply options, invoke memoize
like this:
- memoize(function, NORMALIZER => function,
- INSTALL => newname,
- SCALAR_CACHE => option,
- LIST_CACHE => option
- );
Each of these options is optional; you can include some, all, or none of them.
If you supply a function name with INSTALL
, memoize will install
the new, memoized version of the function under the name you give.
For example,
- memoize('fib', INSTALL => 'fastfib')
installs the memoized version of fib as fastfib; without the INSTALL option it would have replaced the old fib with the memoized version.
To prevent memoize from installing the memoized version anywhere, use INSTALL => undef.
Suppose your function looks like this:
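For instance, f might take one required argument plus optional B and C arguments that default to 2 and 7 (a sketch; the body is a stand-in):

```perl
sub f {
    my ($a, %opts) = @_;
    my $b = exists $opts{B} ? $opts{B} : 2;   # B defaults to 2
    my $c = exists $opts{C} ? $opts{C} : 7;   # C defaults to 7
    return "$a/$b/$c";                        # stand-in for the real work
}
```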
Now, the following calls to your function are all completely equivalent:
- f(OUCH);
- f(OUCH, B => 2);
- f(OUCH, C => 7);
- f(OUCH, B => 2, C => 7);
- f(OUCH, C => 7, B => 2);
- (etc.)
However, unless you tell Memoize
that these calls are equivalent,
it will not know that, and it will compute the values for these
invocations of your function separately, and store them separately.
To prevent this, supply a NORMALIZER
function that turns the
program arguments into a string in a way that equivalent arguments
turn into the same string. A NORMALIZER
function for f
above
might look like this:
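A sketch, assuming f's B and C arguments default to 2 and 7:

```perl
sub normalize_f {
    my ($a, %opts) = @_;
    my $b = exists $opts{B} ? $opts{B} : 2;
    my $c = exists $opts{C} ? $opts{C} : 7;
    # Equivalent argument lists all normalize to the same string
    return join ',', $a, 'B', $b, 'C', $c;
}
```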
Each of the argument lists above comes out of the normalize_f
function looking exactly the same, like this:
- OUCH,B,2,C,7
You would tell Memoize
to use this normalizer this way:
- memoize('f', NORMALIZER => 'normalize_f');
memoize
knows that if the normalized version of the arguments is
the same for two argument lists, then it can safely look up the value
that it computed for one argument list and return it as the result of
calling the function with the other argument list, even if the
argument lists look different.
The default normalizer just concatenates the arguments with character 28 in between. (In ASCII, this is called FS or control-\.) This always works correctly for functions with only one string argument, and also when the arguments never contain character 28. However, it can confuse certain argument lists:
- normalizer("a\034", "b")
- normalizer("a", "\034b")
- normalizer("a\034\034b")
for example.
Since hash keys are strings, the default normalizer will not
distinguish between undef and the empty string. It also won't work
when the function's arguments are references. For example, consider a
function g
which gets two arguments: A number, and a reference to
an array of numbers:
- g(13, [1,2,3,4,5,6,7]);
The default normalizer will turn this into something like "13\034ARRAY(0x436c1f)". That would be all right, except that a
subsequent array of numbers might be stored at a different location
even though it contains the same data. If this happens, Memoize
will think that the arguments are different, even though they are
equivalent. In this case, a normalizer like this is appropriate:
- sub normalize { join ' ', $_[0], @{$_[1]} }
For the example above, this produces the key "13 1 2 3 4 5 6 7".
Another use for normalizers is when the function depends on data other than those in its arguments. Suppose you have a function which returns a value which depends on the current hour of the day:
- sub on_duty {
- my ($problem_type) = @_;
- my $hour = (localtime)[2];
- open my $fh, "$DIR/$problem_type" or die...;
- my $line;
- while ($hour-- > 0) {
- $line = <$fh>;
- }
- return $line;
- }
At 10:23, this function generates the 10th line of a data file; at
3:45 PM it generates the 15th line instead. By default, Memoize
will only see the $problem_type argument. To fix this, include the
current hour in the normalizer:
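A sketch of that normalizer (the on_duty body here is a stand-in; the real one reads a file as shown above):

```perl
use Memoize;

sub on_duty {                     # stand-in body for this sketch
    my ($problem_type) = @_;
    return "handler for $problem_type";
}

sub normalize_on_duty {
    my ($problem_type) = @_;
    my $hour = (localtime)[2];    # fold the current hour into the key
    return "$hour:$problem_type";
}

memoize('on_duty', NORMALIZER => 'normalize_on_duty');
```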
The calling context of the function (scalar or list context) is
propagated to the normalizer. This means that if the memoized
function will treat its arguments differently in list context than it
would in scalar context, you can have the normalizer function select
its behavior based on the results of wantarray. Even if called in
a list context, a normalizer should still return a single string.
SCALAR_CACHE, LIST_CACHE
Normally, Memoize
caches your function's return values into an
ordinary Perl hash variable. However, you might like to have the
values cached on the disk, so that they persist from one run of your
program to the next, or you might like to associate some other
interesting semantics with the cached values.
There's a slight complication under the hood of Memoize
: There are
actually two caches, one for scalar values and one for list values.
When your function is called in scalar context, its return value is
cached in one hash, and when your function is called in list context,
its value is cached in the other hash. You can control the caching
behavior of both contexts independently with these options.
The argument to LIST_CACHE
or SCALAR_CACHE
must either be one of
the following four strings:
- MEMORY
- FAULT
- MERGE
- HASH
or else it must be a reference to an array whose first element is one of these four strings, such as [HASH, arguments...].
MEMORY
MEMORY
means that return values from the function will be cached in
an ordinary Perl hash variable. The hash variable will not persist
after the program exits. This is the default.
HASH
HASH
allows you to specify that a particular hash that you supply
will be used as the cache. You can tie this hash beforehand to give
it any behavior you want.
A tied hash can have any semantics at all. It is typically tied to an
on-disk database, so that cached values are stored in the database and
retrieved from it again when needed, and the disk file typically
persists after your program has exited. See perltie
for more
complete details about tie.
A typical example is:
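A sketch along those lines (DB_File must be available on your system; the function name and file location are illustrative):

```perl
use Memoize;
use DB_File;
use Fcntl qw(O_RDWR O_CREAT);
use File::Temp qw(tempdir);

my $filename = tempdir(CLEANUP => 1) . '/memoize.db';

# The tied hash becomes the on-disk scalar-context cache
tie my %cache => 'DB_File', $filename, O_RDWR | O_CREAT, 0666
    or die "Couldn't tie DB_File '$filename': $!";

sub expensive { return $_[0] * 2 }     # stand-in computation
memoize('expensive', SCALAR_CACHE => [HASH => \%cache]);
```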
This has the effect of storing the cache in a DB_File
database
whose name is in $filename
. The cache will persist after the
program has exited. Next time the program runs, it will find the
cache already populated from the previous run of the program. Or you
can forcibly populate the cache by constructing a batch program that
runs in the background and populates the cache file. Then when you
come to run your real program the memoized function will be fast
because all its results have been precomputed.
Another reason to use HASH
is to provide your own hash variable.
You can then inspect or modify the contents of the hash to gain finer
control over the cache management.
TIE
This option is no longer supported. It is still documented only to
aid in the debugging of old programs that use it. Old programs should
be converted to use the HASH
option instead.
- memoize ... ['TIE', PACKAGE, ARGS...]
is merely a shortcut for
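Spelled out, the retired TIE form corresponds to tying the hash yourself and passing it in via HASH (PACKAGE and ARGS... are placeholders, not real names):

```perl
# Equivalent of ['TIE', PACKAGE, ARGS...]:
require PACKAGE;
tie my %cache, 'PACKAGE', ARGS...;
memoize 'function', SCALAR_CACHE => [HASH => \%cache];
```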
FAULT
FAULT
means that you never expect to call the function in scalar
(or list) context, and that if Memoize
detects such a call, it
should abort the program. The error message is one of
- `foo' function called in forbidden list context at line ...
- `foo' function called in forbidden scalar context at line ...
MERGE
MERGE normally means that the memoized function does not distinguish between list and scalar context, and that return values in both contexts should be stored together. Both LIST_CACHE => MERGE and SCALAR_CACHE => MERGE mean the same thing.
Consider this function:
- sub complicated {
- # ... time-consuming calculation of $result
- return $result;
- }
The complicated
function will return the same numeric $result
regardless of whether it is called in list or in scalar context.
Normally, the following code will result in two calls to complicated, even if complicated is memoized:
- $x = complicated(142);
- ($y) = complicated(142);
- $z = complicated(142);
The first call will cache the result, say 37, in the scalar cache; the second will cache the list (37) in the list cache. The third call doesn't call the real complicated function; it gets the value 37 from the scalar cache.
Obviously, the second call to complicated is a waste of time, and storing its return value is a waste of space. Specifying LIST_CACHE => MERGE will make memoize use the same cache for scalar and list context return values, so that the second call uses the scalar cache that was populated by the first call. complicated ends up being called only once, and both subsequent calls return 37 from the cache, regardless of the calling context.
Consider this function:
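For instance (a sketch):

```perl
sub iota {
    my $n = shift;
    my @seq = (1 .. $n);   # iota(7) returns the list (1,2,3,4,5,6,7)
    return @seq;
}
```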
This function normally returns a list. Suppose you memoize it and merge the caches:
- memoize 'iota', SCALAR_CACHE => 'MERGE';
- @i7 = iota(7);
- $i7 = iota(7);
Here the first call caches the list (1,2,3,4,5,6,7). The second call
does not really make sense. Memoize
cannot guess what behavior
iota
should have in scalar context without actually calling it in
scalar context. Normally Memoize
would call iota
in scalar
context and cache the result, but the SCALAR_CACHE => 'MERGE'
option says not to do that, but to use the cached list-context value
instead. But it cannot return a list of seven elements in a scalar
context. In this case $i7
will receive the first element of the
cached list value, namely 7.
Another use for MERGE
is when you want both kinds of return values
stored in the same disk file; this saves you from having to deal with
two disk files instead of one. You can use a normalizer function to
keep the two sets of return values separate. For example:
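A sketch of such a normalizer (the name is an assumption; chr(28) mirrors the default argument separator):

```perl
# Prefix the key with the calling context so the two kinds of return
# values never collide inside the shared cache:
sub normalize_by_context {
    my $context = wantarray() ? 'L' : 'S';
    my $key = join chr(28), @_;
    return "$context:$key";
}
```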
This normalizer function will store scalar context return values in the disk file under keys that begin with S:, and list context return values under keys that begin with L:.
unmemoize
There's an unmemoize
function that you can import if you want to.
Why would you want to? Here's an example: Suppose you have your cache
tied to a DBM file, and you want to make sure that the cache is
written out to disk if someone interrupts the program. If the program
exits normally, this will happen anyway, but if someone types
control-C or something then the program will terminate immediately
without synchronizing the database. So what you can do instead is
- $SIG{INT} = sub { unmemoize 'function' };
unmemoize accepts a reference to, or the name of, a previously memoized function, and undoes whatever it did to provide the memoized version in the first place, including making the name refer to the unmemoized version if appropriate. It returns a reference to the unmemoized version of the function.
If you ask it to unmemoize a function that was never memoized, it croaks.
flush_cache
flush_cache(function)
will flush out the caches, discarding all
the cached data. The argument may be a function name or a reference
to a function. For finer control over when data is discarded or expired, see the documentation for Memoize::Expire, included in this package.
Note that if the cache is a tied hash, flush_cache
will attempt to
invoke the CLEAR
method on the hash. If there is no CLEAR
method, this will cause a run-time error.
An alternative approach to cache flushing is to use the HASH
option
(see above) to request that Memoize
use a particular hash variable
as its cache. Then you can examine or modify the hash at any time in
any way you desire. You may flush the cache by using %hash = ().
Memoization is not a cure-all:
Do not memoize a function whose behavior depends on program state other than its own arguments, such as global variables, the time of day, or file input. These functions will not produce correct results when memoized. For a particularly easy example:
- sub f {
- time;
- }
This function takes no arguments, and as far as Memoize
is
concerned, it always returns the same result. Memoize
is wrong, of
course, and the memoized version of this function will call time once
to get the current time, and it will return that same time
every time you call it after that.
Do not memoize a function with side effects.
This function accepts two arguments, adds them, and prints their sum.
Its return value is the number of characters it printed, but you
probably didn't care about that. But Memoize
doesn't understand
that. If you memoize this function, you will get the result you
expect the first time you ask it to print the sum of 2 and 3, but
subsequent calls will return 1 (the return value of
print) without actually printing anything.
Do not memoize a function that returns a data structure that is modified by its caller.
Consider these functions: getusers
returns a list of users somehow,
and then main
throws away the first user on the list and prints the
rest:
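A sketch of that pair of functions (the user list is a stand-in):

```perl
sub getusers {
    # stand-in for however the list of users is really obtained
    return [ 'root', 'alice', 'bob' ];
}

sub main {
    my $users = getusers();
    shift @$users;                 # throw away the first user...
    print "$_\n" for @$users;      # ...and print the rest
}
```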
If you memoize getusers
here, it will work right exactly once. The
reference to the users list will be stored in the memo table. main
will discard the first element from the referenced list. The next
time you invoke main
, Memoize
will not call getusers
; it will
just return the same reference to the same list it got last time. But
this time the list has already had its head removed; main
will
erroneously remove another element from it. The list will get shorter
and shorter every time you call main
.
Similarly, this:
- $u1 = getusers();
- $u2 = getusers();
- pop @$u1;
will modify $u2 as well as $u1, because both variables are references
to the same array. Had getusers
not been memoized, $u1 and $u2
would have referred to different arrays.
Do not memoize a very simple function.
Recently someone mentioned to me that the Memoize module made his program run slower instead of faster. It turned out that he was memoizing the following function:
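The function was essentially this (a sketch):

```perl
# One multiplication: already cheaper than any cache lookup
sub square { return $_[0] * $_[0] }
```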
I pointed out that Memoize
uses a hash, and that looking up a
number in the hash is necessarily going to take a lot longer than a
single multiplication. There really is no way to speed up the
square
function.
Memoization is not magical.
You can tie the cache tables to any sort of tied hash that you want to, as long as it supports TIEHASH, FETCH, STORE, and EXISTS. For example,
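A sketch using a core in-memory tie class (an on-disk DBM class is the usual choice in practice; Tie::StdHash just keeps the example self-contained):

```perl
use Memoize;
use Tie::Hash;                    # provides Tie::StdHash

# Any class supplying TIEHASH/FETCH/STORE/EXISTS will do here
tie my %cache, 'Tie::StdHash';

sub double { return $_[0] * 2 }   # stand-in computation
memoize('double', SCALAR_CACHE => [HASH => \%cache]);
```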
works just fine. For some storage methods, you need a little glue.
SDBM_File
doesn't supply an EXISTS
method, so included in this
package is a glue module called Memoize::SDBM_File
which does
provide one. Use this instead of plain SDBM_File
to store your
cache table on disk in an SDBM_File
database:
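A sketch of that setup (file location is illustrative):

```perl
use Memoize;
use Memoize::SDBM_File;           # SDBM_File plus the missing EXISTS method
use Fcntl qw(O_RDWR O_CREAT);
use File::Temp qw(tempdir);

my $file = tempdir(CLEANUP => 1) . '/cache';

tie my %cache, 'Memoize::SDBM_File', $file, O_RDWR | O_CREAT, 0666
    or die "Couldn't tie SDBM file '$file': $!";

sub triple { return $_[0] * 3 }   # stand-in computation
memoize('triple', SCALAR_CACHE => [HASH => \%cache]);
```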
NDBM_File has the same problem and the same solution. (Use Memoize::NDBM_File instead of plain NDBM_File.)
Storable isn't a tied hash class at all. You can use it to store a hash to disk and retrieve it again, but you can't modify the hash while it's on the disk. So if you want to store your cache table in a Storable database, use Memoize::Storable, which puts a hashlike front-end onto Storable. The hash table is actually kept in memory, and is loaded from your Storable file at the time you memoize the function, and stored back at the time you unmemoize the function (or when your program exits):
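A sketch of that setup (the file path is illustrative; the Storable file is written back when the tied hash goes away):

```perl
use Memoize;
use Memoize::Storable;
use File::Temp qw(tempdir);

my $filename = tempdir() . '/memoize.sto';   # directory kept so the cache can be written back

tie my %cache, 'Memoize::Storable', $filename;

sub halve { return $_[0] / 2 }    # stand-in computation
memoize('halve', SCALAR_CACHE => [HASH => \%cache]);
```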
Include the `nstore' option to have the Storable
database written
in `network order'. (See Storable for more details about this.)
The flush_cache()
function will raise a run-time error unless the
tied package provides a CLEAR
method.
See Memoize::Expire, which is a plug-in module that adds expiration functionality to Memoize. If you don't like the kinds of policies that Memoize::Expire implements, it is easy to write your own plug-in module to implement whatever policy you desire. Memoize comes with several examples. An expiration manager that implements a LRU policy is available on CPAN as Memoize::ExpireLRU.
The test suite is much better, but always needs improvement.
There is some problem with the way goto &f works under threaded Perl, perhaps because of the lexical scoping of @_. This is a bug in Perl, and until it is resolved, memoized functions will see a slightly different caller() and will perform a little more slowly on threaded perls than on unthreaded perls.
Some versions of DB_File won't let you store data under a key of length 0. That means that if you have a function f which you memoized and the cache is in a DB_File database, then the value of f() (f called with no arguments) will not be memoized. If this is a big problem, you can supply a normalizer function that prepends "x" to every key.
To join a very low-traffic mailing list for announcements about Memoize, send an empty note to mjd-perl-memoize-request@plover.com.
Mark-Jason Dominus (mjd-perl-memoize+@plover.com), Plover Systems co.
See the Memoize.pm
Page at http://perl.plover.com/Memoize/
for news and upgrades. Near this page, at
http://perl.plover.com/MiniMemoize/ there is an article about
memoization and about the internals of Memoize that appeared in The
Perl Journal, issue #13. (This article is also included in the
Memoize distribution as `article.html'.)
The author's book Higher-Order Perl (2005, ISBN 1558607013, published by Morgan Kaufmann) discusses memoization (and many other topics) in tremendous detail. It is available on-line for free. For more information, visit http://hop.perl.plover.com/ .
To join a mailing list for announcements about Memoize, send an empty message to mjd-perl-memoize-request@plover.com. This mailing list is for announcements only and has extremely low traffic---fewer than two messages per year.
Copyright 1998, 1999, 2000, 2001, 2012 by Mark Jason Dominus
This library is free software; you may redistribute it and/or modify it under the same terms as Perl itself.
Many thanks to Florian Ragwitz for administration and packaging
assistance, to John Tromp for bug reports, to Jonathan Roy for bug reports
and suggestions, to Michael Schwern for other bug reports and patches,
to Mike Cariaso for helping me to figure out the Right Thing to Do
About Expiration, to Joshua Gerth, Joshua Chamas, Jonathan Roy
(again), Mark D. Anderson, and Andrew Johnson for more suggestions
about expiration, to Brent Powers for the Memoize::ExpireLRU module,
to Ariel Scolnicov for delightful messages about the Fibonacci
function, to Dion Almaer for thought-provoking suggestions about the
default normalizer, to Walt Mankowski and Kurt Starsinic for much help
investigating problems under threaded Perl, to Alex Dudkevich for
reporting the bug in prototyped functions and for checking my patch,
to Tony Bass for many helpful suggestions, to Jonathan Roy (again) for
finding a use for unmemoize()
, to Philippe Verdret for enlightening
discussion of Hook::PrePostCall
, to Nat Torkington for advice I
ignored, to Chris Nandor for portability advice, to Randal Schwartz
for suggesting the 'flush_cache' function, and to Jenda Krynicky for
being a light in the world.
Special thanks to Jarkko Hietaniemi, the 5.8.0 pumpking, for including this module in the core and for his patient and helpful guidance during the integration process.
NDBM_File - Tied access to ndbm files
NDBM_File
establishes a connection between a Perl hash variable and
a file in NDBM_File format. You can manipulate the data in the file
just as if it were in a Perl hash, but when your program exits, the
data will remain in the file, to be used the next time your program
runs.
Use NDBM_File
with the Perl built-in tie function to establish
the connection between the variable and the file. The arguments to
tie should be:
The hash variable you want to tie.
The string "NDBM_File"
. (This tells Perl to use the NDBM_File
package to perform the functions of the hash.)
The name of the file you want to tie to the hash.
Flags. Use one of:
O_RDONLY
Read-only access to the data in the file.
O_WRONLY
Write-only access to the data in the file.
O_RDWR
Both read and write access.
If you want to create the file if it does not exist, add O_CREAT
to
any of these, as in the example. If you omit O_CREAT
and the file
does not already exist, the tie call will fail.
The default permissions to use if a new file is created. The actual permissions will be modified by the user's umask, so you should probably use 0666 here. (See umask.)
On failure, the tie call returns an undefined value and probably
sets $!
to contain the reason the file could not be tied.
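Putting the arguments above together, a minimal sketch (the file name mydbm is illustrative; ndbm itself will typically create mydbm.dir and mydbm.pag on disk):

```perl
use strict;
use warnings;
use Fcntl;        # supplies the O_RDWR, O_CREAT, etc. flag constants
use NDBM_File;

# Tie %hash to the NDBM file "mydbm", creating it if necessary:
tie my %hash, 'NDBM_File', 'mydbm', O_RDWR | O_CREAT, 0666
    or die "Couldn't tie NDBM file 'mydbm': $!";

$hash{fred} = 'flintstone';   # stored in the file, not just in memory
untie %hash;                  # flush and disconnect
```

After the program exits, retying the same file makes the stored key/value pairs available again.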
ndbm store returned -1, errno 22, key "..." at ...
This warning is emitted when you try to store a key or a value that is too long. It means that the change was not recorded in the database. See BUGS AND WARNINGS below.
There are a number of limits on the size of the data that you can store in the NDBM file. The most important is that the length of a key, plus the length of its associated value, may not exceed 1008 bytes.
See tie, perldbmfilter, Fcntl
NEXT.pm - Provide a pseudo-class NEXT (et al) that allows method redispatch
- use NEXT;
- package A;
- sub A::method { print "$_[0]: A method\n"; $_[0]->NEXT::method() }
- sub A::DESTROY { print "$_[0]: A dtor\n"; $_[0]->NEXT::DESTROY() }
- package B;
- use base qw( A );
- sub B::AUTOLOAD { print "$_[0]: B AUTOLOAD\n"; $_[0]->NEXT::AUTOLOAD() }
- sub B::DESTROY { print "$_[0]: B dtor\n"; $_[0]->NEXT::DESTROY() }
- package C;
- sub C::method { print "$_[0]: C method\n"; $_[0]->NEXT::method() }
- sub C::AUTOLOAD { print "$_[0]: C AUTOLOAD\n"; $_[0]->NEXT::AUTOLOAD() }
- sub C::DESTROY { print "$_[0]: C dtor\n"; $_[0]->NEXT::DESTROY() }
- package D;
- use base qw( B C );
- sub D::method { print "$_[0]: D method\n"; $_[0]->NEXT::method() }
- sub D::AUTOLOAD { print "$_[0]: D AUTOLOAD\n"; $_[0]->NEXT::AUTOLOAD() }
- sub D::DESTROY { print "$_[0]: D dtor\n"; $_[0]->NEXT::DESTROY() }
- package main;
- my $obj = bless {}, "D";
- $obj->method(); # Calls D::method, A::method, C::method
- $obj->missing_method(); # Calls D::AUTOLOAD, B::AUTOLOAD, C::AUTOLOAD
- # Clean-up calls D::DESTROY, B::DESTROY, A::DESTROY, C::DESTROY
NEXT.pm adds a pseudoclass named NEXT
to any program
that uses it. If a method m calls $self->NEXT::m()
, the call to
m is redispatched as if the calling method had not originally been found.
In other words, a call to $self->NEXT::m()
resumes the depth-first,
left-to-right search of $self
's class hierarchy that resulted in the
original call to m.
Note that this is not the same thing as $self->SUPER::m()
, which
begins a new dispatch that is restricted to searching the ancestors
of the current class. $self->NEXT::m()
can backtrack
past the current class -- to look for a suitable method in other
ancestors of $self
-- whereas $self->SUPER::m()
cannot.
A typical use would be in the destructors of a class hierarchy, as illustrated in the synopsis above. Each class in the hierarchy has a DESTROY method that performs some class-specific action and then redispatches the call up the hierarchy. As a result, when an object of class D is destroyed, the destructors of all its parent classes are called (in depth-first, left-to-right order).
Another typical use of redispatch would be in AUTOLOAD
'ed methods.
If such a method determined that it was not able to handle a
particular call, it might choose to redispatch that call, in the
hope that some other AUTOLOAD
(above it, or to its left) might
do better.
By default, if a redispatch attempt fails to find another method
elsewhere in the object's class hierarchy, it quietly gives up and does
nothing (but see Enforcing redispatch). This gracious acquiescence
is also unlike the (generally annoying) behaviour of SUPER
, which
throws an exception if it cannot redispatch.
Note that it is a fatal error for any method (including AUTOLOAD
)
to attempt to redispatch any method that does not have the
same name. For example:
- sub D::oops { print "oops!\n"; $_[0]->NEXT::other_method() }
It is possible to make NEXT
redispatch more demandingly (i.e. like
SUPER
does), so that the redispatch throws an exception if it cannot
find a "next" method to call.
To do this, simply invoke the redispatch as:
- $self->NEXT::ACTUAL::method();
rather than:
- $self->NEXT::method();
The ACTUAL
tells NEXT
that there must actually be a next method to call,
or it should throw an exception.
NEXT::ACTUAL
is most commonly used in AUTOLOAD
methods, as a means to
decline an AUTOLOAD
request, but preserve the normal exception-on-failure
semantics:
- sub AUTOLOAD {
- if ($AUTOLOAD =~ /foo|bar/) {
- # handle here
- }
- else { # try elsewhere
- shift()->NEXT::ACTUAL::AUTOLOAD(@_);
- }
- }
By using NEXT::ACTUAL
, if there is no other AUTOLOAD
to handle the
method call, an exception will be thrown (as usually happens in the absence of
a suitable AUTOLOAD
).
If NEXT
redispatching is used in the methods of a "diamond" class hierarchy:
- # A B
- # / \ /
- # C D
- # \ /
- # E
- use NEXT;
- package A;
- sub foo { print "called A::foo\n"; shift->NEXT::foo() }
- package B;
- sub foo { print "called B::foo\n"; shift->NEXT::foo() }
- package C; @ISA = qw( A );
- sub foo { print "called C::foo\n"; shift->NEXT::foo() }
- package D; @ISA = qw(A B);
- sub foo { print "called D::foo\n"; shift->NEXT::foo() }
- package E; @ISA = qw(C D);
- sub foo { print "called E::foo\n"; shift->NEXT::foo() }
- E->foo();
then derived classes may (re-)inherit base-class methods through two or
more distinct paths (e.g. in the way E
inherits A::foo
twice --
through C
and D
). In such cases, a sequence of NEXT
redispatches
will invoke the multiply inherited method as many times as it is
inherited. For example, the above code prints:
- called E::foo
- called C::foo
- called A::foo
- called D::foo
- called A::foo
- called B::foo
(i.e. A::foo
is called twice).
In some cases this may be the desired effect within a diamond hierarchy, but in others (e.g. for destructors) it may be more appropriate to call each method only once during a sequence of redispatches.
To cover such cases, you can redispatch methods via:
- $self->NEXT::DISTINCT::method();
rather than:
- $self->NEXT::method();
This causes the redispatcher to visit each distinct method only
once. That is, to skip any classes in the hierarchy that it has
already visited during redispatch. So, for example, if the
previous example were rewritten:
- package A;
- sub foo { print "called A::foo\n"; shift->NEXT::DISTINCT::foo() }
- package B;
- sub foo { print "called B::foo\n"; shift->NEXT::DISTINCT::foo() }
- package C; @ISA = qw( A );
- sub foo { print "called C::foo\n"; shift->NEXT::DISTINCT::foo() }
- package D; @ISA = qw(A B);
- sub foo { print "called D::foo\n"; shift->NEXT::DISTINCT::foo() }
- package E; @ISA = qw(C D);
- sub foo { print "called E::foo\n"; shift->NEXT::DISTINCT::foo() }
- E->foo();
then it would print:
- called E::foo
- called C::foo
- called A::foo
- called D::foo
- called B::foo
and omit the second call to A::foo
(since it would not be distinct
from the first call to A::foo
).
Note that you can also use:
- $self->NEXT::DISTINCT::ACTUAL::method();
or:
- $self->NEXT::ACTUAL::DISTINCT::method();
to get both unique invocation and exception-on-failure.
Note that, for historical compatibility, you can also use
NEXT::UNSEEN
instead of NEXT::DISTINCT
.
Yet another pseudo-class that NEXT.pm provides is EVERY
.
Its behaviour is considerably simpler than that of the NEXT
family.
A call to:
- $obj->EVERY::foo();
calls every method named foo
that the object in $obj
has inherited.
That is:
- use NEXT;
- package A; @ISA = qw(B D X);
- sub foo { print "A::foo " }
- package B; @ISA = qw(D X);
- sub foo { print "B::foo " }
- package X; @ISA = qw(D);
- sub foo { print "X::foo " }
- package D;
- sub foo { print "D::foo " }
- package main;
- my $obj = bless {}, 'A';
- $obj->EVERY::foo(); # prints "A::foo B::foo X::foo D::foo"
Prefixing a method call with EVERY::
causes every method in the
object's hierarchy with that name to be invoked. As the above example
illustrates, they are not called in Perl's usual "left-most-depth-first"
order. Instead, they are called "breadth-first-dependency-wise".
That means that the inheritance tree of the object is traversed breadth-first
and the resulting order of classes is used as the sequence in which methods
are called. However, that sequence is modified by imposing a rule that the
appropriate method of a derived class must be called before the same method of
any ancestral class. That's why, in the above example, X::foo
is called
before D::foo
, even though D
comes before X
in @B::ISA
.
In general, there's no need to worry about the order of calls. They will be
left-to-right, breadth-first, most-derived-first. This works perfectly for
most inherited methods (including destructors), but is inappropriate for
some kinds of methods (such as constructors, cloners, debuggers, and
initializers) where it's more appropriate that the least-derived methods be
called first (as more-derived methods may rely on the behaviour of their
"ancestors"). In that case, instead of using the EVERY
pseudo-class:
- $obj->EVERY::foo(); # prints "A::foo B::foo X::foo D::foo"
you can use the EVERY::LAST
pseudo-class:
- $obj->EVERY::LAST::foo(); # prints "D::foo X::foo B::foo A::foo"
which reverses the order of method call.
Whichever version is used, the actual methods are called in the same
context (list, scalar, or void) as the original call via EVERY
, and return:
A hash of array references in list context. Each entry of the hash has the fully qualified method name as its key and a reference to an array containing the method's list-context return values as its value.
A reference to a hash of scalar values in scalar context. Each entry of the hash has the fully qualified method name as its key and the method's scalar-context return values as its value.
Nothing in void context (obviously).
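The list-context packaging can be seen in the following sketch (the Base and Derived classes are hypothetical, invented for this example):

```perl
use strict;
use warnings;
use NEXT;     # provides the EVERY:: pseudo-classes

package Base;
sub new  { bless {}, shift }
sub name { 'Base' }

package Derived;
our @ISA = ('Base');
sub name { 'Derived' }

package main;
my $obj = Derived->new;

# List context: a hash keyed by fully qualified method name,
# each value an array ref of that method's list-context returns.
my %all = $obj->EVERY::name();
print "$_ => @{$all{$_}}\n" for sort keys %all;
```

Both Derived::name and Base::name are invoked, and each contributes one entry to the returned hash.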
Using EVERY methods
The typical way to use an EVERY
call is to wrap it in another base
method, that all classes inherit. For example, to ensure that every
destructor an object inherits is actually called (as opposed to just the
left-most-depth-first-est one):
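A sketch of such a wrapper (the class names are illustrative): the base class supplies the one real DESTROY, which redispatches to every inherited Destroy method:

```perl
use strict;
use warnings;
use NEXT;

package Base;
sub new     { bless {}, shift }
sub DESTROY { $_[0]->EVERY::LAST::Destroy() }

package Derived1;
our @ISA = ('Base');
sub Destroy { print "Derived1 clean-up\n" }

package Derived2;
our @ISA = ('Derived1');
sub Destroy { print "Derived2 clean-up\n" }
```

Destroying a Derived2 object then runs Derived1's clean-up before Derived2's (least-derived first, since EVERY::LAST reverses the usual most-derived-first order).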
et cetera. Every derived class that needs its own clean-up
behaviour simply adds its own Destroy
method (not a DESTROY
method),
which the call to EVERY::LAST::Destroy
in the inherited destructor
then correctly picks up.
Likewise, to create a class hierarchy in which every initializer inherited by a new object is invoked:
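A sketch of such a hierarchy (again with illustrative class names): the base constructor redispatches to every inherited Init, least-derived first:

```perl
use strict;
use warnings;
use NEXT;

package Base;
sub new {
    my ($class, %args) = @_;
    my $self = bless {}, $class;
    $self->EVERY::LAST::Init(%args);   # least-derived Init runs first
    return $self;
}
sub Init { my $self = shift; $self->{base_ready} = 1 }

package Derived;
our @ISA = ('Base');
sub Init { my $self = shift; $self->{derived_ready} = 1 }
```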
et cetera. Every derived class that needs some additional initialization
behaviour simply adds its own Init
method (not a new
method),
which the call to EVERY::LAST::Init
in the inherited constructor
then correctly picks up.
Damian Conway (damian@conway.org)
Because it's a module, not an integral part of the interpreter, NEXT.pm has to guess where the surrounding call was found in the method look-up sequence. In the presence of diamond inheritance patterns it occasionally guesses wrong.
It's also too slow (despite caching).
Comments, suggestions, and patches welcome.
- Copyright (c) 2000-2001, Damian Conway. All Rights Reserved.
- This module is free software. It may be used, redistributed
- and/or modified under the same terms as Perl itself.
O - Generic interface to Perl Compiler backends
- perl -MO=[-q,]Backend[,OPTIONS] foo.pl
This is the module that is used as a frontend to the Perl Compiler.
If you pass the -q option to the module, then the STDOUT
filehandle will be redirected into the variable $O::BEGIN_output
during compilation. This has the effect that any output printed
to STDOUT by BEGIN blocks or use'd modules will be stored in this
variable rather than printed. It's useful with those backends which
produce output themselves (Deparse
, Concise
etc), so that
their output is not confused with that generated by the code
being compiled.
The -qq option behaves like -q, except that it also closes
STDERR after deparsing has finished. This suppresses the "Syntax OK"
message normally produced by perl.
Most compiler backends use the following conventions: OPTIONS
consists of a comma-separated list of words (no white-space).
The -v
option usually puts the backend into verbose mode.
The -ofile
option generates output to file instead of
stdout. The -D
option followed by various letters turns on
various internal debugging flags. See the documentation for the
desired backend (named B::Backend
for the example above) to
find out about that backend.
This section is only necessary for those who want to write a compiler backend module that can be used via this module.
The command-line mentioned in the SYNOPSIS section corresponds to the Perl code
- use O ("Backend", OPTIONS);
The O::import
function loads the appropriate B::Backend
module
and calls its compile
function, passing it OPTIONS. That function
is expected to return a sub reference which we'll call CALLBACK. Next,
the "compile-only" flag is switched on (equivalent to the command-line
option -c
) and a CHECK block is registered which calls
CALLBACK. Thus the main Perl program mentioned on the command-line is
read in, parsed and compiled into internal syntax tree form. Since the
-c
flag is set, the program does not start running (excepting BEGIN
blocks of course) but the CALLBACK function registered by the compiler
backend is called.
In summary, a compiler backend module should be called "B::Foo"
for some foo and live in the appropriate directory for that name.
It should define a function called compile
. When the user types
- perl -MO=Foo,OPTIONS foo.pl
that function is called and is passed those OPTIONS (split on
commas). It should return a sub ref to the main compilation function.
After the user's program is loaded and parsed, that returned sub ref
is invoked which can then go ahead and do the compilation, usually by
making use of the B
module's functionality.
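A minimal backend following these conventions might look like the sketch below (the name B::Foo, its option handling, and its output are illustrative only; installed as B/Foo.pm somewhere on @INC, it would be invoked as perl -MO=Foo,-v foo.pl):

```perl
package B::Foo;
use strict;
use warnings;

sub compile {
    my @options = @_;                       # O splits OPTIONS on commas
    my $verbose = grep { $_ eq '-v' } @options;

    # Return the CALLBACK sub that O arranges to run from a CHECK
    # block, after the program has been parsed but not executed:
    return sub {
        print "B::Foo backend running", ($verbose ? " (verbose)" : ""), "\n";
        # a real backend would walk the op tree here, via the B module
    };
}

1;
```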
The -q and -qq options don't work correctly if perl isn't
compiled with PerlIO support : STDOUT will be closed instead of being
redirected to $O::BEGIN_output
.
Malcolm Beattie, mbeattie@sable.ox.ac.uk
Opcode - Disable named opcodes when compiling perl code
- use Opcode;
Perl code is always compiled into an internal format before execution.
Evaluating perl code (e.g. via "eval" or "do 'file'") causes the code to be compiled into an internal format and then, provided there was no error in the compilation, executed. The internal format is based on many distinct opcodes.
By default no opmask is in effect and any code can be compiled.
The Opcode module allow you to define an operator mask to be in effect when perl next compiles any code. Attempting to compile code which contains a masked opcode will cause the compilation to fail with an error. The code will not be executed.
The Opcode module is not usually used directly. See the ops pragma and Safe modules for more typical uses.
The authors make no warranty, implied or otherwise, about the suitability of this software for safety or security purposes.
The authors shall not in any case be liable for special, incidental, consequential, indirect or other similar damages arising from the use of this software.
Your mileage will vary. If in any doubt do not use it.
The canonical list of operator names is the contents of the array PL_op_name defined and initialised in file opcode.h of the Perl source distribution (and installed into the perl library).
Each operator has both a terse name (its opname) and a more verbose or recognisable descriptive name. The opdesc function can be used to return a list of descriptions for a list of operators.
Many of the functions and methods listed below take a list of operators as parameters. Most operator lists can be made up of several types of element. Each element can be one of
Operator names are typically small lowercase words like enterloop, leaveloop, last, next, redo etc. Sometimes they are rather cryptic like gv2cv, i_ncmp and ftsvtx.
Operator tags can be used to refer to groups (or sets) of operators. Tag names always begin with a colon. The Opcode module defines several optags and the user can define others using the define_optag function.
An opname or optag can be prefixed with an exclamation mark, e.g., !mkdir. Negating an opname or optag means remove the corresponding ops from the accumulated set of ops at that point.
An opset is a binary string of approximately 44 bytes which holds a set of zero or more operators.
The opset and opset_to_ops functions can be used to convert from a list of operators to an opset and vice versa.
Wherever a list of operators can be given you can use one or more opsets. See also Manipulating Opsets below.
The Opcode package contains functions for manipulating operator names tags and sets. All are available for export by the package.
In a scalar context opcodes returns the number of opcodes in this version of perl (around 350 for perl-5.7.0).
In a list context it returns a list of all the operator names. (Not yet implemented, use @names = opset_to_ops(full_opset).)
Returns an opset containing the listed operators.
Returns a list of operator names corresponding to those operators in the set.
Returns a string representation of an opset. Can be handy for debugging.
Returns an opset which includes all operators.
Returns an opset which contains no operators.
Returns an opset which is the inverse set of the one supplied.
Returns true if the supplied opset looks like a valid opset (is the right length etc) otherwise it returns false. If an optional second parameter is true then verify_opset will croak on an invalid opset instead of returning false.
Most of the other Opcode functions call verify_opset automatically and will croak if given an invalid opset.
Define OPTAG as a symbolic name for OPSET. Optag names always start
with a colon.
The optag name used must not be defined already (define_optag will croak if it is already defined). Optag names are global to the perl process and optag definitions cannot be altered or deleted once defined.
It is strongly recommended that applications using Opcode should use a leading capital letter on their tag names since lowercase names are reserved for use by the Opcode module. If using Opcode within a module you should prefix your tags names with the name of your module to ensure uniqueness and thus avoid clashes with other modules.
Adds the supplied opset to the current opmask. Note that there is currently no mechanism for unmasking ops once they have been masked. This is intentional.
Returns an opset corresponding to the current opmask.
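As a sketch of the effect (masking system here is just an example; remember that once added, a mask cannot be removed for the rest of the process):

```perl
use strict;
use warnings;
use Opcode qw(opset opmask_add);

# From this point on, any code compiled in this process that
# uses system() will fail to compile:
opmask_add(opset('system'));

eval 'system("echo hi")';     # string eval => compiled under the mask
print "compilation blocked: $@" if $@;
```

The eval'd code never runs; $@ reports that the system op was trapped by the operation mask.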
This takes a list of operator names and returns the corresponding list of operator descriptions.
Dumps to STDOUT a two column list of op names and op descriptions. If an optional pattern is given then only lines which match the (case insensitive) pattern will be output.
It's designed to be used as a handy command line utility:
- perl -MOpcode=opdump -e opdump
- perl -MOpcode=opdump -e 'opdump Eval'
Opsets may be manipulated using the perl bit vector operators & (and), | (or), ^ (xor) and ~ (negate/invert).
However you should never rely on the numerical position of any opcode within the opset. In other words both sides of a bit vector operator should be opsets returned from Opcode functions.
Also, since the number of opcodes in your current version of perl might not be an exact multiple of eight, there may be unused bits in the last byte of an opset. This should not cause any problems (Opcode functions ignore those extra bits) but it does mean that using the ~ operator will typically not produce the same 'physical' opset 'string' as the invert_opset function.
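For example, under that rule (the opnames used here are all from the :base_core set):

```perl
use strict;
use warnings;
use Opcode qw(opset opset_to_ops verify_opset);

my $set1 = opset('add', 'subtract');
my $set2 = opset('add', 'multiply');

my $union = $set1 | $set2;    # ops in either set
my $isect = $set1 & $set2;    # ops in both sets

verify_opset($union, 1);      # croaks if the result were not a valid opset
print join(' ', sort(opset_to_ops($isect))), "\n";    # add
```

Both operands of each bit-vector operator are opsets returned from Opcode functions, as the text above requires.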
- null stub scalar pushmark wantarray const defined undef
- rv2sv sassign
- rv2av aassign aelem aelemfast aelemfast_lex aslice av2arylen
- rv2hv helem hslice each values keys exists delete aeach akeys
- avalues reach rvalues rkeys
- preinc i_preinc predec i_predec postinc i_postinc
- postdec i_postdec int hex oct abs pow multiply i_multiply
- divide i_divide modulo i_modulo add i_add subtract i_subtract
- left_shift right_shift bit_and bit_xor bit_or negate i_negate
- not complement
- lt i_lt gt i_gt le i_le ge i_ge eq i_eq ne i_ne ncmp i_ncmp
- slt sgt sle sge seq sne scmp
- substr vec stringify study pos length index rindex ord chr
- ucfirst lcfirst uc lc fc quotemeta trans transr chop schop
- chomp schomp
- match split qr
- list lslice splice push pop shift unshift reverse
- cond_expr flip flop andassign orassign dorassign and or dor xor
- warn die lineseq nextstate scope enter leave
- rv2cv anoncode prototype coreargs
- entersub leavesub leavesublv return method method_named
- -- XXX loops via recursion?
- leaveeval -- needed for Safe to operate, is safe
- without entereval
These memory related ops are not included in :base_core because they can easily be used to implement a resource attack (e.g., consume all available memory).
- concat repeat join range
- anonlist anonhash
Note that despite the existence of this optag a memory resource attack may still be possible using only :base_core ops.
Disabling these ops is a very heavy handed way to attempt to prevent a memory resource attack. It's probable that a specific memory limit mechanism will be added to perl in the near future.
These loop ops are not included in :base_core because they can easily be used to implement a resource attack (e.g., consume all available CPU time).
These ops enable filehandle (rather than filename) based input and output. These are safe on the assumption that only pre-existing filehandles are available for use. Usually, to create new filehandles other ops such as open would need to be enabled, if you don't take into account the magical open of ARGV.
These are a hotchpotch of opcodes still waiting to be considered
- gvsv gv gelem
- padsv padav padhv padcv padany padrange introcv clonecv
- once
- rv2gv refgen srefgen ref
- bless -- could be used to change ownership of objects
- (reblessing)
- pushre regcmaybe regcreset regcomp subst substcont
- sprintf prtf -- can core dump
- crypt
- tie untie
- dbmopen dbmclose
- sselect select
- pipe_op sockpair
- getppid getpgrp setpgrp getpriority setpriority
- localtime gmtime
- entertry leavetry -- can be used to 'hide' fatal errors
- entergiven leavegiven
- enterwhen leavewhen
- break continue
- smartmatch
- custom -- where should this go
These ops are not included in :base_core because of the risk of them being used to generate floating point exceptions (which would have to be caught using a $SIG{FPE} handler).
These ops are not included in :base_core because they have an effect beyond the scope of the compartment.
These ops are related to multi-threading.
A handy tag name for a reasonable default set of ops. (The current ops allowed are unstable while development continues. It will change.)
- :base_core :base_mem :base_loop :base_orig :base_thread
This list used to contain :base_io prior to Opcode 1.07.
If safety matters to you (and why else would you be using the Opcode module?) then you should not rely on the definition of this, or indeed any other, optag!
- ghbyname ghbyaddr ghostent shostent ehostent -- hosts
- gnbyname gnbyaddr gnetent snetent enetent -- networks
- gpbyname gpbynumber gprotoent sprotoent eprotoent -- protocols
- gsbyname gsbyport gservent sservent eservent -- services
- gpwnam gpwuid gpwent spwent epwent getlogin -- users
- ggrnam ggrgid ggrent sgrent egrent -- groups
A handy tag name for a reasonable default set of ops beyond the :default optag. Like :default (and indeed all the other optags) its current definition is unstable while development continues. It will change.
The :browse tag represents the next step beyond :default. It is a superset of the :default ops and adds :filesys_read and :sys_db. The intent is that scripts can access more (possibly sensitive) information about your system but not be able to change it.
- :default :filesys_read :sys_db
- sysopen open close
- umask binmode
- open_dir closedir -- other dir ops are in :base_io
- exec exit kill
- time tms -- could be used for timing attacks (paranoid?)
This tag holds groups of assorted specialist opcodes that don't warrant having optags defined for them.
SystemV Interprocess Communications:
This tag holds opcodes related to loading modules and getting information about calling environment and args.
- chdir
- flock ioctl
- socket getpeername ssockopt
- bind connect listen accept shutdown gsockopt getsockname
- sleep alarm -- changes global timer state and signal handling
- sort -- assorted problems including core dumps
- tied -- can be used to access object implementing a tie
- pack unpack -- can be used to create/use memory pointers
- hintseval -- constant op holding eval hints
- entereval -- can be used to hide code from initial compile
- reset
- dbstate -- perl -d version of nextstate(ment) opcode
This tag is simply a bucket for opcodes that are unlikely to be used via a tag name but need to be tagged for completeness and documentation.
ops -- perl pragma interface to Opcode module.
Safe -- Opcode and namespace limited execution compartments
Originally designed and implemented by Malcolm Beattie, mbeattie@sable.ox.ac.uk as part of Safe version 1.
Split out from Safe module version 1, named opcode tags and other changes added by Tim Bunce.
POSIX - Perl interface to IEEE Std 1003.1
The POSIX module permits you to access all (or nearly all) the standard POSIX 1003.1 identifiers. Many of these identifiers have been given Perl-ish interfaces.
Everything is exported by default with the exception of any POSIX
functions with the same name as a built-in Perl function, such as
abs, alarm, rmdir, write, etc.., which will be exported
only if you ask for them explicitly. This is an unfortunate backwards
compatibility feature. You can stop the exporting by saying use
POSIX ()
and then use the fully qualified names (i.e. POSIX::SEEK_END
),
or by giving an explicit import list. If you do neither, and opt for the
default, use POSIX;
has to import 553 symbols.
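With an empty import list, everything stays available behind its fully qualified name (floor and ceil are shown as arbitrary examples):

```perl
use strict;
use warnings;
use POSIX ();            # empty list: import nothing

my $down = POSIX::floor(3.7);   # 3
my $up   = POSIX::ceil(3.2);    # 4
print "$down $up\n";
# floor() and ceil() were never imported into the current package
```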
This document gives a condensed list of the features available in the POSIX module. Consult your operating system's manpages for general information on most features. Consult perlfunc for functions which are noted as being identical to Perl's builtin functions.
The first section describes POSIX functions from the 1003.1 specification. The second section describes some classes for signal objects, TTY objects, and other miscellaneous objects. The remaining sections list various constants and macros in an organization which roughly follows IEEE Std 1003.1b-1993.
A few functions are not implemented because they are C specific. If you attempt to call these, they will print a message telling you that they aren't implemented, and suggest using the Perl equivalent should one exist. For example, trying to access the setjmp() call will elicit the message "setjmp() is C-specific: use eval {} instead".
Furthermore, some evil vendors will claim 1003.1 compliance, but in fact are not so: they will not pass the PCTS (POSIX Compliance Test Suites). For example, one vendor may not define EDEADLK, or the semantics of the errno values set by open(2) might not be quite right. Perl does not attempt to verify POSIX compliance. That means you can currently successfully say "use POSIX", and then later in your program you find that your vendor has been lax and there's no usable ICANON macro after all. This could be construed to be a bug.
This is identical to the C function _exit()
. It exits the program
immediately which means among other things buffered I/O is not flushed.
Note that when using threads and in Linux this is not a good way to exit a thread because in Linux processes and threads are kind of the same thing (Note: while this is the situation in early 2003 there are projects under way to have threads with more POSIXly semantics in Linux). If you want not to return from a thread, detach the thread.
This is identical to the C function abort()
. It terminates the
process with a SIGABRT
signal unless caught by a signal handler or
if the handler does not return normally (it e.g. does a longjmp
).
This is identical to Perl's builtin abs() function, returning
the absolute value of its numerical argument.
Determines the accessibility of a file.
- if( POSIX::access( "/", &POSIX::R_OK ) ){
- print "have read permission\n";
- }
Returns undef on failure. Note: do not use access()
for
security purposes. Between the access()
call and the operation
you are preparing for the permissions might change: a classic
race condition.
This is identical to the C function acos()
, returning
the arcus cosine of its numerical argument. See also Math::Trig.
This is identical to Perl's builtin alarm() function,
either for arming or disarming the SIGALRM
timer.
This is identical to the C function asctime()
. It returns
a string of the form
- "Fri Jun 2 18:22:13 2000\n\0"
and it is called thusly
- $asctime = asctime($sec, $min, $hour, $mday, $mon, $year,
- $wday, $yday, $isdst);
The $mon
is zero-based: January equals 0
. The $year
is
1900-based: 2001 equals 101
. $wday
and $yday
default to zero
(and are usually ignored anyway), and $isdst
defaults to -1.
This is identical to the C function asin()
, returning
the arcus sine of its numerical argument. See also Math::Trig.
Unimplemented, but you can use die and the Carp module to achieve similar things.
This is identical to the C function atan()
, returning the
arcus tangent of its numerical argument. See also Math::Trig.
This is identical to Perl's builtin atan2() function, returning
the arcus tangent defined by its two numerical arguments, the y
coordinate and the x coordinate. See also Math::Trig.
atexit() is C-specific: use END {}
instead, see perlsub.
atof() is C-specific. Perl converts strings to numbers transparently. If you need to force a scalar to a number, add a zero to it.
atoi() is C-specific. Perl converts strings to numbers transparently. If you need to force a scalar to a number, add a zero to it. If you need to have just the integer part, see int.
atol() is C-specific. Perl converts strings to numbers transparently. If you need to force a scalar to a number, add a zero to it. If you need to have just the integer part, see int.
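The add-zero idiom from the entries above, as a sketch:

```perl
use strict;
use warnings;

my $str = "42.75";
my $num = $str + 0;       # atof()-style: force numeric conversion
my $int = int($num);      # atoi()/atol()-style: keep the integer part
print "$num $int\n";      # 42.75 42
```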
bsearch() not supplied. For doing binary search on wordlists, see Search::Dict.
calloc() is C-specific. Perl does memory management transparently.
This is identical to the C function ceil()
, returning the smallest
integer value greater than or equal to the given numerical argument.
This is identical to Perl's builtin chdir() function, allowing
one to change the working (default) directory, see chdir.
This is identical to Perl's builtin chmod() function, allowing
one to change file and directory permissions, see chmod.
This is identical to Perl's builtin chown() function, allowing one
to change file and directory owners and groups, see chown.
Use the method IO::Handle::clearerr() instead, to reset the error state (if any) and EOF state (if any) of the given stream.
This is identical to the C function clock(), returning the amount of processor time spent, in microseconds.
Close the file. This uses file descriptors such as those obtained by calling POSIX::open.
- $fd = POSIX::open( "foo", &POSIX::O_RDONLY );
- POSIX::close( $fd );
Returns undef on failure.
See also close.
This is identical to Perl's builtin closedir() function for closing
a directory handle, see closedir.
This is identical to Perl's builtin cos() function, for returning
the cosine of its numerical argument, see cos.
See also Math::Trig.
This is identical to the C function cosh(), for returning the hyperbolic cosine of its numeric argument. See also Math::Trig.
Create a new file. This returns a file descriptor like the ones returned by POSIX::open. Use POSIX::close to close the file.
- $fd = POSIX::creat( "foo", 0611 );
- POSIX::close( $fd );
See also sysopen and its O_CREAT flag.
Generates the path name for the controlling terminal.
- $path = POSIX::ctermid();
This is identical to the C function ctime() and equivalent to asctime(localtime(...)), see asctime and localtime.
Get the login name of the owner of the current process.
- $name = POSIX::cuserid();
This is identical to the C function difftime(), for returning the time difference (in seconds) between two times (as returned by time()), see time.
div() is C-specific, use int on the usual / division and the modulus %.
This is similar to the C function dup(), for duplicating a file descriptor. This uses file descriptors such as those obtained by calling POSIX::open.
Returns undef on failure.
This is similar to the C function dup2(), for duplicating a file descriptor to another known file descriptor. This uses file descriptors such as those obtained by calling POSIX::open.
Returns undef on failure.
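As a sketch (assuming a POSIX system), duplicating the descriptor beneath standard output and writing through the copy:

```perl
use POSIX ();

# Duplicate file descriptor 1 (standard output); both descriptors now
# refer to the same open file description.
my $fd = POSIX::dup(1);
defined $fd or die "dup failed: $!";
POSIX::write($fd, "hi\n", 3);   # writes through the duplicate
POSIX::close($fd);
```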
Returns the value of errno.
- $errno = POSIX::errno();
This is identical to the numerical value of $!, see $ERRNO in perlvar.
execl() is C-specific, see exec.
execle() is C-specific, see exec.
execlp() is C-specific, see exec.
execv() is C-specific, see exec.
execve() is C-specific, see exec.
execvp() is C-specific, see exec.
This is identical to Perl's builtin exit() function for exiting the
program, see exit.
This is identical to Perl's builtin exp() function for
returning the exponent (e-based) of the numerical argument,
see exp.
This is identical to Perl's builtin abs() function for returning
the absolute value of the numerical argument, see abs.
Use method IO::Handle::close() instead, or see close.
This is identical to Perl's builtin fcntl() function,
see fcntl.
Use method IO::Handle::new_from_fd() instead, or see open.
Use method IO::Handle::eof() instead, or see eof.
Use method IO::Handle::error() instead.
Use method IO::Handle::flush() instead. See also $OUTPUT_AUTOFLUSH in perlvar.
Use method IO::Handle::getc() instead, or see read.
Use method IO::Seekable::getpos() instead, or see seek.
Use method IO::Handle::gets() instead. Similar to <>, also known as readline.
Use method IO::Handle::fileno() instead, or see fileno.
This is identical to the C function floor(), returning the largest integer value less than or equal to the numerical argument.
This is identical to the C function fmod().
- $r = fmod($x, $y);
It returns the remainder $r = $x - $n*$y, where $n = trunc($x/$y). The $r has the same sign as $x and magnitude (absolute value) less than the magnitude of $y.
Use method IO::File::open() instead, or see open.
This is identical to Perl's builtin fork() function for duplicating the current process, see fork, and perlfork if you are on Windows.
Retrieves the value of a configurable limit on a file or directory. This uses file descriptors such as those obtained by calling POSIX::open.
The following will determine the maximum length of the longest allowable pathname on the filesystem which holds /var/foo.
- $fd = POSIX::open( "/var/foo", &POSIX::O_RDONLY );
- $path_max = POSIX::fpathconf( $fd, &POSIX::_PC_PATH_MAX );
Returns undef on failure.
fprintf() is C-specific, see printf instead.
fputc() is C-specific, see print instead.
fputs() is C-specific, see print instead.
fread() is C-specific, see read instead.
free() is C-specific. Perl does memory management transparently.
freopen() is C-specific, see open instead.
Return the mantissa and exponent of a floating-point number.
- ($mantissa, $exponent) = POSIX::frexp( 1.234e56 );
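As a small check on what the two parts mean, the mantissa is normalized into [0.5, 1) and recombines with the power of two:

```perl
use POSIX qw(frexp);

# 8 == 0.5 * 2**4, so frexp(8) yields mantissa 0.5 and exponent 4.
my ($mant, $exp) = frexp(8);
my $back = $mant * 2 ** $exp;   # reconstructs the original 8
print "$mant $exp $back\n";
```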
fscanf() is C-specific, use <> and regular expressions instead.
Use method IO::Seekable::seek() instead, or see seek.
Use method IO::Seekable::setpos() instead, or see seek.
Get file status. This uses file descriptors such as those obtained by calling POSIX::open. The data returned is identical to the data from Perl's builtin stat function.
- $fd = POSIX::open( "foo", &POSIX::O_RDONLY );
- @stats = POSIX::fstat( $fd );
Use method IO::Handle::sync() instead.
Use method IO::Seekable::tell() instead, or see tell.
fwrite() is C-specific, see print instead.
This is identical to Perl's builtin getc() function,
see getc.
Returns one character from STDIN. Identical to Perl's getc(),
see getc.
Returns the name of the current working directory. See also Cwd.
Returns the effective group identifier. Similar to Perl's builtin variable $), see $EGID in perlvar.
Returns the value of the specified environment variable. The same information is available through the %ENV hash.
Returns the effective user identifier. Identical to Perl's builtin $> variable, see $EUID in perlvar.
Returns the user's real group identifier. Similar to Perl's builtin variable $(, see $GID in perlvar.
This is identical to Perl's builtin getgrgid() function for
returning group entries by group identifiers, see
getgrgid.
This is identical to Perl's builtin getgrnam() function for
returning group entries by group names, see getgrnam.
Returns the ids of the user's supplementary groups. Similar to Perl's builtin variable $), see $GID in perlvar.
This is identical to Perl's builtin getlogin() function for
returning the user name associated with the current session, see
getlogin.
This is identical to Perl's builtin getpgrp() function for
returning the process group identifier of the current process, see
getpgrp.
Returns the process identifier. Identical to Perl's builtin variable $$, see $PID in perlvar.
This is identical to Perl's builtin getppid() function for returning the process identifier of the parent process of the current process, see getppid.
This is identical to Perl's builtin getpwnam() function for
returning user entries by user names, see getpwnam.
This is identical to Perl's builtin getpwuid() function for
returning user entries by user identifiers, see getpwuid.
Returns one line from STDIN, similar to <>, also known as the readline() function, see readline.
NOTE: if you have C programs that still use gets(), be very afraid. The gets() function is a source of endless grief because it has no buffer overrun checks. It should never be used. The fgets() function should be preferred instead.
Returns the user's identifier. Identical to Perl's builtin $< variable, see $UID in perlvar.
This is identical to Perl's builtin gmtime() function for
converting seconds since the epoch to a date in Greenwich Mean Time,
see gmtime.
This is identical to the C function, except that it can apply to a single character or to a whole string. Note that locale settings may affect what characters are considered isalnum. Does not work on Unicode characters of code point 256 or higher. Consider using regular expressions and the /[[:alnum:]]/ construct instead, or possibly the /\w/ construct.
This is identical to the C function, except that it can apply to a single character or to a whole string. Note that locale settings may affect what characters are considered isalpha. Does not work on Unicode characters of code point 256 or higher. Consider using regular expressions and the /[[:alpha:]]/ construct instead.
Returns a boolean indicating whether the specified filehandle is connected to a tty. Similar to the -t operator, see -X.
This is identical to the C function, except that it can apply to a single character or to a whole string. Note that locale settings may affect what characters are considered iscntrl. Does not work on Unicode characters of code point 256 or higher. Consider using regular expressions and the /[[:cntrl:]]/ construct instead.
This is identical to the C function, except that it can apply to a single character or to a whole string. Note that locale settings may affect what characters are considered isdigit (unlikely, but still possible). Does not work on Unicode characters of code point 256 or higher. Consider using regular expressions and the /[[:digit:]]/ construct instead, or the /\d/ construct.
This is identical to the C function, except that it can apply to a single character or to a whole string. Note that locale settings may affect what characters are considered isgraph. Does not work on Unicode characters of code point 256 or higher. Consider using regular expressions and the /[[:graph:]]/ construct instead.
This is identical to the C function, except that it can apply to a single character or to a whole string. Note that locale settings may affect what characters are considered islower. Does not work on Unicode characters of code point 256 or higher. Consider using regular expressions and the /[[:lower:]]/ construct instead. Do not use /[a-z]/.
This is identical to the C function, except that it can apply to a single character or to a whole string. Note that locale settings may affect what characters are considered isprint. Does not work on Unicode characters of code point 256 or higher. Consider using regular expressions and the /[[:print:]]/ construct instead.
This is identical to the C function, except that it can apply to a single character or to a whole string. Note that locale settings may affect what characters are considered ispunct. Does not work on Unicode characters of code point 256 or higher. Consider using regular expressions and the /[[:punct:]]/ construct instead.
This is identical to the C function, except that it can apply to a single character or to a whole string. Note that locale settings may affect what characters are considered isspace. Does not work on Unicode characters of code point 256 or higher. Consider using regular expressions and the /[[:space:]]/ construct instead, or the /\s/ construct. (Note that /\s/ and /[[:space:]]/ are slightly different in that /[[:space:]]/ can normally match a vertical tab, while /\s/ does not.)
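A short illustration of that difference (note that perls 5.18 and later extend /\s/ to match vertical tab as well, so the gap only exists on older perls):

```perl
# Vertical tab is the classic divider between the two character classes:
# it matches [[:space:]] on any perl, but matched \s only from 5.18 on.
my $vt = "\x0b";
print $vt =~ /[[:space:]]/ ? "posix space\n" : "not posix space\n";
```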
This is identical to the C function, except that it can apply to a single character or to a whole string. Note that locale settings may affect what characters are considered isupper. Does not work on Unicode characters of code point 256 or higher. Consider using regular expressions and the /[[:upper:]]/ construct instead. Do not use /[A-Z]/.
This is identical to the C function, except that it can apply to a single character or to a whole string. Note that locale settings may affect what characters are considered isxdigit (unlikely, but still possible). Does not work on Unicode characters of code point 256 or higher. Consider using regular expressions and the /[[:xdigit:]]/ construct instead, or simply /[0-9a-f]/i.
This is identical to Perl's builtin kill() function for sending
signals to processes (often to terminate them), see kill.
(For returning absolute values of long integers.) labs() is C-specific, see abs instead.
This is identical to the C function, except the order of arguments is
consistent with Perl's builtin chown() with the added restriction
of only one path, not a list of paths. Does the same thing as the
chown() function but changes the owner of a symbolic link instead
of the file the symbolic link points to.
This is identical to the C function ldexp()
for multiplying floating point numbers with powers of two.
- $x_quadrupled = POSIX::ldexp($x, 2);
(For computing dividends of long integers.)
ldiv() is C-specific, use / and int() instead.
This is identical to Perl's builtin link() function
for creating hard links into files, see link.
Get numeric formatting information. Returns a reference to a hash containing the current locale formatting values.
Here is how to query the database for the de (Deutsch or German) locale.
- my $loc = POSIX::setlocale( &POSIX::LC_ALL, "de" );
- print "Locale: \"$loc\"\n";
- my $lconv = POSIX::localeconv();
- foreach my $property (qw(
- decimal_point
- thousands_sep
- grouping
- int_curr_symbol
- currency_symbol
- mon_decimal_point
- mon_thousands_sep
- mon_grouping
- positive_sign
- negative_sign
- int_frac_digits
- frac_digits
- p_cs_precedes
- p_sep_by_space
- n_cs_precedes
- n_sep_by_space
- p_sign_posn
- n_sign_posn
- ))
- {
- printf qq(%s: "%s",\n), $property, $lconv->{$property};
- }
This is identical to Perl's builtin localtime() function for
converting seconds since the epoch to a date, see localtime.
This is identical to Perl's builtin log() function,
returning the natural (e-based) logarithm of the numerical argument,
see log.
This is identical to the C function log10(), returning the base-10 logarithm of the numerical argument. You can also use
- sub log10 { log($_[0]) / log(10) }
or
- sub log10 { log($_[0]) / 2.30258509299405 }
or
- sub log10 { log($_[0]) * 0.434294481903252 }
longjmp() is C-specific: use die instead.
Move the file's read/write position. This uses file descriptors such as those obtained by calling POSIX::open.
- $fd = POSIX::open( "foo", &POSIX::O_RDONLY );
- $off_t = POSIX::lseek( $fd, 0, &POSIX::SEEK_SET );
Returns undef on failure.
malloc() is C-specific. Perl does memory management transparently.
This is identical to the C function mblen(). Perl does not have any support for the wide and multibyte characters of the C standards, so this might be a rather useless function.
This is identical to the C function mbstowcs(). Perl does not have any support for the wide and multibyte characters of the C standards, so this might be a rather useless function.
This is identical to the C function mbtowc(). Perl does not have any support for the wide and multibyte characters of the C standards, so this might be a rather useless function.
memchr() is C-specific, see index instead.
memcmp() is C-specific, use eq instead, see perlop.
memset() is C-specific, use x instead, see perlop.
This is identical to Perl's builtin mkdir() function
for creating directories, see mkdir.
This is similar to the C function mkfifo() for creating FIFO special files.
- if (mkfifo($path, $mode)) { ... }
Returns undef on failure. The $mode is similar to the mode of mkdir(), see mkdir, though for mkfifo you must specify the $mode.
Convert date/time info to a calendar time.
Synopsis:
- mktime(sec, min, hour, mday, mon, year, wday = 0, yday = 0, isdst = -1)
The month (mon), weekday (wday), and yearday (yday) begin at zero. I.e. January is 0, not 1; Sunday is 0, not 1; January 1st is 0, not 1. The year (year) is given in years since 1900. I.e. the year 1995 is 95; the year 2001 is 101. Consult your system's mktime() manpage for details about these and the other arguments.
Calendar time for December 12, 1995, at 10:30 am.
- $time_t = POSIX::mktime( 0, 30, 10, 12, 11, 95 );
- print "Date = ", POSIX::ctime($time_t);
Returns undef on failure.
Return the integral and fractional parts of a floating-point number.
- ($fractional, $integral) = POSIX::modf( 3.14 );
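As a sanity check on the two return values, they recombine into the original number:

```perl
use POSIX qw(modf);

# modf(3.14) splits into roughly 0.14 and exactly 3;
# fractional + integral gives back the original value.
my ($frac, $int) = modf(3.14);
printf "%.2f + %.0f = %.2f\n", $frac, $int, $frac + $int;
```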
This is similar to the C function nice(), for changing the scheduling preference of the current process. Positive arguments mean a more polite process, negative values a more needy process. Normal user processes can only make themselves more polite.
Returns undef on failure.
offsetof() is C-specific, you probably want to see pack instead.
Open a file for reading or writing. This returns file descriptors, not Perl filehandles. Use POSIX::close to close the file.
Open a file read-only with mode 0666.
- $fd = POSIX::open( "foo" );
Open a file for read and write.
- $fd = POSIX::open( "foo", &POSIX::O_RDWR );
Open a file for write, with truncation.
- $fd = POSIX::open( "foo", &POSIX::O_WRONLY | &POSIX::O_TRUNC );
Create a new file with mode 0640. Set up the file for writing.
- $fd = POSIX::open( "foo", &POSIX::O_CREAT | &POSIX::O_WRONLY, 0640 );
Returns undef on failure.
See also sysopen.
Open a directory for reading.
- $dir = POSIX::opendir( "/var" );
- @files = POSIX::readdir( $dir );
- POSIX::closedir( $dir );
Returns undef on failure.
Retrieves the value of a configurable limit on a file or directory.
The following will determine the maximum length of the longest allowable
pathname on the filesystem which holds /var.
- $path_max = POSIX::pathconf( "/var", &POSIX::_PC_PATH_MAX );
Returns undef on failure.
This is similar to the C function pause(), which suspends the execution of the current process until a signal is received.
Returns undef on failure.
This is identical to the C function perror(), which outputs to the standard error stream the specified message followed by ": " and the current error string. Use the warn() function and the $! variable instead, see warn and $ERRNO in perlvar.
Create an interprocess channel. This returns file descriptors like those returned by POSIX::open.
- my ($read, $write) = POSIX::pipe();
- POSIX::write( $write, "hello", 5 );
- POSIX::read( $read, $buf, 5 );
See also pipe.
Computes $x raised to the power $exponent.
- $ret = POSIX::pow( $x, $exponent );
You can also use the ** operator, see perlop.
Formats and prints the specified arguments to STDOUT. See also printf.
putc() is C-specific, see print instead.
putchar() is C-specific, see print instead.
puts() is C-specific, see print instead.
qsort() is C-specific, see sort instead.
Sends the specified signal to the current process. See also kill and $$ ($PID) in perlvar.
Read from a file. This uses file descriptors such as those obtained by calling POSIX::open. If the buffer $buf is not large enough for the read then Perl will extend it to make room for the request.
- $fd = POSIX::open( "foo", &POSIX::O_RDONLY );
- $bytes = POSIX::read( $fd, $buf, 3 );
Returns undef on failure.
See also sysread.
This is identical to Perl's builtin readdir() function
for reading directory entries, see readdir.
realloc() is C-specific. Perl does memory management transparently.
This is identical to Perl's builtin unlink() function
for removing files, see unlink.
This is identical to Perl's builtin rename() function
for renaming files, see rename.
Seeks to the beginning of the file.
This is identical to Perl's builtin rewinddir() function for
rewinding directory entry streams, see rewinddir.
This is identical to Perl's builtin rmdir() function
for removing (empty) directories, see rmdir.
scanf() is C-specific, use <> and regular expressions instead, see perlre.
Sets the real group identifier and the effective group identifier for this process. Similar to assigning a value to Perl's builtin $) variable, see $EGID in perlvar, except that the latter will change only the effective group identifier, and that setgid() uses only a single numeric argument, as opposed to a space-separated list of numbers.
Modifies and queries program's locale. The following examples assume
- use POSIX qw(setlocale LC_ALL LC_CTYPE);
has been issued.
The following will set the traditional UNIX system locale behavior (the second argument "C").
- $loc = setlocale( LC_ALL, "C" );
The following will query the current LC_CTYPE category. (No second argument means 'query'.)
- $loc = setlocale( LC_CTYPE );
The following will set the LC_CTYPE behaviour according to the locale environment variables (the second argument ""). Please see your system's setlocale(3) documentation for the locale environment variables' meaning or consult perllocale.
- $loc = setlocale( LC_CTYPE, "" );
The following will set the LC_COLLATE behaviour to Argentinian Spanish. NOTE: The naming and availability of locales depends on your operating system. Please consult perllocale for how to find out which locales are available in your system.
- $loc = setlocale( LC_COLLATE, "es_AR.ISO8859-1" );
This is similar to the C function setpgid() for setting the process group identifier of the current process.
Returns undef on failure.
This is identical to the C function setsid() for setting the session identifier of the current process.
Sets the real user identifier and the effective user identifier for this process. Similar to assigning a value to Perl's builtin $< variable, see $UID in perlvar, except that the latter will change only the real user identifier.
Detailed signal management. This uses POSIX::SigAction objects for the action and oldaction arguments (the oldaction can also be just a hash reference). Consult your system's sigaction manpage for details, see also POSIX::SigRt.
Synopsis:
- sigaction(signal, action, oldaction = 0)
Returns undef on failure. The signal must be a number (like SIGHUP), not a string (like "SIGHUP"), though Perl does try hard to understand you.
If you use the SA_SIGINFO flag, the signal handler will in addition to the first argument, the signal name, also receive a second argument, a hash reference, inside which are the following keys with the following semantics, as defined by POSIX/SUSv3:
- signo the signal number
- errno the error number
- code if this is zero or less, the signal was sent by
- a user process and the uid and pid make sense,
- otherwise the signal was sent by the kernel
The following are also defined by POSIX/SUSv3, but unfortunately not very widely implemented:
- pid the process id generating the signal
- uid the uid of the process id generating the signal
- status exit value or signal for SIGCHLD
- band band event for SIGPOLL
A third argument is also passed to the handler, which contains a copy of the raw binary contents of the siginfo structure: if a system has some non-POSIX fields, this third argument is where to unpack() them from.
Note that not all siginfo values make sense simultaneously (some are valid only for certain signals, for example), and not all values make sense from the Perl perspective; you should consult your system's sigaction and possibly also siginfo documentation.
siglongjmp() is C-specific: use die instead.
Examine signals that are blocked and pending. This uses POSIX::SigSet objects for the sigset argument. Consult your system's sigpending manpage for details.
Synopsis:
- sigpending(sigset)
Returns undef on failure.
Change and/or examine the calling process's signal mask. This uses POSIX::SigSet objects for the sigset and oldsigset arguments. Consult your system's sigprocmask manpage for details.
Synopsis:
- sigprocmask(how, sigset, oldsigset = 0)
Returns undef on failure.
Note that you can't reliably block or unblock a signal from its own signal handler if you're using safe signals. Other signals can be blocked or unblocked reliably.
Install a signal mask and suspend the process until a signal arrives. This uses POSIX::SigSet objects for the signal_mask argument. Consult your system's sigsuspend manpage for details.
Synopsis:
- sigsuspend(signal_mask)
Returns undef on failure.
This is identical to Perl's builtin sin() function
for returning the sine of the numerical argument,
see sin. See also Math::Trig.
This is identical to the C function sinh() for returning the hyperbolic sine of the numerical argument. See also Math::Trig.
This is functionally identical to Perl's builtin sleep() function for suspending the execution of the current process for a certain number of seconds, see sleep. There is one significant difference, however: POSIX::sleep() returns the number of unslept seconds, while CORE::sleep() returns the number of slept seconds.
This is similar to Perl's builtin sprintf() function
for returning a string that has the arguments formatted as requested,
see sprintf.
This is identical to Perl's builtin sqrt() function for returning the square root of the numerical argument, see sqrt.
Gives a seed to the pseudorandom number generator, see srand.
sscanf() is C-specific, use regular expressions instead, see perlre.
This is identical to Perl's builtin stat() function
for returning information about files and directories.
strcat() is C-specific, use .= instead, see perlop.
strchr() is C-specific, see index instead.
strcmp() is C-specific, use eq or cmp instead, see perlop.
This is identical to the C function strcoll() for collating (comparing) strings transformed using the strxfrm() function. Not really needed since Perl can do this transparently, see perllocale.
strcpy() is C-specific, use = instead, see perlop.
strcspn() is C-specific, use regular expressions instead, see perlre.
Returns the error string for the specified errno. Identical to the string form of $!, see $ERRNO in perlvar.
Convert date and time information to string. Returns the string.
Synopsis:
- strftime(fmt, sec, min, hour, mday, mon, year, wday = -1, yday = -1, isdst = -1)
The month (mon), weekday (wday), and yearday (yday) begin at zero. I.e. January is 0, not 1; Sunday is 0, not 1; January 1st is 0, not 1. The year (year) is given in years since 1900. I.e., the year 1995 is 95; the year 2001 is 101. Consult your system's strftime() manpage for details about these and the other arguments.
If you want your code to be portable, your format (fmt) argument should use only the conversion specifiers defined by the ANSI C standard (C89, to play safe). These are aAbBcdHIjmMpSUwWxXyYZ%. But even then, the results of some of the conversion specifiers are non-portable. For example, the specifiers aAbBcpZ change according to the locale settings of the user, and both how to set locales (the locale names) and what output to expect are non-standard. The specifier c changes according to the timezone settings of the user and the timezone computation rules of the operating system. The Z specifier is notoriously unportable since the names of timezones are non-standard. Sticking to the numeric specifiers is the safest route.
The given arguments are made consistent as though by calling mktime() before calling your system's strftime() function, except that the isdst value is not affected.
The string for Tuesday, December 12, 1995.
- $str = POSIX::strftime( "%A, %B %d, %Y", 0, 0, 0, 12, 11, 95, 2 );
- print "$str\n";
strncat() is C-specific, use .= instead, see perlop.
strncmp() is C-specific, use eq instead, see perlop.
strncpy() is C-specific, use = instead, see perlop.
strpbrk() is C-specific, use regular expressions instead, see perlre.
strrchr() is C-specific, see rindex instead.
strspn() is C-specific, use regular expressions instead, see perlre.
This is identical to Perl's builtin index() function,
see index.
String to double translation. Returns the parsed number and the number of characters in the unparsed portion of the string. Truly POSIX-compliant systems set $! ($ERRNO) to indicate a translation error, so clear $! before calling strtod. However, non-POSIX systems may not check for overflow, and therefore will never set $!.
strtod should respect any POSIX setlocale() settings.
To parse a string $str as a floating point number use
- $! = 0;
- ($num, $n_unparsed) = POSIX::strtod($str);
The second returned item and $! can be used to check for valid input:
- if (($str eq '') || ($n_unparsed != 0) || $!) {
- die "Non-numeric input $str" . ($! ? ": $!\n" : "\n");
- }
When called in a scalar context strtod returns the parsed number.
strtok() is C-specific, use regular expressions instead, see perlre, or split.
String to (long) integer translation. Returns the parsed number and the number of characters in the unparsed portion of the string. Truly POSIX-compliant systems set $! ($ERRNO) to indicate a translation error, so clear $! before calling strtol. However, non-POSIX systems may not check for overflow, and therefore will never set $!.
strtol should respect any POSIX setlocale() settings.
To parse a string $str as a number in some base $base use
- $! = 0;
- ($num, $n_unparsed) = POSIX::strtol($str, $base);
The base should be zero or between 2 and 36, inclusive. When the base is zero or omitted strtol will use the string itself to determine the base: a leading "0x" or "0X" means hexadecimal; a leading "0" means octal; any other leading characters mean decimal. Thus, "1234" is parsed as a decimal number, "01234" as an octal number, and "0x1234" as a hexadecimal number.
The second returned item and $! can be used to check for valid input:
- if (($str eq '') || ($n_unparsed != 0) || $!) {
- die "Non-numeric input $str" . ($! ? ": $!\n" : "\n");
- }
When called in a scalar context strtol returns the parsed number.
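A short sketch of the base-detection rule described above (assuming a POSIX-compliant strtol):

```perl
use POSIX qw(strtol);

# With base 0 the string's prefix picks the base:
# "0x" means hex, a leading "0" means octal, anything else decimal.
$! = 0;
my ($hex) = strtol("0x1F", 0);   # 31
my ($oct) = strtol("0755", 0);   # 493
my ($dec) = strtol("755",  0);   # 755
print "$hex $oct $dec\n";
```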
String to unsigned (long) integer translation. strtoul() is identical to strtol() except that strtoul() only parses unsigned integers. See strtol for details.
Note: Some vendors supply strtod() and strtol() but not strtoul(). Other vendors that do supply strtoul() parse "-1" as a valid value.
String transformation. Returns the transformed string.
- $dst = POSIX::strxfrm( $src );
Used in conjunction with the strcoll() function, see strcoll. Not really needed since Perl can do this transparently, see perllocale.
Retrieves values of system configurable variables.
The following will get the machine's clock speed.
- $clock_ticks = POSIX::sysconf( &POSIX::_SC_CLK_TCK );
Returns undef on failure.
This is identical to Perl's builtin system() function, see
system.
This is identical to the C function tan(), returning the tangent of the numerical argument. See also Math::Trig.
This is identical to the C function tanh(), returning the hyperbolic tangent of the numerical argument. See also Math::Trig.
This is similar to the C function tcdrain() for draining the output queue of its argument stream.
Returns undef on failure.
This is similar to the C function tcflow() for controlling the flow of its argument stream.
Returns undef on failure.
This is similar to the C function tcflush() for flushing the I/O buffers of its argument stream.
Returns undef on failure.
This is identical to the C function tcgetpgrp() for returning the process group identifier of the foreground process group of the controlling terminal.
This is similar to the C function tcsendbreak() for sending a break on its argument stream.
Returns undef on failure.
This is similar to the C function tcsetpgrp() for setting the process group identifier of the foreground process group of the controlling terminal.
Returns undef on failure.
Returns undef on failure.
This is identical to Perl's builtin time() function
for returning the number of seconds since the epoch
(whatever it is for the system), see time.
The times() function returns elapsed realtime since some point in the past (such as system startup), user and system times for this process, and user and system times used by child processes. All times are returned in clock ticks.
- ($realtime, $user, $system, $cuser, $csystem) = POSIX::times();
Note: Perl's builtin times() function returns four values, measured in
seconds.
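A sketch of converting those clock ticks to seconds (assuming a POSIX system where _SC_CLK_TCK is available):

```perl
use POSIX ();

# The tick rate comes from sysconf(); dividing by it turns
# POSIX::times() values into seconds.
my $ticks_per_sec = POSIX::sysconf( &POSIX::_SC_CLK_TCK );   # often 100
my ($real, $user, $system, $cuser, $csystem) = POSIX::times();
printf "user CPU: %.2f s\n", $user / $ticks_per_sec;
```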
Use method IO::File::new_tmpfile() instead, or see File::Temp.
Returns a name for a temporary file.
- $tmpfile = POSIX::tmpnam();
For security reasons, which are probably detailed in your system's documentation for the C library tmpnam() function, this interface should not be used; instead see File::Temp.
This is identical to the C function, except that it can apply to a single character or to a whole string. Consider using the lc() function, see lc, or the equivalent \L operator inside doublequotish strings.
This is identical to the C function, except that it can apply to a single character or to a whole string. Consider using the uc() function, see uc, or the equivalent \U operator inside doublequotish strings.
This is identical to the C function ttyname() for returning the name of the current terminal.
Retrieves the time conversion information from the tzname variable.
- POSIX::tzset();
- ($std, $dst) = POSIX::tzname();
This is identical to the C function tzset() for setting the current timezone based on the environment variable TZ, to be used by the ctime(), localtime(), mktime(), and strftime() functions.
This is identical to Perl's builtin umask() function
for setting (and querying) the file creation permission mask,
see umask.
Get name of current operating system.
- ($sysname, $nodename, $release, $version, $machine) = POSIX::uname();
Note that the actual meanings of the various fields are not that well standardized; do not expect any great portability.
The $sysname
might be the name of the operating system,
the $nodename
might be the name of the host, the $release
might be the (major) release number of the operating system,
the $version
might be the (minor) release number of the
operating system, and the $machine
might be a hardware identifier.
Maybe.
Use the IO::Handle::ungetc() method instead.
This is identical to Perl's builtin unlink() function
for removing files, see unlink.
This is identical to Perl's builtin utime() function
for changing the time stamps of files and directories,
see utime.
vfprintf() is C-specific, see printf instead.
vprintf() is C-specific, see printf instead.
vsprintf() is C-specific, see sprintf instead.
This is identical to Perl's builtin wait() function,
see wait.
Wait for a child process to change state. This is identical to Perl's
builtin waitpid() function, see waitpid.
- $pid = POSIX::waitpid( -1, POSIX::WNOHANG );
- print "status = ", ($? / 256), "\n";
This is identical to the C function wcstombs().
Perl does not have any support for the wide and multibyte
characters of the C standards, so this might be a rather
useless function.
This is identical to the C function wctomb().
Perl does not have any support for the wide and multibyte
characters of the C standards, so this might be a rather
useless function.
Write to a file. This uses file descriptors such as those obtained by calling POSIX::open().
- $fd = POSIX::open( "foo", &POSIX::O_WRONLY );
- $buf = "hello";
- $bytes = POSIX::write( $fd, $buf, 5 );
Returns undef on failure.
See also syswrite.
Creates a new POSIX::SigAction object which corresponds to the C struct sigaction. This object will be destroyed automatically when it is no longer needed. The first parameter is the handler, a sub reference. The second parameter is a POSIX::SigSet object; it defaults to the empty set. The third parameter contains the sa_flags; it defaults to 0.
- $sigset = POSIX::SigSet->new(SIGINT, SIGQUIT);
- $sigaction = POSIX::SigAction->new( \&handler, $sigset, &POSIX::SA_NOCLDSTOP );
This POSIX::SigAction object is intended for use with the POSIX::sigaction() function.
Accessor functions to get/set the values of a SigAction object.
- $sigset = $sigaction->mask;
- $sigaction->flags(&POSIX::SA_RESTART);
Accessor function for the "safe signals" flag of a SigAction object; see perlipc for general information on safe (a.k.a. "deferred") signals. If you wish to handle a signal safely, use this accessor to set the "safe" flag in the POSIX::SigAction object:
- $sigaction->safe(1);
You may also examine the "safe" flag on the output action object which is filled in when given as the third parameter to POSIX::sigaction():
- sigaction(SIGINT, $new_action, $old_action);
- if ($old_action->safe) {
- # previous SIGINT handler used safe signals
- }
A hash of the POSIX realtime signal handlers. It is an extension of the standard %SIG: $POSIX::SIGRT{SIGRTMIN} is roughly equivalent to $SIG{SIGRTMIN}, but the right POSIX moves (see below) are made with POSIX::SigSet and POSIX::sigaction instead of accessing %SIG directly.
You can set the %POSIX::SIGRT elements to set the POSIX realtime signal handlers, use delete and exists on the elements, and use scalar on %POSIX::SIGRT to find out how many POSIX realtime signals are available (SIGRTMAX - SIGRTMIN + 1; SIGRTMAX is a valid POSIX realtime signal).
Setting the %SIGRT elements is equivalent to calling this:
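A sketch of roughly what that call looks like (the exact internals of POSIX::SigRt are an implementation detail; the handler and signal choice here are illustrative):

```perl
use POSIX qw(SIGRTMIN);

my $rtsig   = SIGRTMIN;                    # a realtime signal number
my $handler = sub { warn "got realtime signal\n" };

# Install the handler via sigaction() with the current SigRt flags,
# blocking the signal itself while the handler runs.
my $sigset = POSIX::SigSet->new($rtsig);
my $action = POSIX::SigAction->new($handler, $sigset,
                                   $POSIX::SigRt::SIGACTION_FLAGS);
POSIX::sigaction($rtsig, $action);
```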
The flags default to zero; if you want something different you can either use local on $POSIX::SigRt::SIGACTION_FLAGS, or you can derive from POSIX::SigRt and define your own new() (the tied hash STORE method of %SIGRT calls new($rtsig, $handler, $SIGACTION_FLAGS), where $rtsig ranges from zero to SIGRTMAX - SIGRTMIN + 1).
Just as with any signal, you can use sigaction($rtsig, undef, $oa) to retrieve the installed signal handler (or, rather, the signal action).
NOTE: whether POSIX realtime signals really work in your system, or whether Perl has been compiled so that it works with them, is outside the scope of this discussion.
Return the minimum POSIX realtime signal number available, or undef
if no POSIX realtime signals are available.
Return the maximum POSIX realtime signal number available, or undef
if no POSIX realtime signals are available.
Create a new SigSet object. This object will be destroyed automatically when it is no longer needed. Arguments may be supplied to initialize the set.
Create an empty set.
- $sigset = POSIX::SigSet->new;
Create a set with SIGUSR1.
- $sigset = POSIX::SigSet->new( &POSIX::SIGUSR1 );
Add a signal to a SigSet object.
- $sigset->addset( &POSIX::SIGUSR2 );
Returns undef on failure.
Remove a signal from the SigSet object.
- $sigset->delset( &POSIX::SIGUSR2 );
Returns undef on failure.
Initialize the SigSet object to be empty.
- $sigset->emptyset();
Returns undef on failure.
Initialize the SigSet object to include all signals.
- $sigset->fillset();
Returns undef on failure.
Tests the SigSet object to see if it contains a specific signal.
- if( $sigset->ismember( &POSIX::SIGUSR1 ) ){
- print "contains SIGUSR1\n";
- }
Create a new Termios object. This object will be destroyed automatically when it is no longer needed. A Termios object corresponds to the termios C struct. new() mallocs a new one, getattr() fills it from a file descriptor, and setattr() sets a file descriptor's parameters to match Termios' contents.
- $termios = POSIX::Termios->new;
Get terminal control attributes.
Obtain the attributes for stdin.
- $termios->getattr( 0 ) # Recommended for clarity.
- $termios->getattr()
Obtain the attributes for stdout.
- $termios->getattr( 1 )
Returns undef on failure.
Retrieve a value from the c_cc field of a termios object. The c_cc field is an array so an index must be specified.
- $c_cc[1] = $termios->getcc(1);
Retrieve the c_cflag field of a termios object.
- $c_cflag = $termios->getcflag;
Retrieve the c_iflag field of a termios object.
- $c_iflag = $termios->getiflag;
Retrieve the input baud rate.
- $ispeed = $termios->getispeed;
Retrieve the c_lflag field of a termios object.
- $c_lflag = $termios->getlflag;
Retrieve the c_oflag field of a termios object.
- $c_oflag = $termios->getoflag;
Retrieve the output baud rate.
- $ospeed = $termios->getospeed;
Set terminal control attributes.
Set attributes immediately for stdout.
- $termios->setattr( 1, &POSIX::TCSANOW );
Returns undef on failure.
Set a value in the c_cc field of a termios object. The c_cc field is an array so an index must be specified.
- $termios->setcc( &POSIX::VEOF, 1 );
Set the c_cflag field of a termios object.
- $termios->setcflag( $c_cflag | &POSIX::CLOCAL );
Set the c_iflag field of a termios object.
- $termios->setiflag( $c_iflag | &POSIX::BRKINT );
Set the input baud rate.
- $termios->setispeed( &POSIX::B9600 );
Returns undef on failure.
Set the c_lflag field of a termios object.
- $termios->setlflag( $c_lflag | &POSIX::ECHO );
Set the c_oflag field of a termios object.
- $termios->setoflag( $c_oflag | &POSIX::OPOST );
Set the output baud rate.
- $termios->setospeed( &POSIX::B9600 );
Returns undef on failure.
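Putting the accessors together, here is a sketch (not from the original text) of turning off terminal echo on stdin and restoring it afterwards, e.g. for reading a password:

```perl
use POSIX qw(TCSANOW ECHO);

my $termios = POSIX::Termios->new;
$termios->getattr(0);                  # fill from stdin (fd 0)
my $lflag = $termios->getlflag;

$termios->setlflag($lflag & ~ECHO);    # clear the ECHO bit
$termios->setattr(0, TCSANOW);         # apply immediately

chomp(my $password = <STDIN>);         # typed input is not echoed

$termios->setlflag($lflag);            # restore the original flags
$termios->setattr(0, TCSANOW);
```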
B38400 B75 B200 B134 B300 B1800 B150 B0 B19200 B1200 B9600 B600 B4800 B50 B2400 B110
TCSADRAIN TCSANOW TCOON TCIOFLUSH TCOFLUSH TCION TCIFLUSH TCSAFLUSH TCIOFF TCOOFF
VEOF VEOL VERASE VINTR VKILL VQUIT VSUSP VSTART VSTOP VMIN VTIME NCCS
CLOCAL CREAD CSIZE CS5 CS6 CS7 CS8 CSTOPB HUPCL PARENB PARODD
BRKINT ICRNL IGNBRK IGNCR IGNPAR INLCR INPCK ISTRIP IXOFF IXON PARMRK
ECHO ECHOE ECHOK ECHONL ICANON IEXTEN ISIG NOFLSH TOSTOP
OPOST
_PC_CHOWN_RESTRICTED _PC_LINK_MAX _PC_MAX_CANON _PC_MAX_INPUT _PC_NAME_MAX _PC_NO_TRUNC _PC_PATH_MAX _PC_PIPE_BUF _PC_VDISABLE
_POSIX_ARG_MAX _POSIX_CHILD_MAX _POSIX_CHOWN_RESTRICTED _POSIX_JOB_CONTROL _POSIX_LINK_MAX _POSIX_MAX_CANON _POSIX_MAX_INPUT _POSIX_NAME_MAX _POSIX_NGROUPS_MAX _POSIX_NO_TRUNC _POSIX_OPEN_MAX _POSIX_PATH_MAX _POSIX_PIPE_BUF _POSIX_SAVED_IDS _POSIX_SSIZE_MAX _POSIX_STREAM_MAX _POSIX_TZNAME_MAX _POSIX_VDISABLE _POSIX_VERSION
_SC_ARG_MAX _SC_CHILD_MAX _SC_CLK_TCK _SC_JOB_CONTROL _SC_NGROUPS_MAX _SC_OPEN_MAX _SC_PAGESIZE _SC_SAVED_IDS _SC_STREAM_MAX _SC_TZNAME_MAX _SC_VERSION
E2BIG EACCES EADDRINUSE EADDRNOTAVAIL EAFNOSUPPORT EAGAIN EALREADY EBADF EBUSY ECHILD ECONNABORTED ECONNREFUSED ECONNRESET EDEADLK EDESTADDRREQ EDOM EDQUOT EEXIST EFAULT EFBIG EHOSTDOWN EHOSTUNREACH EINPROGRESS EINTR EINVAL EIO EISCONN EISDIR ELOOP EMFILE EMLINK EMSGSIZE ENAMETOOLONG ENETDOWN ENETRESET ENETUNREACH ENFILE ENOBUFS ENODEV ENOENT ENOEXEC ENOLCK ENOMEM ENOPROTOOPT ENOSPC ENOSYS ENOTBLK ENOTCONN ENOTDIR ENOTEMPTY ENOTSOCK ENOTTY ENXIO EOPNOTSUPP EPERM EPFNOSUPPORT EPIPE EPROCLIM EPROTONOSUPPORT EPROTOTYPE ERANGE EREMOTE ERESTART EROFS ESHUTDOWN ESOCKTNOSUPPORT ESPIPE ESRCH ESTALE ETIMEDOUT ETOOMANYREFS ETXTBSY EUSERS EWOULDBLOCK EXDEV
FD_CLOEXEC F_DUPFD F_GETFD F_GETFL F_GETLK F_OK F_RDLCK F_SETFD F_SETFL F_SETLK F_SETLKW F_UNLCK F_WRLCK O_ACCMODE O_APPEND O_CREAT O_EXCL O_NOCTTY O_NONBLOCK O_RDONLY O_RDWR O_TRUNC O_WRONLY
DBL_DIG DBL_EPSILON DBL_MANT_DIG DBL_MAX DBL_MAX_10_EXP DBL_MAX_EXP DBL_MIN DBL_MIN_10_EXP DBL_MIN_EXP FLT_DIG FLT_EPSILON FLT_MANT_DIG FLT_MAX FLT_MAX_10_EXP FLT_MAX_EXP FLT_MIN FLT_MIN_10_EXP FLT_MIN_EXP FLT_RADIX FLT_ROUNDS LDBL_DIG LDBL_EPSILON LDBL_MANT_DIG LDBL_MAX LDBL_MAX_10_EXP LDBL_MAX_EXP LDBL_MIN LDBL_MIN_10_EXP LDBL_MIN_EXP
ARG_MAX CHAR_BIT CHAR_MAX CHAR_MIN CHILD_MAX INT_MAX INT_MIN LINK_MAX LONG_MAX LONG_MIN MAX_CANON MAX_INPUT MB_LEN_MAX NAME_MAX NGROUPS_MAX OPEN_MAX PATH_MAX PIPE_BUF SCHAR_MAX SCHAR_MIN SHRT_MAX SHRT_MIN SSIZE_MAX STREAM_MAX TZNAME_MAX UCHAR_MAX UINT_MAX ULONG_MAX USHRT_MAX
SA_NOCLDSTOP SA_NOCLDWAIT SA_NODEFER SA_ONSTACK SA_RESETHAND SA_RESTART SA_SIGINFO SIGABRT SIGALRM SIGCHLD SIGCONT SIGFPE SIGHUP SIGILL SIGINT SIGKILL SIGPIPE SIGQUIT SIGSEGV SIGSTOP SIGTERM SIGTSTP SIGTTIN SIGTTOU SIGUSR1 SIGUSR2 SIG_BLOCK SIG_DFL SIG_ERR SIG_IGN SIG_SETMASK SIG_UNBLOCK
S_IRGRP S_IROTH S_IRUSR S_IRWXG S_IRWXO S_IRWXU S_ISGID S_ISUID S_IWGRP S_IWOTH S_IWUSR S_IXGRP S_IXOTH S_IXUSR
S_ISBLK S_ISCHR S_ISDIR S_ISFIFO S_ISREG
WNOHANG WUNTRACED
WIFEXITED WEXITSTATUS WIFSIGNALED WTERMSIG WIFSTOPPED WSTOPSIG
WIFEXITED(${^CHILD_ERROR_NATIVE}) returns true if the child process exited normally (by calling exit() or by falling off the end of main()).
WEXITSTATUS(${^CHILD_ERROR_NATIVE}) returns the normal exit status of the child process (only meaningful if WIFEXITED(${^CHILD_ERROR_NATIVE}) is true)
WIFSIGNALED(${^CHILD_ERROR_NATIVE}) returns true if the child process terminated because of a signal
WTERMSIG(${^CHILD_ERROR_NATIVE}) returns the signal the child process terminated for (only meaningful if WIFSIGNALED(${^CHILD_ERROR_NATIVE}) is true)
WIFSTOPPED(${^CHILD_ERROR_NATIVE}) returns true if the child process is currently stopped (can happen only if you specified the WUNTRACED flag to waitpid())
WSTOPSIG(${^CHILD_ERROR_NATIVE}) returns the signal the child process was stopped for (only meaningful if WIFSTOPPED(${^CHILD_ERROR_NATIVE}) is true)
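For example, the macros can be combined after waitpid() (a sketch, assuming a platform with fork() and a recent enough perl to provide ${^CHILD_ERROR_NATIVE}):

```perl
use POSIX qw(WIFEXITED WEXITSTATUS WIFSIGNALED WTERMSIG);

my $pid = fork();
if ($pid == 0) { exit 3 }              # child exits with status 3

waitpid($pid, 0);
my $status = ${^CHILD_ERROR_NATIVE};   # the raw wait() status

if (WIFEXITED($status)) {
    print "exited with ", WEXITSTATUS($status), "\n";
}
elsif (WIFSIGNALED($status)) {
    print "killed by signal ", WTERMSIG($status), "\n";
}
```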
PerlIO - On demand loader for PerlIO layers and root of PerlIO::* name space
- open($fh,"<:crlf", "my.txt"); # support platform-native and CRLF text files
- open($fh,"<","his.jpg"); # portably open a binary file for reading
- binmode($fh);
- Shell:
- PERLIO=perlio perl ....
When an undefined layer 'foo' is encountered in an open or
binmode layer specification then C code performs the equivalent of:
- use PerlIO 'foo';
The perl code in PerlIO.pm then attempts to locate a layer by doing
- require PerlIO::foo;
Otherwise the PerlIO package is a placeholder for additional PerlIO-related functions.
The following layers are currently defined:
Lowest level layer which provides basic PerlIO operations in terms of UNIX/POSIX numeric file descriptor calls (open(), read(), write(), lseek(), close()).
Layer which calls fread, fwrite and fseek/ftell etc. Note that as this is "real" stdio it will ignore any layers beneath it and go straight to the operating system via the C library as usual.
A from-scratch implementation of buffering for PerlIO. Provides fast access to the buffer for sv_gets, which implements perl's readline/<>, and in general attempts to minimize data copying. :perlio will insert a :unix layer below itself to do low level IO.
A layer that implements DOS/Windows like CRLF line endings. On read converts pairs of CR,LF to a single "\n" newline character. On write converts each "\n" to a CR,LF pair. Note that this layer will silently refuse to be pushed on top of itself.
It currently does not mimic MS-DOS in treating Control-Z as an end-of-file marker.
Based on the :perlio layer.
Declares that the stream accepts perl's internal encoding of characters. (Which really is UTF-8 on ASCII machines, but is UTF-EBCDIC on EBCDIC machines.) This allows any character perl can represent to be read from or written to the stream. The UTF-X encoding is chosen to render simple text parts (i.e. non-accented letters, digits and common punctuation) human readable in the encoded file.
Here is how to write your native data out using UTF-8 (or UTF-EBCDIC) and then read it back in.
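A minimal round trip (the file name and data are illustrative):

```perl
my $data = "caf\x{e9}\n";             # a string with a non-ASCII character

open(my $out, ">:utf8", "data.utf") or die $!;
print $out $data;                     # characters written as UTF-8
close $out;

open(my $in, "<:utf8", "data.utf") or die $!;
my $read_back = <$in>;                # bytes decoded back to characters
close $in;
```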
Note that this layer does not validate byte sequences. For reading input, using :encoding(utf8) instead of bare :utf8 is strongly recommended.
This is the inverse of the :utf8 layer. It turns off the flag on the layer below so that data read from it is considered to be "octets", i.e. characters in the range 0..255 only. Likewise on output perl will warn if a "wide" character is written to such a stream.
The :raw layer is defined as being identical to calling binmode($fh): the stream is made suitable for passing binary data, i.e. each byte is passed as-is. The stream will still be buffered.
In Perl 5.6 and some books the :raw layer (previously sometimes also referred to as a "discipline") is documented as the inverse of the :crlf layer. That is no longer the case: other layers which would alter the binary nature of the stream are also disabled. If you want UNIX line endings on a platform that normally does CRLF translation, but still want UTF-8 or encoding defaults, the appropriate thing to do is to add :perlio to the PERLIO environment variable.
The implementation of :raw is as a pseudo-layer which when "pushed" pops itself and then any layers which do not declare themselves as suitable for binary data. (Undoing :utf8 and :crlf is implemented by clearing flags rather than popping layers, but that is an implementation detail.)
As a consequence of the fact that :raw normally pops layers, it usually only makes sense to have it as the only or first element in a layer specification. When used as the first element it provides a known base on which to build, e.g.
- open($fh,":raw:utf8",...)
will construct a "binary" stream, but then enable UTF-8 translation.
A pseudo layer that removes the top-most layer. Gives perl code a way to manipulate the layer stack. Should be considered experimental. Note that :pop only works on real layers and will not undo the effects of pseudo layers like :utf8.
An example of a possible use might be:
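One possible sketch (the file name and encoding are illustrative): push an encoding layer for part of a file, then pop it to return to un-decoded reading.

```perl
open(my $fh, "<", "mixed.dat") or die $!;
binmode($fh, ":encoding(UTF-16LE)");   # the next chunk is encoded
my $chunk = <$fh>;
binmode($fh, ":pop");                  # back to reading raw octets
```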
A more elegant (and safer) interface is needed.
On Win32 platforms this experimental layer uses the native "handle" IO rather than the unix-like numeric file descriptor layer. Known to be buggy as of perl 5.8.2.
It is possible to write custom layers in addition to the above builtin ones, both in C/XS and Perl. Two such layers (and one example written in Perl using the latter) come with the Perl distribution.
Use :encoding(ENCODING) either in open() or binmode() to install a layer that transparently does character set and encoding transformations, for example from Shift-JIS to Unicode. Note that under stdio an :encoding also enables :utf8. See PerlIO::encoding for more information.
A layer which implements "reading" of files by using mmap() to make a (whole) file appear in the process's address space, and then using that as PerlIO's "buffer". This may be faster in certain circumstances for large files, and may result in less physical memory use when multiple processes are reading the same file.
Files which are not mmap()-able revert to behaving like the :perlio layer. Writes also behave like the :perlio layer, as mmap() for write needs extra house-keeping (to extend the file) which negates any advantage.
The :mmap layer will not exist if the platform does not support mmap().
Use :via(MODULE) either in open() or binmode() to install a layer that does whatever transformation (for example compression/decompression, encryption/decryption) to the filehandle.
See PerlIO::via for more information.
To get a binary stream an alternate method is to use:
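That is, open without layers and then call binmode, as in the synopsis earlier ($fh and $path are placeholders):

```perl
open(my $fh, "<", $path) or die $!;
binmode($fh);        # make the stream binary
```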
this has the advantage of being backward compatible with how such things have had to be coded on some platforms for years.
To get an unbuffered stream specify an unbuffered layer (e.g. :unix) in the open call:
- open($fh,"<:unix",$path)
If the platform is MS-DOS like and normally does CRLF to "\n" translation for text files then the default layers are:
- unix crlf
(The low level "unix" layer may be replaced by a platform specific low level layer.)
Otherwise, if Configure found out how to do "fast" IO using the system's stdio, then the default layers are:
- unix stdio
Otherwise the default layers are:
- unix perlio
These defaults may change once perlio has been better tested and tuned.
The default can be overridden by setting the environment variable PERLIO to a space separated list of layers (the unix or platform-specific low level layer is always pushed first).
This can be used to see the effect of (or bugs in) the various layers, e.g.
- cd .../perl/t
- PERLIO=stdio ./perl harness
- PERLIO=perlio ./perl harness
For the various values of PERLIO see PERLIO in perlrun.
The following returns the names of the PerlIO layers on a filehandle.
- my @layers = PerlIO::get_layers($fh); # Or FH, *FH, "FH".
The layers are returned in the order an open() or binmode() call would use them. Note that the "default stack" depends on the operating system and on the Perl version, and both the compile-time and runtime configurations of Perl.
The following table summarizes the default layers on UNIX-like and DOS-like platforms, depending on the setting of $ENV{PERLIO}:
- PERLIO UNIX-like DOS-like
- ------ --------- --------
- unset / "" unix perlio / stdio [1] unix crlf
- stdio unix perlio / stdio [1] stdio
- perlio unix perlio unix perlio
- # [1] "stdio" if Configure found out how to do "fast stdio" (depends
- # on the stdio implementation) and in Perl 5.8, otherwise "unix perlio"
By default the layers from the input side of the filehandle are
returned; to get the output side, use the optional output
argument:
- my @layers = PerlIO::get_layers($fh, output => 1);
(Usually the layers are identical on either side of a filehandle but
for example with sockets there may be differences, or if you have
been using the open pragma.)
There is no set_layers(), nor does get_layers() return a tied array
mirroring the stack, or anything fancy like that. This is not
accidental or unintentional. The PerlIO layer stack is a bit more
complicated than just a stack (see for example the behaviour of :raw). You are supposed to use open() and binmode() to manipulate the stack.
Implementation details follow, please close your eyes.
The arguments to layers are by default returned in parentheses after the name of the layer, and certain layers (like utf8) are not real layers but instead flags on real layers; to get all of these returned separately, use the optional details argument:
- my @layer_and_args_and_flags = PerlIO::get_layers($fh, details => 1);
The result will be up to three times the number of layers: the first element will be a name, the second element the arguments (unspecified arguments will be undef), the third element the flags, the fourth element a name again, and so forth.
You may open your eyes now.
Nick Ing-Simmons <nick@ing-simmons.net>
SDBM_File - Tied access to sdbm files
SDBM_File establishes a connection between a Perl hash variable and a file in SDBM_File format. You can manipulate the data in the file just as if it were in a Perl hash, but when your program exits, the data will remain in the file, to be used the next time your program runs.
Use SDBM_File with the Perl built-in tie function to establish the connection between the variable and the file. The arguments to tie should be:
The hash variable you want to tie.
The string "SDBM_File". (This tells Perl to use the SDBM_File package to perform the functions of the hash.)
The name of the file you want to tie to the hash.
Flags. Use one of:
O_RDONLY: Read-only access to the data in the file.
O_WRONLY: Write-only access to the data in the file.
O_RDWR: Both read and write access.
If you want to create the file if it does not exist, add O_CREAT to any of these, as in the example. If you omit O_CREAT and the file does not already exist, the tie call will fail.
The default permissions to use if a new file is created. The actual permissions will be modified by the user's umask, so you should probably use 0666 here. (See umask.)
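A sketch of such a tie call combining these arguments (the file name is illustrative):

```perl
use Fcntl qw(O_RDWR O_CREAT);
use SDBM_File;

# Tie %h to the SDBM files "mydbm.pag"/"mydbm.dir", creating them if needed.
tie(my %h, 'SDBM_File', 'mydbm', O_RDWR | O_CREAT, 0666)
    or die "Couldn't tie SDBM file 'mydbm': $!";

$h{key} = "value";   # written through to the file
untie %h;            # flush and close the database
```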
On failure, the tie call returns an undefined value and probably sets $! to contain the reason the file could not be tied.
sdbm store returned -1, errno 22, key "..." at ...
This warning is emitted when you try to store a key or a value that is too long. It means that the change was not recorded in the database. See BUGS AND WARNINGS below.
There are a number of limits on the size of the data that you can store in the SDBM file. The most important is that the length of a key, plus the length of its associated value, may not exceed 1008 bytes.
See tie, perldbmfilter, Fcntl
Safe - Compile and execute code in restricted compartments
- use Safe;
- $compartment = new Safe;
- $compartment->permit(qw(time sort :browse));
- $result = $compartment->reval($unsafe_code);
The Safe extension module allows the creation of compartments in which perl code can be evaluated. Each compartment has
The "root" of the namespace (i.e. "main::") is changed to a different package and code evaluated in the compartment cannot refer to variables outside this namespace, even with run-time glob lookups and other tricks.
Code which is compiled outside the compartment can choose to place variables into (or share variables with) the compartment's namespace and only that data will be visible to code evaluated in the compartment.
By default, the only variables shared with compartments are the "underscore" variables $_ and @_ (and, technically, the less frequently used %_, the _ filehandle and so on). This is because otherwise perl operators which default to $_ will not work and neither will the assignment of arguments to @_ on subroutine entry.
Each compartment has an associated "operator mask". Recall that perl code is compiled into an internal format before execution. Evaluating perl code (e.g. via "eval" or "do 'file'") causes the code to be compiled into an internal format and then, provided there was no error in the compilation, executed. Code evaluated in a compartment compiles subject to the compartment's operator mask. Attempting to evaluate code in a compartment which contains a masked operator will cause the compilation to fail with an error. The code will not be executed.
The default operator mask for a newly created compartment is the ':default' optag.
It is important that you read the Opcode module documentation for more information, especially for detailed definitions of opnames, optags and opsets.
Since it is only at the compilation stage that the operator mask applies, controlled access to potentially unsafe operations can be achieved by having a handle to a wrapper subroutine (written outside the compartment) placed into the compartment. For example,
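For instance (a sketch; the wrapper name, its validation rule and the file name are all illustrative):

```perl
use Safe;

my $compartment = Safe->new;

# A wrapper compiled outside the compartment: it vets its argument
# and then performs an operation the compartment's opmask would deny.
sub read_file_wrapper {
    my ($name) = @_;
    die "bad name\n" unless $name =~ /\A\w+\.txt\z/;
    open my $fh, '<', $name or die "open: $!";
    local $/;
    return <$fh>;
}

$compartment->share('&read_file_wrapper');

# Code inside the compartment may now call the wrapper, but cannot
# open arbitrary files itself.
my $text = $compartment->reval('read_file_wrapper("notes.txt")');
```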
The authors make no warranty, implied or otherwise, about the suitability of this software for safety or security purposes.
The authors shall not in any case be liable for special, incidental, consequential, indirect or other similar damages arising from the use of this software.
Your mileage will vary. If in any doubt do not use it.
To create a new compartment, use
- $cpt = new Safe;
Optional argument is (NAMESPACE), where NAMESPACE is the root namespace to use for the compartment (defaults to "Safe::Root0", incremented for each new compartment).
Note that version 1.00 of the Safe module supported a second optional parameter, MASK. That functionality has been withdrawn pending deeper consideration. Use the permit and deny methods described below.
The following methods can then be used on the compartment object returned by the above constructor. The object argument is implicit in each case.
Permit the listed operators to be used when compiling code in the compartment (in addition to any operators already permitted).
You can list opcodes by names, or use a tag name; see Predefined Opcode Tags in Opcode.
Permit only the listed operators to be used when compiling code in the compartment (no other operators are permitted).
Deny the listed operators from being used when compiling code in the compartment (other operators may still be permitted).
Deny only the listed operators from being used when compiling code in the compartment (all other operators will be permitted, so you probably don't want to use this method).
The trap and untrap methods are synonyms for deny and permit respectively.
This shares the variable(s) in the argument list with the compartment. This is almost identical to exporting variables using the Exporter module.
Each NAME must be the name of a non-lexical variable, typically with the leading type identifier included. A bareword is treated as a function name.
Examples of legal names are '$foo' for a scalar, '@foo' for an array, '%foo' for a hash, '&foo' or 'foo' for a subroutine and '*foo' for a glob (i.e. all symbol table entries associated with "foo", including scalar, array, hash, sub and filehandle).
Each NAME is assumed to be in the calling package. See share_from for an alternative method (which share uses).
This method is similar to share() but allows you to explicitly name the package that symbols should be shared from. The symbol names (including type characters) are supplied as an array reference.
- $safe->share_from('main', [ '$foo', '%bar', 'func' ]);
Names can include package names, which are relative to the specified PACKAGE. So these two calls have the same effect:
- $safe->share_from('Scalar::Util', [ 'reftype' ]);
- $safe->share_from('main', [ 'Scalar::Util::reftype' ]);
This returns a glob reference for the symbol table entry of VARNAME in the package of the compartment. VARNAME must be the name of a variable without any leading type marker. For example:
- ${$cpt->varglob('foo')} = "Hello world";
has the same effect as:
- $cpt = new Safe 'Root';
- $Root::foo = "Hello world";
but avoids the need to know $cpt's package name.
This evaluates STRING as perl code inside the compartment.
The code can only see the compartment's namespace (as returned by the root method). The compartment's root package appears to be the main:: package to the code inside the compartment.
Any attempt by the code in STRING to use an operator which is not permitted by the compartment will cause an error (at run-time of the main program but at compile-time for the code in STRING). The error is of the form "'%s' trapped by operation mask...".
If an operation is trapped in this way, then the code in STRING will not be executed. If such a trapped operation occurs or any other compile-time or return error, then $@ is set to the error message, just as with an eval().
If there is no error, then the method returns the value of the last expression evaluated, or a return statement may be used, just as with subroutines and eval(). The context (list or scalar) is determined by the caller as usual.
If the return value of reval() is (or contains) any code reference, those code references are wrapped to be themselves executed always in the compartment. See wrap_code_refs_within.
The formerly undocumented STRICT argument sets strictness: if true 'use strict;' is used, otherwise it uses 'no strict;'. Note: if STRICT is omitted 'no strict;' is the default.
Some points to note:
If the entereval op is permitted then the code can use eval "..." to 'hide' code which might use denied ops. This is not a major problem since when the code tries to execute the eval it will fail because the opmask is still in effect. However this technique would allow clever, and possibly harmful, code to 'probe' the boundaries of what is possible.
Any string eval which is executed by code executing in a compartment, or by code called from code executing in a compartment, will be eval'd in the namespace of the compartment. This is potentially a serious problem.
Consider a function foo() in package pkg compiled outside a compartment but shared with it. Assume the compartment has a root package called 'Root'. If foo() contains an eval statement like eval '$foo = 1' then, normally, $pkg::foo will be set to 1. If foo() is called from the compartment (by whatever means) then instead of setting $pkg::foo, the eval will actually set $Root::pkg::foo.
This can easily be demonstrated by using a module, such as the Socket module, which uses eval "..." as part of an AUTOLOAD function. You can 'use' the module outside the compartment and share an (autoloaded) function with the compartment. If an autoload is triggered by code in the compartment, or by any code anywhere that is called by any means from the compartment, then the eval in the Socket module's AUTOLOAD function happens in the namespace of the compartment. Any variables created or used by the eval'd code are now under the control of the code in the compartment.
A similar effect applies to all runtime symbol lookups in code called from a compartment but not compiled within it.
This evaluates the contents of file FILENAME inside the compartment. See above documentation on the reval method for further details.
This method returns the name of the package that is the root of the compartment's namespace.
Note that this behaviour differs from version 1.00 of the Safe module where the root module could be used to change the namespace. That functionality has been withdrawn pending deeper consideration.
This is a get-or-set method for the compartment's operator mask.
With no MASK argument present, it returns the current operator mask of the compartment.
With the MASK argument present, it sets the operator mask for the compartment (equivalent to calling the deny_only method).
Returns a reference to an anonymous subroutine that, when executed, will call CODEREF with the Safe compartment 'in effect'. In other words, with the package namespace adjusted and the opmask enabled.
Note that the opmask doesn't affect the already compiled code, it only affects any further compilation that the already compiled code may try to perform.
This is particularly useful when applied to code references returned from reval().
(It also provides a kind of workaround for RT#60374: "Safe.pm sort {} bug with -Dusethreads". See http://rt.perl.org/rt3//Public/Bug/Display.html?id=60374 for much more detail.)
Wraps any CODE references found within the arguments by replacing each with the result of calling wrap_code_ref on the CODE reference. Any ARRAY or HASH references in the arguments are inspected recursively.
Returns nothing.
This section is just an outline of some of the things code in a compartment might do (intentionally or unintentionally) which can have an effect outside the compartment.
Consuming all (or nearly all) available memory.
Causing infinite loops etc.
Copying private information out of your system. Even something as simple as your user name is of value to others. Much useful information could be gleaned from your environment variables for example.
Causing signals (especially SIGFPE and SIGALRM) to affect your process.
Setting up a signal handler will need to be carefully considered and controlled. What mask is in effect when a signal handler gets called? If a user can cause an imported function to raise an exception that triggers the user's signal handler, does that user's restricted mask get re-instated before the handler is called? Does an imported handler get called with its original mask or the user's one?
Ops such as chdir obviously affect the process as a whole and not just the code in the compartment. Ops such as rand and srand have a similar but more subtle effect.
Originally designed and implemented by Malcolm Beattie.
Reworked to use the Opcode module and other changes added by Tim Bunce.
Currently maintained by the Perl 5 Porters, <perl5-porters@perl.org>.
SelectSaver - save and restore selected file handle
A SelectSaver
object contains a reference to the file handle that
was selected when it was created. If its new
method gets an extra
parameter, then that parameter is selected; otherwise, the selected
file handle remains unchanged.
When a SelectSaver
is destroyed, it re-selects the file handle
that was selected when it was created.
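A minimal sketch of typical use (the filename is illustrative): create the saver inside a block, so that its destruction at block exit restores the previously selected handle.

```perl
use SelectSaver;

# Illustrative filename; any writable handle will do.
open my $log, '>', '/tmp/example.log' or die "open: $!";

print "to the currently selected handle (normally STDOUT)\n";
{
    my $saver = SelectSaver->new($log);    # $log becomes selected
    print "to /tmp/example.log\n";
}    # $saver goes out of scope; previous handle is re-selected
print "back on the original handle\n";
```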
SelfLoader - load functions only on demand
- package FOOBAR;
- use SelfLoader;
- ... (initializing code)
- __DATA__
- sub {....
This module tells its users that functions in the FOOBAR package are to be
autoloaded from after the __DATA__
token. See also
Autoloading in perlsub.
The __DATA__
token tells the perl compiler that the perl code
for compilation is finished. Everything after the __DATA__
token
is available for reading via the filehandle FOOBAR::DATA,
where FOOBAR is the name of the current package when the __DATA__
token is reached. This works just the same as __END__
does in
package 'main', but for other modules data after __END__
is not
automatically retrievable, whereas data after __DATA__
is.
The __DATA__
token is not recognized in versions of perl prior to
5.001m.
Note that it is possible to have __DATA__
tokens in the same package
in multiple files, and that the last __DATA__
token in a given
package that is encountered by the compiler is the one accessible
by the filehandle. This also applies to __END__
and main, i.e. if
the 'main' program has an __END__
, but a module 'require'd (_not_ 'use'd)
by that program has a 'package main;' declaration followed by a '__DATA__
',
then the DATA
filehandle is set to access the data after the __DATA__
in the module, _not_ the data after the __END__
token in the 'main'
program, since the compiler encounters the 'require'd file later.
The SelfLoader works by the user placing the __DATA__
token after perl code which needs to be compiled and
run at 'require' time, but before subroutine declarations
that can be loaded in later - usually because they may never
be called.
The SelfLoader will read from the FOOBAR::DATA filehandle to
load in the data after __DATA__
, and load in any subroutine
when it is called. The costs are the one-time parsing of the
data after __DATA__
, and a load delay for the _first_
call of any autoloaded function. The benefits (hopefully)
are a faster compilation phase, with no need to load
functions which are never used.
The SelfLoader will stop reading from __DATA__
if
it encounters the __END__
token - just as you would expect.
If the __END__
token is present, and is followed by the
token DATA, then the SelfLoader leaves the FOOBAR::DATA
filehandle open on the line after that token.
The SelfLoader exports the AUTOLOAD
subroutine to the
package using the SelfLoader, and this loads the called
subroutine when it is first called.
There is no advantage to putting subroutines which will _always_
be called after the __DATA__
token.
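The layout described above can be sketched as a module file like this (the package and subroutine names are illustrative): code before __DATA__ is compiled at 'require' time, while the subroutine after __DATA__ is not compiled until it is first called.

```perl
# FOOBAR.pm -- illustrative SelfLoader module layout
package FOOBAR;
use SelfLoader;

sub always_used {          # compiled at 'require' time
    return "compiled up front";
}

1;

__DATA__

sub rarely_used {          # compiled only on its first call, via AUTOLOAD
    return "compiled on demand";
}
```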
A 'my $pack_lexical' statement makes the variable $pack_lexical
local _only_ to the file up to the __DATA__
token. Subroutines
declared elsewhere _cannot_ see these types of variables,
just as if you declared subroutines in the package but in another
file, they cannot see these variables.
So specifically, autoloaded functions cannot see package
lexicals (this applies to both the SelfLoader and the Autoloader).
The vars
pragma provides an alternative to defining package-level
globals that will be visible to autoloaded routines. See the documentation
on vars in the pragma section of perlmod.
The SelfLoader can replace the AutoLoader - just change 'use AutoLoader'
to 'use SelfLoader' (though note that the SelfLoader exports
the AUTOLOAD function - but if you have your own AUTOLOAD and
are using the AutoLoader too, you probably know what you're doing),
and the __END__
token to __DATA__
. You will need perl version 5.001m
or later to use this (version 5.001 with all patches up to patch m).
There is no need to inherit from the SelfLoader.
The SelfLoader works similarly to the AutoLoader, but picks up the
subs from after the __DATA__
instead of in the 'lib/auto' directory.
There is a maintenance gain in not needing to run AutoSplit on the module
at installation, and a runtime gain in not needing to keep opening and
closing files to load subs. There is a runtime loss in needing
to parse the code after the __DATA__
. Details of the AutoLoader and
another view of these distinctions can be found in that module's
documentation.
This section is only relevant if you want to use
the FOOBAR::DATA
together with the SelfLoader.
Data after the __DATA__
token in a module is read using the
FOOBAR::DATA filehandle. __END__
can still be used to denote the end
of the __DATA__
section if followed by the token DATA - this is supported
by the SelfLoader. The FOOBAR::DATA
filehandle is left open if an
__END__
followed by a DATA is found, with the filehandle positioned at
the start of the line after the __END__
token. If no __END__
token is
present, or an __END__
token with no DATA token on the same line, then
the filehandle is closed.
The SelfLoader reads from wherever the current
position of the FOOBAR::DATA
filehandle is, until the
EOF or __END__
. This means that if you want to use
that filehandle (and ONLY if you want to), you should either
1. Put all your subroutine declarations immediately after
the __DATA__
token and put your own data after those
declarations, using the __END__
token to mark the end
of subroutine declarations. You must also ensure that the SelfLoader
reads first by calling 'SelfLoader->load_stubs();', or by using a
function which is selfloaded;
or
2. You should read the FOOBAR::DATA
filehandle first, leaving
the handle open and positioned at the first line of subroutine
declarations.
You could conceivably do both.
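Option 1 above can be sketched like this (contents illustrative): the subroutine declarations sit directly after __DATA__, an '__END__ DATA' line marks their end, and your own data follows, readable via the FOOBAR::DATA filehandle once the SelfLoader has parsed the subroutines.

```perl
# FOOBAR.pm -- illustrative layout for option 1
package FOOBAR;
use SelfLoader;

SelfLoader->load_stubs();    # ensure SelfLoader reads the subs first

sub read_private_data {
    return <FOOBAR::DATA>;   # handle left open after "__END__ DATA"
}

1;

__DATA__

sub helper { "loaded on demand" }

__END__ DATA
first line of private data
second line of private data
```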
For modules which are not classes, this section is not relevant. This section is only relevant if you have methods which could be inherited.
A subroutine stub (or forward declaration) looks like
- sub stub;
i.e. it is a subroutine declaration without the body of the subroutine. For modules which are not classes, there is no real need for stubs as far as autoloading is concerned.
For modules which ARE classes, and need to handle inherited methods, stubs are needed to ensure that the method inheritance mechanism works properly. You can load the stubs into the module at 'require' time, by adding the statement 'SelfLoader->load_stubs();' to the module to do this.
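For a class, the stubs can be declared before the __DATA__ token so that method inheritance resolves correctly (names illustrative):

```perl
# My/Class.pm -- illustrative class using stubs for inheritable methods
package My::Class;
use SelfLoader;

sub new;         # forward declarations so the inheritance mechanism
sub describe;    # can find these methods before their bodies compile

1;

__DATA__

sub new      { my $class = shift; bless {}, $class }
sub describe { "an instance of " . ref $_[0] }
```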
The alternative is to put the stubs in before the __DATA__
token BEFORE
releasing the module, and for this purpose the Devel::SelfStubber
module is available. However this does require the extra step of ensuring
that the stubs are in the module. I strongly recommend doing this BEFORE
releasing the module - it should NOT be done at install time in general.
Subroutines in multiple packages within the same file are supported - but you
should note that this requires exporting the SelfLoader::AUTOLOAD
to
every package which requires it. This is done automatically by the
SelfLoader when it first loads the subs into the cache, but you should
really specify it in the initialization before the __DATA__
by putting
a 'use SelfLoader' statement in each package.
Fully qualified subroutine names are also supported. For example,
- __DATA__
- sub foo::bar {23}
- package baz;
- sub dob {32}
will all be loaded correctly by the SelfLoader, and the SelfLoader
will ensure that the packages 'foo' and 'baz' correctly have the
SelfLoader AUTOLOAD
method when the data after __DATA__
is first
parsed.
SelfLoader
is maintained by the perl5-porters. Please direct
any questions to the canonical mailing list. Anything that
is applicable to the CPAN release can be sent to its maintainer,
though.
Author and Maintainer: The Perl5-Porters <perl5-porters@perl.org>
Maintainer of the CPAN release: Steffen Mueller <smueller@cpan.org>
This package has been part of the perl core since the first release of perl5. It has been released separately to CPAN so older installations can benefit from bug fixes.
This package has the same copyright and license as the perl core:
- Copyright (C) 1993, 1994, 1995, 1996, 1997, 1998, 1999,
- 2000, 2001, 2002, 2003, 2004, 2005, 2006 by Larry Wall and others
- All rights reserved.
- This program is free software; you can redistribute it and/or modify
- it under the terms of either:
- a) the GNU General Public License as published by the Free
- Software Foundation; either version 1, or (at your option) any
- later version, or
- b) the "Artistic License" which comes with this Kit.
- This program is distributed in the hope that it will be useful,
- but WITHOUT ANY WARRANTY; without even the implied warranty of
- MERCHANTABILITY or FITNESS FOR A PARTICULAR PURPOSE. See either
- the GNU General Public License or the Artistic License for more details.
- You should have received a copy of the Artistic License with this
- Kit, in the file named "Artistic". If not, I'll be glad to provide one.
- You should also have received a copy of the GNU General Public License
- along with this program in the file named "Copying". If not, write to the
- Free Software Foundation, Inc., 51 Franklin St, Fifth Floor, Boston,
- MA 02110-1301, USA or visit their web page on the internet at
- http://www.gnu.org/copyleft/gpl.html.
- For those of you that choose to use the GNU General Public License,
- my interpretation of the GNU General Public License is that no Perl
- script falls under the terms of the GPL unless you explicitly put
- said script under the terms of the GPL yourself. Furthermore, any
- object code linked with perl does not automatically fall under the
- terms of the GPL, provided such object code only adds definitions
- of subroutines and variables, and does not otherwise impair the
- resulting interpreter from executing any standard Perl script. I
- consider linking in C subroutines in this manner to be the moral
- equivalent of defining subroutines in the Perl language itself. You
- may sell such an object file as proprietary provided that you provide
- or offer to provide the Perl source, as specified by the GNU General
- Public License. (This is merely an alternate way of specifying input
- to the program.) You may also sell a binary produced by the dumping of
- a running Perl script that belongs to you, provided that you provide or
- offer to provide the Perl source as specified by the GPL. (The
- fact that a Perl interpreter and your code are in the same binary file
- is, in this case, a form of mere aggregation.) This is my interpretation
- of the GPL. If you still have concerns or difficulties understanding
- my intent, feel free to contact me. Of course, the Artistic License
- spells all this out for your protection, so you may prefer to use that.
Socket
- networking constants and support functions
Socket
is a low-level module used by, among other things, the IO::Socket
family of modules. The following examples demonstrate some low-level uses but
a practical program would likely use the higher-level API provided by
IO::Socket
or similar instead.
- use Socket qw(PF_INET SOCK_STREAM pack_sockaddr_in inet_aton);
- socket(my $socket, PF_INET, SOCK_STREAM, 0)
- or die "socket: $!";
- my $port = getservbyname "echo", "tcp";
- connect($socket, pack_sockaddr_in($port, inet_aton("localhost")))
- or die "connect: $!";
- print $socket "Hello, world!\n";
- print <$socket>;
See also the EXAMPLES section.
This module provides a variety of constants, structure manipulators and other functions related to socket-based networking. The values and functions provided are useful when used in conjunction with Perl core functions such as socket(), setsockopt() and bind(). It also provides several other support functions, mostly for dealing with conversions of network addresses between human-readable and native binary forms, and for hostname resolver operations.
Some constants and functions are exported by default by this module; but for
backward-compatibility any recently-added symbols are not exported by default
and must be requested explicitly. When an import list is provided to the
use Socket
line, the default exports are not automatically imported. It is
therefore best practice to always explicitly list all the symbols required.
Also, some common socket "newline" constants are provided: the constants
CR
, LF
, and CRLF
, as well as $CR
, $LF
, and $CRLF
, which map
to \015
, \012
, and \015\012
. If you do not want to use the literal
characters in your programs, then use the constants provided here. They are
not exported by default, but can be imported individually or with the
:crlf
export tag:
- use Socket qw(:DEFAULT :crlf);
- $sock->print("GET / HTTP/1.0$CRLF");
The entire getaddrinfo() subsystem can be exported using the tag :addrinfo
;
this exports the getaddrinfo() and getnameinfo() functions, and all the
AI_*
, NI_*
, NIx_*
and EAI_*
constants.
In each of the following groups, there may be many more constants provided
than just the ones given as examples in the section heading. If the heading
ends ...
then this means there are likely more; the exact constants
provided will depend on the OS and headers found at compile-time.
Protocol family constants to use as the first argument to socket() or the
value of the SO_DOMAIN
or SO_FAMILY
socket option.
Address family constants used by the socket address structures, to pass to such functions as inet_pton() or getaddrinfo(), or returned by such functions as sockaddr_family().
Socket type constants to use as the second argument to socket(), or the value
of the SO_TYPE
socket option.
Linux-specific shortcuts to specify the O_NONBLOCK
and FD_CLOEXEC
flags
during a socket(2) call.
Socket option level constant for setsockopt() and getsockopt().
Socket option name constants for setsockopt() and getsockopt() at the
SOL_SOCKET
level.
Socket option name constants for IPv4 socket options at the IPPROTO_IP
level.
Message flag constants for send() and recv().
Direction constants for shutdown().
Constants giving the special AF_INET
addresses for wildcard, broadcast,
local loopback, and invalid addresses.
Normally equivalent to inet_aton('0.0.0.0'), inet_aton('255.255.255.255'), inet_aton('localhost') and inet_aton('255.255.255.255') respectively.
IP protocol constants to use as the third argument to socket(), the level
argument to getsockopt() or setsockopt(), or the value of the SO_PROTOCOL
socket option.
Socket option name constants for TCP socket options at the IPPROTO_TCP
level.
Constants giving the special AF_INET6
addresses for wildcard and local
loopback.
Normally equivalent to inet_pton(AF_INET6, "::") and inet_pton(AF_INET6, "::1") respectively.
Socket option name constants for IPv6 socket options at the IPPROTO_IPV6
level.
The following functions convert between lists of Perl values and packed binary strings representing structures.
Takes a packed socket address (as returned by pack_sockaddr_in(),
pack_sockaddr_un() or the perl builtin functions getsockname() and
getpeername()). Returns the address family tag. This will be one of the
AF_*
constants, such as AF_INET
for a sockaddr_in address or
AF_UNIX
for a sockaddr_un
. It can be used to figure out what unpack to
use for a sockaddr of unknown type.
Takes two arguments, a port number and an opaque string (as returned by
inet_aton(), or a v-string). Returns the sockaddr_in
structure with those
arguments packed in and AF_INET
filled in. For Internet domain sockets,
this structure is normally what you need for the arguments in bind(),
connect(), and send().
Takes a sockaddr_in
structure (as returned by pack_sockaddr_in(),
getpeername() or recv()). Returns a list of two elements: the port and an
opaque string representing the IP address (you can use inet_ntoa() to convert
the address to the four-dotted numeric format). Will croak if the structure
does not represent an AF_INET
address.
In scalar context will return just the IP address.
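A round-trip sketch of this pack/unpack pair (port and address illustrative):

```perl
use Socket qw(pack_sockaddr_in unpack_sockaddr_in inet_aton inet_ntoa);

# Pack port 80 and 127.0.0.1 into a sockaddr_in, then unpack it again.
my $sin = pack_sockaddr_in(80, inet_aton("127.0.0.1"));

my ($port, $addr) = unpack_sockaddr_in($sin);
print "$port\n";                 # 80
print inet_ntoa($addr), "\n";    # 127.0.0.1
```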
A wrapper of pack_sockaddr_in() or unpack_sockaddr_in(). In list context,
unpacks its argument and returns a list consisting of the port and IP address.
In scalar context, packs its port and IP address arguments as a sockaddr_in
and returns it.
Provided largely for legacy compatibility; it is better to use pack_sockaddr_in() or unpack_sockaddr_in() explicitly.
Takes two to four arguments, a port number, an opaque string (as returned by
inet_pton()), optionally a scope ID number, and optionally a flow label
number. Returns the sockaddr_in6
structure with those arguments packed in
and AF_INET6
filled in. IPv6 equivalent of pack_sockaddr_in().
Takes a sockaddr_in6
structure. Returns a list of four elements: the port
number, an opaque string representing the IPv6 address, the scope ID, and the
flow label. (You can use inet_ntop() to convert the address to the usual
string format). Will croak if the structure does not represent an AF_INET6
address.
In scalar context will return just the IP address.
A wrapper of pack_sockaddr_in6() or unpack_sockaddr_in6(). In list context, unpacks its argument according to unpack_sockaddr_in6(). In scalar context, packs its arguments according to pack_sockaddr_in6().
Provided largely for legacy compatibility; it is better to use pack_sockaddr_in6() or unpack_sockaddr_in6() explicitly.
Takes one argument, a pathname. Returns the sockaddr_un
structure with that
path packed in with AF_UNIX
filled in. For PF_UNIX
sockets, this
structure is normally what you need for the arguments in bind(), connect(),
and send().
Takes a sockaddr_un
structure (as returned by pack_sockaddr_un(),
getpeername() or recv()). Returns a list of one element: the pathname. Will
croak if the structure does not represent an AF_UNIX
address.
A wrapper of pack_sockaddr_un() or unpack_sockaddr_un(). In a list context,
unpacks its argument and returns a list consisting of the pathname. In a
scalar context, packs its pathname as a sockaddr_un
and returns it.
Provided largely for legacy compatibility; it is better to use pack_sockaddr_un() or unpack_sockaddr_un() explicitly.
These are only supported if your system has <sys/un.h>.
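A round-trip sketch for Unix-domain addresses (the path is illustrative, and this works only on systems with <sys/un.h>):

```perl
use Socket qw(pack_sockaddr_un unpack_sockaddr_un);

# Illustrative path; no socket is actually created here.
my $sun    = pack_sockaddr_un("/tmp/app.sock");
my ($path) = unpack_sockaddr_un($sun);
print "$path\n";    # /tmp/app.sock
```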
Takes an IPv4 multicast address and optionally an interface address (or
INADDR_ANY
). Returns the ip_mreq
structure with those arguments packed
in. Suitable for use with the IP_ADD_MEMBERSHIP
and IP_DROP_MEMBERSHIP
sockopts.
Takes an ip_mreq
structure. Returns a list of two elements; the IPv4
multicast address and interface address.
Takes an IPv4 multicast address, source address, and optionally an interface
address (or INADDR_ANY
). Returns the ip_mreq_source
structure with those
arguments packed in. Suitable for use with the IP_ADD_SOURCE_MEMBERSHIP
and IP_DROP_SOURCE_MEMBERSHIP
sockopts.
Takes an ip_mreq_source
structure. Returns a list of three elements; the
IPv4 multicast address, source address and interface address.
Takes an IPv6 multicast address and an interface number. Returns the
ipv6_mreq
structure with those arguments packed in. Suitable for use with
the IPV6_ADD_MEMBERSHIP
and IPV6_DROP_MEMBERSHIP
sockopts.
Takes an ipv6_mreq
structure. Returns a list of two elements; the IPv6
address and an interface number.
Takes a string giving the name of a host, or a textual representation of an IP
address and translates that to a packed binary address structure suitable to
pass to pack_sockaddr_in(). If passed a hostname that cannot be resolved,
returns undef. For multi-homed hosts (hosts with more than one address),
the first address found is returned.
For portability do not assume that the result of inet_aton() is 32 bits wide, in other words, that it would contain only the IPv4 address in network order.
This IPv4-only function is provided largely for legacy reasons. Newly-written code should use getaddrinfo() or inet_pton() instead for IPv6 support.
Takes a packed binary address structure such as returned by
unpack_sockaddr_in() (or a v-string representing the four octets of the IPv4
address in network order) and translates it into a string of the form
d.d.d.d
where the d
s are numbers less than 256 (the normal
human-readable four dotted number notation for Internet addresses).
This IPv4-only function is provided largely for legacy reasons. Newly-written code should use getnameinfo() or inet_ntop() instead for IPv6 support.
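For example, using the documentation address 192.0.2.1:

```perl
use Socket qw(inet_aton inet_ntoa);

my $packed = inet_aton("192.0.2.1");    # 4 bytes, network order
printf "%d\n", length $packed;          # 4
print inet_ntoa($packed), "\n";         # 192.0.2.1
```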
Takes an address family (such as AF_INET
or AF_INET6
) and a string
containing a textual representation of an address in that family and
translates that to a packed binary address structure.
See also getaddrinfo() for a more powerful and flexible function to look up socket addresses given hostnames or textual addresses.
Takes an address family and a packed binary address structure and translates
it into a human-readable textual representation of the address; typically in
d.d.d.d
form for AF_INET
or hhhh:hhhh::hhhh
form for AF_INET6
.
See also getnameinfo() for a more powerful and flexible function to turn socket addresses into human-readable textual representations.
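A sketch of the inet_pton()/inet_ntop() pair for both families (addresses are the illustrative documentation prefixes):

```perl
use Socket qw(inet_pton inet_ntop AF_INET AF_INET6);

my $v4 = inet_pton(AF_INET,  "192.0.2.1");      # 4-byte binary form
my $v6 = inet_pton(AF_INET6, "2001:db8::1");    # 16-byte binary form

print inet_ntop(AF_INET,  $v4), "\n";    # 192.0.2.1
print inet_ntop(AF_INET6, $v6), "\n";
```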
Given both a hostname and service name, this function attempts to resolve the host name into a list of network addresses, and the service name into a protocol and port number, and then returns a list of address structures suitable to connect() to it.
Given just a host name, this function attempts to resolve it to a list of network addresses, and then returns a list of address structures giving these addresses.
Given just a service name, this function attempts to resolve it to a protocol
and port number, and then returns a list of address structures that represent
it suitable to bind() to. This use should be combined with the AI_PASSIVE
flag; see below.
Given neither name, it generates an error.
If present, $hints should be a reference to a hash, where the following keys are recognised:
A bitfield containing AI_*
constants; see below.
Restrict to only generating addresses in this address family
Restrict to only generating addresses of this socket type
Restrict to only generating addresses for this protocol
The return value will be a list; the first value being an error indication, followed by a list of address structures (if no error occurred).
The error value will be a dualvar; comparable to the EAI_*
error constants,
or printable as a human-readable error message string. If no error occurred it
will be zero numerically and an empty string.
Each value in the results list will be a hash reference containing the following fields:
The address family (e.g. AF_INET
)
The socket type (e.g. SOCK_STREAM
)
The protocol (e.g. IPPROTO_TCP
)
The address in a packed string (such as would be returned by pack_sockaddr_in())
The canonical name for the host if the AI_CANONNAME
flag was provided, or
undef otherwise. This field will only be present on the first returned
address.
The following flag constants are recognised in the $hints hash. Other flag constants may exist as provided by the OS.
Indicates that this resolution is for a local bind() for a passive (i.e. listening) socket, rather than an active (i.e. connecting) socket.
Indicates that the caller wishes the canonical hostname (canonname
) field
of the result to be filled in.
Indicates that the caller will pass a numeric address, rather than a hostname, and that getaddrinfo() must not perform a resolve operation on this name. This flag will prevent a possibly-slow network lookup operation, and instead return an error if a hostname is passed.
Given a packed socket address (such as from getsockname(), getpeername(), or
returned by getaddrinfo() in an addr
field), returns the hostname and
symbolic service name it represents. $flags may be a bitmask of NI_*
constants, or defaults to 0 if unspecified.
The return value will be a list; the first value being an error condition, followed by the hostname and service name.
The error value will be a dualvar; comparable to the EAI_*
error constants,
or printable as a human-readable error message string. The host and service
names will be plain strings.
The following flag constants are recognised as $flags. Other flag constants may exist as provided by the OS.
Requests that a human-readable string representation of the numeric address be returned directly, rather than performing a name resolve operation that may convert it into a hostname. This will also avoid potentially-blocking network IO.
Requests that the port number be returned directly as a number representation rather than performing a name resolve operation that may convert it into a service name.
If a name resolve operation fails to provide a name, then this flag will cause getnameinfo() to indicate an error, rather than returning the numeric representation as a human-readable string.
Indicates that the socket address relates to a SOCK_DGRAM
socket, for the
services whose name differs between TCP and UDP protocols.
The following constants may be supplied as $xflags.
Indicates that the caller is not interested in the hostname of the result, so
it does not have to be converted. undef will be returned as the hostname.
Indicates that the caller is not interested in the service name of the result,
so it does not have to be converted. undef will be returned as the service
name.
The following constants may be returned by getaddrinfo() or getnameinfo(). Others may be provided by the OS.
A temporary failure occurred during name resolution. The operation may be successful if it is retried later.
The value of the flags
hint to getaddrinfo(), or the $flags parameter to
getnameinfo() contains unrecognised flags.
The family
hint to getaddrinfo(), or the family of the socket address
passed to getnameinfo() is not supported.
The host name supplied to getaddrinfo() did not provide any usable address data.
The host name supplied to getaddrinfo() does not exist, or the address
supplied to getnameinfo() is not associated with a host name and the
NI_NAMEREQD
flag was supplied.
The service name supplied to getaddrinfo() is not available for the socket type given in the $hints.
The getaddrinfo() function converts a hostname and a service name into a list of structures, each containing a potential way to connect() to the named service on the named host.
- use IO::Socket;
- use Socket qw(SOCK_STREAM getaddrinfo);
- my %hints = (socktype => SOCK_STREAM);
- my ($err, @res) = getaddrinfo("localhost", "echo", \%hints);
- die "Cannot getaddrinfo - $err" if $err;
- my $sock;
- foreach my $ai (@res) {
- my $candidate = IO::Socket->new();
- $candidate->socket($ai->{family}, $ai->{socktype}, $ai->{protocol})
- or next;
- $candidate->connect($ai->{addr})
- or next;
- $sock = $candidate;
- last;
- }
- die "Cannot connect to localhost:echo" unless $sock;
- $sock->print("Hello, world!\n");
- print <$sock>;
Because a list of potential candidates is returned, the foreach loop tries each in turn until it finds one for which both the socket() and connect() calls succeed.
This function performs the work of the legacy functions gethostbyname(), getservbyname(), inet_aton() and pack_sockaddr_in().
In practice this logic is better performed by IO::Socket::IP.
The getnameinfo() function converts a socket address, such as returned by getsockname() or getpeername(), into a pair of human-readable strings representing the address and service name.
- use IO::Socket::IP;
- use Socket qw(getnameinfo);
- my $server = IO::Socket::IP->new(LocalPort => 12345, Listen => 1) or
- die "Cannot listen - $@";
- my $socket = $server->accept or die "accept: $!";
- my ($err, $hostname, $servicename) = getnameinfo($socket->peername);
- die "Cannot getnameinfo - $err" if $err;
- print "The peer is connected from $hostname\n";
Since in this example only the hostname was used, the redundant conversion of
the port number into a service name may be omitted by passing the
NIx_NOSERV
flag.
This function performs the work of the legacy functions unpack_sockaddr_in(), inet_ntoa(), gethostbyaddr() and getservbyport().
In practice this logic is better performed by IO::Socket::IP.
To turn a hostname into a human-readable plain IP address use getaddrinfo() to turn the hostname into a list of socket structures, then getnameinfo() on each one to make it a readable IP address again.
- use Socket qw(:addrinfo SOCK_RAW);
- my ($err, @res) = getaddrinfo($hostname, "", {socktype => SOCK_RAW});
- die "Cannot getaddrinfo - $err" if $err;
- while( my $ai = shift @res ) {
- my ($err, $ipaddr) = getnameinfo($ai->{addr}, NI_NUMERICHOST, NIx_NOSERV);
- die "Cannot getnameinfo - $err" if $err;
- print "$ipaddr\n";
- }
The socktype
hint to getaddrinfo() filters the results to only include one
socket type and protocol. Without this most OSes return three combinations,
for SOCK_STREAM
, SOCK_DGRAM
and SOCK_RAW
, resulting in triplicate
output of addresses. The NI_NUMERICHOST
flag to getnameinfo() causes it to
return a string-formatted plain IP address, rather than reverse resolving it
back into a hostname.
This combination performs the work of the legacy functions gethostbyname() and inet_ntoa().
The many SO_*
and other constants provide the socket option names for
getsockopt() and setsockopt().
- use IO::Socket::INET;
- use Socket qw(SOL_SOCKET SO_RCVBUF IPPROTO_IP IP_TTL);
- my $socket = IO::Socket::INET->new(LocalPort => 0, Proto => 'udp')
- or die "Cannot create socket: $@";
- $socket->setsockopt(SOL_SOCKET, SO_RCVBUF, 64*1024) or
- die "setsockopt: $!";
- print "Receive buffer is ", $socket->getsockopt(SOL_SOCKET, SO_RCVBUF),
- " bytes\n";
- print "IP TTL is ", $socket->getsockopt(IPPROTO_IP, IP_TTL), "\n";
As a convenience, IO::Socket's setsockopt() method will convert a number into a packed byte buffer, and getsockopt() will unpack a byte buffer of the correct size back into a number.
This module was originally maintained in Perl core by the Perl 5 Porters.
It was extracted to dual-life on CPAN at version 1.95 by Paul Evans <leonerd@leonerd.org.uk>
Storable - persistence for Perl data structures
- use Storable;
- store \%table, 'file';
- $hashref = retrieve('file');
- use Storable qw(nstore store_fd nstore_fd freeze thaw dclone);
- # Network order
- nstore \%table, 'file';
- $hashref = retrieve('file'); # There is NO nretrieve()
- # Storing to and retrieving from an already opened file
- store_fd \@array, \*STDOUT;
- nstore_fd \%table, \*STDOUT;
- $aryref = fd_retrieve(\*SOCKET);
- $hashref = fd_retrieve(\*SOCKET);
- # Serializing to memory
- $serialized = freeze \%table;
- %table_clone = %{ thaw($serialized) };
- # Deep (recursive) cloning
- $cloneref = dclone($ref);
- # Advisory locking
- use Storable qw(lock_store lock_nstore lock_retrieve)
- lock_store \%table, 'file';
- lock_nstore \%table, 'file';
- $hashref = lock_retrieve('file');
The Storable package brings persistence to your Perl data structures containing SCALAR, ARRAY, HASH or REF objects, i.e. anything that can be conveniently stored to disk and retrieved at a later time.
It can be used in the regular procedural way by calling store
with
a reference to the object to be stored, along with the file name where
the image should be written.
The routine returns undef for I/O problems or other internal errors,
and a true value otherwise. Serious errors are propagated as a die exception.
To retrieve data stored to disk, use retrieve
with a file name.
The objects stored into that file are recreated into memory for you,
and a reference to the root object is returned. In case an I/O error
occurs while reading, undef is returned instead. Other serious
errors are propagated via die.
Since storage is performed recursively, you might want to stuff references to objects that share a lot of common data into a single array or hash table, and then store that object. That way, when you retrieve back the whole thing, the objects will continue to share what they originally shared.
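As a minimal sketch (the variable names are invented for the example), two slots that point at one shared reference keep sharing it when the common root is serialized as a whole, but lose the sharing when serialized separately:

```perl
use strict;
use warnings;
use Storable qw(freeze thaw);

# Two slots point at the SAME underlying hash.
my $shared = { name => 'config' };
my $root   = { a => $shared, b => $shared };

# Serializing the single root preserves that sharing.
my $copy = thaw(freeze($root));
print "still shared\n" if $copy->{a} == $copy->{b};

# Freezing the two slots separately does NOT preserve it:
my ($a2, $b2) = (thaw(freeze($root->{a})), thaw(freeze($root->{b})));
print "no longer shared\n" if $a2 != $b2;
```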
At the cost of a slight header overhead, you may store to an already
opened file descriptor using the store_fd
routine, and retrieve
from a file via fd_retrieve
. Those names aren't imported by default,
so you will have to do that explicitly if you need those routines.
The file descriptor you supply must be already opened, for read
if you're going to retrieve and for write if you wish to store.
- store_fd(\%table, *STDOUT) || die "can't store to stdout\n";
- $hashref = fd_retrieve(*STDIN);
You can also store data in network order to allow easy sharing across
multiple platforms, or when storing on a socket known to be remotely
connected. The routines to call have an initial n
prefix for network,
as in nstore
and nstore_fd
. At retrieval time, your data will be
correctly restored so you don't have to know whether you're restoring
from native or network ordered data. Double values are stored stringified
to ensure portability as well, at the slight risk of losing some precision
in the last decimals.
When using fd_retrieve
, objects are retrieved in sequence, one
object (i.e. one recursive tree) per associated store_fd
.
If you're more from the object-oriented camp, you can inherit from
Storable and directly store your objects by invoking store
as
a method. The fact that the root of the to-be-stored tree is a
blessed reference (i.e. an object) is special-cased so that the
retrieve does not provide a reference to that object but rather the
blessed object reference itself. (Otherwise, you'd get a reference
to that blessed object).
The Storable engine can also store data into a Perl scalar instead, to later retrieve them. This is mainly used to freeze a complex structure in some safe compact memory place (where it can possibly be sent to another process via some IPC, since freezing the structure also serializes it in effect). Later on, and maybe somewhere else, you can thaw the Perl scalar out and recreate the original complex structure in memory.
Surprisingly, the routines to be called are named freeze
and thaw
.
If you wish to send out the frozen scalar to another machine, use
nfreeze
instead to get a portable image.
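A minimal sketch of the in-memory network-order variant; thaw() detects the byte order for you:

```perl
use strict;
use warnings;
use Storable qw(nfreeze thaw);

# nfreeze() produces a platform-independent image.
my $image = nfreeze({ pi => 3.14, label => 'demo' });

# thaw() transparently handles native or network-order images.
my $data = thaw($image);
print "$data->{label}: $data->{pi}\n";   # demo: 3.14
```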
Note that freezing an object structure and immediately thawing it actually achieves a deep cloning of that structure:
- dclone(.) = thaw(freeze(.))
Storable provides you with a dclone
interface which does not create
that intermediary scalar but instead freezes the structure in some
internal memory space and then immediately thaws it out.
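For instance, mutating a dclone() copy leaves the original untouched:

```perl
use strict;
use warnings;
use Storable qw(dclone);

my $orig = { list => [1, 2, 3] };
my $copy = dclone($orig);

push @{ $copy->{list} }, 4;   # deep copy: no aliasing with $orig

print scalar @{ $orig->{list} }, "\n";   # 3
print scalar @{ $copy->{list} }, "\n";   # 4
```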
The lock_store
and lock_nstore
routine are equivalent to
store
and nstore
, except that they get an exclusive lock on
the file before writing. Likewise, lock_retrieve
does the same
as retrieve
, but also gets a shared lock on the file before reading.
As with any advisory locking scheme, the protection only works if you
systematically use lock_store
and lock_retrieve
. If one side of
your application uses store
whilst the other uses lock_retrieve
,
you will get no protection at all.
The internal advisory locking is implemented using Perl's flock() routine. If your system does not support any form of flock(), or if you share your files across NFS, you might wish to use other forms of locking by using modules such as LockFile::Simple which lock a file using a filesystem entry, instead of locking the file descriptor.
The heart of Storable is written in C for decent speed. Extra low-level optimizations have been made when manipulating perl internals, to sacrifice encapsulation for the benefit of greater speed.
Normally, Storable stores elements of hashes in the order they are
stored internally by Perl, i.e. pseudo-randomly. If you set
$Storable::canonical
to some TRUE
value, Storable will store
hashes with the elements sorted by their key. This allows you to
compare data structures by comparing their frozen representations (or
even the compressed frozen representations), which can be useful for
creating lookup tables for complicated queries.
Canonical order does not imply network order; those are two orthogonal settings.
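A sketch: with $Storable::canonical set, two hashes with the same contents but different construction order freeze to identical strings (without it, since Perl randomizes hash order, the images may differ):

```perl
use strict;
use warnings;
use Storable qw(freeze);

my $x = { a => 1, b => 2, c => 3 };
my $y = { c => 3, b => 2, a => 1 };   # same contents, built in reverse

{
    local $Storable::canonical = 1;   # sort hash keys while storing
    print "canonical images match\n" if freeze($x) eq freeze($y);
}
```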
Since Storable version 2.05, CODE references may be serialized with
the help of B::Deparse. To enable this feature, set
$Storable::Deparse
to a true value. To enable deserialization,
$Storable::Eval
should be set to a true value. Be aware that
deserialization is done through eval, which is dangerous if the
Storable file contains malicious data. You can set $Storable::Eval
to a subroutine reference which would be used instead of eval. See
below for an example using a Safe compartment for deserialization
of CODE references.
If $Storable::Deparse
and/or $Storable::Eval
are set to false
values, then the value of $Storable::forgive_me
(see below) is
respected while serializing and deserializing.
This release of Storable can be used on a newer version of Perl to
serialize data which is not supported by earlier Perls. By default,
Storable will attempt to do the right thing, by croak()
ing if it
encounters data that it cannot deserialize. However, the defaults
can be changed as follows:
Perl 5.6 added support for Unicode characters with code points > 255,
and Perl 5.8 has full support for Unicode characters in hash keys.
Perl internally encodes strings with these characters using utf8, and
Storable serializes them as utf8. By default, if an older version of
Perl encounters a utf8 value it cannot represent, it will croak()
.
To change this behaviour so that Storable deserializes utf8 encoded
values as the string of bytes (effectively dropping the is_utf8 flag)
set $Storable::drop_utf8
to some TRUE
value. This is a form of
data loss, because with $drop_utf8
true, it becomes impossible to tell
whether the original data was the Unicode string, or a series of bytes
that happen to be valid utf8.
Perl 5.8 adds support for restricted hashes, which have keys
restricted to a given set, and can have values locked to be read only.
By default, when Storable encounters a restricted hash on a perl
that doesn't support them, it will deserialize it as a normal hash,
silently discarding any placeholder keys and leaving the keys and
all values unlocked. To make Storable croak()
instead, set
$Storable::downgrade_restricted
to a FALSE
value. To restore
the default set it back to some TRUE
value.
Earlier versions of Storable would immediately croak if they encountered a file with a higher internal version number than the reading Storable knew about. Internal version numbers are increased each time new data types (such as restricted hashes) are added to the vocabulary of the file format. This meant that a newer Storable module had no way of writing a file readable by an older Storable, even if the writer didn't store newer data types.
This version of Storable will defer croaking until it encounters a data type in the file that it does not recognize. This means that it will continue to read files generated by newer Storable modules which are careful in what they write out, making it easier to upgrade Storable modules in a mixed environment.
The old behaviour of immediate croaking can be re-instated by setting
$Storable::accept_future_minor
to some FALSE
value.
All these variables have no effect on a newer Perl which supports the relevant feature.
Storable uses the "exception" paradigm, in that it does not try to work around
failures: if something bad happens, an exception is generated from the
caller's perspective (see Carp and croak()
). Use eval {} to trap
those exceptions.
When Storable croaks, it tries to report the error via the logcroak()
routine from the Log::Agent
package, if it is available.
Normal errors are reported by having store() or retrieve() return undef.
Such errors are usually I/O errors (or truncated stream errors at retrieval).
Any class may define hooks that will be called during the serialization and deserialization process on objects that are instances of that class. Those hooks can redefine the way serialization is performed (and therefore, how the symmetrical deserialization should be conducted).
Since we said earlier:
- dclone(.) = thaw(freeze(.))
everything we say about hooks should also hold for deep cloning. However, hooks get to know whether the operation is a mere serialization, or a cloning.
Therefore, when serializing hooks are involved,
- dclone(.) <> thaw(freeze(.))
Well, you could keep them in sync, but there's no guarantee it will always hold on classes somebody else wrote. Besides, there is little to gain in doing so: a serializing hook could keep only one attribute of an object, which is probably not what should happen during a deep cloning of that same object.
Here is the hooking interface:
STORABLE_freeze
obj, cloning
The serializing hook, called on the object during serialization. It can be inherited, or defined in the class itself, like any other method.
Arguments: obj is the object to serialize, cloning is a flag indicating whether we're in a dclone() or a regular serialization via store() or freeze().
Returned value: A LIST ($serialized, $ref1, $ref2, ...)
where $serialized
is the serialized form to be used, and the optional $ref1, $ref2, etc... are
extra references that you wish to let the Storable engine serialize.
At deserialization time, you will be given back the same LIST, but all the extra references will be pointing into the deserialized structure.
The first time the hook is hit in a serialization flow, you may have it return an empty list. That will signal the Storable engine to further discard that hook for this class and to therefore revert to the default serialization of the underlying Perl data. The hook will again be normally processed in the next serialization.
Unless you know better, a serializing hook should always say:
- sub STORABLE_freeze {
- my ($self, $cloning) = @_;
- return if $cloning; # Regular default serialization
- ....
- }
in order to keep reasonable dclone() semantics.
STORABLE_thaw
obj, cloning, serialized, ...
The deserializing hook called on the object during deserialization. But wait: if we're deserializing, there's no object yet... right?
Wrong: the Storable engine creates an empty one for you. If you know Eiffel,
you can view STORABLE_thaw
as an alternate creation routine.
This means the hook can be inherited like any other method, and that obj is your blessed reference for this particular instance.
The other arguments should look familiar if you know STORABLE_freeze
:
cloning is true when we're part of a deep clone operation, serialized
is the serialized string you returned to the engine in STORABLE_freeze
,
and there may be an optional list of references, in the same order you gave
them at serialization time, pointing to the deserialized objects (which
have been processed courtesy of the Storable engine).
When the Storable engine does not find any STORABLE_thaw
hook routine,
it tries to load the class by requiring the package dynamically (using
the blessed package name), and then re-attempts the lookup. If at that
time the hook cannot be located, the engine croaks. Note that this mechanism
will fail if you define several classes in the same file, but perlmod
warned you.
It is up to you to use this information to populate obj the way you want.
Returned value: none.
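Putting the two hooks together, a minimal sketch (the Counter class is invented for this example):

```perl
package Counter;
use strict;
use warnings;

sub new { my ($class, $n) = @_; return bless { count => $n }, $class }

# Serialize just the count, as a plain string.
sub STORABLE_freeze {
    my ($self, $cloning) = @_;
    return if $cloning;            # keep default dclone() semantics
    return "$self->{count}";       # serialized form, no extra references
}

# $self arrives as a fresh object blessed into Counter; repopulate it.
sub STORABLE_thaw {
    my ($self, $cloning, $serialized) = @_;
    $self->{count} = $serialized + 0;
    return;
}

package main;
use Storable qw(freeze thaw);

my $copy = thaw(freeze(Counter->new(42)));
print ref($copy), ": ", $copy->{count}, "\n";   # Counter: 42
```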
STORABLE_attach
class, cloning, serialized
While STORABLE_freeze
and STORABLE_thaw
are useful for classes where
each instance is independent, this mechanism has difficulty (or is
incompatible) with objects that exist as common process-level or
system-level resources, such as singleton objects, database pools, caches
or memoized objects.
The alternative STORABLE_attach
method provides a solution for these
shared objects. Instead of STORABLE_freeze
--> STORABLE_thaw
,
you implement STORABLE_freeze
--> STORABLE_attach
instead.
Arguments: class is the class we are attaching to, cloning is a flag indicating whether we're in a dclone() or a regular de-serialization via thaw(), and serialized is the stored string for the resource object.
Because these resource objects are considered to be owned by the entire
process/system, and not the "property" of whatever is being serialized,
no references underneath the object should be included in the serialized
string. Thus, in any class that implements STORABLE_attach
, the
STORABLE_freeze
method cannot return any references, and Storable
will throw an error if STORABLE_freeze
tries to return references.
All information required to "attach" back to the shared resource object
must be contained only in the STORABLE_freeze
return string.
Otherwise, STORABLE_freeze
behaves as normal for STORABLE_attach
classes.
Because STORABLE_attach
is passed the class (rather than an object),
it also returns the object directly, rather than modifying the passed
object.
Returned value: object of type class
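A sketch of the attach mechanism for a process-level pool (the Registry class and its instance() constructor are invented for the example):

```perl
package Registry;
use strict;
use warnings;

my %pool;   # process-level shared instances, keyed by name

sub instance {
    my ($class, $name) = @_;
    return $pool{$name} //= bless { name => $name }, $class;
}

# Store only the key needed to find the shared object again;
# returning any extra references here would be an error.
sub STORABLE_freeze {
    my ($self, $cloning) = @_;
    return $self->{name};
}

# Called with the CLASS, not an object; return the shared instance.
sub STORABLE_attach {
    my ($class, $cloning, $serialized) = @_;
    return $class->instance($serialized);
}

package main;
use Storable qw(freeze thaw);

my $orig = Registry->instance('db');
my $back = thaw(freeze($orig));
print "same instance\n" if $orig == $back;
```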
Predicates are not exportable. They must be called by explicitly prefixing them with the Storable package name.
Storable::last_op_in_netorder
The Storable::last_op_in_netorder()
predicate will tell you whether
network order was used in the last store or retrieve operation. If you
don't know how to use this, just forget about it.
Storable::is_storing
Returns true if within a store operation (via STORABLE_freeze hook).
Storable::is_retrieving
Returns true if within a retrieve operation (via STORABLE_thaw hook).
With hooks comes the ability to recurse back to the Storable engine. Indeed, hooks are regular Perl code, and Storable is convenient when it comes to serializing and deserializing things, so why not use it to handle the serialization string?
There are a few things you need to know, however:
You can create endless loops if the things you serialize via freeze() (for instance) point back to the object we're trying to serialize in the hook.
Shared references among objects will not stay shared: if we're serializing the list of object [A, C] where both object A and C refer to the SAME object B, and if there is a serializing hook in A that says freeze(B), then when deserializing, we'll get [A', C'] where A' refers to B', but C' refers to D, a deep clone of B'. The topology was not preserved.
That's why STORABLE_freeze
lets you provide a list of references
to serialize. The engine guarantees that those will be serialized in the
same context as the other objects, and therefore that shared objects will
stay shared.
In the above [A, C] example, the STORABLE_freeze
hook could return:
- ("something", $self->{B})
and the B part would be serialized by the engine. In STORABLE_thaw
, you
would get back the reference to the B' object, deserialized for you.
Therefore, recursion should normally be avoided, but is nonetheless supported.
There is a Clone module available on CPAN which implements deep cloning natively, i.e. without freezing to memory and thawing the result. It is aimed to replace Storable's dclone() some day. However, it does not currently support Storable hooks to redefine the way deep cloning is performed.
Yes, there's a lot of that :-) But more precisely, in UNIX systems
there's a utility called file
, which recognizes data files based on
their contents (usually their first few bytes). For this to work,
a certain file called magic needs to be taught about the signature
of the data. Where that configuration file lives depends on the UNIX
flavour; often it's something like /usr/share/misc/magic or
/etc/magic. Your system administrator needs to do the updating of
the magic file. The necessary signature information is output to
STDOUT by invoking Storable::show_file_magic(). Note that the GNU
implementation of the file
utility, version 3.38 or later,
is expected to contain support for recognising Storable files
out-of-the-box, in addition to other kinds of Perl files.
You can also use the following functions to extract the file header information from Storable images:
If the given file is a Storable image, return a hash describing it. If
the file is readable but not a Storable image, return undef. If
the file does not exist or is unreadable, croak.
The hash returned has the following elements:
version
This returns the file format version. It is a string like "2.7".
Note that this version number is not the same as the version number of the Storable module itself. For instance Storable v0.7 created files in format v2.0 and Storable v2.15 creates files in format v2.7. The file format version number only increments when additional features that would confuse older versions of the module are added.
Files older than v2.0 will have one of the version numbers "-1", "0" or "1". No minor number was used at that time.
version_nv
This returns the file format version as a number. It is a string like "2.007". This value is suitable for numeric comparisons.
The constant function Storable::BIN_VERSION_NV
returns a comparable
number that represents the highest file version number that this
version of Storable fully supports (but see discussion of
$Storable::accept_future_minor
above). The constant
Storable::BIN_WRITE_VERSION_NV
function returns what file version
is written and might be less than Storable::BIN_VERSION_NV
in some
configurations.
major
, minor
This also returns the file format version. If the version is "2.7" then major would be 2 and minor would be 7. The minor element is missing when major is less than 2.
hdrsize
This is the number of bytes that the Storable header occupies.
netorder
This is TRUE if the image stores its data in network order. This means that it was created with nstore() or similar.
byteorder
This is only present when netorder
is FALSE. It is the
$Config{byteorder} string of the perl that created this image. It is
a string like "1234" (32 bit little endian) or "87654321" (64 bit big
endian). This must match the current perl for the image to be
readable by Storable.
intsize
, longsize
, ptrsize
, nvsize
These are only present when netorder
is FALSE. These are the sizes of
various C datatypes of the perl that created this image. These must
match the current perl for the image to be readable by Storable.
The nvsize
element is only present for file format v2.2 and
higher.
file
The name of the file.
The $buffer should be a Storable image or the first few bytes of it.
If $buffer starts with a Storable header, then a hash describing the
image is returned, otherwise undef is returned.
The hash has the same structure as the one returned by
Storable::file_magic(). The file
element is true if the image is a
file image.
If the $must_be_file argument is provided and is TRUE, then return
undef unless the image looks like it belongs to a file dump.
The maximum size of a Storable header is currently 21 bytes. If the provided $buffer is only the first part of a Storable image it should at least be this long to ensure that read_magic() will recognize it as such.
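A sketch exercising both functions, using a File::Temp temporary file; the hash keys shown are the ones documented above:

```perl
use strict;
use warnings;
use Storable qw(nstore freeze);
use File::Temp qw(tempfile);

my ($fh, $path) = tempfile(UNLINK => 1);
close $fh;
nstore({ a => 1 }, $path);

# file_magic() reads the header straight from the file.
my $info = Storable::file_magic($path);
printf "format %s, netorder=%s\n",
    $info->{version}, $info->{netorder} ? 1 : 0;

# read_magic() inspects an in-memory image instead; its 'file'
# element is false because freeze() images carry no file marker.
my $hdr = Storable::read_magic(freeze({ b => 2 }));
print "memory image\n" if $hdr && !$hdr->{file};
```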
Here are some code samples showing a possible usage of Storable:
- use Storable qw(store retrieve freeze thaw dclone);
- %color = ('Blue' => 0.1, 'Red' => 0.8, 'Black' => 0, 'White' => 1);
- store(\%color, 'mycolors') or die "Can't store %color in mycolors!\n";
- $colref = retrieve('mycolors');
- die "Unable to retrieve from mycolors!\n" unless defined $colref;
- printf "Blue is still %lf\n", $colref->{'Blue'};
- $colref2 = dclone(\%color);
- $str = freeze(\%color);
- printf "Serialization of %%color is %d bytes long.\n", length($str);
- $colref3 = thaw($str);
which prints (on my machine):
- Blue is still 0.100000
- Serialization of %color is 102 bytes long.
Serialization of CODE references and deserialization in a safe compartment:
- use Storable qw(freeze thaw);
- use Safe;
- use strict;
- my $safe = Safe->new;
- # because of opcodes used in "use strict":
- $safe->permit(qw(:default require));
- local $Storable::Deparse = 1;
- local $Storable::Eval = sub { $safe->reval($_[0]) };
- my $serialized = freeze(sub { 42 });
- my $code = thaw($serialized);
- $code->() == 42;
Do not accept Storable documents from untrusted sources!
Some features of Storable can lead to security vulnerabilities if you accept Storable documents from untrusted sources. Most obviously, the optional (off by default) CODE reference serialization feature allows transfer of code to the deserializing process. Furthermore, any serialized object will cause Storable to helpfully load the module corresponding to the class of the object in the deserializing module. For manipulated module names, this can load almost arbitrary code. Finally, the deserialized object's destructors will be invoked when the objects get destroyed in the deserializing process. Maliciously crafted Storable documents may put such objects in the value of a hash key that is overridden by another key/value pair in the same hash, thus causing immediate destructor execution.
In a future version of Storable, we intend to provide options to disable loading modules for classes and to disable deserializing objects altogether. Nonetheless, Storable deserializing documents from untrusted sources is expected to have other, yet undiscovered, security concerns such as allowing an attacker to cause the deserializer to crash hard.
Therefore, let me repeat: Do not accept Storable documents from untrusted sources!
If your application requires accepting data from untrusted sources, you are best off with a less powerful and more-likely safe serialization format and implementation. If your data is sufficiently simple, JSON is a good choice and offers maximum interoperability.
If you're using references as keys within your hash tables, you're bound to be disappointed when retrieving your data. Indeed, Perl stringifies references used as hash table keys. If you later wish to access the items via another reference stringification (i.e. using the same reference that was used for the key originally to record the value into the hash table), it will work because both references stringify to the same string.
It won't work across a sequence of store
and retrieve
operations,
however, because the addresses in the retrieved objects, which are
part of the stringified references, will probably differ from the
original addresses. The topology of your structure is preserved,
but not hidden semantics like those.
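A sketch of the pitfall: the stringified address survives as a plain string key, but a retrieved reference stringifies to a new address:

```perl
use strict;
use warnings;
use Storable qw(freeze thaw);

my $key = [1, 2, 3];
my %h   = ( $key => 'value' );   # key becomes a string like "ARRAY(0x...)"

# Round-trip the hash together with the reference used as its key.
my ($h2, $key2) = @{ thaw(freeze([ \%h, $key ])) };

print "old key kept verbatim: ", (keys %$h2)[0], "\n";

# $key2 is a new reference at a new address, so its stringification
# no longer matches the stored key:
print "lookup by retrieved ref fails\n" unless exists $h2->{$key2};
```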
On platforms where it matters, be sure to call binmode() on the
descriptors that you pass to Storable functions.
Storing data canonically that contains large hashes can be significantly slower than storing the same data normally, as temporary arrays to hold the keys for each hash have to be allocated, populated, sorted and freed. Some tests have shown a halving of the speed of storing -- the exact penalty will depend on the complexity of your data. There is no slowdown on retrieval.
You can't store GLOB, FORMLINE, REGEXP, etc.... If you can define semantics for those operations, feel free to enhance Storable so that it can deal with them.
The store functions will croak
if they run into such references
unless you set $Storable::forgive_me
to some TRUE
value. In that
case, the fatal message is turned into a warning and some
meaningless string is stored instead.
Setting $Storable::canonical
may not yield frozen strings that
compare equal due to possible stringification of numbers. When the
string version of a scalar exists, it is the form stored; therefore,
if you happen to use your numbers as strings between two freezing
operations on the same data structures, you will get different
results.
When storing doubles in network order, their value is stored as text. However, you should also not expect non-numeric floating-point values such as infinity and "not a number" to pass successfully through a nstore()/retrieve() pair.
As Storable neither knows nor cares about character sets (although it does know that characters may be more than eight bits wide), any difference in the interpretation of character codes between a host and a target system is your problem. In particular, if host and target use different code points to represent the characters used in the text representation of floating-point numbers, you will not be able to exchange floating-point data, even with nstore().
Storable::drop_utf8
is a blunt tool. There is no facility either to
return all strings as utf8 sequences, or to attempt to convert utf8
data back to 8 bit and croak()
if the conversion fails.
Prior to Storable 2.01, no distinction was made between signed and unsigned integers on storing. By default Storable prefers to store a scalar's string representation (if it has one), so this would only cause problems when storing large unsigned integers that had never been converted to string or floating point. In other words, values that had been generated by integer operations such as logic ops and then not used in any string or arithmetic context before storing.
This section only applies to you if you have existing data written out by Storable 2.02 or earlier on perl 5.6.0 or 5.6.1 on Unix or Linux which has been configured with 64 bit integer support (not the default). If you got a precompiled perl, rather than running Configure to build your own perl from source, then it almost certainly does not affect you, and you can stop reading now (unless you're curious). If you're using perl on Windows it does not affect you.
Storable writes a file header which contains the sizes of various C language types for the C compiler that built Storable (when not writing in network order), and will refuse to load files written by a Storable not on the same (or compatible) architecture. This check and a check on machine byteorder is needed because the size of various fields in the file are given by the sizes of the C language types, and so files written on different architectures are incompatible. This is done for increased speed. (When writing in network order, all fields are written out as standard lengths, which allows full interworking, but takes longer to read and write)
Perl 5.6.x introduced the ability to optionally configure the perl interpreter
to use C's long long
type to allow scalars to store 64 bit integers on 32
bit systems. However, due to the way the Perl configuration system
generated the C configuration files on non-Windows platforms, and the way
Storable generates its header, nothing in the Storable file header reflected
whether the perl writing was using 32 or 64 bit integers, despite the fact
that Storable was storing some data differently in the file. Hence Storable
running on perl with 64 bit integers will read the header from a file
written by a 32 bit perl, not realise that the data is actually in a subtly
incompatible format, and then go horribly wrong (possibly crashing) if it
encountered a stored integer. This is a design failure.
Storable has now been changed to write out and read in a file header with information about the size of integers. It's impossible to detect whether an old file being read in was written with 32 or 64 bit integers (they have the same header) so it's impossible to automatically switch to a correct backwards compatibility mode. Hence this Storable defaults to the new, correct behaviour.
What this means is that if you have data written by Storable 1.x running
on perl 5.6.0 or 5.6.1 configured with 64 bit integers on Unix or Linux
then by default this Storable will refuse to read it, giving the error
Byte order is not compatible. If you have such data then you
should set $Storable::interwork_56_64bit
to a true value to make this
Storable read and write files with the old header. You should also
migrate your data, or any older perl you are communicating with, to this
current version of Storable.
If you don't have data written with the specific configuration of perl described above, then you do not need to and should not do anything. Don't set the flag - not only will Storable on an identically configured perl refuse to load them, but Storable on a differently configured perl will load them believing them to be correct for it, and then may well fail or crash part way through reading them.
Thank you to (in chronological order):
- Jarkko Hietaniemi <jhi@iki.fi>
- Ulrich Pfeifer <pfeifer@charly.informatik.uni-dortmund.de>
- Benjamin A. Holzman <bholzman@earthlink.net>
- Andrew Ford <A.Ford@ford-mason.co.uk>
- Gisle Aas <gisle@aas.no>
- Jeff Gresham <gresham_jeffrey@jpmorgan.com>
- Murray Nesbitt <murray@activestate.com>
- Marc Lehmann <pcg@opengroup.org>
- Justin Banks <justinb@wamnet.com>
- Jarkko Hietaniemi <jhi@iki.fi> (AGAIN, as perl 5.7.0 Pumpkin!)
- Salvador Ortiz Garcia <sog@msg.com.mx>
- Dominic Dunlop <domo@computer.org>
- Erik Haugan <erik@solbors.no>
- Benjamin A. Holzman <ben.holzman@grantstreet.com>
for their bug reports, suggestions and contributions.
Benjamin Holzman contributed the tied variable support, Andrew Ford contributed the canonical order for hashes, and Gisle Aas fixed a few misunderstandings of mine regarding the perl internals, and optimized the emission of "tags" in the output streams by simply counting the objects instead of tagging them (leading to a binary incompatibility for the Storable image starting at version 0.6--older images are, of course, still properly understood). Murray Nesbitt made Storable thread-safe. Marc Lehmann added overloading and references to tied items support. Benjamin Holzman added a performance improvement for overloaded classes; thanks to Grant Street Group for footing the bill.
Storable was written by Raphael Manfredi <Raphael_Manfredi@pobox.com> Maintenance is now done by the perl5-porters <perl5-porters@perl.org>
Please e-mail us with problems, bug fixes, comments and complaints, although if you have compliments you should send them to Raphael. Please don't e-mail Raphael with problems, as he no longer works on Storable, and your message will be delayed while he forwards it to us.
Symbol - manipulate Perl symbols and their names
- use Symbol;
- $sym = gensym;
- open($sym, "filename");
- $_ = <$sym>;
- # etc.
- ungensym $sym; # no effect
- # replace *FOO{IO} handle but not $FOO, %FOO, etc.
- *FOO = geniosym;
- print qualify("x"), "\n"; # "main::x"
- print qualify("x", "FOO"), "\n"; # "FOO::x"
- print qualify("BAR::x"), "\n"; # "BAR::x"
- print qualify("BAR::x", "FOO"), "\n"; # "BAR::x"
- print qualify("STDOUT", "FOO"), "\n"; # "main::STDOUT" (global)
- print qualify(\*x), "\n"; # returns \*x
- print qualify(\*x, "FOO"), "\n"; # returns \*x
- use strict 'refs';
- print { qualify_to_ref $fh } "foo!\n";
- $ref = qualify_to_ref $name, $pkg;
- use Symbol qw(delete_package);
- delete_package('Foo::Bar');
- print "deleted\n" unless exists $Foo::{'Bar::'};
Symbol::gensym
creates an anonymous glob and returns a reference
to it. Such a glob reference can be used as a file or directory
handle.
For backward compatibility with older implementations that didn't
support anonymous globs, Symbol::ungensym
is also provided.
But it doesn't do anything.
Symbol::geniosym
creates an anonymous IO handle. This can be
assigned into an existing glob without affecting the non-IO portions
of the glob.
Symbol::qualify
turns unqualified symbol names into qualified
variable names (e.g. "myvar" -> "MyPackage::myvar"). If it is given a
second parameter, qualify
uses it as the default package;
otherwise, it uses the package of its caller. Regardless, global
variable names (e.g. "STDOUT", "ENV", "SIG") are always qualified with
"main::".
Qualification applies only to symbol names (strings). References are left unchanged under the assumption that they are glob references, which are qualified by their nature.
Symbol::qualify_to_ref
is just like Symbol::qualify
except that it
returns a glob ref rather than a symbol name, so you can use the result
even if use strict 'refs'
is in effect.
Symbol::delete_package
wipes out a whole package namespace. Note
this routine is not exported by default--you may want to import it
explicitly.
Symbol::delete_package
is a bit too powerful. It undefines every symbol that
lives in the specified package. Since perl, for performance reasons, does not
perform a symbol table lookup each time a function is called or a global
variable is accessed, some code that has already been loaded and that makes use
of symbols in package Foo
may stop working after you delete Foo
, even if
you reload the Foo
module afterwards.
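A short sketch of the effect (the Demo::Wipe package and its contents are invented for this example):

```perl
use strict;
use warnings;
use Symbol qw(delete_package);

# An invented package with one sub and one variable.
{
    package Demo::Wipe;
    our $answer = 42;
    sub greet { "hi" }
}

print Demo::Wipe::greet(), "\n";    # the package is alive...

delete_package('Demo::Wipe');

# ...and now every symbol in it is gone, including the stash entry.
print defined &Demo::Wipe::greet ? "still there" : "gone", "\n";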
Test - provides a simple framework for writing test scripts
- use strict;
- use Test;
- # use a BEGIN block so we print our plan before MyModule is loaded
- BEGIN { plan tests => 14, todo => [3,4] }
- # load your module...
- use MyModule;
- # Helpful notes. All note-lines must start with a "#".
- print "# I'm testing MyModule version $MyModule::VERSION\n";
- ok(0); # failure
- ok(1); # success
- ok(0); # ok, expected failure (see todo list, above)
- ok(1); # surprise success!
- ok(0,1); # failure: '0' ne '1'
- ok('broke','fixed'); # failure: 'broke' ne 'fixed'
- ok('fixed','fixed'); # success: 'fixed' eq 'fixed'
- ok('fixed',qr/x/); # success: 'fixed' =~ qr/x/
- ok(sub { 1+1 }, 2); # success: '2' eq '2'
- ok(sub { 1+1 }, 3); # failure: '2' ne '3'
- my @list = (0,0);
- ok @list, 3, "\@list=".join(',',@list); #extra notes
- ok 'segmentation fault', '/(?i)success/'; #regex match
- skip(
- $^O =~ m/MSWin/ ? "Skip if MSWin" : 0, # whether to skip
- $foo, $bar # arguments just like for ok(...)
- );
- skip(
- $^O =~ m/MSWin/ ? 0 : "Skip unless MSWin", # whether to skip
- $foo, $bar # arguments just like for ok(...)
- );
This module simplifies the task of writing test files for Perl modules, such that their output is in the format that Test::Harness expects to see.
To write a test for your new (and probably not even done) module, create a new file called t/test.t (in a new t directory). If you have multiple test files, to test the "foo", "bar", and "baz" feature sets, then feel free to call your files t/foo.t, t/bar.t, and t/baz.t.
This module defines three public functions, plan(...)
, ok(...)
,
and skip(...)
. By default, all three are exported by
the use Test;
statement.
plan(...)
- BEGIN { plan %theplan; }
This should be the first thing you call in your test script. It declares your testing plan: how many tests there will be, whether any of them should be allowed to fail, and so on.
Typical usage is just:
- use Test;
- BEGIN { plan tests => 23 }
These are the things that you can put in the parameters to plan:
tests => number
The number of tests in your script. This means all ok() and skip() calls.
todo => [1,5,14]
A reference to a list of tests which are allowed to fail. See TODO TESTS.
onfail => sub { ... }
onfail => \&some_sub
A subroutine reference to be run at the end of the test script, if any of the tests fail. See ONFAIL.
You must call plan(...)
once and only once. You should call it
in a BEGIN {...}
block, like so:
- BEGIN { plan tests => 23 }
ok(...)
- ok(1 + 1 == 2);
- ok($have, $expect);
- ok($have, $expect, $diagnostics);
This function is the reason for Test
's existence. It's
the basic function that
handles printing "ok
" or "not ok
", along with the
current test number. (That's what Test::Harness
wants to see.)
In its most basic usage, ok(...)
simply takes a single scalar
expression. If its value is true, the test passes; if false,
the test fails. Examples:
- # Examples of ok(scalar)
- ok( 1 + 1 == 2 ); # ok if 1 + 1 == 2
- ok( $foo =~ /bar/ ); # ok if $foo contains 'bar'
- ok( baz($x + $y) eq 'Armondo' ); # ok if baz($x + $y) returns
- # 'Armondo'
- ok( @a == @b ); # ok if @a and @b are the same length
The expression is evaluated in scalar context. So the following will work:
- ok( @stuff ); # ok if @stuff has any elements
- ok( !grep !defined $_, @stuff ); # ok if everything in @stuff is defined
A special case is if the expression is a subroutine reference (in either
sub {...}
syntax or \&foo
syntax). In
that case, it is executed and its value (true or false) determines if
the test passes or fails. For example:
- ok( sub { # See whether sleep works at least passably
- my $start_time = time;
- sleep 5;
- time() - $start_time >= 4
- });
In its two-argument form, ok(arg1, arg2) compares the two
scalar values to see if they match. They match if both are undefined,
or if arg2 is a regex that matches arg1, or if they compare equal
with eq
.
- # Example of ok(scalar, scalar)
- ok( "this", "that" ); # not ok, 'this' ne 'that'
- ok( "", undef ); # not ok, "" is defined
The second argument is considered a regex if it is either a regex object or a string that looks like a regex. Regex objects are constructed with the qr// operator in recent versions of perl. A string is considered to look like a regex if its first and last characters are "/", or if the first character is "m" and its second and last characters are both the same non-alphanumeric non-whitespace character.
Regex examples:
- ok( 'JaffO', '/Jaff/' ); # ok, 'JaffO' =~ /Jaff/
- ok( 'JaffO', 'm|Jaff|' ); # ok, 'JaffO' =~ m|Jaff|
- ok( 'JaffO', qr/Jaff/ ); # ok, 'JaffO' =~ qr/Jaff/;
- ok( 'JaffO', '/(?i)jaff/' ); # ok, 'JaffO' =~ /jaff/i;
If either (or both!) is a subroutine reference, it is run and its return value is used for the comparison. For example:
- ok( sub { $bytecount }, 4 );
The above test passes two values to ok(arg1, arg2)
-- the first
a coderef, and the second is the number 4. Before ok
compares them,
it calls the coderef, and uses its return value as the real value of
this parameter. Assuming that $bytecount
returns 4, ok
ends up
testing 4 eq 4
. Since that's true, this test passes.
Finally, you can append an optional third argument, in
ok(arg1,arg2, note), where note is a string value that
will be printed if the test fails. This should be some useful
information about the test, pertaining to why it failed, and/or
a description of the test. For example:
- ok( grep($_ eq 'something unique', @stuff), 1,
- "Need to have 'something unique' in \@stuff" );
Unfortunately, a note cannot be used with the single argument
style of ok()
. That is, if you try ok(arg1, note), then
Test
will interpret this as ok(arg1, arg2), and probably
end up testing arg1 eq arg2 -- and that's not what you want!
All of the above special cases can occasionally cause some problems. See BUGS and CAVEATS.
skip(skip_if_true, args...)
This is used for tests that under some conditions can be skipped. It's basically equivalent to:
- if( $skip_if_true ) {
- ok(1);
- } else {
- ok( args... );
- }
...except that the ok(1)
emits not just "ok testnum" but
actually "ok testnum # skip_if_true_value".
The arguments after the skip_if_true are what is fed to ok(...)
if
this test isn't skipped.
Example usage:
- my $if_MSWin =
- $^O =~ m/MSWin/ ? 'Skip if under MSWin' : '';
- # A test to be skipped if under MSWin (i.e., run except under MSWin)
- skip($if_MSWin, thing($foo), thing($bar) );
Or, going the other way:
- my $unless_MSWin =
- $^O =~ m/MSWin/ ? '' : 'Skip unless under MSWin';
- # A test to be skipped unless under MSWin (i.e., run only under MSWin)
- skip($unless_MSWin, thing($foo), thing($bar) );
The tricky thing to remember is that the first parameter is true if
you want to skip the test, not run it; and it also doubles as a
note about why it's being skipped. So in the first codeblock above, read
the code as "skip if MSWin -- (otherwise) test whether thing($foo)
is
thing($bar)
" or for the second case, "skip unless MSWin...".
Also, when your skip_if_reason string is true, it really should (for backwards compatibility with older Test.pm versions) start with the string "Skip", as shown in the above examples.
Note that in the above cases, thing($foo) and thing($bar) are evaluated -- but as long as the skip_if_true is true, skip(...) just tosses out their values (i.e., it doesn't bother treating them like arguments to ok(...)). But if you need the arguments not to be evaluated when skipping the test, use this format:
- skip( $unless_MSWin,
- sub {
- # This code returns true if the test passes.
- # (But it doesn't even get called if the test is skipped.)
- thing($foo) eq thing($bar)
- }
- );
or even this, which is basically equivalent:
- skip( $unless_MSWin,
- sub { thing($foo) }, sub { thing($bar) } );
That is, both are like this:
- if( $unless_MSWin ) {
- ok(1); # but with the skip note appended
- } else {
- ok( thing($foo), thing($bar) );
- }
These tests are expected to succeed. Usually, most or all of your tests are in this category. If a normal test doesn't succeed, then that means that something is wrong.
The skip(...)
function is for tests that might or might not be
possible to run, depending
on the availability of platform-specific features. The first argument
should evaluate to true (think "yes, please skip") if the required
feature is not available. After the first argument, skip(...)
works
exactly the same way as ok(...)
does.
TODO tests are designed for maintaining an executable TODO list. These tests are expected to fail. If a TODO test does succeed, then the feature in question shouldn't be on the TODO list, now should it?
Packages should NOT be released with succeeding TODO tests. As soon as a TODO test starts working, it should be promoted to a normal test, and the newly working feature should be documented in the release notes or in the change log.
Although test failures should be enough, extra diagnostics can be
triggered at the end of a test run. onfail
is passed an array ref
of hash refs that describe each test failure. Each hash will contain
at least the following fields: package, repetition
, and
result
. (You shouldn't rely on any other fields being present.) If the test
had an expected value or a diagnostic (or "note") string, these will also be
included.
The optional onfail
hook might be used simply to print out the
version of your package and/or how to report problems. It might also
be used to generate extremely sophisticated diagnostics for a
particularly bizarre test failure. However it's not a panacea. Core
dumps or other unrecoverable errors prevent the onfail
hook from
running. (It is run inside an END
block.) Besides, onfail
is
probably over-kill in most cases. (Your test code should be simpler
than the code it is testing, yes?)
ok(...)
's special handling of strings which look like they might be
regexes can also cause unexpected behavior. An innocent:
- ok( $fileglob, '/path/to/some/*stuff/' );
will fail, since Test.pm considers the second argument to be a regex! The best bet is to use the one-argument form:
- ok( $fileglob eq '/path/to/some/*stuff/' );
ok(...)
's use of string eq
can sometimes cause odd problems
when comparing
numbers, especially if you're casting a string to a number:
- $foo = "1.0";
- ok( $foo, 1 ); # not ok, "1.0" ne 1
Your best bet is to use the single argument form:
- ok( $foo == 1 ); # ok "1.0" == 1
As you may have inferred from the above documentation and examples,
ok
's prototype is ($;$$) (and, incidentally, skip
's is
($;$$$)). This means, for example, that you can do ok @foo, @bar
to compare the size of the two arrays. But don't be fooled into
thinking that ok @foo, @bar
means a comparison of the contents of two
arrays -- you're comparing just the number of elements of each. It's
so easy to make that mistake in reading ok @foo, @bar
that you might
want to be very explicit about it, and instead write ok scalar(@foo),
scalar(@bar)
.
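A sketch of what the ($;$$) prototype does to array arguments; fake_ok below is an invented stand-in with ok's prototype, not Test's real implementation:

```perl
use strict;
use warnings;

# A stand-in with ok's ($;$$) prototype: the prototype forces each
# array argument into scalar context, i.e. reduces it to its count.
sub fake_ok ($;$$) {
    my ($got, $expected) = @_;
    return defined $expected ? ($got eq $expected) : !!$got;
}

my @foo = (1, 2, 3);
my @bar = (9, 8, 7);
my @baz = (1);

print fake_ok(@foo, @bar) ? "same size\n" : "different size\n";  # 3 eq 3
print fake_ok(@foo, @baz) ? "same size\n" : "different size\n";  # 3 ne 1
```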
This almost definitely doesn't do what you expect:
- ok $thingy->can('some_method');
Why? Because can
returns a coderef to mean "yes it can (and the
method is this...)", and then ok
sees a coderef and thinks you're
passing a function that you want it to call and consider the truth of
the result of! I.e., just like:
- ok $thingy->can('some_method')->();
What you probably want instead is this:
- ok $thingy->can('some_method') && 1;
If the can
returns false, then that is passed to ok
. If it
returns true, then the larger expression $thingy->can('some_method') && 1
returns 1, which ok
sees as
a simple signal of success, as you would expect.
The syntax for skip
is about the only way it can be, but it's still
quite confusing. Just start with the above examples and you'll
be okay.
Moreover, users may expect this:
- skip $unless_mswin, foo($bar), baz($quux);
to not evaluate foo($bar)
and baz($quux)
when the test is being
skipped. But in reality, they are evaluated, but skip
just won't
bother comparing them if $unless_mswin
is true.
You could do this:
- skip $unless_mswin, sub { foo($bar) }, sub { baz($quux) };
But that's not terribly pretty. You may find it simpler or clearer in the long run to just do things like this:
- if( $^O =~ m/MSWin/ ) {
- ok( foo($bar) );
- ok( baz($quux) );
- } else {
- skip( 'Skip unless under MSWin' ) for 1 .. 2;
- }
But be quite sure that ok
is called exactly as many times in the
first block as skip
is called in the second block.
If PERL_TEST_DIFF
environment variable is set, it will be used as a
command for comparing unexpected multiline results. If you have GNU
diff installed, you might want to set PERL_TEST_DIFF
to diff -u
.
If you don't have a suitable program, you might install the
Text::Diff
module and then set PERL_TEST_DIFF
to be perl
-MText::Diff -e 'print diff(@ARGV)'
. If PERL_TEST_DIFF
isn't set
but the Algorithm::Diff
module is available, then it will be used
to show the differences in multiline results.
A past developer of this module once said that it was no longer being actively developed. However, rumors of its demise were greatly exaggerated. Feedback and suggestions are quite welcome.
Be aware that the main value of this module is its simplicity. Note that there are already more ambitious modules out there, such as Test::More and Test::Unit.
Some earlier versions of this module had docs with some confusing
typos in the description of skip(...)
.
Test::Simple, Test::More, Devel::Cover
Test::Builder for building your own testing library.
Test::Unit is an interesting XUnit-style testing library.
Test::Inline and SelfTest let you embed tests in code.
Copyright (c) 1998-2000 Joshua Nathaniel Pritikin.
Copyright (c) 2001-2002 Michael G. Schwern.
Copyright (c) 2002-2004 Sean M. Burke.
Current maintainer: Jesse Vincent. <jesse@bestpractical.com>
This package is free software and is provided "as is" without express or implied warranty. It may be used, redistributed and/or modified under the same terms as Perl itself.
Thread - Manipulate threads in Perl (for old code only)
The Thread
module served as the frontend to the old-style thread model,
called 5005threads, that was introduced in release 5.005. That model was
deprecated, and has been removed in version 5.10.
For old code and interim backwards compatibility, the Thread
module has
been reworked to function as a frontend for the new interpreter threads
(ithreads) model. However, some previous functionality is not available.
Further, the data sharing models between the two thread models are completely
different, and anything to do with data sharing has to be approached differently.
With ithreads, you must explicitly share()
variables between the
threads.
You are strongly encouraged to migrate any existing threaded code to the new
model (i.e., use the threads
and threads::shared
modules) as soon as
possible.
In Perl 5.005, the thread model was that all data is implicitly shared, and shared access to data has to be explicitly synchronized. This model is called 5005threads.
In Perl 5.6, a new model was introduced in which all data was thread-local, and shared access to data had to be explicitly declared. This model is called ithreads, for "interpreter threads".
In Perl 5.6, the ithreads model was not available as a public API, only as an internal API available to extension writers and used to implement fork() emulation on Win32 platforms.
In Perl 5.8, the ithreads model became available through the threads
module, and the 5005threads model was deprecated.
In Perl 5.10, the 5005threads model was removed from the Perl interpreter.
The Thread
module provides multithreading support for Perl.
new
starts a new thread of execution in the referenced subroutine. The
optional list is passed as parameters to the subroutine. Execution
continues in both the subroutine and the code after the new
call.
Thread->new
returns a thread object representing the newly created
thread.
lock places a lock on a variable until the lock goes out of scope.
If the variable is locked by another thread, the lock call will
block until it's available. lock is recursive, so multiple calls
to lock are safe--the variable will remain locked until the
outermost lock on the variable goes out of scope.
Locks on variables only affect lock calls--they do not affect normal
access to a variable. (Locks on subs are different, and covered in a bit.)
If you really, really want locks to block access, then go ahead and tie
them to something and manage this yourself. This is done on purpose.
While managing access to variables is a good thing, Perl doesn't force
you out of its living room...
If a container object, such as a hash or array, is locked, the
elements of that container are not themselves locked. For example, if a thread
does a lock @a
, any other thread doing a lock($a[12]) won't
block.
Finally, lock will traverse up references exactly one level.
lock(\$a) is equivalent to lock($a), while lock(\\$a) is not.
async
creates a thread to execute the block immediately following
it. This block is treated as an anonymous sub, and so must have a
semi-colon after the closing brace. Like Thread->new
, async
returns a thread object.
The Thread->self
function returns a thread object that represents
the thread making the Thread->self
call.
Returns a list of all non-joined, non-detached Thread objects.
The cond_wait
function takes a locked variable as
a parameter, unlocks the variable, and blocks until another thread
does a cond_signal
or cond_broadcast
for that same locked
variable. The variable that cond_wait
blocked on is relocked
after the cond_wait
is satisfied. If there are multiple threads
cond_wait
ing on the same variable, all but one will reblock waiting
to reacquire the lock on the variable. (So if you're only using
cond_wait
for synchronization, give up the lock as soon as
possible.)
The cond_signal
function takes a locked variable as a parameter and
unblocks one thread that's cond_wait
ing on that variable. If more than
one thread is blocked in a cond_wait
on that variable, only one (and
which one is indeterminate) will be unblocked.
If there are no threads blocked in a cond_wait
on the variable,
the signal is discarded.
The cond_broadcast
function works similarly to cond_signal
.
cond_broadcast
, though, will unblock all the threads that are
blocked in a cond_wait
on the locked variable, rather than only
one.
The yield
function allows another thread to take control of the
CPU. The exact results are implementation-dependent.
join waits for a thread to end and returns any values the thread
exited with. join will block until the thread has ended, though
it won't block if the thread has already terminated.
If the thread being joined died, the error it died with will
be returned at this time. If you don't want the thread performing
the join to die as well, you should either wrap the join in
an eval or use the eval thread method instead of join.
detach
tells a thread that it is never going to be joined, i.e.,
that all traces of its existence can be removed once it stops running.
Errors in detached threads will not be visible anywhere - if you want
to catch them, you should use $SIG{__DIE__} or something like that.
equal
tests whether two thread objects represent the same thread and
returns true if they do.
The tid
method returns the tid of a thread. The tid is
a monotonically increasing integer assigned when a thread is
created. The main thread of a program will have a tid of zero,
while subsequent threads will have tids assigned starting with one.
The done
method returns true if the thread you're checking has
finished, and false otherwise.
The following were implemented with 5005threads, but are no longer available with ithreads.
With 5005threads, you could also lock a sub such that any calls to that sub
from another thread would block until the lock was released.
Also, subroutines could be declared with the :locked
attribute which would
serialize access to the subroutine, but allowed different threads
non-simultaneous access.
The eval method wrapped an eval around a join, and so waited for a
thread to exit, passing along any values the thread might have returned and
placing any errors into $@
.
The flags
method returned the flags for the thread - an integer value
corresponding to the internal flags for the thread.
UNIVERSAL - base class for ALL classes (blessed references)
- $is_io = $fd->isa("IO::Handle");
- $is_io = Class->isa("IO::Handle");
- $does_log = $obj->DOES("Logger");
- $does_log = Class->DOES("Logger");
- $sub = $obj->can("print");
- $sub = Class->can("print");
- $sub = eval { $ref->can("fandango") };
- $ver = $obj->VERSION;
- # but never do this!
- $is_io = UNIVERSAL::isa($fd, "IO::Handle");
- $sub = UNIVERSAL::can($obj, "print");
UNIVERSAL
is the base class from which all blessed references inherit.
See perlobj.
UNIVERSAL
provides the following methods:
$obj->isa( TYPE )
CLASS->isa( TYPE )
eval { VAL->isa( TYPE ) }
Where TYPE is a package name; $obj is a blessed reference or a package name; CLASS is a package name; and VAL is any of the above or an unblessed reference.
When used as an instance or class method ($obj->isa( TYPE )
),
isa
returns true if $obj is blessed into package TYPE
or
inherits from package TYPE
.
When used as a class method (CLASS->isa( TYPE )
, sometimes
referred to as a static method), isa
returns true if CLASS
is the package TYPE
or inherits from package TYPE
.
If you're not sure what you have (the VAL
case), wrap the method call in an
eval block to catch the exception if VAL
is undefined.
If you want to be sure that you're calling isa
as a method, not as a plain function,
check the invocand with blessed
from Scalar::Util first:
- if (blessed($ref) && $ref->isa('Some::Class')) {
- ...
- }
$obj->DOES( ROLE )
CLASS->DOES( ROLE )
DOES
checks if the object or class performs the role ROLE
. A role is a
named group of specific behavior (often methods of particular names and
signatures), similar to a class, but not necessarily a complete class by
itself. For example, logging or serialization may be roles.
DOES
and isa
are similar, in that if either is true, you know that the
object or class on which you call the method can perform specific behavior.
However, DOES
is different from isa
in that it does not care how the
invocand performs the operations, merely that it does. (isa
of course
mandates an inheritance relationship. Other relationships include aggregation,
delegation, and mocking.)
By default, classes in Perl only perform the UNIVERSAL
role, as well as the
role of all classes in their inheritance. In other words, by default DOES
responds identically to isa
.
There is a relationship between roles and classes, as each class implies the
existence of a role of the same name. There is also a relationship between
inheritance and roles, in that a subclass that inherits from an ancestor class
implicitly performs any roles its parent performs. Thus you can use DOES
in
place of isa
safely, as it will return true in all places where isa
will
return true (provided that any overridden DOES
and isa
methods behave
appropriately).
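As a sketch of overriding DOES, the invented FileLogger class below claims a "Logger" role it does not inherit (requires perl 5.10+, where UNIVERSAL::DOES exists):

```perl
use strict;
use warnings;

# Invented class: it performs the "Logger" role by overriding DOES,
# without inheriting from any Logger class.
package FileLogger;
sub new { bless {}, shift }

sub DOES {
    my ($self, $role) = @_;
    return 1 if $role eq 'Logger';      # claim the role...
    return $self->SUPER::DOES($role);   # ...otherwise behave like isa
}

package main;

my $log = FileLogger->new;
print $log->DOES('Logger')     ? "DOES Logger\n"    : "no\n";
print $log->isa('Logger')      ? "isa Logger\n"     : "not isa Logger\n";
print $log->DOES('FileLogger') ? "DOES own class\n" : "no\n";
```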
$obj->can( METHOD )
CLASS->can( METHOD )
eval { VAL->can( METHOD ) }
can
checks if the object or class has a method called METHOD
. If it does,
then it returns a reference to the sub. If it does not, then it returns
undef. This includes methods inherited or imported by $obj
, CLASS
, or
VAL
.
can
cannot know whether an object will be able to provide a method through
AUTOLOAD (unless the object's class has overridden can
appropriately), so a
return value of undef does not necessarily mean the object will not be able
to handle the method call. To get around this some module authors use a forward
declaration (see perlsub) for methods they will handle via AUTOLOAD. For
such 'dummy' subs, can
will still return a code reference, which, when
called, will fall through to the AUTOLOAD. If no suitable AUTOLOAD is provided,
calling the coderef will cause an error.
You may call can
as a class (static) method or an object method.
Again, the same rule about having a valid invocand applies -- use an eval
block or blessed
if you need to be extra paranoid.
VERSION ( [ REQUIRE ] )
VERSION
will return the value of the variable $VERSION
in the
package the object is blessed into. If REQUIRE
is given then
it will do a comparison and die if the package version is not
greater than or equal to REQUIRE
, or if either $VERSION
or REQUIRE
is not a "lax" version number (as defined by the version module).
The return from VERSION
will actually be the stringified version object
using the package $VERSION
scalar, which is guaranteed to be equivalent
but may not be precisely the contents of the $VERSION
scalar. If you want
the actual contents of $VERSION
, use $CLASS::VERSION
instead.
VERSION
can be called as either a class (static) method or an object
method.
NOTE: can
directly uses Perl's internal code for method lookup, and
isa
uses a very similar method and cache-ing strategy. This may cause
strange effects if the Perl code dynamically changes @ISA in any package.
You may add other methods to the UNIVERSAL class via Perl or XS code.
You do not need to use UNIVERSAL
to make these methods
available to your program (and you should not do so).
None by default.
You may request the import of three functions (isa
, can
, and VERSION
),
but this feature is deprecated and will be removed. Please don't do this in
new code.
For example, previous versions of this documentation suggested using isa
as
a function to determine the type of a reference:
- use UNIVERSAL 'isa';
- $yes = isa $h, "HASH";
- $yes = isa "Foo", "Bar";
The problem is that this code will never call an overridden isa
method in
any class. Instead, use reftype
from Scalar::Util for the first case:
- use Scalar::Util 'reftype';
- $yes = reftype( $h ) eq "HASH";
and the method form of isa
for the second:
- $yes = Foo->isa("Bar");
XSLoader - Dynamically load C libraries into Perl code
Version 0.16
- package YourPackage;
- require XSLoader;
- XSLoader::load();
This module defines a standard simplified interface to the dynamic linking mechanisms available on many platforms. Its primary purpose is to implement cheap automatic dynamic loading of Perl modules.
For a more complicated interface, see DynaLoader. Many (most)
features of DynaLoader
are not implemented in XSLoader
; for example, dl_load_flags
is not honored by XSLoader
.
DynaLoader
A typical module using DynaLoader starts like this:
- package YourPackage;
- require DynaLoader;
- our @ISA = qw( OnePackage OtherPackage DynaLoader );
- our $VERSION = '0.01';
- bootstrap YourPackage $VERSION;
Change this to
- package YourPackage;
- use XSLoader;
- our @ISA = qw( OnePackage OtherPackage );
- our $VERSION = '0.01';
- XSLoader::load 'YourPackage', $VERSION;
In other words: replace require DynaLoader
with use XSLoader
, remove
DynaLoader
from @ISA
, and change bootstrap
to XSLoader::load
. Do not
forget to quote the name of your package on the XSLoader::load
line, and to add a comma (,
) before the arguments ($VERSION
above).
Of course, if @ISA
contained only DynaLoader
, there is no need to have
the @ISA
assignment at all; moreover, if, instead of our, one uses the
more backward-compatible
- use vars qw($VERSION @ISA);
one can remove this reference to @ISA
together with the @ISA
assignment.
If no $VERSION
was specified on the bootstrap
line, the last line becomes
- XSLoader::load 'YourPackage';
If the call to load
is from YourPackage
, then that can be further
simplified to
- XSLoader::load();
as load
will use caller to determine the package.
If you want to have your cake and eat it too, you need a more complicated boilerplate:
- package YourPackage;
- use vars qw($VERSION @ISA);
- @ISA = qw( OnePackage OtherPackage );
- $VERSION = '0.01';
- eval {
- require XSLoader;
- XSLoader::load('YourPackage', $VERSION);
- 1;
- } or do {
- require DynaLoader;
- push @ISA, 'DynaLoader';
- __PACKAGE__->bootstrap($VERSION);
- };
The parentheses about XSLoader::load()
arguments are needed since we replaced
use XSLoader
by require, so the compiler does not know that a function
XSLoader::load()
is present.
This boilerplate uses the low-overhead XSLoader
if present; if used with
an antique Perl which has no XSLoader
, it falls back to using DynaLoader
.
Skip this section if the XSUB functions are supposed to be called from other
modules only; read it only if you call your XSUBs from the code in your module,
or have a BOOT:
section in your XS file (see The BOOT: Keyword in perlxs).
What is described here is equally applicable to the DynaLoader
interface.
A sufficiently complicated module using XS would have both Perl code (defined in YourPackage.pm) and XS code (defined in YourPackage.xs). If this Perl code makes calls into this XS code, and/or this XS code makes calls to the Perl code, one should be careful with the order of initialization.
The call to XSLoader::load()
(or bootstrap()
) calls the module's
bootstrap code. For modules built by xsubpp (nearly all modules) this
has three side effects:
A sanity check is done to ensure that the versions of the .pm and the
(compiled) .xs parts are compatible. If $VERSION
was specified, this
is used for the check. If not specified, it defaults to
$XS_VERSION // $VERSION
(in the module's namespace)
the XSUBs are made accessible from Perl
if a BOOT:
section was present in the .xs file, the code there is called.
Consequently, if the code in the .pm file makes calls to these XSUBs, it is
convenient to have XSUBs installed before the Perl code is defined; for
example, this makes prototypes for XSUBs visible to this Perl code.
Alternatively, if the BOOT:
section makes calls to Perl functions (or
uses Perl variables) defined in the .pm file, they must be defined prior to
the call to XSLoader::load()
(or bootstrap()
).
The first situation being much more frequent, it makes sense to rewrite the boilerplate as
- use XSLoader;
- use vars qw($VERSION @ISA);
- BEGIN {
- @ISA = qw( OnePackage OtherPackage );
- $VERSION = '0.01';
- # Put Perl code used in the BOOT: section here
- XSLoader::load 'YourPackage', $VERSION;
- }
- # Put Perl code making calls into XSUBs here
If the interdependence of your BOOT:
section and Perl code is
more complicated than this (e.g., the BOOT:
section makes calls to Perl
functions which make calls to XSUBs with prototypes), get rid of the BOOT:
section altogether. Replace it with a function onBOOT()
, and call it like
this:
- package YourPackage;
- use XSLoader;
- use vars qw($VERSION @ISA);
- BEGIN {
- @ISA = qw( OnePackage OtherPackage );
- $VERSION = '0.01';
- XSLoader::load 'YourPackage', $VERSION;
- }
- # Put Perl code used in onBOOT() function here; calls to XSUBs are
- # prototype-checked.
- onBOOT;
- # Put Perl initialization code assuming that XS is initialized here
Can't find '%s' symbol in %s
(F) The bootstrap symbol could not be found in the extension module.
Can't load '%s' for module %s: %s
(F) The loading or initialisation of the extension module failed. The detailed error follows.
Undefined symbols present after loading %s: %s
(W) As the message says, some symbols stay undefined although the extension module was correctly loaded and initialised. The list of undefined symbols follows.
To reduce the overhead as much as possible, only one possible location
is checked to find the extension DLL (this location is where make install
would put the DLL). If not found, the search for the DLL is transparently
delegated to DynaLoader
, which looks for the DLL along the @INC
list.
In particular, this is applicable to the structure of @INC
used for testing
not-yet-installed extensions. This means that running uninstalled extensions
may have much more overhead than running the same extensions after
make install
.
The new simpler way to call XSLoader::load()
with no arguments at all
does not work on Perl 5.8.4 and 5.8.5.
Please report any bugs or feature requests via the perlbug(1) utility.
Ilya Zakharevich originally extracted XSLoader
from DynaLoader
.
CPAN version is currently maintained by Sébastien Aperghis-Tramoni <sebastien@aperghis.net>.
Previous maintainer was Michael G Schwern <schwern@pobox.com>.
Copyright (C) 1990-2011 by Larry Wall and others.
This program is free software; you can redistribute it and/or modify it under the same terms as Perl itself.
a2p - Awk to Perl translator
a2p [options] [filename]
A2p takes an awk script specified on the command line (or from standard input) and produces a comparable perl script on the standard output.
Options include:
-D<number> sets debugging flags.
-F<character> tells a2p that this awk script is always invoked with this -F switch.
-n<fieldlist> specifies the names of the input fields if input does not have to be split into an array. If you were translating an awk script that processes the password file, you might say:
- a2p -7 -nlogin.password.uid.gid.gcos.shell.home
Any delimiter can be used to separate the field names.
-<number> causes a2p to assume that input will always have that many fields.
-o tells a2p to use old awk behavior where old and new awk differ.
A2p cannot do as good a job translating as a human would, but it usually does pretty well. There are some areas where you may want to examine the perl script produced and tweak it some. Here are some of them, in no particular order.
There is an awk idiom of putting int() around a string expression to force numeric interpretation, even though the argument is always integer anyway. This is generally unneeded in perl, but a2p can't tell if the argument is always going to be integer, so it leaves it in. You may wish to remove it.
Perl differentiates numeric comparison from string comparison. Awk
has one operator for both that decides at run time which comparison to
do. A2p does not try to do a complete job of awk emulation at this
point. Instead it guesses which one you want. It's almost always
right, but it can be spoofed. All such guesses are marked with the
comment "#???
". You should go through and check them. You might
want to run at least once with the -w switch to perl, which will
warn you if you use == where you should have used eq.
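For instance, a comparison that a2p guessed wrong is easy to correct once you spot the marker comment. The snippet below is illustrative (the field name $Fld1 follows a2p's naming convention; the script being translated is hypothetical):

```perl
# a2p guessed numeric comparison and flagged its guess:
#   if ($Fld1 == 'yes') { ... }    #???
# The awk script compared strings, so the fix is the string operator:
my $Fld1 = 'yes';
if ( $Fld1 eq 'yes' ) {
    print "matched\n";
}
# Why the distinction matters: '10' == '10.0' is true numerically,
# but '10' ne '10.0' as strings.
```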
Perl does not attempt to emulate the behavior of awk in which nonexistent array elements spring into existence simply by being referenced. If somehow you are relying on this mechanism to create null entries for a subsequent for...in, they won't be there in perl.
If a2p makes a split line that assigns to a list of variables that looks like (Fld1, Fld2, Fld3...) you may want to rerun a2p using the -n option mentioned above. This will let you name the fields throughout the script. If it splits to an array instead, the script is probably referring to the number of fields somewhere.
The exit statement in awk doesn't necessarily exit; it goes to the END block if there is one. Awk scripts that do contortions within the END block to bypass the block under such circumstances can be simplified by removing the conditional in the END block and just exiting directly from the perl script.
Perl has two kinds of array, numerically-indexed and associative. Perl associative arrays are called "hashes". Awk arrays are usually translated to hashes, but if you happen to know that the index is always going to be numeric you could change the {...} to [...]. Iteration over a hash is done using the keys() function, but iteration over an array is NOT. You might need to modify any loop that iterates over such an array.
Awk starts by assuming OFMT has the value %.6g. Perl starts by assuming its equivalent, $#, to have the value %.20g. You'll want to set $# explicitly if you use the default value of OFMT.
Near the top of the line loop will be the split operation that is implicit in the awk script. There are times when you can move this down past some conditionals that test the entire record so that the split is not done as often.
For aesthetic reasons you may wish to change index variables from being 1-based (awk style) to 0-based (Perl style). Be sure to change all operations the variable is involved in to match.
Cute comments that say "# Here is a workaround because awk is dumb" are passed through unmodified.
Awk scripts are often embedded in a shell script that pipes stuff into and out of awk. Often the shell script wrapper can be incorporated into the perl script, since perl can start up pipes into and out of itself, and can do other things that awk can't do by itself.
Scripts that refer to the special variables RSTART and RLENGTH can often be simplified by referring to the variables $`, $& and $', as long as they are within the scope of the pattern match that sets them.
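A sketch of that mapping, assuming the values are read immediately after a successful match (awk's RSTART is 1-based, hence the +1):

```perl
my $line = "foo barbaz";
if ( $line =~ /bar/ ) {
    my $rstart  = length($`) + 1;   # awk's RSTART: 1-based match offset
    my $rlength = length($&);       # awk's RLENGTH: length of the match
    print "$rstart $rlength\n";     # prints "5 3"
}
```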
The produced perl script may have subroutines defined to deal with awk's semantics regarding getline and print. Since a2p usually picks correctness over efficiency, it is almost always possible to rewrite such code to be more efficient by discarding the semantic sugar.
For efficiency, you may wish to remove the keyword from any return statement that is the last statement executed in a subroutine. A2p catches the most common case, but doesn't analyze embedded blocks for subtler cases.
ARGV[0] translates to $ARGV0, but ARGV[n] translates to $ARGV[$n-1]. A loop that tries to iterate over ARGV[0] won't find it.
A2p uses no environment variables.
Larry Wall <larry@wall.org>
- perl The perl compiler/interpreter
- s2p sed to perl translator
It would be possible to emulate awk's behavior in selecting string versus numeric operations at run time by inspection of the operands, but it would be gross and inefficient. Besides, a2p almost always guesses right.
Storage for the awk syntax tree is currently static, and can run out.
attributes - get/set subroutine or variable attributes
Subroutine declarations and definitions may optionally have attribute lists
associated with them. (Variable my declarations also may, but see the
warning below.) Perl handles these declarations by passing some information
about the call site and the thing being declared along with the attribute
list to this module. In particular, the first example above is equivalent to
the following:
- use attributes __PACKAGE__, \&foo, 'method';
The second example in the synopsis does something equivalent to this:
Yes, that's a lot of expansion.
WARNING: attribute declarations for variables are still evolving. The semantics and interfaces of such declarations could change in future versions. They are present for purposes of experimentation with what the semantics ought to be. Do not rely on the current implementation of this feature.
There are only a few attributes currently handled by Perl itself (or directly by this module, depending on how you look at it.) However, package-specific attributes are allowed by an extension mechanism. (See Package-specific Attribute Handling below.)
The setting of subroutine attributes happens at compile time.
Variable attributes in our declarations are also applied at compile time.
However, my variables get their attributes applied at run-time.
This means that you have to reach the run-time component of the my
before those attributes will get applied. For example:
- my $x : Bent = 42 if 0;
will neither assign 42 to $x nor will it apply the Bent
attribute
to the variable.
An attempt to set an unrecognized attribute is a fatal error. (The
error is trappable, but it still stops the compilation within that
eval.) Setting an attribute with a name that's all lowercase
letters that's not a built-in attribute (such as "foo") will result in
a warning with -w or use warnings 'reserved'
.
In the description above it is mentioned that the declaration
- sub foo : method;
is equivalent to
- use attributes __PACKAGE__, \&foo, 'method';
As you might know this calls the import function of attributes
at compile
time with these parameters: 'attributes', the caller's package name, the reference
to the code and 'method'.
- attributes->import( __PACKAGE__, \&foo, 'method' );
So you want to know what import actually does?
First of all import gets the type of the third parameter ('CODE' in this case).
attributes.pm
checks if there is a subroutine called MODIFY_<reftype>_ATTRIBUTES
in the caller's namespace (here: 'main'). In this case a
subroutine MODIFY_CODE_ATTRIBUTES
is required. Then this
method is called to check if you have used a "bad attribute".
The subroutine call in this example would look like
- MODIFY_CODE_ATTRIBUTES( 'main', \&foo, 'method' );
MODIFY_<reftype>_ATTRIBUTES has to return a list of all "bad attributes".
If there are any bad attributes, import croaks.
(See Package-specific Attribute Handling below.)
The following are the built-in attributes for subroutines:
Indicates that the referenced subroutine is a valid lvalue and can be assigned to. The subroutine must return a modifiable value such as a scalar variable, as described in perlsub.
This module allows one to set this attribute on a subroutine that is already defined. For Perl subroutines (XSUBs are fine), it may or may not do what you want, depending on the code inside the subroutine, with details subject to change in future Perl versions. You may run into problems with lvalue context not being propagated properly into the subroutine, or maybe even assertion failures. For this reason, a warning is emitted if warnings are enabled. In other words, you should only do this if you really know what you are doing. You have been warned.
Indicates that the referenced subroutine is a method. A subroutine so marked will not trigger the "Ambiguous call resolved as CORE::%s" warning.
The "locked" attribute is deprecated, and has no effect in 5.10.0 and later. It was used as part of the now-removed "Perl 5.005 threads".
The following are the built-in attributes for variables:
Indicates that the referenced variable can be shared across different threads when used in conjunction with the threads and threads::shared modules.
The "unique" attribute is deprecated, and has no effect in 5.10.0 and later.
It used to indicate that a single copy of an our variable was to be used by
all interpreters should the program happen to be running in a
multi-interpreter environment.
The following subroutines are available for general use once this module has been loaded:
This routine expects a single parameter--a reference to a
subroutine or variable. It returns a list of attributes, which may be
empty. If passed invalid arguments, it uses die() (via Carp::croak)
to raise a fatal exception. If it can find an appropriate package name
for a class method lookup, it will include the results from a
FETCH_type_ATTRIBUTES call in its return list, as described in
Package-specific Attribute Handling below.
Otherwise, only built-in attributes will be returned.
This routine expects a single parameter--a reference to a subroutine or variable. It returns the built-in type of the referenced variable, ignoring any package into which it might have been blessed. This can be useful for determining the type value which forms part of the method names described in Package-specific Attribute Handling below.
Note that these routines are not exported by default.
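A short sketch of both routines, importing them explicitly (the subroutine nine is hypothetical):

```perl
use attributes qw(get reftype);

sub nine : method { 9 }

my @attrs = attributes::get(\&nine);       # built-in attrs: ('method')
my $type  = attributes::reftype(\&nine);   # underlying type: 'CODE'
print "@attrs $type\n";
```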
WARNING: the mechanisms described here are still experimental. Do not rely on the current implementation. In particular, there is no provision for applying package attributes to 'cloned' copies of subroutines used as closures. (See Making References in perlref for information on closures.) Package-specific attribute handling may change incompatibly in a future release.
When an attribute list is present in a declaration, a check is made to see
whether an attribute 'modify' handler is present in the appropriate package
(or its @ISA inheritance tree). Similarly, when attributes::get
is
called on a valid reference, a check is made for an appropriate attribute
'fetch' handler. See EXAMPLES to see how the "appropriate package"
determination works.
The handler names are based on the underlying type of the variable being declared or of the reference passed. Because these attributes are associated with subroutine or variable declarations, this deliberately ignores any possibility of being blessed into some package. Thus, a subroutine declaration uses "CODE" as its type, and even a blessed hash reference uses "HASH" as its type.
The class methods invoked for modifying and fetching are these:
This method is called with two arguments: the relevant package name, and a reference to a variable or subroutine for which package-defined attributes are desired. The expected return value is a list of associated attributes. This list may be empty.
This method is called with two fixed arguments, followed by the list of attributes from the relevant declaration. The two fixed arguments are the relevant package name and a reference to the declared subroutine or variable. The expected return value is a list of attributes which were not recognized by this handler. Note that this allows for a derived class to delegate a call to its base class, and then only examine the attributes which the base class didn't already handle for it.
The call to this method is currently made during the processing of the declaration. In particular, this means that a subroutine reference will probably be for an undefined subroutine, even if this declaration is actually part of the definition.
Calling attributes::get() from within the scope of a null package declaration (package;) for an unblessed variable reference will
not provide any starting package name for the 'fetch' method lookup.
Thus, this circumstance will not result in a method call for package-defined
attributes. A named subroutine knows to which symbol table entry it belongs
(or originally belonged), and it will use the corresponding package.
An anonymous subroutine knows the package name into which it was compiled
(unless it was also compiled with a null package declaration), and so it
will use that package name.
An attribute list is a sequence of attribute specifications, separated by
whitespace or a colon (with optional whitespace).
Each attribute specification is a simple
name, optionally followed by a parenthesised parameter list.
If such a parameter list is present, it is scanned past as for the rules
for the q() operator. (See Quote and Quote-like Operators in perlop.)
The parameter list is passed as it was found, however, and not as per q().
Some examples of syntactically valid attribute lists:
- switch(10,foo(7,3)) : expensive
- Ugly('\(") :Bad
- _5x5
- lvalue method
Some examples of syntactically invalid attribute lists (with annotation):
- switch(10,foo() # ()-string not balanced
- Ugly('(') # ()-string not balanced
- 5x5 # "5x5" not a valid identifier
- Y2::north # "Y2::north" not a simple identifier
- foo + bar # "+" neither a colon nor whitespace
None.
The routines get
and reftype
are exportable.
The :ALL
tag will get all of the above exports.
Here are some samples of syntactically valid declarations, with annotation
as to how they resolve internally into use attributes
invocations by
perl. These examples are primarily useful to see how the "appropriate
package" is found for the possible method lookups for package-defined
attributes.
Code:
- package Canine;
- package Dog;
- my Canine $spot : Watchful ;
Effect:
- use attributes ();
- attributes::->import(Canine => \$spot, "Watchful");
Code:
- package Felis;
- my $cat : Nervous;
Effect:
- use attributes ();
- attributes::->import(Felis => \$cat, "Nervous");
Code:
Effect:
- use attributes X => \&foo, "lvalue";
Code:
Effect:
- use attributes Y => \&Y::x, "lvalue";
Code:
Effect:
- use attributes X => \&X::foo, "lvalue";
This last example is purely for purposes of completeness. You should not be trying to mess with the attributes of something in a package that's not your own.
This example runs. At compile time
MODIFY_CODE_ATTRIBUTES
is called. In that
subroutine, we check if any attribute is disallowed and we return a list of
these "bad attributes".
As we return an empty list, everything is fine.
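A minimal sketch of such a handler (the Loud attribute, the shout subroutine, and the handler body are illustrative, not part of attributes.pm itself):

```perl
package main;

# Called at compile time for each attribute list on a CODE declaration.
sub MODIFY_CODE_ATTRIBUTES {
    my ( $package, $coderef, @attrs ) = @_;
    # Accept 'Loud'; anything else is returned as a "bad attribute".
    return grep { $_ ne 'Loud' } @attrs;
}

sub shout : Loud { return uc $_[0] }

print shout("fine"), "\n";   # handler returned an empty list, so this compiles
```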
This example is aborted at compile time as we use the attribute "Test" which
isn't allowed. MODIFY_CODE_ATTRIBUTES
returns a list that contains a single
element ('Test').
Private Variables via my() in perlsub and Subroutine Attributes in perlsub for details on the basic declarations; use for details on the normal invocation mechanism.
autodie - Replace functions with ones that succeed or die with lexical scope
- use autodie; # Recommended: implies 'use autodie qw(:default)'
- use autodie qw(:all); # Recommended more: defaults and system/exec.
- use autodie qw(open close); # open/close succeed or die
- open(my $fh, "<", $filename); # No need to check!
- {
- no autodie qw(open); # open failures won't die
- open(my $fh, "<", $filename); # Could fail silently!
- no autodie; # disable all autodies
- }
- bIlujDI' yIchegh()Qo'; yIHegh()!
- It is better to die() than to return() in failure.
- -- Klingon programming proverb.
The autodie
pragma provides a convenient way to replace functions
that normally return false on failure with equivalents that throw
an exception on failure.
The autodie
pragma has lexical scope, meaning that functions
and subroutines altered with autodie
will only change their behaviour
until the end of the enclosing block, file, or eval.
If system is specified as an argument to autodie
, then it
uses IPC::System::Simple to do the heavy lifting. See the
description of that module for more information.
Exceptions produced by the autodie
pragma are members of the
autodie::exception class. The preferred way to work with
these exceptions under Perl 5.10 is as follows:
- use feature qw(switch);
- eval {
- use autodie;
- open(my $fh, '<', $some_file);
- my @records = <$fh>;
- # Do things with @records...
- close($fh);
- };
- given ($@) {
- when (undef) { say "No error"; }
- when ('open') { say "Error from open"; }
- when (':io') { say "Non-open, IO error."; }
- when (':all') { say "All other autodie errors." }
- default { say "Not an autodie error at all." }
- }
Under Perl 5.8, the given/when structure is not available, so the
following structure may be used:
- eval {
- use autodie;
- open(my $fh, '<', $some_file);
- my @records = <$fh>;
- # Do things with @records...
- close($fh);
- };
- if ($@ and $@->isa('autodie::exception')) {
- if ($@->matches('open')) { print "Error from open\n"; }
- if ($@->matches(':io' )) { print "Non-open, IO error."; }
- } elsif ($@) {
- # A non-autodie exception.
- }
See autodie::exception for further information on interrogating exceptions.
Autodie uses a simple set of categories to group together similar
built-ins. Requesting a category type (starting with a colon) will
enable autodie for all built-ins beneath that category. For example,
requesting :file
will enable autodie for close, fcntl,
fileno, open and sysopen.
The categories are currently:
- :all
- :default
- :io
- read
- seek
- sysread
- sysseek
- syswrite
- :dbm
- dbmclose
- dbmopen
- :file
- binmode
- close
- fcntl
- fileno
- flock
- ioctl
- open
- sysopen
- truncate
- :filesys
- chdir
- closedir
- opendir
- link
- mkdir
- readlink
- rename
- rmdir
- symlink
- unlink
- :ipc
- pipe
- :msg
- msgctl
- msgget
- msgrcv
- msgsnd
- :semaphore
- semctl
- semget
- semop
- :shm
- shmctl
- shmget
- shmread
- :socket
- accept
- bind
- connect
- getsockopt
- listen
- recv
- send
- setsockopt
- shutdown
- socketpair
- :threads
- fork
- :system
- system
- exec
Note that while the above category system is presently a strict hierarchy, this should not be assumed.
A plain use autodie
implies use autodie qw(:default)
. Note that
system and exec are not enabled by default. system requires
the optional IPC::System::Simple module to be installed, and enabling
system or exec will invalidate their exotic forms. See BUGS
below for more details.
The syntax:
- use autodie qw(:1.994);
allows the :default
list from a particular version to be used. This
provides the convenience of using the default methods, but the surety
that no behavioural changes will occur if the autodie
module is
upgraded.
autodie
can be enabled for all of Perl's built-ins, including
system and exec with:
- use autodie qw(:all);
It is not considered an error for flock to return false if it fails
due to an EWOULDBLOCK
(or equivalent) condition. This means one can
still use the common convention of testing the return value of
flock when called with the LOCK_NB
option:
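For example (using a temporary file so the snippet stands alone; a freshly created file can always be locked, so the first branch is taken here):

```perl
use Fcntl qw(:flock);
use File::Temp qw(tempfile);
use autodie;    # open and flock now die on real errors

my ( $fh, $filename ) = tempfile();

# A LOCK_NB request that merely cannot be granted (EWOULDBLOCK)
# still just returns false instead of throwing:
if ( flock( $fh, LOCK_EX | LOCK_NB ) ) {
    print "Lock obtained\n";
} else {
    print "Resource busy, try again later\n";
}
```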
Autodying flock will generate an exception if flock returns
false with any other error.
The system built-in is considered to have failed in the following
circumstances:
The command does not start.
The command is killed by a signal.
The command returns a non-zero exit value (but see below).
On success, the autodying form of system returns the exit value
rather than the contents of $?
.
Additional allowable exit values can be supplied as an optional first
argument to autodying system:
- system( [ 0, 1, 2 ], $cmd, @args); # 0,1,2 are good exit values
autodie
uses the IPC::System::Simple module to change system.
See its documentation for further information.
Applying autodie
to system or exec causes the exotic
forms system { $cmd } @args
or exec { $cmd } @args
to be considered a syntax error until the end of the lexical scope.
If you really need to use the exotic form, you can call CORE::system
or CORE::exec
instead, or use no autodie qw(system exec)
before
calling the exotic form.
Functions called in list context are assumed to have failed if they return an empty list, or a list consisting only of a single undef element.
The :void
option is supported in Fatal, but not
autodie
. To workaround this, autodie
may be explicitly disabled until
the end of the current block with no autodie
.
To disable autodie for only a single function (eg, open)
use no autodie qw(open)
.
autodie
performs no checking of called context to determine whether to throw
an exception; the explicitness of error handling with autodie
is a deliberate
feature.
You've insisted on hints for user-subroutines, either by pre-pending
a !
to the subroutine name itself, or earlier in the list of arguments
to autodie
. However the subroutine in question does not have
any hints available.
See also DIAGNOSTICS in Fatal.
"Used only once" warnings can be generated when autodie
or Fatal
is used with package filehandles (eg, FILE
). Scalar filehandles are
strongly recommended instead.
When using autodie
or Fatal
with user subroutines, the
declaration of those subroutines must appear before the first use of
Fatal
or autodie
, or have been exported from a module.
Attempting to use Fatal
or autodie
on other user subroutines will
result in a compile-time error.
Due to a bug in Perl, autodie
may "lose" any format which has the
same name as an autodying built-in or function.
autodie
may not work correctly if used inside a file with a
name that looks like a string eval, such as eval (3).
Due to the current implementation of autodie
, unexpected results
may be seen when used near or with the string version of eval.
None of these bugs exist when using block eval.
Under Perl 5.8 only, autodie
does not propagate into string eval
statements, although it can be explicitly enabled inside a string
eval.
Under Perl 5.10 only, using a string eval when autodie
is in
effect can cause the autodie behaviour to leak into the surrounding
scope. This can be worked around by using a no autodie
at the
end of the scope to explicitly remove autodie's effects, or by
avoiding the use of string eval.
None of these bugs exist when using block eval. The use of
autodie
with block eval is considered good practice.
Please report bugs via the CPAN Request Tracker at http://rt.cpan.org/NoAuth/Bugs.html?Dist=autodie.
If you find this module useful, please consider rating it on the CPAN Ratings service at http://cpanratings.perl.org/rate?distribution=autodie .
The module author loves to hear how autodie
has made your life
better (or worse). Feedback can be sent to
<pjf@perltraining.com.au>.
Copyright 2008-2009, Paul Fenwick <pjf@perltraining.com.au>
This module is free software. You may distribute it under the same terms as Perl itself.
Fatal, autodie::exception, autodie::hints, IPC::System::Simple
Perl tips, autodie at http://perltraining.com.au/tips/2008-08-20.html
Mark Reed and Roland Giersig -- Klingon translators.
See the AUTHORS file for full credits. The latest version of this file can be found at http://github.com/pfenwick/autodie/tree/master/AUTHORS .
autouse - postpone load of modules until a function is used
- use autouse 'Carp' => qw(carp croak);
- carp "this carp was predeclared and autoused ";
If the module Module
is already loaded, then the declaration
- use autouse 'Module' => qw(func1 func2($;$));
is equivalent to
- use Module qw(func1 func2);
if Module
defines func2() with prototype ($;$), and func1() has
no prototypes. (At least if Module
uses Exporter
's import,
otherwise it is a fatal error.)
If the module Module
is not loaded yet, then the above declaration
declares functions func1() and func2() in the current package. When
these functions are called, they load the package Module
if needed,
and substitute themselves with the correct definitions.
Using autouse
will move important steps of your program's execution
from compile time to runtime. This can
Break the execution of your program if the module you autouse
d has
some initialization which it expects to be done early.
Hide bugs in your code, since important checks (like correctness of
prototypes) are moved from compile time to runtime. In particular, if
the prototype you specified on the autouse
line is wrong, you will not
find out until the corresponding function is executed. This is
especially unfortunate for functions which are not always called (note that
using autouse
for such functions gives the biggest win; for a workaround
see below).
To alleviate the second problem (partially) it is advised to write your scripts like this:
The first line ensures that the errors in your argument specification are found early. When you ship your application you should comment out the first line, since it makes the second one useless.
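A sketch of that pattern, using Carp from the synopsis:

```perl
# During development: load Carp for real, so errors in the argument
# specification surface at compile time.
use Carp qw(carp croak);

# When shipping, comment out the line above and uncomment this one,
# which defers loading Carp until carp() or croak() is first called:
# use autouse 'Carp' => qw(carp croak);

carp "still works either way";
```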
Ilya Zakharevich (ilya@math.ohio-state.edu)
perl(1).
base - Establish an ISA relationship with base classes at compile time
- package Baz;
- use base qw(Foo Bar);
Unless you are using the fields
pragma, consider this module discouraged
in favor of the lighter-weight parent
.
Allows you to both load one or more modules, while setting up inheritance from those modules at the same time. Roughly similar in effect to
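The effect being compared to is the classic require-and-push-@ISA idiom. A self-contained sketch (Foo and Bar stand in for real base classes and are defined inline here, so the require step that base would perform is unnecessary):

```perl
# Inline stand-ins for real Foo.pm / Bar.pm:
package Foo; sub hello { "hello" }
package Bar; sub aloha { "aloha" }

package Baz;
BEGIN {
    # Roughly what 'use base qw(Foo Bar)' does, minus the require:
    push our @ISA, qw(Foo Bar);
}

package main;
print Baz->hello(), " ", Baz->aloha(), "\n";   # methods found via @ISA
```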
When base
tries to require a module, it will not die if it cannot find
the module's file, but will die on any other error. After all this, should
your base class be empty, containing no symbols, base
will die. This is
useful for inheriting from classes in the same file as yourself but where
the filename does not match the base module name, like so:
- # in Bar.pm
- package Foo;
- sub exclaim { "I can have such a thing?!" }
- package Bar;
- use base "Foo";
There is no Foo.pm, but because Foo
defines a symbol (the exclaim
subroutine), base
will not die when the require fails to load Foo.pm.
base
will also initialize the fields if one of the base classes has it.
Multiple inheritance of fields is NOT supported; if two or more base classes
each have inheritable fields, the 'base' pragma will croak. See fields
for a description of this feature.
The base class' import method is not called.
base.pm was unable to require the base package, because it was not found in your path.
Attempting to inherit from yourself generates a warning.
- package Foo;
- use base 'Foo';
This module was introduced with Perl 5.004_04.
Due to the limitations of the implementation, you must use base before you declare any of your own fields.
bigint - Transparent BigInteger support for Perl
- use bigint;
- $x = 2 + 4.5,"\n"; # BigInt 6
- print 2 ** 512,"\n"; # really is what you think it is
- print inf + 42,"\n"; # inf
- print NaN * 7,"\n"; # NaN
- print hex("0x1234567890123490"),"\n"; # Perl v5.10.0 or later
- {
- no bigint;
- print 2 ** 256,"\n"; # a normal Perl scalar now
- }
- # Import into current package:
- use bigint qw/hex oct/;
- print hex("0x1234567890123490"),"\n";
- print oct("01234567890123490"),"\n";
All operators (including basic math operations) except the range operator ..
are overloaded. Integer constants are created as proper BigInts.
Floating point constants are truncated to integer. All parts and results of expressions are also truncated.
Unlike integer, this pragma creates integer constants that are only limited in their size by the available memory and CPU time.
There is one small difference between use integer
and use bigint
: the
former will not affect assignments to variables and the return value of
some functions. bigint
truncates these results to integer too:
- # perl -Minteger -wle 'print 3.2'
- 3.2
- # perl -Minteger -wle 'print 3.2 + 0'
- 3
- # perl -Mbigint -wle 'print 3.2'
- 3
- # perl -Mbigint -wle 'print 3.2 + 0'
- 3
- # perl -Mbigint -wle 'print exp(1) + 0'
- 2
- # perl -Mbigint -wle 'print exp(1)'
- 2
- # perl -Minteger -wle 'print exp(1)'
- 2.71828182845905
- # perl -Minteger -wle 'print exp(1) + 0'
- 2
In practice this seldom makes a difference, as parts and results of expressions will be truncated anyway, but it can, for instance, affect the return value of subroutines:
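A sketch of that effect (the subroutines are hypothetical; only the constant compiled under bigint is truncated):

```perl
sub plain { 3.2 }        # compiled before the pragma: returns 3.2

use bigint;
sub trunc { 3.2 }        # compiled under bigint: the constant becomes BigInt 3

print plain(), " ", trunc(), "\n";   # prints "3.2 3"
```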
bigint recognizes some options that can be passed while loading it via use. The options can (currently) be either a single letter form, or the long form. The following options exist:
This sets the accuracy for all math operations. The argument must be greater than or equal to zero. See Math::BigInt's bround() function for details.
- perl -Mbigint=a,2 -le 'print 12345+1'
Note that setting precision and accuracy at the same time is not possible.
This sets the precision for all math operations. The argument can be any integer. Negative values mean a fixed number of digits after the dot, and are ignored since all operations happen in integer space. A positive value rounds to this digit left of the dot. 0 or 1 mean round to integer and are ignored like negative values.
See Math::BigInt's bfround() function for details.
- perl -Mbignum=p,5 -le 'print 123456789+123'
Note that setting precision and accuracy at the same time is not possible.
This enables a trace mode and is primarily for debugging bigint or Math::BigInt.
Override the built-in hex() method with a version that can handle big integers. This overrides it by exporting it to the current package. Under Perl v5.10.0 and higher, this is not so necessary, as hex() is lexically overridden in the current scope whenever the bigint pragma is active.
Override the built-in oct() method with a version that can handle big integers. This overrides it by exporting it to the current package. Under Perl v5.10.0 and higher, this is not so necessary, as oct() is lexically overridden in the current scope whenever the bigint pragma is active.
Load a different math lib, see Math Library.
- perl -Mbigint=lib,GMP -e 'print 2 ** 512'
- perl -Mbigint=try,GMP -e 'print 2 ** 512'
- perl -Mbigint=only,GMP -e 'print 2 ** 512'
Currently there is no way to specify more than one library on the command line. This means the following does not work:
- perl -Mbignum=l,GMP,Pari -e 'print 2 ** 512'
This will be hopefully fixed soon ;)
This prints out the name and version of all modules used and then exits.
- perl -Mbigint=v
Math with the numbers is done (by default) by a module called Math::BigInt::Calc. This is equivalent to saying:
- use bigint lib => 'Calc';
You can change this by using:
- use bignum lib => 'GMP';
The following would first try to find Math::BigInt::Foo, then Math::BigInt::Bar, and when this also fails, revert to Math::BigInt::Calc:
- use bigint lib => 'Foo,Math::BigInt::Bar';
Using lib
warns if none of the specified libraries can be found and
Math::BigInt did fall back to one of the default libraries.
To suppress this warning, use try
instead:
- use bignum try => 'GMP';
If you want the code to die instead of falling back, use only
instead:
- use bignum only => 'GMP';
Please see respective module documentation for further details.
The numbers are stored as objects, and their internals might change at any time, especially between math operations. The objects also might belong to different classes, like Math::BigInt, or Math::BigInt::Lite. Mixing them together, even with normal scalars, is not extraordinary, but normal and expected.
You should not depend on the internal format; all accesses must go through accessor methods. E.g. looking at $x->{sign} is not a good idea, since there is no guarantee that the object in question has such a hash key, nor that there is a hash underneath at all.
The sign is either '+', '-', 'NaN', '+inf' or '-inf'. You can access it with the sign() method.
A sign of 'NaN' is used to represent the result when input arguments are not numbers or as a result of 0/0. '+inf' and '-inf' represent plus respectively minus infinity. You will get '+inf' when dividing a positive number by 0, and '-inf' when dividing any negative number by 0.
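A small sketch of inspecting the sign, following the rules above (dividing a positive number by 0 yields '+inf'):

```perl
use bigint;

my $neg = -7;
my $inf = 1 / 0;    # positive number divided by 0: '+inf'

print $neg->sign(), " ", $inf->sign(), "\n";
```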
Since all numbers are now objects, you can use all functions that are part of the BigInt API. You can only use the bxxx() notation, and not the fxxx() notation, though.
But a warning is in order. When using the following to make a copy of a number, only a shallow copy will be made.
- $x = 9; $y = $x;
- $x = $y = 7;
Using the copy or the original with overloaded math is okay, e.g. the following work:
- $x = 9; $y = $x;
- print $x + 1, " ", $y,"\n"; # prints 10 9
but calling any method that modifies the number directly will result in both the original and the copy being destroyed:
Using methods that do not modify, but test the contents works:
- $x = 9; $y = $x;
- $z = 9 if $x->is_zero(); # works fine
See the documentation about the copy constructor and = in overload, as well as the documentation in BigInt for further details.
A shortcut to return Math::BigInt->binf(). Useful because Perl does not always handle the bareword inf properly.
A shortcut to return Math::BigInt->bnan(). Useful because Perl does not always handle the bareword NaN properly.
- # perl -Mbigint=e -wle 'print e'
Returns Euler's number e, aka exp(1). Note that under bigint, this is truncated to an integer, and hence simply '2'.
- # perl -Mbigint=PI -wle 'print PI'
Returns PI. Note that under bigint, this is truncated to an integer, and hence simply '3'.
- bexp($power,$accuracy);
Returns Euler's number e raised to the appropriate power, to the wanted accuracy. Note that under bigint, the result is truncated to an integer.
Example:
- # perl -Mbigint=bexp -wle 'print bexp(1,80)'
- bpi($accuracy);
Returns PI to the wanted accuracy. Note that under bigint, this is truncated to an integer, and hence simply '3'.
Example:
- # perl -Mbigint=bpi -wle 'print bpi(80)'
Returns the class that numbers are upgraded to; it is in fact returning $Math::BigInt::upgrade.
Returns true if bigint is in effect in the current scope, false otherwise.
This method only works on Perl v5.9.4 or later.
Perl does not allow overloading of ranges, so you can neither safely use ranges with bigint endpoints, nor is the iterator variable a bigint.
This method only works on Perl v5.9.4 or later.
bigint overrides these routines (hex() and oct()) with versions that can also handle big integer values. Under Perl prior to version v5.9.4, however, this will not happen unless you specifically ask for it with the two import tags "hex" and "oct" - and then it will be global and cannot be disabled inside a scope with "no bigint":
The second call to hex() will warn about a non-portable constant.
Compare this to:
bigint is just a thin wrapper around various modules of the Math::BigInt family. Think of it as the head of the family, who runs the shop, and orders the others to do the work.
The following modules are currently used by bigint:
- Math::BigInt::Lite (for speed, and only if it is loadable)
- Math::BigInt
Some cool command line examples to impress the Python crowd ;) You might want to compare them to the results under -Mbignum or -Mbigrat:
- perl -Mbigint -le 'print sqrt(33)'
- perl -Mbigint -le 'print 2*255'
- perl -Mbigint -le 'print 4.5+2*255'
- perl -Mbigint -le 'print 3/7 + 5/7 + 8/3'
- perl -Mbigint -le 'print 123->is_odd()'
- perl -Mbigint -le 'print log(2)'
- perl -Mbigint -le 'print 2 ** 0.5'
- perl -Mbigint=a,65 -le 'print 2 ** 0.2'
- perl -Mbignum=a,65,l,GMP -le 'print 7 ** 7777'
This program is free software; you may redistribute it and/or modify it under the same terms as Perl itself.
Especially bigrat, as in perl -Mbigrat -le 'print 1/3+1/4', and bignum, as in perl -Mbignum -le 'print sqrt(2)'.
Math::BigInt, Math::BigRat and Math::Big as well as Math::BigInt::Pari and Math::BigInt::GMP.
(C) by Tels http://bloodgate.com/ in early 2002 - 2007.
bignum - Transparent BigNumber support for Perl
- use bignum;
- $x = 2 + 4.5,"\n"; # BigFloat 6.5
- print 2 ** 512 * 0.1,"\n"; # really is what you think it is
- print inf * inf,"\n"; # prints inf
- print NaN * 3,"\n"; # prints NaN
- {
- no bignum;
- print 2 ** 256,"\n"; # a normal Perl scalar now
- }
- # for older Perls, import into current package:
- use bignum qw/hex oct/;
- print hex("0x1234567890123490"),"\n";
- print oct("01234567890123490"),"\n";
All operators (including basic math operations) are overloaded. Integer and floating-point constants are created as proper BigInts or BigFloats, respectively.
If you do
- use bignum;
at the top of your script, Math::BigFloat and Math::BigInt will be loaded and any constant number will be converted to an object (Math::BigFloat for floats like 3.1415 and Math::BigInt for integers like 1234).
So, the following line:
- $x = 1234;
actually creates a Math::BigInt object and stores a reference to it in $x. This happens transparently and behind your back, so to speak.
You can see this with the following:
- perl -Mbignum -le 'print ref(1234)'
Don't worry if it says Math::BigInt::Lite, bignum and friends will use Lite if it is installed since it is faster for some operations. It will be automatically upgraded to BigInt whenever necessary:
- perl -Mbignum -le 'print ref(2**255)'
This also means it is a bad idea to check for some specific package, since the actual contents of $x might be something unexpected. Due to the transparent way bignum works, using ref() should not be necessary anyway.
Since Math::BigInt and BigFloat also overload the normal math operations, the following line will still work:
- perl -Mbignum -le 'print ref(1234+1234)'
Since numbers are actually objects, you can call all the usual methods from BigInt/BigFloat on them. This even works to some extent on expressions:
- perl -Mbignum -le '$x = 1234; print $x->bdec()'
- perl -Mbignum -le 'print 1234->copy()->binc();'
- perl -Mbignum -le 'print 1234->copy()->binc->badd(6);'
- perl -Mbignum -le 'print +(1234)->copy()->binc()'
(Note that print doesn't do what you expect if the expression starts with '(', hence the +.)
You can even chain the operations together as usual:
- perl -Mbignum -le 'print 1234->copy()->binc->badd(6);'
- 1241
Under bignum (or bigint or bigrat), Perl will "upgrade" the numbers appropriately. This means that:
- perl -Mbignum -le 'print 1234+4.5'
- 1238.5
will work correctly. These mixed cases don't always work when using Math::BigInt or Math::BigFloat alone, or at least not in the way normal Perl scalars work.
If you do want to work with large integers, like under use integer;, try use bigint; instead:
- perl -Mbigint -le 'print 1234.5+4.5'
- 1238
There is also use bigrat;, which gives you big rationals:
- perl -Mbigrat -le 'print 1234+4.1'
- 12381/10
The entire upgrading/downgrading is still experimental and might not work as you expect or may even have bugs. You might get errors like this:
- Can't use an undefined value as an ARRAY reference at
- /usr/local/lib/perl5/5.8.0/Math/BigInt/Calc.pm line 864
This means somewhere a routine got a BigFloat/Lite but expected a BigInt (or vice versa) and the upgrade/downgrade path was missing. This is a bug; please report it so that we can fix it.
You might consider using just Math::BigInt or Math::BigFloat, since they allow you finer control over what gets done in which module/space. For instance, simple loop counters will be Math::BigInts under use bignum; and this is slower than keeping them as Perl scalars:
- perl -Mbignum -le 'for ($i = 0; $i < 10; $i++) { print ref($i); }'
Please note the following does not work as expected (prints nothing), since overloading of '..' is not yet possible in Perl (as of v5.8.0):
- perl -Mbignum -le 'for (1..2) { print ref($_); }'
bignum recognizes some options that can be passed while loading it via use. The options can (currently) be either a single letter form, or the long form. The following options exist:
This sets the accuracy for all math operations. The argument must be greater than or equal to zero. See Math::BigInt's bround() function for details.
- perl -Mbignum=a,50 -le 'print sqrt(20)'
Note that setting precision and accuracy at the same time is not possible.
This sets the precision for all math operations. The argument can be any integer. Negative values mean a fixed number of digits after the dot, while a positive value rounds to this digit left from the dot. 0 or 1 mean round to integer. See Math::BigInt's bfround() function for details.
- perl -Mbignum=p,-50 -le 'print sqrt(20)'
Note that setting precision and accuracy at the same time is not possible.
This enables a trace mode and is primarily for debugging bignum or Math::BigInt/Math::BigFloat.
Load a different math lib, see Math Library.
- perl -Mbignum=l,GMP -e 'print 2 ** 512'
Currently there is no way to specify more than one library on the command line. This means the following does not work:
- perl -Mbignum=l,GMP,Pari -e 'print 2 ** 512'
This will be hopefully fixed soon ;)
Override the built-in hex() method with a version that can handle big numbers. This overrides it by exporting it to the current package. Under Perl v5.10.0 and higher, this is not so necessary, as hex() is lexically overridden in the current scope whenever the bignum pragma is active.
Override the built-in oct() method with a version that can handle big numbers. This overrides it by exporting it to the current package. Under Perl v5.10.0 and higher, this is not so necessary, as oct() is lexically overridden in the current scope whenever the bignum pragma is active.
This prints out the name and version of all modules used and then exits.
- perl -Mbignum=v
Besides import() and AUTOLOAD(), there are only a few other methods.
Since all numbers are now objects, you can use all functions that are part of the BigInt or BigFloat API. It is wise to use only the bxxx() notation, and not the fxxx() notation, though. This keeps you independent of the fact that the underlying object might morph into a different class than BigFloat.
But a warning is in order. When using the following to make a copy of a number, only a shallow copy will be made.
- $x = 9; $y = $x;
- $x = $y = 7;
If you want to make a real copy, use the following:
- $y = $x->copy();
Using the copy or the original with overloaded math is okay, e.g. the following work:
- $x = 9; $y = $x;
- print $x + 1, " ", $y,"\n"; # prints 10 9
but calling any method that modifies the number directly will result in both the original and the copy being destroyed:
Using methods that do not modify, but test the contents works:
- $x = 9; $y = $x;
- $z = 9 if $x->is_zero(); # works fine
See the documentation about the copy constructor and = in overload, as well as the documentation in BigInt for further details.
A shortcut to return Math::BigInt->binf(). Useful because Perl does not always handle the bareword inf properly.
A shortcut to return Math::BigInt->bnan(). Useful because Perl does not always handle the bareword NaN properly.
- # perl -Mbignum=e -wle 'print e'
Returns Euler's number e, aka exp(1).
- # perl -Mbignum=PI -wle 'print PI'
Returns PI.
- bexp($power,$accuracy);
Returns Euler's number e raised to the appropriate power, to the wanted accuracy.
Example:
- # perl -Mbignum=bexp -wle 'print bexp(1,80)'
- bpi($accuracy);
Returns PI to the wanted accuracy.
Example:
- # perl -Mbignum=bpi -wle 'print bpi(80)'
Returns the class that numbers are upgraded to; it is in fact returning $Math::BigInt::upgrade.
Returns true if bignum is in effect in the current scope, false otherwise.
This method only works on Perl v5.9.4 or later.
Math with the numbers is done (by default) by a module called Math::BigInt::Calc. This is equivalent to saying:
- use bignum lib => 'Calc';
You can change this by using:
- use bignum lib => 'GMP';
The following would first try to find Math::BigInt::Foo, then Math::BigInt::Bar, and when this also fails, revert to Math::BigInt::Calc:
- use bignum lib => 'Foo,Math::BigInt::Bar';
Please see respective module documentation for further details.
Using lib warns if none of the specified libraries can be found and Math::BigInt fell back to one of the default libraries. To suppress this warning, use try instead:
- use bignum try => 'GMP';
If you want the code to die instead of falling back, use only
instead:
- use bignum only => 'GMP';
The numbers are stored as objects, and their internals might change at any time, especially between math operations. The objects also might belong to different classes, like Math::BigInt or Math::BigFloat. Mixing them together, even with normal scalars, is not extraordinary, but normal and expected.
You should not depend on the internal format; all accesses must go through accessor methods. E.g. looking at $x->{sign} is not a good idea since there is no guarantee that the object in question has such a hash key, nor that there is a hash underneath at all.
The sign is either '+', '-', 'NaN', '+inf' or '-inf' and stored separately. You can access it with the sign() method.
A sign of 'NaN' is used to represent the result when input arguments are not numbers or as a result of 0/0. '+inf' and '-inf' represent plus respectively minus infinity. You will get '+inf' when dividing a positive number by 0, and '-inf' when dividing any negative number by 0.
This method only works on Perl v5.9.4 or later.
bigint overrides these routines (hex() and oct()) with versions that can also handle big integer values. Under Perl prior to version v5.9.4, however, this will not happen unless you specifically ask for it with the two import tags "hex" and "oct" - and then it will be global and cannot be disabled inside a scope with "no bigint":
The second call to hex() will warn about a non-portable constant.
Compare this to:
bignum is just a thin wrapper around various modules of the Math::BigInt family. Think of it as the head of the family, who runs the shop, and orders the others to do the work.
The following modules are currently used by bignum:
- Math::BigInt::Lite (for speed, and only if it is loadable)
- Math::BigInt
- Math::BigFloat
Some cool command line examples to impress the Python crowd ;)
- perl -Mbignum -le 'print sqrt(33)'
- perl -Mbignum -le 'print 2*255'
- perl -Mbignum -le 'print 4.5+2*255'
- perl -Mbignum -le 'print 3/7 + 5/7 + 8/3'
- perl -Mbignum -le 'print 123->is_odd()'
- perl -Mbignum -le 'print log(2)'
- perl -Mbignum -le 'print exp(1)'
- perl -Mbignum -le 'print 2 ** 0.5'
- perl -Mbignum=a,65 -le 'print 2 ** 0.2'
- perl -Mbignum=a,65,l,GMP -le 'print 7 ** 7777'
This program is free software; you may redistribute it and/or modify it under the same terms as Perl itself.
Especially bigrat, as in perl -Mbigrat -le 'print 1/3+1/4'.
Math::BigFloat, Math::BigInt, Math::BigRat and Math::Big as well as Math::BigInt::Pari and Math::BigInt::GMP.
(C) by Tels http://bloodgate.com/ in early 2002 - 2007.
bigrat - Transparent BigNumber/BigRational support for Perl
All operators (including basic math operations) are overloaded. Integer and floating-point constants are created as proper BigInts or BigFloats, respectively.
Unlike bignum, this module upgrades to Math::BigRat, meaning that instead of 2.5 you will get 2+1/2 as output.
bigrat is just a thin wrapper around various modules of the Math::BigInt family. Think of it as the head of the family, who runs the shop, and orders the others to do the work.
The following modules are currently used by bigrat:
- Math::BigInt::Lite (for speed, and only if it is loadable)
- Math::BigInt
- Math::BigFloat
- Math::BigRat
Math with the numbers is done (by default) by a module called Math::BigInt::Calc. This is equivalent to saying:
- use bigrat lib => 'Calc';
You can change this by using:
- use bigrat lib => 'GMP';
The following would first try to find Math::BigInt::Foo, then Math::BigInt::Bar, and when this also fails, revert to Math::BigInt::Calc:
- use bigrat lib => 'Foo,Math::BigInt::Bar';
Using lib warns if none of the specified libraries can be found and Math::BigInt fell back to one of the default libraries. To suppress this warning, use try instead:
- use bigrat try => 'GMP';
If you want the code to die instead of falling back, use only instead:
- use bigrat only => 'GMP';
Please see the respective module documentation for further details.
The sign is either '+', '-', 'NaN', '+inf' or '-inf'.
A sign of 'NaN' is used to represent the result when input arguments are not numbers or as a result of 0/0. '+inf' and '-inf' represent plus respectively minus infinity. You will get '+inf' when dividing a positive number by 0, and '-inf' when dividing any negative number by 0.
Since all numbers are now objects, you can use all functions that are part of the BigInt or BigFloat API. It is wise to use only the bxxx() notation, and not the fxxx() notation, though. This keeps you independent of the fact that the underlying object might morph into a different class than BigFloat.
A shortcut to return Math::BigInt->binf(). Useful because Perl does not always handle the bareword inf properly.
A shortcut to return Math::BigInt->bnan(). Useful because Perl does not always handle the bareword NaN properly.
- # perl -Mbigrat=e -wle 'print e'
Returns Euler's number e, aka exp(1).
- # perl -Mbigrat=PI -wle 'print PI'
Returns PI.
- bexp($power,$accuracy);
Returns Euler's number e raised to the appropriate power, to the wanted accuracy.
Example:
- # perl -Mbigrat=bexp -wle 'print bexp(1,80)'
- bpi($accuracy);
Returns PI to the wanted accuracy.
Example:
- # perl -Mbigrat=bpi -wle 'print bpi(80)'
Returns the class that numbers are upgraded to; it is in fact returning $Math::BigInt::upgrade.
Returns true if bigrat is in effect in the current scope, false otherwise.
This method only works on Perl v5.9.4 or later.
But a warning is in order. When using the following to make a copy of a number, only a shallow copy will be made.
- $x = 9; $y = $x;
- $x = $y = 7;
If you want to make a real copy, use the following:
- $y = $x->copy();
Using the copy or the original with overloaded math is okay, e.g. the following work:
- $x = 9; $y = $x;
- print $x + 1, " ", $y,"\n"; # prints 10 9
but calling any method that modifies the number directly will result in both the original and the copy being destroyed:
Using methods that do not modify, but test the contents works:
- $x = 9; $y = $x;
- $z = 9 if $x->is_zero(); # works fine
See the documentation about the copy constructor and = in overload, as well as the documentation in BigInt for further details.
bigrat recognizes some options that can be passed while loading it via use. The options can (currently) be either a single letter form, or the long form. The following options exist:
This sets the accuracy for all math operations. The argument must be greater than or equal to zero. See Math::BigInt's bround() function for details.
- perl -Mbigrat=a,50 -le 'print sqrt(20)'
Note that setting precision and accuracy at the same time is not possible.
This sets the precision for all math operations. The argument can be any integer. Negative values mean a fixed number of digits after the dot, while a positive value rounds to this digit left from the dot. 0 or 1 mean round to integer. See Math::BigInt's bfround() function for details.
- perl -Mbigrat=p,-50 -le 'print sqrt(20)'
Note that setting precision and accuracy at the same time is not possible.
This enables a trace mode and is primarily for debugging bigrat or Math::BigInt/Math::BigFloat.
Load a different math lib, see MATH LIBRARY.
- perl -Mbigrat=l,GMP -e 'print 2 ** 512'
Currently there is no way to specify more than one library on the command line. This means the following does not work:
- perl -Mbigrat=l,GMP,Pari -e 'print 2 ** 512'
This will hopefully be fixed soon ;)
Override the built-in hex() method with a version that can handle big numbers. This overrides it by exporting it to the current package. Under Perl v5.10.0 and higher, this is not so necessary, as hex() is lexically overridden in the current scope whenever the bigrat pragma is active.
Override the built-in oct() method with a version that can handle big numbers. This overrides it by exporting it to the current package. Under Perl v5.10.0 and higher, this is not so necessary, as oct() is lexically overridden in the current scope whenever the bigrat pragma is active.
This prints out the name and version of all modules used and then exits.
- perl -Mbigrat=v
This method only works on Perl v5.9.4 or later.
bigint overrides these routines (hex() and oct()) with versions that can also handle big integer values. Under Perl prior to version v5.9.4, however, this will not happen unless you specifically ask for it with the two import tags "hex" and "oct" - and then it will be global and cannot be disabled inside a scope with "no bigint":
The second call to hex() will warn about a non-portable constant.
Compare this to:
- perl -Mbigrat -le 'print sqrt(33)'
- perl -Mbigrat -le 'print 2*255'
- perl -Mbigrat -le 'print 4.5+2*255'
- perl -Mbigrat -le 'print 3/7 + 5/7 + 8/3'
- perl -Mbigrat -le 'print 12->is_odd()';
- perl -Mbignum=l,GMP -le 'print 7 ** 7777'
This program is free software; you may redistribute it and/or modify it under the same terms as Perl itself.
Especially bignum.
Math::BigFloat, Math::BigInt, Math::BigRat and Math::Big as well as Math::BigInt::Pari and Math::BigInt::GMP.
(C) by Tels http://bloodgate.com/ in early 2002 - 2007.
blib - Use MakeMaker's uninstalled version of a package
- perl -Mblib script [args...]
- perl -Mblib=dir script [args...]
Looks for MakeMaker-like 'blib' directory structure starting in dir (or current directory) and working back up to five levels of '..'.
Intended for use on command line with -M option as a way of testing arbitrary scripts against an uninstalled version of a package.
However it is possible to:
- use blib;
- use blib '..';
etc. if you really must.
Pollutes the global namespace for a development-only task.
Nick Ing-Simmons nik@tiuk.ti.com
bytes - Perl pragma to force byte semantics rather than character semantics
This pragma reflects early attempts to incorporate Unicode into perl and has since been superseded. It breaks encapsulation (i.e. it exposes the innards of how the perl executable currently happens to store a string), and use of this module for anything other than debugging purposes is strongly discouraged. If you feel that the functions herein might be useful for your application, this possibly indicates a mismatch between your mental model of Perl Unicode and the current reality. In that case, you may wish to read some of the perl Unicode documentation: perluniintro, perlunitut, perlunifaq and perlunicode.
The use bytes pragma disables character semantics for the rest of the lexical scope in which it appears. no bytes can be used to reverse the effect of use bytes within the current lexical scope.
Perl normally assumes character semantics in the presence of character data (i.e. data that has come from a source that has been marked as being of a particular character encoding). When use bytes is in effect, the encoding is temporarily ignored, and each string is treated as a series of bytes.
As an example, when Perl sees $x = chr(400), it encodes the character in UTF-8 and stores it in $x. Then it is marked as character data, so, for instance, length $x returns 1. However, in the scope of the bytes pragma, $x is treated as a series of bytes - the bytes that make up the UTF-8 encoding - and length $x returns 2:
chr(), ord(), substr(), index() and rindex() behave similarly.
For more on the implications and differences between character semantics and byte semantics, see perluniintro and perlunicode.
bytes::substr() does not work as an lvalue.
c2ph, pstruct - Dump C structures as generated from cc -g -S stabs
- c2ph [-dpnP] [var=val] [files ...]
- Options:
- -w wide; short for: type_width=45 member_width=35 offset_width=8
- -x hex; short for: offset_fmt=x offset_width=08 size_fmt=x size_width=04
- -n do not generate perl code (default when invoked as pstruct)
- -p generate perl code (default when invoked as c2ph)
- -v generate perl code, with C decls as comments
- -i do NOT recompute sizes for intrinsic datatypes
- -a dump information on intrinsics also
- -t trace execution
- -d spew reams of debugging output
- -slist give comma-separated list of structures to dump
The following is the old c2ph.doc documentation by Tom Christiansen <tchrist@perl.com> Date: 25 Jul 91 08:10:21 GMT
Once upon a time, I wrote a program called pstruct. It was a perl program that tried to parse out C structures and display their member offsets for you. This was especially useful for people looking at binary dumps or poking around the kernel.
Pstruct was not a pretty program. Neither was it particularly robust. The problem, you see, was that the C compiler was much better at parsing C than I could ever hope to be.
So I got smart: I decided to be lazy and let the C compiler parse the C, which would spit out debugger stabs for me to read. These were much easier to parse. It's still not a pretty program, but at least it's more robust.
Pstruct takes any .c or .h files, or preferably .s ones, since that's the format it is going to massage them into anyway, and spits out listings like this:
- struct tty {
- int tty.t_locker 000 4
- int tty.t_mutex_index 004 4
- struct tty * tty.t_tp_virt 008 4
- struct clist tty.t_rawq 00c 20
- int tty.t_rawq.c_cc 00c 4
- int tty.t_rawq.c_cmax 010 4
- int tty.t_rawq.c_cfx 014 4
- int tty.t_rawq.c_clx 018 4
- struct tty * tty.t_rawq.c_tp_cpu 01c 4
- struct tty * tty.t_rawq.c_tp_iop 020 4
- unsigned char * tty.t_rawq.c_buf_cpu 024 4
- unsigned char * tty.t_rawq.c_buf_iop 028 4
- struct clist tty.t_canq 02c 20
- int tty.t_canq.c_cc 02c 4
- int tty.t_canq.c_cmax 030 4
- int tty.t_canq.c_cfx 034 4
- int tty.t_canq.c_clx 038 4
- struct tty * tty.t_canq.c_tp_cpu 03c 4
- struct tty * tty.t_canq.c_tp_iop 040 4
- unsigned char * tty.t_canq.c_buf_cpu 044 4
- unsigned char * tty.t_canq.c_buf_iop 048 4
- struct clist tty.t_outq 04c 20
- int tty.t_outq.c_cc 04c 4
- int tty.t_outq.c_cmax 050 4
- int tty.t_outq.c_cfx 054 4
- int tty.t_outq.c_clx 058 4
- struct tty * tty.t_outq.c_tp_cpu 05c 4
- struct tty * tty.t_outq.c_tp_iop 060 4
- unsigned char * tty.t_outq.c_buf_cpu 064 4
- unsigned char * tty.t_outq.c_buf_iop 068 4
- (*int)() tty.t_oproc_cpu 06c 4
- (*int)() tty.t_oproc_iop 070 4
- (*int)() tty.t_stopproc_cpu 074 4
- (*int)() tty.t_stopproc_iop 078 4
- struct thread * tty.t_rsel 07c 4
etc.
Actually, this was generated by a particular set of options. You can control the formatting of each column, whether you prefer wide or fat, hex or decimal, leading zeroes or whatever.
All you need to be able to use this is a C compiler that generates BSD/GCC-style stabs. The -g option on native BSD compilers and GCC should get this for you.
To learn more, just type a bogus option, like -\?, and a long usage message will be provided. There are a fair number of possibilities.
If you're only a C programmer, then this is the end of the message for you. You can quit right now, and if you care to, save off the source and run it when you feel like it. Or not.
But if you're a perl programmer, then for you I have something much more wondrous than just a structure offset printer.
You see, if you call pstruct by its other incybernation, c2ph, you have a code generator that translates C code into perl code! Well, structure and union declarations at least, but that's quite a bit.
Prior to this point, anyone programming in perl who wanted to interact with C programs, like the kernel, was forced to guess the layouts of the C structures, and then hardwire these into his program. Of course, when you took your wonderfully crafted program to a system where the sgtty structure was laid out differently, your program broke. Which is a shame.
We've had Larry's h2ph translator, which helped, but that only works on cpp symbols, not real C, which was also very much needed. What I offer you is a symbolic way of getting at all the C structures. I've couched them in terms of packages and functions. Consider the following program:
- #!/usr/local/bin/perl
- require 'syscall.ph';
- require 'sys/time.ph';
- require 'sys/resource.ph';
- $ru = "\0" x &rusage'sizeof();
- syscall(&SYS_getrusage, &RUSAGE_SELF, $ru) && die "getrusage: $!";
- @ru = unpack($t = &rusage'typedef(), $ru);
- $utime = $ru[ &rusage'ru_utime + &timeval'tv_sec ]
- + ($ru[ &rusage'ru_utime + &timeval'tv_usec ]) / 1e6;
- $stime = $ru[ &rusage'ru_stime + &timeval'tv_sec ]
- + ($ru[ &rusage'ru_stime + &timeval'tv_usec ]) / 1e6;
- printf "you have used %8.3fs+%8.3fu seconds.\n", $utime, $stime;
As you see, the name of the package is the name of the structure. Regular fields are just their own names. Plus the following accessor functions are provided for your convenience:
- struct This takes no arguments, and is merely the number of first-level
- elements in the structure. You would use this for indexing
- into arrays of structures, perhaps like this
- $usec = $u[ &user'u_utimer
- + (&ITIMER_VIRTUAL * &itimerval'struct)
- + &itimerval'it_value
- + &timeval'tv_usec
- ];
- sizeof Returns the bytes in the structure, or the member if
- you pass it an argument, such as
- &rusage'sizeof(&rusage'ru_utime)
- typedef This is the perl format definition for passing to pack and
- unpack. If you ask for the typedef of a nothing, you get
- the whole structure, otherwise you get that of the member
- you ask for. Padding is taken care of, as is the magic to
- guarantee that a union is unpacked into all its aliases.
- Bitfields are not quite yet supported however.
- offsetof This function is the byte offset into the array of that
- member. You may wish to use this for indexing directly
- into the packed structure with vec() if you're too lazy
- to unpack it.
- typeof Not to be confused with the typedef accessor function, this
- one returns the C type of that field. This would allow
- you to print out a nice structured pretty print of some
- structure without knowing anything about it beforehand.
- No args to this one is a noop. Someday I'll post such
- a thing to dump out your u structure for you.
The way I see this being used is like basically this:
- % h2ph <some_include_file.h > /usr/lib/perl/tmp.ph
- % c2ph some_include_file.h >> /usr/lib/perl/tmp.ph
- % install
It's a little trickier with c2ph because you have to get the includes right. I can't know this for your system, but it's not usually too terribly difficult.
The code isn't pretty as I mentioned -- I never thought it would be a 1000-line program when I started, or I might not have begun. :-) But I would have been less cavalier in how the parts of the program communicated with each other, etc. It might also have helped if I didn't have to divine the makeup of the stabs on the fly, and then account for micro differences between my compiler and gcc.
Anyway, here it is. Should run on perl v4 or greater. Maybe less.
- --tom
charnames - access to Unicode character names and named character sequences; also define character names
- use charnames ':full';
- print "\N{GREEK SMALL LETTER SIGMA} is called sigma.\n";
- print "\N{LATIN CAPITAL LETTER E WITH VERTICAL LINE BELOW}",
- " is an officially named sequence of two Unicode characters\n";
- use charnames ':loose';
- print "\N{Greek small-letter sigma}",
- "can be used to ignore case, underscores, most blanks,"
- "and when you aren't sure if the official name has hyphens\n";
- use charnames ':short';
- print "\N{greek:Sigma} is an upper-case sigma.\n";
- use charnames qw(cyrillic greek);
- print "\N{sigma} is Greek sigma, and \N{be} is Cyrillic b.\n";
- use utf8;
- use charnames ":full", ":alias" => {
- e_ACUTE => "LATIN SMALL LETTER E WITH ACUTE",
- mychar => 0xE8000, # Private use area
- "自転車に乗る人" => "BICYCLIST"
- };
- print "\N{e_ACUTE} is a small letter e with an acute.\n";
- print "\N{mychar} allows me to name private use characters.\n";
- print "And I can create synonyms in other languages,",
- " such as \N{自転車に乗る人} for BICYCLIST (U+1F6B4)\n";
- use charnames ();
- print charnames::viacode(0x1234); # prints "ETHIOPIC SYLLABLE SEE"
- printf "%04X", charnames::vianame("GOTHIC LETTER AHSA"); # prints
- # "10330"
- print charnames::vianame("LATIN CAPITAL LETTER A"); # prints 65 on
- # ASCII platforms;
- # 193 on EBCDIC
- print charnames::string_vianame("LATIN CAPITAL LETTER A"); # prints "A"
Pragma use charnames is used to gain access to the names of the Unicode characters and named character sequences, and to allow you to define your own character and character sequence names.
All forms of the pragma enable use of the following 3 functions:
charnames::string_vianame(name) for run-time lookup of a either a character name or a named character sequence, returning its string representation
charnames::vianame(name) for run-time lookup of a character name (but not a named character sequence) to get its ordinal value (code point)
charnames::viacode(code) for run-time lookup of a code point to get its Unicode name.
Starting in Perl v5.16, any occurrence of \N{CHARNAME} sequences in a double-quotish string automatically loads this module with arguments :full and :short (described below) if it hasn't already been loaded with different arguments, in order to compile the named Unicode character into position in the string. Prior to v5.16, an explicit use charnames was required to enable this usage. (However, prior to v5.16, the form "use charnames ();" did not enable \N{CHARNAME}.)
Note that \N{U+...}, where the ... is a hexadecimal number, also inserts a character into a string. The character it inserts is the one whose code point (ordinal value) is equal to the number. For example, "\N{U+263a}" is the Unicode (white background, black foreground) smiley face, equivalent to "\N{WHITE SMILING FACE}".
Also note, \N{...} can mean a regex quantifier instead of a character name, when the ... is a number (or a comma-separated pair of numbers; see QUANTIFIERS in perlreref), and is not related to this pragma.
The charnames pragma supports arguments :full, :loose, :short, script names, and customized aliases.
If :full is present, for expansion of \N{CHARNAME}, the string CHARNAME is first looked up in the list of standard Unicode character names. :loose is a variant of :full which allows CHARNAME to be less precisely specified. Details are in LOOSE MATCHES.
If :short is present, and CHARNAME has the form SCRIPT:CNAME, then CNAME is looked up as a letter in script SCRIPT, as described in the next paragraph. Or, if use charnames is used with script name arguments, then for \N{CHARNAME} the name CHARNAME is looked up as a letter in the given scripts (in the specified order). Customized aliases can override these, and are explained in CUSTOM ALIASES.
For lookup of CHARNAME inside a given script SCRIPTNAME, this pragma looks in the table of standard Unicode names for the names
- SCRIPTNAME CAPITAL LETTER CHARNAME
- SCRIPTNAME SMALL LETTER CHARNAME
- SCRIPTNAME LETTER CHARNAME
If CHARNAME is all lowercase, then the CAPITAL variant is ignored; otherwise the SMALL variant is ignored, and both CHARNAME and SCRIPTNAME are converted to all uppercase for look-up. Other than that, both of them follow loose rules if :loose is also specified; strict otherwise.
Note that \N{...} is compile-time; it's a special form of string constant used inside double-quotish strings; this means that you cannot use variables inside the \N{...}. If you want similar run-time functionality, use charnames::string_vianame().
Note, starting in Perl 5.18, the name BELL refers to the Unicode character U+1F514, instead of the traditional U+0007. For the latter, use ALERT or BEL.
It is a syntax error to use \N{NAME} where NAME is unknown. For \N{NAME}, it is a fatal error if use bytes is in effect and the input name is that of a character that won't fit into a byte (i.e., whose ordinal is above 255).
Otherwise, any string that includes a \N{charname} or
\N{U+code point} will automatically have Unicode semantics (see
Byte and Character Semantics in perlunicode).
By specifying :loose, Unicode's loose character name matching rules are selected instead of the strict exact match used otherwise.
That means that CHARNAME doesn't have to be so precisely specified.
Upper/lower case doesn't matter (except with scripts as mentioned above), nor do any underscores, and the only hyphens that matter are those at the beginning or end of a word in the name (with one exception: the hyphen in U+1180 HANGUL JUNGSEONG O-E does matter).
Also, blanks not adjacent to hyphens don't matter.
The official Unicode names are quite variable as to where they use hyphens
versus spaces to separate word-like units, and this option allows you to not
have to care as much.
The reason non-medial hyphens matter is because of cases like U+0F60 TIBETAN LETTER -A versus U+0F68 TIBETAN LETTER A. The hyphen here is significant, as is the space before it, and so both must be included.
:loose slows down look-ups by a factor of 2 to 3 versus :full, but the trade-off may be worth it to you. Each individual look-up takes very little time, and the results are cached, so the speed difference would become a factor only in programs that do look-ups of many different spellings, and probably only when those look-ups are through vianame() and string_vianame(), since \N{...} look-ups are done at compile time.
Starting in Unicode 6.1 and Perl v5.16, Unicode defines many abbreviations and names that were formerly Perl extensions, and some additional ones that Perl did not previously accept. The list is getting too long to reproduce here, but you can get the complete list from the Unicode web site: http://www.unicode.org/Public/UNIDATA/NameAliases.txt.
Earlier versions of Perl accepted almost all the 6.1 names. These were most extensively documented in the v5.14 version of this pod: http://perldoc.perl.org/5.14.0/charnames.html#ALIASES.
You can add customized aliases to standard (:full) Unicode naming conventions. The aliases override any standard definitions, so, if you're twisted enough, you can change "\N{LATIN CAPITAL LETTER A}" to mean "B", etc.
Aliases must begin with a character that is alphabetic. After that, each may contain any combination of word (\w) characters, SPACE (U+0020), HYPHEN-MINUS (U+002D), LEFT PARENTHESIS (U+0028), RIGHT PARENTHESIS (U+0029), and NO-BREAK SPACE (U+00A0). These last three should never have been allowed in names, and are retained for backwards compatibility only; they may be deprecated and removed in future releases of Perl, so don't use them for new names. (More precisely, the first character of a name you specify must be something that matches all of \p{ID_Start}, \p{Alphabetic}, and \p{Gc=Letter}. This makes sure it is what any reasonable person would view as an alphabetic character. And, the continuation characters that match \w must also match \p{ID_Continue}.) Starting with Perl v5.18, any Unicode characters meeting the above criteria may be used; prior to that only Latin1-range characters were acceptable.
An alias can map to either an official Unicode character name (not a loose matched name) or to a numeric code point (ordinal). The latter is useful for assigning names to code points in Unicode private use areas such as U+E800 through U+F8FF.
A numeric code point must be a non-negative integer or a string beginning with "U+" or "0x" with the remainder considered to be a hexadecimal integer. A literal numeric constant must be unsigned; it will be interpreted as hex if it has a leading zero or contains non-decimal hex digits; otherwise it will be interpreted as decimal.
Aliases are added either by the use of anonymous hashes:
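A minimal sketch of the anonymous-hash form, mirroring the synopsis earlier in this page (e_ACUTE and mychar are illustrative alias names, not standard ones):

```perl
use charnames ":full", ":alias" => {
    e_ACUTE => "LATIN SMALL LETTER E WITH ACUTE",
    mychar  => 0xE8000,   # a code point outside the named set; alias name is made up
};

print "\N{e_ACUTE} is a small letter e with an acute.\n";
print "mychar maps to code point U+", sprintf("%X", ord "\N{mychar}"), "\n";
```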
or by using a file containing aliases:
- use charnames ":alias" => "pro";
This will try to read "unicore/pro_alias.pl" from the @INC path. This file should return a list in plain perl:
- (
- A_GRAVE => "LATIN CAPITAL LETTER A WITH GRAVE",
- A_CIRCUM => "LATIN CAPITAL LETTER A WITH CIRCUMFLEX",
- A_DIAERES => "LATIN CAPITAL LETTER A WITH DIAERESIS",
- A_TILDE => "LATIN CAPITAL LETTER A WITH TILDE",
- A_BREVE => "LATIN CAPITAL LETTER A WITH BREVE",
- A_RING => "LATIN CAPITAL LETTER A WITH RING ABOVE",
- A_MACRON => "LATIN CAPITAL LETTER A WITH MACRON",
- mychar2 => "U+E8001",
- );
Both these methods insert ":full" automatically as the first argument (if no other argument is given), and you can give the ":full" explicitly as well, like
- use charnames ":full", ":alias" => "pro";
":loose" has no effect with these. Input names must match exactly, using ":full" rules.
Also, both these methods currently allow only single characters to be named. To name a sequence of characters, use a custom translator (described below).
This is a runtime equivalent to \N{...}. name can be any expression that evaluates to a name accepted by \N{...} under the :full option to charnames. In addition, any other options for the controlling "use charnames" in the same scope apply, like :loose, any script list, the :short option, or custom aliases you may have defined.
The only differences are due to the fact that string_vianame is run-time and \N{} is compile time. You can't interpolate inside a \N{} (so \N{$variable} doesn't work); and if the input name is unknown, string_vianame returns undef instead of it being a syntax error.
This is similar to string_vianame. The main difference is that under most circumstances, vianame returns an ordinal code point, whereas string_vianame returns a string. For example,
- printf "U+%04X", charnames::vianame("FOUR TEARDROP-SPOKED ASTERISK");
prints "U+2722".
This leads to the other two differences. Since a single code point is returned, the function can't handle named character sequences, as these are composed of multiple characters (it returns undef for these). And, the code point can be that of any character, even ones that aren't legal under the use bytes pragma.
See BUGS for the circumstances in which the behavior differs from that described above.
Returns the full name of the character indicated by the numeric code. For example,
- print charnames::viacode(0x2722);
prints "FOUR TEARDROP-SPOKED ASTERISK".
The name returned is the "best" (defined below) official name or alias for the code point, if available; otherwise your custom alias for it, if defined; otherwise undef. This means that your alias will only be returned for code points that don't have an official Unicode name (nor alias), such as private use code points.
If you define more than one name for the code point, it is indeterminate which one will be returned.
As mentioned, the function returns undef if no name is known for the code point. In Unicode the proper name for these is the empty string, which undef stringifies to. (If you ask for a code point past the legal Unicode maximum of U+10FFFF that you haven't assigned an alias to, you get undef plus a warning.)
The input number must be a non-negative integer, or a string beginning with "U+" or "0x" with the remainder considered to be a hexadecimal integer. A literal numeric constant must be unsigned; it will be interpreted as hex if it has a leading zero or contains non-decimal hex digits; otherwise it will be interpreted as decimal.
As mentioned above under ALIASES, Unicode 6.1 defines extra names (synonyms or aliases) for some code points, most of which were already available as Perl extensions. All these are accepted by \N{...} and the other functions in this module, but viacode has to choose which one name to return for a given input code point, so it returns the "best" name.
To understand how this works, it is helpful to know more about the Unicode name properties. All code points actually have only a single name, which (starting in Unicode 2.0) can never change once a character has been assigned to the code point. But mistakes have been made in assigning names; for example, sometimes a clerical error was made during the publishing of the Standard which caused words to be misspelled, and there was no way to correct those. The Name_Alias property was eventually created to handle these situations. If a name was wrong, a corrected synonym would be published for it, using Name_Alias. viacode will return that corrected synonym as the "best" name for a code point. (It is even possible, though it hasn't happened yet, that the correction itself will need to be corrected, and so another Name_Alias can be created for that code point; viacode will return the most recent correction.)
The Unicode name for each of the control characters (such as LINE FEED) is the empty string. However, almost all had names assigned by other standards, such as the ASCII Standard, or were in common use. viacode returns these names as the "best" ones available. Unicode 6.1 has created Name_Aliases for each of them, including alternate names, like NEW LINE. viacode uses the original name, "LINE FEED", in preference to the alternate. Similarly, the name returned for U+FEFF is "ZERO WIDTH NO-BREAK SPACE", not "BYTE ORDER MARK".
Until Unicode 6.1, the 4 control characters U+0080, U+0081, U+0084, and U+0099 did not have names nor aliases. To preserve backwards compatibility, any alias you define for these code points will be returned by this function, in preference to the official name.
Some code points also have abbreviated names, such as "LF" or "NL". viacode never returns these.
Because a name correction may be added in future Unicode releases, the name that viacode returns may change as a result. This is a rare event, but it does happen.
The mechanism of translation of \N{...} escapes is general and not hardwired into charnames.pm. A module can install custom translations (inside the scope which uses the module) with the following magic incantation:
- sub import {
- shift;
- $^H{charnames} = \&translator;
- }
Here translator() is a subroutine which takes CHARNAME as an
argument, and returns text to insert into the string instead of the
\N{CHARNAME} escape.
This is the only way you can create a custom named sequence of code points.
Since the text to insert should be different
in bytes
mode and out of it, the function should check the current
state of bytes
-flag as in:
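A sketch of such a check; bytes_translator() and utf8_translator() are illustrative stubs standing in for whatever mapping your translator actually performs:

```perl
use bytes ();    # loaded with an empty list: we only want $bytes::hint_bits,
                 # without actually enabling bytes semantics here

# Illustrative stubs -- a real translator would map CHARNAME to the
# text to substitute into the string.
sub bytes_translator { my $name = shift; return "bytes:$name" }
sub utf8_translator  { my $name = shift; return "utf8:$name"  }

sub translator {
    my $name = shift;
    # $^H holds the compile-time hints; test the bytes bit to pick a mode.
    if ($^H & $bytes::hint_bits) {
        return bytes_translator($name);
    }
    else {
        return utf8_translator($name);
    }
}
```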
See CUSTOM ALIASES above for restrictions on CHARNAME.
Of course, vianame, viacode, and string_vianame would need to be overridden as well.
vianame() normally returns an ordinal code point, but when the input name is of the form U+..., it returns a chr instead. In this case, if use bytes is in effect and the character won't fit into a byte, it returns undef and raises a warning.
Since evaluation of the translation function (see CUSTOM TRANSLATORS) happens in the middle of compilation (of a string literal), the translation function should not do any evals or requires. This restriction should be lifted (but is low priority) in a future version of Perl.
config_data - Query or change configuration of Perl modules
- # Get config/feature values
- config_data --module Foo::Bar --feature bazzable
- config_data --module Foo::Bar --config magic_number
- # Set config/feature values
- config_data --module Foo::Bar --set_feature bazzable=1
- config_data --module Foo::Bar --set_config magic_number=42
- # Print a usage message
- config_data --help
The config_data tool provides a command-line interface to the configuration of Perl modules. By "configuration", we mean something akin to "user preferences" or "local settings". This is a formalization and abstraction of the systems that people like Andreas Koenig (CPAN::Config), Jon Swartz (HTML::Mason::Config), Andy Wardley (Template::Config), and Larry Wall (perl's own Config.pm) have developed independently.
The configuration system employed here was developed in the context of Module::Build. Under this system, configuration information for a module Foo, for example, is stored in a module called Foo::ConfigData (I would have called it Foo::Config, but that was taken by all those other systems mentioned in the previous paragraph...). These ...::ConfigData modules contain the configuration data, as well as publicly accessible methods for querying and setting (yes, actually re-writing) the configuration data. The config_data script (whose docs you are currently reading) is merely a front-end for those methods. If you wish, you may create alternate front-ends.
The two types of data that may be stored are called config values and feature values. A config value may be any perl scalar, including references to complex data structures. It must, however, be serializable using Data::Dumper. A feature is a boolean (1 or 0) value.
This script functions as a basic getter/setter wrapper around the configuration of a single module. On the command line, specify which module's configuration you're interested in, and pass options to get or set config or feature values. The following options are supported:
Specifies the name of the module to configure (required).
When passed the name of a feature, shows its value. The value will be 1 if the feature is enabled, 0 if the feature is not enabled, or empty if the feature is unknown. When no feature name is supplied, the names and values of all known features will be shown.
When passed the name of a config entry, shows its value. The value will be displayed using Data::Dumper (or similar) as perl code. When no config name is supplied, the names and values of all known config entries will be shown.
Sets the given feature to the given boolean value. Specify the value as either 1 or 0.
Sets the given config entry to the given value.
If the --eval option is used, the values in set_config will be evaluated as perl code before being stored. This allows moderately complicated data structures to be stored. For really complicated structures, you probably shouldn't use this command-line interface; just use the Perl API instead.
Prints a help message, including a few examples, and exits.
Ken Williams, kwilliams@cpan.org
Copyright (c) 1999, Ken Williams. All rights reserved.
This library is free software; you can redistribute it and/or modify it under the same terms as Perl itself.
Module::Build(3), perl(1).
constant - Perl pragma to declare constants
- use constant PI => 4 * atan2(1, 1);
- use constant DEBUG => 0;
- print "Pi equals ", PI, "...\n" if DEBUG;
- use constant {
- SEC => 0,
- MIN => 1,
- HOUR => 2,
- MDAY => 3,
- MON => 4,
- YEAR => 5,
- WDAY => 6,
- YDAY => 7,
- ISDST => 8,
- };
- use constant WEEKDAYS => qw(
- Sunday Monday Tuesday Wednesday Thursday Friday Saturday
- );
- print "Today is ", (WEEKDAYS)[ (localtime)[WDAY] ], ".\n";
This pragma allows you to declare constants at compile-time.
When you declare a constant such as PI using the method shown above, each machine your script runs upon can have as many digits of accuracy as it can use. Also, your program will be easier to read, more likely to be maintained (and maintained correctly), and far less likely to send a space probe to the wrong planet because nobody noticed the one equation in which you wrote 3.14195.
When a constant is used in an expression, Perl replaces it with its value at compile time, and may then optimize the expression further. In particular, any code in an if (CONSTANT) block will be optimized away if the constant is false.
As with all use directives, defining a constant happens at compile time. Thus, it's probably not correct to put a constant declaration inside of a conditional statement (like if ($foo) { use constant ... }).
Constants defined using this module cannot be interpolated into strings like variables. However, concatenation works just fine:
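A short sketch of the difference (the original pod's example was lost in extraction; this follows the same pattern):

```perl
use constant PI => 4 * atan2(1, 1);

# "Pi equals PI...\n" in a double-quoted string would print the literal
# word "PI"; concatenation substitutes the constant's value instead:
print "Pi equals " . PI . "...\n";
```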
Even though a reference may be declared as a constant, the reference may point to data which may be changed, as this code shows.
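A sketch of that pitfall (reconstructing the kind of example the pod shows): the reference itself is constant, but the data it points to is not:

```perl
use constant ARRAY => [ 1, 2, 3, 4 ];

print ARRAY->[1], "\n";    # prints 2
ARRAY->[1] = 99;           # allowed: only the reference is read-only,
                           # not the array it refers to
print ARRAY->[1], "\n";    # prints 99
```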
Dereferencing constant references incorrectly (such as using an array subscript on a constant hash reference, or vice versa) will be trapped at compile time.
Constants belong to the package they are defined in. To refer to a constant defined in another package, specify the full package name, as in Some::Package::CONSTANT. Constants may be exported by modules, and may also be called as either class or instance methods, that is, as Some::Package->CONSTANT or as $obj->CONSTANT where $obj is an instance of Some::Package. Subclasses may define their own constants to override those in their base class.
The use of all caps for constant names is merely a convention, although it is recommended in order to make constants stand out and to help avoid collisions with other barewords, keywords, and subroutine names. Constant names must begin with a letter or underscore. Names beginning with a double underscore are reserved. Some poor choices for names will generate warnings, if warnings are enabled at compile time.
Constants may be lists of more (or less) than one value. A constant with no values evaluates to undef in scalar context. Note that constants with more than one value do not return their last value in scalar context as one might expect. They currently return the number of values, but this may change in the future. Do not use constants with multiple values in scalar context.
NOTE: This implies that the expression defining the value of a constant is evaluated in list context. This may produce surprises:
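A sketch of the surprise (two distinct names are used here, since a constant can only be defined once per package):

```perl
use constant TIMESTAMP     => localtime;           # WRONG: a 9-element list
use constant TIMESTAMP_STR => scalar localtime;    # right: a single string
```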
The first line above defines TIMESTAMP
as a 9-element list, as
returned by localtime() in list context. To set it to the string
returned by localtime() in scalar context, an explicit scalar
keyword is required.
List constants are lists, not arrays. To index or slice them, they must be placed in parentheses.
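For example, using WEEKDAYS as defined in the synopsis above:

```perl
use constant WEEKDAYS => qw(
    Sunday Monday Tuesday Wednesday Thursday Friday Saturday
);

# A list constant must be wrapped in parentheses to index or slice it:
print "Today is ", (WEEKDAYS)[ (localtime)[6] ], ".\n";
print "The weekend is ", join(" and ", (WEEKDAYS)[0, 6]), ".\n";
```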
Instead of writing multiple use constant statements, you may define multiple constants in a single statement by giving, instead of the constant name, a reference to a hash where the keys are the names of the constants to be defined. Obviously, all constants defined using this method must have a single value.
- use constant {
- FOO => "A single value",
- BAR => "This", "won't", "work!", # Error!
- };
This is a fundamental limitation of the way hashes are constructed in Perl. The error messages produced when this happens will often be quite cryptic -- in the worst case there may be none at all, and you'll only later find that something is broken.
When defining multiple constants, you cannot use the values of other constants defined in the same declaration. This is because the calling package doesn't know about any constant within that group until after the use statement is finished.
- use constant {
- BITMASK => 0xAFBAEBA8,
- NEGMASK => ~BITMASK, # Error!
- };
Magical values and references can be made into constants at compile time, allowing for way cool stuff like this. (These error numbers aren't totally portable, alas.)
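A reconstruction of the classic example from this pod, using errno 7 (the message text for it varies by platform, as the caveat says):

```perl
use constant E2BIG => ($! = 7);    # capture the magical errno dualvar

print   E2BIG, "\n";    # string context: the errno message,
                        # something like "Arg list too long"
print 0+E2BIG, "\n";    # numeric context: 7
```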
You can't produce a tied constant by giving a tied scalar as the value. References to tied variables, however, can be used as constants without any problems.
In the current implementation, scalar constants are actually inlinable subroutines. As of version 5.004 of Perl, the appropriate scalar constant is inserted directly in place of some subroutine calls, thereby saving the overhead of a subroutine call. See Constant Functions in perlsub for details about how and when this happens.
In the rare case in which you need to discover at run time whether a particular constant has been declared via this module, you may use this function to examine the hash %constant::declared. If the given constant name does not include a package name, the current package is used.
In the current version of Perl, list constants are not inlined and some symbols may be redefined without generating a warning.
It is not possible to have a subroutine or a keyword with the same name as a constant in the same package. This is probably a Good Thing.
A constant with a name in the list STDIN STDOUT STDERR ARGV ARGVOUT ENV INC SIG is not allowed anywhere but in package main::, for technical reasons.
Unlike constants in some languages, these cannot be overridden on the command line or via environment variables.
You can get into trouble if you use constants in a context which automatically quotes barewords (as is true for any subroutine call). For example, you can't say $hash{CONSTANT} because CONSTANT will be interpreted as a string. Use $hash{CONSTANT()} or $hash{+CONSTANT} to prevent the bareword quoting mechanism from kicking in. Similarly, since the => operator quotes a bareword immediately to its left, you have to say CONSTANT() => 'value' (or simply use a comma in place of the big arrow) instead of CONSTANT => 'value'.
Readonly - Facility for creating read-only scalars, arrays, hashes.
Attribute::Constant - Make read-only variables via attribute
Scalar::Readonly - Perl extension to the SvREADONLY scalar flag
Hash::Util - A selection of general-utility hash subroutines (mostly to lock/unlock keys and values)
Please report any bugs or feature requests via the perlbug(1) utility.
Tom Phoenix, <rootbeer@redcat.com>, with help from many other folks.
Multiple constant declarations at once added by Casey West, <casey@geeknest.com>.
Documentation mostly rewritten by Ilmari Karonen, <perl@itz.pp.sci.fi>.
This program is maintained by the Perl 5 Porters. The CPAN distribution is maintained by Sébastien Aperghis-Tramoni <sebastien@aperghis.net>.
Copyright (C) 1997, 1999 Tom Phoenix
This module is free software; you can redistribute it or modify it under the same terms as Perl itself.
corelist - a commandline frontend to Module::CoreList
See Module::CoreList for one.
- corelist -v
- corelist [-a|-d] <ModuleName> | /<ModuleRegex>/ [<ModuleVersion>] ...
- corelist [-v <PerlVersion>] [ <ModuleName> | /<ModuleRegex>/ ] ...
- corelist [-r <PerlVersion>] ...
- corelist --feature <FeatureName> [<FeatureName>] ...
- corelist --diff PerlVersion PerlVersion
- corelist --upstream <ModuleName>
lists all versions of the given module (or the matching modules, in case you used a module regexp) in the perls Module::CoreList knows about.
- corelist -a Unicode
- Unicode was first released with perl v5.6.2
- v5.6.2 3.0.1
- v5.8.0 3.2.0
- v5.8.1 4.0.0
- v5.8.2 4.0.0
- v5.8.3 4.0.0
- v5.8.4 4.0.1
- v5.8.5 4.0.1
- v5.8.6 4.0.1
- v5.8.7 4.1.0
- v5.8.8 4.1.0
- v5.8.9 5.1.0
- v5.9.0 4.0.0
- v5.9.1 4.0.0
- v5.9.2 4.0.1
- v5.9.3 4.1.0
- v5.9.4 4.1.0
- v5.9.5 5.0.0
- v5.10.0 5.0.0
- v5.10.1 5.1.0
- v5.11.0 5.1.0
- v5.11.1 5.1.0
- v5.11.2 5.1.0
- v5.11.3 5.2.0
- v5.11.4 5.2.0
- v5.11.5 5.2.0
- v5.12.0 5.2.0
- v5.12.1 5.2.0
- v5.12.2 5.2.0
- v5.12.3 5.2.0
- v5.12.4 5.2.0
- v5.13.0 5.2.0
- v5.13.1 5.2.0
- v5.13.2 5.2.0
- v5.13.3 5.2.0
- v5.13.4 5.2.0
- v5.13.5 5.2.0
- v5.13.6 5.2.0
- v5.13.7 6.0.0
- v5.13.8 6.0.0
- v5.13.9 6.0.0
- v5.13.10 6.0.0
- v5.13.11 6.0.0
- v5.14.0 6.0.0
- v5.14.1 6.0.0
- v5.15.0 6.0.0
finds the first perl version where a module has been released by date, and not by version number (as is the default).
Given two versions of perl, this prints a human-readable table of all module changes between the two. The output format may change in the future, and is meant for humans, not programs. For programs, use the Module::CoreList API.
help! help! help! to see more help, try --man.
all of the help
lists all of the perl release versions we got the CoreList for. If you pass a version argument (value of $], like 5.00503 or 5.008008), you get a list of all the modules and their respective versions. (If you have the version module, you can also use new-style version numbers, like 5.8.8.) In module filtering context, it can be used as a Perl version filter.
lists all of the perl releases and when they were released
If you pass a perl version you get the release date for that version only.
lists the first version bundle of each named feature given
Shows whether the given module is primarily maintained in the perl core or on CPAN, along with its bug tracker URL.
As a special case, if you specify the module name Unicode, you'll get the version number of the Unicode Character Database bundled with the requested perl versions.
- $ corelist File::Spec
- File::Spec was first released with perl 5.005
- $ corelist File::Spec 0.83
- File::Spec 0.83 was released with perl 5.007003
- $ corelist File::Spec 0.89
- File::Spec 0.89 was not in CORE (or so I think)
- $ corelist File::Spec::Aliens
- File::Spec::Aliens was not in CORE (or so I think)
- $ corelist /IPC::Open/
- IPC::Open2 was first released with perl 5
- IPC::Open3 was first released with perl 5
- $ corelist /MANIFEST/i
- ExtUtils::Manifest was first released with perl 5.001
- $ corelist /Template/
- /Template/ has no match in CORE (or so I think)
- $ corelist -v 5.8.8 B
- B 1.09_01
- $ corelist -v 5.8.8 /^B::/
- B::Asmdata 1.01
- B::Assembler 0.07
- B::Bblock 1.02_01
- B::Bytecode 1.01_01
- B::C 1.04_01
- B::CC 1.00_01
- B::Concise 0.66
- B::Debug 1.02_01
- B::Deparse 0.71
- B::Disassembler 1.05
- B::Lint 1.03
- B::O 1.00
- B::Showlex 1.02
- B::Stackobj 1.00
- B::Stash 1.00
- B::Terse 1.03_01
- B::Xref 1.01
Copyright (c) 2002-2007 by D.H. aka PodMaster
Currently maintained by the perl 5 porters <perl5-porters@perl.org>.
This program is distributed under the same terms as perl itself. See http://perl.org/ or http://cpan.org/ for more info on that.
cpan - easily interact with CPAN from the command line
- # with arguments and no switches, installs specified modules
- cpan module_name [ module_name ... ]
- # with switches, installs modules with extra behavior
- cpan [-cfgimtTw] module_name [ module_name ... ]
- # with just the dot, install from the distribution in the
- # current directory
- cpan .
- # without arguments, starts CPAN.pm shell
- cpan
- # dump the configuration
- cpan -J
- # load a different configuration to install Module::Foo
- cpan -j some/other/file Module::Foo
- # without arguments, but some switches
- cpan [-ahrvACDlLO]
This script provides a command interface (not a shell) to CPAN. At the moment it uses CPAN.pm to do the work, but it is not a one-shot command runner for CPAN.pm.
Creates a CPAN.pm autobundle with CPAN::Shell->autobundle.
Shows the primary maintainers for the specified modules.
Runs a `make clean` in the specified module's directories.
Show the Changes files for the specified modules
Show the module details.
Force the specified action, when it normally would have failed. Use this to install a module even if its tests fail. When you use this option, -i is not optional for installing a module when you need to force it:
- % cpan -f -i Module::Foo
Turn off CPAN.pm's attempts to lock anything. You should be careful with this since you might end up with multiple scripts trying to muck in the same directory. This isn't so much of a concern if you're loading a special config with -j, and that config sets up its own work directories.
Downloads to the current directory the latest distribution of the module.
UNIMPLEMENTED
Download to the current directory the latest distribution of the modules, unpack each distribution, and create a git repository for each distribution.
If you want this feature, check out Yanick Champoux's Git::CPAN::Patch distribution.
Print a help message and exit. When you specify -h, it ignores all of the other options and arguments.
Install the specified modules.
Load local::lib (think like -I for loading lib paths).
Load the file that has the CPAN configuration data. This should have the same format as the standard CPAN/Config.pm file, which defines $CPAN::Config as an anonymous hash.
Dump the configuration in the same format that CPAN.pm uses. This is useful for checking the configuration as well as using the dump as a starting point for a new, custom configuration.
List all installed modules with their versions.
List the modules by the specified authors.
Make the specified modules.
Show the out-of-date modules.
Ping the configured mirrors
Find the best mirrors you could be using (but doesn't configure them just yet)
Recompiles dynamically loaded modules with CPAN::Shell->recompile.
Run a `make test` on the specified modules.
Do not test modules. Simply install them.
Upgrade all installed modules. Blindly doing this can really break things, so keep a backup.
Print the script version and CPAN.pm version then exit.
Print detailed information about the cpan client.
UNIMPLEMENTED
Turn on cpan warnings. This checks various things, like directory permissions, and tells you about problems you might have.
- # print a help message
- cpan -h
- # print the version numbers
- cpan -v
- # create an autobundle
- cpan -a
- # recompile modules
- cpan -r
- # upgrade all installed modules
- cpan -u
- # install modules ( sole -i is optional )
- cpan -i Netscape::Booksmarks Business::ISBN
- # force install modules ( must use -i )
- cpan -fi CGI::Minimal URI
cpan
splits this variable on whitespace and prepends that list to @ARGV
before it processes the command-line arguments. For instance, if you always
want to use local::lib, you can set CPAN_OPTS
to -I
.
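As a sketch of the CPAN_OPTS mechanism (Some::Module is a placeholder, and cpan must be installed and configured for the last line to do anything):

```shell
# cpan splits CPAN_OPTS on whitespace and prepends the result to @ARGV,
# so this makes every invocation load local::lib:
export CPAN_OPTS="-I"

cpan Some::Module   # behaves like: cpan -I Some::Module
```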
The script exits with zero if it thinks that everything worked, or a positive number if it thinks that something failed. Note, however, that in some cases it has to divine a failure by the output of things it does not control. For now, the exit codes are vague:
- 1 An unknown error
- 2 There was an external problem
- 4 There was an internal problem with the script
- 8 A module failed to install
* one shot configuration values from the command line
* none noted
Most behaviour, including environment variables and configuration, comes directly from CPAN.pm.
This code is on GitHub:
- git://github.com/briandfoy/cpan_script.git
Japheth Cleaver added the bits to allow a forced install (-f).
Jim Brandt suggested and provided the initial implementation for the up-to-date and Changes features.
Adam Kennedy pointed out that exit() causes problems on Windows where this script ends up with a .bat extension
brian d foy, <bdfoy@cpan.org>
Copyright (c) 2001-2013, brian d foy, All Rights Reserved.
You may redistribute this under the same terms as Perl itself.
cpan2dist - The CPANPLUS distribution creator
This script will create distributions of CPAN modules in the format you specify, including their prerequisites. These packages can then be installed using the corresponding package manager for the format.
Note, you can also do this interactively from the default shell,
CPANPLUS::Shell::Default
. See the CPANPLUS::Dist
documentation,
as well as the documentation of your format of choice for any format
specific documentation.
Some modules you'd rather not package. Some because they are part of core perl and you don't want a new package. Some because they won't build on your system. Some because your package manager of choice already packages them for you.
There may be a myriad of reasons. You can use the --ignore
and --ban
options for this, but we provide some built-in
lists that catch common cases. You can use these built-in lists
if you like, or supply your own if need be.
You can use this list of regexes to keep matching modules from being listed as prerequisites of a package. This is particularly useful for modules that are bundled with core perl anyway, or that have known build issues.
Toggle it by supplying the --default-ignorelist
option.
You can use this list of regexes to disable building of these modules altogether.
Toggle it by supplying the --default-banlist
option.
CPANPLUS::Dist, CPANPLUS::Module, CPANPLUS::Shell::Default,
cpanp
Please report bugs or other issues to <bug-cpanplus@rt.cpan.org>.
This module by Jos Boumans <kane@cpan.org>.
The CPAN++ interface (of which this module is a part) is copyright (c) 2001 - 2007, Jos Boumans <kane@cpan.org>. All rights reserved.
This library is free software; you may redistribute and/or modify it under the same terms as Perl itself.
cpanp - The CPANPLUS launcher
cpanp
cpanp [-]a [ --[no-]option... ] author...
cpanp [-]mfitulrcz [ --[no-]option... ] module...
cpanp [-]d [ --[no-]option... ] [ --fetchdir=... ] module...
cpanp [-]xb [ --[no-]option... ]
cpanp [-]o [ --[no-]option... ] [ module... ]
This script launches the CPANPLUS utility to perform various operations from the command line. If it's invoked without arguments, an interactive shell is executed by default.
Optionally, it can take a single-letter switch and one or more arguments,
to perform the associated action on each argument. A summary of the
available commands is listed below; cpanp -h
provides a detailed list.
- h # help information
- v # version information
- a AUTHOR ... # search by author(s)
- m MODULE ... # search by module(s)
- f MODULE ... # list all releases of a module
- i MODULE ... # install module(s)
- t MODULE ... # test module(s)
- u MODULE ... # uninstall module(s)
- d MODULE ... # download module(s)
- l MODULE ... # display detailed information about module(s)
- r MODULE ... # display README files of module(s)
- c MODULE ... # check for module report(s) from cpan-testers
- z MODULE ... # extract module(s) and open command prompt in it
- x # reload CPAN indices
- o [ MODULE ... ] # list installed module(s) that aren't up to date
- b # write a bundle file for your configuration
Each command may be followed by one or more options. If preceded by no,
the corresponding option will be set to 0
, otherwise it's set to 1
.
Example: To skip a module's tests,
- cpanp -i --skiptest MODULE ...
Valid options for most commands are cpantest
, debug
, flush
, force
,
prereqs
, storable
, verbose
, md5
, signature
, and skiptest
; the
'd' command also accepts fetchdir
. Please consult CPANPLUS::Configure
for an explanation of their meanings.
Example: To download a module's tarball to the current directory,
- cpanp -d --fetchdir=. MODULE ...
diagnostics, splain - produce verbose warning diagnostics
Using the diagnostics
pragma:
Using the splain
standalone filter program:
- perl program 2>diag.out
- splain [-v] [-p] diag.out
Using diagnostics to get stack traces from a misbehaving script:
- perl -Mdiagnostics=-traceonly my_script.pl
The diagnostics Pragma
This module extends the terse diagnostics normally emitted by both the
perl compiler and the perl interpreter (from running perl with a -w
switch or use warnings
), augmenting them with the more
explicative and endearing descriptions found in perldiag. Like the
other pragmata, it affects the compilation phase of your program rather
than merely the execution phase.
To use in your program as a pragma, merely invoke
- use diagnostics;
at the start (or near the start) of your program. (Note that this does enable perl's -w flag.) Your whole compilation will then be subject(ed :-) to the enhanced diagnostics. These still go out STDERR.
Due to the interaction between runtime and compiletime issues,
and because it's probably not a very good idea anyway,
you may not use no diagnostics
to turn them off at compiletime.
However, you may control their behaviour at runtime using the
disable() and enable() methods to turn them off and on respectively.
The -verbose flag first prints out the perldiag introduction before any other diagnostics. The $diagnostics::PRETTY variable can generate nicer escape sequences for pagers.
Warnings dispatched from perl itself (or more accurately, those that match descriptions found in perldiag) are only displayed once (no duplicate descriptions). User code generated warnings a la warn() are unaffected, allowing duplicate user messages to be displayed.
This module also adds a stack trace to the error message when perl dies. This is useful for pinpointing what caused the death. The -traceonly (or just -t) flag turns off the explanations of warning messages, leaving just the stack traces. So if your script is dying, run it again with
- perl -Mdiagnostics=-traceonly my_bad_script
to see the call stack at the time of death. By supplying the -warntrace (or just -w) flag, any warnings emitted will also come with a stack trace.
While apparently a whole nuther program, splain is actually nothing
more than a link to the (executable) diagnostics.pm module, as well as
a link to the diagnostics.pod documentation. The -v flag is like
the use diagnostics -verbose
directive.
The -p flag is like the
$diagnostics::PRETTY variable. Since you're post-processing with
splain, there's no sense in being able to enable() or disable() processing.
Output from splain is directed to STDOUT, unlike the pragma.
The following file is certain to trigger a few errors at both runtime and compiletime:
- use diagnostics;
- print NOWHERE "nothing\n";
- print STDERR "\n\tThis message should be unadorned.\n";
- warn "\tThis is a user warning";
- print "\nDIAGNOSTIC TESTER: Please enter a <CR> here: ";
- my $a, $b = scalar <STDIN>;
- print "\n";
- print $x/$y;
If you prefer to run your program first and look at its problem afterwards, do this:
- perl -w test.pl 2>test.out
- ./splain < test.out
Note that this is not in general possible in shells of more dubious heritage, as the theoretical
- (perl -w test.pl >/dev/tty) >& test.out
- ./splain < test.out
fails, because you just moved the existing stdout to somewhere else.
If you don't want to modify your source code, but still have on-the-fly warnings, do this:
- exec 3>&1; perl -w test.pl 2>&1 1>&3 3>&- | splain 1>&2 3>&-
Nifty, eh?
If you want to control warnings on the fly, do something like this.
Make sure you do the use first, or you won't be able to get
at the enable() or disable() methods.
- use diagnostics; # checks entire compilation phase
- print "\ntime for 1st bogus diags: SQUAWKINGS\n";
- print BOGUS1 'nada';
- print "done with 1st bogus\n";
- disable diagnostics; # only turns off runtime warnings
- print "\ntime for 2nd bogus: (squelched)\n";
- print BOGUS2 'nada';
- print "done with 2nd bogus\n";
- enable diagnostics; # turns back on runtime warnings
- print "\ntime for 3rd bogus: SQUAWKINGS\n";
- print BOGUS3 'nada';
- print "done with 3rd bogus\n";
- disable diagnostics;
- print "\ntime for 4th bogus: (squelched)\n";
- print BOGUS4 'nada';
- print "done with 4th bogus\n";
Diagnostic messages derive from the perldiag.pod file when available at runtime. Otherwise, they may be embedded in the file itself when the splain package is built. See the Makefile for details.
If an extant $SIG{__WARN__} handler is discovered, it will continue to be honored, but only after the diagnostics::splainthis() function (the module's $SIG{__WARN__} interceptor) has had its way with your warnings.
There is a $diagnostics::DEBUG variable you may set if you're desperately curious what sorts of things are being intercepted.
- BEGIN { $diagnostics::DEBUG = 1 }
Not being able to say "no diagnostics" is annoying, but may not be insurmountable.
The -pretty
directive is called too late to affect matters.
You have to do this instead, and before you load the module.
- BEGIN { $diagnostics::PRETTY = 1 }
I could start up faster by delaying compilation until it should be needed, but this gets a "panic: top_level" when using the pragma form in Perl 5.001e.
While it's true that this documentation is somewhat subserious, if you use a program named splain, you should expect a bit of whimsy.
Tom Christiansen <tchrist@mox.perl.com>, 25 June 1995.
enc2xs -- Perl Encode Module Generator
- enc2xs -[options]
- enc2xs -M ModName mapfiles...
- enc2xs -C
enc2xs builds a Perl extension for use by Encode from either Unicode Character Mapping files (.ucm) or Tcl Encoding Files (.enc). Besides being used internally during the build process of the Encode module, you can use enc2xs to add your own encoding to perl. No knowledge of XS is necessary.
If you want to know as little about Perl as possible but need to add a new encoding, just read this chapter and forget the rest.
Have a .ucm file ready. You can get it from somewhere or you can write
your own from scratch or you can grab one from the Encode distribution
and customize it. For the UCM format, see the next Chapter. In the
example below, I'll call my theoretical encoding myascii, defined
in my.ucm. $
is a shell prompt.
- $ ls -F
- my.ucm
Issue a command as follows:
- $ enc2xs -M My my.ucm
- generating Makefile.PL
- generating My.pm
- generating README
- generating Changes
Now take a look at your current directory. It should look like this.
- $ ls -F
- Makefile.PL My.pm my.ucm t/
The following files were created.
- Makefile.PL - MakeMaker script
- My.pm - Encode submodule
- t/My.t - test file
Edit the files generated. You don't have to if you have no time AND no intention to give it to someone else. But it is a good idea to edit the pod and to add more tests.
Now issue a command all Perl Mongers love:
- $ perl Makefile.PL
- Writing Makefile for Encode::My
Now all you have to do is make.
- $ make
- cp My.pm blib/lib/Encode/My.pm
- /usr/local/bin/perl /usr/local/bin/enc2xs -Q -O \
- -o encode_t.c -f encode_t.fnm
- Reading myascii (myascii)
- Writing compiled form
- 128 bytes in string tables
- 384 bytes (75%) saved spotting duplicates
- 1 bytes (0.775%) saved using substrings
- ....
- chmod 644 blib/arch/auto/Encode/My/My.bs
- $
The time it takes varies depending on how fast your machine is and how large your encoding is. Unless you are working on something big like euc-tw, it won't take too long.
You can "make install" already but you should test first.
- $ make test
- PERL_DL_NONLAZY=1 /usr/local/bin/perl -Iblib/arch -Iblib/lib \
- -e 'use Test::Harness qw(&runtests $verbose); \
- $verbose=0; runtests @ARGV;' t/*.t
- t/My....ok
- All tests successful.
- Files=1, Tests=2, 0 wallclock secs
- ( 0.09 cusr + 0.01 csys = 0.09 CPU)
If you are content with the test result, just "make install"
If you want to add your encoding to Encode's demand-loading list (so you don't have to "use Encode::YourEncoding"), run
- enc2xs -C
to update Encode::ConfigLocal, a module that controls local settings. After that, "use Encode;" is enough to load your encodings on demand.
Encode uses the Unicode Character Map (UCM) format for source character mappings. This format is used by IBM's ICU package and was adopted by Nick Ing-Simmons for use with the Encode module. Since UCM is more flexible than Tcl's Encoding Map and far more user-friendly, this is the recommended format for Encode now.
A UCM file looks like this.
- #
- # Comments
- #
- <code_set_name> "US-ascii" # Required
- <code_set_alias> "ascii" # Optional
- <mb_cur_min> 1 # Required; usually 1
- <mb_cur_max> 1 # Max. # of bytes/char
- <subchar> \x3F # Substitution char
- #
- CHARMAP
- <U0000> \x00 |0 # <control>
- <U0001> \x01 |0 # <control>
- <U0002> \x02 |0 # <control>
- ....
- <U007C> \x7C |0 # VERTICAL LINE
- <U007D> \x7D |0 # RIGHT CURLY BRACKET
- <U007E> \x7E |0 # TILDE
- <U007F> \x7F |0 # <control>
- END CHARMAP
Anything that follows #
is treated as a comment.
The header section continues until a line containing the word CHARMAP. This section has a form of <keyword> value, one pair per line. Strings used as values must be quoted. Barewords are treated as numbers. \xXX represents a byte.
Most of the keywords are self-explanatory. subchar means substitution character, not subcharacter. When you convert a Unicode sequence to this encoding but no matching character is found, the byte sequence defined here will be used. For most cases, the value here is \x3F; in ASCII, this is a question mark.
CHARMAP starts the character map section. Each line has a form as follows:
- <UXXXX> \xXX.. |0 # comment
- ^ ^ ^
- | | +- Fallback flag
- | +-------- Encoded byte sequence
- +-------------- Unicode Character ID in hex
The format is roughly the same as a header section except for the fallback flag: | followed by 0..3. The meaning of the possible values is as follows:
Round trip safe. A character decoded to Unicode encodes back to the same byte sequence. Most characters have this flag.
Fallback for unicode -> encoding. When seen, enc2xs adds this character for the encode map only.
Skip sub-char mapping should there be no code point.
Fallback for encoding -> unicode. When seen, enc2xs adds this character for the decode map only.
And finally, END CHARMAP ends the section.
When you are manually creating a UCM file, you should copy ascii.ucm or an existing encoding which is close to yours, rather than write your own from scratch.
When you do so, make sure you leave at least U0000 to U0020 as is, unless your environment is EBCDIC.
CAVEAT: not all features in UCM are implemented. For example, icu:state is not used. Because of that, you need to write a perl module if you want to support algorithmic encodings, notably the ISO-2022 series. Such modules include Encode::JP::2022_JP, Encode::KR::2022_KR, and Encode::TW::HZ.
When you create a map, you SHOULD make your mappings round-trip safe.
That is, encode('your-encoding', decode('your-encoding', $data)) eq
$data should hold for all characters that are marked as |0. Here is
how to make sure:
Sort your map in Unicode order.
When you have a duplicate entry, mark either one with '|1' or '|3'.
And make sure the '|1' or '|3' entry FOLLOWS the '|0' entry.
Here is an example from big5-eten.
- <U2550> \xF9\xF9 |0
- <U2550> \xA2\xA4 |3
Internally, the Encoding -> Unicode and Unicode -> Encoding maps look like this:
- E to U U to E
- --------------------------------------
- \xF9\xF9 => U2550 U2550 => \xF9\xF9
- \xA2\xA4 => U2550
So it is round-trip safe for \xF9\xF9. But if the two lines above are swapped, here is what happens:
- E to U U to E
- --------------------------------------
- \xA2\xA4 => U2550 U2550 => \xF9\xF9
- (\xF9\xF9 => U2550 is now overwritten!)
The Encode package comes with ucmlint, a crude but sufficient utility to check the integrity of a UCM file. Check under the Encode/bin directory for this.
When in doubt, you can use ucmsort, yet another utility under Encode/bin directory.
ICU Home Page http://www.icu-project.org/
ICU Character Mapping Tables http://site.icu-project.org/charts/charset
ICU:Conversion Data http://www.icu-project.org/userguide/conversion-data.html
encoding - allows you to write your script in non-ascii or non-utf8
This module is deprecated under perl 5.18. It uses a mechanism provided by perl that is deprecated under 5.18 and higher, and may be removed in a future version.
- use encoding "greek"; # Perl like Greek to you?
- use encoding "euc-jp"; # Jperl!
- # or you can even do this if your shell supports your native encoding
- perl -Mencoding=latin2 -e'...' # Feeling centrally European?
- perl -Mencoding=euc-kr -e'...' # Or Korean?
- # more control
- # A simple euc-cn => utf-8 converter
- use encoding "euc-cn", STDOUT => "utf8"; while(<>){print};
- # "no encoding;" supported (but not scoped!)
- no encoding;
- # an alternate way, Filter
- use encoding "euc-jp", Filter=>1;
- # now you can use kanji identifiers -- in euc-jp!
- # switch on locale -
- # note that this probably means that unless you have a complete control
- # over the environments the application is ever going to be run, you should
- # NOT use the feature of encoding pragma allowing you to write your script
- # in any recognized encoding because changing locale settings will wreck
- # the script; you can of course still use the other features of the pragma.
- use encoding ':locale';
Let's start with a bit of history: Perl 5.6.0 introduced Unicode
support. You could apply substr() and regexes even to complex CJK
characters -- so long as the script was written in UTF-8. But back
then, text editors that supported UTF-8 were still rare and many users
instead chose to write scripts in legacy encodings, giving up a whole
new feature of Perl 5.6.
Rewind to the future: starting from perl 5.8.0 with the encoding
pragma, you can write your script in any encoding you like (so long
as the Encode
module supports it) and still enjoy Unicode support.
This pragma achieves that by doing the following:
Internally converts all literals (q//,qq//,qr//,qw///, qx//
) from
the encoding specified to utf8. In Perl 5.8.1 and later, literals in
tr/// and DATA
pseudo-filehandle are also converted.
Changes the PerlIO layers of STDIN and STDOUT to the specified encoding.
You can write code in EUC-JP as follows:
- my $Rakuda = "\xF1\xD1\xF1\xCC"; # Camel in Kanji
- #<-char-><-char-> # 4 octets
- s/\bCamel\b/$Rakuda/;
And with use encoding "euc-jp"
in effect, it is the same thing as
the code in UTF-8:
- my $Rakuda = "\x{99F1}\x{99DD}"; # two Unicode Characters
- s/\bCamel\b/$Rakuda/;
STD(IN|OUT)
The encoding pragma also modifies the filehandle layers of STDIN and STDOUT to the specified encoding. Therefore,
- use encoding "euc-jp";
- my $message = "Camel is the symbol of perl.\n";
- my $Rakuda = "\xF1\xD1\xF1\xCC"; # Camel in Kanji
- $message =~ s/\bCamel\b/$Rakuda/;
- print $message;
Will print "\xF1\xD1\xF1\xCC is the symbol of perl.\n", not "\x{99F1}\x{99DD} is the symbol of perl.\n".
You can override this by giving extra arguments; see below.
By default, if strings operating under byte semantics and strings with Unicode character data are concatenated, the new string will be created by decoding the byte strings as ISO 8859-1 (Latin-1).
The encoding pragma changes this to use the specified encoding instead. For example:
- use encoding 'utf8';
- my $string = chr(20000); # a Unicode string
- utf8::encode($string); # now it's a UTF-8 encoded byte string
- # concatenate with another Unicode string
- print length($string . chr(20000));
Will print 2
, because $string
is upgraded as UTF-8. Without
use encoding 'utf8';
, it will print 4
instead, since $string
is three octets when interpreted as Latin-1.
If the encoding
pragma is in scope then the lengths returned are
calculated from the length of $/
in Unicode characters, which is not
always the same as the length of $/
in the native encoding.
This pragma affects utf8::upgrade, but not utf8::downgrade.
Some of the features offered by this pragma require perl 5.8.1. Most of these were done by Inaba Hiroto. Any other features and changes are good for 5.8.0.
Because perl needs to parse the script before applying this pragma, encodings such as Shift_JIS and Big-5, which may contain '\' (BACKSLASH; \x5c) in the second byte, fail because the second byte may accidentally escape the quoting character that follows. Perl 5.8.1 or later fixes this problem.
tr// was overlooked by the Perl 5 Porters when they released perl 5.8.0.
See the section below for details.
Another feature that was overlooked was DATA
.
Sets the script encoding to ENCNAME. Unless ${^UNICODE} exists and is non-zero, the PerlIO layers of STDIN and STDOUT are set to ":encoding(ENCNAME)".
Note that STDERR WILL NOT be changed.
Also note that non-STD file handles remain unaffected. Use the open pragma or binmode to change the layers of those.
If no encoding is specified, the environment variable PERL_ENCODING
is consulted. If no encoding can be found, the error Unknown encoding
'ENCNAME' will be thrown.
You can also individually set encodings of STDIN and STDOUT via the
STDIN => ENCNAME form. In this case, you cannot omit the
first ENCNAME. STDIN => undef
turns the IO transcoding
completely off.
When ${^UNICODE} exists and is non-zero, these options are completely ignored. ${^UNICODE} is a variable introduced in perl 5.8.1. See ${^UNICODE} in perlvar and -C in perlrun for details (perl 5.8.1 and later).
This turns the encoding pragma into a source filter. While the default approach just decodes interpolated literals (in qq() and qr()), this will apply a source filter to the entire source code. See The Filter Option below for details.
Unsets the script encoding. The layers of STDIN, STDOUT are reset to ":raw" (the default unprocessed raw stream of bytes).
The magic of use encoding
is not applied to the names of
identifiers. In order to make ${"\x{4eba}"}++
($human++, where human
is a single Han ideograph) work, you still need to write your script
in UTF-8 -- or use a source filter. That's what 'Filter=>1' does.
What does this mean? Your source code behaves as if it is written in
UTF-8 with 'use utf8' in effect. So even if your editor only supports
Shift_JIS, for example, you can still try examples in Chapter 15 of
Programming Perl, 3rd Ed.. For instance, you can use UTF-8
identifiers.
This option is significantly slower and (as of this writing) non-ASCII identifiers are not very stable WITHOUT this option and with the source code written in UTF-8.
The Filter option now sets STDIN and STDOUT like non-filter options.
And STDIN=>ENCODING and STDOUT=>ENCODING work like
non-filter version.
use utf8 is implicitly declared, so you no longer have to write use utf8 to make ${"\x{4eba}"}++ work.
The pragma is a per script, not a per block lexical. Only the last
use encoding
or no encoding
matters, and it affects
the whole script. However, the no encoding pragma is supported and
use encoding can appear as many times as you want in a given script.
Multiple use of this pragma is discouraged.
For the same reason, using this pragma inside modules is also discouraged (though not as strongly discouraged as the case above; see below).
If you still have to write a module with this pragma, be very careful of the load order. See the code below:
- # called module
- package Module_IN_BAR;
- use encoding "bar";
- # stuff in "bar" encoding here
- 1;
- # caller script
- use encoding "foo";
- use Module_IN_BAR;
- # surprise! use encoding "bar" is in effect.
The best way to avoid this oddity is to use this pragma RIGHT AFTER other modules are loaded, i.e.
- use Module_IN_BAR;
- use encoding "foo";
Notice that only literals (string or regular expression) having only legacy code points are affected: if you mix data like this
- \xDF\x{100}
the data is assumed to be in (Latin 1 and) Unicode, not in your native encoding. In other words, this will match in "greek":
- "\xDF" =~ /\x{3af}/
but this will not
- "\xDF\x{100}" =~ /\x{3af}\x{100}/
since the \xDF
(ISO 8859-7 GREEK SMALL LETTER IOTA WITH TONOS) on
the left will not be upgraded to \x{3af} (Unicode GREEK SMALL
LETTER IOTA WITH TONOS) because of the \x{100}
on the left. You
should not be mixing your legacy data and Unicode in the same string.
This pragma also affects encoding of the 0x80..0xFF code point range:
normally characters in that range are left as eight-bit bytes (unless
they are combined with characters with code points 0x100 or larger,
in which case all characters need to become UTF-8 encoded), but if
the encoding
pragma is present, even the 0x80..0xFF range always
gets UTF-8 encoded.
After all, the best thing about this pragma is that you don't have to resort to \x{....} just to spell your name in a native encoding. So feel free to put your strings in your encoding in quotes and regexes.
The encoding pragma works by decoding string literals in
q//,qq//,qr//,qw///, qx//
and so forth. In perl 5.8.0, this
does not apply to tr///. Therefore,
- use encoding 'euc-jp';
- #....
- $kana =~ tr/\xA4\xA1-\xA4\xF3/\xA5\xA1-\xA5\xF3/;
- # -------- -------- -------- --------
Does not work as
- $kana =~ tr/\x{3041}-\x{3093}/\x{30a1}-\x{30f3}/;
- utf8 euc-jp charnames::viacode()
- -----------------------------------------
- \x{3041} \xA4\xA1 HIRAGANA LETTER SMALL A
- \x{3093} \xA4\xF3 HIRAGANA LETTER N
- \x{30a1} \xA5\xA1 KATAKANA LETTER SMALL A
- \x{30f3} \xA5\xF3 KATAKANA LETTER N
This counterintuitive behavior has been fixed in perl 5.8.1.
In perl 5.8.0, you can work around this as follows:
- use encoding 'euc-jp';
- # ....
- eval qq{ \$kana =~ tr/\xA4\xA1-\xA4\xF3/\xA5\xA1-\xA5\xF3/; };
Note that the tr// expression is wrapped in eval qq{}. The idea behind this is the same as the classic idiom that makes tr/// 'interpolate':
- tr/$from/$to/; # wrong!
- eval qq{ tr/$from/$to/ }; # workaround.
Nevertheless, since under the encoding pragma even q// is affected, leaving tr/// undecoded was obviously against the intent of the Perl 5 Porters, so it was fixed in Perl 5.8.1.
- use encoding "iso 8859-7";
- # \xDF in ISO 8859-7 (Greek) is \x{3af} in Unicode.
- $a = "\xDF";
- $b = "\x{100}";
- printf "%#x\n", ord($a); # will print 0x3af, not 0xdf
- $c = $a . $b;
- # $c will be "\x{3af}\x{100}", not "\x{df}\x{100}".
- # chr() is affected, and ...
- print "mega\n" if ord(chr(0xdf)) == 0x3af;
- # ... ord() is affected by the encoding pragma ...
- print "tera\n" if ord(pack("C", 0xdf)) == 0x3af;
- # ... as are eq and cmp ...
- print "peta\n" if "\x{3af}" eq pack("C", 0xdf);
- print "exa\n" if "\x{3af}" cmp pack("C", 0xdf) == 0;
- # ... but pack/unpack C are not affected, in case you still
- # want to go back to your native encoding
- print "zetta\n" if unpack("C", (pack("C", 0xdf))) == 0xdf;
For native multibyte encodings (either fixed or variable length), the current implementation of the regular expressions may introduce recoding errors for regular expression literals longer than 127 bytes.
The encoding pragma is not supported on EBCDIC platforms. (Porters who are willing and able to remove this limitation are welcome.)
This pragma doesn't work well with format because PerlIO does not get along very well with it. When a format contains non-ascii characters, it prints funny or emits "wide character" warnings. To understand it, try the code below.
- # Save this one in utf8
- # replace *non-ascii* with a non-ascii string
- my $camel;
- format STDOUT =
- *non-ascii*@>>>>>>>
- $camel
- .
- $camel = "*non-ascii*";
- binmode(STDOUT=>':encoding(utf8)'); # bang!
- write; # funny
- print $camel, "\n"; # fine
Without binmode this happens to work, but then print() fails instead of write().
At any rate, the very use of format is questionable when it comes to unicode characters since you have to consider such things as character width (i.e. double-width for ideographs) and directions (i.e. BIDI for Arabic and Hebrew).
use encoding ...
is not thread-safe (i.e., do not use in threaded
applications).
The logic of :locale
is as follows:
If the platform supports the langinfo(CODESET) interface, the codeset returned is used as the default encoding for the open pragma.
If 1. didn't work but we are under the locale pragma, the environment
variables LC_ALL and LANG (in that order) are matched for encodings
(the part after ., if any), and if any found, that is used
as the default encoding for the open pragma.
If 1. and 2. didn't work, the environment variables LC_ALL and LANG
(in that order) are matched for anything looking like UTF-8, and if
any found, :utf8
is used as the default encoding for the open
pragma.
If your locale environment variables (LC_ALL, LC_CTYPE, LANG) contain the strings 'UTF-8' or 'UTF8' (case-insensitive matching), the default encoding of your STDIN, STDOUT, and STDERR, and of any subsequent file open, is UTF-8.
This pragma first appeared in Perl 5.8.0. For features that require 5.8.1 and better, see above.
The :locale
subpragma was implemented in 2.01, or Perl 5.8.6.
perlunicode, Encode, open, Filter::Util::Call,
Ch. 15 of Programming Perl (3rd Edition)
by Larry Wall, Tom Christiansen, Jon Orwant;
O'Reilly & Associates; ISBN 0-596-00027-8
feature - Perl pragma to enable new features
- use feature qw(say switch);
- given ($foo) {
- when (1) { say "\$foo == 1" }
- when ([2,3]) { say "\$foo == 2 || \$foo == 3" }
- when (/^a[bc]d$/) { say "\$foo eq 'abd' || \$foo eq 'acd'" }
- when ($_ > 100) { say "\$foo > 100" }
- default { say "None of the above" }
- }
- use feature ':5.10'; # loads all features available in perl 5.10
- use v5.10; # implicitly loads :5.10 feature bundle
It is usually impossible to add new syntax to Perl without breaking
some existing programs. This pragma provides a way to minimize that
risk. New syntactic constructs, or new semantic meanings to older
constructs, can be enabled by use feature 'foo'
, and will be parsed
only when the appropriate feature pragma is in scope. (Nevertheless, the
CORE::
prefix provides access to all Perl keywords, regardless of this
pragma.)
Like other pragmas (use strict
, for example), features have a lexical
effect. use feature qw(foo)
will only make the feature "foo" available
from that point to the end of the enclosing block.
no feature
Features can also be turned off by using no feature "foo"
. This too
has lexical effect.
no feature
with no features specified will reset to the default group. To
disable all features (an unusual request!) use no feature ':all'
.
use feature 'say'
tells the compiler to enable the Perl 6 style
say function.
See say for details.
This feature is available starting with Perl 5.10.
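As a quick demonstration (assuming perl 5.10 or later on your PATH), note that -E enables the current feature bundle while -e does not:

```shell
perl -E 'say "Hello"'                 # -E enables features: prints "Hello"
perl -e 'say "Hello"' 2>/dev/null \
    || echo "say is not available without the feature"
```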
use feature 'state'
tells the compiler to enable state
variables.
See Persistent Private Variables in perlsub for details.
This feature is available starting with Perl 5.10.
use feature 'switch'
tells the compiler to enable the Perl 6
given/when construct.
See Switch Statements in perlsyn for details.
This feature is available starting with Perl 5.10.
use feature 'unicode_strings' tells the compiler to use Unicode semantics in all string operations executed within its scope (unless they are also within the scope of either use locale or use bytes). The same applies to all regular expressions compiled within the scope, even if executed outside it. It does not change the internal representation of strings, but only how they are interpreted.
no feature 'unicode_strings' tells the compiler to use the traditional Perl semantics wherein the native character set semantics is used unless it is clear to Perl that Unicode is desired. This can lead to some surprises when the behavior suddenly changes. (See The Unicode Bug in perlunicode for details.) For this reason, if you are potentially using Unicode in your program, the use feature 'unicode_strings' subpragma is strongly recommended.
This feature is available starting with Perl 5.12; it was almost fully implemented in Perl 5.14, and extended in Perl 5.16 to cover quotemeta.
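A sketch of the difference on an ASCII platform ("\xe0" is LATIN SMALL LETTER A WITH GRAVE in Latin-1/Unicode):

```perl
my $s = "\xe0";   # a single byte, stored internally as bytes

my $native = uc $s;                                       # traditional semantics: unchanged
my $uni    = do { use feature 'unicode_strings'; uc $s }; # Unicode semantics: "\xc0"
```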
Under the unicode_eval feature, Perl's eval function, when passed a string, will evaluate it as a string of characters, ignoring any use utf8 declarations. use utf8 exists to declare the encoding of the script, which only makes sense for a stream of bytes, not a string of characters. Source filters are forbidden, as they also really only make sense on strings of bytes. Any attempt to activate a source filter will result in an error.
The evalbytes feature enables the evalbytes keyword, which evaluates
the argument passed to it as a string of bytes. It dies if the string
contains any characters outside the 8-bit range. Source filters work
within evalbytes: they apply to the contents of the string being
evaluated.
Together, these two features are intended to replace the historical eval
function, which has (at least) two bugs in it, that cannot easily be fixed
without breaking existing programs:
eval behaves differently depending on the internal encoding of the
string, sometimes treating its argument as a string of bytes, and sometimes
as a string of characters.
Source filters activated within eval leak out into whichever file
scope is currently being compiled. To give an example with the CPAN module
Semi::Semicolons:
evalbytes fixes that to work the way one would expect:
These two features are available starting with Perl 5.16.
This provides the __SUB__ token that returns a reference to the current subroutine, or undef outside of a subroutine.
This feature is available starting with Perl 5.16.
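For instance, __SUB__ lets an anonymous subroutine recurse without naming itself (a standard illustration):

```perl
use feature 'current_sub';

my $fact = sub {
    my ($n) = @_;
    # __SUB__ is a reference to the currently executing subroutine
    return $n <= 1 ? 1 : $n * __SUB__->($n - 1);
};

print $fact->(5), "\n";   # 120
```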
This feature supports the legacy $[ variable. See $[ in perlvar and arybase. It is on by default but disabled under use v5.16 (see IMPLICIT LOADING, below).
This feature is available under this name starting with Perl 5.16. In previous versions, it was simply on all the time, and this pragma knew nothing about it.
use feature 'fc' tells the compiler to enable the fc function, which implements Unicode casefolding. See fc for details.
This feature is available from Perl 5.16 onwards.
WARNING: This feature is still experimental and the implementation may change in future versions of Perl. For this reason, Perl will warn when you use the feature, unless you have explicitly disabled the warning:
- no warnings "experimental::lexical_subs";
This enables declaration of subroutines via my sub foo, state sub foo and our sub foo syntax. See Lexical Subroutines in perlsub for details.
This feature is available from Perl 5.18 onwards.
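A minimal sketch (double() is an invented example):

```perl
use feature 'lexical_subs';
no warnings 'experimental::lexical_subs';

my sub double { return 2 * shift }   # visible only to the end of this scope

print double(21), "\n";   # 42
```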
It's possible to load multiple features together, using a feature bundle. The name of a feature bundle is prefixed with a colon, to distinguish it from an actual feature.
- use feature ":5.10";
The following feature bundles are available:
- bundle features included
- --------- -----------------
- :default array_base
- :5.10 say state switch array_base
- :5.12 say state switch unicode_strings array_base
- :5.14 say state switch unicode_strings array_base
- :5.16 say state switch unicode_strings
- unicode_eval evalbytes current_sub fc
- :5.18 say state switch unicode_strings
- unicode_eval evalbytes current_sub fc
The :default bundle represents the feature set that is enabled before any use feature or no feature declaration.
Specifying sub-versions such as the 0 in 5.14.0 in feature bundles has no effect. Feature bundles are guaranteed to be the same for all sub-versions.
Instead of loading feature bundles by name, it is easier to let Perl do implicit loading of a feature bundle for you.
There are two ways to load the feature pragma implicitly:
By using the -E switch on the Perl command-line instead of -e. That will enable the feature bundle for that version of Perl in the main compilation unit (that is, the one-liner that follows -E).
By explicitly requiring a minimum Perl version number for your program, with the use VERSION construct. That is,
- use v5.10.0;
will do an implicit
- use feature ':5.10';
and so on. Note how the trailing sub-version is automatically stripped from the version.
But to avoid portability warnings (see use), you may prefer:
- use 5.010;
with the same effect.
If the required version is older than Perl 5.10, the ":default" feature bundle is automatically loaded instead.
fields - compile-time class fields
- {
- package Foo;
- use fields qw(foo bar _Foo_private);
- sub new {
- my Foo $self = shift;
- unless (ref $self) {
- $self = fields::new($self);
- $self->{_Foo_private} = "this is Foo's secret";
- }
- $self->{foo} = 10;
- $self->{bar} = 20;
- return $self;
- }
- }
- my $var = Foo->new;
- $var->{foo} = 42;
- # this will generate an error
- $var->{zap} = 42;
- # subclassing
- {
- package Bar;
- use base 'Foo';
- use fields qw(baz _Bar_private); # not shared with Foo
- sub new {
- my $class = shift;
- my $self = fields::new($class);
- $self->SUPER::new(); # init base fields
- $self->{baz} = 10; # init own fields
- $self->{_Bar_private} = "this is Bar's secret";
- return $self;
- }
- }
The fields pragma enables compile-time verified class fields.
NOTE: The current implementation keeps the declared fields in the %FIELDS hash of the calling package, but this may change in future versions. Do not update the %FIELDS hash directly, because it must be created at compile-time for it to be fully useful, as is done by this pragma.
Only valid for perl before 5.9.0:
If a typed lexical variable holding a reference is used to access a hash element and a package with the same name as the type has declared class fields using this pragma, then the operation is turned into an array access at compile time.
The related base pragma will combine fields from base classes and any fields declared using the fields pragma. This enables field inheritance to work properly.
Field names that start with an underscore character are made private to the class and are not visible to subclasses. Inherited fields can be overridden but will generate a warning if used together with the -w switch.
Only valid for perls before 5.9.0:
The effect of all this is that you can have objects with named fields which are as compact and as fast to access as arrays. This only works as long as the objects are accessed through properly typed variables. If the objects are not typed, access is only checked at run time.
The following functions are supported:
perl before 5.9.0: fields::new() creates and blesses a pseudo-hash comprised of the fields declared using the fields pragma into the specified class.
perl 5.9.0 and higher: fields::new() creates and blesses a restricted-hash comprised of the fields declared using the fields pragma into the specified class.
This function is usable with or without pseudo-hashes. It is the recommended way to construct a fields-based object.
This makes it possible to write a constructor like this:
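The constructor referred to above follows the same pattern as the synopsis; a hedged sketch (the class name Critter and its fields are invented):

```perl
package Critter;
use fields qw(name _Critter_private);

sub new {
    my Critter $self = shift;
    # When called as a class method, fields::new() blesses a restricted
    # hash (a pseudo-hash before perl 5.9.0) that admits only the
    # declared fields
    $self = fields::new($self) unless ref $self;
    $self->{name} = shift;
    return $self;
}

package main;
my Critter $c = Critter->new("Fido");
print $c->{name}, "\n";
```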
before perl 5.9.0:
fields::phash() can be used to create and initialize a plain (unblessed) pseudo-hash. This function should always be used instead of creating pseudo-hashes directly.
If the first argument is a reference to an array, the pseudo-hash will be created with keys from that array. If a second argument is supplied, it must also be a reference to an array whose elements will be used as the values. If the second array contains fewer elements than the first, the trailing elements of the pseudo-hash will not be initialized. This makes it particularly useful for creating a pseudo-hash from subroutine arguments:
- sub dogtag {
- my $tag = fields::phash([qw(name rank ser_num)], [@_]);
- }
fields::phash() also accepts a list of key-value pairs that will be used to construct the pseudo hash. Examples:
perl 5.9.0 and higher:
Pseudo-hashes have been removed from Perl as of 5.10. Consider using restricted hashes or fields::new() instead. Using fields::phash() will cause an error.
filetest - Perl pragma to control the filetest permission operators
- $can_perhaps_read = -r "file"; # use the mode bits
- {
- use filetest 'access'; # intuit harder
- $can_really_read = -r "file";
- }
- $can_perhaps_read = -r "file"; # use the mode bits again
This pragma tells the compiler to change the behaviour of the filetest permission operators, -r -w -x -R -W -X (see perlfunc).
The default behaviour of file test operators is to use the simple
mode bits as returned by the stat() family of system calls. However,
many operating systems have additional features to define more complex
access rights, for example ACLs (Access Control Lists).
For such environments, use filetest may help the permission operators to return results more consistent with other tools.
The use filetest or no filetest statements affect file tests defined in their block, up to the end of the closest enclosing block (they are lexically block-scoped).
Currently, only the access sub-pragma is implemented. It enables (or disables) the use of access() when available, that is, on most UNIX systems and other POSIX environments. See details below.
The stat() mode bits are probably right for most of the files and directories found on your system, because few people want to use the additional features offered by access(). But you may encounter surprises if your program runs on a system that uses ACLs, since the stat() information won't reflect the actual permissions.
There may be a slight performance decrease in the filetest operations when the filetest pragma is in effect, because calling access() costs more than simply checking the mode bits, which is very cheap.
Also, note that using the file tests for security purposes is a lost cause from the start: there is a window open for race conditions (who is to say that the permissions will not change between the test and the real operation?). Therefore if you are serious about security, just try the real operation and test for its success - think in terms of atomic operations. Filetests are more useful for filesystem administrative tasks, when you have no need for the content of the elements on disk.
UNIX and POSIX systems provide an abstract access() operating system call, which should be used to query the read, write, and execute rights. This function hides various distinct approaches in additional operating system specific security features, like Access Control Lists (ACLs).
The extended filetest functionality is used by Perl only when the argument of the operators is a filename, not when it is a filehandle.
_
Because access() does not invoke stat() (at least not in a way visible to Perl), the stat result cache "_" is not set. This means that the outcome of the following two tests is different. The first has the stat bits of /etc/passwd in _, and in the second case this still contains the bits of /etc.
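A sketch of the two cases (assuming a POSIX system where access() is available; the paths are only illustrative):

```perl
stat("/etc");   # primes the "_" cache with /etc's stat information

{
    no filetest 'access';
    if (-r "/etc/passwd") {       # stat()-based test
        my $size = -s _;          # "_" now holds /etc/passwd's bits
    }
}

{
    use filetest 'access';
    if (-r "/etc/passwd") {       # access()-based test: "_" not updated
        my $size = -s _;          # still the size of /etc, not /etc/passwd!
    }
}
```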
Of course, unless your OS does not implement access(), in which case the pragma is simply ignored. Best not to use _ at all in a file where the filetest pragma is active!
As a side effect, as _ doesn't work, stacked filetest operators (-f -w $file) won't work either.
This limitation might be removed in a future version of perl.
find2perl - translate find command lines to Perl code
- find2perl [paths] [predicates] | perl
find2perl is a little translator to convert find command lines to equivalent Perl code. The resulting code is typically faster than running find itself.
"paths" are a set of paths where find2perl will start its searches and "predicates" are taken from the following list.
! PREDICATE
Negate the sense of the following predicate. The ! must be passed as a distinct argument, so it may need to be surrounded by whitespace and/or quoted from interpretation by the shell using a backslash (just as with using find(1)).
( PREDICATES )
Group the given PREDICATES. The parentheses must be passed as distinct arguments, so they may need to be surrounded by whitespace and/or quoted from interpretation by the shell using a backslash (just as with using find(1)).
PREDICATE1 PREDICATE2
True if _both_ PREDICATE1 and PREDICATE2 are true; PREDICATE2 is not evaluated if PREDICATE1 is false.
PREDICATE1 -o PREDICATE2
True if either one of PREDICATE1 or PREDICATE2 is true; PREDICATE2 is not evaluated if PREDICATE1 is true.
-follow
Follow (dereference) symlinks. The checking of file attributes depends on the position of the -follow option. If it precedes the file check option, a stat is done, which means the file check applies to the file the symbolic link is pointing to. If the -follow option follows the file check option, the check applies to the symbolic link itself, i.e. an lstat is done.
-depth
Change directory traversal algorithm from breadth-first to depth-first.
-prune
Do not descend into the directory currently matched.
-xdev
Do not traverse mount points (prunes search at mount-point directories).
-name GLOB
File name matches specified GLOB wildcard pattern. GLOB may need to be quoted to avoid interpretation by the shell (just as with using find(1)).
-iname GLOB
Like -name, but the match is case insensitive.
-path GLOB
Path name matches specified GLOB wildcard pattern.
-ipath GLOB
Like -path, but the match is case insensitive.
-perm PERM
Low-order 9 bits of permission match octal value PERM.
-perm -PERM
The bits specified in PERM are all set in file's permissions.
-type X
The file's type matches perl's -X operator.
-fstype TYPE
Filesystem of current path is of type TYPE (only NFS/non-NFS distinction is implemented).
-user USER
True if USER is owner of file.
-group GROUP
True if file's group is GROUP.
-nouser
True if file's owner is not in password database.
-nogroup
True if file's group is not in group database.
-inum INUM
True if file's inode number is INUM.
-links N
True if (hard) link count of file matches N (see below).
-size N
True if file's size matches N (see below). N is normally counted in 512-byte blocks, but a suffix of "c" specifies that size should be counted in characters (bytes) and a suffix of "k" specifies that size should be counted in 1024-byte blocks.
-atime N
True if last-access time of file matches N (measured in days) (see below).
-ctime N
True if last-changed time of file's inode matches N (measured in days, see below).
-mtime N
True if last-modified time of file matches N (measured in days, see below).
-newer FILE
True if last-modified time of file is more recent than that of FILE.
-print
Print out path of file (always true). If none of -exec, -ls, -print0, or -ok is specified, then -print will be added implicitly.
-print0
Like -print, but terminates with \0 instead of \n.
-exec OPTIONS ;
exec() the arguments in OPTIONS in a subprocess; any occurrence of {} in OPTIONS will first be substituted with the path of the current file. Note that the command "rm" has been special-cased to use perl's unlink() function instead (as an optimization). The ; must be passed as a distinct argument, so it may need to be surrounded by whitespace and/or quoted from interpretation by the shell using a backslash (just as with using find(1)).
-ok OPTIONS ;
Like -exec, but first prompts user; if user's response does not begin with a y, skip the exec. The ; must be passed as a distinct argument, so it may need to be surrounded by whitespace and/or quoted from interpretation by the shell using a backslash (just as with using find(1)).
-eval EXPR
Has the perl script eval() the EXPR.
-ls
Simulates -exec ls -dils {} ;
-tar FILE
Adds current output to tar-format FILE.
-cpio FILE
Adds current output to old-style cpio-format FILE.
-ncpio FILE
Adds current output to "new"-style cpio-format FILE.
Predicates which take a numeric argument N can come in three forms:
- * N is prefixed with a +: match values greater than N
- * N is prefixed with a -: match values less than N
- * N is not prefixed with either + or -: match only values equal to N
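The generated code is built on File::Find. As a rough illustration, a run like find2perl DIR -name '*.core' -print produces something equivalent to the following sketch (not find2perl's literal output; the scratch directory is only there to make the example self-contained):

```perl
use File::Find ();
use File::Temp qw(tempdir);

# Build a scratch directory holding one matching file
my $dir = tempdir(CLEANUP => 1);
open my $fh, '>', "$dir/a.core" or die $!;
close $fh;

my @found;
sub wanted {
    # $_ holds the basename, $File::Find::name the full path
    /\.core\z/ && push @found, $File::Find::name;
}
File::Find::find({ wanted => \&wanted }, $dir);

print "$_\n" for @found;
```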
find, File::Find.
h2ph - convert .h C header files to .ph Perl header files
h2ph [-d destination directory] [-r | -a] [-l] [headerfiles]
h2ph converts any C header files specified to the corresponding Perl header file format. It is most easily run while in /usr/include:
- cd /usr/include; h2ph * sys/*
or
- cd /usr/include; h2ph * sys/* arpa/* netinet/*
or
- cd /usr/include; h2ph -r -l .
The output files are placed in the hierarchy rooted at Perl's architecture dependent library directory. You can specify a different hierarchy with a -d switch.
If run with no arguments, filters standard input to standard output.
-d destination_dir
Put the resulting .ph files beneath destination_dir, instead of beneath the default Perl library location ($Config{'installsitearch'}).
-r
Run recursively; if any of headerfiles are directories, then run h2ph on all files in those directories (and their subdirectories, etc.). -r and -a are mutually exclusive.
-a
Run automagically; convert headerfiles, as well as any .h files which they include. This option will search for .h files in all directories which your C compiler ordinarily uses. -a and -r are mutually exclusive.
-l
Symbolic links will be replicated in the destination directory. If -l is not specified, then links are skipped over.
-h
Put 'hints' in the .ph files which will help in locating problems with h2ph. In those cases when you require a .ph file containing syntax errors, instead of the cryptic
- [ some error condition ] at (eval mmm) line nnn
you will see the slightly more helpful
- [ some error condition ] at filename.ph line nnn
However, the .ph files almost double in size when built using -h.
-D
Include the code from the .h file as a comment in the .ph file. This is primarily used for debugging h2ph.
-Q
'Quiet' mode; don't print out the names of the files being converted.
No environment variables are used.
- /usr/include/*.h
- /usr/include/sys/*.h
etc.
Larry Wall
perl(1)
The usual warnings if it can't read or write the files involved.
Doesn't construct the %sizeof array for you.
It doesn't handle all C constructs, but it does attempt to isolate definitions inside evals so that you can get at the definitions that it can translate.
It's only intended as a rough tool. You may need to dicker with the files produced.
You have to run this program by hand; it's not run as part of the Perl installation.
Doesn't handle complicated expressions built piecemeal, a la:
- enum {
- FIRST_VALUE,
- SECOND_VALUE,
- #ifdef ABC
- THIRD_VALUE
- #endif
- };
Doesn't necessarily locate all of your C compiler's internally-defined symbols.
h2xs - convert .h C header files to Perl extensions
h2xs [OPTIONS ...] [headerfile ... [extra_libraries]]
h2xs -h|-?|--help
h2xs builds a Perl extension from C header files. The extension will include functions which can be used to retrieve the value of any #define statement which was in the C header files.
The module_name will be used for the name of the extension. If module_name is not supplied then the name of the first header file will be used, with the first character capitalized.
If the extension might need extra libraries, they should be included here. The extension Makefile.PL will take care of checking whether the libraries actually exist and how they should be loaded. The extra libraries should be specified in the form -lm -lposix, etc, just as on the cc command line. By default, the Makefile.PL will search through the library path determined by Configure. That path can be augmented by including arguments of the form -L/another/library/path in the extra-libraries argument.
In spite of its name, h2xs may also be used to create a skeleton pure Perl module. See the -X option.
-A
Omit all autoload facilities. This is the same as -c but also removes the use AutoLoader statement from the .pm file.
-B
Use an alpha/beta style version number. Causes version number to be "0.00_01" unless -v is specified.
-C
Omits creation of the Changes file, and adds a HISTORY section to the POD template.
-F addflags
Additional flags to specify to C preprocessor when scanning header for function declarations. Writes these options in the generated Makefile.PL too.
-M regexp
Selects functions/macros to process.
-O
Allows a pre-existing extension directory to be overwritten.
-P
Omit the autogenerated stub POD section.
-X
Omit the XS portion. Used to generate a skeleton pure Perl module. -c and -f are implicitly enabled.
-a
Generate an accessor method for each element of structs and unions. The generated methods are named after the element name; will return the current value of the element if called without additional arguments; and will set the element to the supplied value (and return the new value) if called with an additional argument. Embedded structures and unions are returned as a pointer rather than the complete structure, to facilitate chained calls.
These methods all apply to the Ptr type for the structure; additionally two methods are constructed for the structure type itself, _to_ptr, which returns a Ptr type pointing to the same structure, and a new method to construct and return a new structure, initialised to zeroes.
-b version
Generates a .pm file which is backwards compatible with the specified perl version.
For versions < 5.6.0, the changes are: no use of 'our' (uses 'use vars' instead); no 'use warnings'.
Specifying a compatibility version higher than the version of perl you are using to run h2xs will have no effect. If unspecified h2xs will default to compatibility with the version of perl you are using to run h2xs.
-c
Omit constant() from the .xs file and corresponding specialised AUTOLOAD from the .pm file.
-d
Turn on debugging messages.
-e [regexp]
If regular expression is not given, skip all constants that are defined in a C enumeration. Otherwise skip only those constants that are defined in an enum whose name matches regular expression.
Since regular expression is optional, make sure that this switch is followed by at least one other switch if you omit regular expression and have some pending arguments such as header-file names. This is ok:
- h2xs -e -n Module::Foo foo.h
This is not ok:
- h2xs -n Module::Foo -e foo.h
In the latter, foo.h is taken as the regular expression.
-f
Allows an extension to be created for a header even if that header is not found in standard include directories.
-g
Include code for safely storing static data in the .xs file. Extensions that do not make use of static data can ignore this option.
-h, -?, --help
Print the usage, help and version for this h2xs and exit.
-k
For function arguments declared as const, omit the const attribute in the generated XS code.
-m
Experimental: for each variable declared in the header file(s), declare a perl variable of the same name magically tied to the C variable.
-n module_name
Specifies a name to be used for the extension, e.g., -n RPC::DCE
-o regexp
Use "opaque" data type for the C types matched by the regular expression, even if these types are typedef-equivalent to types from typemaps. Should not be used without -x.
This may be useful since, say, types which are typedef-equivalent to integers may represent OS-related handles, and one may want to work with these handles in an OO way, as in $handle->do_something(). Use -o . if you want to handle all the typedefed types as opaque types.
The type-to-match is whitewashed (except for commas, which have no whitespace before them, and multiple *, which have no whitespace between them).
-p prefix
Specify a prefix which should be removed from the Perl function names, e.g., -p sec_rgy_. This sets up the XS PREFIX keyword and removes the prefix from functions that are autoloaded via the constant() mechanism.
-s macro1,macro2,...
Create a perl subroutine for the specified macros rather than autoload with the constant() subroutine. These macros are assumed to have a return type of char *, e.g., -s sec_rgy_wildcard_name,sec_rgy_wildcard_sid.
-t type
Specify the internal type that the constant() mechanism uses for macros. The default is IV (signed integer). Currently all macros found during the header scanning process will be assumed to have this type. Future versions of h2xs may gain the ability to make educated guesses.
--use-new-tests
When --compat-version (-b) is present the generated tests will use Test::More rather than Test, which is the default for versions before 5.6.2. Test::More will be added to PREREQ_PM in the generated Makefile.PL.
--use-old-tests
Will force the generation of test code that uses the older Test module.
--skip-exporter
Do not use Exporter and/or export any symbol.
--skip-ppport
Do not use Devel::PPPort: no portability to older versions.
--skip-autoloader
Do not use the module AutoLoader; but keep the constant() function and sub AUTOLOAD for constants.
--skip-strict
Do not use the pragma strict.
--skip-warnings
Do not use the pragma warnings.
-v version
Specify a version number for this extension. This version number is added to the templates. The default is 0.01, or 0.00_01 if -B is specified. The version specified should be numeric.
-x
Automatically generate XSUBs based on function declarations in the header file. The package C::Scan should be installed. If this option is specified, the name of the header file may look like NAME1,NAME2. In this case NAME1 is used instead of the specified string, but XSUBs are emitted only for the declarations included from file NAME2.
Note that some types of arguments/return-values for functions may result in XSUB-declarations/typemap-entries which need hand-editing. Such may be objects which cannot be converted from/to a pointer (like long long), pointers to functions, or arrays. See also the section on LIMITATIONS of -x.
- # Default behavior, extension is Rusers
- h2xs rpcsvc/rusers
- # Same, but extension is RUSERS
- h2xs -n RUSERS rpcsvc/rusers
- # Extension is rpcsvc::rusers. Still finds <rpcsvc/rusers.h>
- h2xs rpcsvc::rusers
- # Extension is ONC::RPC. Still finds <rpcsvc/rusers.h>
- h2xs -n ONC::RPC rpcsvc/rusers
- # Without constant() or AUTOLOAD
- h2xs -c rpcsvc/rusers
- # Creates templates for an extension named RPC
- h2xs -cfn RPC
- # Extension is ONC::RPC.
- h2xs -cfn ONC::RPC
- # Extension is a pure Perl module with no XS code.
- h2xs -X My::Module
- # Extension is Lib::Foo which works at least with Perl5.005_03.
- # Constants are created for all #defines and enums h2xs can find
- # in foo.h.
- h2xs -b 5.5.3 -n Lib::Foo foo.h
- # Extension is Lib::Foo which works at least with Perl5.005_03.
- # Constants are created for all #defines but only for enums
- # whose names do not start with 'bar_'.
- h2xs -b 5.5.3 -e '^bar_' -n Lib::Foo foo.h
- # Makefile.PL will look for library -lrpc in
- # additional directory /opt/net/lib
- h2xs rpcsvc/rusers -L/opt/net/lib -lrpc
- # Extension is DCE::rgynbase
- # prefix "sec_rgy_" is dropped from perl function names
- h2xs -n DCE::rgynbase -p sec_rgy_ dce/rgynbase
- # Extension is DCE::rgynbase
- # prefix "sec_rgy_" is dropped from perl function names
- # subroutines are created for sec_rgy_wildcard_name and
- # sec_rgy_wildcard_sid
- h2xs -n DCE::rgynbase -p sec_rgy_ \
- -s sec_rgy_wildcard_name,sec_rgy_wildcard_sid dce/rgynbase
- # Make XS without defines in perl.h, but with function declarations
- # visible from perl.h. Name of the extension is perl1.
- # When scanning perl.h, define -DEXT=extern -DdEXT= -DINIT(x)=
- # Extra backslashes below because the string is passed to shell.
- # Note that a directory with perl header files would
- # be added automatically to include path.
- h2xs -xAn perl1 -F "-DEXT=extern -DdEXT= -DINIT\(x\)=" perl.h
- # Same with function declaration in proto.h as visible from perl.h.
- h2xs -xAn perl2 perl.h,proto.h
- # Same but select only functions which match /^av_/
- h2xs -M '^av_' -xAn perl2 perl.h,proto.h
- # Same but treat SV* etc as "opaque" types
- h2xs -o '^[S]V \*$' -M '^av_' -xAn perl2 perl.h,proto.h
Suppose that you have some C files implementing some functionality, and the corresponding header files. How do you create an extension which makes this functionality accessible in Perl? The example below assumes that the header files are interface_simple.h and interface_hairy.h, and you want the perl module to be named Ext::Ension. If you need some preprocessor directives and/or linking with external libraries, see the flags -F, -L and -l in OPTIONS.
Start with a dummy run of h2xs:
- h2xs -Afn Ext::Ension
The only purpose of this step is to create the needed directories, and let you know the names of these directories. From the output you can see that the directory for the extension is Ext/Ension.
Copy your header files and C files to this directory Ext/Ension.
Run h2xs, overwriting older autogenerated files:
- h2xs -Oxan Ext::Ension interface_simple.h interface_hairy.h
h2xs looks for header files after changing to the extension directory, so it will find your header files OK.
As usual, run
- cd Ext/Ension
- perl Makefile.PL
- make dist
- make
- make test
It is important to do make dist as early as possible. This way you can easily merge(1) your changes to autogenerated files if you decide to edit your .h files and rerun h2xs.
Do not forget to edit the documentation in the generated .pm file.
Consider the autogenerated files as skeletons only; you may invent better interfaces than what h2xs could guess.
Consider this section as a guideline only; some other options of h2xs may better suit your needs.
No environment variables are used.
Larry Wall and others
perl, perlxstut, ExtUtils::MakeMaker, and AutoLoader.
The usual warnings if it cannot read or write the files involved.
h2xs cannot distinguish whether an argument to a C function which is of the form, say, int *, is an input, output, or input/output parameter. In particular, argument declarations of the form
- int
- foo(n)
- int *n
should be better rewritten as
- int
- foo(n)
- int &n
if n is an input parameter.
Additionally, h2xs has no facilities to intuit that a function
- int
- foo(addr,l)
- char *addr
- int l
takes a pair of address and length of data at this address, so it is better to rewrite this function as
- int
- foo(sv)
- SV *addr
- PREINIT:
- STRLEN len;
- char *s;
- CODE:
- s = SvPV(sv,len);
- RETVAL = foo(s, len);
- OUTPUT:
- RETVAL
or alternately
- static int
- my_foo(SV *sv)
- {
- STRLEN len;
- char *s = SvPV(sv,len);
- return foo(s, len);
- }
- MODULE = foo PACKAGE = foo PREFIX = my_
- int
- foo(sv)
- SV *sv
if - use a Perl module if a condition holds
- use if CONDITION, MODULE => ARGUMENTS;
The construct
- use if CONDITION, MODULE => ARGUMENTS;
has no effect unless CONDITION is true. In this case the effect is the same as of
- use MODULE ARGUMENTS;
Above, => provides the necessary quoting of MODULE. If it is not used (e.g., there are no ARGUMENTS to give), you'd better quote MODULE yourself.
The current implementation does not allow specification of the required version of the module.
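Typical uses look like this (the module choices are illustrative; Data::Dumper is in core, so the second line is safe to try, while MY_DEBUG is an invented environment variable):

```perl
# Load a compatibility shim only on perls that predate 5.10
use if $] < 5.010, 'MRO::Compat';

# Load a debugging aid only when the (invented) MY_DEBUG variable is set
use if $ENV{MY_DEBUG}, 'Data::Dumper';
```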
Ilya Zakharevich mailto:ilyaz@cpan.org.
Core documentation for Perl 5 version 18.2, in HTML and PDF formats.
To find out what's new in Perl 5.18.2, read the perldelta manpage.
If you are new to the Perl language, good places to start reading are the introduction and overview at perlintro, and the extensive FAQ section, which provides answers to over 300 common questions.
The complete documentation set is available to download for offline use.
As well as the documentation pages, the perldoc search engine is also included in the above downloads. No installation is required, just unpack the archive and open the index.html file in your web browser.
To obtain Perl itself, please go to http://www.perl.org/get.html.
integer - Perl pragma to use integer arithmetic instead of floating point
- use integer;
- $x = 10/3;
- # $x is now 3, not 3.33333333333333333
This tells the compiler to use integer operations from here to the end of the enclosing BLOCK. On many machines, this doesn't matter a great deal for most computations, but on those without floating point hardware, it can make a big difference in performance.
Note that this only affects how most of the arithmetic and relational operators handle their operands and results, and not how all numbers everywhere are treated. Specifically, use integer; has the effect that before computing the results of the arithmetic operators (+, -, *, /, %, +=, -=, *=, /=, %=, and unary minus), the comparison operators (<, <=, >, >=, ==, !=, <=>), and the bitwise operators (|, &, ^, <<, >>, |=, &=, ^=, <<=, >>=), the operands have their fractional portions truncated (or floored), and the result will have its fractional portion truncated as well. In addition, the range of operands and results is restricted to that of familiar two's complement integers, i.e., -(2**31) .. (2**31-1) on 32-bit architectures, and -(2**63) .. (2**63-1) on 64-bit architectures. For example, this code will print: 5.8, -5, 7, 3, 2, 10, 1, 2147483647, -2147483648
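A sketch that reproduces the listed values (the final wrap-around pair assumes a 32-bit perl; on a 64-bit build the boundary is at 2**63 instead):

```perl
use integer;
$x = 5.8;
$y = 2.5;
$z = 2.7;
$, = ", ";
print $x, -$x, $x + $y, $x - $y, $x / $y, $x * $y, $y == $z,
      2147483647, 2147483647 + 1;   # the "+ 1" wraps only on a 32-bit perl
```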
Note that $x is still printed as having its true non-integer value of 5.8 since it wasn't operated on. And note too the wrap-around from the largest positive integer to the largest negative one. Also, arguments passed to functions and the values returned by them are not affected by use integer;: a call such as sqrt(2) gives the same result with or without use integer;. The power operator ** is also not affected, so that 2 ** .5 is always the square root of 2. Now, it so happens that the pre- and post-increment and decrement operators, ++ and --, are not affected by use integer; either. Some may rightly consider this to be a bug -- but at least it's a long-standing one.
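For instance, a sketch of the exemptions just described:

```perl
use integer;
my $r = sqrt(2);
print $r, "\n";         # 1.41421... - function return values are not truncated
print $r + 0, "\n";     # 1         - but arithmetic on them is
print 2 ** 0.5, "\n";   # 1.41421... - ** is exempt
my $n = 1.5;
$n++;                   # ++ and -- are exempt too
print $n, "\n";         # 2.5
```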
Finally, use integer; also has an additional effect on the bitwise operators. Normally, the operands and results are treated as unsigned integers, but with use integer; the operands and results are signed. This means, among other things, that ~0 is -1, and -2 & -5 is -6.
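For example:

```perl
use integer;
print ~0, "\n";        # -1: the complement is done on signed integers here
print -2 & -5, "\n";   # -6
```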
Internally, native integer arithmetic (as provided by your C compiler) is used. This means that Perl's own semantics for arithmetic operations may not be preserved. One common source of trouble is the modulus of negative numbers, which Perl does one way, but your hardware may do another.
- % perl -le 'print (4 % -3)'
- -2
- % perl -Minteger -le 'print (4 % -3)'
- 1
See Pragmatic Modules in perlmodlib, Integer Arithmetic in perlop
less - perl pragma to request less of something
- use less 'CPU';
This is a user-pragma. If you're very lucky some code you're using will know that you asked for less CPU usage or ram or fat or... we just can't know. Consult your documentation on everything you're currently using.
For general suggestions, try requesting CPU or memory. If you ask for nothing in particular, you'll be asking for less 'please'.
- use less 'please';
less has been in the core as a "joke" module for ages now and it hasn't had any real way of communicating any information to anything. Thanks to Nicholas Clark we have user pragmas (see perlpragma) and now less can do something.
You can probably expect your users to be able to guess that they can request less CPU or memory or just "less" overall.
If the user didn't specify anything, it's interpreted as having used the please tag. It's up to you to make this useful.
BOOLEAN = less->of( FEATURE )
The class method less->of( NAME ) returns a boolean to tell you whether your user requested less of something.
- if ( less->of( 'CPU' ) ) {
- ...
- }
- elsif ( less->of( 'memory' ) ) {
- }
FEATURES = less->of()
If you don't ask for any feature, you get the list of features that the user requested you to be nice to. This has the nice side effect that if you don't respect anything in particular then you can just ask for it and use it like a boolean.
- if ( less->of ) {
- ...
- }
- else {
- ...
- }
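Putting the two halves together, a minimal sketch (the worker package and its strategy names are hypothetical):

```perl
use less 'CPU';

package My::Worker;    # hypothetical consumer of the pragma

sub strategy {
    return 'frugal'  if less->of('CPU');
    return 'compact' if less->of('memory');
    return 'default';
}

package main;
# the lexical "use less 'CPU'" above is in effect here,
# so this should print "frugal"
print My::Worker::strategy(), "\n";
```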
lib - manipulate @INC at compile time
This is a small simple module which simplifies the manipulation of @INC at compile time.
It is typically used to add extra directories to perl's search path so that later use or require statements will find modules which are not located on perl's default search path.
The parameters to use lib are added to the start of the perl search path. Saying
- use lib LIST;
is almost the same as saying
- BEGIN { unshift(@INC, LIST) }
For each directory in LIST (called $dir here) the lib module also checks to see if a directory called $dir/$archname/auto exists. If so the $dir/$archname directory is assumed to be a corresponding architecture specific directory and is added to @INC in front of $dir. lib.pm also checks if directories called $dir/$version and $dir/$version/$archname exist and adds these directories to @INC.
The current value of $archname can be found with this command:
- perl -V:archname
The corresponding command to get the current value of $version is:
- perl -V:version
To avoid memory leaks, all trailing duplicate entries in @INC are removed.
You should normally only add directories to @INC. If you need to delete directories from @INC take care to only delete those which you added yourself or which you are certain are not needed by other modules in your script. Other modules may have added directories which they need for correct operation.
The no lib statement deletes all instances of each named directory from @INC.
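For example (the directory name is illustrative; lib does not require it to exist):

```perl
use lib '/opt/my/app/lib';    # prepended to @INC
no lib '/opt/my/app/lib';     # every copy of /opt/my/app/lib removed again

print scalar( grep { $_ eq '/opt/my/app/lib' } @INC ), "\n";   # 0
```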
For each directory in LIST (called $dir here) the lib module also checks to see if a directory called $dir/$archname/auto exists. If so the $dir/$archname directory is assumed to be a corresponding architecture specific directory and is also deleted from @INC.
When the lib module is first loaded it records the current value of @INC in an array @lib::ORIG_INC. To restore @INC to that value you can say
- @INC = @lib::ORIG_INC;
In order to keep lib.pm small and simple, it only works with Unix filepaths. This doesn't mean it only works on Unix, but non-Unix users must first translate their file paths to Unix conventions.
- # VMS users wanting to put [.stuff.moo] into
- # their @INC would write
- use lib 'stuff/moo';
In the future, this module will likely use File::Spec for determining paths, as it does now for Mac OS (where Unix-style or Mac-style paths work, and Unix-style paths are converted properly to Mac-style paths before being added to @INC).
If you try to add a file to @INC as follows:
- use lib 'this_is_a_file.txt';
lib will warn about this. The sole exceptions are files with the .par extension, which are intended to be used as libraries.
FindBin - optional module which deals with paths relative to the source file.
PAR - optional module which can treat .par files as Perl libraries.
Tim Bunce, 2nd June 1995.
lib is maintained by the perl5-porters. Please direct any questions to the canonical mailing list. Anything that is applicable to the CPAN release can be sent to its maintainer, though.
Maintainer: The Perl5-Porters <perl5-porters@perl.org>
Maintainer of the CPAN release: Steffen Mueller <smueller@cpan.org>
This package has been part of the perl core since perl 5.001. It has been released separately to CPAN so older installations can benefit from bug fixes.
This package has the same copyright and license as the perl core.
libnetcfg - configure libnet
The libnetcfg utility can be used to configure libnet. Starting from perl 5.8 libnet is part of the standard Perl distribution, but libnetcfg can be used with any libnet installation.
Without arguments libnetcfg displays the current configuration.
- $ libnetcfg
- # old config ./libnet.cfg
- daytime_hosts ntp1.none.such
- ftp_int_passive 0
- ftp_testhost ftp.funet.fi
- inet_domain none.such
- nntp_hosts nntp.none.such
- ph_hosts
- pop3_hosts pop.none.such
- smtp_hosts smtp.none.such
- snpp_hosts
- test_exist 1
- test_hosts 1
- time_hosts ntp.none.such
- # libnetcfg -h for help
- $
It tells where the old configuration file was found (if found).
The -h option will show a usage message.
To change the configuration you will need to use either the -c or the -d option.
The name of the old configuration file is by default "libnet.cfg", unless otherwise specified using the -i option (-i oldfile); it is searched for first in the current directory, and then along your module path.
The default name of the new configuration file is "libnet.cfg", and by default it is written to the current directory, unless otherwise specified using the -o option (-o newfile).
Graham Barr, the original Configure script of libnet.
Jarkko Hietaniemi, conversion into libnetcfg for inclusion into Perl 5.8.
locale - Perl pragma to use or avoid POSIX locales for built-in operations
This pragma tells the compiler to enable (or disable) the use of POSIX locales for built-in operations (for example, LC_CTYPE for regular expressions, LC_COLLATE for string comparison, and LC_NUMERIC for number formatting). Each "use locale" or "no locale" affects statements to the end of the enclosing BLOCK.
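The lexical scoping means a program can confine locale-dependence to one block; a sketch:

```perl
use POSIX qw(setlocale LC_ALL);

setlocale(LC_ALL, '');         # adopt the locale from the environment
my @words = qw(banana Apple cherry);
{
    use locale;                # in this block, sort honors LC_COLLATE
    my @by_locale = sort @words;
}
my @default = sort @words;     # out here, plain codepoint order again
```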
Starting in Perl 5.16, a hybrid mode for this pragma is available,
- use locale ':not_characters';
which enables only the portions of locales that don't affect the character set (that is, all except LC_COLLATE and LC_CTYPE). This is useful when mixing Unicode and locales, including UTF-8 locales.
- use locale ':not_characters';
- use open ":locale"; # Convert I/O to/from Unicode
- use POSIX qw(locale_h); # Import the LC_ALL constant
- setlocale(LC_ALL, ""); # Required for the next statement
- # to take effect
- printf "%.2f\n", 12345.67; # Locale-defined formatting
- @x = sort @y; # Unicode-defined sorting order.
- # (Note that you will get better
- # results using Unicode::Collate.)
See perllocale for more detailed information on how Perl supports locales.
If your system does not support locales, then loading this module will cause the program to die with a message:
- "Your vendor does not support locales, you cannot use the locale
- module."
mro - Method Resolution Order
The "mro" namespace provides several utilities for dealing with method resolution order and method caching in general.
These interfaces are only available in Perl 5.9.5 and higher. See MRO::Compat on CPAN for a mostly forwards compatible implementation for older Perls.
It's possible to change the MRO of a given class either by using use mro as shown in the synopsis, or by using the mro::set_mro function below.
The special methods next::method, next::can, and maybe::next::method are not available until this mro module has been loaded via use or require.
In addition to the traditional Perl default MRO (depth first search, called DFS here), Perl now offers the C3 MRO as well. Perl's support for C3 is based on the work done in Stevan Little's module Class::C3, and most of the C3-related documentation here is ripped directly from there.
C3 is the name of an algorithm which aims to provide a sane method resolution order under multiple inheritance. It was first introduced in the language Dylan (see links in the SEE ALSO section), and then later adopted as the preferred MRO (Method Resolution Order) for the new-style classes in Python 2.3. Most recently it has been adopted as the "canonical" MRO for Perl 6 classes, and the default MRO for Parrot objects as well.
C3 works by always preserving local precedence ordering. This essentially means that no class will appear before any of its subclasses. Take, for instance, the classic diamond inheritance pattern:
- <A>
- / \
- <B> <C>
- \ /
- <D>
The standard Perl 5 MRO would be (D, B, A, C). The result is that A appears before C, even though C is a subclass of A. The C3 MRO algorithm, however, produces the following order: (D, B, C, A), which does not have this issue.
This example is fairly trivial; for more complex cases and a deeper explanation, see the links in the SEE ALSO section.
Returns an arrayref which is the linearized MRO of the given class. By default it uses whichever MRO is currently in effect for that class, or the given MRO (either c3 or dfs) if specified as $type.
The linearized MRO of a class is an ordered array of all of the classes one would search when resolving a method on that class, starting with the class itself.
If the requested class doesn't yet exist, this function will still succeed, and return [ $classname ].
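For the diamond hierarchy shown earlier, a sketch:

```perl
use mro;

package A;
package B; our @ISA = ('A');
package C; our @ISA = ('A');
package D; our @ISA = ('B', 'C');

package main;
print "@{ mro::get_linear_isa('D') }\n";        # D B A C  (default: dfs)
print "@{ mro::get_linear_isa('D', 'c3') }\n";  # D B C A
```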
Note that UNIVERSAL (and any members of UNIVERSAL's MRO) are not part of the MRO of a class, even though all classes implicitly inherit methods from UNIVERSAL and its parents.
Sets the MRO of the given class to the $type argument (either c3 or dfs).
Returns the MRO of the given class (either c3 or dfs).
Gets the mro_isarev for this class, returned as an arrayref of class names. These are every class that "isa" the given class name, even if the isa relationship is indirect. This is used internally by the MRO code to keep track of method/MRO cache invalidations.
As with mro::get_linear_isa above, UNIVERSAL is special. The isarev lists of UNIVERSAL (and its parents) do not include every class in existence, even though all classes are effectively descendants for method inheritance purposes.
Returns a boolean status indicating whether or not the given classname is either UNIVERSAL itself, or one of UNIVERSAL's parents by @ISA inheritance.
Any class for which this function returns true is "universal" in the sense that all classes potentially inherit methods from it.
Increments PL_sub_generation, which invalidates method caching in all packages.
Invalidates the method cache of any classes dependent on the given class. This is not normally necessary. The only known case where pure perl code can confuse the method cache is when you manually install a new constant subroutine by using a readonly scalar value, like the internals of constant do. If you find another case, please report it so we can either fix it or document the exception here.
Returns an integer which is incremented every time a real local method in the package $classname changes, or the local @ISA of $classname is modified.
This is intended for authors of modules which do lots of class introspection, as it allows them to very quickly check whether anything important about the local properties of a given class has changed since the last time they looked. It does not increment on method/@ISA changes in superclasses.
It's still up to you to seek out the actual changes, and there might not actually be any. Perhaps all of the changes since you last checked cancelled each other out and left the package in the state it was in before.
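A sketch of the intended polling pattern (class names illustrative):

```perl
use mro;

package Some::Class;
sub twiddle { }

package main;
my $last_gen = mro::get_pkg_gen('Some::Class');

# ... time passes; something may have edited the class ...
push @Some::Class::ISA, 'Some::Base';   # an @ISA change bumps the counter

if ( mro::get_pkg_gen('Some::Class') != $last_gen ) {
    # re-run the expensive introspection only now
}
```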
This integer normally starts off at a value of 1 when a package stash is instantiated. Calling it on packages whose stashes do not exist at all will return 0. If a package stash is completely deleted (not a normal occurrence, but it can happen if someone does something like undef %PkgName::), the number will be reset to either 0 or 1, depending on how completely the package was wiped out.
This is somewhat like SUPER, but it uses the C3 method resolution order to get better consistency in multiple inheritance situations. Note that while inheritance in general follows whichever MRO is in effect for the given class, next::method only uses the C3 MRO.
One generally uses it like so:
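A representative sketch (class and method names are illustrative):

```perl
package Creature;
sub new   { bless {}, shift }
sub speak { return "some noise" }

package Dog;
use mro 'c3';
our @ISA = ('Creature');
sub speak {
    my $self = shift;
    # no method name repeated: next::method finds the next 'speak' in the C3 MRO
    return "woof, then " . $self->next::method(@_);
}

package main;
print Dog->new->speak, "\n";   # "woof, then some noise"
```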
Note that you don't (re-)specify the method name. It forces you to always use the same method name as the method you started in.
It can be called on an object or a class, of course.
The way it resolves which actual method to call is:
First, it determines the linearized C3 MRO of the object or class it is being called on.
Then, it determines the class and method name of the context it was invoked from.
Finally, it searches down the C3 MRO list until it reaches the contextually enclosing class, then searches further down the MRO list for the next method with the same name as the contextually enclosing method.
Failure to find a next method will result in an exception being thrown (see below for alternatives).
This is substantially different than the behavior of SUPER under complex multiple inheritance. (This becomes obvious when one realizes that the common superclasses in the C3 linearizations of a given class and one of its parents will not always be ordered the same for both.)
Caveat: Calling next::method from methods defined outside the class:
There is an edge case when using next::method from within a subroutine which was created in a different module than the one it is called from. It sounds complicated, but it really isn't. Here is an example which will not work correctly:
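A sketch of the failure mode (package names are illustrative):

```perl
package Base;
sub new { bless {}, shift }
sub foo { "Base::foo" }

package Foo;
our @ISA = ('Base');

package Some::Other::Module;
use mro;
# Installed from a foreign package, this sub is known only as __ANON__:
*Foo::foo = sub { (shift)->next::method(@_) };

package main;
my $ok = eval { Foo->new->foo };   # next::method cannot resolve __ANON__
print $ok ? "worked\n" : "failed: $@";
```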
The problem exists because the anonymous subroutine being assigned to the *Foo::foo glob will show up in the call stack as being called __ANON__ and not foo as you might expect. Since next::method uses caller to find the name of the method it was called in, it will fail in this case.
But fear not, there's a simple solution. The module Sub::Name will reach into the perl internals and assign a name to an anonymous subroutine for you. Simply do this:
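Along these lines (note that Sub::Name is a CPAN module, not part of the core):

```perl
use Sub::Name 'subname';

*Foo::foo = subname 'Foo::foo' => sub {
    my $self = shift;
    $self->next::method(@_);   # caller now sees 'Foo::foo', not __ANON__
};
```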
and things will Just Work.
This is similar to next::method, but just returns either a code reference or undef to indicate that no further methods of this name exist.
In simple cases, it is equivalent to:
- $self->next::method(@_) if $self->next::can;
But there are some cases where only this solution works (like goto &maybe::next::method).
The Pugs prototype Perl 6 Object Model uses C3
Parrot now uses C3
Brandon L. Black, <blblack@gmail.com>
Based on Stevan Little's Class::C3
open - perl pragma to set default PerlIO layers for input and output
Full-fledged support for I/O layers is now implemented provided Perl is configured to use PerlIO as its IO system (which is now the default).
The open pragma serves as one of the interfaces to declare default "layers" (also known as "disciplines") for all I/O. Any two-argument open(), readpipe() (aka qx//) and similar operators found within the lexical scope of this pragma will use the declared defaults. Even three-argument opens may be affected by this pragma when they don't specify IO layers in MODE.
With the IN subpragma you can declare the default layers of input streams, and with the OUT subpragma you can declare the default layers of output streams. With the IO subpragma you can control both input and output streams simultaneously.
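For example (the layer choices are illustrative):

```perl
use open IN  => ':crlf',              # default for reads
         OUT => ':encoding(UTF-8)';   # default for writes

use open IO => ':encoding(UTF-8)';    # or: one default for both directions
```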
If you have a legacy encoding, you can use the :encoding(...) tag. If you want to set your encoding layers based on your locale environment variables, you can use the :locale tag.
For example:
- $ENV{LANG} = 'ru_RU.KOI8-R';
- # the :locale will probe the locale environment variables like LANG
- use open OUT => ':locale';
- open(O, ">koi8");
- print O chr(0x430); # Unicode CYRILLIC SMALL LETTER A = KOI8-R 0xc1
- close O;
- open(I, "<koi8");
- printf "%#x\n", ord(<I>); # this should print 0xc1
- close I;
A bare layer list applies to both directions, so these pairs are equivalent: use open ':encoding(utf8)' and use open IO => ':encoding(utf8)'; use open ':locale' and use open IO => ':locale'; and likewise for any explicit encoding such as ':encoding(iso-8859-7)'.
The matching of encoding names is loose: case does not matter, and many encodings have several aliases. See Encode::Supported for details and the list of supported locales.
When open() is given an explicit list of layers (with the three-arg syntax), they override the list declared using this pragma. open() can also be given a single colon (:) for a layer name, to override this pragma and use the default (:raw on Unix, :crlf on Windows).
The :std subpragma on its own has no effect, but if combined with the :utf8 or :encoding subpragmas, it converts the standard filehandles (STDIN, STDOUT, STDERR) to comply with the encoding selected for input/output handles. For example, if both input and output are chosen to be :encoding(utf8), a :std will mean that STDIN, STDOUT, and STDERR are also in :encoding(utf8). On the other hand, if only output is chosen to be in :encoding(koi8r), a :std will cause only the STDOUT and STDERR to be in koi8r. The :locale subpragma implicitly turns on :std.
The logic of :locale is described in full in encoding, but in short it first tries nl_langinfo(CODESET) and then guesses from the LC_ALL and LANG locale environment variables.
Directory handles may also support PerlIO layers in the future.
If Perl is not built to use PerlIO as its IO system then only the two pseudo-layers :bytes and :crlf are available.
The :bytes layer corresponds to "binary mode" and the :crlf layer corresponds to "text mode" on platforms that distinguish between the two modes when opening files (which is many DOS-like platforms, including Windows). These two layers are no-ops on platforms where binmode() is a no-op, but perform their functions everywhere if PerlIO is enabled.
There is a class method find in PerlIO::Layer which is implemented as XS code. It is called by import to validate the layers:
- PerlIO::Layer::->find("perlio")
The return value (if defined) is a Perl object of class PerlIO::Layer, which is created by the C code in perlio.c. As yet there is nothing useful you can do with the object at the perl level.
ops - Perl pragma to restrict unsafe operations when compiling
- perl -Mops=:default ... # only allow reasonably safe operations
- perl -M-ops=system ... # disable the 'system' opcode
Since the ops pragma currently has an irreversible global effect, it is only of significant practical use with the -M option on the command line.
See the Opcode module for information about opcodes, optags, opmasks and important information about safety.
overload - Package for overloading Perl operations
- package SomeThing;
- use overload
- '+' => \&myadd,
- '-' => \&mysub;
- # etc
- ...
- package main;
- $a = SomeThing->new( 57 );
- $b = 5 + $a;
- ...
- if (overload::Overloaded $b) {...}
- ...
- $strval = overload::StrVal $b;
This pragma allows overloading of Perl's operators for a class. To overload built-in functions, see Overriding Built-in Functions in perlsub instead.
Arguments of the use overload directive are (key, value) pairs. For the full set of legal keys, see Overloadable Operations below.
Operator implementations (the values) can be subroutines, references to subroutines, or anonymous subroutines - in other words, anything legal inside a &{ ... } call. Values specified as strings are interpreted as method names.
Thus
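a declaration along these lines (a sketch; the stringification helper to_roman() is hypothetical):

```perl
package Number;
use overload
    '-'  => 'minus',                     # string: resolved as a method name
    '*=' => \&Number::muas,              # reference to a subroutine
    '""' => sub { to_roman(${$_[0]}) };  # anonymous subroutine
```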
declares that subtraction is to be implemented by method minus() in the class Number (or one of its base classes), and that the function Number::muas() is to be used for the assignment form of multiplication, *=.
It also defines an anonymous subroutine to implement stringification: this is called whenever an object blessed into the package Number is used in a string context (this subroutine might, for example, return the number as a Roman numeral).
The following sample implementation of minus() (which assumes that Number objects are simply blessed references to scalars) illustrates the calling conventions:
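A sketch matching that description:

```perl
package Number;
use overload '-' => \&minus;

sub new {
    my ($class, $value) = @_;
    return bless \$value, $class;
}

sub minus {
    my ($self, $other, $swap) = @_;
    my $result = $$self - (ref $other ? $$other : $other);
    $result = -$result if $swap;    # 7 - $x arrives as minus($x, 7, 1)
    return Number->new($result);
}
```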
Three arguments are passed to all subroutines specified in the use overload directive (with one exception - see nomethod).
The first of these is the operand providing the overloaded operator implementation - in this case, the object whose minus() method is being called.
The second argument is the other operand, or undef in the case of a unary operator.
The third argument is set to TRUE if (and only if) the two operands have been swapped. Perl may do this to ensure that the first argument ($self) is an object implementing the overloaded operation, in line with general object calling conventions.
For example, if $x and $y are Numbers:
- operation | generates a call to
- ============|======================
- $x - $y | minus($x, $y, '')
- $x - 7 | minus($x, 7, '')
- 7 - $x | minus($x, 7, 1)
Perl may also use minus() to implement other operators which have not been specified in the use overload directive, according to the rules for Magic Autogeneration described later. For example, the use overload above declared no subroutine for any of the operators -- (decrement), neg (the overload key for unary minus), or -=; Perl can nevertheless autogenerate all three from minus().
In such autogenerated calls, note the undefs: where autogeneration results in the method for a standard operator which does not change either of its operands, such as -, being used to implement an operator which changes the operand ("mutators": here, -- and -=), Perl passes undef as the third argument. This still evaluates as FALSE, consistent with the fact that the operands have not been swapped, but gives the subroutine a chance to alter its behaviour in these cases.
In all the above examples, minus() is required only to return the result of the subtraction: Perl takes care of the assignment to $x. In fact, such methods should not modify their operands, even if undef is passed as the third argument (see Overloadable Operations).
The same is not true of implementations of ++ and --: these are expected to modify their operand. An appropriate implementation of -- might look like
- use overload '--' => "decr",
- # ...
- sub decr { --${$_[0]}; }
The term 'mathemagic' describes the overloaded implementation of mathematical operators. Mathemagical operations raise an issue. Consider the code:
- $a = $b;
- --$a;
If $a and $b are scalars then after these statements
- $a == $b - 1
An object, however, is a reference to blessed data, so if $a and $b are objects then the assignment $a = $b copies only the reference, leaving $a and $b referring to the same object data. One might therefore expect the operation --$a to decrement $b as well as $a.
However, this would not be consistent with how we expect the mathematical operators to work. Perl resolves this dilemma by transparently calling a copy constructor before calling a method defined to implement a mutator (--, +=, and so on).
In the above example, when Perl reaches the decrement statement, it makes a copy of the object data in $a and assigns to $a a reference to the copied data. Only then does it call decr(), which alters the copied data, leaving $b unchanged. Thus the object metaphor is preserved as far as possible, while mathemagical operations still work according to the arithmetic metaphor.
Note: the preceding paragraph describes what happens when Perl autogenerates the copy constructor for an object based on a scalar. For other cases, see Copy Constructor.
The complete list of keys that can be specified in the use overload directive is given, separated by spaces, in the values of the hash %overload::ops:
- with_assign => '+ - * / % ** << >> x .',
- assign => '+= -= *= /= %= **= <<= >>= x= .=',
- num_comparison => '< <= > >= == !=',
- '3way_comparison'=> '<=> cmp',
- str_comparison => 'lt le gt ge eq ne',
- binary => '& &= | |= ^ ^=',
- unary => 'neg ! ~',
- mutators => '++ --',
- func => 'atan2 cos sin exp abs log sqrt int',
- conversion => 'bool "" 0+ qr',
- iterators => '<>',
- filetest => '-X',
- dereferencing => '${} @{} %{} &{} *{}',
- matching => '~~',
- special => 'nomethod fallback ='
Most of the overloadable operators map one-to-one to these keys. Exceptions, including additional overloadable operations not apparent from this hash, are included in the notes which follow.
A warning is issued if an attempt is made to register an operator not found above.
not
The operator not is not a valid key for use overload. However, if the operator ! is overloaded then the same implementation will be used for not (since the two operators differ only in precedence).
neg
The key neg is used for unary minus to disambiguate it from binary -.
++, --
Assuming they are to behave analogously to Perl's ++ and --, overloaded implementations of these operators are required to mutate their operands.
No distinction is made between prefix and postfix forms of the increment and decrement operators: these differ only in the point at which Perl calls the associated subroutine when evaluating an expression.
- += -= *= /= %= **= <<= >>= x= .=
- &= |= ^=
Simple assignment is not overloadable (the '=' key is used for the Copy Constructor). Perl does have a way to make assignments to an object do whatever you want, but this involves using tie(), not overload - see tie and the COOKBOOK examples below.
The subroutine for the assignment variant of an operator is required only to return the result of the operation. It is permitted to change the value of its operand (this is safe because Perl calls the copy constructor first), but this is optional since Perl assigns the returned value to the left-hand operand anyway.
An object that overloads an assignment operator does so only in respect of assignments to that object. In other words, Perl never calls the corresponding methods with the third argument (the "swap" argument) set to TRUE. For example, the operation
- $a *= $b
cannot lead to $b's implementation of *= being called, even if $a is a scalar. (It can, however, generate a call to $b's method for *.)
- + - * / % ** << >> x .
- & | ^
As described above, Perl may call methods for operators like + and & in the course of implementing missing operations like ++, +=, and &=. While these methods may detect this usage by testing the definedness of the third argument, they should in all cases avoid changing their operands. This is because Perl does not call the copy constructor before invoking these methods.
int
Traditionally, the Perl function int rounds towards 0 (see int), and so for floating-point-like types one should follow the same semantic.
- "" 0+ bool
These conversions are invoked according to context as necessary. For example, the subroutine for '""' (stringify) may be used where the overloaded object is passed as an argument to print, and that for 'bool' where it is tested in the condition of a flow control statement (like while) or the ternary ?: operation.
Of course, in contexts like, for example, $obj + 1, Perl will invoke $obj's implementation of + rather than (in this example) converting $obj to a number using the numify method '0+' (an exception to this is when no method has been provided for '+' and fallback is set to TRUE).
The subroutines for '""', '0+', and 'bool' can return any arbitrary Perl value. If the corresponding operation for this value is overloaded too, the operation will be called again with this value.
As a special case, if the overload returns the object itself then it will be used directly. An overloaded conversion returning the object is probably a bug, because you're likely to get something that looks like YourPackage=HASH(0x8172b34).
- qr
The subroutine for 'qr' is used wherever the object is interpolated into or used as a regexp, including when it appears on the RHS of a =~ or !~ operator.
qr must return a compiled regexp, or a ref to a compiled regexp (such as qr// returns), and any further overloading on the return value will be ignored.
If <> is overloaded then the same implementation is used for both the read-filehandle syntax <$var> and globbing syntax <${var}>.
The key '-X' is used to specify a subroutine to handle all the filetest operators (-f, -x, and so on: see -X for the full list); it is not possible to overload any filetest operator individually. To distinguish them, the letter following the '-' is passed as the second argument (that is, in the slot that for binary operators is used to pass the second operand).
Calling an overloaded filetest operator does not affect the stat value associated with the special filehandle _. It still refers to the result of the last stat, lstat or unoverloaded filetest.
This overload was introduced in Perl 5.12.
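A sketch: a class representing an in-memory "file" that claims to exist and be readable (the class is illustrative):

```perl
package Virtual::File;
use overload '-X' => sub {
    my ($self, $op) = @_;    # $op is the letter after the '-'
    return 1 if $op eq 'e' || $op eq 'r';
    return 0 if $op eq 'd';
    return undef;            # everything else: "can't say"
};

sub new { bless {}, shift }

package main;
my $f = Virtual::File->new;
print "exists\n"   if -e $f;
print "readable\n" if -r $f;
print "dir\n"      if -d $f;   # not printed
```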
The key "~~" allows you to override the smart matching logic used by the ~~ operator and the switch construct (given/when). See Switch Statements in perlsyn and feature.
Unusually, the overloaded implementation of the smart match operator does not get full control of the smart match behaviour. In particular, in the following code:
the smart match does not invoke the method call like this:
- $obj->match([1,2,3],0);
rather, the smart match distributive rule takes precedence, so $obj is smart matched against each array element in turn until a match is found, so you may see between one and three of these calls instead:
- $obj->match(1,0);
- $obj->match(2,0);
- $obj->match(3,0);
Consult the match table in Smartmatch Operator in perlop for details of when overloading is invoked.
- ${} @{} %{} &{} *{}
If these operators are not explicitly overloaded then they work in the normal way, yielding the underlying scalar, array, or whatever stores the object data (or the appropriate error message if the dereference operator doesn't match it). Defining a catch-all 'nomethod' (see below) makes no difference to this, as the catch-all function will not be called to implement a missing dereference operator.
If a dereference operator is overloaded then it must return a reference of the appropriate type (for example, the subroutine for key '${}' should return a reference to a scalar, not a scalar), or another object which overloads the operator: that is, the subroutine only determines what is dereferenced and the actual dereferencing is left to Perl. As a special case, if the subroutine returns the object itself then it will not be called again - avoiding infinite recursion.
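A sketch of '@{}': an object stored as a hash that can nevertheless be used as an array (the class is illustrative):

```perl
package Playlist;
use overload '@{}' => sub { $_[0]->{tracks} };

sub new {
    my ($class, @tracks) = @_;
    # %{} is not overloaded, so hash access inside the class works normally
    return bless { tracks => \@tracks }, $class;
}

package main;
my $list = Playlist->new('one', 'two', 'three');
print scalar @$list, "\n";   # 3
print $list->[1], "\n";      # two
```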
- nomethod fallback =
If a method for an operation is not found then Perl tries to autogenerate a substitute implementation from the operations that have been defined.
Note: the behaviour described in this section can be disabled
by setting fallback
to FALSE (see fallback).
In the following tables, numbers indicate priority. For example, the table below states that, if no implementation for '!' has been defined, Perl will implement it using 'bool' (that is, by inverting the value returned by the method for 'bool'); if boolean conversion is also unimplemented, Perl will use '0+' or, failing that, '""'.
- operator | can be autogenerated from
- |
- | 0+ "" bool . x
- =========|==========================
- 0+ | 1 2
- "" | 1 2
- bool | 1 2
- int | 1 2 3
- ! | 2 3 1
- qr | 2 1 3
- . | 2 1 3
- x | 2 1 3
- .= | 3 2 4 1
- x= | 3 2 4 1
- <> | 2 1 3
- -X | 2 1 3
Note: The iterator ('<>') and file test ('-X') operators work as normal: if the operand is not a blessed glob or IO reference then it is converted to a string (using the method for '""', '0+', or 'bool') to be interpreted as a glob or filename.
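The conversion rows of the table can be seen in action with a hypothetical Count class that defines only '0+'; Perl autogenerates '""', 'bool' and '!' from it:

```perl
package Count;
# Only numeric conversion is defined; the other conversions and '!'
# are autogenerated per the table above (fallback is left undefined,
# so Magic Autogeneration is enabled).
use overload '0+' => sub { scalar @{ $_[0] } };
sub new { my $class = shift; bless [@_], $class }

package main;
my $c = Count->new(1, 2, 3);
print 0 + $c, "\n";                        # 3, via '0+'
print "$c\n";                              # "3", '""' autogenerated
print !$c ? "empty" : "non-empty", "\n";   # '!' via 'bool' via '0+'
```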
- operator | can be autogenerated from
- |
- | < <=> neg -= -
- =========|==========================
- neg | 1
- -= | 1
- -- | 1 2
- abs | a1 a2 b1 b2 [*]
- < | 1
- <= | 1
- > | 1
- >= | 1
- == | 1
- != | 1
- * one from [a1, a2] and one from [b1, b2]
Just as numeric comparisons can be autogenerated from the method for '<=>', string comparisons can be autogenerated from that for 'cmp':
- operators | can be autogenerated from
- ====================|===========================
- lt gt le ge eq ne | cmp
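For instance, a hypothetical ByLength class can define 'cmp' alone (here comparing payload lengths) and let Perl autogenerate lt, gt, le, ge, eq and ne:

```perl
package ByLength;
# 'cmp' is the only comparison defined; the six string comparison
# operators are autogenerated from it.
use overload
    'cmp' => sub {
        my ($self, $other, $swap) = @_;
        my $r = length($self->[0]) <=> length("$other");
        $swap ? -$r : $r;          # restore operand order if swapped
    },
    '""'  => sub { $_[0][0] };
sub new { my ($class, $s) = @_; bless [$s], $class }

package main;
my $short = ByLength->new("abc");
my $long  = ByLength->new("abcdef");
print "short lt long\n" if $short lt $long;   # via autogenerated 'lt'
```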
Similarly, autogeneration for keys '+=' and '++' is analogous to '-=' and '--' above:
- operator | can be autogenerated from
- |
- | += +
- =========|==========================
- += | 1
- ++ | 1 2
And other assignment variations are analogous to '+=' and '-=' (and similar to '.=' and 'x=' above):
- operator || *= /= %= **= <<= >>= &= ^= |=
- -------------------||--------------------------------
- autogenerated from || * / % ** << >> & ^ |
Note also that the copy constructor (key '=') may be autogenerated, but only for objects based on scalars. See Copy Constructor.
Since some operations can be automatically generated from others, there is a minimal set of operations that need to be overloaded in order to have the complete set of overloaded operations at one's disposal. Of course, the autogenerated operations may not do exactly what the user expects. The minimal set is:
Of the conversions, only one of string, boolean or numeric is needed because each can be generated from either of the other two.
Special Keys for use overload
nomethod
The 'nomethod'
key is used to specify a catch-all function to
be called for any operator that is not individually overloaded.
The specified function will be passed four parameters.
The first three arguments coincide with those that would have been
passed to the corresponding method if it had been defined.
The fourth argument is the use overload
key for that missing
method.
For example, if $a
is an object blessed into a package declaring
- use overload 'nomethod' => 'catch_all', # ...
then the operation
- 3 + $a
could (unless a method is specifically declared for the key
'+'
) result in a call
- catch_all($a, 3, 1, '+')
See How Perl Chooses an Operator Implementation.
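A runnable sketch of the dispatch just described, using a hypothetical CatchAll class that records every operator routed through nomethod:

```perl
package CatchAll;
our @calls;   # records [$op, $other, $swapped] for each dispatch
use overload 'nomethod' => sub {
    my ($obj, $other, $swapped, $op) = @_;
    push @calls, [$op, $other, $swapped];
    return 0;                     # dummy result
};
sub new { bless {}, shift }

package main;
my $a = CatchAll->new;
my $r = 3 + $a;    # no '+' declared, so this becomes
                   # the call catch_all($a, 3, 1, '+')
print "caught: $CatchAll::calls[0][0]\n";   # caught: +
```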
fallback
The value assigned to the key 'fallback'
tells Perl how hard
it should try to find an alternative way to implement a missing
operator.
FALSE
- use overload "fallback" => 0, # ... ;
This disables Magic Autogeneration.
undef
In the default case where no value is explicitly assigned to
fallback
, magic autogeneration is enabled.
TRUE
The same as for undef, but if a missing operator cannot be
autogenerated then, instead of issuing an error message, Perl
is allowed to revert to what it would have done for that
operator if there had been no use overload
directive.
Note: in most cases, particularly the Copy Constructor, this is unlikely to be appropriate behaviour.
See How Perl Chooses an Operator Implementation.
Copy Constructor
As mentioned above,
this operation is called when a mutator is applied to a reference
that shares its object with some other reference.
For example, if $b
is mathemagical, and '++'
is overloaded
with 'incr'
, and '='
is overloaded with 'clone'
, then the
code
- $a = $b;
- # ... (other code which does not modify $a or $b) ...
- ++$b;
would be executed in a manner equivalent to
- $a = $b;
- # ... (other code which does not modify $a or $b) ...
- $b = $b->clone(undef, "");
- $b->incr(undef, "");
Note:
The subroutine for '='
does not overload the Perl assignment
operator: it is used only to allow mutators to work as described
here. (See Assignments above.)
As for other operations, the subroutine implementing '=' is passed three arguments, though the last two are always undef and ''.
The copy constructor is called only before a call to a function
declared to implement a mutator, for example, if ++$b;
in the
code above is effected via a method declared for key '++'
(or 'nomethod', passed '++'
as the fourth argument) or, by
autogeneration, '+='
.
It is not called if the increment operation is effected by a call
to the method for '+'
since, in the equivalent code,
- $a = $b;
- $b = $b + 1;
the data referred to by $a
is unchanged by the assignment to
$b
of a reference to new object data.
The copy constructor is not called if Perl determines that it is unnecessary because there is no other reference to the data being modified.
If 'fallback'
is undefined or TRUE then a copy constructor
can be autogenerated, but only for objects based on scalars.
In other cases it needs to be defined explicitly.
Where an object's data is stored as, for example, an array of
scalars, the following might be appropriate:
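One hedged possibility, shown here with a hypothetical array-based Counter class rather than the original example: the '=' handler clones the underlying array, so a subsequent mutator acts on a private copy.

```perl
package Counter;
use overload
    '='  => sub { bless [ @{ $_[0] } ], ref $_[0] },  # shallow clone
    '+=' => sub { $_[0][0] += $_[1]; $_[0] },         # declared mutator
    '0+' => sub { $_[0][0] };
sub new { my ($class, $n) = @_; bless [$n], $class }

package main;
my $b = Counter->new(10);
my $a = $b;     # both names refer to the same object, for now
$a += 5;        # '=' clones first, then '+=' mutates the clone
print 0 + $a, " ", 0 + $b, "\n";   # 15 10 - $b is untouched
```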
If 'fallback'
is TRUE and no copy constructor is defined then,
for objects not based on scalars, Perl may silently fall back on
simple assignment - that is, assignment of the object reference.
In effect, this disables the copy constructor mechanism since
no new copy of the object data is created.
This is almost certainly not what you want.
(It is, however, consistent: for example, Perl's fallback for the
++
operator is to increment the reference itself.)
How Perl Chooses an Operator Implementation
Which is checked first, nomethod or fallback?
If the two operands of an operator are of different types and
both overload the operator, which implementation is used?
The following are the precedence rules:
1. If the first operand has declared a subroutine to overload the operator then use that implementation.
2. Otherwise, if fallback is TRUE or undefined for the first operand, see if the rules for autogeneration allow another of its operators to be used instead.
3. Unless the operator is an assignment (+=, -=, etc.), repeat step (1) in respect of the second operand.
4. Repeat step (2) in respect of the second operand.
5. If the first operand has a "nomethod" method then use that.
6. If the second operand has a "nomethod" method then use that.
7. If fallback is TRUE for both operands then perform the usual operation for the operator, treating the operands as numbers, strings, or booleans as appropriate for the operator (see note).
8. Nothing worked - die.
Where there is only one operand (or only one operand with overloading) the checks in respect of the other operand above are skipped.
There are exceptions to the above rules for dereference operations
(which, if Step 1 fails, always fall back to the normal, built-in
implementations - see Dereferencing), and for ~~
(which has its
own set of rules - see Matching
under Overloadable Operations
above).
Note on Step 7: some operators have a different semantic depending on the type of their operands. As there is no way to instruct Perl to treat the operands as, e.g., numbers instead of strings, the result here may not be what you expect. See BUGS AND PITFALLS.
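Rule (1) above can be checked directly with two hypothetical classes that both overload '+': the first (left) operand's implementation wins.

```perl
package Left;
use overload '+' => sub { "Left::add" };
sub new { bless {}, shift }

package Right;
use overload '+' => sub { "Right::add" };
sub new { bless {}, shift }

package main;
my $l = Left->new;
my $r = Right->new;
print $l + $r, "\n";   # Left::add  - rule (1): first operand wins
print $r + $l, "\n";   # Right::add
```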
The restriction for the comparison operation is that even if, for example,
cmp
should return a blessed reference, the autogenerated lt
function will produce only a standard logical value based on the
numerical value of the result of cmp
. In particular, a working
numeric conversion is needed in this case (possibly expressed in terms of
other conversions).
Similarly, .=
and x=
operators lose their mathemagical properties
if the string conversion substitution is applied.
When you chop() a mathemagical object it is promoted to a string and its mathemagical properties are lost. The same can happen with other operations as well.
Overloading respects inheritance via the @ISA hierarchy. Inheritance interacts with overloading in two ways.
use overload
directive
If value
in
- use overload key => value;
is a string, it is interpreted as a method name - which may (in the usual way) be inherited from another class.
Any class derived from an overloaded class is also overloaded and inherits its operator implementations. If the same operator is overloaded in more than one ancestor then the implementation is determined by the usual inheritance rules.
For example, if A inherits from B and C (in that order), B overloads + with \&D::plus_sub, and C overloads + by "plus_meth", then the subroutine D::plus_sub will be called to implement operation + for an object in package A.
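Both interactions can be sketched with a hypothetical Base/Derived pair: the string value "as_string" names a method, so a subclass that inherits the overloading can override it in the usual way.

```perl
package Base;
use overload '""' => "as_string";   # a method NAME, not a code ref
sub new       { bless {}, shift }
sub as_string { "generic" }

package Derived;
use parent -norequire, 'Base';      # overloading is inherited too
sub as_string { "special" }         # overrides the named method

package main;
print Base->new, "\n";      # generic
print Derived->new, "\n";   # special - resolved through @ISA
```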
Note that in Perl versions prior to 5.18, inheritance of the fallback key was not governed by the above rules: the value of fallback in the first overloaded ancestor was used. This was fixed in 5.18 to follow the usual rules of inheritance.
Since all use directives are executed at compile-time, the only way to
change overloading during run-time is to
- eval 'use overload "+" => \&addmethod';
You can also use
- eval 'no overload "+", "--", "<="';
though the use of these constructs during run-time is questionable.
Package overload.pm
provides the following public functions:
overload::StrVal(arg)
Gives the string value of arg
as in the
absence of stringify overloading. If you
are using this to get the address of a reference (useful for checking if two
references point to the same thing) then you may be better off using
Scalar::Util::refaddr()
, which is faster.
overload::Overloaded(arg)
Returns true if arg
is subject to overloading of some operations.
overload::Method(obj, op)
Returns undef or a reference to the method that implements op
.
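A quick sketch exercising all three public functions on a hypothetical Stringy class:

```perl
package Stringy;
use overload '""' => \&str;
sub new { bless {}, shift }
sub str { "I am stringy" }

package main;
use overload ();    # load the module without importing anything

my $obj = Stringy->new;
# StrVal ignores the '""' overloading and shows the raw reference:
print overload::StrVal($obj), "\n";     # e.g. Stringy=HASH(0x...)
print "overloaded\n" if overload::Overloaded($obj);
my $m = overload::Method($obj, '""');   # code ref for the '""' handler
print $m->($obj), "\n";                 # I am stringy
```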
For some applications, the Perl parser mangles constants too much.
It is possible to hook into this process via overload::constant()
and overload::remove_constant()
functions.
These functions take a hash as an argument. The recognized keys of this hash are:
integer
to overload integer constants,
float
to overload floating point constants,
binary
to overload octal and hexadecimal constants,
q
to overload q-quoted strings, constant pieces of qq- and qx-quoted strings and here-documents,
qr
to overload constant pieces of regular expressions.
The corresponding values are references to functions which take three arguments:
the first one is the initial string form of the constant, the second one
is how Perl interprets this constant, the third one is how the constant is used.
Note that the initial string form does not
contain string delimiters, and has backslashes in backslash-delimiter
combinations stripped (thus the value of delimiter is not relevant for
processing of this string). The return value of this function is how this
constant is going to be interpreted by Perl. The third argument is undefined
except for overloaded q- and qr-constants: it is q in single-quote context (comes from strings, regular expressions, and single-quote HERE documents), tr for arguments of tr/y operators, s for the right-hand side of the s-operator, and qq otherwise.
Since an expression "ab$cd,," is just a shortcut for 'ab' . $cd . ',,', it is expected that overloaded constant strings are equipped with a reasonable overloaded catenation operator; otherwise absurd results will follow.
Similarly, negative numbers are considered as negations of positive constants.
Note that it is probably meaningless to call the functions overload::constant() and overload::remove_constant() from anywhere but import() and unimport() methods. From these methods they may be called as
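A sketch of such a pragma-style package (hypothetical DoubleInt; here import is called explicitly from a BEGIN block, which is what a use statement would do):

```perl
package DoubleInt;
use overload ();   # for overload::constant / overload::remove_constant

# Double every integer constant compiled in the importing scope.
# Handler args: (string form, Perl's interpretation, how it is used).
sub import   { overload::constant( integer => sub { $_[1] * 2 } ) }
sub unimport { overload::remove_constant('integer') }

package main;
BEGIN { DoubleInt->import }     # what "use DoubleInt;" would do

my $x = 21;                     # rewritten at compile time
print $x, "\n";                 # 42

BEGIN { DoubleInt->unimport }   # what "no DoubleInt;" would do
my $y = 21;
print $y, "\n";                 # 21 again
```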
What follows is subject to change RSN.
The table of methods for all operations is cached in magic for the
symbol table hash for the package. The cache is invalidated during
processing of use overload
, no overload
, new function
definitions, and changes in @ISA.
(Every SVish thing has a magic queue, and magic is an entry in that queue. This is how a single variable may participate in multiple forms of magic simultaneously. For instance, environment variables regularly have two forms at once: their %ENV magic and their taint magic. However, the magic which implements overloading is applied to the stashes, which are rarely used directly, thus should not slow down Perl.)
If a package uses overload, it carries a special flag. This flag is also set when new functions are defined or @ISA is modified. There will be a slight speed penalty on the very first operation thereafter that supports overloading, while the overload tables are updated. If there is no overloading present, the flag is turned off. Thus the only speed penalty thereafter is the checking of this flag.
It is expected that arguments to methods that are not explicitly supposed to be changed are constant (but this is not enforced).
Please add examples to what follows!
Put this in two_face.pm in your Perl library directory:
Use it as follows:
(The second line creates a scalar which has both a string value, and a numeric value.) This prints:
- seven=vii, seven=7, eight=8
- seven contains 'i'
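A minimal sketch of such a package and driver, consistent with the output shown above (the [string, number] storage layout is an assumption):

```perl
package two_face;   # scalars with separate string and numeric values
sub new { my $p = shift; bless [@_], $p }
use overload
    '""'     => sub { $_[0][0] },   # string slot
    '0+'     => sub { $_[0][1] },   # numeric slot
    fallback => 1;                  # let Perl autogenerate the rest

package main;
my $seven = two_face->new("vii", 7);
printf "seven=%s, seven=%d, eight=%d\n", $seven, $seven, $seven + 1;
print "seven contains 'i'\n" if $seven =~ /i/;
```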
Suppose you want to create an object which is accessible as both an array reference and a hash reference.
- package two_refs;
- use overload '%{}' => \&gethash, '@{}' => sub { $ {shift()} };
- sub new {
- my $p = shift;
- bless \ [@_], $p;
- }
- sub gethash {
- my %h;
- my $self = shift;
- tie %h, ref $self, $self;
- \%h;
- }
- sub TIEHASH { my $p = shift; bless \ shift, $p }
- my %fields;
- my $i = 0;
- $fields{$_} = $i++ foreach qw{zero one two three};
- sub STORE {
- my $self = ${shift()};
- my $key = $fields{shift()};
- defined $key or die "Out of band access";
- $$self->[$key] = shift;
- }
- sub FETCH {
- my $self = ${shift()};
- my $key = $fields{shift()};
- defined $key or die "Out of band access";
- $$self->[$key];
- }
Now one can access an object using both the array and hash syntax:
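Usage might look like the following sketch (the two_refs package from above is repeated so the example runs standalone):

```perl
package two_refs;
use overload '%{}' => \&gethash, '@{}' => sub { ${ shift() } };
sub new     { my $p = shift; bless \ [@_], $p }
sub gethash {
    my %h;
    my $self = shift;
    tie %h, ref $self, $self;
    \%h;
}
sub TIEHASH { my $p = shift; bless \ shift, $p }
my %fields;
my $i = 0;
$fields{$_} = $i++ foreach qw{zero one two three};
sub STORE {
    my $self = ${ shift() };
    my $key  = $fields{ shift() };
    defined $key or die "Out of band access";
    $$self->[$key] = shift;
}
sub FETCH {
    my $self = ${ shift() };
    my $key  = $fields{ shift() };
    defined $key or die "Out of band access";
    $$self->[$key];
}

package main;
my $bar = two_refs->new(3, 4, 5, 6);
$bar->[2] = 11;             # array syntax, via '@{}'
print $bar->{two}, "\n";    # 11 - hash syntax, via '%{}'
```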
Note several important features of this example. First of all, the
actual type of $bar is a scalar reference, and we do not overload
the scalar dereference. Thus we can get the actual non-overloaded
contents of $bar by just using $$bar
(what we do in functions which
overload dereference). Similarly, the object returned by the
TIEHASH() method is a scalar reference.
Second, we create a new tied hash each time the hash syntax is used. This allows us not to worry about a possibility of a reference loop, which would lead to a memory leak.
Both these problems can be cured. Say, if we want to overload hash dereference on a reference to an object which is implemented as a hash itself, the only problem one has to circumvent is how to access this actual hash (as opposed to the virtual hash exhibited by the overloaded dereference operator). Here is one possible fetching routine:
To remove creation of the tied hash on each access, one may use an extra level of indirection which allows a non-circular structure of references:
- package two_refs1;
- use overload '%{}' => sub { ${shift()}->[1] },
- '@{}' => sub { ${shift()}->[0] };
- sub new {
- my $p = shift;
- my $a = [@_];
- my %h;
- tie %h, $p, $a;
- bless \ [$a, \%h], $p;
- }
- sub gethash {
- my %h;
- my $self = shift;
- tie %h, ref $self, $self;
- \%h;
- }
- sub TIEHASH { my $p = shift; bless \ shift, $p }
- my %fields;
- my $i = 0;
- $fields{$_} = $i++ foreach qw{zero one two three};
- sub STORE {
- my $a = ${shift()};
- my $key = $fields{shift()};
- defined $key or die "Out of band access";
- $a->[$key] = shift;
- }
- sub FETCH {
- my $a = ${shift()};
- my $key = $fields{shift()};
- defined $key or die "Out of band access";
- $a->[$key];
- }
Now if $baz is overloaded like this, then $baz
is a reference to a
reference to the intermediate array, which keeps a reference to an
actual array, and the access hash. The tie()ing object for the access
hash is a reference to a reference to the actual array, so there are no loops of references.
Both "objects" which are blessed into the class two_refs1
are
references to a reference to an array, thus references to a scalar.
Thus the accessor expression $$foo->[$ind]
involves no
overloaded operations.
Put this in symbolic.pm in your Perl library directory:
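A minimal version, consistent with the fuller listing that appears later in this section:

```perl
package symbolic;   # primitive symbolic calculator
use overload nomethod => \&wrap;

sub new { shift; bless ['n', @_] }

sub wrap {
    my ($obj, $other, $inv, $meth) = @_;
    ($obj, $other) = ($other, $obj) if $inv;   # restore operand order
    bless [$meth, $obj, $other];
}

package main;
my $x = symbolic->new(3);   # contains ['n', 3]
my $y = 2 + $x;             # contains ['+', 2, ['n', 3]]
print "$y->[0] $y->[1] $y->[2][1]\n";   # + 2 3
```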
This module is very unusual as overloaded modules go: it does not
provide any usual overloaded operators, instead it provides an
implementation for nomethod. In this example the nomethod
subroutine returns an object which encapsulates operations done over
the objects: symbolic->new(3) contains ['n', 3], and 2 + symbolic->new(3) contains ['+', 2, ['n', 3]].
Here is an example of a script which "calculates" the side of a circumscribed octagon using the above package:
The value of $side is
- ['/', ['-', ['sqrt', ['+', 1, ['**', ['n', 1], 2]],
- undef], 1], ['n', 1]]
Note that while we obtained this value using a nice little script, there is no simple way to use it: the value may be inspected in the debugger (see perldebug), but only if the bareStringify option is set, and not via the p command.
If one attempts to print this value, then the overloaded operator
""
will be called, which will call nomethod
operator. The
result of this operator will be stringified again, but this result is
again of type symbolic
, which will lead to an infinite loop.
Add a pretty-printer method to the module symbolic.pm:
Now one can finish the script by
- print "side = ", $side->pretty, "\n";
The method pretty
is doing object-to-string conversion, so it
is natural to overload the operator ""
using this method. However,
inside such a method it is not necessary to pretty-print the
components $a and $b of an object. In the above subroutine
"[$meth $a $b]"
is a catenation of some strings and components $a
and $b. If these components use overloading, the catenation operator
will look for an overloaded operator .; if not present, it will
look for an overloaded operator ""
. Thus it is enough to use
Now one can change the last line of the script to
- print "side = $side\n";
which outputs
- side = [/ [- [sqrt [+ 1 [** [n 1 u] 2]] u] 1] [n 1 u]]
and one can inspect the value in debugger using all the possible methods.
Something is still amiss: consider the loop variable $cnt of the
script. It was a number, not an object. We cannot make this value of
type symbolic
, since then the loop will not terminate.
Indeed, to terminate the cycle, the $cnt should become false.
However, the operator bool
for checking falsity is overloaded (this
time via overloaded ""
), and returns a long string, thus any object
of type symbolic
is true. To overcome this, we need a way to
compare an object to 0. In fact, it is easier to write a numeric
conversion routine.
Here is the text of symbolic.pm with such a routine added (and slightly modified str()):
- package symbolic; # Primitive symbolic calculator
- use overload
- nomethod => \&wrap, '""' => \&str, '0+' => \&num;
- sub new { shift; bless ['n', @_] }
- sub wrap {
- my ($obj, $other, $inv, $meth) = @_;
- ($obj, $other) = ($other, $obj) if $inv;
- bless [$meth, $obj, $other];
- }
- sub str {
- my ($meth, $a, $b) = @{+shift};
- $a = 'u' unless defined $a;
- if (defined $b) {
- "[$meth $a $b]";
- } else {
- "[$meth $a]";
- }
- }
- my %subr = ( n => sub {$_[0]},
- sqrt => sub {sqrt $_[0]},
- '-' => sub {shift() - shift()},
- '+' => sub {shift() + shift()},
- '/' => sub {shift() / shift()},
- '*' => sub {shift() * shift()},
- '**' => sub {shift() ** shift()},
- );
- sub num {
- my ($meth, $a, $b) = @{+shift};
- my $subr = $subr{$meth}
- or die "Do not know how to ($meth) in symbolic";
- $a = $a->num if ref $a eq __PACKAGE__;
- $b = $b->num if ref $b eq __PACKAGE__;
- $subr->($a,$b);
- }
All the work of numeric conversion is done in %subr and num(). Of course, %subr is not complete, it contains only operators used in the example below. Here is the extra-credit question: why do we need an explicit recursion in num()? (Answer is at the end of this section.)
Use this module like this:
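A driver consistent with the output that follows, sketched here with the symbolic package repeated so the example runs standalone (with $iter = 2, the loop runs twice and 2**($iter+2) = 16 is the number of sides):

```perl
package symbolic;   # primitive symbolic calculator
use overload
    nomethod => \&wrap, '""' => \&str, '0+' => \&num;

sub new { shift; bless ['n', @_] }
sub wrap {
    my ($obj, $other, $inv, $meth) = @_;
    ($obj, $other) = ($other, $obj) if $inv;
    bless [$meth, $obj, $other];
}
sub str {
    my ($meth, $a, $b) = @{+shift};
    $a = 'u' unless defined $a;
    if (defined $b) { "[$meth $a $b]" } else { "[$meth $a]" }
}
my %subr = (
    n    => sub { $_[0] },
    sqrt => sub { sqrt $_[0] },
    '-'  => sub { shift() - shift() },
    '+'  => sub { shift() + shift() },
    '/'  => sub { shift() / shift() },
    '*'  => sub { shift() * shift() },
    '**' => sub { shift() ** shift() },
);
sub num {
    my ($meth, $a, $b) = @{+shift};
    my $subr = $subr{$meth}
        or die "Do not know how to ($meth) in symbolic";
    $a = $a->num if ref $a eq __PACKAGE__;
    $b = $b->num if ref $b eq __PACKAGE__;
    $subr->($a, $b);
}

package main;
my $iter = symbolic->new(2);    # 2**($iter+2) == 16 sides
my $side = symbolic->new(1);
my $cnt  = $iter;

while ($cnt) {                  # bool is autogenerated via '0+'
    $cnt  = $cnt - 1;           # no mutators, so no '--' here
    $side = (sqrt(1 + $side**2) - 1) / $side;
}
printf "%s=%f\n", $side, $side; # '""' for %s, '0+' for %f
printf "pi=%f\n", $side * (2**($iter + 2));
```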
It prints (without so many line breaks)
- [/ [- [sqrt [+ 1 [** [/ [- [sqrt [+ 1 [** [n 1] 2]]] 1]
- [n 1]] 2]]] 1]
- [/ [- [sqrt [+ 1 [** [n 1] 2]]] 1] [n 1]]]=0.198912
- pi=3.182598
The above module is very primitive. It does not implement
mutator methods (++
, -=
and so on), does not do deep copying
(not required without mutators!), and implements only those arithmetic
operations which are used in the example.
To implement most arithmetic operations is easy; one should just use the tables of operations, and change the code which fills %subr to
- my %subr = ( 'n' => sub {$_[0]} );
- foreach my $op (split " ", $overload::ops{with_assign}) {
- $subr{$op} = $subr{"$op="} = eval "sub {shift() $op shift()}";
- }
- my @bins = qw(binary 3way_comparison num_comparison str_comparison);
- foreach my $op (split " ", "@overload::ops{ @bins }") {
- $subr{$op} = eval "sub {shift() $op shift()}";
- }
- foreach my $op (split " ", "@overload::ops{qw(unary func)}") {
- print "defining '$op'\n";
- $subr{$op} = eval "sub {$op shift()}";
- }
Since subroutines implementing assignment operators are not required
to modify their operands (see Overloadable Operations above),
we do not need anything special to make +=
and friends work,
besides adding these operators to %subr and defining a copy
constructor (needed since Perl has no way to know that the
implementation of '+='
does not mutate the argument -
see Copy Constructor).
To implement a copy constructor, add '=' => \&cpy
to use overload
line, and code (this code assumes that mutators change things one level
deep only, so recursive copying is not needed):
To make ++
and --
work, we need to implement actual mutators,
either directly, or in nomethod
. We continue to do things inside
nomethod
, thus add
after the first line of wrap(). This is not the most efficient implementation; one may consider
instead.
As a final remark, note that one can fill %subr by
- my %subr = ( 'n' => sub {$_[0]} );
- foreach my $op (split " ", $overload::ops{with_assign}) {
- $subr{$op} = $subr{"$op="} = eval "sub {shift() $op shift()}";
- }
- my @bins = qw(binary 3way_comparison num_comparison str_comparison);
- foreach my $op (split " ", "@overload::ops{ @bins }") {
- $subr{$op} = eval "sub {shift() $op shift()}";
- }
- foreach my $op (split " ", "@overload::ops{qw(unary func)}") {
- $subr{$op} = eval "sub {$op shift()}";
- }
- $subr{'++'} = $subr{'+'};
- $subr{'--'} = $subr{'-'};
This finishes implementation of a primitive symbolic calculator in 50 lines of Perl code. Since the numeric values of subexpressions are not cached, the calculator is very slow.
Here is the answer for the exercise: In the case of str(), we need no
explicit recursion since the overloaded .-operator will fall back
to an existing overloaded operator ""
. Overloaded arithmetic
operators do not fall back to numeric conversion if fallback
is
not explicitly requested. Thus without an explicit recursion num()
would convert ['+', $a, $b]
to $a + $b
, which would just rebuild
the argument of num().
If you wonder why defaults for conversion are different for str() and num(), note how easy it was to write the symbolic calculator. This simplicity is due to an appropriate choice of defaults. One extra note: due to the explicit recursion, num() is more fragile than str(): we need to explicitly check for the type of $a and $b. If components $a and $b happen to be of some related type, this may lead to problems.
One may wonder why we call the above calculator symbolic. The reason is that the actual calculation of the value of expression is postponed until the value is used.
To see it in action, add a method
to the package symbolic
. After this change one can do
and the numeric value of $c becomes 5. However, after calling
- $a->STORE(12); $b->STORE(5);
the numeric value of $c becomes 13. There is no doubt now that the module symbolic provides a symbolic calculator indeed.
To hide the rough edges under the hood, provide a tie()d interface to the
package symbolic
. Add methods
(the bug, fixed in Perl 5.14, is described in BUGS). One can use this new interface as
Now the numeric value of $c is 5. After $a = 12; $b = 5 the numeric value of $c becomes 13. To insulate the user of the module, add a method
Now
shows that the numeric value of $c follows changes to the values of $a and $b.
Ilya Zakharevich <ilya@math.mps.ohio-state.edu>.
The overloading
pragma can be used to enable or disable overloaded
operations within a lexical scope - see overloading.
When Perl is run with the -Do switch or its equivalent, overloading induces diagnostic messages.
Using the m command of the Perl debugger (see perldebug) one can deduce which operations are overloaded (and which ancestor triggers this overloading). Say, if eq is overloaded, then the method (eq is shown by the debugger. The method () corresponds to the fallback key (in fact the presence of this method shows that this package has overloading enabled, and it is what is used by the Overloaded function of module overload).
The module might issue the following warnings:
(W) The call to overload::constant contained an odd number of arguments. The arguments should come in pairs.
(W) You tried to overload a constant type the overload package is unaware of.
(W) The second (fourth, sixth, ...) argument of overload::constant needs to be a code reference. Either an anonymous subroutine, or a reference to a subroutine.
(W) use overload
was passed an argument it did not
recognize. Did you mistype an operator?
A pitfall when fallback is TRUE and Perl resorts to a built-in
implementation of an operator is that some operators have more
than one semantic, for example |:
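A sketch of the setup with a hypothetical Four class: numify is implemented, fallback is TRUE, and stringify is autogenerated from '0+'; with a string operand, | takes its bitwise-string reading.

```perl
package Four;
use overload '0+' => sub { 4 }, fallback => 1;
sub new { bless {}, shift }

package main;
my $x = Four->new;
print "$x\n";            # "4" - '""' autogenerated from '0+'
print "4" | "8", "\n";   # "<" - bitwise OR of the STRINGS "4" and "8"
```

When $x meets | together with a string operand, this string-bitwise reading is what the fallback mechanism falls into, which is why the result described below is "<" rather than 12.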
You might expect this to output "12".
In fact, it prints "<": the ASCII result of treating "|"
as a bitwise string operator - that is, the result of treating
the operands as the strings "4" and "8" rather than numbers.
The fact that numify (0+
) is implemented but stringify
(""
) isn't makes no difference since the latter is simply
autogenerated from the former.
The only way to change this is to provide your own subroutine
for '|'
.
Magic autogeneration increases the potential for inadvertently creating self-referential structures. Currently Perl will not free self-referential structures until cycles are explicitly broken. For example,
- use overload '+' => sub { bless [ \$_[0], \$_[1] ] };
is asking for trouble, since
- $obj += $y;
will effectively become
- $obj = add($obj, $y, undef);
with the same result as
- $obj = [\$obj, \$y];
Even if no explicit assignment-variants of operators are present in the script, they may be generated by the optimizer. For example,
- "obj = $obj\n"
may be optimized to
- my $tmp = 'obj = ' . $obj; $tmp .= "\n";
The symbol table is filled with names looking like line-noise.
This bug was fixed in Perl 5.18, but may still trip you up if you are using older versions:
For the purpose of inheritance every overloaded package behaves as if
fallback
is present (possibly undefined). This may create
interesting effects if some package is not overloaded, but inherits
from two overloaded packages.
Before Perl 5.14, the relation between overloading and tie()ing was broken. Overloading was triggered or not based on the previous class of the tie()d variable.
This happened because the presence of overloading was checked
too early, before any tie()d access was attempted. If the
class of the value FETCH()ed from the tied variable does not
change, a simple workaround for code that is to run on older Perl
versions is to access the value (via () = $foo
or some such)
immediately after tie()ing, so that after this call the previous class
coincides with the current one.
Barewords are not covered by overloaded string constants.
The range operator ..
cannot be overloaded.
overloading - perl pragma to lexically control overloading
This pragma allows you to lexically disable or enable overloading.
no overloading
Disables overloading entirely in the current lexical scope.
no overloading @ops
Disables only specific overloads in the current lexical scope.
use overloading
Reenables overloading in the current lexical scope.
use overloading @ops
Reenables overloading only for specific ops in the current lexical scope.
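A sketch of the lexical behaviour with a hypothetical Loud class: overloading resumes as soon as the block ends.

```perl
package Loud;
use overload '""' => sub { "LOUD" };
sub new { bless {}, shift }

package main;
my $obj = Loud->new;
print "$obj\n";              # LOUD
{
    no overloading '""';     # disable just stringification, lexically
    print "$obj\n";          # the raw reference, e.g. Loud=HASH(0x...)
}
print "$obj\n";              # LOUD again
```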
parent - Establish an ISA relationship with base classes at compile time
- package Baz;
- use parent qw(Foo Bar);
Allows you to load one or more modules while setting up inheritance from those modules at the same time. Mostly similar in effect to
By default, every base class needs to live in a file of its own.
If you want to have a subclass and its parent class in the same file, you
can tell parent
not to load any modules by using the -norequire
switch:
- package Foo;
- sub exclaim { "I CAN HAS PERL" }
- package DoesNotLoadFooBar;
- use parent -norequire, 'Foo', 'Bar';
- # will not go looking for Foo.pm or Bar.pm
This is equivalent to the following code:
- package Foo;
- sub exclaim { "I CAN HAS PERL" }
- package DoesNotLoadFooBar;
- push @DoesNotLoadFooBar::ISA, 'Foo', 'Bar';
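A self-contained check of the -norequire behaviour (hypothetical package names):

```perl
package Foo;
sub exclaim { "I CAN HAS PERL" }

package DoesNotLoadFoo;
use parent -norequire, 'Foo';   # @ISA is set; Foo.pm is never require()d

package main;
print DoesNotLoadFoo->exclaim, "\n";           # inherited from Foo
print "is-a Foo\n" if DoesNotLoadFoo->isa('Foo');
```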
This is also helpful for the case where a package lives within a differently named file:
This is equivalent to the following code:
If you want to load a subclass from a file that require would
not consider an eligible filename (that is, it does not end in
either .pm or .pmc), use the following code:
Attempting to inherit from yourself generates a warning.
- package Foo;
- use parent 'Foo';
This module was forked from base to remove the cruft that had accumulated in it.
Rafaël Garcia-Suarez, Bart Lateur, Max Maischein, Anno Siegel, Michael Schwern
Max Maischein corion@cpan.org
Copyright (c) 2007-10 Max Maischein <corion@cpan.org>
Based on the idea of base.pm
, which was introduced with Perl 5.004_04.
This module is released under the same terms as Perl itself.
perl - The Perl 5 language interpreter
perl [ -sTtuUWX ] [ -hv ] [ -V[:configvar] ] [ -cw ] [ -d[t][:debugger] ] [ -D[number/list] ] [ -pna ] [ -Fpattern ] [ -l[octal] ] [ -0[octal/hexadecimal] ] [ -Idir ] [ -m[-]module ] [ -M[-]'module...' ] [ -f ] [ -C [number/list] ] [ -S ] [ -x[dir] ] [ -i[extension] ] [ [-e|-E] 'command' ] [ -- ] [ programfile ] [ argument ]...
For more information on these options, you can run perldoc perlrun
.
The perldoc program gives you access to all the documentation that comes with Perl. You can get more documentation, tutorials and community support online at http://www.perl.org/.
If you're new to Perl, you should start by running perldoc perlintro
,
which is a general intro for beginners and provides some background to help
you navigate the rest of Perl's extensive documentation. Run perldoc
perldoc
to learn more things you can do with perldoc.
For ease of access, the Perl manual has been split up into several sections.
- perlreftut Perl references short introduction
- perldsc Perl data structures intro
- perllol Perl data structures: arrays of arrays
- perlrequick Perl regular expressions quick start
- perlretut Perl regular expressions tutorial
- perlootut Perl OO tutorial for beginners
- perlperf Perl Performance and Optimization Techniques
- perlstyle Perl style guide
- perlcheat Perl cheat sheet
- perltrap Perl traps for the unwary
- perldebtut Perl debugging tutorial
- perlfaq Perl frequently asked questions
- perlfaq1 General Questions About Perl
- perlfaq2 Obtaining and Learning about Perl
- perlfaq3 Programming Tools
- perlfaq4 Data Manipulation
- perlfaq5 Files and Formats
- perlfaq6 Regexes
- perlfaq7 Perl Language Issues
- perlfaq8 System Interaction
- perlfaq9 Networking
- perlsyn Perl syntax
- perldata Perl data structures
- perlop Perl operators and precedence
- perlsub Perl subroutines
- perlfunc Perl built-in functions
- perlopentut Perl open() tutorial
- perlpacktut Perl pack() and unpack() tutorial
- perlpod Perl plain old documentation
- perlpodspec Perl plain old documentation format specification
- perlpodstyle Perl POD style guide
- perldiag Perl diagnostic messages
- perllexwarn Perl warnings and their control
- perldebug Perl debugging
- perlvar Perl predefined variables
- perlre Perl regular expressions, the rest of the story
- perlrebackslash Perl regular expression backslash sequences
- perlrecharclass Perl regular expression character classes
- perlreref Perl regular expressions quick reference
- perlref Perl references, the rest of the story
- perlform Perl formats
- perlobj Perl objects
- perltie Perl objects hidden behind simple variables
- perldbmfilter Perl DBM filters
- perlipc Perl interprocess communication
- perlfork Perl fork() information
- perlnumber Perl number semantics
- perlthrtut Perl threads tutorial
- perlport Perl portability guide
- perllocale Perl locale support
- perluniintro Perl Unicode introduction
- perlunicode Perl Unicode support
- perlunifaq Perl Unicode FAQ
- perluniprops Index of Unicode properties in Perl
- perlunitut Perl Unicode tutorial
- perlebcdic Considerations for running Perl on EBCDIC platforms
- perlsec Perl security
- perlmod Perl modules: how they work
- perlmodlib Perl modules: how to write and use
- perlmodstyle Perl modules: how to write modules with style
- perlmodinstall Perl modules: how to install from CPAN
- perlnewmod Perl modules: preparing a new module for distribution
- perlpragma Perl modules: writing a user pragma
- perlutil utilities packaged with the Perl distribution
- perlfilter Perl source filters
- perldtrace Perl's support for DTrace
- perlglossary Perl Glossary
- perlembed Perl ways to embed perl in your C or C++ application
- perldebguts Perl debugging guts and tips
- perlxstut Perl XS tutorial
- perlxs Perl XS application programming interface
- perlxstypemap Perl XS C/Perl type conversion tools
- perlclib Internal replacements for standard C library functions
- perlguts Perl internal functions for those doing extensions
- perlcall Perl calling conventions from C
- perlmroapi Perl method resolution plugin interface
- perlreapi Perl regular expression plugin interface
- perlreguts Perl regular expression engine internals
- perlapi Perl API listing (autogenerated)
- perlintern Perl internal functions (autogenerated)
- perliol C API for Perl's implementation of IO in Layers
- perlapio Perl internal IO abstraction interface
- perlhack Perl hackers guide
- perlsource Guide to the Perl source tree
- perlinterp Overview of the Perl interpreter source and how it works
- perlhacktut Walk through the creation of a simple C code patch
- perlhacktips Tips for Perl core C code hacking
- perlpolicy Perl development policies
- perlgit Using git with the Perl repository
- perlbook Perl book information
- perlcommunity Perl community information
- perldoc Look up Perl documentation in Pod format
- perlhist Perl history records
- perldelta Perl changes since previous version
- perl5181delta Perl changes in version 5.18.1
- perl5180delta Perl changes in version 5.18.0
- perl5161delta Perl changes in version 5.16.1
- perl5162delta Perl changes in version 5.16.2
- perl5163delta Perl changes in version 5.16.3
- perl5160delta Perl changes in version 5.16.0
- perl5144delta Perl changes in version 5.14.4
- perl5143delta Perl changes in version 5.14.3
- perl5142delta Perl changes in version 5.14.2
- perl5141delta Perl changes in version 5.14.1
- perl5140delta Perl changes in version 5.14.0
- perl5125delta Perl changes in version 5.12.5
- perl5124delta Perl changes in version 5.12.4
- perl5123delta Perl changes in version 5.12.3
- perl5122delta Perl changes in version 5.12.2
- perl5121delta Perl changes in version 5.12.1
- perl5120delta Perl changes in version 5.12.0
- perl5101delta Perl changes in version 5.10.1
- perl5100delta Perl changes in version 5.10.0
- perl589delta Perl changes in version 5.8.9
- perl588delta Perl changes in version 5.8.8
- perl587delta Perl changes in version 5.8.7
- perl586delta Perl changes in version 5.8.6
- perl585delta Perl changes in version 5.8.5
- perl584delta Perl changes in version 5.8.4
- perl583delta Perl changes in version 5.8.3
- perl582delta Perl changes in version 5.8.2
- perl581delta Perl changes in version 5.8.1
- perl58delta Perl changes in version 5.8.0
- perl561delta Perl changes in version 5.6.1
- perl56delta Perl changes in version 5.6
- perl5005delta Perl changes in version 5.005
- perl5004delta Perl changes in version 5.004
- perlexperiment A listing of experimental features in Perl
- perlartistic Perl Artistic License
- perlgpl GNU General Public License
- perlcn Perl for Simplified Chinese (in EUC-CN)
- perljp Perl for Japanese (in EUC-JP)
- perlko Perl for Korean (in EUC-KR)
- perltw Perl for Traditional Chinese (in Big5)
- perlaix Perl notes for AIX
- perlamiga Perl notes for AmigaOS
- perlbs2000 Perl notes for POSIX-BC BS2000
- perlce Perl notes for WinCE
- perlcygwin Perl notes for Cygwin
- perldgux Perl notes for DG/UX
- perldos Perl notes for DOS
- perlfreebsd Perl notes for FreeBSD
- perlhaiku Perl notes for Haiku
- perlhpux Perl notes for HP-UX
- perlhurd Perl notes for Hurd
- perlirix Perl notes for Irix
- perllinux Perl notes for Linux
- perlmacos Perl notes for Mac OS (Classic)
- perlmacosx Perl notes for Mac OS X
- perlnetware Perl notes for NetWare
- perlopenbsd Perl notes for OpenBSD
- perlos2 Perl notes for OS/2
- perlos390 Perl notes for OS/390
- perlos400 Perl notes for OS/400
- perlplan9 Perl notes for Plan 9
- perlqnx Perl notes for QNX
- perlriscos Perl notes for RISC OS
- perlsolaris Perl notes for Solaris
- perlsymbian Perl notes for Symbian
- perltru64 Perl notes for Tru64
- perlvms Perl notes for VMS
- perlvos Perl notes for Stratus VOS
- perlwin32 Perl notes for Windows
On a Unix-like system, these documentation files will usually also be available as manpages for use with the man program.
Some documentation is not available as man pages, so if a
cross-reference is not found by man, try it with perldoc. Perldoc can
also take you directly to documentation for functions (with the -f
switch). See perldoc --help (or perldoc perldoc or man perldoc) for other helpful options perldoc has to offer.
In general, if something strange has gone wrong with your program and you're not sure where you should look for help, try making your code comply with use strict and use warnings. These will often point out exactly where the trouble is.
Perl officially stands for Practical Extraction and Report Language, except when it doesn't.
Perl was originally a language optimized for scanning arbitrary text files, extracting information from those text files, and printing reports based on that information. It quickly became a good language for many system management tasks. Over the years, Perl has grown into a general-purpose programming language. It's widely used for everything from quick "one-liners" to full-scale application development.
The language is intended to be practical (easy to use, efficient, complete) rather than beautiful (tiny, elegant, minimal). It combines (in the author's opinion, anyway) some of the best features of sed, awk, and sh, making it familiar and easy to use for Unix users to whip up quick solutions to annoying problems. Its general-purpose programming facilities support procedural, functional, and object-oriented programming paradigms, making Perl a comfortable language for the long haul on major projects, whatever your bent.
Perl's roots in text processing haven't been forgotten over the years. It still boasts some of the most powerful regular expressions to be found anywhere, and its support for Unicode text is world-class. It handles all kinds of structured text, too, through an extensive collection of extensions. Those libraries, collected in the CPAN, provide ready-made solutions to an astounding array of problems. When they haven't set the standard themselves, they steal from the best -- just like Perl itself.
Perl is available for most operating systems, including virtually all Unix-like platforms. See Supported Platforms in perlport for a listing.
See perlrun.
Larry Wall <larry@wall.org>, with the help of oodles of other folks.
If your Perl success stories and testimonials may be of help to others who wish to advocate the use of Perl in their applications, or if you wish to simply express your gratitude to Larry and the Perl developers, please write to perl-thanks@perl.org.
- "@INC" locations of perl libraries
- http://www.perl.org/ the Perl homepage
- http://www.perl.com/ Perl articles (O'Reilly)
- http://www.cpan.org/ the Comprehensive Perl Archive
- http://www.pm.org/ the Perl Mongers
Using the use strict
pragma ensures that all variables are properly
declared and prevents other misuses of legacy Perl features.
The use warnings
pragma produces some lovely diagnostics. One can
also use the -w flag, but its use is normally discouraged, because
it gets applied to all executed Perl code, including that not under
your control.
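A minimal sketch of the recommended preamble (the variable is invented for illustration):

```perl
use strict;      # require declared variables; forbid symbolic references
use warnings;    # enable diagnostics lexically (preferred over the global -w)

my $count = 3;
print "count is $count\n";

# Under strict, a typo such as $coutn would be a compile-time error:
#   Global symbol "$coutn" requires explicit package name
```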
See perldiag for explanations of all Perl's diagnostics. The use
diagnostics
pragma automatically turns Perl's normally terse warnings
and errors into these longer forms.
Compilation errors will tell you the line number of the error, with an indication of the next token or token type that was to be examined. (In a script passed to Perl via -e switches, each -e is counted as one line.)
Setuid scripts have additional constraints that can produce error messages such as "Insecure dependency". See perlsec.
Did we mention that you should definitely consider using the use warnings pragma?
The behavior implied by the use warnings pragma is not mandatory.
Perl is at the mercy of your machine's definitions of various operations such as type casting, atof(), and floating-point output with sprintf().
If your stdio requires a seek or eof between reads and writes on a particular stream, so does Perl. (This doesn't apply to sysread() and syswrite().)
While none of the built-in data types have any arbitrary size limits (apart from memory size), there are still a few arbitrary limits: a given variable name may not be longer than 251 characters. Line numbers displayed by diagnostics are internally stored as short integers, so they are limited to a maximum of 65535 (higher numbers usually being affected by wraparound).
You may mail your bug reports (be sure to include full configuration information as output by the myconfig program in the perl source tree, or by perl -V) to perlbug@perl.org. If you've succeeded in compiling perl, the perlbug script in the utils/ subdirectory can be used to help mail in a bug report.
Perl actually stands for Pathologically Eclectic Rubbish Lister, but don't tell anyone I said that.
The Perl motto is "There's more than one way to do it." Divining how many more is left as an exercise to the reader.
The three principal virtues of a programmer are Laziness, Impatience, and Hubris. See the Camel Book for why.
perl5004delta - what's new for perl5.004
This document describes differences between the 5.003 release (as documented in Programming Perl, second edition--the Camel Book) and this one.
Perl 5.004 builds out of the box on Unix, Plan 9, LynxOS, VMS, OS/2, QNX, AmigaOS, and Windows NT. Perl runs on Windows 95 as well, but it cannot be built there, for lack of a reasonable command interpreter.
Most importantly, many bugs were fixed, including several security problems. See the Changes file in the distribution for details.
%ENV = () and %ENV = @list now work as expected (except on VMS, where they generate a fatal error).
The error "Can't locate Foo.pm in @INC" now lists the contents of @INC for easier debugging.
There is a new Configure question that asks if you want to maintain binary compatibility with Perl 5.003. If you choose binary compatibility, you do not have to recompile your extensions, but you might have symbol conflicts if you embed Perl in another application, just as in the 5.003 release. By default, binary compatibility is preserved at the expense of symbol table pollution.
You may now put Perl options in the $PERL5OPT environment variable. Unless Perl is running with taint checks, it will interpret this variable as if its contents had appeared on a "#!perl" line at the beginning of your script, except that hyphens are optional. PERL5OPT may only be used to set the following switches: -[DIMUdmw].
The -M
and -m options are no longer allowed on the #!
line of
a script. If a script needs a module, it should invoke it with the
use pragma.
The -T option is also forbidden on the #!
line of a script,
unless it was present on the Perl command line. Due to the way #!
works, this usually means that -T must be in the first argument.
Thus:
- #!/usr/bin/perl -T -w
will probably work for an executable script invoked as scriptname, while:
- #!/usr/bin/perl -w -T
will probably fail under the same conditions. (Non-Unix systems will
probably not follow this rule.) But perl scriptname
is guaranteed
to fail, since then there is no chance of -T being found on the
command line before it is found on the #!
line.
If you removed the -w option from your Perl 5.003 scripts because it made Perl too verbose, we recommend that you try putting it back when you upgrade to Perl 5.004. Each new perl version tends to remove some undesirable warnings, while adding new warnings that may catch bugs in your scripts.
AUTOLOAD for non-methods
Before Perl 5.004, AUTOLOAD functions were looked up as methods (using the @ISA hierarchy), even when the function to be autoloaded was called as a plain function (e.g. Foo::bar()), not a method (e.g. Foo->bar() or $obj->bar()).
Perl 5.005 will use method lookup only for methods' AUTOLOADs.
However, there is a significant base of existing code that may be using
the old behavior. So, as an interim step, Perl 5.004 issues an optional
warning when a non-method uses an inherited AUTOLOAD
.
The simple rule is: Inheritance will not work when autoloading non-methods. The simple fix for old code is: In any module that used to depend on inheriting AUTOLOAD for non-methods from a base class named BaseClass, execute *AUTOLOAD = \&BaseClass::AUTOLOAD during startup.
Using %OVERLOAD to define overloading was deprecated in 5.003. Overloading is now defined using the overload pragma. %OVERLOAD is still used internally but should not be used by Perl scripts. See overload for more details.
In Perl 5.004, nonexistent array and hash elements used as subroutine
parameters are brought into existence only if they are actually
assigned to (via @_
).
Earlier versions of Perl vary in their handling of such arguments. Perl versions 5.002 and 5.003 always brought them into existence. Perl versions 5.000 and 5.001 brought them into existence only if they were not the first argument (which was almost certainly a bug). Earlier versions of Perl never brought them into existence.
For example, given this code:
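The example code is elided in this copy of the document; a sketch consistent with the description below (subroutine and variable names invented) would be:

```perl
sub show   { my $v = $_[0] }   # only reads its argument
sub change { $_[0]++ }         # assigns to its argument through @_

my (@a, %a);
show($a[2]);     # read only: $a[2] is not brought into existence
change($a{b});   # assigned to via @_: $a{b} springs into existence
```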
After this code executes in Perl 5.004, $a{b} exists but $a[2] does not. In Perl 5.002 and 5.003, both $a{b} and $a[2] would have existed (but $a[2]'s value would have been undefined).
$)
The $)
special variable has always (well, in Perl 5, at least)
reflected not only the current effective group, but also the group list
as returned by the getgroups()
C function (if there is one).
However, until this release, there has not been a way to call the
setgroups()
C function from Perl.
In Perl 5.004, assigning to $)
is exactly symmetrical with examining
it: The first number in its string value is used as the effective gid;
if there are any numbers after the first one, they are passed to the
setgroups()
C function (if there is one).
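For illustration, the read side of this (the assignment side requires group privileges, so it is not shown) can be sketched as:

```perl
# $) stringifies as "EGID GID1 GID2 ..." -- the first number is the
# effective gid, the rest is the getgroups() list.
my ($egid, @groups) = split ' ', $);
printf "effective gid %d, %d supplementary group(s)\n", $egid, scalar @groups;
```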
Perl versions before 5.004 misinterpreted any type marker followed by "$" and a digit. For example, "$$0" was incorrectly taken to mean "${$}0" instead of "${$0}". This bug is (mostly) fixed in Perl 5.004.
However, the developers of Perl 5.004 could not fix this bug completely, because at least two widely-used modules depend on the old meaning of "$$0" in a string. So Perl 5.004 still interprets "$$<digit>" in the old (broken) way inside strings; but it generates this message as a warning. And in Perl 5.005, this special treatment will cease.
Perl versions before 5.004 did not always properly localize the regex-related special variables. Perl 5.004 does localize them, as the documentation has always said it should. This may result in $1, $2, etc. no longer being set where existing programs use them.
The documentation for Perl 5.0 has always stated that $.
is not
reset when an already-open file handle is reopened with no intervening
call to close. Due to a bug, perl versions 5.000 through 5.003
did reset $.
under that circumstance; Perl 5.004 does not.
wantarray may return undef
The wantarray operator returns true if a subroutine is expected to
return a list, and false otherwise. In Perl 5.004, wantarray can
also return the undefined value if a subroutine's return value will
not be used at all, which allows subroutines to avoid a time-consuming
calculation of a return value if it isn't going to be used.
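A sketch of the three possible contexts:

```perl
sub context {
    return wantarray         ? 'list'
         : defined wantarray ? 'scalar'
         :                     'void';
}

my @l = context();   # list context: 'list'
my $s = context();   # scalar context: 'scalar'
context();           # void context: wantarray is undef inside the sub
```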
eval EXPR determines value of EXPR in scalar context
Perl (version 5) used to determine the value of EXPR inconsistently, sometimes incorrectly using the surrounding context for the determination. Now, the value of EXPR (before being parsed by eval) is always determined in a scalar context. Once parsed, it is executed as before, by providing the context that the scope surrounding the eval provided. This change makes the behavior Perl4 compatible, besides fixing bugs resulting from the inconsistent behavior. This program:
used to print something like "timenowis881399109|4", but now (and in perl4) prints "4|4".
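The delta's original program is not reproduced here, but the scalar-context rule can be sketched with an invented array:

```perl
my @code = ('1 + 1', 'anything');

# EXPR is now evaluated in scalar context first: @code becomes 2 (its
# element count), so this is eval "2" -- regardless of eval's own context.
my $n      = eval @code;   # 2
my @result = eval @code;   # (2)
```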
A bug in previous versions may have failed to detect some insecure
conditions when taint checks are turned on. (Taint checks are used
in setuid or setgid scripts, or when explicitly turned on with the
-T
invocation option.) Although it's unlikely, this may cause a
previously-working script to now fail, which should be construed
as a blessing since that indicates a potentially-serious security
hole was just plugged.
The new restrictions when tainting include:
These operators may spawn the C shell (csh), which cannot be made safe. This restriction will be lifted in a future version of Perl when globbing is implemented without the use of an external program.
These environment variables may alter the behavior of spawned programs (especially shells) in ways that subvert security. So now they are treated as dangerous, in the manner of $IFS and $PATH.
Some termcap libraries do unsafe things with $TERM. However, it would be unnecessarily harsh to treat all $TERM values as unsafe, since only shell metacharacters can cause trouble in $TERM. So a tainted $TERM is considered to be safe if it contains only alphanumerics, underscores, dashes, and colons, and unsafe if it contains other characters (including whitespace).
A new Opcode module supports the creation, manipulation and application of opcode masks. The revised Safe module has a new API and is implemented using the new Opcode module. Please read the new Opcode and Safe documentation.
In older versions of Perl it was not possible to create more than one Perl interpreter instance inside a single process without leaking like a sieve and/or crashing. The bugs that caused this behavior have all been fixed. However, you still must take care when embedding Perl in a C program. See the updated perlembed manpage for tips on how to manage your interpreters.
File handles are now stored internally as type IO::Handle. The FileHandle module is still supported for backwards compatibility, but it is now merely a front end to the IO::* modules, specifically IO::Handle, IO::Seekable, and IO::File. We suggest, but do not require, that you use the IO::* modules in new code.
In harmony with this change, *GLOB{FILEHANDLE} is now just a backward-compatible synonym for *GLOB{IO}.
It is now possible to build Perl with AT&T's sfio IO package instead of stdio. See perlapio for more details, and the INSTALL file for how to use it.
A subroutine reference may now be suffixed with an arrow and a (possibly empty) parameter list. This syntax denotes a call of the referenced subroutine, with the given parameters (if any).
This new syntax follows the pattern of $hashref->{FOO} and $aryref->[$foo]: You may now write &$subref($foo) as $subref->($foo). All these arrow terms may be chained; thus, &{$table->{FOO}}($bar) may now be written $table->{FOO}->($bar).
The current package name at compile time, or the undefined value if there is no current package (due to a package; directive). Like __FILE__ and __LINE__, __PACKAGE__ does not interpolate into strings.
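A short sketch (package name invented):

```perl
package My::Module;
sub whoami { return __PACKAGE__ }   # "My::Module", resolved at compile time

package main;
print My::Module::whoami(), "\n";
# Note: the literal text "__PACKAGE__" in a double-quoted string is
# NOT interpolated.
```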
Extended error message on some platforms. (Also known as $EXTENDED_OS_ERROR if you use English.)
The current set of syntax checks enabled by use strict. See the documentation of strict for more details. Not actually new, but newly documented.
Because it is intended for internal use by Perl core components,
there is no use English
long name for this variable.
By default, running out of memory is not trappable. However, if
compiled for this, Perl may use the contents of $^M
as an emergency
pool after die()ing with this message. Suppose that your Perl were
compiled with -DPERL_EMERGENCY_SBRK and used Perl's malloc. Then
- $^M = 'a' x (1<<16);
would allocate a 64K buffer for use when in emergency.
See the INSTALL file for information on how to enable this option.
As a disincentive to casual use of this advanced feature,
there is no use English
long name for this variable.
This now works. (e.g. delete @ENV{'PATH', 'MANPATH'})
flock is now supported on more platforms, prefers fcntl to lockf when emulating, and always flushes before (un)locking.
Perl now implements these functions itself; it doesn't use the C library function sprintf() any more, except for floating-point numbers, and even then only known flags are allowed. As a result, it is now possible to know which conversions and flags will work, and what they will do.
The new conversions in Perl's sprintf() are:
- %i a synonym for %d
- %p a pointer (the address of the Perl value, in hexadecimal)
- %n special: *stores* the number of characters output so far
- into the next variable in the parameter list
The new flags that go between the %
and the conversion are:
- # prefix octal with "0", hex with "0x"
- h interpret integer as C type "short" or "unsigned short"
- V interpret integer as Perl's standard integer type
Also, where a number would appear in the flags, an asterisk ("*") may be used instead, in which case Perl uses the next item in the parameter list as the given number (that is, as the field width or precision). If a field width obtained through "*" is negative, it has the same effect as the '-' flag: left-justification.
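For example:

```perl
printf "[%*d]\n",   6, 42;        # width taken from the argument list
printf "[%*d]\n",  -6, 42;        # negative width left-justifies
printf "[%.*f]\n",  2, 3.14159;   # precision taken from the argument list
```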
See sprintf for a complete list of conversion and flags.
As an lvalue, keys allows you to increase the number of hash buckets
allocated for the given hash. This can gain you a measure of efficiency if
you know the hash is going to get big. (This is similar to pre-extending
an array by assigning a larger number to $#array.) If you say
- keys %hash = 200;
then %hash will have at least 200 buckets allocated for it. These buckets will be retained even if you do %hash = (); use undef %hash if you want to free the storage while %hash is still in scope.
You can't shrink the number of buckets allocated for the hash using
keys in this way (but you needn't worry about doing this by accident,
as trying has no effect).
You can now use my() (with or without the parentheses) in the control expressions of control structures such as:
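The example code is elided in this copy; constructs of the kind meant here can be sketched (with invented data) as:

```perl
my @input = ("Foo\n", "Bar\n");
my $out = '';

# my() in a while condition: $line is scoped to the loop body
while (defined(my $line = shift @input)) {
    $out .= lc $line;
}

# my() in an if condition: $answer stays visible through the elsif chain
my $reply = "yes";
if ((my $answer = $reply) =~ /^y/i) {
    $out .= "agreed\n";
} elsif ($answer =~ /^n/i) {
    $out .= "declined\n";
}
print $out;
```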
Also, you can declare a foreach loop control variable as lexical by preceding it with the word "my". For example, in:
- foreach my $i (1, 2, 3) {
- some_function();
- }
$i is a lexical variable, and the scope of $i extends to the end of the loop, but not beyond it.
Note that you still cannot use my() on global punctuation variables such as $_ and the like.
A new format 'w' represents a BER compressed integer (as defined in ASN.1). Its format is a sequence of one or more bytes, each of which provides seven bits of the total value, with the most significant first. Bit eight of each byte is set, except for the last byte, in which bit eight is clear.
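For example, 193 is 1*128 + 65, so it packs to the two bytes 0x81 0x41:

```perl
my $ber   = pack 'w', 193;
my @bytes = map { ord } split //, $ber;   # (0x81, 0x41): high bit set
                                          # on every byte but the last
my ($n)   = unpack 'w', $ber;             # round-trips to 193
```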
If 'p' or 'P' are given undef as values, they now generate a NULL pointer.
Both pack() and unpack() now fail when their templates contain invalid types. (Invalid types used to be ignored.)
The new sysseek() operator is a variant of seek() that sets and gets the file's system read/write position, using the lseek(2) system call. It is the only reliable way to seek before using sysread() or syswrite(). Its return value is the new position, or the undefined value on failure.
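A sketch using a scratch file (the file name is invented; SEEK_* constants come from Fcntl):

```perl
use Fcntl qw(O_RDWR O_CREAT SEEK_SET SEEK_CUR);

my $file = "sysseek_demo.tmp";
sysopen my $fh, $file, O_RDWR | O_CREAT or die "sysopen $file: $!";
syswrite $fh, "0123456789ABCDEF";

my $pos = sysseek $fh, 0, SEEK_CUR;              # query position: 16
sysseek $fh, 0, SEEK_SET or die "sysseek: $!";   # rewind; returns "0 but true"
sysread $fh, my $buf, 4;                         # reads "0123"
close $fh;
unlink $file;
```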
If the first argument to use is a number, it is treated as a version
number instead of a module name. If the version of the Perl interpreter
is less than VERSION, then an error message is printed and Perl exits
immediately. Because use occurs at compile time, this check happens
immediately during the compilation process, unlike require VERSION, which waits until runtime for the check. This is often useful if you need to check the current Perl version before using library modules
which have changed in incompatible ways from older versions of Perl.
(We try not to do this more than we have to.)
If the VERSION argument is present between Module and LIST, then the
use will call the VERSION method in class Module with the given
version as an argument. The default VERSION method, inherited from
the UNIVERSAL class, croaks if the given version is larger than the
value of the variable $Module::VERSION. (Note that there is not a
comma after VERSION!)
This version-checking mechanism is similar to the one currently used in the Exporter module, but it is faster and can be used with modules that don't use the Exporter. It is the recommended method for new code.
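Both forms in one sketch (the module and version numbers are invented; the explicit class-method call stands in for what use Module VERSION does):

```perl
use 5.004;   # compile-time check of the interpreter version itself

package My::Widget;
our $VERSION = '1.20';

package main;
My::Widget->VERSION(1.10);            # ok: 1.20 is not less than 1.10
eval { My::Widget->VERSION(9.99) };   # requested version too new
print "croaked: $@" if $@;
```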
Returns the prototype of a function as a string (or undef if the
function has no prototype). FUNCTION is a reference to or the name of the
function whose prototype you want to retrieve.
(Not actually new; just never documented before.)
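For instance:

```perl
sub takes_two ($$) { $_[0] + $_[1] }

print prototype(\&takes_two), "\n";    # "$$"
print prototype('CORE::push'), "\n";   # built-ins work too: "\@@"
print defined prototype(sub {})        # no prototype: undef
    ? "has one\n" : "none\n";
```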
The default seed for srand, which used to be time, has been changed.
Now it's a heady mix of difficult-to-predict system-dependent values,
which should be sufficient for most everyday purposes.
Previous to version 5.004, calling rand without first calling srand
would yield the same sequence of random numbers on most or all machines.
Now, when perl sees that you're calling rand and haven't yet called
srand, it calls srand with the default seed. You should still call
srand manually if your code might ever be run on a pre-5.004 system,
of course, or if you want a seed other than the default.
Functions documented in the Camel to default to $_ now in fact do, and all those that do are so documented in perlfunc.
m//gc does not reset search position on failure
The m//g match iteration construct has always reset its target
string's search position (which is visible through the pos operator)
when a match fails; as a result, the next m//g match after a failure
starts again at the beginning of the string. With Perl 5.004, this
reset may be disabled by adding the "c" (for "continue") modifier,
i.e. m//gc. This feature, in conjunction with the \G
zero-width
assertion, makes it possible to chain matches together. See perlop
and perlre.
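A short chain (string invented):

```perl
my $s = "aaabbb";

$s =~ /a+/g;       # match; pos($s) is now 3
$s =~ /c+/gc;      # fails, but /c keeps pos($s) at 3
$s =~ /\G(b+)/g;   # \G anchors at pos: captures "bbb"
print "$1 ends at ", pos($s), "\n";
```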
m//x ignores whitespace before ?*+{}
The m//x construct has always been intended to ignore all unescaped
whitespace. However, before Perl 5.004, whitespace had the effect of
escaping repeat modifiers like "*" or "?"; for example, /a *b/x was (mis)interpreted as /a\*b/. This bug has been fixed in 5.004.
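That is, under 5.004 these two tests agree:

```perl
# Whitespace before a quantifier is now ignored under /x:
my $with_x    = "aaab" =~ / a *  b /x ? 1 : 0;   # same as /a*b/
my $without_x = "aaab" =~ /a*b/       ? 1 : 0;
```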
sub{} closures work now
Prior to the 5.004 release, nested anonymous functions didn't work right. They do now.
Just like anonymous functions that contain lexical variables
that change (like a lexical index variable for a foreach
loop),
formats now work properly. For example, this silently failed
before (printed only zeros), but is fine now:
- my $i;
- foreach $i ( 1 .. 10 ) {
- write;
- }
- format =
- my i is @#
- $i
- .
However, it still fails (without a warning) if the foreach is within a subroutine:
- my $i;
- sub foo {
- foreach $i ( 1 .. 10 ) {
- write;
- }
- }
- foo;
- format =
- my i is @#
- $i
- .
The UNIVERSAL
package automatically contains the following methods that
are inherited by all other classes:
isa
returns true if its object is blessed into a subclass of CLASS
isa
is also exportable and can be called as a sub with two arguments. This lets you check what a reference points to. Example:
- use UNIVERSAL qw(isa);
- if(isa($ref, 'ARRAY')) {
- ...
- }
can
checks to see if its object has a method called METHOD. If it does, a reference to the sub is returned; if it does not, undef is returned.
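For example (class name invented):

```perl
package Greeter;
sub hello { my ($class, $name) = @_; return "hi, $name" }

package main;
if (my $method = Greeter->can('hello')) {
    print Greeter->$method('world'), "\n";   # call through the code ref
}
print Greeter->can('missing') ? "found\n" : "undef\n";
```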
VERSION
returns the version number of the class (package). If the
NEED argument is given then it will check that the current version (as
defined by the $VERSION variable in the given package) is not less than
NEED; it will die if this is not the case. This method is normally
called as a class method. This method is called automatically by the
VERSION
form of use.
- use A 1.2 qw(some imported subs);
- # implies:
- A->VERSION(1.2);
NOTE: can
directly uses Perl's internal code for method lookup, and
isa
uses a very similar method and caching strategy. This may cause
strange effects if the Perl code dynamically changes @ISA in any package.
You may add other methods to the UNIVERSAL class via Perl or XS code.
You do not need to use UNIVERSAL
in order to make these methods
available to your program. This is necessary only if you wish to
have isa
available as a plain subroutine in the current package.
See perltie for other kinds of tie()s.
This is the constructor for the class. That means it is expected to return an object of some sort. The reference can be used to hold some internal information.
This method will be triggered every time the tied handle is printed to. Beyond its self reference it also expects the list that was passed to the print function.
This method will be triggered every time the tied handle is printed to
with the printf() function.
Beyond its self reference it also expects the format and list that was
passed to the printf function.
This method will be called when the handle is read from via the read
or sysread functions.
This method will be called when the handle is read from. The method should return undef when there is no more data.
This method will be called when the getc function is called.
As with the other types of ties, this method will be called when the tied handle is about to be destroyed. This is useful for debugging and possibly for cleaning up.
- sub DESTROY {
- print "</shout>\n";
- }
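A minimal class tying these pieces together (the Shout name follows the DESTROY snippet above; the TIEHANDLE and PRINT bodies are invented):

```perl
package Shout;
sub TIEHANDLE { my $class = shift; bless {}, $class }    # the constructor
sub PRINT     { my $self = shift; print uc join '', @_ } # upcase all output
sub DESTROY   { print "</shout>\n" }

package main;
tie *SHOUT, 'Shout';
print SHOUT "hello\n";   # routed through Shout::PRINT: prints "HELLO"
untie *SHOUT;            # triggers Shout::DESTROY
```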
If perl is compiled with the malloc included with the perl distribution
(that is, if perl -V:d_mymalloc is 'define') then you can print
memory statistics at runtime by running Perl thusly:
- env PERL_DEBUG_MSTATS=2 perl your_script_here
The value of 2 means to print statistics after compilation and on exit; with a value of 1, the statistics are printed only on exit. (If you want the statistics at an arbitrary time, you'll need to install the optional module Devel::Peek.)
Three new compilation flags are recognized by malloc.c. (They have no effect if perl is compiled with system malloc().)
If this macro is defined, running out of memory need not be a fatal
error: a memory pool can be allocated by assigning to the special
variable $^M
. See $^M.
Perl memory allocation is by bucket with sizes close to powers of two.
Because of this, malloc overhead may be big, especially for data of
size exactly a power of two. If PACK_MALLOC
is defined, perl uses
a slightly different algorithm for small allocations (up to 64 bytes
long), which makes it possible to have overhead down to 1 byte for
allocations which are powers of two (and appear quite often).
Expected memory savings (with 8-byte alignment in alignbytes
) is
about 20% for typical Perl usage. Expected slowdown due to additional
malloc overhead is in fractions of a percent (hard to measure, because
of the effect of saved memory on speed).
Similarly to PACK_MALLOC, this macro improves allocations of data with size close to a power of two; but this works for big allocations (starting with 16K by default). Such allocations are typical for big hashes and special-purpose scripts, especially image processing.
On recent systems, the fact that perl requires 2M from system for 1M allocation will not affect speed of execution, since the tail of such a chunk is not going to be touched (and thus will not require real memory). However, it may result in a premature out-of-memory error. So if you will be manipulating very large blocks with sizes close to powers of two, it would be wise to define this macro.
Expected saving of memory is 0-100% (100% in applications which require most memory in such 2**n chunks); expected slowdown is negligible.
Functions that have an empty prototype and that do nothing but return a fixed value are now inlined (e.g. sub PI () { 3.14159 }).
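As a sketch, any sub with an empty prototype and a single constant expression qualifies (the name PI here is only an illustration):

```perl
# Empty prototype () + a constant body => eligible for compile-time inlining.
sub PI () { 3.14159 }

# Calls like this compile down to the constant value itself,
# with no subroutine-call overhead.
my $circumference = 2 * PI * 10;
printf "%.4f\n", $circumference;
```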
Each unique hash key is only allocated once, no matter how many hashes have an entry with that key. So even if you have 100 copies of the same hash, the hash keys never have to be reallocated.
Support for the following operating systems is new in Perl 5.004.
Perl 5.004 now includes support for building a "native" perl under Windows NT, using the Microsoft Visual C++ compiler (versions 2.0 and above) or the Borland C++ compiler (versions 5.02 and above). The resulting perl can be used under Windows 95 (if it is installed in the same directory locations as it got installed in Windows NT). This port includes support for perl extension building tools like ExtUtils::MakeMaker and h2xs, so that many extensions available on the Comprehensive Perl Archive Network (CPAN) can now be readily built under Windows NT. See http://www.perl.com/ for more information on CPAN and README.win32 in the perl distribution for more details on how to get started with building this port.
There is also support for building perl under the Cygwin32 environment. Cygwin32 is a set of GNU tools that make it possible to compile and run many Unix programs under Windows NT by providing a mostly Unix-like interface for compilation and execution. See README.cygwin32 in the perl distribution for more details on this port and how to obtain the Cygwin32 toolkit.
See README.plan9 in the perl distribution.
See README.qnx in the perl distribution.
See README.amigaos in the perl distribution.
Six new pragmatic modules exist:
Defers require MODULE until someone calls one of the specified subroutines (which must be exported by MODULE). This pragma should be used with caution, and only when necessary.
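A brief sketch of the pragma; deferring Carp's croak() is only an illustrative choice:

```perl
# Carp is not compiled at this point; the 'require Carp' happens
# lazily, the first time carp() or croak() is actually called.
use autouse 'Carp' => qw(carp croak);

eval { croak "something went wrong" };
print "caught: $@";
```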
Looks for MakeMaker-like 'blib' directory structure starting in dir (or current directory) and working back up to five levels of parent directories.
Intended for use on command line with -M option as a way of testing arbitrary scripts against an uninstalled version of a package.
Provides a convenient interface for creating compile-time constants. See Constant Functions in perlsub.
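A minimal sketch of the pragma's basic form:

```perl
# Each constant becomes an inlinable sub with an empty prototype,
# so the values cost nothing at run time.
use constant PI    => 4 * atan2(1, 1);
use constant DEBUG => 0;

printf "area of r=2 circle: %.5f\n", PI * 2 ** 2;
print "debugging enabled\n" if DEBUG;   # statically false here
```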
Tells the compiler to enable (or disable) the use of POSIX locales for builtin operations.
When use locale is in effect, the current LC_CTYPE locale is used for regular expressions and case mapping; LC_COLLATE for string ordering; and LC_NUMERIC for numeric formatting in printf and sprintf (but not in print). LC_NUMERIC is always used in write, since lexical scoping of formats is problematic at best.
Each use locale or no locale affects statements to the end of the enclosing BLOCK or, if not inside a BLOCK, to the end of the current file. Locales can be switched and queried with POSIX::setlocale().
See perllocale for more information.
Disable unsafe opcodes, or any named opcodes, when compiling Perl code.
Enable VMS-specific language features. Currently, there are three VMS-specific features available: 'status', which makes $? and system return genuine VMS status values instead of emulating POSIX; 'exit', which makes exit take a genuine VMS status value instead of assuming that exit 1 is an error; and 'time', which makes all times relative to the local time zone, in the VMS tradition.
Though Perl 5.004 is compatible with almost all modules that work with Perl 5.003, there are a few exceptions:
- Module Required Version for Perl 5.004
- ------ -------------------------------
- Filter Filter-1.12
- LWP libwww-perl-5.08
- Tk Tk400.202 (-w makes noise)
Also, the majordomo mailing list program, version 1.94.1, doesn't work with Perl 5.004 (nor with perl 4), because it executes an invalid regular expression. This bug is fixed in majordomo version 1.94.2.
The installperl script now places the Perl source files for extensions in the architecture-specific library directory, which is where the shared libraries for extensions have always been. This change is intended to allow administrators to keep the Perl 5.004 library directory unchanged from a previous version, without running the risk of binary incompatibility between extensions' Perl source and shared libraries.
Brand new modules, arranged by topic rather than strictly alphabetically:
- CGI.pm Web server interface ("Common Gateway Interface")
- CGI/Apache.pm Support for Apache's Perl module
- CGI/Carp.pm Log server errors with helpful context
- CGI/Fast.pm Support for FastCGI (persistent server process)
- CGI/Push.pm Support for server push
- CGI/Switch.pm Simple interface for multiple server types
- CPAN Interface to Comprehensive Perl Archive Network
- CPAN::FirstTime Utility for creating CPAN configuration file
- CPAN::Nox Runs CPAN while avoiding compiled extensions
- IO.pm Top-level interface to IO::* classes
- IO/File.pm IO::File extension Perl module
- IO/Handle.pm IO::Handle extension Perl module
- IO/Pipe.pm IO::Pipe extension Perl module
- IO/Seekable.pm IO::Seekable extension Perl module
- IO/Select.pm IO::Select extension Perl module
- IO/Socket.pm IO::Socket extension Perl module
- Opcode.pm Disable named opcodes when compiling Perl code
- ExtUtils/Embed.pm Utilities for embedding Perl in C programs
- ExtUtils/testlib.pm Fixes up @INC to use just-built extension
- FindBin.pm Find path of currently executing program
- Class/Struct.pm Declare struct-like datatypes as Perl classes
- File/stat.pm By-name interface to Perl's builtin stat
- Net/hostent.pm By-name interface to Perl's builtin gethost*
- Net/netent.pm By-name interface to Perl's builtin getnet*
- Net/protoent.pm By-name interface to Perl's builtin getproto*
- Net/servent.pm By-name interface to Perl's builtin getserv*
- Time/gmtime.pm By-name interface to Perl's builtin gmtime
- Time/localtime.pm By-name interface to Perl's builtin localtime
- Time/tm.pm Internal object for Time::{gm,local}time
- User/grent.pm By-name interface to Perl's builtin getgr*
- User/pwent.pm By-name interface to Perl's builtin getpw*
- Tie/RefHash.pm Base class for tied hashes with references as keys
- UNIVERSAL.pm Base class for *ALL* classes
New constants in the existing Fcntl module are now supported, provided that your operating system happens to support them:
- F_GETOWN F_SETOWN
- O_ASYNC O_DEFER O_DSYNC O_FSYNC O_SYNC
- O_EXLOCK O_SHLOCK
These constants are intended for use with the Perl operators sysopen() and fcntl() and the basic database modules like SDBM_File. For the exact meaning of these and other Fcntl constants please refer to your operating system's documentation for fcntl() and open().
In addition, the Fcntl module now provides these constants for use with the Perl operator flock():
- LOCK_SH LOCK_EX LOCK_NB LOCK_UN
These constants are defined in all environments (because where there is no flock() system call, Perl emulates it). However, for historical reasons, these constants are not exported unless they are explicitly requested with the ":flock" tag (e.g. use Fcntl ':flock').
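A short sketch of the usual locking idiom (the lock-file path is illustrative):

```perl
use Fcntl ':flock';   # imports LOCK_SH, LOCK_EX, LOCK_NB, LOCK_UN

# The path here is only an example.
open(my $fh, '>>', '/tmp/example.lock') or die "open: $!";
flock($fh, LOCK_EX) or die "flock: $!";   # exclusive lock; blocks if held
# ... critical section: only one process at a time gets here ...
flock($fh, LOCK_UN) or die "unlock: $!";
close($fh);
```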
The IO module provides a simple mechanism to load all the IO modules at one go. Currently this includes:
- IO::Handle
- IO::Seekable
- IO::File
- IO::Pipe
- IO::Socket
For more information on any of these modules, please see its respective documentation.
The Math::Complex module has been totally rewritten, and now supports more operations. These are overloaded:
- + - * / ** <=> neg ~ abs sqrt exp log sin cos atan2 "" (stringify)
And these functions are now exported:
- pi i Re Im arg
- log10 logn ln cbrt root
- tan
- csc sec cot
- asin acos atan
- acsc asec acot
- sinh cosh tanh
- csch sech coth
- asinh acosh atanh
- acsch asech acoth
- cplx cplxe
This new module, Math::Trig, provides a simpler interface to parts of Math::Complex for those who need trigonometric functions only for real numbers.
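A quick sketch of the real-valued interface of the new Math::Trig module (pi, tan, asin, and sinh are among its documented exports):

```perl
use Math::Trig;   # exports pi, tan, asin, sinh, and friends for real numbers

printf "tan(pi/4) = %.6f\n", tan(pi / 4);   # close to 1
printf "asin(1)   = %.6f\n", asin(1);       # close to pi/2
printf "sinh(0)   = %.6f\n", sinh(0);       # 0
```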
There have been quite a few changes made to DB_File. Here are a few of the highlights:
Fixed a handful of bugs.
By public demand, added support for the standard hash function exists().
Made it compatible with Berkeley DB 1.86.
Made negative subscripts work with RECNO interface.
Changed the default flags from O_RDWR to O_CREAT|O_RDWR and the default mode from 0640 to 0666.
Made DB_File automatically import the open() constants (O_RDWR, O_CREAT etc.) from Fcntl, if available.
Updated documentation.
Refer to the HISTORY section in DB_File.pm for a complete list of changes. Everything after DB_File 1.01 has been added since 5.003.
Major rewrite - support added for both UDP echo and real ICMP pings.
Many of the Perl builtins returning lists now have object-oriented overrides. These are:
- File::stat
- Net::hostent
- Net::netent
- Net::protoent
- Net::servent
- Time::gmtime
- Time::localtime
- User::grent
- User::pwent
For example, you can now say
- use File::stat;
- use User::pwent;
- $is_his = (stat($filename)->uid == getpwnam($whoever)->uid);
The pod2html utility included with Perl 5.004 is entirely new. By default, it sends the converted HTML to its standard output, instead of writing it to a file like Perl 5.003's pod2html did. Use the --outfile=FILENAME option to write to a file.
void XSUBs now default to returning nothing
Due to a documentation/implementation bug in previous versions of Perl, XSUBs with a return type of void have actually been returning one value. Usually that value was the GV for the XSUB, but sometimes it was some already freed or reused value, which would sometimes lead to program failure.
In Perl 5.004, if an XSUB is declared as returning void, it actually returns no value, i.e. an empty list (though there is a backward-compatibility exception; see below). If your XSUB really does return an SV, you should give it a return type of SV *.
For backward compatibility, xsubpp tries to guess whether a void XSUB is really void or if it wants to return an SV *. It does so by examining the text of the XSUB: if xsubpp finds what looks like an assignment to ST(0), it assumes that the XSUB's return type is really SV *.
gv_fetchmethod and perl_call_sv
The gv_fetchmethod function finds a method for an object, just like in Perl 5.003. The GV it returns may be a method cache entry. However, in Perl 5.004, method cache entries are not visible to users; therefore, they can no longer be passed directly to perl_call_sv. Instead, you should use the GvCV macro on the GV to extract its CV, and pass the CV to perl_call_sv.
The most likely symptom of passing the result of gv_fetchmethod to perl_call_sv is Perl's producing an "Undefined subroutine called" error on the second call to a given method (since there is no cache on the first call).
perl_eval_pv
A new function handy for eval'ing strings of Perl code inside C code. This function returns the value from the eval statement, which can be used instead of fetching globals from the symbol table. See perlguts, perlembed and perlcall for details and examples.
Internal handling of hash keys has changed. The old hashtable API is still fully supported, and will likely remain so. The additions to the API allow passing keys as SV*s, so that tied hashes can be given real scalars as keys rather than plain strings (nontied hashes still can only use strings as keys). New extensions must use the new hash access functions and macros if they wish to use SV* keys. These additions also make it feasible to manipulate HE*s (hash entries), which can be more efficient. See perlguts for details.
Many of the base and library pods were updated. These new pods are included in section 1:
This document.
Frequently asked questions.
Locale support (internationalization and localization).
Tutorial on Perl OO programming.
Perl internal IO abstraction interface.
Perl module library and recommended practice for module creation. Extracted from perlmod (which is much smaller as a result).
Although not new, this has been massively updated.
Although not new, this has been massively updated.
Several new conditions will trigger warnings that were silent before. Some only affect certain platforms. The following new warnings and errors outline these. These messages are classified as follows (listed in increasing order of desperation):
- (W) A warning (optional).
- (D) A deprecation (optional).
- (S) A severe warning (mandatory).
- (F) A fatal error (trappable).
- (P) An internal error you should never see (trappable).
- (X) A very fatal error (nontrappable).
- (A) An alien error message (not generated by Perl).
(W) A lexical variable has been redeclared in the same scope, effectively eliminating all access to the previous instance. This is almost always a typographical error. Note that the earlier variable will still exist until the end of the scope or until all closure referents to it are destroyed.
(F) The argument to delete() must be either a hash element, such as
- $foo{$bar}
- $ref->[12]->{"susie"}
or a hash slice, such as
- @foo{$bar, $baz, $xyzzy}
- @{$ref->[12]}{"susie", "queue"}
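A small sketch of both accepted forms:

```perl
my %h = (a => 1, b => 2, c => 3, d => 4);

delete $h{a};          # a single hash element
delete @h{'b', 'c'};   # a hash slice: several keys removed at once

print join(",", sort keys %h), "\n";   # only "d" remains
```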
(X) You can't allocate more than 64K on an MS-DOS machine.
(F) You can't allocate more than 2^31+"small amount" bytes.
(W) The pattern match (//), substitution (s///), and transliteration (tr///) operators work on scalar values. If you apply one of them to an array or a hash, it will convert the array or hash to a scalar value (the length of an array or the population info of a hash) and then work on that scalar value. This is probably not what you meant to do. See grep and map for alternatives.
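A sketch of the grep and map alternatives the warning points to:

```perl
my @lines = ("foo bar", "baz", "another foo");

# @lines =~ /foo/ would match against the array in scalar context
# (its length), not its elements.  Test each element with grep:
my @with_foo = grep { /foo/ } @lines;

# Transform each element with map:
my @upper = map { uc } @lines;

print scalar(@with_foo), " lines match\n";   # 2 lines match
```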
(P) Perl maintains a reference counted internal table of strings to optimize the storage and access of hash keys and other strings. This indicates someone tried to decrement the reference count of a string that can no longer be found in the table.
(W) You supplied a reference as the first argument to substr() used as an lvalue, which is pretty strange. Perhaps you forgot to dereference it first. See substr.
(W) You used a qualified bareword of the form Foo::, but the compiler saw no other uses of that namespace before that point. Perhaps you need to predeclare a package?
(F) Perl optimizes the internal handling of sort subroutines and keeps pointers into them. You tried to redefine one such sort subroutine when it was currently active, which is not allowed. If you really want to do this, you should write sort { &func } @x instead of sort func @x.
(F) Only hard references are allowed by "strict refs". Symbolic references are disallowed. See perlref.
(P) Internal error trying to resolve overloading specified by a method name (as opposed to a subroutine reference).
(S) You redefined a subroutine which had previously been eligible for inlining. See Constant Functions in perlsub for commentary and workarounds.
(S) You undefined a subroutine which had previously been eligible for inlining. See Constant Functions in perlsub for commentary and workarounds.
(F) The method which overloads "=" is buggy. See Copy Constructor in overload.
(F) You passed die() an empty string (the equivalent of die "") or you called it with no args and both $@ and $_ were empty.
(W) You are exiting a rather special block construct (like a sort block or subroutine) by unconventional means, such as a goto, or a loop control statement. See sort.
(F) Perl limits identifiers (names for variables, functions, etc.) to 252 characters for simple names, somewhat more for compound names (like $A::B). You've exceeded Perl's limits. Future versions of Perl are likely to eliminate these arbitrary limitations.
(F) A carriage return character was found in the input. This is an
error, and not a warning, because carriage return characters can break
multi-line strings, including here documents (e.g., print <<EOF;).
(X) The PERL5OPT environment variable may only be used to set the following switches: -[DIMUdmw].
(S) The literal hex number you have specified is too big for your architecture. On a 32-bit architecture the largest hex literal is 0xFFFFFFFF.
(S) The literal octal number you have specified is too big for your architecture. On a 32-bit architecture the largest octal literal is 037777777777.
(P) Something went wrong with the external program(s) used for glob and <*.c>. This may mean that your csh (C shell) is broken. If so, you should change all of the csh-related variables in config.sh: If you have tcsh, make the variables refer to it as if it were csh (e.g. full_csh='/usr/bin/tcsh'); otherwise, make them all empty (except that d_csh should be 'undef') so that Perl will think csh is missing. In either case, after editing config.sh, run ./Configure -S and rebuild Perl.
(W) Perl does not understand the given format conversion. See sprintf.
(F) The given character is not a valid pack type. See pack.
(F) The given character is not a valid unpack type. See unpack.
(W) Typographical errors often show up as unique variable names. If you had a good reason for having a unique name, then just mention it again somehow to suppress the message (the use vars pragma is provided for just this purpose).
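A short sketch of the pragma (the variable names are illustrative):

```perl
use strict;

# Predeclares these package globals, silencing both "use strict 'vars'"
# complaints and the "used only once" warning.
use vars qw($debug @queue %config);

$debug = 1;
print "debug is on\n" if $debug;
```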
(F) The first argument to formline must be a valid format picture specification. It was found to be empty, which probably means you supplied it an uninitialized value. See perlform.
(F) You tried to do a read/write/send/recv operation with an offset
pointing outside the buffer. This is difficult to imagine.
The sole exception to this is that sysread()ing past the buffer
will extend the buffer and zero pad the new area.
(X|F) The malloc() function returned 0, indicating there was insufficient remaining memory (or virtual memory) to satisfy the request.
The request was judged to be small, so the possibility to trap it depends on the way Perl was compiled. By default it is not trappable. However, if compiled for this, Perl may use the contents of $^M as an emergency pool after die()ing with this message. In this case the error is trappable once.
(F) The malloc() function returned 0, indicating there was insufficient remaining memory (or virtual memory) to satisfy the request. However, the request was judged large enough (compile-time default is 64K), so a possibility to shut down by trapping this error is granted.
(P) The library function frexp() failed, making printf("%f") impossible.
(W) qw() lists contain items separated by whitespace; as with literal strings, comment characters are not ignored, but are instead treated as literal data. (You may have used different delimiters than the parentheses shown here; braces are also frequently used.)
You probably wrote something like this:
- @list = qw(
- a # a comment
- b # another comment
- );
when you should have written this:
- @list = qw(
- a
- b
- );
If you really want comments, build your list the old-fashioned way, with quotes and commas:
- @list = (
- 'a', # a comment
- 'b', # another comment
- );
(W) qw() lists contain items separated by whitespace; therefore commas aren't needed to separate the items. (You may have used different delimiters than the parentheses shown here; braces are also frequently used.)
You probably wrote something like this:
- qw! a, b, c !;
which puts literal commas into some of the list items. Write it without commas if you don't want them to appear in your data:
- qw! a b c !;
(W) You've used a hash slice (indicated by @) to select a single element of a hash. Generally it's better to ask for a scalar value (indicated by $). The difference is that $foo{&bar} always behaves like a scalar, both when assigning to it and when evaluating its argument, while @foo{&bar} behaves like a list when you assign to it, and provides a list context to its subscript, which can do weird things if you're expecting only one subscript.
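The context difference can be seen directly; keys_wanted() below is a made-up helper that returns one key in scalar context and two in list context:

```perl
my %h;

sub keys_wanted { wantarray ? ('a', 'b') : 'a' }

# $h{...}: the subscript is evaluated in scalar context -- one key.
$h{ keys_wanted() } = 1;

# @h{...}: the subscript is evaluated in list context -- two keys.
@h{ keys_wanted() } = (2, 3);

print join(",", sort keys %h), "\n";   # a,b
print "$h{a} $h{b}\n";                 # 2 3
```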
(P) Overloading resolution over the @ISA tree may be broken by importing stubs. Stubs should never be implicitly created, but explicit calls to can() may break this.
(X) The #! line (or local equivalent) in a Perl script contains the -T option, but Perl was not invoked with -T in its argument list. This is an error because, by the time Perl discovers a -T in a script, it's too late to properly taint everything from the environment. So Perl gives up.
(W) A copy of the object returned from tie (or tied) was still
valid when untie was called.
(F) The Perl parser has no idea what to do with the specified character in your Perl script (or eval). Perhaps you tried to run a compressed script, a binary program, or a directory as a Perl program.
(F) Your version of executable does not support forking.
Note that under some systems, like OS/2, there may be different flavors of Perl executables, some of which may support fork, some not. Try changing the name you call Perl by to perl_, perl__, and so on.
(D) Perl versions before 5.004 misinterpreted any type marker followed by "$" and a digit. For example, "$$0" was incorrectly taken to mean "${$}0" instead of "${$0}". This bug is (mostly) fixed in Perl 5.004.
However, the developers of Perl 5.004 could not fix this bug completely, because at least two widely-used modules depend on the old meaning of "$$0" in a string. So Perl 5.004 still interprets "$$<digit>" in the old (broken) way inside strings; but it generates this message as a warning. And in Perl 5.005, this special treatment will cease.
(W) In a conditional expression, you used <HANDLE>, <*> (glob), each(),
or readdir() as a boolean value. Each of these constructs can return a
value of "0"; that would make the conditional expression false, which is
probably not what you intended. When using these constructs in conditional
expressions, test their values with the defined operator.
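A sketch of the recommended defined() test; the in-memory filehandle just stands in for a real file whose last line is a bare "0":

```perl
my $data = "first\nsecond\n0";          # final line is "0", no newline
open(my $fh, '<', \$data) or die "open: $!";

my @seen;
while (defined(my $line = <$fh>)) {     # explicit defined() test
    push @seen, $line;
}
close($fh);

# All three lines were read; a plain truth test on the value would
# have treated the final "0" as false and ended the loop early.
print scalar(@seen), " lines\n";        # 3 lines
```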
(W) An inner (nested) anonymous subroutine is inside a named subroutine, and outside that is another subroutine; and the anonymous (innermost) subroutine is referencing a lexical variable defined in the outermost subroutine. For example:
- sub outermost { my $a; sub middle { sub { $a } } }
If the anonymous subroutine is called or referenced (directly or indirectly) from the outermost subroutine, it will share the variable as you would expect. But if the anonymous subroutine is called or referenced when the outermost subroutine is not active, it will see the value of the shared variable as it was before and during the *first* call to the outermost subroutine, which is probably not what you want.
In these circumstances, it is usually best to make the middle subroutine anonymous, using the sub {} syntax. Perl has specific support for shared variables in nested anonymous subroutines; a named subroutine in between interferes with this feature.
(W) An inner (nested) named subroutine is referencing a lexical variable defined in an outer subroutine.
When the inner subroutine is called, it will probably see the value of the outer subroutine's variable as it was before and during the *first* call to the outer subroutine; in this case, after the first call to the outer subroutine is complete, the inner and outer subroutines will no longer share a common value for the variable. In other words, the variable will no longer be shared.
Furthermore, if the outer subroutine is anonymous and references a lexical variable outside itself, then the outer and inner subroutines will never share the given variable.
This problem can usually be solved by making the inner subroutine anonymous, using the sub {} syntax. When inner anonymous subs that reference variables in outer subroutines are called or referenced, they are automatically rebound to the current values of such variables.
(W) You passed warn() an empty string (the equivalent of warn "") or you called it with no args and $_ was empty.
(W) A warning peculiar to VMS. A logical name was encountered when preparing to iterate over %ENV which violates the syntactic rules governing logical names. Since it cannot be translated normally, it is skipped, and will not appear in %ENV. This may be a benign occurrence, as some software packages might directly modify logical name tables and introduce nonstandard names, or it may indicate that a logical name table has been corrupted.
(P) An error peculiar to OS/2. Most probably you're using an obsolete version of Perl, and this should not happen anyway.
(F) An error peculiar to OS/2. PERLLIB_PREFIX should be of the form
- prefix1;prefix2
or
- prefix1 prefix2
with nonempty prefix1 and prefix2. If prefix1 is indeed a prefix of a builtin library search path, prefix2 is substituted. The error may appear if components are not found, or are too long. See "PERLLIB_PREFIX" in README.os2.
(F) An error peculiar to OS/2. PERL_SH_DIR is the directory to find the sh-shell in. See "PERL_SH_DIR" in README.os2.
(W) This is a standard message issued by OS/2 applications, while *nix applications die in silence. It is considered a feature of the OS/2 port. One can easily disable this by appropriate sighandlers, see Signals in perlipc. See also "Process terminated by SIGTERM/SIGINT" in README.os2.
If you find what you think is a bug, you might check the headers of recently posted articles in the comp.lang.perl.misc newsgroup. There may also be information at http://www.perl.com/perl/ , the Perl Home Page.
If you believe you have an unreported bug, please run the perlbug program included with your release. Make sure you trim your bug down to a tiny but sufficient test case. Your bug report, along with the output of perl -V, will be sent off to <perlbug@perl.com> to be analysed by the Perl porting team.
The Changes file for exhaustive details on what changed.
The INSTALL file for how to build Perl. This file has been significantly updated for 5.004, so even veteran users should look through it.
The README file for general stuff.
The Copying file for copyright information.
Constructed by Tom Christiansen, grabbing material with permission from innumerable contributors, with kibitzing by more than a few Perl porters.
Last update: Wed May 14 11:14:09 EDT 1997
perl5005delta - what's new for perl5.005
This document describes differences between the 5.004 release and this one.
Perl is now developed on two tracks: a maintenance track that makes small, safe updates to released production versions with emphasis on compatibility; and a development track that pursues more aggressive evolution. Maintenance releases (which should be considered production quality) have subversion numbers that run from 1 to 49, and development releases (which should be considered "alpha" quality) run from 50 to 99.
Perl 5.005 is the combined product of the new dual-track development scheme.
Starting with Perl 5.004_50 there were many deep and far-reaching changes to the language internals. If you have dynamically loaded extensions that you built under perl 5.003 or 5.004, you can continue to use them with 5.004, but you will need to rebuild and reinstall those extensions to use them with 5.005. See INSTALL for detailed instructions on how to upgrade.
The new Configure defaults are designed to allow a smooth upgrade from 5.004 to 5.005, but you should read INSTALL for a detailed discussion of the changes in order to adapt them to your system.
When none of the experimental features are enabled, there should be very few user-visible Perl source compatibility issues.
If threads are enabled, then some caveats apply. @_ and $_ become lexical variables. The effect of this should be largely transparent to the user, but there are some boundary conditions under which the user will need to be aware of the issues. For example, local(@_) results in a "Can't localize lexical variable @_ ..." message. This may be enabled in a future version.
Some new keywords have been introduced. These are generally expected to have very little impact on compatibility. See New INIT keyword, New lock keyword, and the new qr// operator.
Certain barewords are now reserved. Use of these will provoke a warning if you have asked for them with the -w switch.
See our is now a reserved word.
There have been a large number of changes in the internals to support the new features in this release.
Core sources now require ANSI C compiler
An ANSI C compiler is now required to build perl. See INSTALL.
All Perl global variables must now be referenced with an explicit prefix
All Perl global variables that are visible for use by extensions now have a PL_ prefix. New extensions should not refer to perl globals by their unqualified names. To preserve sanity, we provide limited backward compatibility for globals that are being widely used, like sv_undef and na (which should now be written as PL_sv_undef, PL_na, etc.).
If you find that your XS extension does not compile anymore because a perl global is not visible, try adding a PL_ prefix to the global and rebuild.
It is strongly recommended that all functions in the Perl API that don't begin with perl be referenced with a Perl_ prefix. The bare function names without the Perl_ prefix are supported with macros, but this support may cease in a future release.
See perlapi.
Enabling threads has source compatibility issues
Perl built with threading enabled requires extensions to use the new dTHR macro to initialize the handle to access per-thread data. If you see a compiler error that talks about the variable thr not being declared (when building a module that has XS code), you need to add dTHR; at the beginning of the block that elicited the error.
The API function perl_get_sv("@",GV_ADD) should be used instead of directly accessing perl globals as GvSV(errgv). The API call is backward compatible with existing perls and provides source compatibility whether or not threading is enabled.
See C Source Compatibility for more information.
This version is NOT binary compatible with older versions. All extensions will need to be recompiled. Further binaries built with threads enabled are incompatible with binaries built without. This should largely be transparent to the user, as all binary incompatible configurations have their own unique architecture name, and extension binaries get installed at unique locations. This allows coexistence of several configurations in the same directory hierarchy. See INSTALL.
A few taint leaks and taint omissions have been corrected. This may lead to "failure" of scripts that used to work with older versions. Compiling with -DINCOMPLETE_TAINTS provides a perl with minimal amounts of changes to the tainting behavior. But note that the resulting perl will have known insecurities.
One-liners with the -e switch do not create temporary files anymore.
Many new warnings that were introduced in 5.004 have been made optional. Some of these warnings are still present, but perl's new features make them less often a problem. See New Diagnostics.
Perl has a new Social Contract for contributors. See Porting/Contract.
The license included in much of the Perl documentation has changed. Most of the Perl documentation was previously under the implicit GNU General Public License or the Artistic License (at the user's choice). Now much of the documentation unambiguously states the terms under which it may be distributed. Those terms are in general much less restrictive than the GNU GPL. See perl and the individual perl manpages listed therein.
WARNING: Threading is considered an experimental feature. Details of the implementation may change without notice. There are known limitations and some bugs. These are expected to be fixed in future versions.
See README.threads.
WARNING: The Compiler and related tools are considered experimental. Features may change without notice, and there are known limitations and bugs. Since the compiler is fully external to perl, the default configuration will build and install it.
The Compiler produces three different types of transformations of a perl program. The C backend generates C code that captures perl's state just before execution begins. It eliminates the compile-time overhead of the regular perl interpreter, but run-time performance remains roughly the same. The CC backend generates optimized C code equivalent to the code path at run time. The CC backend has greater potential for big optimizations, but only a few optimizations are implemented currently. The Bytecode backend generates a platform-independent bytecode representation of the interpreter's state just before execution. Thus, the Bytecode backend also eliminates much of the compilation overhead of the interpreter.
The compiler comes with several valuable utilities. B::Lint is an experimental module to detect and warn about suspicious code, especially cases that the -w switch does not detect. B::Deparse can be used to demystify perl code, and to understand how perl optimizes certain constructs. B::Xref generates cross-reference reports of all definitions and uses of variables, subroutines, and formats in a program. B::Showlex shows the lexical variables used by a subroutine or file at a glance. perlcc is a simple frontend for compiling perl. See ext/B/README, B, and the respective compiler modules.
Perl's regular expression engine has been seriously overhauled, and many new constructs are supported. Several bugs have been fixed.
Here is an itemized summary:
Changes in the RE engine:
- Unneeded nodes removed;
- Substrings merged together;
- New types of nodes to process (SUBEXPR)* and similar expressions quickly, used if the SUBEXPR has no side effects and matches strings of the same length;
- Better optimizations by lookup for constant substrings;
- Better search for constant substrings anchored by $;
Changes in Perl code using RE engine:
- More optimizations to s/longer/short/;
- study() was not working;
- /blah/ may be optimized to an analogue of index() if $& $` $' not seen;
- Unneeded copying of matched-against string removed;
- Only the matched part of the string is copied if $` $' were not seen;
Note that only the major bug fixes are listed here. See Changes for others.
- Backtracking might not restore start of $3.
- No feedback if max count for * or + on "complex" subexpression was reached; similarly (but at compile time) for {3,34567};
- Primitive restrictions on max count introduced to decrease the possibility of a segfault;
- (ZERO-LENGTH)* could segfault;
- (ZERO-LENGTH)* was prohibited;
- Long REs were not allowed;
- /RE/g could skip matches at the same position after a zero-length match;
The following new syntax elements are supported:
- (?<=RE)
- (?<!RE)
- (?{ CODE })
- (?i-x)
- (?i:RE)
- (?(COND)YES_RE|NO_RE)
- (?>RE)
- \z
See also the new qr// operator, described below.
- Better debugging output (possibly with colors), even from non-debugging Perl;
- RE engine code now looks like C, not like assembler;
- Behaviour of RE modifiable by `use re' directive;
- Improved documentation;
- Test suite significantly extended;
- Syntax [:^upper:] etc., reserved inside character classes;
- (?i) localized inside enclosing group;
- $( is not interpolated into RE any more;
- /RE/g may match at the same position (with non-zero length) after a zero-length match (bug fix).
See the banner at the beginning of malloc.c for details.
Perl now contains its own highly optimized qsort() routine. The new qsort() is resistant to inconsistent comparison functions, so Perl's sort() will no longer provoke coredumps when given poorly written sort subroutines. (Some C library qsort()s that were being used before had this problem.) In our testing, the new qsort() required the minimal number of pair-wise compares on average, among all known qsort() implementations. See sort in perlfunc.
Perl's signal handling is susceptible to random crashes, because signals arrive asynchronously, and the Perl runtime is not reentrant at arbitrary times.
However, one experimental implementation of reliable signals is available when threads are enabled. See Thread::Signal. Also see INSTALL for how to build a Perl capable of threads.
The internals now reallocate the perl stack only at predictable times. In particular, magic calls never trigger reallocations of the stack, because all reentrancy of the runtime is handled using a "stack of stacks". This should improve reliability of cached stack pointers in the internals and in XSUBs.
Perl used to complain if it encountered literal carriage returns in scripts. Now they are mostly treated like whitespace within program text. Inside string literals and here-documents, literal carriage returns are ignored if they occur paired with linefeeds, or get interpreted as whitespace if they stand alone. This behavior means that literal carriage returns in files should be avoided. You can get the older, more compatible (but less generous) behavior by defining the preprocessor symbol PERL_STRICT_CR when building perl. Of course, all this has nothing whatever to do with how escapes like \r are handled within strings.
Note that this doesn't somehow magically allow you to keep all text files in DOS format. The generous treatment only applies to files that perl itself parses. If your C compiler doesn't allow carriage returns in files, you may still be unable to build modules that need a C compiler.
substr(), pos() and vec() no longer leak memory when used in lvalue context. Many small leaks that impacted applications embedding multiple interpreters have been fixed.
The build-time option -DMULTIPLICITY has had many of the details reworked. Some previously global variables that should have been per-interpreter now are. With care, this allows interpreters to call each other. See the PerlInterp extension on CPAN.
See Temporary Values via local() in perlsub.
%! is transparently tied to the Errno module. See perlref.
EXPR foreach EXPR is supported. See perlsyn.
See perlsub.
$^E is meaningful on Win32. See perlvar.
foreach (1..1000000) optimized
foreach (1..1000000) is now optimized into a counting loop. It does not try to allocate a 1000000-element list anymore.
Foo:: can be used as an implicitly quoted package name
Barewords caused unintuitive behavior when a subroutine with the same name as a package happened to be defined: new Foo @args would use the result of a call to Foo() instead of treating Foo as a literal. The recommended way to write barewords in the indirect object slot is new Foo:: @args. Note that the method new() is called with a first argument of Foo, not Foo::, when you do that.
exists $Foo::{Bar::} tests the existence of a package
It was previously impossible to test for the existence of a package without creating it. Now exists $Foo::{Bar::} can be used to test whether the Foo::Bar namespace has been created.
See perllocale.
Perl5 has always had 64-bit support on systems with 64-bit longs. Starting with 5.005, the beginnings of experimental support for systems with 32-bit long and 64-bit 'long long' integers has been added. If you add -DUSE_LONG_LONG to your ccflags in config.sh (or manually define it in perl.h) then perl will be built with 'long long' support. There will be many compiler warnings, and the resultant perl may not work on all systems. There are many other issues related to third-party extensions and libraries. This option exists to allow people to work on those issues.
See prototype.
die() now accepts a reference value, and $@ gets set to that value in exception traps. This makes it possible to propagate exception objects. This is an undocumented experimental feature.
printf format conversions are handled internally. See printf.
INIT keyword
INIT subs are like BEGIN and END, but they get run just before the perl runtime begins execution. For example, the Perl Compiler makes use of INIT blocks to initialize and resolve pointers to XSUBs.
lock keyword
The lock keyword is the fundamental synchronization primitive in threaded perl. When threads are not enabled, it is currently a no-op. To minimize the impact on source compatibility this keyword is "weak", i.e., any user-defined subroutine of the same name overrides it, unless a use Thread has been seen.
qr// operator
The qr// operator, which is syntactically similar to the other quote-like operators, is used to create precompiled regular expressions. This compiled form can now be explicitly passed around in variables and interpolated in other regular expressions. See perlop.
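As a quick illustration (a sketch, not from the original document), a compiled pattern can be stored in a variable and interpolated into a larger pattern:

```perl
# Sketch: store a compiled pattern and interpolate it into another pattern
my $digits = qr/\d+/;
my $pair   = qr/$digits-$digits/;   # build a larger RE from a compiled one
print "matched\n" if "ab 12-34 cd" =~ $pair;
```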
our is now a reserved word
Calling a subroutine with the name our will now provoke a warning when the -w switch is used.
See Tie::Array.
Several missing hooks have been added. There is also a new base class for TIEARRAY implementations. See Tie::Array.
substr() can now both return and replace in one operation. The optional 4th argument is the replacement string. See substr.
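For instance (an illustrative sketch):

```perl
# Sketch: 4-argument substr() returns the old substring and installs the new one
my $s   = "Hello, world";
my $old = substr($s, 7, 5, "Perl!");   # 4th argument is the replacement string
print "$old\n";   # prints "world"
print "$s\n";     # prints "Hello, Perl!"
```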
splice() with a negative LENGTH argument now works the way a negative LENGTH does for substr(). Previously a negative LENGTH was treated as 0. See splice.
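A hedged sketch of the new behavior:

```perl
# Sketch: a negative LENGTH leaves that many trailing elements untouched
my @a = (1, 2, 3, 4, 5);
my @removed = splice(@a, 1, -1);   # remove from index 1, but keep the last element
print "@removed\n";   # prints "2 3 4"
print "@a\n";         # prints "1 5"
```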
When you say something like substr($x, 5) = "hi", the scalar returned by substr() is special, in that any modifications to it affect $x. (This is called a 'magic lvalue' because an 'lvalue' is something on the left side of an assignment.) Normally, this is exactly what you would expect to happen, but Perl uses the same magic if you use substr(), pos(), or vec() in a context where they might be modified, like taking a reference with \ or as an argument to a sub that modifies @_. In previous versions, this 'magic' only went one way, but now changes to the scalar the magic refers to ($x in the above example) affect the magic lvalue too. For instance, this code now acts differently:
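(The code sample was lost in conversion; the following is a reconstructed illustration and may differ from the original example.)

```perl
# Reconstructed illustration of two-way lvalue magic (not the original sample)
my $x   = "hello";
my $ref = \substr($x, 0, 5);   # $$ref is a magic lvalue referring into $x
$x = "g'bye";
print $$ref, "\n";
```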
In previous versions, this would print "hello", but it now prints "g'bye".
If $/ is a reference to an integer, or a scalar that holds an integer, <> will read in records instead of lines. For more info, see $/ in perlvar.
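A short sketch (the filename is a made-up throwaway name):

```perl
# Sketch: read a file in fixed-size 8-byte records via a reference in $/
open my $w, '>', 'records.tmp' or die "open: $!";   # throwaway demo file
print $w 'x' x 20;                                  # 20 bytes of data
close $w;

open my $fh, '<', 'records.tmp' or die "open: $!";
local $/ = \8;                   # record size, as a reference to an integer
my @recs;
while (my $rec = <$fh>) { push @recs, $rec }
close $fh;
unlink 'records.tmp';

print scalar(@recs), " records: ", join(',', map { length } @recs), "\n";
# prints "3 records: 8,8,4"
```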
Configure has many incremental improvements. Site-wide policy for building perl can now be made persistent, via Policy.sh. Configure also records the command-line arguments used in config.sh.
BeOS is now supported. See README.beos.
DOS is now supported under the DJGPP tools. See README.dos (installed as perldos on some systems).
MiNT is now supported. See README.mint.
MPE/iX is now supported. See README.mpeix.
MVS (aka OS390, aka Open Edition) is now supported. See README.os390 (installed as perlos390 on some systems).
Stratus VOS is now supported. See README.vos.
Win32 support has been vastly enhanced, including support for Perl Object, a C++ encapsulation of Perl. GCC and EGCS are now supported on Win32. See README.win32, aka perlwin32.
VMS configuration system has been rewritten. See README.vms (installed as README_vms on some systems).
The hints files for most Unix platforms have seen incremental improvements.
Perl compiler and tools. See B.
A module to pretty print Perl data. See Data::Dumper.
A module to dump perl values to the screen. See Dumpvalue.
A module to look up errors more conveniently. See Errno.
A portable API for file operations.
Query and manage installed modules.
Manipulate .packlist files.
Make functions/builtins succeed or die.
Constants and other support infrastructure for System V IPC operations in perl.
A framework for writing test suites.
Base class for tied arrays.
Base class for tied handles.
Perl thread creation, manipulation, and support.
Set subroutine attributes.
Compile-time class fields.
Various pragmata to control behavior of regular expressions.
You can now run tests for x seconds instead of guessing the right number of tests to run.
Keeps better time.
Carp has a new function cluck(). cluck() warns, like carp(), but also adds a stack backtrace to the error message, like confess().
CGI has been updated to version 2.42.
More Fcntl constants added: F_SETLK64, F_SETLKW64, O_LARGEFILE for large (more than 4G) file access (the 64-bit support is not yet working, though, so no need to get overly excited), Free/Net/OpenBSD locking behaviour flags F_FLOCK, F_POSIX, Linux F_SHLCK, and O_ACCMODE: the mask of O_RDONLY, O_WRONLY, and O_RDWR.
The accessor methods Re, Im, arg, abs, rho, and theta ($z->Re()) can now also act as mutators ($z->Re(3)).
A little bit of radial trigonometry (cylindrical and spherical) has been added, for example the great circle distance.
POSIX now has its own platform-specific hints files.
DB_File supports version 2.x of Berkeley DB. See ext/DB_File/Changes.
MakeMaker now supports writing empty makefiles, provides a way to specify that site umask() policy should be honored. There is also better support for manipulation of .packlist files, and getting information about installed modules.
Extensions that have both architecture-dependent and architecture-independent files are now always installed completely in the architecture-dependent locations. Previously, the shareable parts were shared both across architectures and across perl versions and were therefore liable to be overwritten with newer versions that might have subtle incompatibilities.
See perlmodinstall and CPAN.
Cwd::cwd is faster on most platforms.
h2ph and related utilities have been vastly overhauled.
perlcc, a new experimental frontend for the compiler, is available.
The crude GNU configure emulator is now called configure.gnu to avoid trampling on Configure under case-insensitive filesystems.
perldoc used to be rather slow. The slower features are now optional. In particular, case-insensitive searches need the -i switch, and recursive searches need -r. You can set these switches in the PERLDOC environment variable to get the old behavior.
Config.pm now has a glossary of variables.
Porting/patching.pod has detailed instructions on how to create and submit patches for perl.
perlport specifies guidelines on how to write portably.
perlmodinstall describes how to fetch and install modules from CPAN sites.
Some more Perl traps are documented now. See perltrap.
perlopentut gives a tutorial on using open().
perlreftut gives a tutorial on references.
perlthrtut gives a tutorial on threads.
(W) A subroutine you have declared has the same name as a Perl keyword, and you have used the name without qualification for calling one or the other. Perl decided to call the builtin because the subroutine is not imported.
To force interpretation as a subroutine call, either put an ampersand before the subroutine name, or qualify the name with its package. Alternatively, you can import the subroutine (or pretend that it's imported with the use subs pragma).
To silently interpret it as the Perl operator, use the CORE:: prefix on the operator (e.g. CORE::log($x)) or declare the subroutine to be an object method (see attrs).
(F) The index looked up in the hash found as the 0'th element of a pseudo-hash is not legal. Index values must be 1 or greater. See perlref.
(W) You used a qualified bareword of the form Foo::, but the compiler saw no other uses of that namespace before that point. Perhaps you need to predeclare a package?
(F) You used the syntax of a method call, but the slot filled by the object reference or package name contains an undefined value. Something like this will reproduce the error:
- $BADREF = 42;
- process $BADREF 1,2,3;
- $BADREF->process(1,2,3);
(P) For some reason you can't check the filesystem of the script for nosuid.
(F) You used an array where a hash was expected, but the array has no information on how to map from keys to array indices. You can do that only with arrays that have a hash reference at index 0.
(F) The "goto subroutine" call can't be used to jump out of an eval "string". (You can use it to jump out of an eval {BLOCK}, but you probably don't want to.)
(F) You said something like local $ar->{'key'}, where $ar is a reference to a pseudo-hash. That hasn't been implemented yet, but you can get a similar effect by localizing the corresponding array element directly: local $ar->[$ar->[0]{'key'}].
(F) The first time the %! hash is used, perl automatically loads the Errno.pm module. The Errno module is expected to tie the %! hash to provide symbolic names for $! errno values.
(F) A string of the form CORE::word was given to prototype(), but there is no builtin with the name word.
(W) Within regular expression character classes ([]) the syntax beginning with "[." and ending with ".]" is reserved for future extensions. If you need to represent those character sequences inside a regular expression character class, just quote the square brackets with the backslash: "\[." and ".\]".
(W) Within regular expression character classes ([]) the syntax beginning with "[:" and ending with ":]" is reserved for future extensions. If you need to represent those character sequences inside a regular expression character class, just quote the square brackets with the backslash: "\[:" and ":\]".
(W) Within regular expression character classes ([]) the syntax beginning with "[=" and ending with "=]" is reserved for future extensions. If you need to represent those character sequences inside a regular expression character class, just quote the square brackets with the backslash: "\[=" and "=\]".
(F) Perl detected tainted data when trying to compile a regular expression
that contains the (?{ ... }) zero-width assertion, which is unsafe.
See (?{ code }) in perlre, and perlsec.
(F) A regular expression contained the (?{ ... }) zero-width assertion, but that construct is only allowed when the use re 'eval' pragma is in effect. See (?{ code }) in perlre.
(F) Perl tried to compile a regular expression containing the (?{ ... })
zero-width assertion at run time, as it would when the pattern contains
interpolated values. Since that is a security risk, it is not allowed.
If you insist, you may still do this by explicitly building the pattern
from an interpolated string at run time and using that in an eval().
See (?{ code }) in perlre.
(W) You are blessing a reference to a zero length string. This has the effect of blessing the reference into the package main. This is usually not what you want. Consider providing a default target package, e.g. bless($ref, $p || 'MyPackage');
(W) You may have tried to use a character other than 0 - 9 or A - F in a hexadecimal number. Interpretation of the hexadecimal number stopped before the illegal character.
(F) You tried to access an array as a hash, but the field name used is not defined. The hash at index 0 should map all valid field names to array indices for that to work.
(F) You tried to access a field of a typed variable where the type does not know about the field name. The field names are looked up in the %FIELDS hash in the type package at compile time. The %FIELDS hash is usually set up with the 'fields' pragma.
(F) You can't allocate more than 2^31+"small amount" bytes. This error is most likely to be caused by a typo in the Perl program, e.g. $arr[time] instead of $arr[$time].
(F) One (or both) of the numeric arguments to the range operator ".." are outside the range which can be represented by integers internally. One possible workaround is to force Perl to use magical string increment by prepending "0" to your numbers.
(F) More than 100 levels of inheritance were encountered while invoking a method. Probably indicates an unintended loop in your inheritance hierarchy.
(W) You gave a single reference where Perl was expecting a list with an even number of elements (for assignment to a hash). This usually means that you used the anon hash constructor when you meant to use parens. In any case, a hash requires key/value pairs.
- %hash = { one => 1, two => 2, }; # WRONG
- %hash = [ qw/ an anon array / ]; # WRONG
- %hash = ( one => 1, two => 2, ); # right
- %hash = qw( one 1 two 2 ); # also fine
(W) An undefined value was assigned to a typeglob, a la *foo = undef. This does nothing. It's possible that you really mean undef *foo.
(D) The indicated bareword is a reserved word. Future versions of perl may use it as a keyword, so you're better off either explicitly quoting the word in a manner appropriate for its context of use, or using a different name altogether. The warning can be suppressed for subroutine names by either adding a & prefix, or using a package qualifier, e.g. &our(), or Foo::our().
(S) The whole warning message will look something like:
- perl: warning: Setting locale failed.
- perl: warning: Please check that your locale settings:
- LC_ALL = "En_US",
- LANG = (unset)
- are supported and installed on your system.
- perl: warning: Falling back to the standard locale ("C").
Exactly which locale settings failed varies; in the example above, LC_ALL was set to "En_US" and LANG had no value. This warning means that Perl detected that you and/or your system administrator have set up the so-called locale system, but Perl could not use those settings. Fortunately this is not fatal: there is a "default locale" called "C" that Perl can and will use, and the script will run. Until you fix the problem, however, you will get the same warning each time you run Perl. How to fix the problem is explained in LOCALE PROBLEMS in perllocale.
(F) The mktemp() routine failed for some reason while trying to process a -e switch. Maybe your /tmp partition is full, or clobbered.
Removed because -e doesn't use temporary files any more.
(F) The write routine failed for some reason while trying to process a -e switch. Maybe your /tmp partition is full, or clobbered.
Removed because -e doesn't use temporary files any more.
(F) The create routine failed for some reason while trying to process a -e switch. Maybe your /tmp partition is full, or clobbered.
Removed because -e doesn't use temporary files any more.
(F) The current implementation of regular expressions uses shorts as address offsets within a string. Unfortunately this means that if the regular expression compiles to longer than 32767, it'll blow up. Usually when you want a regular expression this big, there is a better way to do it with multiple statements. See perlre.
You can use "Configure -Uinstallusrbinperl", which causes installperl to skip installing perl also as /usr/bin/perl. This is useful if you prefer not to modify /usr/bin for some reason or another, but potentially harmful because many scripts expect to find Perl at /usr/bin/perl.
If you find what you think is a bug, you might check the headers of recently posted articles in the comp.lang.perl.misc newsgroup. There may also be information at http://www.perl.com/perl/ , the Perl Home Page.
If you believe you have an unreported bug, please run the perlbug program included with your release. Make sure you trim your bug down to a tiny but sufficient test case. Your bug report, along with the output of perl -V, will be sent off to <perlbug@perl.com> to be analysed by the Perl porting team.
The Changes file for exhaustive details on what changed.
The INSTALL file for how to build Perl.
The README file for general stuff.
The Artistic and Copying files for copyright information.
Written by Gurusamy Sarathy <gsar@activestate.com>, with many contributions from The Perl Porters.
Send omissions or corrections to <perlbug@perl.com>.
perl5100delta - what is new for perl 5.10.0
This document describes the differences between the 5.8.8 release and the 5.10.0 release.
Many of the bug fixes in 5.10.0 were already seen in the 5.8.X maintenance releases; they are not duplicated here and are documented in the set of man pages named perl58[1-8]?delta.
feature pragma
The feature pragma is used to enable new syntax that would break Perl's backwards-compatibility with older releases of the language. It's a lexical pragma, like strict or warnings.
Currently the following new features are available: switch (adds a switch statement), say (adds a say built-in function), and state (adds a state keyword for declaring "static" variables). Those features are described in their own sections of this document.
The feature pragma is also implicitly loaded when you require a minimal perl version (with the use VERSION construct) greater than, or equal to, 5.9.5. See feature for details.
-E is equivalent to -e, but it implicitly enables all optional features (like use feature ":5.10").
A new operator // (defined-or) has been implemented.
The following expression:
- $a // $b
is merely equivalent to
- defined $a ? $a : $b
and the statement
- $c //= $d;
can now be used instead of
- $c = $d unless defined $c;
The // operator has the same precedence and associativity as ||. Special care has been taken to ensure that this operator Does What You Mean while not breaking old code, but some edge cases involving the empty regular expression may now parse differently. See perlop for details.
Perl 5 now has a switch statement. It's available when use feature 'switch' is in effect. This feature introduces three new keywords, given, when, and default:
- given ($foo) {
- when (/^abc/) { $abc = 1; }
- when (/^def/) { $def = 1; }
- when (/^xyz/) { $xyz = 1; }
- default { $nothing = 1; }
- }
A more complete description of how Perl matches the switch variable against the when conditions is given in Switch statements in perlsyn.
This kind of match is called smart match, and it's also possible to use it outside of switch statements, via the new ~~ operator. See Smart matching in detail in perlsyn.
This feature was contributed by Robin Houston.
It is now possible to write recursive patterns without using the (??{}) construct. This new way is more efficient, and in many cases easier to read.
Each capturing parenthesis can now be treated as an independent pattern that can be entered by using the (?PARNO) syntax (PARNO standing for "parenthesis number"). For example, the following pattern will match nested balanced angle brackets:
- /
- ^ # start of line
- ( # start capture buffer 1
- < # match an opening angle bracket
- (?: # match one of:
- (?> # don't backtrack over the inside of this group
- [^<>]+ # one or more non angle brackets
- ) # end non backtracking group
- | # ... or ...
- (?1) # recurse to bracket 1 and try it again
- )* # 0 or more times.
- > # match a closing angle bracket
- ) # end capture buffer one
- $ # end of line
- /x
PCRE users should note that Perl's recursive regex feature allows backtracking into a recursed pattern, whereas in PCRE the recursion is atomic or "possessive" in nature. As in the example above, you can add (?>) to control this selectively. (Yves Orton)
It is now possible to name capturing parentheses in a pattern and refer to the captured contents by name. The naming syntax is (?<NAME>....). It's possible to backreference a named buffer with the \k<NAME> syntax. In code, the new magical hashes %+ and %- can be used to access the contents of the capture buffers.
Thus, to replace all doubled chars with a single copy, one could write
- s/(?<letter>.)\k<letter>/$+{letter}/g
Only buffers with defined contents will be "visible" in the %+ hash, so it's possible to do something like:
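(The example that originally followed here did not survive conversion; below is an illustrative reconstruction.)

```perl
# Sketch: only the alternative that actually matched appears in %+
if ('foo123' =~ /(?<word>[a-z]+)|(?<num>[0-9]+)/) {
    print join(',', sort keys %+), "\n";   # prints "word" -- 'num' never matched
    print $+{word}, "\n";                  # prints "foo"
}
```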
The %- hash is a bit more complete, since it will contain array refs holding values from all similarly named capture buffers, if there are several of them.
%+ and %- are implemented as tied hashes through the new module Tie::Hash::NamedCapture.
Users exposed to the .NET regex engine will find that the perl implementation differs in that the numerical ordering of the buffers is sequential, and not "unnamed first, then named". Thus in the pattern
- /(A)(?<B>B)(C)(?<D>D)/
$1 will be 'A', $2 will be 'B', $3 will be 'C' and $4 will be 'D' and not $1 is 'A', $2 is 'C' and $3 is 'B' and $4 is 'D' that a .NET programmer would expect. This is considered a feature. :-) (Yves Orton)
Perl now supports the "possessive quantifier" syntax of the "atomic match" pattern. Basically, a possessive quantifier matches as much as it can and never gives any back; thus it can be used to control backtracking. The syntax is similar to non-greedy matching, except that the modifier is a '+' instead of a '?'. Thus ?+, *+, ++, and {min,max}+ are now legal quantifiers. (Yves Orton)
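A minimal sketch of the difference from a greedy quantifier:

```perl
# Sketch: a possessive quantifier refuses to give back what it matched
my $greedy     = ("aaaa" =~ /a*a/)  ? 1 : 0;   # 1: a* backtracks one 'a'
my $possessive = ("aaaa" =~ /a*+a/) ? 1 : 0;   # 0: a*+ keeps all four 'a's
print "$greedy $possessive\n";   # prints "1 0"
```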
The regex engine now supports a number of special-purpose backtrack control verbs: (*THEN), (*PRUNE), (*MARK), (*SKIP), (*COMMIT), (*FAIL) and (*ACCEPT). See perlre for their descriptions. (Yves Orton)
A new syntax \g{N} or \gN, where "N" is a decimal integer, allows a safer form of back-reference notation as well as allowing relative backreferences. This should make it easier to generate and embed patterns that contain backreferences. See Capture buffers in perlre. (Yves Orton)
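For instance (an illustrative sketch):

```perl
# Sketch: \g{-1} is a backreference relative to the preceding capture group
if ("abba" =~ /(\w)\g{-1}/) {   # finds the doubled letter
    print $1, "\n";             # prints "b"
}
```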
\K escape
The functionality of Jeff Pinyan's module Regexp::Keep has been added to the core. In regular expressions you can now use the special escape \K as a way to do something like a floating-length positive lookbehind. It is also useful in substitutions like:
- s/(foo)bar/$1/g
that can now be converted to
- s/foo\Kbar//g
which is much more efficient. (Yves Orton)
Regular expressions now recognize the \v and \h escapes that match vertical and horizontal whitespace, respectively. \V and \H logically match their complements.
\R matches a generic linebreak, that is, vertical whitespace, plus the multi-character sequence "\x0D\x0A".
say()
say() is a new built-in, only available when use feature 'say' is in effect, that is similar to print(), but implicitly appends a newline to the printed string. See say. (Robin Houston)
$_
The default variable $_ can now be lexicalized, by declaring it like any other lexical variable, with a simple
- my $_;
The operations that default on $_ will use the lexically-scoped version of $_ when it exists, instead of the global $_.
In a map or a grep block, if $_ was previously my'ed, then the $_ inside the block is lexical as well (and scoped to the block).
In a scope where $_ has been lexicalized, you can still access the global version of $_ by using $::_, or, more simply, by overriding the lexical declaration with our $_. (Rafael Garcia-Suarez)
_ prototype
A new prototype character has been added. _ is equivalent to $ but defaults to $_ if the corresponding argument isn't supplied (both $ and _ denote a scalar). Due to the optional nature of the argument, you can only use it at the end of a prototype, or before a semicolon. This has a small incompatible consequence: the prototype() function has been adjusted to return _ for some built-ins in appropriate cases (for example, prototype('CORE::rmdir')). (Rafael Garcia-Suarez)
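A small sketch (the sub name is a made-up example):

```perl
# Sketch: a sub with the _ prototype defaults its argument to $_
sub mylen(_) { return length $_[0] }   # 'mylen' is a hypothetical name

$_ = "hello";
print mylen(), "\n";        # prints "5" -- argument defaulted to $_
print mylen("hi!"), "\n";   # prints "3"
```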
UNITCHECK, a new special code block, has been introduced, in addition to BEGIN, CHECK, INIT and END.
CHECK and INIT blocks, while useful for some specialized purposes, are always executed at the transition between the compilation and the execution of the main program, and are thus useless whenever code is loaded at runtime. On the other hand, UNITCHECK blocks are executed just after the unit which defined them has been compiled. See perlmod for more information. (Alex Gough)
mro pragma
A new pragma, mro (for Method Resolution Order), has been added. It permits switching, on a per-class basis, the algorithm that perl uses to find inherited methods in the case of a multiple inheritance hierarchy. The default MRO hasn't changed (DFS, for Depth First Search). Another MRO is available: the C3 algorithm. See mro for more information. (Brandon Black)
Note that, due to changes in the implementation of class hierarchy search, code that used to undef the *ISA glob will most probably break. Anyway, undef'ing *ISA had the side-effect of removing the magic on the @ISA array and should not have been done in the first place. Also, the cache *::ISA::CACHE:: no longer exists; to force a reset of the @ISA cache, you now need to use the mro API, or more simply to assign to @ISA (e.g. with @ISA = @ISA).
The readdir() function may return a "short filename" when the long filename contains characters outside the ANSI codepage. Similarly Cwd::cwd() may return a short directory name, and glob() may return short names as well. On the NTFS file system these short names can always be represented in the ANSI codepage. This will not be true for all other file system drivers; e.g. the FAT filesystem stores short filenames in the OEM codepage, so some files on FAT volumes remain inaccessible through the ANSI APIs.
Similarly, $^X, @INC, and $ENV{PATH} are preprocessed at startup to make sure all paths are valid in the ANSI codepage (if possible).
The Win32::GetLongPathName() function now returns the UTF-8 encoded correct long file name instead of using replacement characters to force the name into the ANSI codepage. The new Win32::GetANSIPathName() function can be used to turn a long pathname into a short one only if the long one cannot be represented in the ANSI codepage.
Many other functions in the Win32
module have been improved to accept
UTF-8 encoded arguments. Please see Win32 for details.
The built-in function readpipe() is now overridable. Overriding it also
overrides its operator counterpart, qx// (a.k.a. ``).
Moreover, it now defaults to $_ if no argument is provided. (Rafael
Garcia-Suarez)
readline() now defaults to *ARGV
if no argument is provided. (Rafael
Garcia-Suarez)
A new class of variables has been introduced. State variables are similar
to my variables, but are declared with the state keyword in place of
my. They're visible only in their lexical scope, but their value is
persistent: unlike my variables, they're not undefined at scope entry,
but retain their previous value. (Rafael Garcia-Suarez, Nicholas Clark)
To use state variables, one needs to enable them by using
- use feature 'state';
or by using the -E
command-line switch in one-liners.
See Persistent Private Variables in perlsub.
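A short sketch of a persistent counter built on a state variable:

```perl
use strict;
use warnings;
use feature 'state';

sub next_id {
    state $n = 0;    # initialized once; the value persists across calls
    return ++$n;
}

print next_id(), next_id(), next_id(), "\n";   # prints 123
```

A my variable in the same position would be reset to undef on every call; the state variable keeps counting.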
As a new form of syntactic sugar, it's now possible to stack up filetest
operators. You can now write -f -w -x $file
in a row to mean
-x $file && -w _ && -f _
. See -X.
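A small sketch; the stacked form reads right to left and reuses the stat buffer:

```perl
use strict;
use warnings;

# Equivalent to: -r $0 && -f _
if (-f -r $0) {
    print "the running script is a readable plain file\n";
}
```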
The UNIVERSAL
class has a new method, DOES()
. It has been added to
solve semantic problems with the isa()
method. isa()
checks for
inheritance, while DOES()
has been designed to be overridden when
module authors use other types of relations between classes (in addition
to inheritance). (chromatic)
See $obj->DOES( ROLE ) in UNIVERSAL.
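A sketch of overriding DOES() to claim a role that isn't part of the inheritance tree (the class and role names are invented for illustration):

```perl
use strict;
use warnings;

{
    package Logger;                        # hypothetical class, for illustration
    sub new  { bless {}, shift }
    sub DOES {
        my ($self, $role) = @_;
        return 1 if $role eq 'Printable';  # claim a role not expressed via @ISA
        return $self->isa($role);          # otherwise fall back to inheritance
    }
}

my $log = Logger->new;
print $log->DOES('Printable') ? "does the role\n" : "no\n";   # does the role
print $log->isa('Printable')  ? "isa\n" : "not isa\n";        # not isa
```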
Formats were improved in several ways. A new field, ^*, can be used for
variable-width, one-line-at-a-time text. Null characters are now handled
correctly in picture lines. Using @#
and ~~
together will now
produce a compile-time error, as those format fields are incompatible.
perlform has been improved, and miscellaneous bugs fixed.
There are two new byte-order modifiers, > (big-endian) and <
(little-endian), that can be appended to most pack() and unpack() template
characters and groups to force a certain byte-order for that type or group.
See pack and perlpacktut for details.
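A sketch of the two modifiers applied to a 32-bit unsigned integer:

```perl
use strict;
use warnings;

my $n = 0x12345678;
my $big    = pack 'L>', $n;   # 32-bit unsigned, forced big-endian
my $little = pack 'L<', $n;   # the same value, forced little-endian

printf "%v02X\n", $big;       # 12.34.56.78
printf "%v02X\n", $little;    # 78.56.34.12
```

The modifiers make the byte layout explicit regardless of the platform's native order; 'L>' produces the same bytes as the classic 'N' template.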
no VERSION
You can now use no followed by a version number to specify that you
want to use a version of perl older than the specified one.
chdir, chmod and chown on filehandles
chdir, chmod and chown can now work on filehandles as well as
filenames, if the system supports fchdir, fchmod and fchown
respectively, thanks to a patch provided by Gisle Aas.
$( and $) now return groups in the order in which the OS returns them,
thanks to Gisle Aas. This wasn't previously the case.
You can now use recursive subroutines with sort(), thanks to Robin Houston.
The constant folding routine is now wrapped in an exception handler, and if folding throws an exception (such as attempting to evaluate 0/0), perl now retains the current optree, rather than aborting the whole program. Without this change, programs would not compile if they had expressions that happened to generate exceptions, even though those expressions were in code that could never be reached at runtime. (Nicholas Clark, Dave Mitchell)
It's possible to enhance the mechanism of subroutine hooks in @INC by adding a source filter on top of the filehandle opened and returned by the hook. This feature was planned a long time ago, but wasn't quite working until now. See require for details. (Nicholas Clark)
${^RE_DEBUG_FLAGS}
This variable controls what debug flags are in effect for the regular
expression engine when running under use re "debug"
. See re for
details.
${^CHILD_ERROR_NATIVE}
This variable gives the native status returned by the last pipe close, backtick command, successful call to wait() or waitpid(), or from the system() operator. See perlvar for details. (Contributed by Gisle Aas.)
${^RE_TRIE_MAXBUF}
This variable fine-tunes the trie optimisation of literal string
alternations in the regular expression engine (see the optimization
notes below).
${^WIN32_SLOPPY_STAT}
On Windows, setting this variable to a true value makes stat() skip
opening the file, trading accuracy for speed (see the Windows
performance notes below).
unpack() now defaults to unpacking the $_
variable.
mkdir() without arguments now defaults to $_
.
The internal dump output has been improved, so that non-printable characters
such as newline and backspace are output in \x
notation, rather than
octal.
The -C option can no longer be used on the #!
line. It wasn't
working there anyway, since the standard streams are already set up
at this point in the execution of the perl interpreter. You can use
binmode() instead to get the desired behaviour.
The copy of the Unicode Character Database included in Perl 5 has been updated to version 5.0.0.
MAD, which stands for Miscellaneous Attribute Decoration, is a
still-in-development work leading to a Perl 5 to Perl 6 converter. To
enable it, it's necessary to pass the argument -Dmad to Configure. The
resulting perl isn't binary compatible with a regular perl 5.10, and has
space and speed penalties; moreover not all regression tests pass
with it. (Larry Wall, Nicholas Clark)
On Windows platforms, kill(-9, $pid)
now kills a process tree.
(On Unix, this delivers the signal to all processes in the same process
group.)
The semantics of pack() and unpack() regarding UTF-8-encoded data have been
changed. Processing is now by default character by character instead of
byte by byte on the underlying encoding. Notably, code that used things
like pack("a*", $string) to see through the encoding of a string will now
simply get back the original $string. Packed strings can also get upgraded
during processing when you store upgraded characters. You can get the old
behaviour by using use bytes.
To be consistent with pack(), the C0
in unpack() templates indicates
that the data is to be processed in character mode, i.e. character by
character; on the contrary, U0
in unpack() indicates UTF-8 mode, where
the packed string is processed in its UTF-8-encoded Unicode form on a byte
by byte basis. This is reversed with regard to perl 5.8.X, but now consistent
between pack() and unpack().
Moreover, C0
and U0
can also be used in pack() templates to specify
respectively character and byte modes.
C0
and U0
in the middle of a pack or unpack format now switch to the
specified encoding mode, honoring parens grouping. Previously, parens were
ignored.
Also, there is a new pack() character format, W
, which is intended to
replace the old C
. C
is kept for unsigned chars coded as bytes in
the string's internal representation. W
represents unsigned (logical)
character values, which can be greater than 255. It is therefore more
robust when dealing with potentially UTF-8-encoded data (as C
will wrap
values outside the range 0..255, and not respect the string encoding).
In practice, that means that pack formats are now encoding-neutral, except
C
.
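A sketch of the contrast between the two formats:

```perl
use strict;
use warnings;

my $w = pack 'W', 0x263A;            # one logical character, U+263A
printf "W gives U+%04X\n", ord $w;   # W gives U+263A

{
    no warnings 'pack';              # C warns when it wraps the value
    my $c = pack 'C', 0x263A;        # wraps modulo 256 to 0x3A
    printf "C gives U+%04X\n", ord $c;   # C gives U+003A
}
```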
For consistency, A
in unpack() format now trims all Unicode whitespace
from the end of the string. Before perl 5.9.2, it used to strip only the
classical ASCII space characters.
A new unpack() template character, "."
, returns the number of bytes or
characters (depending on the selected encoding mode, see above) read so far.
$* and $# variables have been removed
$*, which was deprecated in favor of the /s and /m regexp
modifiers, has been removed.
The deprecated $# variable (output format for numbers) has been
removed.
Two new severe warnings, $# is no longer supported and $* is no
longer supported, have been added.
The lvalues returned by the three argument form of substr() used to be a "fixed length window" on the original string. In some cases this could cause surprising action at distance or other undefined behaviour. Now the length of the window adjusts itself to the length of the string assigned to it.
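A sketch of the new self-adjusting window behaviour:

```perl
use strict;
use warnings;

my $str = "Hello, world";
my $ref = \substr($str, 0, 5);   # lvalue covering "Hello"

$$ref = "Hi";                    # the window shrinks to the new length
print "$str\n";                  # Hi, world

$$ref = "Greetings";             # and grows again as needed
print "$str\n";                  # Greetings, world
```

Under the old fixed-length semantics, the second assignment would have overwritten five characters of the original string regardless of what the first assignment left behind.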
-f _
The identifier _
is now forced to be a bareword after a filetest
operator. This solves a number of misparsing issues when a global _
subroutine is defined.
:unique
The :unique
attribute has been made a no-op, since its current
implementation was fundamentally flawed and not threadsafe.
The compile-time value of the %^H
hint variable can now propagate into
eval("")uated code. This makes it more useful to implement lexical
pragmas.
As a side-effect of this, the overloaded-ness of constants now propagates into eval("").
A bareword argument to chdir() is now recognized as a file handle. Earlier releases interpreted the bareword as a directory name. (Gisle Aas)
A long-standing feature of perl is that, before require or use look for a
file with a .pm extension, they first look for a similar filename
with a .pmc extension. If this file is found, it will be loaded in
place of any potentially existing file ending in a .pm extension.
Previously, .pmc files were loaded only if more recent than the matching
.pm file. Starting with 5.9.4, they are always loaded if they exist.
$^V is now a version object instead of a v-string
$^V can still be used with the %vd format in printf, but any
character-level operations will now access the string representation
of the version object and not the ordinals of a v-string.
Expressions like substr($^V, 0, 2) or split //, $^V
no longer work and must be rewritten.
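A sketch of replacements for the old character-level idioms (normal() is assumed to be available from the version class backing $^V):

```perl
use strict;
use warnings;

# %vd still renders the version object as dotted integers:
printf "running perl %vd\n", $^V;

# Instead of substr($^V, 0, 2), use the version object's methods:
print 'normal form: ', $^V->normal, "\n";
```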
The special arrays @-
and @+
are no longer interpolated in regular
expressions. (Sadahiro Tomoyuki)
If you call a subroutine by a tainted name, and if it defers to an AUTOLOAD function, then $AUTOLOAD will be (correctly) tainted. (Rick Delaney)
When perl is run under taint mode, printf() and sprintf() will now
reject any tainted format argument. (Rafael Garcia-Suarez)
Undefining or deleting a signal handler via undef $SIG{FOO}
is now
equivalent to setting it to 'DEFAULT'
. (Rafael Garcia-Suarez)
use strict 'refs'
was ignoring taking a hard reference in an argument
to defined(), as in:
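The original verbatim example is not preserved in this copy; a reconstruction of the kind of code affected might look like this sketch (wrapped in eval so it runs to completion):

```perl
use strict;
use warnings;

my $err = do {
    local $@;
    eval q{
        use strict 'refs';
        my $x = "foo";
        defined $$x;     # symbolic dereference hidden inside defined()
    };
    $@;
};
print $err ? "error: $err" : "no error (pre-5.10 behaviour)\n";
```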
This now correctly produces the run-time error Can't use string as a
SCALAR ref while "strict refs" in use.
defined @$foo
and defined %$bar
are now also subject to strict
'refs'
(that is, $foo
and $bar
shall be proper references there.)
(defined(@foo) and defined(%bar) are discouraged constructs anyway.)
(Nicholas Clark)
(?p{}) has been removed
The regular expression construct (?p{}), which was deprecated in perl
5.8, has been removed. Use (??{}) instead. (Rafael Garcia-Suarez)
Support for pseudo-hashes has been removed from Perl 5.9. (The fields
pragma remains here, but uses an alternate implementation.)
perlcc, the byteloader and the supporting modules (B::C, B::CC,
B::Bytecode, etc.) are no longer distributed with the perl sources. These
experimental tools never worked reliably, and, due to the lack of
volunteers to keep them in line with perl interpreter developments, it
was decided to remove them rather than ship broken versions. The last
version of those modules can be found with perl 5.9.4.
However, the B compiler framework remains supported in the perl core, as
are the more useful modules it has made possible (among others, B::Deparse
and B::Concise).
The JPL (Java-Perl Lingo) has been removed from the perl sources tarball.
Perl will now immediately throw an exception if you modify any package's
@ISA
in such a way that it would cause recursive inheritance.
Previously, the exception would not occur until Perl attempted to make
use of the recursive inheritance while resolving a method or doing a
$foo->isa($bar)
lookup.
The behaviour in 5.10.x favors the person using the module; the behaviour in 5.8.x favors the module writer.
Assume the following code:
- main calls Foo::Bar::baz()
- Foo::Bar inherits from Foo::Base
- Foo::Bar::baz() calls Foo::Base::_bazbaz()
- Foo::Base::_bazbaz() calls: warnings::warnif('substr', 'some warning
- message');
On 5.8.x, the code warns when Foo::Bar contains use warnings.
Whether Foo::Base or main have warnings enabled does not matter;
to disable the warning, one has to modify Foo::Bar.
On 5.10.0 and newer, the code warns when main contains use warnings.
Whether Foo::Base or Foo::Bar have warnings enabled does not matter;
to disable the warning, one has to modify main.
Even more core modules are now also available separately through the
CPAN. If you wish to update one of these modules, you don't need to
wait for a new perl release. From within the cpan shell, running the
'r' command will report on modules with upgrades available. See
perldoc CPAN
for more information.
feature
The new pragma feature
is used to enable new features that might break
old code. See The feature pragma above.
mro
This new pragma makes it possible to change the algorithm used to resolve
inherited methods. See New Pragma, mro above.
sort pragma
The sort pragma is now lexically scoped. Its effect used to be global.
bignum
, bigint
, bigrat
The three numeric pragmas bignum
, bigint
and bigrat
are now
lexically scoped. (Tels)
base
The base
pragma now warns if a class tries to inherit from itself.
(Curtis "Ovid" Poe)
strict
and warnings
strict
and warnings
will now complain loudly if they are loaded via
incorrect casing (as in use Strict;
). (Johan Vromans)
version
The version
module provides support for version objects.
warnings
The warnings
pragma doesn't load Carp
anymore. That means that code
that used Carp
routines without having loaded it at compile time might
need to be adjusted; typically, the following (faulty) code won't work
anymore, and will require parentheses to be added after the function name:
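The faulty example is missing from this copy; the pattern in question looks like this sketch, with the fix applied (load Carp explicitly rather than relying on warnings to have pulled it in):

```perl
use strict;
use warnings;
use Carp;   # must now be loaded explicitly; "use warnings" no longer does it

my $err = do { local $@; eval { croak "We're outta here!" }; $@ };
print "croak raised: $err";
```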
less
less
now does something useful (or at least it tries to). In fact, it
has been turned into a lexical pragma. So, in your modules, you can now
test whether your users have requested to use less CPU, or less memory,
less magic, or maybe even less fat. See less for more. (Joshua ben
Jore)
encoding::warnings
, by Audrey Tang, is a module to emit warnings
whenever an ASCII character string containing high-bit bytes is implicitly
converted into UTF-8. It's a lexical pragma since Perl 5.9.4; on older
perls, its effect is global.
Module::CoreList
, by Richard Clamp, is a small handy module that tells
you what versions of core modules ship with any versions of Perl 5. It
comes with a command-line frontend, corelist
.
Math::BigInt::FastCalc
is an XS-enabled, and thus faster, version of
Math::BigInt::Calc
.
Compress::Zlib
is an interface to the zlib compression library. It
comes with a bundled version of zlib, so having a working zlib is not a
prerequisite to install it. It's used by Archive::Tar
(see below).
IO::Zlib
is an IO::
-style interface to Compress::Zlib
.
Archive::Tar
is a module to manipulate tar
archives.
Digest::SHA, a module used to calculate many types of SHA digests,
has been included to provide SHA support in the CPAN module.
ExtUtils::CBuilder
and ExtUtils::ParseXS
have been added.
Hash::Util::FieldHash
, by Anno Siegel, has been added. This module
provides support for field hashes: hashes that maintain an association
of a reference with a value, in a thread-safe garbage-collected way.
Such hashes are useful to implement inside-out objects.
Module::Build
, by Ken Williams, has been added. It's an alternative to
ExtUtils::MakeMaker
to build and install perl modules.
Module::Load
, by Jos Boumans, has been added. It provides a single
interface to load Perl modules and .pl files.
Module::Loaded
, by Jos Boumans, has been added. It's used to mark
modules as loaded or unloaded.
Package::Constants
, by Jos Boumans, has been added. It's a simple
helper to list all constants declared in a given package.
Win32API::File
, by Tye McQueen, has been added (for Windows builds).
This module provides low-level access to Win32 system API calls for
files/dirs.
Locale::Maketext::Simple, needed by CPANPLUS, is a simple wrapper around
Locale::Maketext::Lexicon. Note that Locale::Maketext::Lexicon isn't
included in the perl core; the behaviour of Locale::Maketext::Simple
gracefully degrades when the latter isn't present.
Params::Check
implements a generic input parsing/checking mechanism. It
is used by CPANPLUS.
Term::UI
simplifies the task of asking questions at a terminal prompt.
Object::Accessor
provides an interface to create per-object accessors.
Module::Pluggable
is a simple framework to create modules that accept
pluggable sub-modules.
Module::Load::Conditional
provides simple ways to query and possibly
load installed modules.
Time::Piece
provides an object oriented interface to time functions,
overriding the built-ins localtime() and gmtime().
IPC::Cmd
helps to find and run external commands, possibly
interactively.
File::Fetch
provides a simple generic file fetching mechanism.
Log::Message
and Log::Message::Simple
are used by the log facility
of CPANPLUS
.
Archive::Extract
is a generic archive extraction mechanism
for .tar (plain, gzipped or bzipped) or .zip files.
CPANPLUS
provides an API and a command-line tool to access the CPAN
mirrors.
Pod::Escapes
provides utilities that are useful in decoding Pod
E<...> sequences.
Pod::Simple
is now the backend for several of the Pod-related modules
included with Perl.
Attribute::Handlers
Attribute::Handlers
can now report the caller's file and line number.
(David Feldman)
All interpreted attributes are now passed as array references. (Damian Conway)
B::Lint
B::Lint
is now based on Module::Pluggable
, and so can be extended
with plugins. (Joshua ben Jore)
B
It's now possible to access the lexical pragma hints (%^H
) by using the
method B::COP::hints_hash(). It returns a B::RHE
object, which in turn
can be used to get a hash reference via the method B::RHE::HASH(). (Joshua
ben Jore)
Thread
As the old 5005thread threading model has been removed, in favor of the
ithreads scheme, the Thread
module is now a compatibility wrapper, to
be used in old code only. It has been removed from the default list of
dynamic extensions.
The Perl debugger can now save all debugger commands for sourcing later; notably, it can now emulate stepping backwards, by restarting and rerunning all bar the last command from a saved command history.
It can also display the parent inheritance tree of a given class, with the
i
command.
ptar
is a pure perl implementation of tar
that comes with
Archive::Tar
.
ptardiff
is a small utility used to generate a diff between the contents
of a tar archive and a directory tree. Like ptar
, it comes with
Archive::Tar
.
shasum
is a command-line utility, used to print or to check SHA
digests. It comes with the new Digest::SHA
module.
The corelist
utility is now installed with perl (see New modules
above).
h2ph
and h2xs
have been made more robust with regard to
"modern" C code.
h2xs
implements a new option --use-xsloader
to force use of
XSLoader
even in backwards compatible modules.
The handling of authors' names that had apostrophes has been fixed.
Any enums with negative values are now skipped.
perlivp
no longer checks for *.ph files by default. Use the new -a
option to run all tests.
find2perl
now assumes -print
as a default action. Previously, it
needed to be specified explicitly.
Several bugs have been fixed in find2perl
, regarding -exec
and
-eval
. Also the options -path
, -ipath
and -iname
have been
added.
config_data
is a new utility that comes with Module::Build
. It
provides a command-line interface to the configuration of Perl modules
that use Module::Build's framework of configurability (that is,
*::ConfigData
modules that contain local configuration information for
their parent modules.)
cpanp
, the CPANPLUS shell, has been added. (cpanp-run-perl
, a
helper for CPANPLUS operation, has been added too, but isn't intended for
direct use).
cpan2dist
is a new utility that comes with CPANPLUS. It's a tool to
create distributions (or packages) from CPAN modules.
The output of pod2html
has been enhanced to be more customizable via
CSS. Some formatting problems were also corrected. (Jari Aalto)
The perlpragma manpage documents how to write one's own lexical pragmas in pure Perl (something that is possible starting with 5.9.4).
The new perlglossary manpage is a glossary of terms used in the Perl documentation, technical and otherwise, kindly provided by O'Reilly Media, Inc.
The perlreguts manpage, courtesy of Yves Orton, describes internals of the Perl regular expression engine.
The perlreapi manpage describes the interface to the perl interpreter used to write pluggable regular expression engines (by Ævar Arnfjörð Bjarmason).
The perlunitut manpage is a tutorial for programming with Unicode and string encodings in Perl, courtesy of Juerd Waalboer.
A new manual page, perlunifaq (the Perl Unicode FAQ), has been added (Juerd Waalboer).
The perlcommunity manpage gives a description of the Perl community on the Internet and in real life. (Edgar "Trizor" Bering)
The CORE manual page documents the CORE::
namespace. (Tels)
The long-existing feature of /(?{...})/
regexps setting $_
and pos()
is now documented.
Sorting arrays in place (@a = sort @a
) is now optimized to avoid
making a temporary copy of the array.
Likewise, reverse sort ...
is now optimized to sort in reverse,
avoiding the generation of a temporary intermediate list.
Access to elements of lexical arrays via a numeric constant between 0 and 255 is now faster. (This used to be only the case for global arrays.)
Some pure-perl code that perl was using to retrieve Unicode properties and transliteration mappings has been reimplemented in XS.
The interpreter internals now support a far more memory efficient form of inlineable constants. Storing a reference to a constant value in a symbol table is equivalent to a full typeglob referencing a constant subroutine, but using about 400 bytes less memory. This proxy constant subroutine is automatically upgraded to a real typeglob with subroutine if necessary. The approach taken is analogous to the existing space optimisation for subroutine stub declarations, which are stored as plain scalars in place of the full typeglob.
Several of the core modules have been converted to use this feature for
their system dependent constants - as a result use POSIX;
now takes about
200K less memory.
PERL_DONT_CREATE_GVSV
The new compilation flag PERL_DONT_CREATE_GVSV
, introduced as an option
in perl 5.8.8, is turned on by default in perl 5.9.3. It prevents perl
from creating an empty scalar with every new typeglob. See perl589delta
for details.
Weak reference creation is now O(1) rather than O(n), courtesy of Nicholas Clark. Weak reference deletion remains O(n), but if deletion only happens at program exit, it may be skipped completely.
Salvador Fandiño provided improvements to reduce the memory usage of sort
and to speed up some cases.
Several internal data structures (typeglobs, GVs, CVs, formats) have been restructured to use less memory. (Nicholas Clark)
The UTF-8 caching code is now more efficient, and used more often. (Nicholas Clark)
On Windows, perl's stat() function normally opens the file to determine the link count and update attributes that may have been changed through hard links. Setting ${^WIN32_SLOPPY_STAT} to a true value speeds up stat() by not performing this operation. (Jan Dubois)
The regular expression engine is no longer recursive, meaning that patterns that used to overflow the stack will either die with useful explanations, or run to completion, which, since they were able to blow the stack before, will likely take a very long time to happen. If you were experiencing the occasional stack overflow (or segfault) and upgrade to discover that now perl apparently hangs instead, look for a degenerate regex. (Dave Mitchell)
Classes of a single character are now treated the same as if the character had been used as a literal, meaning that code that uses char-classes as an escaping mechanism will see a speedup. (Yves Orton)
Alternations, where possible, are optimised into more efficient matching structures. String literal alternations are merged into a trie and are matched simultaneously. This means that instead of O(N) time for matching N alternations at a given point, the new code performs in O(1) time. A new special variable, ${^RE_TRIE_MAXBUF}, has been added to fine-tune this optimization. (Yves Orton)
Note: Much code exists that works around perl's historic poor performance on alternations. Often the tricks used to do so will disable the new optimisations. Hopefully the utility modules used for this purpose will be educated about these new optimisations.
When a pattern starts with a trie-able alternation and there aren't better optimisations available, the regex engine will use Aho-Corasick matching to find the start point. (Yves Orton)
-Dusesitecustomize
Run-time customization of @INC can be enabled by passing the
-Dusesitecustomize
flag to Configure. When enabled, this will make perl
run $sitelibexp/sitecustomize.pl before anything else. This script can
then be set up to add additional entries to @INC.
There is now Configure support for creating a relocatable perl tree. If
you Configure with -Duserelocatableinc
, then the paths in @INC (and
everything else in %Config) can be optionally located via the path of the
perl executable.
That means that, if the string ".../"
is found at the start of any
path, it's substituted with the directory of $^X. So, the relocation can
be configured on a per-directory basis, although the default with
-Duserelocatableinc
is that everything is relocated. The initial
install is done to the original configured prefix.
The configuration process now detects whether strlcat() and strlcpy() are available. When they are not available, perl's own version is used (from Russ Allbery's public domain implementation). Various places in the perl interpreter now use them. (Steve Peters)
d_pseudofork
and d_printf_format_null
A new configuration variable, available as $Config{d_pseudofork}
in
the Config module, has been added, to distinguish real fork() support
from fake pseudofork used on Windows platforms.
A new configuration variable, d_printf_format_null
, has been added,
to see if printf-like formats are allowed to be NULL.
Configure -h
has been extended with the most commonly used options.
Parallel makes should work properly now, although there may still be problems
if make test
is instructed to run in parallel.
Building with Borland's compilers on Win32 should work more smoothly. In particular Steve Hay has worked to side step many warnings emitted by their compilers and at least one C compiler internal error.
Perl extensions on Windows now can be statically built into the Perl DLL.
Also, it's now possible to build a perl-static.exe
that doesn't depend
on the Perl DLL on Win32. See the Win32 makefiles for details.
(Vadim Konovalov)
All ppport.h files in the XS modules bundled with perl are now autogenerated at build time. (Marcus Holland-Moritz)
Efforts have been made to make perl and the core XS modules compilable with various C++ compilers (although the situation is not perfect with some of the compilers on some of the platforms tested.)
Support for building perl with Microsoft's 64-bit compiler has been improved. (ActiveState)
Perl can now be compiled with Microsoft Visual C++ 2005 (and 2008 Beta 2).
All win32 builds (MS-Win, WinCE) have been merged and cleaned up.
README files and changelogs for CPAN modules bundled with perl are no longer installed.
Perl has been reported to work on Symbian OS. See perlsymbian for more information.
Many improvements have been made towards making Perl work correctly on z/OS.
Perl has been reported to work on DragonFlyBSD and MidnightBSD.
Perl has also been reported to work on NexentaOS ( http://www.gnusolaris.org/ ).
The VMS port has been improved. See perlvms.
Support for Cray XT4 Catamount/Qk has been added. See hints/catamount.sh in the source code distribution for more information.
Vendor patches have been merged for RedHat and Gentoo.
DynaLoader::dl_unload_file() now works on Windows.
strict
wasn't in effect in regexp-eval blocks (/(?{...})/
).
CORE::require() and CORE::do() were always parsed as require() and do() when they were overridden. This is now fixed.
You can now use a non-arrowed form for chained subscripts after a list slice, as in:
- ({foo => "bar"})[0]{foo}
This used to be a syntax error; a ->
was required.
no warnings 'category'
works correctly with -w
Previously when running with warnings enabled globally via -w
, selective
disabling of specific warning categories would actually turn off all warnings.
This is now fixed; now no warnings 'io';
will only turn off warnings in the
io
class. Previously it would erroneously turn off all warnings.
Several memory leaks in ithreads were closed. Also, ithreads were made less memory-intensive.
threads
is now a dual-life module, also available on CPAN. It has been
expanded in many ways. A kill() method is available for thread signalling.
One can get thread status, or the list of running or joinable threads.
A new threads->exit()
method is used to exit from the application
(this is the default for the main thread) or from the current thread only
(this is the default for all other threads). On the other hand, the exit()
built-in now always causes the whole application to terminate. (Jerry
D. Hedden)
chr() on a negative value now gives \x{FFFD}, the Unicode replacement
character, unless the bytes pragma is in effect, in which case the low
eight bits of the value are used.
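A short sketch of both behaviours:

```perl
use strict;
use warnings;

no warnings 'utf8';            # chr() on a negative value also warns
my $c = chr(-1);
printf "U+%04X\n", ord $c;     # U+FFFD, the replacement character

{
    use bytes;
    my $b = chr(-1);           # under bytes: the low eight bits, 0xFF
    printf "0x%02X\n", ord $b;
}
```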
On Windows, the PERL5SHELL environment variable is now checked for taintedness. (Rafael Garcia-Suarez)
stat() and -X filetests now treat *FILE{IO} filehandles like *FILE
filehandles. (Steve Peters)
Overloading now works when references are reblessed into another class. Internally, this has been implemented by moving the flag for "overloading" from the reference to the referent, which logically is where it should always have been. (Nicholas Clark)
A few bugs related to UTF-8 handling with objects that have stringification overloaded have been fixed. (Nicholas Clark)
Traditionally, eval 'syntax error'
has leaked badly. Many (but not all)
of these leaks have now been eliminated or reduced. (Dave Mitchell)
In previous versions, perl would read the file /dev/urandom if it existed when seeding its random number generator. That file is unlikely to exist on Windows, and if it did would probably not contain appropriate data, so perl no longer tries to read it on Windows. (Alex Davies)
The PERLIO_DEBUG
environment variable no longer has any effect for
setuid scripts and for scripts run with -T.
Moreover, with a thread-enabled perl, using PERLIO_DEBUG
could lead to
an internal buffer overflow. This has been fixed.
PerlIO::scalar will now prevent writing to read-only scalars. Moreover, seek() is now supported with PerlIO::scalar-based filehandles, the underlying string being zero-filled as needed. (Rafael, Jarkko Hietaniemi)
study() never worked for UTF-8 strings, but could lead to false results. It's now a no-op on UTF-8 data. (Yves Orton)
The signals SIGILL, SIGBUS and SIGSEGV are now always delivered in an "unsafe" manner (contrary to other signals, that are deferred until the perl interpreter reaches a reasonably stable state; see Deferred Signals (Safe Signals) in perlipc). (Rafael)
When a module or a file is loaded through an @INC-hook, and when this hook has set a filename entry in %INC, __FILE__ is now set for this module accordingly to the contents of that %INC entry. (Rafael)
-t
switch fix
The -w
and -t
switches can now be used together without messing
up which categories of warnings are activated. (Rafael)
Duping a filehandle which has the :utf8
PerlIO layer set will now
properly carry that layer on the duped filehandle. (Rafael)
Localizing a hash element whose key was given as a variable didn't work
correctly if the variable was changed while the local() was in effect (as
in local $h{$x}; ++$x
). (Bo Lindbergh)
Perl will now try to tell you the name of the variable (if any) that was undefined.
A new deprecation warning, Deprecated use of my() in false conditional, has been added, to warn against the use of the dubious and deprecated construct
- my $x if 0;
A new warning, !=~ should be !~, is emitted to prevent this misspelling of the non-matching operator.
The warning Newline in left-justified string has been removed.
The error Too late for "-T" option has been reformulated to be more descriptive.
This warning is now emitted in more consistent cases; in short, when one of the declarations involved is a my variable. Other cases now give a "our" variable %s redeclared warning instead.
These new warnings are now emitted when a dirhandle is used but is either closed or not really a dirhandle.
Two deprecation warnings have been added: (Rafael)
- Opening dirhandle %s also as a file
- Opening filehandle %s also as a directory
Perl's command-line switch -P is now deprecated.
Perl will warn you against potential backwards compatibility problems with the use VERSION syntax.
perl -V has several improvements, making it more usable from shell scripts to get the value of configuration variables. See perlrun for details.
In general, the source code of perl has been refactored, tidied up, and optimized in many places. Also, memory management and allocation has been improved in several points.
When compiling the perl core with gcc, as many gcc warning flags are turned on as is possible on the platform. (This quest for cleanliness doesn't extend to XS code because we cannot guarantee the tidiness of code we didn't write.) Similar strictness flags have been added or tightened for various other C compilers.
The relative ordering of the constants that define the various types of SV has changed; in particular, SVt_PVGV has been moved before SVt_PVLV, SVt_PVAV, SVt_PVHV and SVt_PVCV. This is unlikely to make any difference unless you have code that explicitly makes assumptions about that ordering. (The inheritance hierarchy of B::* objects has been changed to reflect this.)
Related to this, the internal type SVt_PVBM has been removed. This dedicated type of SV was used by the index operator and parts of the regexp engine to facilitate fast Boyer-Moore matches. Its use internally has been replaced by SVs of type SVt_PVGV.
A new type, SVt_BIND, has been added, in readiness for the project to implement Perl 6 on 5. There is deliberately no implementation yet, and SVs of this type cannot yet be created or destroyed.
The C preprocessor symbols PERL_PM_APIVERSION and PERL_XS_APIVERSION, which were supposed to give the version number of the oldest perl binary-compatible (resp. source-compatible) with the present one, were not used, and sometimes had misleading values. They have been removed.
The BASEOP structure now uses less space. The op_seq field has been removed and replaced by a single-bit field, op_opt. op_type is now 9 bits long. (Consequently, the B::OP class doesn't provide a seq method anymore.)
perl's parser is now generated by bison (it used to be generated by byacc). As a result, it seems to be a bit more robust.
Also, Dave Mitchell improved the lexer debugging output under -DT.
const
Andy Lester supplied many improvements to determine which function parameters and local variables could actually be declared const to the C compiler. Steve Peters provided new *_set macros and reworked the core to use these rather than assigning to macros in LVALUE context.
A new file, mathoms.c, has been added. It contains functions that are no longer used in the perl core, but that remain available for binary or source compatibility reasons. However, those functions will not be compiled in if you add -DNO_MATHOMS to the compiler flags.
AvFLAGS has been removed
The AvFLAGS macro has been removed.
av_* changes
The av_*() functions, used to manipulate arrays, no longer accept null AV* parameters.
The implementation of the special variables $^H and %^H has changed, to allow implementing lexical pragmas in pure Perl.
The inheritance hierarchy of B:: modules has changed; B::NV now inherits from B::SV (it used to inherit from B::IV).
The anonymous hash and array constructors now take 1 op in the optree instead of 3, now that pp_anonhash and pp_anonlist return a reference to a hash or array when the op is flagged with OPf_SPECIAL. (Nicholas Clark)
There's still a remaining problem in the implementation of the lexical $_: it doesn't work inside /(?{...})/ blocks. (See the TODO test in t/op/mydef.t.)
Stacked filetest operators won't work when the filetest pragma is in effect, because they rely on the stat() buffer _ being populated, and filetest bypasses stat().
The handling of Unicode still is unclean in several places, where it's dependent on whether a string is internally flagged as UTF-8. This will be made more consistent in perl 5.12, but that won't be possible without a certain amount of backwards incompatibility.
When compiled with g++ and thread support on Linux, it's reported that $! stops working correctly. This is related to the fact that glibc provides two strerror_r(3) implementations, and perl selects the wrong one.
If you find what you think is a bug, you might check the articles recently posted to the comp.lang.perl.misc newsgroup and the perl bug database at http://rt.perl.org/rt3/ . There may also be information at http://www.perl.org/ , the Perl Home Page.
If you believe you have an unreported bug, please run the perlbug program included with your release. Be sure to trim your bug down to a tiny but sufficient test case. Your bug report, along with the output of perl -V, will be sent off to perlbug@perl.org to be analysed by the Perl porting team.
The Changes file and the perl590delta to perl595delta man pages for exhaustive details on what changed.
The INSTALL file for how to build Perl.
The README file for general stuff.
The Artistic and Copying files for copyright information.
perl5101delta - what is new for perl v5.10.1
This document describes differences between the 5.10.0 release and the 5.10.1 release.
If you are upgrading from an earlier release such as 5.8.8, first read perl5100delta, which describes differences between 5.8.8 and 5.10.0.
The handling of complex expressions by the given/when switch statement has been enhanced. There are two new cases where when now interprets its argument as a boolean, instead of an expression to be used in a smart match:
The .. and ... flip-flop operators are now evaluated in boolean context, following their usual semantics; see Range Operators in perlop. Note that, as in perl 5.10.0, when (1..10) will not work to test whether a given value is an integer between 1 and 10; you should use when ([1..10]) instead (note the array reference).
However, contrary to 5.10.0, evaluating the flip-flop operators in boolean context ensures they can now be useful in a when(), notably for implementing bistable conditions, like in:
- when (/^=begin/ .. /^=end/) {
- # do something
- }
A compound expression involving the defined-or operator, as in when (expr1 // expr2), will be treated as boolean if the first expression is boolean. (This just extends the existing rule that applies to the regular or operator, as in when (expr1 || expr2).)
The next section details further changes to the semantics of the smart match operator, which naturally also modify the behaviour of the switch statements where smart matching is implicitly used.
The smart match operator ~~ is no longer commutative. The behaviour of a smart match now depends primarily on the type of its right hand argument. Moreover, its semantics have been adjusted for greater consistency or usefulness in several cases. While general backwards compatibility is maintained, several changes must be noted:
Code references with an empty prototype are no longer treated specially. They are passed an argument like the other code references (even if they choose to ignore it).
%hash ~~ sub {} and @array ~~ sub {} now test that the subroutine returns a true value for each key of the hash (or element of the array), instead of passing the whole hash or array as a reference to the subroutine.
Due to the commutativity breakage, code references are no longer treated specially when appearing on the left of the ~~ operator, but like any vulgar scalar.
undef ~~ %hash is always false (since undef can't be a key in a hash). No implicit conversion to "" is done (as was the case in perl 5.10.0).
$scalar ~~ @array now always distributes the smart match across the elements of the array. It's true if one element in @array satisfies $scalar ~~ $element. This is a generalization of the old behaviour that tested whether the array contained the scalar.
The full dispatch table for the smart match operator is given in Smart matching in detail in perlsyn.
According to the rule of dispatch based on the rightmost argument type, when an object overloading ~~ appears on the right side of the operator, the overload routine will always be called (with a 3rd argument set to a true value, see overload). However, when the object appears on the left, the overload routine will be called only when the rightmost argument is a simple scalar. This way distributivity of smart match across arrays is not broken, nor are the other behaviours with complex types (coderefs, hashes, regexes). Thus, writers of overloading routines for smart match mostly need to worry only about comparing against a scalar, and possibly about stringification overloading; the other common cases will be handled automatically and consistently.
~~ will now refuse to work on objects that do not overload it (in order to avoid relying on the object's underlying structure). (However, if the object overloads the stringification or numification operators, and if overload fallback is active, it will be used instead, as usual.)
The semantics of use feature :5.10* have changed slightly. See Modules and Pragmata for more information.
It is now a run-time error to use the smart match operator ~~ with an object that has no overload defined for it. (This way ~~ will not break encapsulation by matching against the object's internal representation as a reference.)
The version control system used for the development of the perl interpreter has been switched from Perforce to git. This is mainly an internal issue that only affects people actively working on the perl core; but it may have minor external visibility, for example in some details of the output of perl -V. See perlrepository for more information.
The internal structure of the ext/ directory in the perl source has been reorganised. In general, a module Foo::Bar whose source was stored under ext/Foo/Bar/ is now located under ext/Foo-Bar/. Also, some modules have been moved from lib/ to ext/. This is purely a source tarball change, and should make no difference to the compilation or installation of perl, unless you have a very customised build process that explicitly relies on this structure, or which hard-codes the nonxs_ext Configure parameter. Specifically, this change does not by default alter the location of any files in the final installation.
As part of the Test::Harness 2.x to 3.x upgrade, the experimental Test::Harness::Straps module has been removed. See Updated Modules for more details.
As part of the ExtUtils::MakeMaker upgrade, the ExtUtils::MakeMaker::bytes and ExtUtils::MakeMaker::vmsish modules have been removed from this distribution.
Module::CoreList no longer contains the %Module::CoreList::patchlevel hash.
This one is actually a change introduced in 5.10.0, but it was missed from that release's perldelta, so it is mentioned here instead.
A bugfix related to the handling of the /m modifier and qr resulted in a change of behaviour between 5.8.x and 5.10.0:
- # matches in 5.8.x, doesn't match in 5.10.0
- $re = qr/^bar/; "foo\nbar" =~ /$re/m;
The copy of the Unicode Character Database included in Perl 5.10.1 has been updated to 5.1.0 from 5.0.0. See http://www.unicode.org/versions/Unicode5.1.0/#Notable_Changes for the notable changes.
As of Perl 5.10.1 there is a new interface for plugging and using method resolution orders other than the default (linear depth first search). The C3 method resolution order added in 5.10.0 has been re-implemented as a plugin, without changing its Perl-space interface. See perlmroapi for more information.
overloading pragma
This pragma allows you to lexically disable or enable overloading for some or all operations. (Yuval Kogman)
The core distribution can now run its regression tests in parallel on Unix-like platforms. Instead of running make test, set TEST_JOBS in your environment to the number of tests to run in parallel, and run make test_harness. On a Bourne-like shell, this can be done as
- TEST_JOBS=3 make test_harness # Run 3 tests in parallel
An environment variable is used, rather than parallel make itself, because TAP::Harness needs to be able to schedule individual non-conflicting test scripts itself, and there is no standard interface to make utilities to interact with their job schedulers.
Note that currently some test scripts may fail when run in parallel (most notably ext/IO/t/io_dir.t). If necessary, run just the failing scripts again sequentially and see if the failures go away.
Some support for DTrace has been added. See "DTrace support" in INSTALL.
configure_requires in CPAN module metadata
Both CPAN and CPANPLUS now support the configure_requires keyword in the META.yml metadata file included in most recent CPAN distributions. This allows distribution authors to specify configuration prerequisites that must be installed before running Makefile.PL or Build.PL.
See the documentation for ExtUtils::MakeMaker or Module::Build for more on how to specify configure_requires when creating a distribution for CPAN.
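For illustration, a hypothetical distribution's META.yml might declare a configure-time prerequisite like this (the distribution name and version numbers here are made up):

```yaml
# META.yml (excerpt)
name: Foo-Bar
version: 0.01
configure_requires:
  ExtUtils::MakeMaker: 6.42
```

A CPAN client that understands configure_requires will install ExtUtils::MakeMaker 6.42 or later before it even runs the distribution's Makefile.PL.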
autodie
This is a new lexically-scoped alternative for the Fatal module. The bundled version is 2.06_01. Note that in this release, using a string eval when autodie is in effect can cause the autodie behaviour to leak into the surrounding scope. See BUGS in autodie for more details.
Compress::Raw::Bzip2
This has been added to the core (version 2.020).
parent
This pragma establishes an ISA relationship with base classes at compile time. It provides the key feature of base without the feature creep.
Parse::CPAN::Meta
This has been added to the core (version 1.39).
attributes
Upgraded from version 0.08 to 0.09.
attrs
Upgraded from version 1.02 to 1.03.
base
Upgraded from version 2.13 to 2.14. See parent for a replacement.
bigint
Upgraded from version 0.22 to 0.23.
bignum
Upgraded from version 0.22 to 0.23.
bigrat
Upgraded from version 0.22 to 0.23.
charnames
Upgraded from version 1.06 to 1.07.
The Unicode NameAliases.txt database file has been added. This has the effect of adding some extra \N character names that formerly wouldn't have been recognised; for example, "\N{LATIN CAPITAL LETTER GHA}".
constant
Upgraded from version 1.13 to 1.17.
feature
The meaning of the :5.10 and :5.10.X feature bundles has changed slightly. The last component, if any (i.e. X), is simply ignored. This is predicated on the assumption that new features will not, in general, be added to maintenance releases. So :5.10 and :5.10.X have identical effect. This is a change to the behaviour documented for 5.10.0.
fields
Upgraded from version 2.13 to 2.14 (this was just a version bump; there were no functional changes).
lib
Upgraded from version 0.5565 to 0.62.
open
Upgraded from version 1.06 to 1.07.
overload
Upgraded from version 1.06 to 1.07.
overloading
See The overloading pragma above.
version
Upgraded from version 0.74 to 0.77.
Archive::Extract
Upgraded from version 0.24 to 0.34.
Archive::Tar
Upgraded from version 1.38 to 1.52.
Attribute::Handlers
Upgraded from version 0.79 to 0.85.
AutoLoader
Upgraded from version 5.63 to 5.68.
AutoSplit
Upgraded from version 1.05 to 1.06.
B
Upgraded from version 1.17 to 1.22.
B::Debug
Upgraded from version 1.05 to 1.11.
B::Deparse
Upgraded from version 0.83 to 0.89.
B::Lint
Upgraded from version 1.09 to 1.11.
B::Xref
Upgraded from version 1.01 to 1.02.
Benchmark
Upgraded from version 1.10 to 1.11.
Carp
Upgraded from version 1.08 to 1.11.
CGI
Upgraded from version 3.29 to 3.43. (also includes the "default_value for popup_menu()" fix from 3.45).
Compress::Zlib
Upgraded from version 2.008 to 2.020.
CPAN
Upgraded from version 1.9205 to 1.9402. CPAN::FTP has a local fix to stop it being too verbose on download failure.
CPANPLUS
Upgraded from version 0.84 to 0.88.
CPANPLUS::Dist::Build
Upgraded from version 0.06_02 to 0.36.
Cwd
Upgraded from version 3.25_01 to 3.30.
Data::Dumper
Upgraded from version 2.121_14 to 2.124.
DB
Upgraded from version 1.01 to 1.02.
DB_File
Upgraded from version 1.816_1 to 1.820.
Devel::PPPort
Upgraded from version 3.13 to 3.19.
Digest::MD5
Upgraded from version 2.36_01 to 2.39.
Digest::SHA
Upgraded from version 5.45 to 5.47.
DirHandle
Upgraded from version 1.01 to 1.03.
Dumpvalue
Upgraded from version 1.12 to 1.13.
DynaLoader
Upgraded from version 1.08 to 1.10.
Encode
Upgraded from version 2.23 to 2.35.
Errno
Upgraded from version 1.10 to 1.11.
Exporter
Upgraded from version 5.62 to 5.63.
ExtUtils::CBuilder
Upgraded from version 0.21 to 0.2602.
ExtUtils::Command
Upgraded from version 1.13 to 1.16.
ExtUtils::Constant
Upgraded from 0.20 to 0.22. (Note that neither of these versions are available on CPAN.)
ExtUtils::Embed
Upgraded from version 1.27 to 1.28.
ExtUtils::Install
Upgraded from version 1.44 to 1.54.
ExtUtils::MakeMaker
Upgraded from version 6.42 to 6.55_02.
Note that ExtUtils::MakeMaker::bytes and ExtUtils::MakeMaker::vmsish have been removed from this distribution.
ExtUtils::Manifest
Upgraded from version 1.51_01 to 1.56.
ExtUtils::ParseXS
Upgraded from version 2.18_02 to 2.2002.
Fatal
Upgraded from version 1.05 to 2.06_01. See also the new pragma autodie.
File::Basename
Upgraded from version 2.76 to 2.77.
File::Compare
Upgraded from version 1.1005 to 1.1006.
File::Copy
Upgraded from version 2.11 to 2.14.
File::Fetch
Upgraded from version 0.14 to 0.20.
File::Find
Upgraded from version 1.12 to 1.14.
File::Path
Upgraded from version 2.04 to 2.07_03.
File::Spec
Upgraded from version 3.2501 to 3.30.
File::stat
Upgraded from version 1.00 to 1.01.
File::Temp
Upgraded from version 0.18 to 0.22.
FileCache
Upgraded from version 1.07 to 1.08.
FileHandle
Upgraded from version 2.01 to 2.02.
Filter::Simple
Upgraded from version 0.82 to 0.84.
Filter::Util::Call
Upgraded from version 1.07 to 1.08.
FindBin
Upgraded from version 1.49 to 1.50.
GDBM_File
Upgraded from version 1.08 to 1.09.
Getopt::Long
Upgraded from version 2.37 to 2.38.
Hash::Util::FieldHash
Upgraded from version 1.03 to 1.04. This fixes a memory leak.
I18N::Collate
Upgraded from version 1.00 to 1.01.
IO
Upgraded from version 1.23_01 to 1.25. This makes non-blocking mode work on Windows in IO::Socket::INET [CPAN #43573].
IO::Compress::*
Upgraded from version 2.008 to 2.020.
IO::Dir
Upgraded from version 1.06 to 1.07.
IO::Handle
Upgraded from version 1.27 to 1.28.
IO::Socket
Upgraded from version 1.30_01 to 1.31.
IO::Zlib
Upgraded from version 1.07 to 1.09.
IPC::Cmd
Upgraded from version 0.40_1 to 0.46.
IPC::Open3
Upgraded from version 1.02 to 1.04.
IPC::SysV
Upgraded from version 1.05 to 2.01.
lib
Upgraded from version 0.5565 to 0.62.
List::Util
Upgraded from version 1.19 to 1.21.
Locale::MakeText
Upgraded from version 1.12 to 1.13.
Log::Message
Upgraded from version 0.01 to 0.02.
Math::BigFloat
Upgraded from version 1.59 to 1.60.
Math::BigInt
Upgraded from version 1.88 to 1.89.
Math::BigInt::FastCalc
Upgraded from version 0.16 to 0.19.
Math::BigRat
Upgraded from version 0.21 to 0.22.
Math::Complex
Upgraded from version 1.37 to 1.56.
Math::Trig
Upgraded from version 1.04 to 1.20.
Memoize
Upgraded from version 1.01_02 to 1.01_03 (just a minor documentation change).
Module::Build
Upgraded from version 0.2808_01 to 0.34_02.
Module::CoreList
Upgraded from version 2.13 to 2.18. This release no longer contains the %Module::CoreList::patchlevel hash.
Module::Load
Upgraded from version 0.12 to 0.16.
Module::Load::Conditional
Upgraded from version 0.22 to 0.30.
Module::Loaded
Upgraded from version 0.01 to 0.02.
Module::Pluggable
Upgraded from version 3.6 to 3.9.
NDBM_File
Upgraded from version 1.07 to 1.08.
Net::Ping
Upgraded from version 2.33 to 2.36.
NEXT
Upgraded from version 0.60_01 to 0.64.
Object::Accessor
Upgraded from version 0.32 to 0.34.
OS2::REXX
Upgraded from version 1.03 to 1.04.
Package::Constants
Upgraded from version 0.01 to 0.02.
PerlIO
Upgraded from version 1.04 to 1.06.
PerlIO::via
Upgraded from version 0.04 to 0.07.
Pod::Man
Upgraded from version 2.16 to 2.22.
Pod::Parser
Upgraded from version 1.35 to 1.37.
Pod::Simple
Upgraded from version 3.05 to 3.07.
Pod::Text
Upgraded from version 3.08 to 3.13.
POSIX
Upgraded from version 1.13 to 1.17.
Safe
Upgraded from 2.12 to 2.18.
Scalar::Util
Upgraded from version 1.19 to 1.21.
SelectSaver
Upgraded from 1.01 to 1.02.
SelfLoader
Upgraded from 1.11 to 1.17.
Socket
Upgraded from 1.80 to 1.82.
Storable
Upgraded from 2.18 to 2.20.
Switch
Upgraded from version 2.13 to 2.14. Please see Deprecations.
Symbol
Upgraded from version 1.06 to 1.07.
Sys::Syslog
Upgraded from version 0.22 to 0.27.
Term::ANSIColor
Upgraded from version 1.12 to 2.00.
Term::ReadLine
Upgraded from version 1.03 to 1.04.
Term::UI
Upgraded from version 0.18 to 0.20.
Test::Harness
Upgraded from version 2.64 to 3.17.
Note that one side-effect of the 2.x to 3.x upgrade is that the experimental Test::Harness::Straps module (and its supporting Assert, Iterator, Point and Results modules) have been removed. If you still need this, then they are available in the (unmaintained) Test-Harness-Straps distribution on CPAN.
Test::Simple
Upgraded from version 0.72 to 0.92.
Text::ParseWords
Upgraded from version 3.26 to 3.27.
Text::Tabs
Upgraded from version 2007.1117 to 2009.0305.
Text::Wrap
Upgraded from version 2006.1117 to 2009.0305.
Thread::Queue
Upgraded from version 2.00 to 2.11.
Thread::Semaphore
Upgraded from version 2.01 to 2.09.
threads
Upgraded from version 1.67 to 1.72.
threads::shared
Upgraded from version 1.14 to 1.29.
Tie::RefHash
Upgraded from version 1.37 to 1.38.
Tie::StdHandle
This has documentation changes, and has been assigned a version for the first time: version 4.2.
Time::HiRes
Upgraded from version 1.9711 to 1.9719.
Time::Local
Upgraded from version 1.18 to 1.1901.
Time::Piece
Upgraded from version 1.12 to 1.15.
Unicode::Normalize
Upgraded from version 1.02 to 1.03.
Unicode::UCD
Upgraded from version 0.25 to 0.27.
charinfo() now works on Unified CJK code points added to later versions of Unicode.
casefold() has new fields returned to provide both a simpler interface and previously missing information. The old fields are retained for backwards compatibility. Information about Turkic-specific code points is now returned.
The documentation has been corrected and expanded.
UNIVERSAL
Upgraded from version 1.04 to 1.05.
Win32
Upgraded from version 0.34 to 0.39.
Win32API::File
Upgraded from version 0.1001_01 to 0.1101.
XSLoader
Upgraded from version 0.08 to 0.10.
Now looks in include-fixed too, which is a recent addition to gcc's search path.
No longer incorrectly treats enum values like macros (Daniel Burr).
Now handles C++ style comments (//) properly in enums. (A patch from Rainer Weikusat was used; Daniel Burr also proposed a similar fix.)
LVALUE subroutines now work under the debugger.
The debugger now correctly handles proxy constant subroutines, and subroutine stubs.
Perl 5.10.1 adds a new utility perlthanks, which is a variant of perlbug, but for sending non-bug-reports to the authors and maintainers of Perl. Getting nothing but bug reports can become a bit demoralising: we'll see if this changes things.
This contains instructions on how to build perl for the Haiku platform.
This describes the new interface for pluggable Method Resolution Orders.
This document, by Richard Foley, provides an introduction to performance and optimization techniques, with particular reference to perl programs.
This describes how to access the perl source using the git version control system.
This describes the new perlthanks utility.
The various large Changes* files (which listed every change made to perl over the last 18 years) have been removed, and replaced by a small file, also called Changes, which just explains how that same information may be extracted from the git version control system.
The file Porting/patching.pod has been deleted, as it mainly described interacting with the old Perforce-based repository, which is now obsolete. Information still relevant has been moved to perlrepository.
perlapi, perlintern, perlmodlib and perltoc are now all generated at build time, rather than being shipped as part of the release.
A new internal cache means that isa() will often be faster.
Under use locale, the locale-relevant information is now cached on read-only values, such as the list returned by keys %hash. This makes operations such as sort keys %hash in the scope of use locale much faster.
Empty DESTROY methods are no longer called.
The layout of directories in ext has been revised. Specifically, all extensions are now flat, and at the top level, with / in pathnames replaced by -, so that ext/Data/Dumper/ is now ext/Data-Dumper/, etc. The names of the extensions as specified to Configure, and as reported by %Config::Config under the keys dynamic_ext, known_extensions, nonxs_ext and static_ext have not changed, and still use /. Hence this change will not have any effect once perl is installed. However, Attribute::Handlers, Safe and mro have now become extensions in their own right, so if you run Configure with options to specify an exact list of extensions to build, you will need to change it to account for this.
For 5.10.2, it is planned that many dual-life modules will have been moved from lib to ext; again this will have no effect on an installed perl, but will matter if you invoke Configure with a pre-canned list of extensions to build.
If vendorlib and vendorarch are the same, then they are only added to @INC once.
$Config{usedevel} and the C-level PERL_USE_DEVEL are now defined if perl is built with -Dusedevel.
Configure will enable use of -fstack-protector, to provide protection against stack-smashing attacks, if the compiler supports it.
Configure will now determine the correct prototypes for re-entrant functions, and for gconvert, if you are using a C++ compiler rather than a C compiler.
On Unix, if you build from a tree containing a git repository, the configuration process will note the commit hash you have checked out, for display in the output of perl -v and perl -V. Unpushed local commits are automatically added to the list of local patches displayed by perl -V.
As part of the flattening of ext, all extensions on all platforms are built by make_ext.pl. This replaces the Unix-specific ext/util/make_ext, VMS-specific make_ext.com and Win32-specific win32/buildext.pl.
Removed libbsd for AIX 5L and 6.1. Only flock() was used from libbsd.
Removed libgdbm for AIX 5L and 6.1. libgdbm is delivered as an optional package with the AIX Toolbox; unfortunately, the 64-bit version is broken.
Hints changes mean that AIX 4.2 should work again.
On Cygwin we now strip the last number from the DLL. This has been the behaviour in the cygwin.com build for years. The hints files have been updated.
The hints files now identify the correct threading libraries on FreeBSD 7 and later.
We now work around a bizarre preprocessor bug in the Irix 6.5 compiler: cc -E - unfortunately goes into K&R mode, but cc -E file.c doesn't.
Patches from the Haiku maintainers have been merged in. Perl should now build on Haiku.
Perl should now build on MirOS BSD.
Hints now supports versions 5.*.
Various changes from Stratus have been merged in.
There is now support for Symbian S60 3.2 SDK and S60 5.0 SDK.
Improved message window handling means that alarm and kill messages will no longer be dropped under race conditions.
Reads from the in-memory temporary files of PerlIO::scalar used to fail if $/ was set to a numeric reference (to indicate record-style reads). This is now fixed.
VMS now supports getgrgid.
Many improvements and cleanups have been made to the VMS file name handling and conversion code.
Enabling the PERL_VMS_POSIX_EXIT logical name now encodes a POSIX exit status in a VMS condition value for better interaction with GNV's bash shell and other utilities that depend on POSIX exit values. See $? in perlvms for details.
5.10.0 inadvertently disabled an optimisation, which caused a measurable performance drop in list assignment, such as is often used to assign function parameters from @_. The optimisation has been re-instated, and the performance regression fixed.
Fixed a memory leak on while (1) { map 1, 1 } [RT #53038].
Some potential coredumps in PerlIO fixed [RT #57322,54828].
The debugger now works with lvalue subroutines.
The debugger's m command was broken on modules that defined constants [RT #61222].
crypt() and string complement could return tainted values for untainted arguments [RT #59998].
The -i.suffix command-line switch now recreates the file using restricted permissions, before changing its mode to match the original file. This eliminates a potential race condition [RT #60904].
On some Unix systems, the value in $? would not have the top bit set ($? & 128) even if the child core dumped.
Under some circumstances, $^R could incorrectly become undefined [RT #57042].
(XS) In various hash functions, passing a pre-computed hash when the key is UTF-8 might result in an incorrect lookup.
(XS) Including XSUB.h before perl.h gave a compile-time error [RT #57176].
$object->isa('Foo') would report false if the package Foo didn't exist, even if the object's @ISA contained Foo.
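A minimal sketch: the package named in @ISA need not contain anything for isa() to honour it.

```perl
use strict;
use warnings;

# Bar inherits from Foo, but no 'package Foo' block is ever compiled.
package Bar;
our @ISA = ('Foo');

package main;
my $obj = bless {}, 'Bar';

# This used to report false because package Foo "didn't exist";
# it is now true, as @ISA says it should be.
print $obj->isa('Foo') ? "yes\n" : "no\n";
```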
Various bugs in the new-to-5.10.0 mro code, triggered by manipulating @ISA, have been found and fixed.
Bitwise operations on references could crash the interpreter, e.g. $x=\$y; $x |= "foo" [RT #54956].
Patterns including alternation might be sensitive to the internal UTF-8 representation, e.g.
Within UTF8-encoded Perl source files (i.e. where use utf8 is in effect), double-quoted literal strings could be corrupted where a \xNN, \0NNN or \N{} is followed by a literal character with ordinal value greater than 255 [RT #59908].
B::Deparse failed to correctly deparse various constructs: readpipe STRING [RT #62428], CORE::require(STRING) [RT #62488], sub foo(_) [RT #62484].
Using setpgrp() with no arguments could corrupt the perl stack.
The block form of eval is now specifically trappable by Safe and ops. Previously it was erroneously treated like string eval.
In 5.10.0, the two characters [~ were sometimes parsed as the smart match operator (~~) [RT #63854].
In 5.10.0, the * quantifier in patterns was sometimes treated as {0,32767} [RT #60034, #60464]. For example, this match would fail:
- ("ab" x 32768) =~ /^(ab)*$/
shmget was limited to a 32 bit segment size on a 64 bit OS [RT #63924].
Using next or last to exit a given block no longer produces a spurious warning like the following:
- Exiting given via last at foo.pl line 123
On Windows, '.\foo' and '..\foo' were treated differently than './foo' and '../foo' by do and require [RT #63492].
Assigning a format to a glob could corrupt the format; e.g.:
- *bar=*foo{FORMAT}; # foo format now bad
Attempting to coerce a typeglob to a string or number could cause an
assertion failure. The correct error message is now generated,
Can't coerce GLOB to $type.
Under use filetest 'access'
, -x
was using the wrong access mode. This
has been fixed [RT #49003].
length on a tied scalar that returned a Unicode value would not be
correct the first time. This has been fixed.
Using an array tie inside in array tie could SEGV. This has been
fixed. [RT #51636]
A race condition inside PerlIOStdio_close()
has been identified and
fixed. This used to cause various threading issues, including SEGVs.
In unpack, the use of ()
groups in scalar context was internally
placing a list on the interpreter's stack, which manifested in various
ways, including SEGVs. This is now fixed [RT #50256].
Magic was called twice in substr, \&$x, tie $x, $m and chop. These have all been fixed.
A 5.10.0 optimisation to clear the temporary stack within the implicit
loop of s///ge has been reverted, as it turned out to be the cause of
obscure bugs in seemingly unrelated parts of the interpreter [commit
ef0d4e17921ee3de].
The line numbers for warnings inside elsif are now correct.
The .. operator now works correctly with ranges whose ends are at or close to the values of the smallest and largest integers.
binmode STDIN, ':raw' could lead to segmentation faults on some platforms. This has been fixed [RT #54828].
An off-by-one error meant that index $str, ... was effectively being executed as index "$str\0", .... This has been fixed [RT #53746].
Various leaks associated with named captures in regexes have been fixed [RT #57024].
A weak reference to a hash would leak. This was affecting DBI [RT #56908].
Using (?|) in a regex could cause a segfault [RT #59734].
Use of a UTF-8 tr// within a closure could cause a segfault [RT #61520].
Calling sv_chop() or otherwise upgrading an SV could result in an unaligned 64-bit access on the SPARC architecture [RT #60574].
In the 5.10.0 release, inc_version_list would incorrectly list 5.10.* after 5.8.*; this affected the @INC search order [RT #67628].
In 5.10.0, pack "a*", $tainted_value returned a non-tainted value [RT #52552].
In 5.10.0, printf and sprintf could produce the fatal error panic: utf8_mg_pos_cache_update when printing UTF-8 strings [RT #62666].
In the 5.10.0 release, a dynamically created AUTOLOAD method might be missed (method cache issue) [RT #60220, #60232].
In the 5.10.0 release, a combination of use feature and //ee could cause a memory leak [RT #63110].
-C on the shebang (#!) line is once more permitted if it is also specified on the command line. -C on the shebang line used to be a silent no-op if it was not also on the command line, so perl 5.10.0 disallowed it, which broke some scripts. Now perl checks whether it is also on the command line and only dies if it is not [RT #67880].
In 5.10.0, certain types of re-entrant regular expression could crash, or cause the following assertion failure [RT #60508]:
- Assertion rx->sublen >= (s - rx->subbeg) + i failed
panic: sv_chop %s
This new fatal error occurs when the C routine Perl_sv_chop() was passed a position that is not within the scalar's string buffer. This could be caused by buggy XS code, and at this point recovery is not possible.
Can't locate package %s for the parents of %s
This warning has been removed. In general, it only got produced in conjunction with other warnings, and removing it allowed an ISA lookup optimisation to be added.
v-string in use/require is non-portable
This warning has been removed.
Deep recursion on subroutine "%s"
It is now possible to change the depth threshold for this warning from the default of 100, by recompiling the perl binary, setting the C pre-processor macro PERL_SUB_DEPTH_WARN to the desired value.
The J.R.R. Tolkien quotes at the head of C source files have been checked and proper citations added, thanks to a patch from Tom Christiansen.
vcroak() now accepts a null first argument. In addition, a full audit was made of the "not NULL" compiler annotations, and those for several other internal functions were corrected.
New macros dSAVEDERRNO, dSAVE_ERRNO, SAVE_ERRNO and RESTORE_ERRNO have been added to formalise the temporary saving of the errno variable.
The function Perl_sv_insert_flags has been added to augment Perl_sv_insert.
The function Perl_newSV_type(type) has been added, equivalent to Perl_newSV() followed by Perl_sv_upgrade(type).
The function Perl_newSVpvn_flags() has been added, equivalent to Perl_newSVpvn() and then performing the action relevant to the flag. Two flag bits are currently supported.
SVf_UTF8
This will call SvUTF8_on() for you. (Note that this does not convert a sequence of ISO 8859-1 characters to UTF-8.) A wrapper, newSVpvn_utf8(), is available for this.
SVs_TEMP
Call sv_2mortal() on the new SV.
There is also a wrapper that takes constant strings, newSVpvs_flags().
The function Perl_croak_xs_usage has been added as a wrapper to Perl_croak.
The functions PerlIO_find_layer and PerlIO_list_alloc are now exported.
PL_na has been exterminated from the core code, replaced by local STRLEN temporaries or *_nolen() calls. Either approach is faster than PL_na, which is a pointer dereference into the interpreter structure under ithreads, and a global variable otherwise.
Perl_mg_free()
used to leave freed memory accessible via SvMAGIC() on
the scalar. It now updates the linked list to remove each piece of magic
as it is freed.
Under ithreads, the regex in PL_reg_curpm is now reference counted. This eliminates a lot of hackish workarounds to cope with it not being reference counted.
Perl_mg_magical() would sometimes incorrectly turn on SvRMAGICAL(). This has been fixed.
The public IV and NV flags are now not set if the string value has trailing "garbage". This behaviour is consistent with not setting the public IV or NV flags if the value is out of range for the type.
SV allocation tracing has been added to the diagnostics enabled by -Dm. The tracing can alternatively output via the PERL_MEM_LOG mechanism, if that was enabled when the perl binary was compiled.
Uses of Nullav, Nullcv, Nullhv, Nullop, Nullsv etc. have been replaced by NULL in the core code and non-dual-life modules, as NULL is clearer to those unfamiliar with the core code.
A macro MUTABLE_PTR(p) has been added, which on (non-pedantic) gcc will not cast away const, returning a void *. Macros MUTABLE_SV(av), MUTABLE_SV(cv) etc. build on this, casting to AV * etc. without casting away const. This allows proper compile-time auditing of const correctness in the core, and helped pick up some errors (now fixed).
Macros mPUSHs() and mXPUSHs() have been added, for pushing SVs on the stack and mortalizing them.
Use of the private structure mro_meta has changed slightly. Nothing outside the core should be accessing this directly anyway.
A new tool, Porting/expand-macro.pl, has been added, which allows you to view how a C preprocessor macro would be expanded when compiled. This is handy when trying to decode the macro hell that is the perl guts.
Many modules updated from CPAN incorporate new tests.
Several tests that have the potential to hang forever if they fail now incorporate a "watchdog" functionality that will kill them after a timeout, which helps ensure that make test and make test_harness run to completion automatically. (Jerry Hedden).
Some core-specific tests have been added:
Check that the debugger can retain source lines from eval.
Check that bad layers fail.
Check that PerlIO layers are not leaking.
Check that certain special forms of open work.
General PerlIO tests.
Check that there is no unexpected interaction between the internal types PVBM and PVGV.
Check that mro works properly in the presence of aliased packages.
Tests for the interaction of index and threads.
Tests for the interaction of esoteric patterns and threads.
Test that qr doesn't leak.
Tests for the interaction of regex recursion and threads.
Tests for the interaction of patterns with embedded qr// and threads.
Tests for Unicode properties in regular expressions.
Tests for the interaction of Unicode properties and threads.
Test the tied methods of Tie::Hash::NamedCapture.
Check that POSIX character classes behave consistently.
Check that exportable re functions in universal.c work.
Check that setpgrp works.
Tests for the interaction of substr and threads.
Check that upgrading and assigning scalars works.
Check that Unicode in the lexer works.
Check that Unicode and tie work.
This is a list of some significant unfixed bugs, which are regressions from either 5.10.0 or 5.8.x.
List::Util::first misbehaves in the presence of a lexical $_ (typically introduced by my $_ or implicitly by given). The variable which gets set for each iteration is the package variable $_, not the lexical $_ [RT #67694].
A similar issue may occur in other modules that provide functions which take a block as their first argument, like
- foo { ... $_ ...} list
The charnames pragma may generate a run-time error when a regex is interpolated [RT #56444]. A workaround is to generate the character outside of the regex, assign it to a variable, and interpolate that variable instead.
Some regexes may run much more slowly when run in a child thread compared with the thread the pattern was compiled into [RT #55600].
The following items are now deprecated.
Switch is buggy and should be avoided. From perl 5.11.0 onwards, it is intended that any use of the core version of this module will emit a warning, and that the module will eventually be removed from the core (probably in perl 5.14.0). See Switch statements in perlsyn for its replacement.
suidperl will be removed in 5.12.0. This provides a mechanism to emulate setuid permission bits on systems that don't support it properly.
Some of the work in this release was funded by a TPF grant.
Nicholas Clark officially retired from maintenance pumpking duty at the end of 2008; however in reality he has put much effort in since then to help get 5.10.1 into a fit state to be released, including writing a considerable chunk of this perldelta.
Steffen Mueller and David Golden in particular helped getting CPAN modules polished and synchronised with their in-core equivalents.
Craig Berry was tireless in getting maint to run under VMS, no matter how many times we broke it for him.
The other core committers contributed most of the changes, and applied most of the patches sent in by the hundreds of contributors listed in AUTHORS.
(Sorry to all the people I haven't mentioned by name).
Finally, thanks to Larry Wall, without whom none of this would be necessary.
If you find what you think is a bug, you might check the articles recently posted to the comp.lang.perl.misc newsgroup and the perl bug database at http://rt.perl.org/perlbug/ . There may also be information at http://www.perl.org/ , the Perl Home Page.
If you believe you have an unreported bug, please run the perlbug program included with your release. Be sure to trim your bug down to a tiny but sufficient test case. Your bug report, along with the output of perl -V, will be sent off to perlbug@perl.org to be analysed by the Perl porting team.
If the bug you are reporting has security implications, which make it inappropriate to send to a publicly archived mailing list, then please send it to perl5-security-report@perl.org. This points to a closed subscription unarchived mailing list, which includes all the core committers, who will be able to help assess the impact of issues, figure out a resolution, and help co-ordinate the release of patches to mitigate or fix the problem across all platforms on which Perl is supported. Please only use this address for security issues in the Perl core, not for modules independently distributed on CPAN.
The Changes file for an explanation of how to view exhaustive details on what changed.
The INSTALL file for how to build Perl.
The README file for general stuff.
The Artistic and Copying files for copyright information.
perl5120delta - what is new for perl v5.12.0
This document describes differences between the 5.10.0 release and the 5.12.0 release.
Many of the bug fixes in 5.12.0 are already included in the 5.10.1 maintenance release.
You can see the list of those changes in the 5.10.1 release notes (perl5101delta).
package NAME VERSION syntax
This new syntax allows a module author to set the $VERSION of a namespace when the namespace is declared with 'package'. It eliminates the need for our $VERSION = ... and similar constructs. E.g.
- package Foo::Bar 1.23;
- # $Foo::Bar::VERSION == 1.23
There are several advantages to this:
$VERSION is parsed in exactly the same way as use NAME VERSION
$VERSION is set at compile time
$VERSION is a version object that provides proper overloading of comparison operators, so comparing $VERSION to decimal (1.23) or dotted-decimal (v1.2.3) version numbers works correctly.
Eliminates $VERSION = ... and eval $VERSION clutter
As it requires VERSION to be a numeric literal or v-string literal, it can be statically parsed by toolchain modules without eval the way MM->parse_version does for $VERSION = ...
It does not break old code that uses only package NAME, but code that uses package NAME VERSION will need to be restricted to perl 5.12.0 or newer. This is analogous to the change to open from two-args to three-args. Users requiring the latest Perl will benefit, and perhaps after several years, it will become standard practice.
However, package NAME VERSION requires a new, 'strict' version
number format. See Version number formats for details.
... operator
A new operator, ..., nicknamed the Yada Yada operator, has been added. It is intended to mark placeholder code that is not yet implemented. See Yada Yada Operator in perlop.
Using the use VERSION syntax with a version number greater than or equal to 5.11.0 will lexically enable strictures just like use strict would do (in addition to enabling features). The following:
- use 5.12.0;
means:
- use strict;
- use feature ':5.12';
Perl 5.12 comes with Unicode 5.2, the latest version available to us at the time of release. This version of Unicode was released in October 2009. See http://www.unicode.org/versions/Unicode5.2.0 for further details about what's changed in this version of the standard. See perlunicode for instructions on installing and using other versions of Unicode.
Additionally, Perl's developers have significantly improved Perl's Unicode implementation. For full details, see Unicode overhaul below.
Perl's core time-related functions are now Y2038 compliant. (It may not mean much to you, but your kids will love it!)
It is now possible to overload the qr// operator, that is, conversion to regexp, like it was already possible to overload conversion to boolean, string or number of objects. It is invoked when an object appears on the right hand side of the =~ operator or when it is interpolated into a regexp. See overload.
Extension modules can now cleanly hook into the Perl parser to define new kinds of keyword-headed expression and compound statement. The syntax following the keyword is defined entirely by the extension. This allows a completely non-Perl sublanguage to be parsed inline, with the correct ops cleanly generated.
See PL_keyword_plugin in perlapi for the mechanism. The Perl core source distribution also includes a new module XS::APItest::KeywordRPN, which implements reverse Polish notation arithmetic via pluggable keywords. This module is mainly used for test purposes, and is not normally installed, but also serves as an example of how to use the new mechanism.
Perl's developers consider this feature to be experimental. We may remove it or change it in a backwards-incompatible way in Perl 5.14.
The lowest layers of the lexer and parts of the pad system now have C APIs available to XS extensions. These are necessary to support proper use of pluggable keywords, but have other uses too. The new APIs are experimental, and only cover a small proportion of what would be necessary to take full advantage of the core's facilities in these areas. It is intended that the Perl 5.13 development cycle will see the addition of a full range of clean, supported interfaces.
Perl's developers consider this feature to be experimental. We may remove it or change it in a backwards-incompatible way in Perl 5.14.
Where an extension module hooks the creation of rv2cv ops to modify the subroutine lookup process, this now works correctly for bareword subroutine calls. This means that prototypes on subroutines referenced this way will be processed correctly. (Previously bareword subroutine names were initially looked up, for parsing purposes, by an unhookable mechanism, so extensions could only properly influence subroutine names that appeared with an & sigil.)
As of Perl 5.12.0 there is a new interface for plugging and using method resolution orders other than the default linear depth first search. The C3 method resolution order added in 5.10.0 has been re-implemented as a plugin, without changing its Perl-space interface. See perlmroapi for more information.
\N experimental regex escape
Perl now supports \N, a new regex escape which you can think of as the inverse of \n. It will match any character that is not a newline, independently from the presence or absence of the single line match modifier /s. It is not usable within a character class. \N{3} means to match 3 non-newlines; \N{5,} means to match at least 5.
\N{NAME} still means the character or sequence named NAME, but NAME can no longer be something like 3 or 5,.
This will break a custom charnames translator which allows numbers for character names, as \N{3} will now mean to match 3 non-newline characters, and not the character whose name is 3. (No name defined by the Unicode standard is a number, so only custom translators might be affected.)
Perl's developers are somewhat concerned about possible user confusion with the existing \N{...} construct which matches characters by their Unicode name. Consequently, this feature is experimental. We may remove it or change it in a backwards-incompatible way in Perl 5.14.
Perl now has some support for DTrace. See "DTrace support" in INSTALL.
configure_requires in CPAN module metadata
Both CPAN and CPANPLUS now support the configure_requires keyword in the META.yml metadata file included in most recent CPAN distributions. This allows distribution authors to specify configuration prerequisites that must be installed before running Makefile.PL or Build.PL.
See the documentation for ExtUtils::MakeMaker or Module::Build for more on how to specify configure_requires when creating a distribution for CPAN.
each, keys, values are now more flexible
The each, keys and values functions can now operate on arrays.
when as a statement modifier
when is now allowed to be used as a statement modifier.
$, flexibility
The variable $, may now be tied.
// now behaves like || in when clauses
You can now set -W from the PERL5OPT environment variable.
delete local
delete local now allows you to locally delete a hash entry.
Abstract namespace sockets are a Linux-specific socket type that lives in the AF_UNIX family, slightly abusing it to allow arbitrary character arrays as addresses: they start with a nul byte and are not terminated by a nul byte; instead, their length is the length passed to the socket() system call.
The 32-bit limit on substr arguments has now been removed. The full range of the system's signed and unsigned integers is now available for the pos and len arguments.
Over the years, Perl's developers have deprecated a number of language features for a variety of reasons. Perl now defaults to issuing a warning if a deprecated language feature is used. Many of the deprecations Perl now warns you about have been deprecated for many years. You can find a list of what was deprecated in a given release of Perl in the perl5xxdelta.pod file for that release.
To disable this feature in a given lexical scope, you should use no warnings 'deprecated';. For information about which language features are deprecated and explanations of various deprecation warnings, please see perldiag. See Deprecations below for the list of features and modules Perl's developers have deprecated as part of this release.
Acceptable version number formats have been formalized into "strict" and "lax" rules. package NAME VERSION takes a strict version number. UNIVERSAL::VERSION and the version object constructors take lax version numbers. Providing an invalid version will result in a fatal error. The version argument in use NAME VERSION is first parsed as a numeric literal or v-string and then passed to UNIVERSAL::VERSION (and must then pass the "lax" format test).
These formats are documented fully in the version module. To a first approximation, a "strict" version number is a positive decimal number (integer or decimal-fraction) without exponentiation or else a dotted-decimal v-string with a leading 'v' character and at least three components. A "lax" version number allows v-strings with fewer than three components or without a leading 'v'. Under "lax" rules, both decimal and dotted-decimal versions may have a trailing "alpha" component separated by an underscore character after a fractional or dotted-decimal component.
The version module adds version::is_strict and version::is_lax functions to check a scalar against these rules.
In @INC, ARCHLIB and PRIVLIB now occur after the current version's site_perl and vendor_perl. Modules installed into site_perl and vendor_perl will now be loaded in preference to those installed in ARCHLIB and PRIVLIB.
Internally, Perl now treats compiled regular expressions (such as
those created with qr//) as first class entities. Perl modules which
serialize, deserialize or otherwise have deep interaction with Perl's
internal data structures need to be updated for this change. Most
affected CPAN modules have already been updated as of this writing.
The given/when switch statement handles complex statements better than Perl 5.10.0 did. (These enhancements are also available in 5.10.1 and subsequent 5.10 releases.) There are two new cases where when now interprets its argument as a boolean, instead of an expression to be used in a smart match:
The .. and ... flip-flop operators are now evaluated in boolean context, following their usual semantics; see Range Operators in perlop. Note that, as in perl 5.10.0, when (1..10) will not work to test whether a given value is an integer between 1 and 10; you should use when ([1..10]) instead (note the array reference).
However, contrary to 5.10.0, evaluating the flip-flop operators in boolean context means they can now be useful in a when(), notably for implementing bistable conditions, as in:
- when (/^=begin/ .. /^=end/) {
- # do something
- }
A compound expression involving the defined-or operator, as in when (expr1 // expr2), will be treated as boolean if the first expression is boolean. (This just extends the existing rule that applies to the regular or operator, as in when (expr1 || expr2).)
Since Perl 5.10.0, Perl's developers have made a number of changes to the smart match operator. These, of course, also alter the behaviour of the switch statements where smart matching is implicitly used. These changes were also made for the 5.10.1 release, and will remain in subsequent 5.10 releases.
The smart match operator ~~ is no longer commutative. The behaviour of a smart match now depends primarily on the type of its right hand argument. Moreover, its semantics have been adjusted for greater consistency or usefulness in several cases. While general backwards compatibility is maintained, several changes must be noted:
Code references with an empty prototype are no longer treated specially. They are passed an argument like the other code references (even if they choose to ignore it).
%hash ~~ sub {} and @array ~~ sub {} now test that the subroutine returns a true value for each key of the hash (or element of the array), instead of passing the whole hash or array as a reference to the subroutine.
Due to the commutativity breakage, code references are no longer treated specially when appearing on the left of the ~~ operator, but like any vulgar scalar.
undef ~~ %hash is always false (since undef can't be a key in a hash). No implicit conversion to "" is done (as was the case in perl 5.10.0).
$scalar ~~ @array now always distributes the smart match across the elements of the array. It's true if at least one element of @array satisfies $scalar ~~ $element. This is a generalization of the old behaviour that tested whether the array contained the scalar.
The full dispatch table for the smart match operator is given in Smart matching in detail in perlsyn.
According to the rule of dispatch based on the rightmost argument type, when an object overloading ~~ appears on the right side of the operator, the overload routine will always be called (with a 3rd argument set to a true value, see overload). However, when the object appears on the left, the overload routine will be called only when the rightmost argument is a simple scalar. This way, distributivity of smart match across arrays is not broken, nor are the other behaviours with complex types (coderefs, hashes, regexes). Thus, writers of overloading routines for smart match mostly only need to worry about comparing against a scalar, and possibly about stringification overloading; the other common cases will be handled consistently and automatically.
~~ will now refuse to work on objects that do not overload it (in order to avoid relying on the object's underlying structure). (However, if the object overloads the stringification or the numification operators, and if overload fallback is active, it will be used instead, as usual.)
The definitions of a number of Unicode properties have changed to match those of the current Unicode standard. These are listed above under Unicode overhaul. This change may break code that expects the old definitions.
The boolkeys op has moved to the group of hash ops. This breaks binary compatibility.
Filehandles are now always blessed into IO::File. The previous behaviour was to bless filehandles into FileHandle (an empty proxy class) if it was loaded into memory, and otherwise to bless them into IO::Handle.
The semantics of use feature :5.10* have changed slightly.
See Modules and Pragmata for more information.
Perl's developers now use git, rather than Perforce. This should be a purely internal change only relevant to people actively working on the core. However, you may see minor differences in perl as a consequence of the change, for example in some of the details of the output of perl -V. See perlrepository for more information.
As part of the Test::Harness 2.x to 3.x upgrade, the experimental Test::Harness::Straps module has been removed.
See Modules and Pragmata for more details.
As part of the ExtUtils::MakeMaker upgrade, the ExtUtils::MakeMaker::bytes and ExtUtils::MakeMaker::vmsish modules have been removed from this distribution.
Module::CoreList no longer contains the %:patchlevel hash.
Unsupported private C API functions are now declared "static" to prevent leakage to Perl's public API.
To support the bootstrapping process, miniperl no longer builds with UTF-8 support in the regexp engine.
This allows a build to complete with PERL_UNICODE set and a UTF-8 locale. Without this there's a bootstrapping problem, as miniperl can't load the UTF-8 components of the regexp engine, because they're not yet built.
miniperl's @INC is now restricted to just -I..., the split of $ENV{PERL5LIB}, and ".".
A space or a newline is now required after a "#line XXX" directive.
Tied filehandles now have an additional method EOF which provides the EOF type.
To better match all other flow control statements, foreach may no longer be used as an attribute.
Perl's command-line switch -P, which was deprecated in version 5.10.0, has now been removed. The CPAN module Filter::cpp can be used as an alternative.
From time to time, Perl's developers find it necessary to deprecate features or modules we've previously shipped as part of the core distribution. We are well aware of the pain and frustration that a backwards-incompatible change to Perl can cause for developers building or maintaining software in Perl. You can be sure that when we deprecate a functionality or syntax, it isn't a choice we make lightly. Sometimes, we choose to deprecate functionality or syntax because it was found to be poorly designed or implemented. Sometimes, this is because they're holding back other features or causing performance problems. Sometimes, the reasons are more complex. Wherever possible, we try to keep deprecated functionality available to developers in its previous form for at least one major release. So long as a deprecated feature isn't actively disrupting our ability to maintain and extend Perl, we'll try to leave it in place as long as possible.
The following items are now deprecated:
suidperl is no longer part of Perl. It used to provide a mechanism to emulate setuid permission bits on systems that don't support it properly.
:= to mean an empty attribute list
An accident of Perl's parser meant that these constructions were all equivalent:
- my $pi := 4;
- my $pi : = 4;
- my $pi :  = 4;
with the : being treated as the start of an attribute list, which ends before the =. As whitespace is not significant here, all are parsed as an empty attribute list, hence all the above are equivalent to, and better written as
- my $pi = 4;
because no attribute processing is done for an empty list.
As is, this meant that := cannot be used as a new token without silently changing the meaning of existing code. Hence that particular form is now deprecated, and will become a syntax error. If it is absolutely necessary to have empty attribute lists (for example, because of a code generator) then avoid the warning by adding a space before the =.
UNIVERSAL->import()
The method UNIVERSAL->import() is now deprecated. Attempting to pass import arguments to a use UNIVERSAL statement will result in a deprecation warning.
Using goto to jump from an outer scope into an inner scope is now
deprecated. This rare use case was causing problems in the
implementation of scopes.
In \N{name}, name can be just about anything. The standard Unicode names have a very limited domain, but a custom name translator could create names that are, for example, made up entirely of punctuation symbols. It is now deprecated to create names that don't begin with an alphabetic character, that aren't otherwise alphanumeric, or that contain characters other than a very few others, namely spaces, dashes, parentheses and colons. Because of the added meaning of \N (see \N experimental regex escape), names that look like curly-brace-enclosed quantifiers won't work. For example, \N{3,4} now means to match 3 to 4 non-newlines; before, a custom name 3,4 could have been created.
The following modules will be removed from the core distribution in a future release, and should be installed from CPAN instead. Distributions on CPAN which require these should add them to their prerequisites. The core versions of these modules will issue a deprecation warning.
If you ship a packaged version of Perl, either alone or as part of a larger system, then you should carefully consider the repercussions of core module deprecations. You may want to consider shipping your default build of Perl with packages for some or all deprecated modules which install into vendor or site perl library directories. This will inhibit the deprecation warnings.
Alternatively, you may want to consider patching lib/deprecate.pm to provide deprecation warnings specific to your packaging system or distribution of Perl, consistent with how your packaging system or distribution manages a staged transition from a release where the installation of a single package provides the given functionality, to a later release where the system administrator needs to know to install multiple packages to get that same functionality.
You can silence these deprecation warnings by installing the modules in question from CPAN. To install the latest version of all of them, just install Task::Deprecations::5_12.
Switch is buggy and should be avoided. You may find Perl's new given/when feature a suitable replacement. See Switch statements in perlsyn for more information.
Perl_pmflag is no longer part of Perl's public API. Calling it now generates a deprecation warning, and it will be removed in a future release. Although listed as part of the API, it was never documented, and only ever used in toke.c, and prior to 5.10, regcomp.c. In core, it has been replaced by a static function.
termcap.pl, tainted.pl, stat.pl, shellwords.pl, pwd.pl, open3.pl, open2.pl, newgetopt.pl, look.pl, find.pl, finddepth.pl, importenv.pl, hostname.pl, getopts.pl, getopt.pl, getcwd.pl, flush.pl, fastcwd.pl, exceptions.pl, ctime.pl, complete.pl, cacheout.pl, bigrat.pl, bigint.pl, bigfloat.pl, assert.pl, abbrev.pl, dotsh.pl, and timelocal.pl are all now deprecated. Earlier, Perl's developers intended to remove these libraries from Perl's core for the 5.14.0 release.
During final testing before the release of 5.12.0, several developers discovered current production code using these ancient libraries, some inside the Perl core itself. Accordingly, the pumpking granted them a stay of execution. They will begin to warn about their deprecation in the 5.14.0 release and will be removed in the 5.16.0 release.
Perl's developers have made a concerted effort to update Perl to be in sync with the latest Unicode standard. Changes for this include:
Perl can now handle every Unicode character property. New documentation, perluniprops, lists all available non-Unihan character properties. By default, perl does not expose Unihan, deprecated or Unicode-internal properties. See below for more details on these; there is also a section in the pod listing them, and explaining why they are not exposed.
Perl now fully supports the Unicode compound style of using = and : in writing regular expressions: \p{property=value} and \p{property:value} (both of which mean the same thing).
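For instance (a minimal sketch), both separators select the same set of characters, and the loose matching rules described next apply inside the braces:

```perl
# GREEK SMALL LETTER ALPHA belongs to the Greek script
my $alpha = "\x{03B1}";

my $eq    = $alpha =~ /\p{Script=Greek}/     ? 1 : 0;  # compound form with '='
my $colon = $alpha =~ /\p{Script:Greek}/     ? 1 : 0;  # same thing with ':'
my $loose = $alpha =~ /\p{ script = greek }/ ? 1 : 0;  # loose matching of names

print "$eq $colon $loose\n";
```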
Perl now fully supports the Unicode loose matching rules for text between the braces in \p{...} constructs. In addition, Perl allows underscores between digits of numbers.
Perl now accepts all the Unicode-defined synonyms for properties and property values.
qr/\X/, which matches a Unicode logical character, has been expanded to work better with various Asian languages. It is now defined as an extended grapheme cluster (see http://www.unicode.org/reports/tr29/). Anything that matched previously and made sense will continue to be accepted. Additionally:
- \X will not break apart a CR LF sequence.
- \X will now match a sequence which includes the ZWJ and ZWNJ characters.
- \X will now always match at least one character, including an initial mark. Marks generally come after a base character, but it is possible in Unicode to have them in isolation, and \X will now handle that case, for example at the beginning of a line or after a ZWSP. This is also where \X no longer matches things it used to that don't make sense: formerly, for example, you could have the nonsensical case of an accented LF.
- \X will now match a (Korean) Hangul syllable sequence, and the Thai and Lao exception cases.
Otherwise, this change should be transparent for the non-affected languages.
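A quick sketch of the new behaviour: a base character plus a combining mark, and a CR LF pair, each count as a single \X match:

```perl
# "e" followed by COMBINING ACUTE ACCENT forms one extended grapheme cluster
my @accented = ("e\x{301}" =~ /\X/g);

# CR LF is no longer broken apart by \X
my @crlf = ("\x0D\x0A" =~ /\X/g);

printf "%d %d\n", scalar @accented, scalar @crlf;
```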
\p{...} matches using the Canonical_Combining_Class property were completely broken in previous releases of Perl. They should now work correctly.
Before Perl 5.12, the Unicode Decomposition_Type=Compat property and a Perl extension had the same name, which led to neither matching all the correct values (with more than 100 mistakes in one, and several thousand in the other). The Perl extension has now been renamed Decomposition_Type=Noncanonical (short: dt=noncanon). It has the same meaning as was previously intended, namely the union of all the non-canonical Decomposition types, with Unicode Compat being just one of those.
\p{Decomposition_Type=Canonical} now includes the Hangul syllables.
\p{Uppercase} and \p{Lowercase} now work as the Unicode standard says they should. This means they each match a few more characters than they used to.
\p{Cntrl} now matches the same characters as \p{Control}. This means it no longer will match Private Use (gc=co), Surrogate (gc=cs), or Format (gc=cf) code points. The Format code points represent the biggest possible problem. All but 36 of them are either officially deprecated or strongly discouraged from being used. Of those 36, likely the most widely used are the soft hyphen (U+00AD), and BOM, ZWSP, ZWNJ, WJ, and similar characters, plus bidirectional controls.
\p{Alpha} now matches the same characters as \p{Alphabetic}. Before 5.12, Perl's definition included a number of things that aren't really alpha (all marks) while omitting many that were. The definitions of \p{Alnum} and \p{Word} depend on Alpha's definition and have changed accordingly.
\p{Word} no longer incorrectly matches non-word characters such as fractions.
\p{Print} no longer matches the line control characters: Tab, LF, CR, FF, VT, and NEL. This brings it in line with standards and the documentation.
\p{XDigit} now matches the same characters as \p{Hex_Digit}. This means that in addition to the characters it currently matches, [A-Fa-f0-9], it will also match the 22 fullwidth equivalents, for example U+FF10: FULLWIDTH DIGIT ZERO.
The Numeric type property has been extended to include the Unihan characters.
There is a new Perl extension, the 'Present_In', or simply 'In', property. This is an extension of the Unicode Age property: \p{In=5.0} matches any code point whose usage had been determined as of Unicode version 5.0, whereas \p{Age=5.0} only matches code points added in precisely version 5.0.
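As a small illustration (using the long form Present_In; the short In form is equivalent), the letter "A" has been in Unicode since version 1.1:

```perl
my $in_5  = "A" =~ /\p{Present_In: 5.0}/ ? 1 : 0;  # present as of 5.0
my $age_5 = "A" =~ /\p{Age: 5.0}/        ? 1 : 0;  # added precisely in 5.0?
my $age_1 = "A" =~ /\p{Age: 1.1}/        ? 1 : 0;  # added precisely in 1.1

print "$in_5 $age_5 $age_1\n";
```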
A number of properties now have the correct values for unassigned code points. The affected properties are Bidi_Class, East_Asian_Width, Joining_Type, Decomposition_Type, Hangul_Syllable_Type, Numeric_Type, and Line_Break.
The Default_Ignorable_Code_Point, ID_Continue, and ID_Start properties are now up to date with current Unicode definitions.
Earlier versions of Perl erroneously exposed certain properties that are supposed to be Unicode internal-only. Use of these in regular expressions will now generate, if enabled, a deprecation warning message. The properties are: Other_Alphabetic, Other_Default_Ignorable_Code_Point, Other_Grapheme_Extend, Other_ID_Continue, Other_ID_Start, Other_Lowercase, Other_Math, and Other_Uppercase.
It is now possible to change which Unicode properties Perl understands on a per-installation basis. As mentioned above, certain properties are turned off by default. These include all the Unihan properties (which should be accessible via the CPAN module Unicode::Unihan) and any deprecated or Unicode internal-only property that Perl has never exposed.
The generated files in the lib/unicore/To directory are now more clearly marked as being stable and directly usable by applications. New hash entries in them give the format of the normal entries, which allows for easier machine parsing. Perl can generate files in this directory for any property, though most are suppressed. Instructions for changing which files are written can be found in perluniprops.
autodie
autodie is a new lexically-scoped alternative for the Fatal module. The bundled version is 2.06_01. Note that in this release, using a string eval when autodie is in effect can cause the autodie behaviour to leak into the surrounding scope. See BUGS in autodie for more details.
Version 2.06_01 has been added to the Perl core.
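A minimal sketch of the pragma in action (the file path is hypothetical and assumed not to exist):

```perl
use autodie;    # failing builtins in this lexical scope now throw exceptions

my $err;
eval {
    # no "or die ..." needed: autodie throws on failure
    open my $fh, '<', '/no/such/file/hopefully';
    1;
} or $err = $@;

# $err is an autodie::exception object describing the failed open
print ref($err), "\n" if $err;
```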
Compress::Raw::Bzip2
Version 2.024 has been added to the Perl core.
overloading
overloading allows you to lexically disable or enable overloading for some or all operations.
Version 0.001 has been added to the Perl core.
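A brief sketch (the BigNum package is made up for illustration): overloaded stringification can be switched off for just one lexical scope:

```perl
package BigNum;
use overload '""' => sub { 'BigNum(' . ${ $_[0] } . ')' };
sub new { my ($class, $v) = @_; return bless \$v, $class }

package main;
my $n = BigNum->new(42);

my $with = "$n";             # overloaded: "BigNum(42)"
my $without;
{
    no overloading '""';     # lexically disable stringification overloading
    $without = "$n";         # default "BigNum=SCALAR(0x...)" form
}
print "$with\n";
```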
parent
parent establishes an ISA relationship with base classes at compile time. It provides the key feature of base without further unwanted behaviors.
Version 0.223 has been added to the Perl core.
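A short sketch (the classes are invented for illustration); -norequire is used because the parent class is defined in the same file rather than loaded from disk:

```perl
package Animal;
sub new   { return bless {}, shift }
sub speak { return 'generic noise' }

package Dog;
use parent -norequire, 'Animal';   # sets @Dog::ISA at compile time
sub speak { return 'woof' }

package main;
my $dog = Dog->new;
print $dog->speak, "\n" if $dog->isa('Animal');
```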
Parse::CPAN::Meta
Version 1.40 has been added to the Perl core.
VMS::DCLsym
Version 1.03 has been added to the Perl core.
VMS::Stdio
Version 2.4 has been added to the Perl core.
XS::APItest::KeywordRPN
Version 0.003 has been added to the Perl core.
base
Upgraded from version 2.13 to 2.15.
bignum
Upgraded from version 0.22 to 0.23.
charnames
charnames now contains the Unicode NameAliases.txt database file. This has the effect of adding some extra \N character names that formerly wouldn't have been recognised; for example, "\N{LATIN CAPITAL LETTER GHA}".
Upgraded from version 1.06 to 1.07.
constant
Upgraded from version 1.13 to 1.20.
diagnostics
diagnostics now supports %.0f formatting internally.
diagnostics no longer suppresses Use of uninitialized value in range (or flip) warnings. [perl #71204]
Upgraded from version 1.17 to 1.19.
feature
In feature, the meaning of the :5.10 and :5.10.X feature bundles has changed slightly. The last component, if any (i.e. X), is simply ignored. This is predicated on the assumption that new features will not, in general, be added to maintenance releases. So :5.10 and :5.10.X have identical effect. This is a change to the behaviour documented for 5.10.0.
feature now includes the unicode_strings feature:
- use feature "unicode_strings";
This pragma turns on Unicode semantics for the case-changing operations (uc, lc, ucfirst, lcfirst) on strings that don't have the internal UTF-8 flag set, but that contain single-byte characters between 128 and 255.
Upgraded from version 1.11 to 1.16.
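A sketch of the unicode_strings difference: a single-byte string holding 0xE9 (é in Latin-1), with no internal UTF-8 flag, now uppercases according to Unicode rules:

```perl
use feature 'unicode_strings';

my $s = "\xE9";   # LATIN SMALL LETTER E WITH ACUTE, stored as one byte
my $u = uc $s;    # with unicode_strings, uc applies Unicode semantics

# $u is "\xC9" (LATIN CAPITAL LETTER E WITH ACUTE); without the feature,
# uc would leave a non-UTF-8 0xE9 byte unchanged
printf "%02X\n", ord $u;
```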
less
less now includes the stash_name method to allow subclasses of less to pick where in %^H to store their stash.
Upgraded from version 0.02 to 0.03.
lib
Upgraded from version 0.5565 to 0.62.
mro
mro is now implemented as an XS extension. The documented interface has not changed. Code relying on the implementation detail that some mro:: methods happened to be available at all times gets to "keep both pieces".
Upgraded from version 1.00 to 1.02.
overload
overload now allows overloading of 'qr'.
Upgraded from version 1.06 to 1.10.
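A minimal sketch (the Digits class is invented): an object with a 'qr' overload can be used directly where a pattern is expected:

```perl
package Digits;
use overload 'qr' => sub { qr/\d+/ };  # the object behaves as this pattern
sub new { return bless {}, shift }

package main;
my $pat = Digits->new;
my $hit = ("abc123" =~ $pat) ? 1 : 0;  # the match uses the overloaded qr
print "$hit\n";
```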
threads
Upgraded from version 1.67 to 1.75.
threads::shared
Upgraded from version 1.14 to 1.32.
version
version now has support for version number formats as described earlier in this document and in its own documentation.
Upgraded from version 0.74 to 0.82.
warnings
warnings has a new warnings::fatal_enabled() function. It also includes a new illegalproto warning category. See also New or Changed Diagnostics for this change.
Upgraded from version 1.06 to 1.09.
Archive::Extract
Upgraded from version 0.24 to 0.38.
Archive::Tar
Upgraded from version 1.38 to 1.54.
Attribute::Handlers
Upgraded from version 0.79 to 0.87.
AutoLoader
Upgraded from version 5.63 to 5.70.
B::Concise
Upgraded from version 0.74 to 0.78.
B::Debug
Upgraded from version 1.05 to 1.12.
B::Deparse
Upgraded from version 0.83 to 0.96.
B::Lint
Upgraded from version 1.09 to 1.11_01.
CGI
Upgraded from version 3.29 to 3.48.
Class::ISA
Upgraded from version 0.33 to 0.36.
NOTE: Class::ISA is deprecated and may be removed from a future version of Perl.
Compress::Raw::Zlib
Upgraded from version 2.008 to 2.024.
CPAN
Upgraded from version 1.9205 to 1.94_56.
CPANPLUS
Upgraded from version 0.84 to 0.90.
CPANPLUS::Dist::Build
Upgraded from version 0.06_02 to 0.46.
Data::Dumper
Upgraded from version 2.121_14 to 2.125.
DB_File
Upgraded from version 1.816_1 to 1.820.
Devel::PPPort
Upgraded from version 3.13 to 3.19.
Digest
Upgraded from version 1.15 to 1.16.
Digest::MD5
Upgraded from version 2.36_01 to 2.39.
Digest::SHA
Upgraded from version 5.45 to 5.47.
Encode
Upgraded from version 2.23 to 2.39.
Exporter
Upgraded from version 5.62 to 5.64_01.
ExtUtils::CBuilder
Upgraded from version 0.21 to 0.27.
ExtUtils::Command
Upgraded from version 1.13 to 1.16.
ExtUtils::Constant
Upgraded from version 0.2 to 0.22.
ExtUtils::Install
Upgraded from version 1.44 to 1.55.
ExtUtils::MakeMaker
Upgraded from version 6.42 to 6.56.
ExtUtils::Manifest
Upgraded from version 1.51_01 to 1.57.
ExtUtils::ParseXS
Upgraded from version 2.18_02 to 2.21.
File::Fetch
Upgraded from version 0.14 to 0.24.
File::Path
Upgraded from version 2.04 to 2.08_01.
File::Temp
Upgraded from version 0.18 to 0.22.
Filter::Simple
Upgraded from version 0.82 to 0.84.
Filter::Util::Call
Upgraded from version 1.07 to 1.08.
Getopt::Long
Upgraded from version 2.37 to 2.38.
IO
Upgraded from version 1.23_01 to 1.25_02.
IO::Zlib
Upgraded from version 1.07 to 1.10.
IPC::Cmd
Upgraded from version 0.40_1 to 0.54.
IPC::SysV
Upgraded from version 1.05 to 2.01.
Locale::Maketext
Upgraded from version 1.12 to 1.14.
Locale::Maketext::Simple
Upgraded from version 0.18 to 0.21.
Log::Message
Upgraded from version 0.01 to 0.02.
Log::Message::Simple
Upgraded from version 0.04 to 0.06.
Math::BigInt
Upgraded from version 1.88 to 1.89_01.
Math::BigInt::FastCalc
Upgraded from version 0.16 to 0.19.
Math::BigRat
Upgraded from version 0.21 to 0.24.
Math::Complex
Upgraded from version 1.37 to 1.56.
Memoize
Upgraded from version 1.01_02 to 1.01_03.
MIME::Base64
Upgraded from version 3.07_01 to 3.08.
Module::Build
Upgraded from version 0.2808_01 to 0.3603.
Module::CoreList
Upgraded from version 2.12 to 2.29.
Module::Load
Upgraded from version 0.12 to 0.16.
Module::Load::Conditional
Upgraded from version 0.22 to 0.34.
Module::Loaded
Upgraded from version 0.01 to 0.06.
Module::Pluggable
Upgraded from version 3.6 to 3.9.
Net::Ping
Upgraded from version 2.33 to 2.36.
NEXT
Upgraded from version 0.60_01 to 0.64.
Object::Accessor
Upgraded from version 0.32 to 0.36.
Package::Constants
Upgraded from version 0.01 to 0.02.
PerlIO
Upgraded from version 1.04 to 1.06.
Pod::Parser
Upgraded from version 1.35 to 1.37.
Pod::Perldoc
Upgraded from version 3.14_02 to 3.15_02.
Pod::Plainer
Upgraded from version 0.01 to 1.02.
NOTE: Pod::Plainer is deprecated and may be removed from a future version of Perl.
Pod::Simple
Upgraded from version 3.05 to 3.13.
Safe
Upgraded from version 2.12 to 2.22.
SelfLoader
Upgraded from version 1.11 to 1.17.
Storable
Upgraded from version 2.18 to 2.22.
Switch
Upgraded from version 2.13 to 2.16.
NOTE: Switch is deprecated and may be removed from a future version of Perl.
Sys::Syslog
Upgraded from version 0.22 to 0.27.
Term::ANSIColor
Upgraded from version 1.12 to 2.02.
Term::UI
Upgraded from version 0.18 to 0.20.
Test
Upgraded from version 1.25 to 1.25_02.
Test::Harness
Upgraded from version 2.64 to 3.17.
Test::Simple
Upgraded from version 0.72 to 0.94.
Text::Balanced
Upgraded from version 2.0.0 to 2.02.
Text::ParseWords
Upgraded from version 3.26 to 3.27.
Text::Soundex
Upgraded from version 3.03 to 3.03_01.
Thread::Queue
Upgraded from version 2.00 to 2.11.
Thread::Semaphore
Upgraded from version 2.01 to 2.09.
Tie::RefHash
Upgraded from version 1.37 to 1.38.
Time::HiRes
Upgraded from version 1.9711 to 1.9719.
Time::Local
Upgraded from version 1.18 to 1.1901_01.
Time::Piece
Upgraded from version 1.12 to 1.15.
Unicode::Collate
Upgraded from version 0.52 to 0.52_01.
Unicode::Normalize
Upgraded from version 1.02 to 1.03.
Win32
Upgraded from version 0.34 to 0.39.
Win32API::File
Upgraded from version 0.1001_01 to 0.1101.
XSLoader
Upgraded from version 0.08 to 0.10.
attrs
Removed from the Perl core. Prior version was 1.02.
CPAN::API::HOWTO
Removed from the Perl core. Prior version was 'undef'.
CPAN::DeferedCode
Removed from the Perl core. Prior version was 5.50.
CPANPLUS::inc
Removed from the Perl core. Prior version was 'undef'.
DCLsym
Removed from the Perl core. Prior version was 1.03.
ExtUtils::MakeMaker::bytes
Removed from the Perl core. Prior version was 6.42.
ExtUtils::MakeMaker::vmsish
Removed from the Perl core. Prior version was 6.42.
Stdio
Removed from the Perl core. Prior version was 2.3.
Test::Harness::Assert
Removed from the Perl core. Prior version was 0.02.
Test::Harness::Iterator
Removed from the Perl core. Prior version was 0.02.
Test::Harness::Point
Removed from the Perl core. Prior version was 0.01.
Test::Harness::Results
Removed from the Perl core. Prior version was 0.01.
Test::Harness::Straps
Removed from the Perl core. Prior version was 0.26_01.
Test::Harness::Util
Removed from the Perl core. Prior version was 0.01.
XSSymSet
Removed from the Perl core. Prior version was 1.1.
See Deprecated Modules above.
perlhaiku contains instructions on how to build perl for the Haiku platform.
perlmroapi describes the new interface for pluggable Method Resolution Orders.
perlperf, by Richard Foley, provides an introduction to the use of performance and optimization techniques which can be used with particular reference to perl programs.
perlrepository describes how to access the perl source using the git version control system.
perlpolicy extends the "Social contract about contributed modules" into the beginnings of a document on Perl porting policies.
The various large Changes* files (which listed every change made to perl over the last 18 years) have been removed, and replaced by a small file, also called Changes, which just explains how that same information may be extracted from the git version control system.
Porting/patching.pod has been deleted, as it mainly described interacting with the old Perforce-based repository, which is now obsolete. Information still relevant has been moved to perlrepository.
The syntax unless (EXPR) BLOCK else BLOCK is now documented as valid, as is the syntax unless (EXPR) BLOCK elsif (EXPR) BLOCK ... else BLOCK, although actually using the latter may not be the best idea for the readability of your source code.
Documented -X overloading.
Documented that when() treats most of the filetest operators specially.
Documented when as a syntax modifier.
Eliminated "Old Perl threads tutorial", which described 5005 threads.
pod/perlthrtut.pod is the same material reworked for ithreads.
Corrected previous documentation: v-strings are not deprecated. With version objects, they are needed for the use MODULE VERSION syntax, so the deprecation notice has been removed.
Security contact information is now part of perlsec.
A significant fraction of the core documentation has been updated to clarify the behavior of Perl's Unicode handling.
Much of the remaining core documentation has been reviewed and edited for clarity, consistent use of language, and to fix the spelling of Tom Christiansen's name.
The Pod specification (perlpodspec) has been updated to bring the specification in line with modern usage already supported by most Pod systems. A parameter string may now follow the format name in a "begin/end" region. Links to URIs with a text description are now allowed. The usage of L<"section"> has been marked as deprecated.
if.pm has been documented as a means to conditionally load modules despite the implicit BEGIN block around use.
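A small sketch of the idiom:

```perl
# Load feature with the :5.10 bundle only on perl 5.10 or later;
# the condition is evaluated inside use's implicit BEGIN block.
use if $] >= 5.010, 'feature', ':5.10';

# On any perl satisfying the condition, feature.pm was loaded at compile
# time and the bundle's keywords (say, state, ...) are enabled lexically.
say 'feature bundle loaded';
```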
The documentation for $1 in perlvar.pod has been clarified.
\N{U+code point} is now documented.
A new internal cache means that isa() will often be faster.
The implementation of C3 Method Resolution Order has been optimised: linearisation for classes with single inheritance is 40% faster. Performance for multiple inheritance is unchanged.
Under use locale, the locale-relevant information is now cached on read-only values, such as the list returned by keys %hash. This makes operations such as sort keys %hash in the scope of use locale much faster.
Empty DESTROY methods are no longer called.
Perl_sv_utf8_upgrade() is now faster.
keys on an empty hash is now faster.
if (%foo) has been optimized to be faster than if (keys %foo).
The string repetition operator ($str x $num) is now several times faster when $str has length one or $num is large.
Reversing an array to itself (as in @a = reverse @a) in void context now happens in-place and is several orders of magnitude faster than it used to be. It will also preserve non-existent elements whenever possible, i.e. for non-magical arrays or tied arrays with EXISTS and DELETE methods.
perlapi, perlintern, perlmodlib and perltoc are now all generated at build time, rather than being shipped as part of the release.
If vendorlib and vendorarch are the same, then they are only added to @INC once.
$Config{usedevel} and the C-level PERL_USE_DEVEL are now defined if perl is built with -Dusedevel.
Configure will enable use of -fstack-protector, to provide protection against stack-smashing attacks, if the compiler supports it.
Configure will now determine the correct prototypes for re-entrant functions and for gconvert if you are using a C++ compiler rather than a C compiler.
On Unix, if you build from a tree containing a git repository, the configuration process will note the commit hash you have checked out, for display in the output of perl -v and perl -V. Unpushed local commits are automatically added to the list of local patches displayed by perl -V.
Perl now supports SystemTap's dtrace compatibility layer, and an issue with linking miniperl has been fixed in the process.
perldoc now uses less -R instead of less for improved behaviour in the face of groff's new usage of ANSI escape codes.
perl -V now reports use of the compile-time options USE_PERL_ATOF and USE_ATTRIBUTES_FOR_PERLIO.
As part of the flattening of ext, all extensions on all platforms are built by make_ext.pl. This replaces the Unix-specific ext/util/make_ext, VMS-specific make_ext.com and Win32-specific win32/buildext.pl.
Each release of Perl sees numerous internal changes which shouldn't affect day to day usage but may still be notable for developers working with Perl's source code.
The J.R.R. Tolkien quotes at the head of C source file have been checked and proper citations added, thanks to a patch from Tom Christiansen.
The internal structure of the dual-life modules traditionally found in the lib/ and ext/ directories in the perl source has changed significantly. Where possible, dual-lifed modules have been extracted from lib/ and ext/.
Dual-lifed modules maintained by Perl's developers as part of the Perl core now live in dist/. Dual-lifed modules maintained primarily on CPAN now live in cpan/. When reporting a bug in a module located under cpan/, please send your bug report directly to the module's bug tracker or author, rather than Perl's bug tracker.
\N{...} now compiles better, and always forces UTF-8 internal representation.
Perl's developers have fixed several problems with the recognition of \N{...} constructs. As part of this, perl will store any scalar or regex containing \N{name} or \N{U+code point} in its definition in UTF-8 format. (This was true previously for all occurrences of \N{name} that did not use a custom translator, but now it's always true.)
Perl_magic_setmglob now knows about globs, fixing RT #71254.
SVt_RV no longer exists. RVs are now stored in IVs.
Perl_vcroak() now accepts a null first argument. In addition, a full audit was made of the "not NULL" compiler annotations, and those for several other internal functions were corrected.
New macros dSAVEDERRNO, dSAVE_ERRNO, SAVE_ERRNO, and RESTORE_ERRNO have been added to formalise the temporary saving of the errno variable.
The function Perl_sv_insert_flags has been added to augment Perl_sv_insert.
The function Perl_newSV_type(type) has been added, equivalent to Perl_newSV() followed by Perl_sv_upgrade(type).
The function Perl_newSVpvn_flags() has been added, equivalent to Perl_newSVpvn() and then performing the action relevant to the flag. Two flag bits are currently supported.
- SVf_UTF8 will call SvUTF8_on() for you. (Note that this does not convert a sequence of ISO 8859-1 characters to UTF-8). A wrapper, newSVpvn_utf8(), is available for this.
- SVs_TEMP calls Perl_sv_2mortal() on the new SV.
There is also a wrapper that takes constant strings, newSVpvs_flags().
The function Perl_croak_xs_usage has been added as a wrapper to Perl_croak.
Perl now exports the functions PerlIO_find_layer and PerlIO_list_alloc.
PL_na has been exterminated from the core code, replaced by local STRLEN temporaries or *_nolen() calls. Either approach is faster than PL_na, which is a pointer dereference into the interpreter structure under ithreads, and a global variable otherwise.
Perl_mg_free() used to leave freed memory accessible via SvMAGIC() on the scalar. It now updates the linked list to remove each piece of magic as it is freed.
Under ithreads, the regex in PL_reg_curpm is now reference counted. This eliminates a lot of hackish workarounds to cope with it not being reference counted.
Perl_mg_magical() would sometimes incorrectly turn on SvRMAGICAL(). This has been fixed.
The public IV and NV flags are now not set if the string value has trailing "garbage". This behaviour is consistent with not setting the public IV or NV flags if the value is out of range for the type.
Uses of Nullav, Nullcv, Nullhv, Nullop, Nullsv etc. have been replaced by NULL in the core code and non-dual-life modules, as NULL is clearer to those unfamiliar with the core code.
A macro MUTABLE_PTR(p) has been added, which on (non-pedantic) gcc will not cast away const, returning a void *. Macros MUTABLE_SV(av), MUTABLE_SV(cv) etc. build on this, casting to AV * etc. without casting away const. This allows proper compile-time auditing of const correctness in the core, and helped pick up some errors (now fixed).
Macros mPUSHs() and mXPUSHs() have been added, for pushing SVs on the stack and mortalizing them.
Use of the private structure mro_meta has changed slightly. Nothing outside the core should be accessing this directly anyway.
A new tool, Porting/expand-macro.pl has been added, that allows you to view how a C preprocessor macro would be expanded when compiled. This is handy when trying to decode the macro hell that is the perl guts.
The core distribution can now run its regression tests in parallel on Unix-like platforms. Instead of running make test, set TEST_JOBS in your environment to the number of tests to run in parallel, and run make test_harness. On a Bourne-like shell, this can be done as
- TEST_JOBS=3 make test_harness # Run 3 tests in parallel
An environment variable is used, rather than parallel make itself, because TAP::Harness needs to be able to schedule individual non-conflicting test scripts itself, and there is no standard interface to make utilities to interact with their job schedulers.
Note that currently some test scripts may fail when run in parallel (most notably ext/IO/t/io_dir.t). If necessary, run just the failing scripts again sequentially and see if the failures go away.
It's now possible to override PERL5OPT and friends in t/TEST.
Several tests that have the potential to hang forever if they fail now incorporate a "watchdog" functionality that will kill them after a timeout, which helps ensure that make test and make test_harness run to completion automatically.
Perl's developers have added a number of new tests to the core. In addition to the items listed below, many modules updated from CPAN incorporate new tests.
Significant cleanups to core tests to ensure that language and interpreter features are not used before they're tested.
make test_porting now runs a number of important pre-commit checks which might be of use to anyone working on the Perl core.
t/porting/podcheck.t automatically checks the well-formedness of POD found in all .pl, .pm and .pod files in the MANIFEST, other than in dual-lifed modules which are primarily maintained outside the Perl core.
t/porting/manifest.t now tests that all files listed in MANIFEST are present.
t/op/while_readdir.t tests that a bare readdir in a while loop sets $_.
t/comp/retainedlines.t checks that the debugger can retain source lines from eval.
t/io/perlio_fail.t checks that bad layers fail.
t/io/perlio_leaks.t checks that PerlIO layers are not leaking.
t/io/perlio_open.t checks that certain special forms of open work.
t/io/perlio.t includes general PerlIO tests.
t/io/pvbm.t checks that there is no unexpected interaction between the internal types PVBM and PVGV.
t/mro/package_aliases.t checks that mro works properly in the presence of aliased packages.
t/op/index_thr.t tests the interaction of index and threads.
t/op/pat_thr.t tests the interaction of esoteric patterns and threads.
t/op/qr_gc.t tests that qr doesn't leak.
t/op/reg_email_thr.t tests the interaction of regex recursion and threads.
t/op/regexp_qr_embed_thr.t tests the interaction of patterns with embedded qr// and threads.
t/op/regexp_unicode_prop.t tests Unicode properties in regular expressions.
t/op/regexp_unicode_prop_thr.t tests the interaction of Unicode properties and threads.
t/op/reg_nc_tie.t tests the tied methods of Tie::Hash::NamedCapture.
t/op/reg_posixcc.t checks that POSIX character classes behave consistently.
t/op/re.t checks that exportable re functions in universal.c work.
t/op/setpgrpstack.t checks that setpgrp works.
t/op/substr_thr.t tests the interaction of substr and threads.
t/op/upgrade.t checks that upgrading and assigning scalars works.
t/uni/lex_utf8.t checks that Unicode in the lexer works.
t/uni/tie.t checks that Unicode and tie work.
t/comp/final_line_num.t tests whether line numbers are correct at EOF.
t/comp/form_scope.t tests format scoping.
t/comp/line_debug.t tests whether @{"_<$file"} works.
t/op/filetest_t.t tests that the -t file test works.
t/op/qr.t tests qr.
t/op/utf8cache.t tests malfunctions of the utf8 cache.
t/re/uniprops.t tests Unicode \p{} regex constructs.
t/op/filehandle.t tests some suitably portable filetest operators to check that they work as expected, particularly in the light of some internal changes made in how filehandles are blessed.
t/op/time_loop.t tests that unix times greater than 2**63, which can now be handed to gmtime and localtime, do not cause an internal overflow or an excessively long loop.
SV allocation tracing has been added to the diagnostics enabled by -Dm. The tracing can alternatively output via the PERL_MEM_LOG mechanism, if that was enabled when the perl binary was compiled.
Smartmatch resolution tracing has been added as a new diagnostic. Use -DM to enable it.
A new debugging flag -DB now dumps subroutine definitions, leaving -Dx for its original purpose of dumping syntax trees.
Perl 5.12 provides a number of new diagnostic messages to help you write better code. See perldiag for details of these new messages.
Bad plugin affecting keyword '%s'
gmtime(%.0f) too large
Lexing code attempted to stuff non-Latin-1 character into Latin-1 input
Lexing code internal error (%s)
localtime(%.0f) too large
Overloaded dereference did not return a reference
Overloaded qr did not return a REGEXP
Perl_pmflag() is deprecated, and will be removed from the XS API
lvalue attribute ignored after the subroutine has been defined
This new warning is issued when one attempts to mark a subroutine as lvalue after it has been defined.
Perl now warns you if ++ or -- is unable to change the value because it's beyond the limit of representation. This uses a new warnings category: "imprecision".
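For instance (a sketch; 2**53 is the first integer a C double cannot step past by one):

```perl
my $x = 2**53;    # exponentiation yields a floating-point value (NV)
my $before = $x;
{
    no warnings 'imprecision';  # the new category silences this warning
    $x++;                       # 2**53 + 1 is not representable as a double
}
# the increment could not change the value
print $x == $before ? "unchanged\n" : "changed\n";
```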
Show constant in "Useless use of a constant in void context"
Prototype after '%s'
panic: sv_chop %s
This new fatal error occurs when the C routine Perl_sv_chop() was passed a position that is not within the scalar's string buffer. This could be caused by buggy XS code, and at this point recovery is not possible.
The fatal error Malformed UTF-8 returned by \N is now produced if the charnames handler returns malformed UTF-8.
If an unresolved named character or sequence was encountered when compiling a regex pattern, then the fatal error \N{NAME} must be resolved by the lexer is now produced. This can happen, for example, when using a single-quotish context like $re = '\N{SPACE}'; /$re/;. See perldiag for more examples of how the lexer can get bypassed.
Invalid hexadecimal number in \N{U+...} is a new fatal error triggered when the character constant represented by ... is not a valid hexadecimal number.
The new meaning of \N as [^\n] is not valid in a bracketed character class, just like . in a character class loses its special meaning, and will cause the fatal error \N in a character class must be a named character: \N{...}.
The rules on what is legal for the ... in \N{...} have been tightened up so that unless the ... begins with an alphabetic character and continues with a combination of alphanumerics, dashes, spaces, parentheses or colons, the warning Deprecated character(s) in \N{...} starting at '%s' is now issued.
The warning Using just the first characters returned by \N{} will be issued if the charnames handler returns a sequence of characters which exceeds the limit of the number of characters that can be used. The message will indicate which characters were used and which were discarded.
A number of existing diagnostic messages have been improved or corrected:
A new warning category illegalproto
allows finer-grained control of
warnings around function prototypes.
The two warnings:
have been moved from the syntax
top-level warnings category into a new
first-level category, illegalproto
. These two warnings are currently
the only ones emitted during parsing of an invalid/illegal prototype,
so one can now use
- no warnings 'illegalproto';
to suppress only those, but not other syntax-related warnings. Warnings
where prototypes are changed, ignored, or not met are still in the
prototype category as before.
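A minimal sketch of the new category in use (the subroutine name and its deliberately bogus prototype are invented for illustration):

```perl
use strict;
use warnings;

{
    no warnings 'illegalproto';
    # 'x' is not a valid prototype character, so without the pragma
    # this line would emit "Illegal character in prototype":
    sub demo ($x) { return $_[0] * 2 }
}

# Calling with & bypasses the (broken) prototype entirely:
print &demo(21), "\n";
```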
Deep recursion on subroutine "%s"
It is now possible to change the depth threshold for this warning from the
default of 100, by recompiling the perl binary, setting the C
pre-processor macro PERL_SUB_DEPTH_WARN
to the desired value.
Illegal character in prototype
warning is now more precise
when reporting illegal characters after _
mro merging error messages are now very similar to those produced by Algorithm::C3.
Amelioration of the error message "Unrecognized character %s in column %d"
Changes the error message to "Unrecognized character %s; marked by <-- HERE after %s<-- HERE near column %d". This should make it a little simpler to spot and correct the suspicious character.
Perl now explicitly points to $.
when it causes an uninitialized
warning for ranges in scalar context.
split now warns when called in void context.
printf-style functions called with too few arguments will now issue the
warning "Missing argument in %s"
[perl #71000]
Perl now properly returns a syntax error instead of segfaulting
if each, keys, or values is used without an argument.
tell() now fails properly if called without an argument and when no
previous file was read.
tell() now returns -1, and sets errno to EBADF, thus restoring
the 5.8.x behaviour.
overload
no longer implicitly unsets fallback on repeated 'use
overload' lines.
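A sketch with an invented class: the second use overload line no longer wipes out the fallback => 1 set by the first.

```perl
use strict;
use warnings;

package MyNum;   # hypothetical example class
use overload
    '0+'     => sub { $_[0]->{v} },
    fallback => 1;
# A later 'use overload' line no longer implicitly unsets fallback:
use overload '""' => sub { "MyNum(" . $_[0]->{v} . ")" };

sub new { my ($class, $v) = @_; return bless { v => $v }, $class }

package main;

my $n = MyNum->new(2);
print $n + 3, "\n";   # '+' is autogenerated from '0+' because fallback survives
print "$n\n";
```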
POSIX::strftime() can now handle Unicode characters in the format string.
The syntax
category was removed from 5 warnings that should only be in
deprecated.
Three fatal pack/unpack error messages have been normalized to
panic: %s
Unicode character is illegal
has been rephrased to be more accurate
It now reads Unicode non-character is illegal in interchange
and the
perldiag documentation has been expanded a bit.
Currently, all but the first of the several characters that the
charnames
handler may return are discarded when used in a regular
expression pattern bracketed character class. If this happens then the
warning Using just the first character returned by \N{} in character
class
will be issued.
The warning Missing right brace on \N{} or unescaped left brace after
\N. Assuming the latter
will be issued if Perl encounters a \N{
but doesn't find a matching }. In this case Perl doesn't know if it
was mistakenly omitted, or if "match non-newline" followed by "match
a {" was desired. It assumes the latter because that is actually a
valid interpretation as written, unlike the other case. If you meant
the former, you need to add the matching right brace. If you did mean
the latter, you can silence this warning by writing instead \N\{.
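A minimal sketch of the escaped spelling:

```perl
use strict;
use warnings;

# \N matches any character except newline; escaping the brace makes
# the "literal {" intent explicit and avoids the warning:
print "a{" =~ /\N\{/ ? "match\n" : "no match\n";
```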
gmtime and localtime called with numbers smaller than they can
reliably handle will now issue the warnings gmtime(%.0f) too small
and localtime(%.0f) too small.
The following diagnostic messages have been removed:
Runaway format
Can't locate package %s for the parents of %s
In general this warning was only produced in conjunction with other warnings, and removing it allowed an ISA lookup optimisation to be added.
v-string in use/require is non-portable
h2ph now looks in include-fixed
too, which is a recent addition
to gcc's search path.
h2xs no longer incorrectly treats enum values like macros.
It also now handles C++ style comments (//) properly in enums.
perl5db.pl now supports LVALUE
subroutines. Additionally, the
debugger now correctly handles proxy constant subroutines, and
subroutine stubs.
perlbug now uses %Module::CoreList::bug_tracker
to print out
upstream bug tracker URLs. If a user identifies a particular module
as the topic of their bug report and we're able to divine the URL for
its upstream bug tracker, perlbug now provides a message to the user
explaining that the core copies the CPAN version directly, and provides
the URL for reporting the bug directly to the upstream author.
perlbug no longer reports "Message sent" when it hasn't actually sent the message
perlthanks is a new utility for sending non-bug-reports to the authors and maintainers of Perl. Getting nothing but bug reports can become a bit demoralising. If Perl 5.12 works well for you, please try out perlthanks. It will make the developers smile.
Perl's developers have fixed bugs in a2p having to do with the
match()
operator in list context. Additionally, a2p no longer
generates code that uses the $[
variable.
U+0FFFF is now a legal character in regular expressions.
pp_qr now always returns a new regexp SV. Resolves RT #69852.
Instead of returning a(nother) reference to the (pre-compiled) regexp in the optree, use reg_temp_copy() to create a copy of it, and return a reference to that. This resolves issues about Regexp::DESTROY not being called in a timely fashion (the original bug tracked by RT #69852), as well as bugs related to blessing regexps, and of assigning to regexps, as described in correspondence added to the ticket.
It transpires that we also need to undo the SvPVX() sharing when ithreads cloning a Regexp SV, because mother_re is set to NULL, instead of a cloned copy of the mother_re. This change might fix bugs with regexps and threads in certain other situations, but as yet neither tests nor bug reports have indicated any problems, so it might not actually be an edge case that it's possible to reach.
Several compilation errors and segfaults when perl was built with -Dmad
were fixed.
Fixes for lexer API changes in 5.11.2 which broke NYTProf's savesrc option.
-t
should only return TRUE for file handles connected to a TTY
The Microsoft C version of isatty()
returns TRUE for all character mode
devices, including the /dev/null-style "nul" device and printers like
"lpt1".
Fixed a regression caused by commit fafafbaf which caused a panic during parameter passing [perl #70171]
On systems which do in-place edits without backup files, -i'*' now works as the documentation says it does [perl #70802]
Saving and restoring magic flags no longer loses the readonly flag.
The malformed syntax grep EXPR LIST
(note the missing comma) no longer
causes abrupt and total failure.
Regular expressions compiled with qr{} literals properly set $'
when
matching again.
Using named subroutines with sort should no longer lead to bus errors
[perl #71076]
Numerous bugfixes catch small issues caused by the recently-added Lexer API.
Smart match against @_
sometimes gave false negatives. [perl #71078]
$@
may now be assigned a read-only value (without error or busting
the stack).
sort called recursively from within an active comparison subroutine no
longer causes a bus error if run multiple times. [perl #71076]
Tie::Hash::NamedCapture::* will not abort if passed bad input (RT #71828)
@_ and $_ no longer leak under threads (RT #34342 and #41138, also #70602, #70974)
-I
on shebang line now adds directories in front of @INC
as documented, just as -I does when specified on the command-line.
kill is now fatal when called on non-numeric process identifiers.
Previously, an undef process identifier would be interpreted as a
request to kill process 0, which would terminate the current process
group on POSIX systems. Since process identifiers are always integers,
killing a non-numeric process is now fatal.
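A sketch of the new behaviour (wrapping the call in eval so the fatal error can be observed):

```perl
use strict;
use warnings;
no warnings 'uninitialized';   # we pass undef on purpose below

# An undef (hence non-numeric) PID is now a fatal error rather than
# a signal sent to process group 0:
my $lived = eval { kill 'TERM', undef; 1 };
print $lived ? "kill accepted undef\n" : "kill died: $@";
```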
5.10.0 inadvertently disabled an optimisation, which caused a measurable
performance drop in list assignment, such as is often used to assign
function parameters from @_. The optimisation has been re-instated, and
the performance regression fixed. (This fix is also present in 5.10.1)
Fixed memory leak on while (1) { map 1, 1 }
[RT #53038].
Some potential coredumps in PerlIO fixed [RT #57322,54828].
The debugger now works with lvalue subroutines.
The debugger's m command was broken on modules that defined constants
[RT #61222].
crypt and string complement could return tainted values for untainted
arguments [RT #59998].
The -i.suffix command-line switch now recreates the file using
restricted permissions, before changing its mode to match the original
file. This eliminates a potential race condition [RT #60904].
On some Unix systems, the value in $?
would not have the top bit set
($? & 128) even if the child core dumped.
Under some circumstances, $^R
could incorrectly become undefined
[RT #57042].
In the XS API, various hash functions, when passed a pre-computed hash where the key is UTF-8, might result in an incorrect lookup.
XS code including XSUB.h before perl.h gave a compile-time error [RT #57176].
$object->isa('Foo')
would report false if the package Foo
didn't exist, even if the object's @ISA
contained Foo.
Various bugs in the new-to 5.10.0 mro code, triggered by manipulating
@ISA, have been found and fixed.
Bitwise operations on references could crash the interpreter, e.g.
$x=\$y; $x |= "foo"
[RT #54956].
Patterns including alternation might be sensitive to the internal UTF-8 representation, e.g.
Within UTF8-encoded Perl source files (i.e. where use utf8
is in
effect), double-quoted literal strings could be corrupted where a \xNN,
\0NNN or \N{} is followed by a literal character with ordinal value
greater than 255 [RT #59908].
B::Deparse
failed to correctly deparse various constructs:
readpipe STRING
[RT #62428], CORE::require(STRING)
[RT #62488],
sub foo(_)
[RT #62484].
Using setpgrp with no arguments could corrupt the perl stack.
The block form of eval is now specifically trappable by Safe
and
ops. Previously it was erroneously treated like string eval.
In 5.10.0, the two characters [~ were sometimes parsed as the smart
match operator (~~) [RT #63854].
In 5.10.0, the *
quantifier in patterns was sometimes treated as
{0,32767}
[RT #60034, #60464]. For example, this match would fail:
- ("ab" x 32768) =~ /^(ab)*$/
shmget was limited to a 32 bit segment size on a 64 bit OS [RT #63924].
Using next or last to exit a given
block no longer produces a
spurious warning like the following:
- Exiting given via last at foo.pl line 123
Assigning a format to a glob could corrupt the format; e.g.:
- *bar=*foo{FORMAT}; # foo format now bad
Attempting to coerce a typeglob to a string or number could cause an
assertion failure. The correct error message is now generated,
Can't coerce GLOB to $type.
Under use filetest 'access', -x was using the wrong access
mode. This has been fixed [RT #49003].
length on a tied scalar that returned a Unicode value would not be
correct the first time. This has been fixed.
Using an array tie inside an array tie could SEGV. This has been
fixed. [RT #51636]
A race condition inside PerlIOStdio_close()
has been identified and
fixed. This used to cause various threading issues, including SEGVs.
In unpack, the use of ()
groups in scalar context was internally
placing a list on the interpreter's stack, which manifested in various
ways, including SEGVs. This is now fixed [RT #50256].
Magic was called twice in substr, \&$x, tie $x, $m and chop.
These have all been fixed.
A 5.10.0 optimisation to clear the temporary stack within the implicit
loop of s///ge has been reverted, as it turned out to be the cause of
obscure bugs in seemingly unrelated parts of the interpreter [commit
ef0d4e17921ee3de].
The line numbers for warnings inside elsif
are now correct.
The ..
operator now works correctly with ranges whose ends are at or
close to the values of the smallest and largest integers.
binmode STDIN, ':raw'
could lead to segmentation faults on some platforms.
This has been fixed [RT #54828].
An off-by-one error meant that index $str, ...
was effectively being
executed as index "$str\0", .... This has been fixed [RT #53746].
Various leaks associated with named captures in regexes have been fixed [RT #57024].
A weak reference to a hash would leak. This was affecting DBI
[RT #56908].
Using (?|) in a regex could cause a segfault [RT #59734].
Use of a UTF-8 tr// within a closure could cause a segfault [RT #61520].
Calling Perl_sv_chop()
or otherwise upgrading an SV could result in an
unaligned 64-bit access on the SPARC architecture [RT #60574].
In the 5.10.0 release, inc_version_list
would incorrectly list
5.10.*
after 5.8.*; this affected the @INC
search order
[RT #67628].
In 5.10.0, pack "a*", $tainted_value
returned a non-tainted value
[RT #52552].
In 5.10.0, printf and sprintf could produce the fatal error
panic: utf8_mg_pos_cache_update
when printing UTF-8 strings
[RT #62666].
In the 5.10.0 release, a dynamically created AUTOLOAD
method might be
missed (method cache issue) [RT #60220,60232].
In the 5.10.0 release, a combination of use feature
and //ee could
cause a memory leak [RT #63110].
-C
on the shebang (#!) line is once more permitted if it is also
specified on the command line. -C
on the shebang line used to be a
silent no-op if it was not also on the command line, so perl 5.10.0
disallowed it, which broke some scripts. Now perl checks whether it is
also on the command line and only dies if it is not [RT #67880].
In 5.10.0, certain types of re-entrant regular expression could crash, or cause the following assertion failure [RT #60508]:
- Assertion rx->sublen >= (s - rx->subbeg) + i failed
Perl now includes previously missing files from the Unicode Character Database.
Perl now honors TMPDIR
when opening an anonymous temporary file.
Perl is incredibly portable. In general, if a platform has a C compiler, someone has ported Perl to it (or will soon). We're happy to announce that Perl 5.12 includes support for several new platforms. At the same time, it's time to bid farewell to some (very) old friends.
Perl's developers have merged patches from Haiku's maintainers. Perl should now build on Haiku.
Perl should now build on MirOS BSD.
Removed libbsd for AIX 5L and 6.1. Only flock() was used from
libbsd.
Removed libgdbm for AIX 5L and 6.1 if libgdbm < 1.8.3-5 is installed. The libgdbm is delivered as an optional package with the AIX Toolbox. Unfortunately the versions below 1.8.3-5 are broken.
Hints changes mean that AIX 4.2 should work again.
Perl now supports IPv6 on Cygwin 1.7 and newer.
On Cygwin we now strip the last number from the DLL. This has been the behaviour in the cygwin.com build for years. The hints files have been updated.
Skip testing the be_BY.CP1131 locale on Darwin 10 (Mac OS X 10.6), as it's still buggy.
Correct infelicities in the regexp used to identify buggy locales on Darwin 8 and 9 (Mac OS X 10.4 and 10.5, respectively).
Fix thread library selection [perl #69686]
The hints files now identify the correct threading libraries on FreeBSD 7 and later.
We now work around a bizarre preprocessor bug in the Irix 6.5 compiler:
cc -E -
unfortunately goes into K&R mode, but cc -E file.c
doesn't.
Hints now supports versions 5.*.
-UDEBUGGING
is now the default on VMS.
It has been the default everywhere else for ages. Command-line selection of -UDEBUGGING and -DDEBUGGING now also works in configure.com; previously the only way to turn debugging off was by answering no to the interactive question.
The default pipe buffer size on VMS has been updated to 8192 on 64-bit systems.
Reads from the in-memory temporary files of PerlIO::scalar
used to fail
if $/
was set to a numeric reference (to indicate record-style reads).
This is now fixed.
VMS now supports getgrgid.
Many improvements and cleanups have been made to the VMS file name handling and conversion code.
Enabling the PERL_VMS_POSIX_EXIT
logical name now encodes a POSIX exit
status in a VMS condition value for better interaction with GNV's bash
shell and other utilities that depend on POSIX exit values. See
$? in perlvms for details.
File::Copy
now detects Unix compatibility mode on VMS.
Various changes from Stratus have been merged in.
There is now support for Symbian S60 3.2 SDK and S60 5.0 SDK.
Perl 5.12 supports Windows 2000 and later. The supporting code for legacy versions of Windows is still included, but will be removed during the next development cycle.
Initial support for building Perl with MinGW-w64 is now available.
perl.exe now includes a manifest resource to specify the trustInfo
settings for Windows Vista and later. Without this setting Windows
would treat perl.exe as a legacy application and apply various
heuristics like redirecting access to protected file system areas
(like the "Program Files" folder) to the user's "VirtualStore"
instead of generating a proper "permission denied" error.
The manifest resource also requests the Microsoft Common-Controls version 6.0 (themed controls introduced in Windows XP). Check out the Win32::VisualStyles module on CPAN to switch back to old style unthemed controls for legacy applications.
The -t
filetest operator now only returns true if the filehandle
is connected to a console window. In previous versions of Perl it
would return true for all character mode devices, including NUL
and LPT1.
The -p
filetest operator now works correctly, and the
Fcntl::S_IFIFO constant is defined when Perl is compiled with
Microsoft Visual C. In previous Perl versions -p
always
returned a false value, and the Fcntl::S_IFIFO constant
was not defined.
This bug is specific to Microsoft Visual C and never affected Perl binaries built with MinGW.
The socket error codes are now more widely supported: the POSIX module will define the symbolic names, like POSIX::EWOULDBLOCK, and stringification of socket error codes in $! works as well now:
- C:\>perl -MPOSIX -E "$!=POSIX::EWOULDBLOCK; say $!"
- A non-blocking socket operation could not be completed immediately.
flock() will now set sensible error codes in $!. Previous Perl versions copied the value of $^E into $!, which caused much confusion.
select() now supports all empty fd_sets more correctly.
'.\foo'
and '..\foo'
were treated differently than
'./foo'
and '../foo'
by do and require [RT #63492].
Improved message window handling means that alarm and kill messages
will no longer be dropped under race conditions.
Various bits of Perl's build infrastructure are no longer converted to win32 line endings at release time. If this hurts you, please report the problem with the perlbug program included with perl.
This is a list of some significant unfixed bugs, which are regressions from either 5.10.x or 5.8.x.
Some CPANPLUS tests may fail if there is a functioning file ../../cpanp-run-perl outside your build directory. The failure doesn't imply a problem with the actual functional software. The bug is already fixed [RT #74188] and the fix is scheduled for inclusion in perl 5.12.1.
List::Util::first
misbehaves in the presence of a lexical $_
(typically introduced by my $_
or implicitly by given). The variable
which gets set for each iteration is the package variable $_, not the
lexical $_
[RT #67694].
A similar issue may occur in other modules that provide functions which take a block as their first argument, like
- foo { ... $_ ...} list
Some regexes may run much more slowly when run in a child thread compared with the thread the pattern was compiled into [RT #55600].
Things like "\N{LATIN SMALL LIGATURE FF}" =~ /\N{LATIN SMALL LETTER F}+/
will appear to hang as they get into a very long running loop [RT #72998].
Several porters have reported mysterious crashes when Perl's entire test suite is run after a build on certain Windows 2000 systems. When run by hand, the individual tests reportedly work fine.
This one is actually a change introduced in 5.10.0, but it was missed from that release's perldelta, so it is mentioned here instead.
A bugfix related to the handling of the /m modifier and qr resulted
in a change of behaviour between 5.8.x and 5.10.0:
- # matches in 5.8.x, doesn't match in 5.10.0
- $re = qr/^bar/; "foo\nbar" =~ /$re/m;
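A sketch of the workaround: compile the /m flag into the qr itself, so it travels with the pattern.

```perl
use strict;
use warnings;

my $re = qr/^bar/m;   # /m is now part of the compiled pattern
print "foo\nbar" =~ /$re/ ? "matches\n" : "no match\n";
```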
Perl 5.12.0 represents approximately two years of development since Perl 5.10.0 and contains over 750,000 lines of changes across over 3,000 files from over 200 authors and committers.
Perl continues to flourish into its third decade thanks to a vibrant community of users and developers. The following people are known to have contributed the improvements that became Perl 5.12.0:
Aaron Crane, Abe Timmerman, Abhijit Menon-Sen, Abigail, Adam Russell, Adriano Ferreira, Ævar Arnfjörð Bjarmason, Alan Grover, Alexandr Ciornii, Alex Davies, Alex Vandiver, Andreas Koenig, Andrew Rodland, andrew@sundale.net, Andy Armstrong, Andy Dougherty, Jose AUGUSTE-ETIENNE, Benjamin Smith, Ben Morrow, bharanee rathna, Bo Borgerson, Bo Lindbergh, Brad Gilbert, Bram, Brendan O'Dea, brian d foy, Charles Bailey, Chip Salzenberg, Chris 'BinGOs' Williams, Christoph Lamprecht, Chris Williams, chromatic, Claes Jakobsson, Craig A. Berry, Dan Dascalescu, Daniel Frederick Crisman, Daniel M. Quinlan, Dan Jacobson, Dan Kogai, Dave Mitchell, Dave Rolsky, David Cantrell, David Dick, David Golden, David Mitchell, David M. Syzdek, David Nicol, David Wheeler, Dennis Kaarsemaker, Dintelmann, Peter, Dominic Dunlop, Dr.Ruud, Duke Leto, Enrico Sorcinelli, Eric Brine, Father Chrysostomos, Florian Ragwitz, Frank Wiegand, Gabor Szabo, Gene Sullivan, Geoffrey T. Dairiki, George Greer, Gerard Goossen, Gisle Aas, Goro Fuji, Graham Barr, Green, Paul, Hans Dieter Pearcey, Harmen, H. Merijn Brand, Hugo van der Sanden, Ian Goodacre, Igor Sutton, Ingo Weinhold, James Bence, James Mastros, Jan Dubois, Jari Aalto, Jarkko Hietaniemi, Jay Hannah, Jerry Hedden, Jesse Vincent, Jim Cromie, Jody Belka, John E. Malmberg, John Malmberg, John Peacock, John Peacock via RT, John P. Linderman, John Wright, Josh ben Jore, Jos I. 
Boumans, Karl Williamson, Kenichi Ishigaki, Ken Williams, Kevin Brintnall, Kevin Ryde, Kurt Starsinic, Leon Brocard, Lubomir Rintel, Luke Ross, Marcel Grünauer, Marcus Holland-Moritz, Mark Jason Dominus, Marko Asplund, Martin Hasch, Mashrab Kuvatov, Matt Kraai, Matt S Trout, Max Maischein, Michael Breen, Michael Cartmell, Michael G Schwern, Michael Witten, Mike Giroux, Milosz Tanski, Moritz Lenz, Nicholas Clark, Nick Cleaton, Niko Tyni, Offer Kaye, Osvaldo Villalon, Paul Fenwick, Paul Gaborit, Paul Green, Paul Johnson, Paul Marquess, Philip Hazel, Philippe Bruhat, Rafael Garcia-Suarez, Rainer Tammer, Rajesh Mandalemula, Reini Urban, Renée Bäcker, Ricardo Signes, Ricardo SIGNES, Richard Foley, Rich Rauenzahn, Rick Delaney, Risto Kankkunen, Robert May, Roberto C. Sanchez, Robin Barker, SADAHIRO Tomoyuki, Salvador Ortiz Garcia, Sam Vilain, Scott Lanning, Sébastien Aperghis-Tramoni, Sérgio Durigan Júnior, Shlomi Fish, Simon 'corecode' Schubert, Sisyphus, Slaven Rezic, Smylers, Steffen Müller, Steffen Ullrich, Stepan Kasal, Steve Hay, Steven Schubiger, Steve Peters, Tels, The Doctor, Tim Bunce, Tim Jenness, Todd Rinaldo, Tom Christiansen, Tom Hukins, Tom Wyant, Tony Cook, Torsten Schoenfeld, Tye McQueen, Vadim Konovalov, Vincent Pit, Hio YAMASHINA, Yasuhiro Matsumoto, Yitzchak Scott-Thoennes, Yuval Kogman, Yves Orton, Zefram, Zsban Ambrus
This is woefully incomplete as it's automatically generated from version
control history. In particular, it doesn't include the names of the
(very much appreciated) contributors who reported issues in previous
versions of Perl that helped make Perl 5.12.0 better. For a more complete
list of all of Perl's historical contributors, please see the AUTHORS
file in the Perl 5.12.0 distribution.
Our "retired" pumpkings Nicholas Clark and Rafael Garcia-Suarez deserve special thanks for their brilliant and substantive ongoing contributions. Nicholas personally authored over 30% of the patches since 5.10.0. Rafael comes in second in patch authorship with 11%, but is first by a long shot in committing patches authored by others, pushing 44% of the commits since 5.10.0 in this category, often after providing considerable coaching to the patch authors. These statistics in no way comprise all of their contributions, but express in shorthand that we couldn't have done it without them.
Many of the changes included in this version originated in the CPAN modules included in Perl's core. We're grateful to the entire CPAN community for helping Perl to flourish.
If you find what you think is a bug, you might check the articles recently posted to the comp.lang.perl.misc newsgroup and the perl bug database at http://rt.perl.org/perlbug/. There may also be information at http://www.perl.org/, the Perl Home Page.
If you believe you have an unreported bug, please run the perlbug
program included with your release. Be sure to trim your bug down
to a tiny but sufficient test case. Your bug report, along with the
output of perl -V
, will be sent off to perlbug@perl.org to be
analyzed by the Perl porting team.
If the bug you are reporting has security implications, which make it inappropriate to send to a publicly archived mailing list, then please send it to perl5-security-report@perl.org. This points to a closed subscription unarchived mailing list, which includes all the core committers, who will be able to help assess the impact of issues, figure out a resolution, and help co-ordinate the release of patches to mitigate or fix the problem across all platforms on which Perl is supported. Please only use this address for security issues in the Perl core, not for modules independently distributed on CPAN.
The Changes file for an explanation of how to view exhaustive details on what changed.
The INSTALL file for how to build Perl.
The README file for general stuff.
The Artistic and Copying files for copyright information.
http://dev.perl.org/perl5/errata.html for a list of issues found after this release, as well as a list of CPAN modules known to be incompatible with this release.
perl5121delta - what is new for perl v5.12.1
This document describes differences between the 5.12.0 release and the 5.12.1 release.
If you are upgrading from an earlier release such as 5.10.1, first read perl5120delta, which describes differences between 5.10.1 and 5.12.0.
There are no changes intentionally incompatible with 5.12.0. If any incompatibilities with 5.12.0 exist, they are bugs. Please report them.
Other than the bug fixes listed below, there should be no user-visible changes to the core language in this release.
We fixed exporting of is_strict
and is_lax
from version.
These were being exported with a wrapper that treated them as method calls, which caused them to fail. They are just functions, are documented as such, and should never be subclassed, so this patch just exports them directly as functions without the wrapper.
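A sketch of the direct functional interface (assumes version.pm 0.77 or later is installed):

```perl
use strict;
use warnings;
use version 0.77 qw(is_strict is_lax);

# Plain function calls -- no method dispatch, no subclassing:
print is_strict("v1.2.3") ? "strict\n" : "not strict\n";
print is_lax("1.2.3_4")   ? "lax\n"    : "not lax\n";
```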
We upgraded CGI.pm to version 3.49 to incorporate fixes for regressions introduced in the release we shipped with Perl 5.12.0.
We upgraded Pod::Simple to version 3.14 to get an improvement to C<< >> parsing.
We made a small fix to the CPANPLUS test suite to fix an occasional spurious test failure.
We upgraded Safe to version 2.27 to wrap coderefs returned by reval()
and rdo().
We added the new maintenance release policy to perlpolicy.pod
We've clarified the multiple-angle-bracket construct in the spec for POD in perlpodspec
We added a missing explanation for a warning about :=
to perldiag.pod
We removed a false claim in perlunitut that all text strings are Unicode strings in Perl.
We updated the Github mirror link in perlrepository to mirrors/perl, not github/perl
We fixed a minor error in perl5114delta.pod.
We replaced a mention of the now-obsolete Switch.pm with given/when.
We improved documentation about $sitelibexp/sitecustomize.pl in perlrun.
We corrected perlmodlib.pod which had unintentionally omitted a number of modules.
We updated the documentation for 'require' in perlfunc.pod relating to putting Perl code in @INC.
We reinstated some erroneously-removed documentation about quotemeta in perlfunc.
We fixed an a2p example in perlutil.pod.
We filled in a blank in perlport.pod with the release date of Perl 5.12.
We fixed broken links in a number of perldelta files.
The documentation for Carp.pm incorrectly stated that the $Carp::Verbose variable makes cluck generate stack backtraces.
We fixed a number of typos in Pod::Functions
We improved documentation of case-changing functions in perlfunc.pod
We corrected perlgpl.pod to contain the correct version of the GNU General Public License.
t/op/sselect.t is now less prone to clock jitter during timing checks on Windows.
sleep() time on Win32 may be rounded down to a multiple of the clock tick interval.
lib/blib.t and lib/locale.t: Fixes for test failures on Darwin/PPC
perl5db.t: Fix for test failures when Term::ReadLine::Gnu
is installed.
We updated INSTALL with notes about how to deal with broken dbm.h on OpenSUSE (and possibly other platforms)
A bug in how we process filetest operations could cause a segfault. Filetests don't always expect an op on the stack, so we now use TOPs only if we're sure that we're not stat'ing the _ filehandle. This is indicated by OPf_KIDS (as checked in ck_ftst).
See also: http://rt.perl.org/rt3/Public/Bug/Display.html?id=74542
When deparsing a nextstate op that has both a change of package (relative to the previous nextstate) and a label, the package declaration is now emitted first, because it is syntactically impermissible for a label to prefix a package declaration.
XSUB.h now correctly redefines fgets under PERL_IMPLICIT_SYS
See also: http://rt.cpan.org/Public/Bug/Display.html?id=55049
utf8::is_utf8 now respects GMAGIC (e.g. $1)
XS code using fputc() or fputs() on Windows could cause an error
due to their arguments being swapped.
See also: http://rt.perl.org/rt3/Public/Bug/Display.html?id=72704
We fixed a small bug in lex_stuff_pvn() that caused spurious syntax errors in an obscure situation. It happened when stuffing was performed on the last line of a file and the line ended with a statement that lacked a terminating semicolon.
See also: http://rt.perl.org/rt3/Public/Bug/Display.html?id=74006
We fixed a bug that could cause \N{} constructs followed by a single . to be parsed incorrectly.
See also: http://rt.perl.org/rt3/Public/Bug/Display.html?id=74978
We fixed a bug that caused when(scalar) without an argument not to be treated as a syntax error.
See also: http://rt.perl.org/rt3/Public/Bug/Display.html?id=74114
We fixed a regression in the handling of labels immediately before string evals that was introduced in Perl 5.12.0.
See also: http://rt.perl.org/rt3/Public/Bug/Display.html?id=74290
We fixed a regression in case-insensitive matching of folded characters in regular expressions introduced in Perl 5.10.1.
See also: http://rt.perl.org/rt3/Public/Bug/Display.html?id=72998
Perl now allows -Duse64bitint without promoting to use64bitall on HP-UX
Perl now builds on AIX 4.2
The changes required work around AIX 4.2's lack of support for IPv6,
and limited support for POSIX sigaction().
FreeBSD 7 no longer contains /usr/bin/objformat. At build time, Perl now skips the objformat check for versions 7 and higher and assumes ELF.
It's now possible to build extensions on older (pre 7.3-2) VMS systems.
DCL symbol length was limited to 1K up until about seven years or so ago, but there was no particularly deep reason to prevent those older systems from configuring and building Perl.
We fixed the previously-broken -Uuseperlio
build on VMS.
We were checking a variable that doesn't exist in the non-default case of disabling perlio. Now we only look at it when it exists.
We fixed the -Uuseperlio command-line option in configure.com.
Formerly it only worked if you went through all the questions interactively and explicitly answered no.
List::Util::first
misbehaves in the presence of a lexical $_
(typically introduced by my $_
or implicitly by given). The variable
which gets set for each iteration is the package variable $_, not the
lexical $_.
A similar issue may occur in other modules that provide functions which take a block as their first argument, like
- foo { ... $_ ...} list
See also: http://rt.perl.org/rt3/Public/Bug/Display.html?id=67694
Module::Load::Conditional and version have an unfortunate interaction which can cause CPANPLUS to crash when it encounters an unparseable version string. Upgrading to CPANPLUS 0.9004 or Module::Load::Conditional 0.38 from CPAN will resolve this issue.
Perl 5.12.1 represents approximately four weeks of development since Perl 5.12.0 and contains approximately 4,000 lines of changes across 142 files from 28 authors.
Perl continues to flourish into its third decade thanks to a vibrant community of users and developers. The following people are known to have contributed the improvements that became Perl 5.12.1:
Ævar Arnfjörð Bjarmason, Chris Williams, chromatic, Craig A. Berry, David Golden, Father Chrysostomos, Florian Ragwitz, Frank Wiegand, Gene Sullivan, Goro Fuji, H.Merijn Brand, James E Keenan, Jan Dubois, Jesse Vincent, Josh ben Jore, Karl Williamson, Leon Brocard, Michael Schwern, Nga Tang Chan, Nicholas Clark, Niko Tyni, Philippe Bruhat, Rafael Garcia-Suarez, Ricardo Signes, Steffen Mueller, Todd Rinaldo, Vincent Pit and Zefram.
If you find what you think is a bug, you might check the articles recently posted to the comp.lang.perl.misc newsgroup and the perl bug database at http://rt.perl.org/perlbug/ . There may also be information at http://www.perl.org/ , the Perl Home Page.
If you believe you have an unreported bug, please run the perlbug
program included with your release. Be sure to trim your bug down
to a tiny but sufficient test case. Your bug report, along with the
output of perl -V
, will be sent off to perlbug@perl.org to be
analysed by the Perl porting team.
If the bug you are reporting has security implications, which make it inappropriate to send to a publicly archived mailing list, then please send it to perl5-security-report@perl.org. This points to a closed subscription unarchived mailing list, which includes all the core committers, who will be able to help assess the impact of issues, figure out a resolution, and help co-ordinate the release of patches to mitigate or fix the problem across all platforms on which Perl is supported. Please only use this address for security issues in the Perl core, not for modules independently distributed on CPAN.
The Changes file for an explanation of how to view exhaustive details on what changed.
The INSTALL file for how to build Perl.
The README file for general stuff.
The Artistic and Copying files for copyright information.
perl5122delta - what is new for perl v5.12.2
This document describes differences between the 5.12.1 release and the 5.12.2 release.
If you are upgrading from an earlier major version, such as 5.10.1, first read perl5120delta, which describes differences between 5.10.1 and 5.12.0, as well as perl5121delta, which describes earlier changes in the 5.12 stable release series.
There are no changes intentionally incompatible with 5.12.1. If any exist, they are bugs and reports are welcome.
Other than the bug fixes listed below, there should be no user-visible changes to the core language in this release.
This release does not introduce any new modules or pragmata.
In the previous release, no VERSION; statements triggered a bug which could cause feature bundles to be loaded and strict mode to be enabled unintentionally.
Carp
Upgraded from version 1.16 to 1.17.
Carp now detects incomplete caller EXPR overrides and avoids using bogus @DB::args. To provide backtraces, Carp relies on particular behaviour of the caller built-in. Carp now detects if other code has overridden this with an incomplete implementation, and modifies its backtrace accordingly. Previously, incomplete overrides would cause incorrect values in backtraces (best case) or obscure fatal errors (worst case).
This fixes certain cases of "Bizarre copy of ARRAY" caused by modules overriding caller() incorrectly.
CPANPLUS
A patch to cpanp-run-perl has been backported from CPANPLUS 0.9004. This resolves RT #55964 and RT #57106, both of which related to failures to install distributions that use Module::Install::DSL.
File::Glob
A regression that caused a crash when CORE::GLOBAL::glob was not found after loading File::Glob has been fixed. Perl now correctly falls back to external globbing via pp_glob.
File::Copy
File::Copy::copy(FILE, DIR)
is now documented.
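A small runnable sketch of the now-documented directory-destination form (the file names here are hypothetical, created in a throwaway temporary directory):

```perl
use strict;
use warnings;
use File::Copy qw(copy);
use File::Temp qw(tempdir);

my $dir = tempdir(CLEANUP => 1);
mkdir "$dir/dest" or die "mkdir: $!";

# Create a small source file to copy.
open my $out, '>', "$dir/src.txt" or die "open: $!";
print $out "hello";
close $out;

# When the destination is a directory, copy() keeps the file's name.
copy("$dir/src.txt", "$dir/dest") or die "copy failed: $!";
print -e "$dir/dest/src.txt" ? "copied\n" : "missing\n";
```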
File::Spec
Upgraded from version 3.31 to 3.31_01.
Several portability fixes were made in File::Spec::VMS: a colon is now recognized as a delimiter in native filespecs; caret-escaped delimiters are recognized for better handling of extended filespecs; catpath() returns an empty directory rather than the current directory if the input directory name is empty; and abs2rel() properly handles Unix-style input.
perlbug now always gives the reporter a chance to change the email address it guesses for them.
perlbug should no longer warn about uninitialized values when using the -d
and -v
options.
The existing policy on backward-compatibility and deprecation has been added to perlpolicy, along with definitions of terms like deprecation.
srand's usage has been clarified.
The entry for die was reorganized to emphasize its role in the exception mechanism.
Perl's INSTALL file has been clarified to explicitly state that Perl requires a C89-compliant ANSI C compiler.
IO::Socket's getsockopt() and setsockopt() have been documented.
alarm()'s inability to interrupt blocking IO on Windows has been documented.
Math::TrulyRandom hasn't been updated since 1996 and has been removed as a recommended solution for random number generation.
perlrun has been updated to clarify the behaviour of octal flags to perl.
To ease user confusion, $# and $*, two special variables that were removed in earlier versions of Perl, have been documented.
The version of perlfaq shipped with the Perl core has been updated from the
official FAQ version, which is now maintained in the briandfoy/perlfaq
branch of the Perl repository at git://perl5.git.perl.org/perl.git.
The d_u32align
configuration probe on ARM has been fixed.
An "incompatible operand types
" error in ternary expressions when building
with clang
has been fixed.
Perl now skips setuid File::Copy
tests on partitions it detects to be mounted
as nosuid
.
A possible segfault in the T_PRTOBJ
default typemap has been fixed.
A possible memory leak when using caller EXPR to set
@DB::args
has been fixed.
Several memory leaks when loading XS modules were fixed.
unpack() now handles scalar context correctly for %32H and %32u, fixing a potential crash. split() would crash because the third item on the stack wasn't the regular expression it expected. unpack("%2H", ...) would return both the unpacked result and the checksum on the stack, as would unpack("%2u", ...). [perl #73814]
Perl now avoids using memory after calling free()
in pp_require
when there are CODEREFs in @INC
.
A bug that could cause "Unknown error
" messages when
"call_sv(code, G_EVAL)
" is called from an XS destructor has been fixed.
The implementation of the open $fh, '>', \$buffer feature now supports get/set magic and thus tied buffers correctly.
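For reference, a minimal sketch of the in-memory filehandle feature that this fix applies to:

```perl
use strict;
use warnings;

# Opening a filehandle onto a scalar reference writes into the scalar;
# the 5.12.2 fix makes this honour get/set magic (e.g. tied buffers).
my $buffer = '';
open my $fh, '>', \$buffer or die "open: $!";
print $fh "hello, world";
close $fh;
print "$buffer\n";    # hello, world
```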
The pp_getc
, pp_tell
, and pp_eof
opcodes now make room on the
stack for their return values in cases where no argument was passed in.
When matching Unicode strings, under some conditions inappropriate backtracking would result in a Malformed UTF-8 character (fatal) error. This should no longer occur. See [perl #75680].
README.aix has been updated with information about the XL C/C++ V11 compiler suite.
When building Perl with the mingw64 x64 cross-compiler, the incpath, libpth, ldflags, lddlflags and ldflags_nolargefiles values in Config.pm and Config_heavy.pl were not previously being set correctly because, with that compiler, the include and lib directories are not immediately below $(CCHOME).
git_version.h is now installed on VMS. This was an oversight in v5.12.0 which caused some extensions to fail to build.
Several memory leaks in stat FILEHANDLE have been fixed.
A memory leak in Perl_rename()
due to a double allocation has been
fixed.
A memory leak in vms_fid_to_name()
(used by realpath()
and
realname()
) has been fixed.
Perl 5.12.2 represents approximately three months of development since Perl 5.12.1 and contains approximately 2,000 lines of changes across 100 files from 36 authors.
Perl continues to flourish into its third decade thanks to a vibrant community of users and developers. The following people are known to have contributed the improvements that became Perl 5.12.2:
Abigail, Ævar Arnfjörð Bjarmason, Ben Morrow, brian d foy, Brian Phillips, Chas. Owens, Chris 'BinGOs' Williams, Chris Williams, Craig A. Berry, Curtis Jewell, Dan Dascalescu, David Golden, David Mitchell, Father Chrysostomos, Florian Ragwitz, George Greer, H.Merijn Brand, Jan Dubois, Jesse Vincent, Jim Cromie, Karl Williamson, Lars Dɪᴇᴄᴋᴏᴡ 迪拉斯, Leon Brocard, Maik Hentsche, Matt S Trout, Nicholas Clark, Rafael Garcia-Suarez, Rainer Tammer, Ricardo Signes, Salvador Ortiz Garcia, Sisyphus, Slaven Rezic, Steffen Mueller, Tony Cook, Vincent Pit and Yves Orton.
If you find what you think is a bug, you might check the articles recently posted to the comp.lang.perl.misc newsgroup and the perl bug database at http://rt.perl.org/perlbug/ . There may also be information at http://www.perl.org/ , the Perl Home Page.
If you believe you have an unreported bug, please run the perlbug
program included with your release. Be sure to trim your bug down
to a tiny but sufficient test case. Your bug report, along with the
output of perl -V
, will be sent off to perlbug@perl.org to be
analysed by the Perl porting team.
If the bug you are reporting has security implications, which make it inappropriate to send to a publicly archived mailing list, then please send it to perl5-security-report@perl.org. This points to a closed subscription unarchived mailing list, which includes all the core committers, who will be able to help assess the impact of issues, figure out a resolution, and help co-ordinate the release of patches to mitigate or fix the problem across all platforms on which Perl is supported. Please only use this address for security issues in the Perl core, not for modules independently distributed on CPAN.
The Changes file for an explanation of how to view exhaustive details on what changed.
The INSTALL file for how to build Perl.
The README file for general stuff.
The Artistic and Copying files for copyright information.
perl5123delta - what is new for perl v5.12.3
This document describes differences between the 5.12.2 release and the 5.12.3 release.
If you are upgrading from an earlier release such as 5.12.1, first read perl5122delta, which describes differences between 5.12.1 and 5.12.2. The major changes made in 5.12.0 are described in perl5120delta.
There are no changes intentionally incompatible with 5.12.2. If any exist, they are bugs and reports are welcome.
keys, values work on arrays
You can now use the keys, values and each builtin functions on arrays (previously you could only use them on hashes). See perlfunc for details. This is actually a change introduced in perl 5.12.0, but it was missed from that release's perldelta.
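A minimal sketch of the array forms (the sample list is invented for illustration):

```perl
use strict;
use warnings;

# On an array, keys yields indices, values yields elements, and
# each iterates (index, element) pairs -- mirroring the hash forms.
my @langs = qw(perl c rust);
my @indices  = keys @langs;      # 0, 1, 2
my @elements = values @langs;    # perl, c, rust
while (my ($i, $v) = each @langs) {
    print "$i: $v\n";
}
```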
"no VERSION" will now correctly deparse with B::Deparse, as will certain constant expressions.
Module::Build should now more reliably pass its tests under cygwin.
Lvalue subroutines are again able to return copy-on-write scalars. This had been broken since version 5.10.0.
A separate DTrace object file is now built for miniperl, which means that perl can be compiled with -Dusedtrace on Solaris again.
A number of regressions on VMS have been fixed. In addition to minor cleanup of questionable expressions in vms.c, file permissions should no longer be garbled by the PerlIO layer, and spurious record boundaries should no longer be introduced by the PerlIO layer during output.
For more details and discussion on the latter, see:
- http://www.nntp.perl.org/group/perl.vmsperl/2010/11/msg15419.html
A few very small changes were made to the build process on VOS to better support the platform. Longer-than-32-character filenames are now supported on OpenVOS, and Perl now builds properly without IPv6 support.
Perl 5.12.3 represents approximately four months of development since Perl 5.12.2 and contains approximately 2500 lines of changes across 54 files from 16 authors.
Perl continues to flourish into its third decade thanks to a vibrant community of users and developers. The following people are known to have contributed the improvements that became Perl 5.12.3:
Craig A. Berry, David Golden, David Leadbeater, Father Chrysostomos, Florian Ragwitz, Jesse Vincent, Karl Williamson, Nick Johnston, Nicolas Kaiser, Paul Green, Rafael Garcia-Suarez, Rainer Tammer, Ricardo Signes, Steffen Mueller, Zsbán Ambrus and Ævar Arnfjörð Bjarmason.
If you find what you think is a bug, you might check the articles recently posted to the comp.lang.perl.misc newsgroup and the perl bug database at http://rt.perl.org/perlbug/ . There may also be information at http://www.perl.org/ , the Perl Home Page.
If you believe you have an unreported bug, please run the perlbug
program included with your release. Be sure to trim your bug down
to a tiny but sufficient test case. Your bug report, along with the
output of perl -V
, will be sent off to perlbug@perl.org to be
analysed by the Perl porting team.
If the bug you are reporting has security implications, which make it inappropriate to send to a publicly archived mailing list, then please send it to perl5-security-report@perl.org. This points to a closed subscription unarchived mailing list, which includes all the core committers, who will be able to help assess the impact of issues, figure out a resolution, and help co-ordinate the release of patches to mitigate or fix the problem across all platforms on which Perl is supported. Please only use this address for security issues in the Perl core, not for modules independently distributed on CPAN.
The Changes file for an explanation of how to view exhaustive details on what changed.
The INSTALL file for how to build Perl.
The README file for general stuff.
The Artistic and Copying files for copyright information.
perl5124delta - what is new for perl v5.12.4
This document describes differences between the 5.12.3 release and the 5.12.4 release.
If you are upgrading from an earlier release such as 5.12.2, first read perl5123delta, which describes differences between 5.12.2 and 5.12.3. The major changes made in 5.12.0 are described in perl5120delta.
There are no changes intentionally incompatible with 5.12.3. If any exist, they are bugs and reports are welcome.
When strict "refs" mode is off, %{...} in rvalue context returns undef if its argument is undefined. An optimisation introduced in Perl 5.12.0 to make keys %{...} faster when used as a boolean did not take this into account, causing keys %{+undef} (and keys %$foo when $foo is undefined) to be an error, which it should be only in strict mode [perl #81750].
lc, uc, lcfirst, and ucfirst no longer return untainted strings
when the argument is tainted. This has been broken since perl 5.8.9
[perl #87336].
Fixed a case where it was possible that a freed buffer may have been read from when parsing a here document.
Module::CoreList has been upgraded from version 2.43 to 2.50.
The cpan/CGI/t/http.t test script has been fixed to work when the environment has HTTPS_* environment variables, such as HTTPS_PROXY.
Updated the documentation for rand() in perlfunc to note that it is not cryptographically secure.
Perl 5.12.4 represents approximately 5 months of development since Perl 5.12.3 and contains approximately 200 lines of changes across 11 files from 8 authors.
Perl continues to flourish into its third decade thanks to a vibrant community of users and developers. The following people are known to have contributed the improvements that became Perl 5.12.4:
Andy Dougherty, David Golden, David Leadbeater, Father Chrysostomos, Florian Ragwitz, Jesse Vincent, Leon Brocard, Zsbán Ambrus.
If you find what you think is a bug, you might check the articles recently posted to the comp.lang.perl.misc newsgroup and the perl bug database at http://rt.perl.org/perlbug/ . There may also be information at http://www.perl.org/ , the Perl Home Page.
If you believe you have an unreported bug, please run the perlbug
program included with your release. Be sure to trim your bug down
to a tiny but sufficient test case. Your bug report, along with the
output of perl -V
, will be sent off to perlbug@perl.org to be
analysed by the Perl porting team.
If the bug you are reporting has security implications, which make it inappropriate to send to a publicly archived mailing list, then please send it to perl5-security-report@perl.org. This points to a closed subscription unarchived mailing list, which includes all the core committers, who will be able to help assess the impact of issues, figure out a resolution, and help co-ordinate the release of patches to mitigate or fix the problem across all platforms on which Perl is supported. Please only use this address for security issues in the Perl core, not for modules independently distributed on CPAN.
The Changes file for an explanation of how to view exhaustive details on what changed.
The INSTALL file for how to build Perl.
The README file for general stuff.
The Artistic and Copying files for copyright information.
perl5125delta - what is new for perl v5.12.5
This document describes differences between the 5.12.4 release and the 5.12.5 release.
If you are upgrading from an earlier release such as 5.12.3, first read perl5124delta, which describes differences between 5.12.3 and 5.12.4.
Encode decode_xs n-byte heap-overflow (CVE-2011-2939)
A bug in Encode could, on certain inputs, cause the heap to overflow. This problem has been corrected. Bug reported by Robert Zacek.
File::Glob::bsd_glob() memory error with GLOB_ALTDIRFUNC (CVE-2011-2728)
Calling File::Glob::bsd_glob with the unsupported flag GLOB_ALTDIRFUNC would cause an access violation / segfault. A Perl program that accepts a flags value from an external source could expose itself to denial of service or arbitrary code execution attacks. There are no known exploits in the wild. The problem has been corrected by explicitly disabling all unsupported flags and setting unused function pointers to null. Bug reported by Clément Lecigne.
Poorly written perl code that allows an attacker to specify the count to perl's 'x' string repeat operator can already cause a memory exhaustion denial-of-service attack. A flaw in versions of perl before 5.15.5 can escalate that into a heap buffer overrun; coupled with versions of glibc before 2.16, it possibly allows the execution of arbitrary code.
This problem has been fixed.
There are no changes intentionally incompatible with 5.12.4. If any exist, they are bugs and reports are welcome.
B::Concise no longer produces mangled output with the -tree option [perl #80632].
A regression introduced in Perl 5.8.8 that caused charnames::viacode(0) to return undef instead of the string "NULL" has been fixed [perl #72624].
See Security.
See Security.
The documentation for the upper
function now actually says "upper", not
"lower".
Module::CoreList has been updated to version 2.50_02 to add data for this release.
The perlebcdic document contains a helpful table to use in tr/// to
convert between EBCDIC and Latin1/ASCII. Unfortunately, the table was the
inverse of the one it describes. This has been corrected.
The section on User-Defined Case Mappings had some bad markup and unclear sentences, making parts of it unreadable. This has been rectified.
This document has been corrected to take non-ASCII platforms into account.
There have been configuration and test fixes to make Perl build cleanly on Lion and Mountain Lion.
The NetBSD hints file was corrected to be compatible with NetBSD 6.*.
chop now correctly handles characters above "\x{7fffffff}"
[perl #73246].
($<,$>) = (...) stopped working properly in 5.12.0. It is supposed to make a single setreuid() call, rather than calling setruid() and seteuid() separately, and consequently did not work properly. This has been fixed [perl #75212].
Fixed a regression of kill() when a match variable is used for the process ID to kill [perl #75812].
UNIVERSAL::VERSION
no longer leaks memory. It started leaking in Perl
5.10.0.
The C-level my_strftime function no longer leaks memory. This fixes a memory leak in POSIX::strftime [perl #73520].
caller no longer leaks memory when called from the DB package if
@DB::args
was assigned to after the first call to caller. Carp
was triggering this bug [perl #97010].
Passing to index an offset beyond the end of the string when the string
is encoded internally in UTF8 no longer causes panics [perl #75898].
Syntax errors in (?{...}) blocks in regular expressions no longer
cause panic messages [perl #2353].
Perl 5.10.0 introduced some faulty logic that made "U*" in the middle of a pack template equivalent to "U0" if the input string was empty. This has been fixed [perl #90160].
split() no longer modifies @_ when called in scalar or void context.
when called in scalar or void context.
In void context it now produces a "Useless use of split" warning.
This is actually a change introduced in perl 5.12.0, but it was missed from
that release's perl5120delta.
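A quick sketch of the scalar-context behaviour (the sample string is invented for illustration):

```perl
use strict;
use warnings;

# In scalar context split() simply returns the number of fields;
# since 5.12.0 it no longer writes to @_ as a side effect.
my $count = split /,/, "a,b,c";
print "$count\n";    # 3
```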
Perl 5.12.5 represents approximately 17 months of development since Perl 5.12.4 and contains approximately 1,900 lines of changes across 64 files from 18 authors.
Perl continues to flourish into its third decade thanks to a vibrant community of users and developers. The following people are known to have contributed the improvements that became Perl 5.12.5:
Andy Dougherty, Chris 'BinGOs' Williams, Craig A. Berry, David Mitchell, Dominic Hargreaves, Father Chrysostomos, Florian Ragwitz, George Greer, Goro Fuji, Jesse Vincent, Karl Williamson, Leon Brocard, Nicholas Clark, Rafael Garcia-Suarez, Reini Urban, Ricardo Signes, Steve Hay, Tony Cook.
The list above is almost certainly incomplete as it is automatically generated from version control history. In particular, it does not include the names of the (very much appreciated) contributors who reported issues to the Perl bug tracker.
Many of the changes included in this version originated in the CPAN modules included in Perl's core. We're grateful to the entire CPAN community for helping Perl to flourish.
For a more complete list of all of Perl's historical contributors, please see the AUTHORS file in the Perl source distribution.
If you find what you think is a bug, you might check the articles recently posted to the comp.lang.perl.misc newsgroup and the perl bug database at http://rt.perl.org/perlbug/ . There may also be information at http://www.perl.org/ , the Perl Home Page.
If you believe you have an unreported bug, please run the perlbug
program included with your release. Be sure to trim your bug down
to a tiny but sufficient test case. Your bug report, along with the
output of perl -V
, will be sent off to perlbug@perl.org to be
analysed by the Perl porting team.
If the bug you are reporting has security implications, which make it inappropriate to send to a publicly archived mailing list, then please send it to perl5-security-report@perl.org. This points to a closed subscription unarchived mailing list, which includes all the core committers, who will be able to help assess the impact of issues, figure out a resolution, and help co-ordinate the release of patches to mitigate or fix the problem across all platforms on which Perl is supported. Please only use this address for security issues in the Perl core, not for modules independently distributed on CPAN.
The Changes file for an explanation of how to view exhaustive details on what changed.
The INSTALL file for how to build Perl.
The README file for general stuff.
The Artistic and Copying files for copyright information.
perl5140delta - what is new for perl v5.14.0
This document describes differences between the 5.12.0 release and the 5.14.0 release.
If you are upgrading from an earlier release such as 5.10.0, first read perl5120delta, which describes differences between 5.10.0 and 5.12.0.
Some of the bug fixes in this release have been backported to subsequent releases of 5.12.x. Those are indicated with the 5.12.x version in parentheses.
As described in perlpolicy, the release of Perl 5.14.0 marks the official end of support for Perl 5.10. Users of Perl 5.10 or earlier should consider upgrading to a more recent release of Perl.
Perl comes with the Unicode 6.0 data base updated with Corrigendum #8, with one exception noted below. See http://unicode.org/versions/Unicode6.0.0/ for details on the new release. Perl does not support any Unicode provisional properties, including the new ones for this release.
Unicode 6.0 has chosen to use the name BELL
for the character at U+1F514,
which is a symbol that looks like a bell, and is used in Japanese cell
phones. This conflicts with the long-standing Perl usage of having
BELL
mean the ASCII BEL
character, U+0007. In Perl 5.14,
\N{BELL}
continues to mean U+0007, but its use generates a
deprecation warning message unless such warnings are turned off. The
new name for U+0007 in Perl is ALERT
, which corresponds nicely
with the existing shorthand sequence for it, "\a"
. \N{BEL}
means U+0007, with no warning given. The character at U+1F514 has no
name in 5.14, but can be referred to by \N{U+1F514}
.
In Perl 5.16, \N{BELL}
will refer to U+1F514; all code
that uses \N{BELL}
should be converted to use \N{ALERT}
,
\N{BEL}
, or "\a"
before upgrading.
use feature 'unicode_strings'
This release provides full functionality for use feature
'unicode_strings'
. Under its scope, all string operations executed and
regular expressions compiled (even if executed outside its scope) have
Unicode semantics. See the 'unicode_strings' feature in feature.
However, see Inverted bracketed character classes and multi-character folds,
below.
This feature avoids most forms of the "Unicode Bug" (see The Unicode Bug in perlunicode for details). If there is any possibility that your code will process Unicode strings, you are strongly encouraged to use this subpragma to avoid nasty surprises.
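A minimal sketch of the difference, assuming a perl of at least 5.12 ("\xE9" is LATIN SMALL LETTER E WITH ACUTE, stored as a native 8-bit string):

```perl
use strict;
use warnings;

my $e_acute = "\xE9";    # native 8-bit string, no UTF-8 flag

# Without the feature, uc() on a native 8-bit string uses ASCII-only
# semantics, so the character is left unchanged.
my $plain = uc $e_acute;

# Under the feature, Unicode casing rules apply regardless of the
# string's internal representation.
my $unicode = do {
    use feature 'unicode_strings';
    uc $e_acute;
};

printf "plain: %02X  unicode: %02X\n", ord $plain, ord $unicode;
# plain: E9  unicode: C9
```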
\N{NAME} and charnames enhancements
\N{NAME} and charnames::vianame now know about the abbreviated character names listed by Unicode, such as NBSP, SHY, LRO, ZWJ, etc.; all customary abbreviations for the C0 and C1 control characters (such as ACK, BEL, CAN, etc.); and a few new variants of some C1 full names that are in common usage.
Unicode has several named character sequences, in which particular sequences
of code points are given names. \N{NAME} now recognizes these.
\N{NAME}, charnames::vianame
, and charnames::viacode
now know about every character in Unicode. In earlier releases of
Perl, they didn't know about the Hangul syllables nor several
CJK (Chinese/Japanese/Korean) characters.
It is now possible to override Perl's abbreviations with your own custom aliases.
You can now create a custom alias of the ordinal of a
character, known by \N{NAME}, charnames::vianame()
, and
charnames::viacode()
. Previously, aliases had to be to official
Unicode character names. This made it impossible to create an alias for
unnamed code points, such as those reserved for private
use.
The new function charnames::string_vianame() is a run-time version of \N{NAME}, returning the string of characters whose Unicode name is its parameter. It can handle Unicode named character sequences, whereas the pre-existing charnames::vianame() cannot, as the latter returns a single code point.
See charnames for details on all these changes.
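A brief sketch of the two look-up styles on a 5.14-or-later perl:

```perl
use strict;
use warnings;
use charnames ();

# string_vianame() is the run-time counterpart of \N{NAME} and returns
# a string (possibly a multi-character named sequence); vianame()
# returns only a single code point as a number.
my $char = charnames::string_vianame("LATIN SMALL LETTER A");
my $code = charnames::vianame("LATIN SMALL LETTER A");
print "$char $code\n";    # a 97
```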
Three new warnings subcategories of "utf8" have been added. These
allow you to turn off some "utf8" warnings, while allowing
other warnings to remain on. The three categories are:
surrogate
when UTF-16 surrogates are encountered;
nonchar
when Unicode non-character code points are encountered;
and non_unicode
when code points above the legal Unicode
maximum of 0x10FFFF are encountered.
With this release, Perl is adopting a model that any unsigned value
can be treated as a code point and encoded internally (as utf8)
without warnings, not just the code points that are legal in Unicode.
However, unless utf8 or the corresponding sub-category (see previous
item) of lexical warnings have been explicitly turned off, outputting
or executing a Unicode-defined operation such as upper-casing
on such a code point generates a warning. Attempting to input these
using strict rules (such as with the :encoding(UTF-8)
layer)
will continue to fail. Prior to this release, handling was
inconsistent and in places, incorrect.
Unicode non-characters, some of which previously were erroneously considered illegal in places by Perl, contrary to the Unicode Standard, are now always legal internally. Inputting or outputting them works the same as with the non-legal Unicode code points, because the Unicode Standard says they are (only) illegal for "open interchange".
The Unicode database files are no longer installed with Perl. This doesn't affect any functionality in Perl and saves significant disk space. If you need these files, you can download them from http://www.unicode.org/Public/zipped/6.0.0/.
(?^...) construct signifies default modifiers
An ASCII caret "^" immediately following a "(?" in a regular expression now means that the subexpression does not inherit surrounding modifiers such as /i, but reverts to the Perl defaults. Any modifiers following the caret override the defaults.
Stringification of regular expressions now uses this notation.
For example, qr/hlagh/i would previously be stringified as
(?i-xsm:hlagh)
, but now it's stringified as (?^i:hlagh)
.
The main purpose of this change is to allow tests that rely on the stringification not to have to change whenever new modifiers are added. See Extended Patterns in perlre.
This change is likely to break code that compares stringified regular
expressions with fixed strings containing ?-xism
.
/d, /l, /u, and /a modifiers
Four new regular expression modifiers have been added. These are mutually exclusive: only one can be turned on at a time.
The /l
modifier says to compile the regular expression as if it were
in the scope of use locale
, even if it is not.
The /u
modifier says to compile the regular expression as if it were
in the scope of a use feature 'unicode_strings'
pragma.
The /d (default) modifier is used to override any use locale
and
use feature 'unicode_strings'
pragmas in effect at the time
of compiling the regular expression.
The /a
regular expression modifier restricts \s, \d
and \w
and
the POSIX ([[:posix:]]
) character classes to the ASCII range. Their
complements and \b
and \B
are correspondingly
affected. Otherwise, /a
behaves like the /u
modifier, in that
case-insensitive matching uses Unicode semantics.
If the /a
modifier is repeated, then additionally in case-insensitive
matching, no ASCII character can match a non-ASCII character.
For example,
- "k" =~ /\N{KELVIN SIGN}/ai
- "\xDF" =~ /ss/ai
match but
- "k" =~ /\N{KELVIN SIGN}/aai
- "\xDF" =~ /ss/aai
do not match.
See Modifiers in perlre for more detail.
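A runnable sketch of the ASCII restriction, using THAI DIGIT ONE as an example non-ASCII digit:

```perl
use strict;
use warnings;

my $thai_one = "\x{0E51}";    # THAI DIGIT ONE

# By default \d matches any Unicode decimal digit; under /a it is
# restricted to the ASCII digits 0-9.
my $default = $thai_one =~ /\A\d\z/  ? 1 : 0;
my $ascii   = $thai_one =~ /\A\d\z/a ? 1 : 0;
print "default=$default ascii=$ascii\n";    # default=1 ascii=0
```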
The substitution (s///) and transliteration
(y///) operators now support an /r
option that
copies the input variable, carries out the substitution on
the copy, and returns the result. The original remains unmodified.
This is particularly useful with map. See perlop for more examples.
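A small sketch of the /r option with map (the file names are invented for illustration):

```perl
use strict;
use warnings;

my @files = ("notes.txt", "todo.txt");

# /r returns the modified copy and leaves the original untouched,
# which makes non-destructive map transformations straightforward.
my @basenames = map { s/\.txt\z//r } @files;

print "@basenames\n";    # notes todo
print "@files\n";        # notes.txt todo.txt (unchanged)
```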
It is now safe to use regular expressions within (?{...}) and
(??{...})
code blocks inside regular expressions.
These blocks are still experimental, however, and still have problems with
lexical (my) variables and abnormal exiting.
use re '/flags'
The re
pragma now has the ability to turn on regular expression flags
till the end of the lexical scope:
- use re "/x";
- "foo" =~ / (.+) /; # /x implied
See '/flags' mode in re for details.
There is a new octal escape sequence, "\o"
, in doublequote-like
contexts. This construct allows large octal ordinals beyond the
current max of 0777 to be represented. It also allows you to specify a
character in octal which can safely be concatenated with other regex
snippets and which won't be confused with being a backreference to
a regex capture group. See Capture groups in perlre.
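A brief sketch of the new escape (0o101 octal is 65 decimal, i.e. "A"):

```perl
use strict;
use warnings;

# \o{} gives an unambiguous octal escape: "\o{101}" is chr(0101) = "A",
# and unlike "\101" it cannot be misread as a backreference when
# interpolated into a larger pattern.
print ord("\o{101}"), "\n";                         # 65
print "A" =~ /\A\o{101}\z/ ? "match\n" : "no match\n";
```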
\p{Titlecase}
as a synonym for \p{Title}
This synonym is added for symmetry with the Unicode property names
\p{Uppercase}
and \p{Lowercase}
.
Regular expression debugging output (turned on by use re 'debug'
) now
uses hexadecimal when escaping non-ASCII characters, instead of octal.
delete $+{...}
Custom regular expression engines can now determine the return value of
delete on an entry of %+
or %-
.
Warning: This feature is considered experimental, as the exact behaviour may change in a future version of Perl.
All builtin functions that operate directly on array or hash containers now also accept unblessed hard references to arrays or hashes:
- |----------------------------+---------------------------|
- | Traditional syntax | Terse syntax |
- |----------------------------+---------------------------|
- | push @$arrayref, @stuff | push $arrayref, @stuff |
- | unshift @$arrayref, @stuff | unshift $arrayref, @stuff |
- | pop @$arrayref | pop $arrayref |
- | shift @$arrayref | shift $arrayref |
- | splice @$arrayref, 0, 2 | splice $arrayref, 0, 2 |
- | keys %$hashref | keys $hashref |
- | keys @$arrayref | keys $arrayref |
- | values %$hashref | values $hashref |
- | values @$arrayref | values $arrayref |
- | ($k,$v) = each %$hashref | ($k,$v) = each $hashref |
- | ($k,$v) = each @$arrayref | ($k,$v) = each $arrayref |
- |----------------------------+---------------------------|
This allows these builtin functions to act on long dereferencing chains or on the return value of subroutines without needing to wrap them in @{} or %{}.
The +
prototype is a special alternative to $
that acts like
\[@%]
when given a literal array or hash variable, but will otherwise
force scalar context on the argument. See Prototypes in perlsub.
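A minimal sketch of the + prototype; the mypush name is purely illustrative:

```perl
use strict;
use warnings;

# With (+@), a literal array (or hash) as the first argument is passed
# by reference, as with \[@%]; anything else gets scalar context.
sub mypush (+@) {
    my ($aref, @items) = @_;
    push @$aref, @items;
}

my @list;
mypush @list, 1, 2;    # literal array: received as \@list
mypush \@list, 3;      # an explicit reference also works
print "@list\n";       # 1 2 3
```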
package block syntax
A package declaration can now contain a code block, in which case the
declaration is in scope inside that block only. So package Foo { ... }
is precisely equivalent to { package Foo; ... }
. It also works with
a version number in the declaration, as in package Foo 1.2 { ... }
,
which is its most attractive feature. See perlfunc.
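A quick sketch of the block form with a version number; the Point package is a made-up example:

```perl
use strict;
use warnings;

package Point 1.002 {
    sub new { my ($class, %args) = @_; bless {%args}, $class }
    sub x   { $_[0]{x} }
}
# Back here we are in the surrounding package (main) again.

my $p = Point->new(x => 3);
print $p->x, "\n";              # 3
print $Point::VERSION, "\n";    # 1.002
```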
Statement labels can now occur before any type of statement or declaration,
such as package.
Multiple statement labels can now appear before a single statement.
Literals may now use either upper case 0X...
or 0B...
prefixes,
in addition to the already supported 0x...
and 0b...
syntax [perl #76296].
C, Ruby, Python, and PHP already support this syntax, and it makes
Perl more internally consistent: a round-trip with eval sprintf
"%#X", 0x10
now returns 16
, just like eval sprintf "%#x", 0x10
.
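A small sketch of the upper-case prefixes and the round-trip they enable:

```perl
use strict;
use warnings;

print 0XFF,   "\n";   # 255, same as 0xFF
print 0B1010, "\n";   # 10,  same as 0b1010

# sprintf "%#X" emits an "0X" prefix, which eval can now read back:
print eval(sprintf "%#X", 0x10), "\n";   # 16
```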
tie, tied and untie can now be overridden [perl #75902].
To make them more reliable and consistent, several changes have been made
to how die, warn, and $@
behave.
When an exception is thrown inside an eval, the exception is no
longer at risk of being clobbered by destructor code running during unwinding.
Previously, the exception was written into $@
early in the throwing process, and would be overwritten if eval was
used internally in the destructor for an object that had to be freed
while exiting from the outer eval. Now the exception is written
into $@
last thing before exiting the outer eval, so the code
running immediately thereafter can rely on the value in $@
correctly
corresponding to that eval. ($@
is still also set before exiting the
eval, for the sake of destructors that rely on this.)
Likewise, a local $@
inside an eval no longer clobbers any
exception thrown in its scope. Previously, the restoration of $@
upon
unwinding would overwrite any exception being thrown. Now the exception
gets to the eval anyway. So local $@
is safe before a die.
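This can be sketched in a few lines:

```perl
use strict;
use warnings;

# Since 5.14, a "local $@" in scope when an exception is thrown no
# longer clobbers that exception on the way out to the enclosing eval.
eval {
    local $@;      # previously, restoring $@ here could erase the die below
    die "boom\n";
};
print $@;          # boom
```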
Exceptions thrown from object destructors no longer modify the $@
of the surrounding context. (If the surrounding context was exception
unwinding, this used to be another way to clobber the exception being
thrown.) Previously such an exception was
sometimes emitted as a warning, and then either was
string-appended to the surrounding $@
or completely replaced the
surrounding $@
, depending on whether that exception and the surrounding
$@
were strings or objects. Now, an exception in this situation is
always emitted as a warning, leaving the surrounding $@
untouched.
In addition to object destructors, this also affects any function call
run by XS code using the G_KEEPERR
flag.
Warnings for warn can now be objects in the same way as exceptions
for die. If an object-based warning gets the default handling
of writing to standard error, it is stringified as before with the
filename and line number appended. But a $SIG{__WARN__}
handler now
receives an object-based warning as an object, where previously it
was passed the result of stringifying the object.
$0 sets the legacy process name with prctl() on Linux
On Linux the legacy process name is now set with prctl(2), in
addition to altering the POSIX name via argv[0]
, as Perl has done
since version 4.000. Now system utilities that read the legacy process
name such as ps, top, and killall recognize the name you set when
assigning to $0
. The string you supply is truncated at 16 bytes;
this limitation is imposed by Linux.
This allows programs that need to have repeatable results not to have to come up with their own seed-generating mechanism. Instead, they can use srand() and stash the return value for future use. One example is a test program with too many combinations to test comprehensively in the time available for each run. It can test a random subset each time and, should there be a failure, log the seed used for that run so this can later be used to produce the same results.
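The seed-logging idea described above can be sketched like this:

```perl
use strict;
use warnings;

my $seed  = srand();                # srand() now returns the seed it used
my @first = map { rand() } 1 .. 3;  # ... log $seed somewhere ...

srand($seed);                       # later: reseed with the logged value
my @again = map { rand() } 1 .. 3;  # the sequence repeats exactly

print $first[0] == $again[0] ? "repeatable\n" : "different\n";
```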
Perl's printf and sprintf operators, and Perl's internal printf replacement
function, now understand the C90 size modifiers "hh" (char
), "z"
(size_t
), and "t" (ptrdiff_t
). Also, when compiled with a C99
compiler, Perl now understands the size modifier "j" (intmax_t
)
(but this is not portable).
So, for example, on any modern machine, sprintf("%hhd", 257)
returns "1".
${^GLOBAL_PHASE}
A new global variable, ${^GLOBAL_PHASE}
, has been added to allow
introspection of the current phase of the Perl interpreter. It's explained in
detail in ${^GLOBAL_PHASE} in perlvar and in
BEGIN, UNITCHECK, CHECK, INIT and END in perlmod.
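A tiny script shows the phases change over a program's lifetime:

```perl
use strict;
use warnings;

BEGIN { print "compiling: ${^GLOBAL_PHASE}\n" }   # START
print "running: ${^GLOBAL_PHASE}\n";              # RUN
END   { print "ending: ${^GLOBAL_PHASE}\n" }      # END
```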
-d:-foo
calls Devel::foo::unimport
The syntax -d:foo was extended in 5.6.1 to make -d:foo=bar
equivalent to -MDevel::foo=bar, which expands
internally to use Devel::foo 'bar'
.
Perl now allows prefixing the module name with -, with the same
semantics as -M; that is:
-d:-foo
Equivalent to -M-Devel::foo: expands to
no Devel::foo
and calls Devel::foo->unimport()
if that method exists.
-d:-foo=bar
Equivalent to -M-Devel::foo=bar: expands to no Devel::foo 'bar'
,
and calls Devel::foo->unimport("bar")
if that method exists.
This is particularly useful for suppressing the default actions of a
Devel::*
module's import method whilst still loading it for debugging.
When a method call on a filehandle would die because the method cannot
be resolved and IO::File has not been loaded, Perl now loads IO::File
via require and attempts method resolution again.
This also works for globs like STDOUT
, STDERR
, and STDIN
:
- STDOUT->autoflush(1);
Because this on-demand load happens only if method resolution fails, the legacy approach of manually loading an IO::File parent class for partial method support still works as expected.
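A small sketch of the on-demand load, using an in-memory filehandle:

```perl
use strict;
use warnings;
# Note: no "use IO::File" or "use IO::Handle" anywhere in this script.

open my $fh, '>', \my $buf or die "open: $!";
$fh->autoflush(1);       # method unknown: Perl loads IO::File, then retries
$fh->print("hello");     # now resolves via IO::Handle
close $fh;
print "$buf\n";          # hello
```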
The Socket
module provides new affordances for IPv6,
including implementations of the Socket::getaddrinfo()
and
Socket::getnameinfo()
functions, along with related constants and a
handful of new functions. See Socket.
The DTrace
probes now include an additional argument, arg3
, which contains
the package the subroutine being entered or left was compiled in.
For example, given a DTrace script that prints this new argument together with the subroutine name, running:
- $ perl -e 'sub test { }; test'
makes DTrace print:
- main::test
See Internal Changes.
User-Defined Character Properties in perlunicode documented that you can
create custom properties by defining subroutines whose names begin with
"In" or "Is". However, Perl did not actually enforce that naming
restriction, so \p{foo::bar}
could call foo::bar() if it existed. The documented
convention is now enforced.
Also, Perl no longer allows tainted regular expressions to invoke a user-defined property. It simply dies instead [perl #82616].
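A minimal sketch of a conforming user-defined property; the name InAsciiDigits is made up for illustration, and the sub returns ranges as "START\tEND" lines of hexadecimal code points:

```perl
use strict;
use warnings;

# User-defined property subs must now begin with "In" or "Is".
sub InAsciiDigits { return "0030\t0039\n" }   # U+0030 .. U+0039

print "5" =~ /\p{InAsciiDigits}/ ? "match\n" : "no match\n";   # match
print "x" =~ /\p{InAsciiDigits}/ ? "match\n" : "no match\n";   # no match
```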
Perl 5.14.0 is not binary-compatible with any previous stable release.
In addition to the sections that follow, see C API Changes.
Some characters match a sequence of two or three characters in /i
regular expression matching under Unicode rules. One example is
LATIN SMALL LETTER SHARP S
which matches the sequence ss
.
- 'ss' =~ /\A[\N{LATIN SMALL LETTER SHARP S}]\z/i # Matches
This, however, can lead to very counter-intuitive results, especially
when inverted. Because of this, Perl 5.14 does not use multi-character /i
matching in inverted character classes.
- 'ss' =~ /\A[^\N{LATIN SMALL LETTER SHARP S}]+\z/i # ???
This should match any sequences of characters that aren't the SHARP S
nor what SHARP S
matches under /i. "s"
isn't SHARP S
, but
Unicode says that "ss"
is what SHARP S
matches under /i. So
which one "wins"? Do you fail the match because the string has ss
or
accept it because it has an s followed by another s?
Earlier releases of Perl did allow this multi-character matching, but due to bugs, it mostly did not work.
In certain circumstances, \400
-\777
in regexes have behaved
differently than they behave in all other doublequote-like contexts.
Since 5.10.1, Perl has issued a deprecation warning when this happens.
Now, these literals behave the same in all doublequote-like contexts,
namely to be equivalent to \x{100}
-\x{1FF}
, with no deprecation
warning.
Use of \400
-\777
in the command-line option -0 retains its
conventional meaning: it slurps whole input files. Previously, this
was documented only for -0777.
Because of various ambiguities, you should use the new
\o{...}
construct to represent characters in octal instead.
\p{} properties are now immune to case-insensitive matching
For most Unicode properties, it doesn't make sense to have them match
differently under /i case-insensitive matching. Doing so can lead
to unexpected results and potential security holes. For example
- m/\p{ASCII_Hex_Digit}+/i
could previously match non-ASCII characters because of the Unicode
matching rules (although there were several bugs with this). Now
matching under /i gives the same results as non-/i matching except
for those few properties where people have come to expect differences,
namely the ones where casing is an integral part of their meaning, such
as m/\p{Uppercase}/i and m/\p{Lowercase}/i, both of which match
the same code points as matched by m/\p{Cased}/i.
Details are in Unicode Properties in perlrecharclass.
User-defined property handlers that need to match differently under /i
must be changed to read the new boolean parameter passed to them, which
is non-zero if case-insensitive matching is in effect and 0 otherwise.
See User-Defined Character Properties in perlunicode.
Specifying a Unicode property in the pattern indicates
that the pattern is meant for matching according to Unicode rules, the way
\N{NAME} does.
Regular expressions compiled under use locale
now retain this when
interpolated into a new regular expression compiled outside a
use locale
, and vice-versa.
Previously, one regular expression interpolated into another inherited the localeness of the surrounding regex, losing whatever state it originally had. This is considered a bug fix, but may trip up code that has come to rely on the incorrect behaviour.
Default regular expression modifiers are now notated using
(?^...)
. Code relying on the old stringification will fail.
This is so that when new modifiers are added, such code won't
have to keep changing each time this happens, because the stringification
will automatically incorporate the new modifiers.
Code that needs to work properly with both old- and new-style regexes can avoid the whole issue by using the re module's regexp_pattern() function (available for perls since 5.9.5; see re).
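A minimal sketch, assuming re::regexp_pattern (available since Perl 5.9.5) is the intended mechanism:

```perl
use strict;
use warnings;
use re qw(regexp_pattern);

# regexp_pattern() decomposes a qr// object into its pattern and the
# modifiers in effect, independently of how the qr// stringifies.
my ($pat, $mods) = regexp_pattern(qr/foo/i);
print "$pat\n";    # foo
print "$mods\n";   # the modifier string; includes "i"
```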
If the actual stringification is important or older Perls need to be supported, you can use something like the following:
- # Accept both old and new-style stringification
- my $modifiers = (qr/foobar/ =~ /\Q(?^/) ? "^" : "-xism";
And then use $modifiers
instead of -xism
.
Code blocks in regular expressions ((?{...}) and (??{...})
) previously
did not inherit pragmata (strict, warnings, etc.) if the regular expression
was compiled at run time as happens in cases like these two:
- use re "eval";
- $foo =~ $bar; # when $bar contains (?{...})
- $foo =~ /$bar(?{ $finished = 1 })/;
This bug has now been fixed, but code that relied on the buggy behaviour may need to be fixed to account for the correct behaviour.
Localising a tied array with local previously caused the new local array to be incorrectly tied as well. This has now been fixed. This fix could, however, potentially cause a change in behaviour of some code.
defined %Foo::
now always returns true, even when no symbols have yet been
defined in that package.
This is a side-effect of removing a special-case kludge in the tokeniser, added for 5.10.0, to hide side-effects of changes to the internal storage of hashes. The fix drastically reduces hashes' memory overhead.
Calling defined on a stash has been deprecated since 5.6.0, warned on
lexicals since 5.6.0, and warned for stashes and other package
variables since 5.12.0. defined %hash
has always exposed an
implementation detail: emptying a hash by deleting all entries from it does
not make defined %hash
false. Hence defined %hash
is not valid code to
determine whether an arbitrary hash is empty. Instead, use the behaviour
of an empty %hash
always returning false in scalar context.
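The recommended emptiness test can be sketched as follows:

```perl
use strict;
use warnings;

my %hash = (a => 1);
delete $hash{a};

# Don't use defined %hash to test emptiness; use the hash in boolean context:
print %hash ? "non-empty\n" : "empty\n";   # empty

$hash{b} = 2;
print %hash ? "non-empty\n" : "empty\n";   # non-empty
```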
Stash list assignment %foo:: = ()
used to make the stash temporarily
anonymous while it was being emptied. Consequently, any of its
subroutines referenced elsewhere would become anonymous, showing up as
"(unknown)" in caller. They now retain their package names such that
caller returns the original sub name if there is still a reference
to its typeglob and "foo::__ANON__" otherwise [perl #79208].
If you assign a typeglob to a scalar variable:
- $glob = *foo;
the glob that is copied to $glob
is marked with a special flag
indicating that the glob is just a copy. This allows subsequent
assignments to $glob
to overwrite the glob. The original glob,
however, is immutable.
Some Perl operators did not distinguish between these two types of globs.
This would result in strange behaviour in edge cases: untie $scalar
would not untie the scalar if the last thing assigned to it was a glob
(because it treated it as untie *$scalar
, which unties a handle).
Assignment to a glob slot (such as *$glob = \@some_array
) would simply
assign \@some_array
to $glob
.
To fix this, the *{}
operator (including its *foo
and *$foo
forms)
has been modified to make a new immutable glob if its operand is a glob
copy. This allows operators that make a distinction between globs and
scalars to be modified to treat only immutable globs as globs. (tie,
tied and untie have been left as they are for compatibility's sake,
but will warn. See Deprecations.)
This causes an incompatible change in code that assigns a glob to the
return value of *{}
when that operator was passed a glob copy. Take the
following code, for instance:
- $glob = *foo;
- *$glob = *bar;
The *$glob
on the second line returns a new immutable glob. That new
glob is made an alias to *bar
. Then it is discarded. So the second
assignment has no effect.
See http://rt.perl.org/rt3/Public/Bug/Display.html?id=77810 for more detail.
In previous versions of Perl, magic variables like $!
, %SIG
, etc. would
"leak" into other packages. So %foo::SIG
could be used to access signals,
${"foo::!"}
(with strict mode off) to access C's errno
, etc.
This was a bug, or an "unintentional" feature, which caused various ill effects, such as signal handlers being wiped when modules were loaded, etc.
This has been fixed (or the feature has been removed, depending on how you see it).
local() on scalar variables gives them a new value but keeps all their magic intact. This has proven problematic for the default scalar variable $_, where perlsub recommends that any subroutine that assigns to $_ should first localize it. This would throw an exception if $_ is aliased to a read-only variable, and could in general have various unintentional side-effects.
Therefore, as an exception to the general rule, local($_) will not only assign a new value to $_, but also remove all existing magic from it as well.
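A small sketch of why this matters; the shout() helper is made up for illustration:

```perl
use strict;
use warnings;

sub shout {
    # perlsub's advice: localize $_ before assigning to it. Since 5.14
    # this is safe even when the caller's $_ is aliased to a read-only value.
    local $_ = shift;
    tr/a-z/A-Z/;
    return $_;
}

for ("abc") {               # here $_ is aliased to a read-only constant
    print shout($_), "\n";  # ABC
}
```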
Parsing the names of packages and package variables has changed:
multiple adjacent pairs of colons, as in foo::::bar
, are now all
treated as package separators.
Regardless of this change, the exact parsing of package separators has never been guaranteed and is subject to change in future Perl versions.
given return values
given blocks now return the last evaluated expression, or an empty list if the block was exited by break. This means a given block can now be used as an expression that yields a value.
See Return value in perlsyn for details.
Functions declared with the following prototypes now behave correctly as unary functions:
- *
- \$ \% \@ \* \&
- \[...]
- ;$ ;*
- ;\$ ;\% etc.
- ;\[...]
Due to this bug fix [perl #75904], functions
using the (*), (;$)
and (;*)
prototypes
are parsed with higher precedence than before. So
in the following example:
- sub foo(;$);
- foo $a < $b;
the second line is now parsed correctly as foo($a) < $b
, rather than
foo($a < $b)
. This happens when one of these operators is used in
an unparenthesised argument:
- < > <= >= lt gt le ge
- == != <=> eq ne cmp ~~
- &
- | ^
- &&
- || //
- .. ...
- ?:
- = += -= *= etc.
- , =>
Previously, certain code incorrectly resulted in a successful match. This odd behaviour has now been fixed [perl #77468].
The unary negation operator, -
, now treats strings that look like numbers
as numbers [perl #57706].
Negative zero (-0.0), when converted to a string, now becomes "0" on all platforms. It used to become "-0" on some, but "0" on others.
If you still need to determine whether a zero is negative, use
sprintf("%g", $zero) =~ /^-/
or the Data::Float module on CPAN.
:= is now a syntax error
Previously my $pi := 4
was exactly equivalent to my $pi : = 4
,
with the :
being treated as the start of an attribute list, ending before
the =
. The use of :=
to mean : =
was deprecated in 5.12.0, and is
now a syntax error. This allows future use of :=
as a new token.
Outside the core's tests for it, we find no Perl 5 code on CPAN using this construction, so we believe that this change will have little impact on real-world codebases.
If it is absolutely necessary to have empty attribute lists (for example,
because of a code generator), simply avoid the error by adding a space before
the =
.
Characters outside the Unicode "XIDStart" set are no longer allowed at the beginning of an identifier. This means that certain accents and marks that normally follow an alphabetic character may no longer be the first character of an identifier.
On systems other than Windows that do not have
a fchdir
function, newly-created threads no
longer inherit directory handles from their parent threads. Such programs
would usually have crashed anyway [perl #75154].
close on shared pipes
To avoid deadlocks, the close function no longer waits for the
child process to exit if the underlying file descriptor is still
in use by another thread. It returns true in such cases.
On Windows parent processes would not terminate until all forked
children had terminated first. However, kill("KILL", ...)
is
inherently unstable on pseudo-processes, and kill("TERM", ...)
might not get delivered if the child is blocked in a system call.
To avoid the deadlock and still provide a safe mechanism to terminate the hosting process, Perl now no longer waits for children that have been sent a SIGTERM signal. It is up to the parent process to waitpid() for these children if child-cleanup processing must be allowed to finish. However, it is also then the responsibility of the parent to avoid the deadlock by making sure the child process can't be blocked on I/O.
See perlfork for more information about the fork() emulation on Windows.
Several long-standing typos and naming confusions in Policy_sh.SH have been fixed, standardizing on the variable names used in config.sh.
This will change the behaviour of Policy.sh if you happen to have been accidentally relying on its incorrect behaviour.
Perl scripts used to be read in binary mode on Windows for the benefit
of the ByteLoader module (which is no longer part of core Perl). This
had the side-effect of breaking various operations on the DATA
filehandle,
including seek()/tell(), and even simply reading from DATA
after filehandles
have been flushed by a call to system(), backticks, fork() etc.
The default build options for Windows have been changed to read Perl source code on Windows in text mode now. ByteLoader will (hopefully) be updated on CPAN to automatically handle this situation [perl #28106].
See also Deprecated C APIs.
Omitting the space between a regular expression operator or
its modifiers and the following word is deprecated. For
example, m/foo/sand $bar
is for now still parsed
as m/foo/s and $bar
, but will now issue a warning.
\cX
The backslash-c construct was designed as a way of specifying
non-printable characters, but there were no restrictions (on ASCII
platforms) on what the character following the c
could be. Now,
a deprecation warning is raised if that character isn't an ASCII character.
Also, a deprecation warning is raised for "\c{"
(which is the same
as simply saying ";"
).
"\b{"
and "\B{"
In regular expressions, a literal "{"
immediately following a "\b"
(not in a bracketed character class) or a "\B"
is now deprecated
to allow for its future use by Perl itself.
Perl bundles a handful of library files that predate Perl 5. This bundling is now deprecated for most of these files, which are now available from CPAN. The affected files now warn when run, if they were installed as part of the core.
This is a mandatory warning, not obeying -X or lexical warning bits. The warning is modelled on that supplied by deprecate.pm for deprecated-in-core .pm libraries. It points to the specific CPAN distribution that contains the .pl libraries. The CPAN versions, of course, do not generate the warning.
$[
Assignment to $[
was deprecated and started to give warnings in
Perl version 5.12.0. This version of Perl (5.14) now also emits a warning
when assigning to $[
in list context. This fixes an oversight in 5.12.0.
Historically the parser fooled itself into thinking that qw(...) literals
were always enclosed in parentheses, and as a result you could sometimes omit
parentheses around them:
- for $x qw(a b c) { ... }
The parser no longer lies to itself in this way. Wrap the list literal in parentheses like this:
- for $x (qw(a b c)) { ... }
This is being deprecated because the parentheses in for $i (1,2,3) { ... }
are not part of expression syntax. They are part of the statement
syntax, with the for
statement wanting literal parentheses.
The synthetic parentheses that a qw expression acquired were only
intended to be treated as part of expression syntax.
Note that this does not change the behaviour of ordinary uses of qw(...) as a list expression, where parentheses were never required around the expression.
\N{BELL}
Use of this name is deprecated, because Unicode is now using it for a different character. See Unicode Version 6.0 is now supported (mostly) for more explanation.
?PATTERN?
?PATTERN?
(without the initial m) has been deprecated and now produces
a warning. This is to allow future use of ? in new operators.
The match-once functionality is still available as m?PATTERN?.
Calling a tie function (tie, tied, untie) with a scalar argument
acts on a filehandle if the scalar happens to hold a typeglob.
This is a long-standing bug that will be removed in Perl 5.16, as there is currently no way to tie the scalar itself when it holds a typeglob, and no way to untie a scalar that has had a typeglob assigned to it.
Now there is a deprecation warning whenever a tie
function is used on a handle without an explicit *
.
This feature is being deprecated due to its many issues, as documented in User-Defined Case Mappings (for serious hackers only) in perlunicode. This feature will be removed in Perl 5.16. Instead use the CPAN module Unicode::Casing, which provides improved functionality.
The following module will be removed from the core distribution in a future release, and should be installed from CPAN instead. Distributions on CPAN that require it should add it to their prerequisites. The core version of this module now issues a deprecation warning.
If you ship a packaged version of Perl, either alone or as part of a
larger system, then you should carefully consider the repercussions of
core module deprecations. You may want to consider shipping your default
build of Perl with a package for the deprecated module that
installs into vendor
or site
Perl library directories. This will
inhibit the deprecation warnings.
Alternatively, you may want to consider patching lib/deprecate.pm to provide deprecation warnings specific to your packaging system or distribution of Perl, consistent with how your packaging system or distribution manages a staged transition from a release where the installation of a single package provides the given functionality, to a later release where the system administrator needs to know to install multiple packages to get that same functionality.
You can silence these deprecation warnings by installing the module
in question from CPAN. To install the latest version of it by role
rather than by name, just install Task::Deprecations::5_14
.
We strongly recommend that you install and use Devel::NYTProf instead of Devel::DProf, as Devel::NYTProf offers significantly improved profiling and reporting.
Signal dispatch has been moved from the runloop into control ops. This should give a few percent speed increase, and eliminates nearly all the speed penalty caused by the introduction of "safe signals" in 5.8.0. Signals should still be dispatched within the same statement as they were previously. If this does not happen, or if you find it possible to create uninterruptible loops, this is a bug, and reports are encouraged of how to recreate such issues.
Two fewer OPs are used for shift() and pop() calls with no argument (with
implicit @_
). This change makes shift() 5% faster than shift @_
on non-threaded perls, and 25% faster on threaded ones.
The foldEQ_utf8
API function for case-insensitive comparison of strings (which
is used heavily by the regexp engine) was substantially refactored and
optimised -- and its documentation much improved as a free bonus.
Compiling regular expressions has been made faster when upgrading the regex to utf8 is necessary but this isn't known when the compilation begins.
When doing a lot of string appending, perls built to use the system's
malloc
could end up allocating a lot more memory than needed in an
inefficient way.
sv_grow
, the function used to allocate more memory if necessary
when appending to a string, has been taught to round up the memory
it requests to a certain geometric progression, making it much faster on
certain platforms and configurations. On Win32, it's now about 100 times
faster.
PL_*
accessor functions under ithreads
When MULTIPLICITY
was first developed, and interpreter state moved into
an interpreter struct, thread- and interpreter-local PL_*
variables
were defined as macros that called accessor functions (returning the
address of the value) outside the Perl core. The intent was to allow
members within the interpreter struct to change size without breaking
binary compatibility, so that bug fixes could be merged to a maintenance
branch that necessitated such a size change. This mechanism was redundant
and penalised well-behaved code. It has been removed.
When there are many weak references to an object, freeing that object can under some circumstances take O(N*N) time, where N is the number of references. The circumstances in which this can happen have been reduced [perl #75254].
An earlier optimisation to speed up my @array = ...
and
my %hash = ...
assignments caused a bug and was disabled in Perl 5.12.0.
Now we have found another way to speed up these assignments [perl #82110].
@_
uses less memory
Previously, @_
was allocated for every subroutine at compile time with
enough space for four entries. Now this allocation is done on demand when
the subroutine is called [perl #72416].
xhv_fill
has been eliminated from struct xpvhv
, saving 1 IV per hash and
on some systems will cause struct xpvhv
to become cache-aligned. To avoid
this memory saving causing a slowdown elsewhere, boolean use of HvFILL
now calls HvTOTALKEYS
instead (which is equivalent), so while the fill
data when actually required are now calculated on demand, cases when
this needs to be done should be rare.
The order of structure elements in SV bodies has changed. Effectively, the NV slot has swapped location with STASH and MAGIC. As all access to SV members is via macros, this should be completely transparent. This change allows the space saving for PVHVs documented above, and may reduce the memory allocation needed for PVIVs on some architectures.
XPV
, XPVIV
, and XPVNV
now allocate only the parts of the SV
body
they actually use, saving some space.
Scalars containing regular expressions now allocate only the part of the SV
body they actually use, saving some space.
The @EXPORT_FAIL
AV is no longer created unless needed, hence neither is
the typeglob backing it. This saves about 200 bytes for every package that
uses Exporter but doesn't use this functionality.
For weak references, the common case of just a single weak reference per referent has been optimised to reduce the storage required. In this case it saves the equivalent of one small Perl array per referent.
%+
and %-
use less memory
The bulk of the Tie::Hash::NamedCapture
module used to be in the Perl
core. It has now been moved to an XS module to reduce overhead for
programs that do not use %+
or %-
.
The internal structures of threading now make fewer API calls and fewer allocations, resulting in noticeably smaller object code. Additionally, many thread context checks have been deferred so they're done only as needed (although this is only possible for non-debugging builds).
Previously, given a false constant such as use constant DEBUG => 0, in code such as warn "..." if DEBUG,
the ops for warn if DEBUG
would be folded to a null
op (ex-const
), but
the nextstate
op would remain, resulting in a runtime op dispatch of
nextstate
, nextstate
, etc.
The execution of a sequence of nextstate
ops is indistinguishable from just
the last nextstate
op so the peephole optimizer now eliminates the first of
a pair of nextstate
ops except when the first carries a label, since labels
must not be eliminated by the optimizer, and label usage isn't conclusively known
at compile time.
CPAN::Meta::YAML 0.003 has been added as a dual-life module. It supports a subset of YAML sufficient for reading and writing META.yml and MYMETA.yml files included with CPAN distributions or generated by the module installation toolchain. It should not be used for any other general YAML parsing or generation task.
CPAN::Meta version 2.110440 has been added as a dual-life module. It provides a standard library to read, interpret and write CPAN distribution metadata files (like META.json and META.yml) that describe a distribution, its contents, and the requirements for building it and installing it. The latest CPAN distribution metadata specification is included as CPAN::Meta::Spec and notes on changes in the specification over time are given in CPAN::Meta::History.
HTTP::Tiny 0.012 has been added as a dual-life module. It is a very small, simple HTTP/1.1 client designed for simple GET requests and file mirroring. It has been added so that CPAN.pm and CPANPLUS can "bootstrap" HTTP access to CPAN using pure Perl without relying on external binaries like curl(1) or wget(1).
JSON::PP 2.27105 has been added as a dual-life module to allow CPAN clients to read META.json files in CPAN distributions.
Module::Metadata 1.000004 has been added as a dual-life module. It gathers package and POD information from Perl module files. It is a standalone module based on Module::Build::ModuleInfo for use by other module installation toolchain components. Module::Build::ModuleInfo has been deprecated in favor of this module.
Perl::OSType 1.002 has been added as a dual-life module. It maps Perl operating system names (like "dragonfly" or "MSWin32") to more generic types with standardized names (like "Unix" or "Windows"). It has been refactored out of Module::Build and ExtUtils::CBuilder and consolidates such mappings into a single location for easier maintenance.
The following modules were added by the Unicode::Collate upgrade. See below for details.
Version::Requirements version 0.101020 has been added as a dual-life module. It provides a standard library to model and manipulate module prerequisites and version constraints defined in CPAN::Meta::Spec.
attributes has been upgraded from version 0.12 to 0.14.
Archive::Extract has been upgraded from version 0.38 to 0.48.
Updates since 0.38 include: a safe print method that guards Archive::Extract from changes to $\; a fix to the tests when run in core Perl; support for TZ files; a change to the lzma logic to favour IO::Uncompress::Unlzma; and a fix for an issue with NetBSD-current and its new unzip(1) executable.
Archive::Tar has been upgraded from version 1.54 to 1.76.
Important changes since 1.54 include the following:
Compatibility with busybox implementations of tar(1).
A fix so that write() and create_archive() close only filehandles they themselves opened.
A bug was fixed regarding the exit code of extract_archive.
The ptar(1) utility has a new option to allow safe creation of tarballs without world-writable files on Windows, allowing those archives to be uploaded to CPAN.
A new ptargrep(1) utility for using regular expressions against the contents of files in a tar archive.
pax extended headers are now skipped.
Attribute::Handlers has been upgraded from version 0.87 to 0.89.
autodie has been upgraded from version 2.06_01 to 2.1001.
AutoLoader has been upgraded from version 5.70 to 5.71.
The B module has been upgraded from version 1.23 to 1.29.
It no longer crashes when taking apart a y/// containing characters outside the octet range or compiled in a use utf8 scope.
The size of the shared object has been reduced by about 40%, with no reduction in functionality.
B::Concise has been upgraded from version 0.78 to 0.83.
B::Concise marks rv2sv(), rv2av(), and rv2hv() ops with the new OPpDEREF flag as "DREFed".
It no longer produces mangled output with the -tree option [perl #80632].
B::Debug has been upgraded from version 1.12 to 1.16.
B::Deparse has been upgraded from version 0.96 to 1.03.
The deparsing of a nextstate op has changed when it has both a change of state (a change of package relative to the previous nextstate, or a change of %^H or other state) and a label. The label was previously emitted first, but is now emitted last (5.12.1).
The no 5.13.2 or similar form is now correctly handled by B::Deparse (5.12.3).
B::Deparse now properly handles code that applies a conditional pattern match against implicit $_, as fixed in [perl #20444].
Deparsing of our followed by a variable with funny characters (as permitted under the use utf8 pragma) has also been fixed [perl #33752].
B::Lint has been upgraded from version 1.11_01 to 1.13.
base has been upgraded from version 2.15 to 2.16.
Benchmark has been upgraded from version 1.11 to 1.12.
bignum has been upgraded from version 0.23 to 0.27.
Carp has been upgraded from version 1.15 to 1.20.
Carp now detects incomplete caller EXPR overrides and avoids using bogus @DB::args. To provide backtraces, Carp relies on particular behaviour of the caller() builtin; it now detects whether other code has overridden this with an incomplete implementation and modifies its backtrace accordingly. Previously, incomplete overrides would cause incorrect values in backtraces (best case) or obscure fatal errors (worst case).
This fixes certain cases of "Bizarre copy of ARRAY" caused by modules overriding caller() incorrectly (5.12.2).
It now also avoids using regular expressions that cause Perl to load its Unicode tables, so as to avoid the "BEGIN not safe after errors" error that ensues if there has been a syntax error [perl #82854].
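As a reminder of the ordinary Carp usage whose backtraces depend on caller(), a minimal sketch (double_positive is an illustrative name):

```perl
use strict;
use warnings;
use Carp qw(croak);

# croak() reports the error from the caller's point of view; building
# that report is what relies on caller() behaving as documented.
sub double_positive {
    my ($n) = @_;
    croak "argument must be non-negative" if $n < 0;
    return $n * 2;
}

my $ok = eval { double_positive(-1); 1 };
print $ok ? "no error\n" : "caught: $@";
```

With the upgraded Carp, an incomplete third-party override of caller() degrades the backtrace gracefully instead of producing bogus values or fatal errors.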
CGI has been upgraded from version 3.48 to 3.52.
This provides the following security fixes: the MIME boundary in multipart_init() is now random and the handling of newlines embedded in header values has been improved.
Compress::Raw::Bzip2 has been upgraded from version 2.024 to 2.033.
It has been updated to use bzip2(1) 1.0.6.
Compress::Raw::Zlib has been upgraded from version 2.024 to 2.033.
constant has been upgraded from version 1.20 to 1.21.
Unicode constants work once more. They had been broken since Perl 5.10.0 [CPAN RT #67525].
CPAN has been upgraded from version 1.94_56 to 1.9600.
Major highlights:
Foo/Bar.pm on the command line is now allowed to mean Foo::Bar.
CPANPLUS has been upgraded from version 0.90 to 0.9103.
A change to cpanp-run-perl resolves RT #55964 and RT #57106, both of which related to failures to install distributions that use Module::Install::DSL (5.12.2).
A dependency on Config was not recognised as a core module dependency. This has been fixed.
CPANPLUS now includes support for META.json and MYMETA.json.
CPANPLUS::Dist::Build has been upgraded from version 0.46 to 0.54.
Data::Dumper has been upgraded from version 2.125 to 2.130_02.
The indentation used to be off when $Data::Dumper::Terse was set. This has been fixed [perl #73604].
This upgrade also fixes a crash when using custom sort functions that might cause the stack to change [perl #74170].
Dumpxs no longer crashes with globs returned by *$io_ref [perl #72332].
DB_File has been upgraded from version 1.820 to 1.821.
DBM_Filter has been upgraded from version 0.03 to 0.04.
Devel::DProf has been upgraded from version 20080331.00 to 20110228.00.
Merely loading Devel::DProf no longer triggers profiling to start. Both use Devel::DProf and perl -d:DProf ... behave as before and start the profiler.
NOTE: Devel::DProf is deprecated and will be removed from a future version of Perl. We strongly recommend that you install and use Devel::NYTProf instead, as it offers significantly improved profiling and reporting.
Devel::Peek has been upgraded from version 1.04 to 1.07.
Devel::SelfStubber has been upgraded from version 1.03 to 1.05.
diagnostics has been upgraded from version 1.19 to 1.22.
It now renders pod links slightly better, and has been taught to find descriptions for messages that share their descriptions with other messages.
Digest::MD5 has been upgraded from version 2.39 to 2.51.
It is now safe to use this module in combination with threads.
Digest::SHA has been upgraded from version 5.47 to 5.61.
shasum now more closely mimics sha1sum(1)/md5sum(1).
addfile accepts all POSIX filenames.
New SHA-512/224 and SHA-512/256 transforms (ref. NIST Draft FIPS 180-4 [February 2011]).
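A short sketch of the new transforms, assuming a Digest::SHA recent enough to export them, applied to the classic "abc" test vector:

```perl
use strict;
use warnings;
use Digest::SHA qw(sha512_224_hex sha512_256_hex);

# The two truncated SHA-512 variants from the NIST draft, as hex digests:
print sha512_224_hex("abc"), "\n";
print sha512_256_hex("abc"), "\n";
```

Both use the SHA-512 compression function with distinct initial values, then truncate the output to 224 or 256 bits.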
DirHandle has been upgraded from version 1.03 to 1.04.
Dumpvalue has been upgraded from version 1.13 to 1.16.
DynaLoader has been upgraded from version 1.10 to 1.13.
It fixes a buffer overflow when passed a very long file name.
It no longer inherits from AutoLoader; hence it no longer produces weird error messages for unsuccessful method calls on classes that inherit from DynaLoader [perl #84358].
Encode has been upgraded from version 2.39 to 2.42.
Now, all 66 Unicode non-characters are treated the same way U+FFFF has always been treated: in cases when it was disallowed, all 66 are disallowed, and in cases where it warned, all 66 warn.
Env has been upgraded from version 1.01 to 1.02.
Errno has been upgraded from version 1.11 to 1.13.
The implementation of Errno has been refactored to use about 55% less memory.
On some platforms with unusual header files, like Win32 gcc(1) using mingw64 headers, some constants that weren't actually error numbers were exposed by Errno. This has been fixed [perl #77416].
Exporter has been upgraded from version 5.64_01 to 5.64_03.
Exporter no longer overrides $SIG{__WARN__} [perl #74472].
ExtUtils::CBuilder has been upgraded from version 0.27 to 0.280203.
ExtUtils::Command has been upgraded from version 1.16 to 1.17.
ExtUtils::Constant has been upgraded from 0.22 to 0.23.
The AUTOLOAD helper code generated by ExtUtils::Constant::ProxySubs can now croak() for missing constants, or generate a complete AUTOLOAD subroutine in XS, allowing simplification of many modules that use it (Fcntl, File::Glob, GDBM_File, I18N::Langinfo, POSIX, Socket).
ExtUtils::Constant::ProxySubs can now optionally push the names of all constants onto the package's @EXPORT_OK.
ExtUtils::Install has been upgraded from version 1.55 to 1.56.
ExtUtils::MakeMaker has been upgraded from version 6.56 to 6.57_05.
ExtUtils::Manifest has been upgraded from version 1.57 to 1.58.
ExtUtils::ParseXS has been upgraded from version 2.21 to 2.2210.
Fcntl has been upgraded from version 1.06 to 1.11.
File::Basename has been upgraded from version 2.78 to 2.82.
File::CheckTree has been upgraded from version 4.4 to 4.41.
File::Copy has been upgraded from version 2.17 to 2.21.
File::DosGlob has been upgraded from version 1.01 to 1.04.
It allows patterns containing literal parentheses: they no longer need to be escaped. On Windows, it no longer adds an extra ./ to file names returned when the pattern is a relative glob with a drive specification, like C:*.pl [perl #71712].
File::Fetch has been upgraded from version 0.24 to 0.32.
HTTP::Lite is now supported for the "http" scheme.
The fetch(1) utility is supported on FreeBSD, NetBSD, and Dragonfly BSD for the http and ftp schemes.
File::Find has been upgraded from version 1.15 to 1.19.
It improves handling of backslashes on Windows, so that paths like C:\dir\/file are no longer generated [perl #71710].
File::Glob has been upgraded from version 1.07 to 1.12.
File::Spec has been upgraded from version 3.31 to 3.33.
Several portability fixes were made in File::Spec::VMS: a colon is now recognized as a delimiter in native filespecs; caret-escaped delimiters are recognized for better handling of extended filespecs; catpath() returns an empty directory rather than the current directory if the input directory name is empty; and abs2rel() properly handles Unix-style input (5.12.2).
File::stat has been upgraded from 1.02 to 1.05.
The -x and -X file test operators now work correctly when run by the superuser.
Filter::Simple has been upgraded from version 0.84 to 0.86.
GDBM_File has been upgraded from 1.10 to 1.14.
This fixes a memory leak when DBM filters are used.
Hash::Util has been upgraded from 0.07 to 0.11.
Hash::Util no longer emits spurious "uninitialized" warnings when recursively locking hashes that have undefined values [perl #74280].
Hash::Util::FieldHash has been upgraded from version 1.04 to 1.09.
I18N::Collate has been upgraded from version 1.01 to 1.02.
I18N::Langinfo has been upgraded from version 0.03 to 0.08.
langinfo() now defaults to using $_ if there is no argument given, just as the documentation has always claimed.
I18N::LangTags has been upgraded from version 0.35 to 0.35_01.
if has been upgraded from version 0.05 to 0.0601.
IO has been upgraded from version 1.25_02 to 1.25_04.
This version of IO includes a new IO::Select, which now allows IO::Handle objects (and objects in derived classes) to be removed from an IO::Select set even if the underlying file descriptor is closed or invalid.
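A minimal sketch of the new IO::Select behaviour: a handle whose underlying descriptor has already been closed can still be removed from the set.

```perl
use strict;
use warnings;
use IO::Select;

pipe(my $reader, my $writer) or die "pipe: $!";
my $sel = IO::Select->new($reader);

close $reader;            # the descriptor is now invalid...
$sel->remove($reader);    # ...but the handle can still be removed

print $sel->count, "\n";
```

Previously the stale handle could be left stranded in the set, since removal relied on a valid fileno().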
IPC::Cmd has been upgraded from version 0.54 to 0.70.
Resolves an issue with splitting Win32 command lines. An argument consisting of the single character "0" used to be omitted (CPAN RT #62961).
IPC::Open3 has been upgraded from 1.05 to 1.09.
open3() now produces an error if the exec call fails, allowing this condition to be distinguished from a child process that exited with a non-zero status [perl #72016].
The internal xclose() routine now knows how to handle file descriptors as documented, so duplicating STDIN in a child process using its file descriptor now works [perl #76474].
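A sketch of the first fix (the command name is assumed not to exist on the system): exec failure now surfaces as an error in the parent rather than looking like a child that merely exited non-zero.

```perl
use strict;
use warnings;
use IPC::Open3;
use Symbol 'gensym';

# 'no-such-command-zz9' is an illustrative, presumed-nonexistent command.
my $err = gensym;
my $pid = eval { open3(my $in, my $out, $err, 'no-such-command-zz9') };

print defined $pid ? "started\n" : "exec failed\n";
```

On Unix-like systems the failed exec is reported back to the parent, so the eval traps a die rather than returning a pid for a doomed child.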
IPC::SysV has been upgraded from version 2.01 to 2.03.
lib has been upgraded from version 0.62 to 0.63.
Locale::Maketext has been upgraded from version 1.14 to 1.19.
Locale::Maketext now supports external caches.
This upgrade also fixes an infinite loop in Locale::Maketext::Guts::_compile() when working with tainted values (CPAN RT #40727).
->maketext calls now back up and restore $@ so error messages are not suppressed (CPAN RT #34182).
Log::Message has been upgraded from version 0.02 to 0.04.
Log::Message::Simple has been upgraded from version 0.06 to 0.08.
Math::BigInt has been upgraded from version 1.89_01 to 1.994.
This fixes, among other things, incorrect results when computing binomial coefficients [perl #77640].
It also prevents sqrt($int) from crashing under use bigrat [perl #73534].
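A small example of a binomial coefficient computed with Math::BigInt's bnok() method (the values are illustrative):

```perl
use strict;
use warnings;
use Math::BigInt;

# "49 choose 6", the classic lottery count, via the binomial method:
my $n = Math::BigInt->new(49);
my $combinations = $n->copy->bnok(6);
print "$combinations\n";
```

bnok() modifies its invocant in place, hence the copy() before the call.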
Math::BigInt::FastCalc has been upgraded from version 0.19 to 0.28.
Math::BigRat has been upgraded from version 0.24 to 0.26_02.
Memoize has been upgraded from version 1.01_03 to 1.02.
MIME::Base64 has been upgraded from 3.08 to 3.13.
Includes new functions to calculate the length of encoded and decoded base64 strings.
Now provides encode_base64url() and decode_base64url() functions to process the base64 scheme for "URL applications".
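A brief sketch of the new URL-safe functions (the payload is illustrative):

```perl
use strict;
use warnings;
use MIME::Base64 qw(encode_base64url decode_base64url);

my $payload = "hello?>world";
my $token   = encode_base64url($payload);

# The URL-safe alphabet substitutes '-' and '_' for '+' and '/' and
# drops the '=' padding, so the token can be embedded in a URL as-is.
print "$token\n";
print decode_base64url($token), "\n";
```

The round trip recovers the original bytes; only the alphabet and padding differ from plain base64.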
Module::Build has been upgraded from version 0.3603 to 0.3800.
A notable change is the deprecation of several modules. Module::Build::Version has been deprecated and Module::Build now relies on the version pragma directly. Module::Build::ModuleInfo has been deprecated in favor of a standalone copy called Module::Metadata. Module::Build::YAML has been deprecated in favor of CPAN::Meta::YAML.
Module::Build now also generates META.json and MYMETA.json files in accordance with version 2 of the CPAN distribution metadata specification, CPAN::Meta::Spec. The older format META.yml and MYMETA.yml files are still generated.
Module::CoreList has been upgraded from version 2.29 to 2.47.
Besides listing the updated core modules of this release, it also stops listing the Filespec module. That module never existed in core; the scripts generating Module::CoreList confused it with VMS::Filespec, which actually is a core module as of Perl 5.8.7.
Module::Load has been upgraded from version 0.16 to 0.18.
Module::Load::Conditional has been upgraded from version 0.34 to 0.44.
The mro pragma has been upgraded from version 1.02 to 1.07.
NDBM_File has been upgraded from version 1.08 to 1.12.
This fixes a memory leak when DBM filters are used.
Net::Ping has been upgraded from version 2.36 to 2.38.
NEXT has been upgraded from version 0.64 to 0.65.
Object::Accessor has been upgraded from version 0.36 to 0.38.
ODBM_File has been upgraded from version 1.07 to 1.10.
This fixes a memory leak when DBM filters are used.
Opcode has been upgraded from version 1.15 to 1.18.
The overload pragma has been upgraded from 1.10 to 1.13.
overload::Method can now handle subroutines that are themselves blessed into overloaded classes [perl #71998].
The documentation has greatly improved. See Documentation below.
Params::Check has been upgraded from version 0.26 to 0.28.
The parent pragma has been upgraded from version 0.223 to 0.225.
Parse::CPAN::Meta has been upgraded from version 1.40 to 1.4401.
The latest Parse::CPAN::Meta can now read YAML and JSON files using CPAN::Meta::YAML and JSON::PP, which are now part of the Perl core.
PerlIO::encoding has been upgraded from version 0.12 to 0.14.
PerlIO::scalar has been upgraded from 0.07 to 0.11.
A read() after a seek() beyond the end of the string no longer thinks it has data to read [perl #78716].
PerlIO::via has been upgraded from version 0.09 to 0.11.
Pod::Html has been upgraded from version 1.09 to 1.11.
Pod::LaTeX has been upgraded from version 0.58 to 0.59.
Pod::Perldoc has been upgraded from version 3.15_02 to 3.15_03.
Pod::Simple has been upgraded from version 3.13 to 3.16.
POSIX has been upgraded from 1.19 to 1.24.
It now includes constants for POSIX signals.
The re pragma has been upgraded from version 0.11 to 0.18.
The use re '/flags' subpragma is new.
The regmust() function used to crash when called on a regular expression belonging to a pluggable engine. Now it croaks instead.
regmust() no longer leaks memory.
Safe has been upgraded from version 2.25 to 2.29.
Coderefs returned by reval() and rdo() are now wrapped via wrap_code_refs() (5.12.1).
This fixes a possible infinite loop when looking for coderefs.
It adds several version::vxs::* routines to the default share.
SDBM_File has been upgraded from version 1.06 to 1.09.
SelfLoader has been upgraded from 1.17 to 1.18.
It now works in taint mode [perl #72062].
The sigtrap pragma has been upgraded from version 1.04 to 1.05.
It no longer tries to modify read-only arguments when generating a backtrace [perl #72340].
Socket has been upgraded from version 1.87 to 1.94.
See Improved IPv6 support above.
Storable has been upgraded from version 2.22 to 2.27.
Includes performance improvement for overloaded classes.
This adds support for correctly serialising code references that contain UTF-8 strings. The Storable minor version number changed as a result, meaning that Storable users who set $Storable::accept_future_minor to a FALSE value will see errors (see FORWARD COMPATIBILITY in Storable for more details).
Freezing no longer gets confused if the Perl stack gets reallocated during freezing [perl #80074].
Sys::Hostname has been upgraded from version 1.11 to 1.16.
Term::ANSIColor has been upgraded from version 2.02 to 3.00.
Term::UI has been upgraded from version 0.20 to 0.26.
Test::Harness has been upgraded from version 3.17 to 3.23.
Test::Simple has been upgraded from version 0.94 to 0.98.
Among many other things, subtests without a plan or no_plan now have an implicit done_testing() added to them.
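A minimal sketch of the subtest change (test names are illustrative):

```perl
use strict;
use warnings;
use Test::More;

subtest 'arithmetic' => sub {
    is(2 + 2, 4, 'addition');
    ok(10 > 1, 'comparison');
    # No plan and no explicit done_testing() inside the subtest:
    # Test::More now supplies an implicit one when the sub returns.
};

done_testing();
```

Before this change, a planless subtest like the one above would fail with "No plan was declared".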
Thread::Semaphore has been upgraded from version 2.09 to 2.12.
It provides two new methods that give more control over the decrementing of semaphores: down_nb and down_force.
Thread::Queue has been upgraded from version 2.11 to 2.12.
The threads pragma has been upgraded from version 1.75 to 1.83.
The threads::shared pragma has been upgraded from version 1.32 to 1.37.
Tie::Hash has been upgraded from version 1.03 to 1.04.
Calling Tie::Hash->TIEHASH() used to loop forever. Now it croaks.
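Since TIEHASH must come from a concrete implementation, a minimal sketch of the usual pattern inherits from Tie::StdHash instead of Tie::Hash itself (UpperKeys is an illustrative class):

```perl
use strict;
use warnings;
use Tie::Hash;   # provides Tie::StdHash

# A tied hash that upper-cases its keys on store; TIEHASH is
# inherited from Tie::StdHash, which actually constructs the object.
package UpperKeys;
our @ISA = ('Tie::StdHash');
sub STORE {
    my ($self, $key, $value) = @_;
    $self->SUPER::STORE(uc $key, $value);
}

package main;
tie my %h, 'UpperKeys';
$h{foo} = 42;
print $h{FOO}, "\n";
```

Calling TIEHASH on the abstract Tie::Hash base class directly is what now croaks.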
Tie::Hash::NamedCapture has been upgraded from version 0.06 to 0.08.
Tie::RefHash has been upgraded from version 1.38 to 1.39.
Time::HiRes has been upgraded from version 1.9719 to 1.9721_01.
Time::Local has been upgraded from version 1.1901_01 to 1.2000.
Time::Piece has been upgraded from version 1.15_01 to 1.20_01.
Unicode::Collate has been upgraded from version 0.52_01 to 0.73.
Unicode::Collate has been updated to use Unicode 6.0.0.
Unicode::Collate::Locale now supports a plethora of new locales: ar, be, bg, de__phonebook, hu, hy, kk, mk, nso, om, tn, vi, hr, ig, ja, ko, ru, sq, se, sr, to, uk, zh, zh__big5han, zh__gb2312han, zh__pinyin, and zh__stroke.
The following modules have been added:
Unicode::Collate::CJK::Big5, for zh__big5han, which tailors CJK Unified Ideographs in the order of CLDR's big5han ordering.
Unicode::Collate::CJK::GB2312, for zh__gb2312han, which tailors CJK Unified Ideographs in the order of CLDR's gb2312han ordering.
Unicode::Collate::CJK::JISX0208, which tailors the 6355 kanji (CJK Unified Ideographs) in the JIS X 0208 order.
Unicode::Collate::CJK::Korean, which tailors CJK Unified Ideographs in the order of CLDR's Korean ordering.
Unicode::Collate::CJK::Pinyin, for zh__pinyin, which tailors CJK Unified Ideographs in the order of CLDR's pinyin ordering.
Unicode::Collate::CJK::Stroke, for zh__stroke, which tailors CJK Unified Ideographs in the order of CLDR's stroke ordering.
This also sees the switch from using the pure-Perl version of this module to the XS version.
Unicode::Normalize has been upgraded from version 1.03 to 1.10.
Unicode::UCD has been upgraded from version 0.27 to 0.32.
A new function, Unicode::UCD::num(), has been added. This function returns the numeric value of the string passed to it, or undef if the string in its entirety has no "safe" numeric value. (For more detail, and for the definition of "safe", see num() in Unicode::UCD.)
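A short sketch of num() in action (the inputs are illustrative):

```perl
use strict;
use warnings;
use Unicode::UCD qw(num);

# A run of digits from a single script has a safe numeric value:
print num("123"), "\n";

# Mixing digits from different scripts is not "safe" (ASCII '1'
# followed by ARABIC-INDIC DIGIT ONE here), so num() returns undef:
my $mixed = num("1\x{0661}");
print defined $mixed ? "defined\n" : "undef\n";
```

The "safe" rule guards against visually plausible but cross-script digit strings being silently interpreted as numbers.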
This upgrade also includes several bug fixes:
It is now updated to Unicode Version 6.0.0 with Corrigendum #8, excepting that, just as with Perl 5.14, the code point at U+1F514 has no name.
Hangul syllable code points have the correct names, and their decompositions are always output without requiring Lingua::KO::Hangul::Util to be installed.
CJK (Chinese-Japanese-Korean) code points U+2A700 to U+2B734 and U+2B740 to U+2B81D are now properly handled.
Numeric values are now output for those CJK code points that have them.
Names output for code points with multiple aliases are now the corrected ones.
This now correctly returns "Unknown" instead of undef for the script of a code point that hasn't been assigned one.
This now correctly returns "No_Block" instead of undef for the block of a code point that doesn't belong to one.
The version pragma has been upgraded from 0.82 to 0.88.
Because of a bug, now fixed, the is_strict() and is_lax() functions did not work when exported (5.12.1).
The warnings pragma has been upgraded from version 1.09 to 1.12.
Calling use warnings without arguments is now significantly more efficient.
The warnings::register pragma has been upgraded from version 1.01 to 1.02.
It is now possible to register warning categories other than the names of packages using warnings::register. See perllexwarn(1) for more information.
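A sketch of the conventional package-named category that warnings::register has always provided (MyMod and poke are illustrative names):

```perl
use strict;
use warnings;

{
    package MyMod;
    use warnings::register;   # registers a "MyMod" warning category

    sub poke {
        # warnif() honours the caller's lexical warning settings:
        warnings::warnif('MyMod', "poked too hard");
    }
}

use warnings 'MyMod';         # enable the custom category here

my @caught;
local $SIG{__WARN__} = sub { push @caught, $_[0] };
MyMod::poke();
print scalar @caught, "\n";
```

The 1.02 change extends this mechanism so the registered category name need not match a package name.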
XSLoader has been upgraded from version 0.10 to 0.13.
VMS::DCLsym has been upgraded from version 1.03 to 1.05.
Two bugs have been fixed [perl #84086]:
The symbol table name was lost when tying a hash, due to a thinko in TIEHASH. The result was that all tied hashes interacted with the local symbol table.
Unless a symbol table name had been explicitly specified in the call to the constructor, querying the special key :LOCAL failed to identify objects connected to the local symbol table.
The Win32 module has been upgraded from version 0.39 to 0.44.
This release has several new functions: Win32::GetSystemMetrics(), Win32::GetProductInfo(), Win32::GetOSDisplayName().
The names returned by Win32::GetOSName() and Win32::GetOSDisplayName() have been corrected.
XS::Typemap has been upgraded from version 0.03 to 0.05.
As promised in Perl 5.12.0's release notes, the following modules have been removed from the core distribution, and if needed should be installed from CPAN instead.
Class::ISA has been removed from the Perl core. Prior version was 0.36.
Pod::Plainer has been removed from the Perl core. Prior version was 1.02.
Switch has been removed from the Perl core. Prior version was 2.16.
The removal of Shell has been deferred until after 5.14, as the implementation of Shell shipped with 5.12.0 did not correctly issue the warning that it was to be removed from core.
perlgpl has been updated to contain GPL version 1, as is included in the README distributed with Perl (5.12.1).
The perldelta files for Perl 5.12.1 to 5.12.3 have been added from the maintenance branch: perl5121delta, perl5122delta, perl5123delta.
New style guide for POD documentation, split mostly from the NOTES section of the pod2man(1) manpage.
See perlhack and perlrepository revamp, below.
The perlmodlib manpage that came with Perl 5.12.0 was missing several modules due to a bug in the script that generates the list. This has been fixed [perl #74332] (5.12.1).
perlebcdic contains a helpful table to use in tr/// to convert between EBCDIC and Latin1/ASCII. The table was the inverse of the one it described, though the code that used the table worked correctly for the specific example given. The table has been corrected and the sample code changed to correspond.
The table has also been changed to hex from octal, and the recipes in the pod have been altered to print out leading zeros to make all values the same length.
perlunicode now contains an explanation of how to override, mangle and otherwise tweak the way Perl handles upper-, lower- and other-case conversions on Unicode data, and how to provide scoped changes to alter one's own code's behaviour without stomping on anybody else's.
This was already true, but it's now Officially Stated For The Record (5.12.2).
perlop has been updated with a more detailed explanation of the \xHH and \oOOO character escapes.
In perlrun, the behaviour of the -0NNN switch for -0400 or higher has been clarified (5.12.2).
perlpolicy now contains the policy on what patches are acceptable for maintenance branches (5.12.1).
perlpolicy now contains the policy on compatibility and deprecation along with definitions of terms like "deprecation" (5.12.2).
The following existing diagnostics are now documented:
perlbook has been expanded to cover many more popular books.
The documentation for the SvTRUE macro in perlapi was simply wrong in stating that get-magic is not processed. It has been corrected.
Several API functions that process optrees have been newly documented.
perlvar reorders the variables and groups them by topic. Each variable introduced after Perl 5.000 notes the first version in which it is available. perlvar also has a new section for deprecated variables to note when they were removed.
These are now documented in perldata.
perlform and perllocale have been corrected to state that use locale affects formats.
overload's documentation has practically undergone a rewrite. It is now much more straightforward and clear.
The perlhack document is now much shorter, and focuses on the Perl 5 development process and submitting patches to Perl. The technical content has been moved to several new documents, perlsource, perlinterp, perlhacktut, and perlhacktips. This technical content has been only lightly edited.
The perlrepository document has been renamed to perlgit. This new document is just a how-to on using git with the Perl source code. Any other content that used to be in perlrepository has been moved to perlhack.
Examples in perlfaq4 have been updated to show the use of Time::Piece.
The following additions or changes have been made to diagnostic output, including warnings and fatal error messages. For the complete list of diagnostic messages, see perldiag.
This error occurs when a subroutine reference passed to an attribute handler is called, if the subroutine is a closure [perl #68560].
Perl detected tainted data when trying to compile a regular expression that contains a call to a user-defined character property function, meaning \p{IsFoo} or \p{InFoo}.
See User-Defined Character Properties in perlunicode and perlsec.
This new error is triggered if a destructor called on an object in a typeglob that is being freed creates a new typeglob entry containing an object with a destructor that creates a new entry containing an object etc.
This new fatal error is produced when parsing code supplied by an extension violates the parser's API in a detectable way.
This new error only occurs if an internal consistency check fails when a pipe is about to be closed.
The regular expression pattern has one of the mutually exclusive modifiers repeated.
The regular expression pattern has more than one of the mutually exclusive modifiers.
Use of an unescaped "{" immediately following a \b or \B is now deprecated in order to reserve its use for Perl itself in a future release.
Performing an operation requiring Unicode semantics (such as case-folding) on a Unicode surrogate or a non-Unicode character now triggers this warning.
See Use of qw(...) as parentheses, above, for details.
The "Variable $foo is not imported" warning that precedes a strict 'vars' error has now been assigned the "misc" category, so that no warnings will suppress it [perl #73712].
warn() and die() now produce "Wide character" warnings when fed a character outside the byte range if STDERR is a byte-sized handle.
The "Layer does not match this perl" error message has been replaced with these more helpful messages [perl #73754]:
PerlIO layer function table size (%d) does not match size expected by this perl (%d)
PerlIO layer instance size (%d) does not match size expected by this perl (%d)
The "Found = in conditional" warning that is emitted when a constant is assigned to a variable in a condition is now withheld if the constant is actually a subroutine or one generated by use constant, since the value of the constant may not be known at the time the program is written [perl #77762].
Previously, if none of the gethostbyaddr(), gethostbyname() and gethostent() functions were implemented on a given platform, they would all die with the message "Unsupported socket function 'gethostent' called", with analogous messages for getnet*() and getserv*(). This has been corrected.
The warning message about unrecognized regular expression escapes passed through has been changed to include any literal "{" following the two-character escape. For example, "\q{" is now emitted instead of "\q".
perlbug now looks in the EMAIL environment variable for a return address if the REPLY-TO and REPLYTO variables are empty.
perlbug did not previously generate a "From:" header, potentially resulting in dropped mail; it now includes that header.
The user's address is now used as the Return-Path.
Many systems these days don't have a valid Internet domain name, and perlbug@perl.org does not accept email with a return-path that does not resolve. So the user's address is now passed to sendmail so it's less likely to get stuck in a mail queue somewhere [perl #82996].
perlbug now always gives the reporter a chance to change the email address it guesses for them (5.12.2).
perlbug should no longer warn about uninitialized values when using the -d and -v options (5.12.2).
The remote terminal works after forking and spawns new sessions, one per forked process.
ptargrep is a new utility to apply pattern matching to the contents of files in a tar archive. It comes with Archive::Tar.
See also Naming fixes in Policy_sh.SH may invalidate Policy.sh, above.
CCINCDIR and CCLIBDIR for the mingw64 cross-compiler are now correctly under $(CCHOME)\mingw\include and \lib rather than immediately below $(CCHOME).
This means the "incpath", "libpth", "ldflags", "lddlflags" and "ldflags_nolargefiles" values in Config.pm and Config_heavy.pl are now set correctly.
make test.valgrind has been adjusted to account for the cpan/dist/ext separation.
On compilers that support it, -Wwrite-strings is now added to cflags by default.
The Encode module can now (once again) be included in a static Perl build. The special-case handling for this situation got broken in Perl 5.11.0, and has now been repaired.
The previous default size of a PerlIO buffer (4096 bytes) has been increased to the larger of 8192 bytes and your local BUFSIZ. Benchmarks show that doubling this decade-old default increases read and write performance by around 25% to 50% when using the default layers of perlio on top of unix. To choose a non-default size, such as to get back the old value or to obtain an even larger value, configure with:
- ./Configure -Accflags=-DPERLIOBUF_DEFAULT_BUFSIZ=N
where N is the desired size in bytes; it should probably be a multiple of your page size.
An "incompatible operand types" error in ternary expressions when building with clang has been fixed (5.12.2).
Perl now skips setuid File::Copy tests on partitions it detects mounted as nosuid (5.12.2).
The last vestiges of support for this platform have been excised from the Perl distribution. It was officially discontinued in version 5.12.0. It had not worked for years before that.
The last vestiges of support for this platform have been excised from the Perl distribution. It was officially discontinued in an earlier version.
README.aix has been updated with information about the XL C/C++ V11 compiler suite (5.12.2).
The d_u32align configuration probe on ARM has been fixed (5.12.2).
MakeMaker has been updated to build manpages on cygwin.
Improved rebase behaviour
If a DLL is updated on cygwin, the old imagebase address is reused. This solves most rebase errors, especially when updating core DLLs. See http://www.tishler.net/jason/software/rebase/rebase-2.4.2.README for more information.
Support for the standard cygwin dll prefix (needed for FFIs)
Updated build hints file
FreeBSD 7 no longer contains /usr/bin/objformat. At build time, Perl now skips the objformat check for versions 7 and higher and assumes ELF (5.12.1).
Perl now allows -Duse64bitint without promoting to use64bitall on HP-UX (5.12.1).
Conversion of strings to floating-point numbers is now more accurate on IRIX systems [perl #32380].
Early versions of Mac OS X (Darwin) had buggy implementations of the setregid(), setreuid(), setrgid(), and setruid() functions, so Perl would pretend they did not exist.
These functions are now recognised on Mac OS 10.5 (Leopard; Darwin 9) and higher, as they have been fixed [perl #72990].
Previously if you built Perl with a shared libperl.so on MirBSD (the default config), it would work up to the installation; however, once installed, it would be unable to find libperl. Path handling is now treated as in the other BSD dialects.
The NetBSD hints file has been changed to make the system malloc the default.
OpenBSD > 3.7 has a new malloc implementation which is mmap-based, and as such can release memory back to the OS; however, Perl's use of this malloc causes a substantial slowdown, so we now default to using Perl's malloc instead [perl #75742].
Perl now builds again with OpenVOS (formerly known as Stratus VOS) [perl #78132] (5.12.3).
DTrace is now supported on Solaris. There used to be build failures, but these have been fixed [perl #73630] (5.12.3).
Extension building on older (pre 7.3-2) VMS systems was broken because configure.com hit the DCL symbol length limit of 1K. We now work within this limit when assembling the list of extensions in the core build (5.12.1).
We fixed configuring and building Perl with -Uuseperlio (5.12.1).
PerlIOUnix_open now honours the default permissions on VMS. When perlio became the default and unix became the default bottom layer, the most common path for creating files from Perl became PerlIOUnix_open, which has always explicitly used 0666 as the permission mask. This prevents inheriting permissions from RMS defaults and ACLs, so to avoid that problem, we now pass 0777 to open(). In the VMS CRTL, 0777 has a special meaning over and above intersecting with the current umask; specifically, it allows Unix syscalls to preserve native default permissions (5.12.3).
The shortening of symbols longer than 31 characters in the core C sources and in extensions is now by default done by the C compiler rather than by xsubpp (which could only do so for generated symbols in XS code). You can reenable xsubpp's symbol shortening by configuring with -Uuseshortenedsymbols, but you'll have some work to do to get the core sources to compile.
Record-oriented files (record format variable or variable with fixed control) opened for write by the perlio layer will now be line-buffered to prevent the introduction of spurious line breaks whenever the perlio buffer fills up.
git_version.h is now installed on VMS. This was an oversight in v5.12.0 which caused some extensions to fail to build (5.12.2).
Several memory leaks in stat FILEHANDLE have been fixed (5.12.2).
A memory leak in Perl_rename() due to a double allocation has been fixed (5.12.2).
A memory leak in vms_fid_to_name() (used by realpath() and realname()) has been fixed (5.12.2).
See also fork() emulation will not wait for signalled children and Perl source code is read in text mode on Windows, above.
Fixed build process for SDK2003SP1 compilers.
Compilation with Visual Studio 2010 is now supported.
When using old 32-bit compilers, the define _USE_32BIT_TIME_T is now set in $Config{ccflags}. This improves portability when compiling XS extensions using new compilers against a Perl compiled with old 32-bit compilers.
$Config{gccversion} is now set correctly when Perl is built using the mingw64 compiler from http://mingw64.org [perl #73754].
When building Perl with the mingw64 x64 cross-compiler, the incpath, libpth, ldflags, lddlflags and ldflags_nolargefiles values in Config.pm and Config_heavy.pl were not previously being set correctly because, with that compiler, the include and lib directories are not immediately below $(CCHOME) (5.12.2).
The build process proceeds more smoothly with mingw and dmake when C:\MSYS\bin is in the PATH, due to a Cwd fix.
Support for building with Visual C++ 2010 is now underway, but is not yet complete. See README.win32 or perlwin32 for more details.
The option to use an externally-supplied crypt(), or to build with no crypt() at all, has been removed. Perl supplies its own crypt() implementation for Windows, and the political situation that required this part of the distribution to sometimes be omitted is long gone.
Modules that create threads should now create CLONE_PARAMS structures by calling the new function Perl_clone_params_new(), and free them with Perl_clone_params_del(). This will ensure compatibility with any future changes to the internals of the CLONE_PARAMS structure layout, and that it is correctly allocated and initialised.
Several functions have been added for parsing Perl statements and expressions. These functions are meant to be used by XS code invoked during Perl parsing, in a recursive-descent manner, to allow modules to augment the standard Perl syntax.
parse_stmtseq() parses a sequence of statements, up to closing brace or EOF.
parse_fullstmt() parses a complete Perl statement, including optional label.
parse_barestmt() parses a statement without a label.
parse_block() parses a code block.
parse_label() parses a statement label, separate from statements.
parse_fullexpr(), parse_listexpr(), parse_termexpr(), and parse_arithexpr() parse expressions at various precedence levels.
A new C API for introspecting the hinthash %^H at runtime has been added. See cop_hints_2hv, cop_hints_fetchpvn, cop_hints_fetchpvs, cop_hints_fetchsv, and hv_copy_hints_hv in perlapi for details.
A new, experimental API has been added for accessing the internal structure that Perl uses for %^H. See the functions beginning with cophh_ in perlapi.
The caller_cx function has been added as an XSUB-writer's equivalent of caller(). See perlapi for details.
XS code in an extension module can now annotate a subroutine (whether implemented in XS or in Perl) so that nominated XS code will be called at compile time (specifically as part of op checking) to change the op tree of that subroutine. The compile-time check function (supplied by the extension module) can implement argument processing that can't be expressed as a prototype, generate customised compile-time warnings, perform constant folding for a pure function, inline a subroutine consisting of sufficiently simple ops, replace the whole call with a custom op, and so on. This was previously all possible by hooking the entersub op checker, but the new mechanism makes it easy to tie the hook to a specific subroutine. See cv_set_call_checker in perlapi.
To help in writing custom check hooks, several subtasks within standard entersub op checking have been separated out and exposed in the API.
Custom ops can now be registered with the new custom_op_register C function and the XOP structure. This will make it easier to add new properties of custom ops in the future. Two new properties have been added already, xop_class and xop_peep. xop_class is one of the OA_*OP constants. It allows B and other introspection mechanisms to work with custom ops that aren't BASEOPs. xop_peep is a pointer to a function that will be called for ops of this type from Perl_rpeep.
See Custom Operators in perlguts and Custom Operators in perlapi for more detail.
The old PL_custom_op_names/PL_custom_op_descs interface is still supported but discouraged.
It is now possible for XS code to hook into Perl's lexical scope mechanism at compile time, using the new Perl_blockhook_register function. See Compile-time scope hooks in perlguts.
In addition to PL_peepp, for hooking into the toplevel peephole optimizer, a PL_rpeepp is now available to hook into the optimizer recursing into side-chains of the optree.
The following functions/macros have been added to the API. The *_nomg macros are equivalent to their non-_nomg variants, except that they ignore get-magic. Those ending in _flags allow one to specify whether get-magic is processed.
- sv_2bool_flags
- SvTRUE_nomg
- sv_2nv_flags
- SvNV_nomg
- sv_cmp_flags
- sv_cmp_locale_flags
- sv_eq_flags
- sv_collxfrm_flags
In some of these cases, the non-_flags functions have been replaced with wrappers around the new functions.
Many functions ending with pvn now have equivalent pv/pvs/sv versions.
List op-building functions have been added to the API. See op_append_elem, op_append_list, and op_prepend_elem in perlapi.
LINKLIST
The LINKLIST macro, part of op building that constructs the execution-order op chain, has been added to the API.
The save_freeop, save_op, save_pushi32ptr and save_pushptrptr functions have been added to the API.
A stash can now have a list of effective names in addition to its usual name. The first effective name can be accessed via the HvENAME macro, which is now the recommended name to use in MRO linearisations (HvNAME being a fallback if there is no HvENAME). These names are added and deleted via hv_ename_add and hv_ename_delete. These two functions are not part of the API.
The mg_findext() and sv_unmagicext() functions have been added to the API. They allow extension authors to find and remove magic attached to scalars based on both the magic type and the magic virtual table, similar to how sv_magicext() attaches magic of a certain type and with a given virtual table to a scalar. This eliminates the need for extensions to walk the list of MAGIC pointers of an SV to find the magic that belongs to them.
find_rundefsv
This function returns the SV representing $_, whether it's lexical or dynamic.
Perl_croak_no_modify
Perl_croak_no_modify() is short-hand for Perl_croak("%s", PL_no_modify).
PERL_STATIC_INLINE define
The PERL_STATIC_INLINE define has been added to provide the best-guess incantation to use for static inline functions, if the C compiler supports C99-style static inline. If it doesn't, it'll give a plain static. HAS_STATIC_INLINE can be used to check if the compiler actually supports inline functions.
pv_escape option for hexadecimal escapes
A new option, PERL_PV_ESCAPE_NONASCII, has been added to pv_escape to dump all characters above ASCII in hexadecimal. Before, one could get all characters as hexadecimal or the Latin1 non-ASCII as octal.
lex_start
lex_start has been added to the API, but is considered experimental.
The op_scope() and op_lvalue() functions have been added to the API, but are considered experimental.
PERL_POLLUTE has been removed
The option to define PERL_POLLUTE to expose older 5.005 symbols for backwards compatibility has been removed. Its use was always discouraged, and MakeMaker contains a more specific escape hatch:
- perl Makefile.PL POLLUTE=1
This can be used for modules that have not been upgraded to 5.6 naming conventions (and really should be completely obsolete by now).
When Perl's API changes in incompatible ways (which usually happens between major releases), XS modules compiled for previous versions of Perl will no longer work. They need to be recompiled against the new Perl.
The XS_APIVERSION_BOOTCHECK macro has been added to ensure that modules are recompiled and to prevent users from accidentally loading modules compiled for old perls into newer perls. That macro, which is called when loading every newly compiled extension, compares the API version of the running perl with the version a module has been compiled for and raises an exception if they don't match.
The first argument of the C API function Perl_fetch_cop_label has changed from struct refcounted_he * to COP *, to insulate the user from implementation details.
This API function was marked as "may change", and likely isn't in use outside the core. (Neither an unpacked CPAN nor Google's codesearch finds any other references to it.)
The new GvCV_set() and GvGP_set() macros are now provided to replace direct assignment to GvCV() and GvGP().
This allows a future commit to eliminate some backref magic between GVs and CVs, which will require complete control over assignment to the gp_cv slot.
Under some circumstances, the CvGV() field of a CV is now reference-counted. To ensure consistent behaviour, direct assignment to it, for example CvGV(cv) = gv, is now a compile-time error. A new macro, CvGV_set(cv,gv), has been introduced to run this operation safely. Note that modification of this field is not part of the public API, regardless of this new macro (and despite its being listed in this section).
The CvSTASH() macro can now only be used as an rvalue. CvSTASH_set() has been added to replace assignment to CvSTASH(). This is to ensure that backreferences are handled properly. These macros are not part of the API.
newFOROP and newWHILEOP
The way the parser handles labels has been cleaned up and refactored. As a result, the newFOROP() constructor function no longer takes a parameter stating what label is to go in the state op.
The newWHILEOP() and newFOROP() functions no longer accept a line number as a parameter.
uvuni_to_utf8_flags and utf8n_to_uvuni
Some of the flags parameters to uvuni_to_utf8_flags() and utf8n_to_uvuni() have changed. This is a result of Perl's now allowing internal storage and manipulation of code points that are problematic in some situations. Hence, the default actions for these functions have been complemented to allow these code points. The new flags are documented in perlapi. Code that requires the problematic code points to be rejected needs to change to use the new flags. Some flag names are retained for backward source compatibility, though they do nothing, as they are now the default. However, the flags UNICODE_ALLOW_FDD0, UNICODE_ALLOW_FFFF, UNICODE_ILLEGAL, and UNICODE_IS_ILLEGAL have been removed, as they stem from a fundamentally broken model of how the Unicode non-character code points should be handled, which is now described in Non-character code points in perlunicode. See also the Unicode section under Selected Bug Fixes.
Perl_ptr_table_clear
Perl_ptr_table_clear is no longer part of Perl's public API. Calling it now generates a deprecation warning, and it will be removed in a future release.
sv_compile_2op
The sv_compile_2op() API function is now deprecated. Searches suggest that nothing on CPAN is using it, so this should have zero impact.
It attempted to provide an API to compile code down to an optree, but failed to bind correctly to lexicals in the enclosing scope. It's not possible to fix this problem within the constraints of its parameters and return value.
find_rundefsvoffset
The find_rundefsvoffset function has been deprecated. It appeared that its design was insufficient for reliably getting the lexical $_ at run-time. Use the new find_rundefsv function or the UNDERBAR macro instead. They directly return the right SV representing $_, whether it's lexical or dynamic.
CALL_FPTR and CPERLscope
These are left over from an old implementation of MULTIPLICITY using C++ objects, which was removed in Perl 5.8. Nowadays these macros do exactly nothing, so they should not be used anymore. For compatibility, they are still defined for external XS code. Only extensions defining PERL_CORE must be updated now.
The protocol for unwinding the C stack at the last stage of a die has changed how it identifies the target stack frame. This now uses a separate variable, PL_restartjmpenv, where previously it relied on the blk_eval.cur_top_env pointer in the eval context frame that has nominally just been discarded. This change means that code running during various stages of Perl-level unwinding no longer needs to take care to avoid destroying the ghost frame.
The format of entries on the scope stack has been changed, resulting in a reduction of memory usage of about 10%. In particular, the memory used by the scope stack to record each active lexical variable has been halved.
Memory allocation for pointer tables has been changed. Previously Perl_ptr_table_store allocated memory from the same arena system as SV bodies and HEs, with freed memory remaining bound to those arenas until interpreter exit. Now it allocates memory from arenas private to the specific pointer table, and that memory is returned to the system when Perl_ptr_table_free is called. Additionally, allocation and release are both less CPU intensive.
UNDERBAR
The UNDERBAR macro now calls find_rundefsv. dUNDERBAR is now a noop but should still be used to ensure past and future compatibility.
The ibcmp_* functions have been renamed and are now called foldEQ, foldEQ_locale, and foldEQ_utf8. The old names are still available as macros.
chop and chomp implementations merged
The opcode bodies for chop and chomp and for schop and schomp have been merged. The implementation functions Perl_do_chop() and Perl_do_chomp(), never part of the public API, have been merged and moved to a static function in pp.c. This shrinks the Perl binary slightly, and should not affect any code outside the core (unless it is relying on the order of side-effects when chomp is passed a list of values).
Perl no longer produces this warning:
- $ perl -we 'open(my $f, ">", \my $x); binmode($f, "scalar")'
- Use of uninitialized value in binmode at -e line 1.
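A quick self-check (an illustrative sketch; it assumes a perl that already carries the fix, i.e. 5.12.2 or later) captures warnings around the same calls as the one-liner above:

```perl
use strict;
use warnings;

my @warnings;
local $SIG{__WARN__} = sub { push @warnings, $_[0] };

# The same calls as the one-liner: an in-memory file plus an explicit
# "scalar" layer. A fixed perl emits no warning here.
open(my $f, ">", \my $x) or die $!;
binmode($f, "scalar");

print scalar(@warnings), " warning(s)\n";
```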
Opening a glob reference via open($fh, ">", \*glob) no longer causes the glob to be corrupted when the filehandle is printed to. This would cause Perl to crash whenever the glob's contents were accessed [perl #77492].
PerlIO no longer crashes when called recursively, such as from a signal handler. Now it just leaks memory [perl #75556].
Most I/O functions were not warning for unopened handles unless the "closed" and "unopened" warnings categories were both enabled. Now only use warnings 'unopened' is necessary to trigger these warnings, as had always been the intention.
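For example, the following sketch (assuming a perl that carries this fix) enables only the "unopened" category and still sees the warning:

```perl
use strict;

my @warnings;
local $SIG{__WARN__} = sub { push @warnings, $_[0] };

{
    # Only the "unopened" category; "closed" stays disabled.
    use warnings 'unopened';
    close MYFH;    # MYFH was never opened, so this alone should warn
}

print scalar(@warnings), " warning(s): @warnings";
```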
There have been several fixes to PerlIO layers:
When binmode(FH, ":crlf") pushes the :crlf layer on top of the stack, it no longer enables crlf layers lower in the stack, so as to avoid unexpected results [perl #38456].
Opening a file in :raw mode now does what it advertises to do (first open the file, then binmode it), instead of simply leaving off the top layer [perl #80764].
The three layers :pop, :utf8, and :bytes didn't allow stacking when opening a file; for example, combining one of them with another layer in the mode argument of open() would throw an "Invalid argument" error. This has been fixed in this release [perl #82484].
The regular expression engine no longer loops when matching "\N{LATIN SMALL LIGATURE FF}" =~ /f+/i and similar expressions [perl #72998] (5.12.1).
The trie runtime code should no longer allocate massive amounts of memory, fixing #74484.
Syntax errors in (?{...}) blocks no longer cause panic messages [perl #2353].
A pattern like (?:(o){2})? no longer causes a "panic" error [perl #39233].
A fatal error in regular expressions containing (.*?) when processing UTF-8 data has been fixed [perl #75680] (5.12.2).
An erroneous regular expression engine optimisation that caused regex verbs like *COMMIT sometimes to be ignored has been removed.
The regular expression bracketed character class [\8\9] was effectively the same as [89\000], incorrectly matching a NUL character. It also gave incorrect warnings that the 8 and 9 were ignored. Now [\8\9] is the same as [89] and gives legitimate warnings that \8 and \9 are unrecognized escape sequences, passed through.
A regular expression match in the right-hand side of a global substitution (s///g) that is in the same scope will no longer cause match variables to have the wrong values on subsequent iterations. This can happen when an array or hash subscript is interpolated in the right-hand side, as in s|(.)|@a{ print($1), /./ }|g [perl #19078].
Several cases in which characters in the Latin-1 non-ASCII range (0x80 to 0xFF) used not to match themselves, or used to match both a character class and its complement, have been fixed. For instance, U+00E2 could match both \w and \W [perl #78464] [perl #18281] [perl #60156].
Matching a Unicode character against an alternation containing characters that happened to match continuation bytes in the former's UTF8 representation (like qq{\x{30ab}} =~ /\xab|\xa9/) would cause erroneous warnings [perl #70998].
The trie optimisation was not taking empty groups into account, preventing "foo" from matching /\A(?:(?:)foo|bar|zot)\z/ [perl #78356].
A pattern containing a + inside a lookahead would sometimes cause an incorrect match failure in a global match (for example, /(?=(\S+))/g) [perl #68564].
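As a sketch of the fixed behaviour (assuming a perl that carries this fix), the lookahead's capture can be collected at every position of a global match:

```perl
use strict;
use warnings;

# Collect the lookahead's capture at every position of a global match;
# before the fix, the + inside the lookahead could abort later iterations.
my @caps;
push @caps, $1 while "ab cd" =~ /(?=(\S+))/g;
print "@caps\n";
```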
A regular expression optimisation would sometimes cause a match with a {n,m} quantifier to fail when it should have matched [perl #79152].
Case-insensitive matching in regular expressions compiled under use locale now works much more sanely when the pattern or target string is internally encoded in UTF8. Previously, under these conditions the localeness was completely lost. Now, code points above 255 are treated as Unicode, but code points between 0 and 255 are treated using the current locale rules, regardless of whether the pattern or the string is encoded in UTF8. The few case-insensitive matches that cross the 255/256 boundary are not allowed. For example, 0xFF does not caselessly match the character at 0x178, LATIN CAPITAL LETTER Y WITH DIAERESIS, because 0xFF may not be LATIN SMALL LETTER Y in the current locale, and Perl has no way of knowing if that character even exists in the locale, much less what code point it is.
The (?|...) regular expression construct no longer crashes if the final branch has more sets of capturing parentheses than any other branch. This was fixed in Perl 5.10.1 for the case of a single branch, but that fix did not take multiple branches into account [perl #84746].
A bug has been fixed in the implementation of {...} quantifiers in regular expressions that sometimes prevented the code block in /((\w+)(?{ print $2 })){2}/ from seeing $2 [perl #84294].
when (scalar) {...} no longer crashes, but produces a syntax error [perl #74114] (5.12.1).
A label right before a string eval (foo: eval $string) no longer causes the label to be associated also with the first statement inside the eval [perl #74290] (5.12.1).
The no 5.13.2 form of no no longer tries to turn on features or pragmata (like strict) [perl #70075] (5.12.2).
BEGIN {require 5.12.0} now behaves as documented, rather than behaving identically to use 5.12.0. Previously, require in a BEGIN block was erroneously executing the use feature ':5.12.0' and use strict behaviour, which only use was documented to provide [perl #69050].
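A minimal illustration (assuming a perl that carries this fix): the string eval below deliberately avoids declaring its variable, which would be fatal if require had imposed strict:

```perl
# Deliberately no "use strict" here: string eval inherits lexical pragmas,
# and we want to observe only what the BEGIN block itself turns on.
my $ok = eval q{
    no warnings 'once';
    BEGIN { require 5.012 }     # require, not use
    $undeclared = 1;            # fatal under strict, fine otherwise
    1;
} ? 1 : 0;
print $ok ? "require imposed no strict\n" : "eval failed: $@";
```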
A regression introduced in Perl 5.12.0, making my $x = 3; $x = length(undef) result in $x set to 3, has been fixed. $x will now be undef [perl #85508] (5.12.2).
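A short sketch of the corrected behaviour (assuming perl 5.12.2 or later):

```perl
use strict;
use warnings;

my $x = 3;
$x = length(undef);   # length of undef is undef, and the assignment sticks
print defined $x ? "defined: $x\n" : "undef\n";
```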
When strict "refs" mode is off, %{...} in rvalue context returns undef if its argument is undefined. An optimisation introduced in Perl 5.12.0 to make keys %{...} faster when used as a boolean did not take this into account, causing keys %{+undef} (and keys %$foo when $foo is undefined) to be an error, which it should be only in strict mode [perl #81750].
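For illustration (assuming a perl that carries this fix, and with strict 'refs' deliberately left off):

```perl
# No "use strict 'refs'" here on purpose: the error is now expected
# only under strict.
my $foo;                    # undefined scalar, never a hash reference
my $count = keys %$foo;     # empty result, not a fatal error
print "count=$count\n";
```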
Constant-folding used to cause
- $text =~ ( 1 ? /phoo/ : /bear/)
to turn into
- $text =~ /phoo/
at compile time. Now it correctly matches against $_ [perl #20444].
Parsing Perl code (either with string eval or by loading modules) from within a UNITCHECK block no longer causes the interpreter to crash [perl #70614].
String evals no longer fail after 2 billion scopes have been compiled [perl #83364].
The parser no longer hangs when encountering certain Unicode characters, such as U+387 [perl #74022].
Defining a constant with the same name as one of Perl's special blocks (like INIT) stopped working in 5.12.0, but has now been fixed [perl #78634].
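A minimal sketch of the restored behaviour (assuming a perl that carries this fix):

```perl
use strict;
use warnings;

# A constant sharing its name with the special INIT block:
use constant INIT => 42;

print INIT, "\n";
```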
A reference to a literal value used as a hash key ($hash{\"foo"}) used to be stringified, even if the hash was tied [perl #79178].
A closure containing an if statement followed by a constant or variable is no longer treated as a constant [perl #63540].
state can now be used with attributes. It used to mean the same thing as my if any attributes were present [perl #68658].
Expressions like @$a > 3 no longer cause $a to be mentioned in the "Use of uninitialized value in numeric gt" warning when $a is undefined (since it is not part of the > expression, but the operand of the @) [perl #72090].
Accessing an element of a package array with a hard-coded number (as opposed to an arbitrary expression) would crash if the array did not exist. Usually the array would be autovivified during compilation, but typeglob manipulation could remove it, in which case the access used to crash.
The -C command-line option, when used on the shebang line, can now be followed by other options [perl #72434].
The B module was returning B::OPs instead of B::LOGOPs for entertry [perl #80622]. This was due to a bug in the Perl core, not in B itself.
Perl 5.10.0 introduced a new internal mechanism for caching MROs (method resolution orders, or lists of parent classes; aka "isa" caches) to make method lookup faster (so @ISA arrays would not have to be searched repeatedly). Unfortunately, this brought with it quite a few bugs. Almost all of these have been fixed now, along with a few MRO-related bugs that existed before 5.10.0:
The following used to have erratic effects on method resolution, because the "isa" caches were not reset or otherwise ended up listing the wrong classes. These have been fixed.
- undef *Foo::
- undef *Foo::ISA
- delete $Foo::{ISA}
- *Foo::ISA = \@Bar::ISA or *Foo::ISA = *Bar::ISA [perl #77238]
undef *Foo::ISA would even stop a new @Foo::ISA array from updating caches.
Typeglob assignments would crash if the glob's stash no longer existed, so long as the glob assigned to were named ISA or the glob on either side of the assignment contained a subroutine.
PL_isarev, which is accessible to Perl via mro::get_isarev, is now updated properly when packages are deleted or removed from the @ISA of other classes. This allows many packages to be created and deleted without causing a memory leak [perl #75176].
In addition, various other bugs related to typeglobs and stashes have been fixed:
Some work has been done on the internal pointers that link between symbol tables (stashes), typeglobs, and subroutines. This has the effect that various edge cases related to deleting stashes or stash entries (for example, %FOO:: = ()), and complex typeglob or code-reference aliasing, will no longer crash the interpreter.
Assigning a reference to a glob copy now assigns to a glob slot instead of overwriting the glob with a scalar [perl #1804] [perl #77508].
A bug when replacing the glob of a loop variable within the loop has been fixed [perl #21469]. This means the following code will no longer crash:
- for $x (...) {
- *x = *y;
- }
Assigning a glob to a PVLV used to convert it to a plain string. Now it works correctly, and a PVLV can hold a glob. This would happen when a nonexistent hash or array element was passed to a subroutine:
- sub { $_[0] = *foo }->($hash{key});
- # $_[0] would have been the string "*main::foo"
It also happened when a glob was assigned to, or returned from, an element of a tied array or hash [perl #36051].
When trying to report Use of uninitialized value $Foo::BAR, crashes could occur if the glob holding the global variable in question had been detached from its original stash by, for example, delete $::{"Foo::"}. This has been fixed by disabling the reporting of variable names in those cases.
During the restoration of a localised typeglob on scope exit, any destructors called as a result would be able to see the typeglob in an inconsistent state, containing freed entries, which could result in a crash. This would affect code like this:
Now the glob entries are cleared before any destructors are called. This also means that destructors can vivify entries in the glob. So Perl tries again and, if the entries are re-created too many times, dies with a "panic: gp_free ..." error message.
If a typeglob is freed while a subroutine attached to it is still referenced elsewhere, the subroutine is renamed to __ANON__ in the same package, unless the package has been undefined, in which case the __ANON__ package is used. This could cause packages to be sometimes autovivified, such as if the package had been deleted. Now this no longer occurs.
The __ANON__ package is also now used when the original package is no longer attached to the symbol table. This avoids memory leaks in some cases [perl #87664].
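The renaming can be observed with Sub::Util::subname (Sub::Util is a core module only since Perl 5.22, so this sketch assumes a reasonably modern perl):

```perl
use strict;
use warnings;
use Sub::Util 'subname';    # core since Perl 5.22 (assumption)

sub doomed { 42 }
my $ref = \&doomed;

delete $main::{doomed};     # free the typeglob; the CV lives on via $ref

print subname($ref), "\n";  # the sub has been renamed into __ANON__
```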
Subroutines and package variables inside a package whose name ends with :: can now be accessed with a fully qualified name.
What has become known as "the Unicode Bug" is almost completely resolved in this release. Under use feature 'unicode_strings' (which is automatically selected by use 5.012 and above), the internal storage format of a string no longer affects the external semantics [perl #58182].
There are two known exceptions:
The now-deprecated, user-defined case-changing functions require utf8-encoded strings to operate. The CPAN module Unicode::Casing has been written to replace this feature without its drawbacks, and the feature is scheduled to be removed in 5.16.
quotemeta() (and its in-line equivalent \Q) can also give different results depending on whether a string is encoded in UTF-8. See The Unicode Bug in perlunicode.
Handling of Unicode non-character code points has changed. Previously they were mostly considered illegal, except that in some place only one of the 66 of them was known. The Unicode Standard considers them all legal, but forbids their "open interchange". This is part of the change to allow internal use of any code point (see Core Enhancements). Together, these changes resolve [perl #38722], [perl #51918], [perl #51936], and [perl #63446].
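For example (a sketch assuming a perl with this change), a noncharacter can be stored and inspected like any other code point:

```perl
use strict;
use warnings;
no warnings 'nonchar';      # this category covers I/O-time warnings

# U+FFFF is one of the 66 Unicode noncharacters: now legal internally.
my $nonchar = "\x{FFFF}";
printf "length=%d ord=%#x\n", length($nonchar), ord($nonchar);
```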
Case-insensitive "/i" regular expression matching of Unicode characters that match multiple characters now works much more as intended. For example,
- "\N{LATIN SMALL LIGATURE FFI}" =~ /ffi/ui
and
- "ffi" =~ /\N{LATIN SMALL LIGATURE FFI}/ui
are both true. Previously, there were many bugs with this feature. What hasn't been fixed are the places where the pattern contains the multiple characters, but the characters are split up by other things, such as in
- "\N{LATIN SMALL LIGATURE FFI}" =~ /(f)(f)i/ui
or
- "\N{LATIN SMALL LIGATURE FFI}" =~ /ffi*/ui
or
- "\N{LATIN SMALL LIGATURE FFI}" =~ /[a-f][f-m][g-z]/ui
None of these match.
Also, this matching doesn't fully conform to the current Unicode Standard, which asks that the matching be made upon the NFD (Normalization Form Decomposed) of the text. However, as of this writing (April 2010), the Unicode Standard is in flux about what it will recommend doing in such scenarios. It may be that the whole concept of multi-character matches will be thrown out. [perl #71736].
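The two working cases above can be checked directly (a sketch assuming a perl with this fix):

```perl
use strict;
use warnings;
use charnames ':full';      # \N{...} name lookup (automatic since 5.16)

my $lig = "\N{LATIN SMALL LIGATURE FFI}";   # U+FB03

my $lig_vs_ascii = $lig =~ /ffi/ui   ? 1 : 0;
my $ascii_vs_lig = "ffi" =~ /$lig/ui ? 1 : 0;
print "lig=~/ffi/: $lig_vs_ascii, ffi=~/lig/: $ascii_vs_lig\n";
```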
Naming a deprecated character in \N{NAME} no longer leaks memory.
We fixed a bug that could cause \N{NAME} constructs followed by a single "." to be parsed incorrectly [perl #74978] (5.12.1).
chop now correctly handles characters above "\x{7fffffff}" [perl #73246].
Passing to index an offset beyond the end of the string when the string is encoded internally in UTF8 no longer causes panics [perl #75898].
warn() and die() now respect utf8-encoded scalars [perl #45549].
Sometimes the UTF8 length cache would not be reset on a value returned by substr, causing length(substr($uni_string, ...)) to give wrong answers. With ${^UTF8CACHE} set to -1, it would also produce a "panic" error message [perl #77692].
Overloading now works properly in conjunction with tied variables. What formerly happened was that most ops checked their arguments for overloading before checking for magic, so for example an overloaded object returned by a tied array access would usually be treated as not overloaded [RT #57012].
Various instances of magic (like tie methods) being called on tied variables too many or too few times have been fixed:
$tied->() did not always call FETCH [perl #8438].
Filetest operators and y/// and tr/// were calling FETCH too
many times.
The = operator used to ignore magic on its right-hand side if the scalar happened to hold a typeglob (if a typeglob was the last thing returned from or assigned to a tied scalar) [perl #77498].
Dereference operators used to ignore magic if the argument was a reference already (such as from a previous FETCH) [perl #72144].
splice now calls set-magic (so changes made by splice @ISA are respected by method calls) [perl #78400].
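As a rough illustration of why the set-magic matters, a sketch (the Base/Derived package names are invented for the example):

```perl
use strict;
use warnings;

package Base;
sub greet { "hello from Base" }

package main;
@Derived::ISA = ();              # Derived starts with no parents
my $obj = bless {}, 'Derived';

# Inserting a parent with splice triggers @ISA's set-magic, so the
# method cache is invalidated and the new parent is seen immediately:
splice @Derived::ISA, 0, 0, 'Base';

print $obj->greet, "\n";
```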
In-memory files created by open($fh, ">", \$buffer) were not calling FETCH/STORE at all [perl #43789] (5.12.2).
utf8::is_utf8() now respects get-magic (like $1) (5.12.1).
Non-commutative binary operators used to swap their operands if the same tied scalar was used for both operands and returned a different value for each FETCH. For instance, if $t returned 2 the first time and 3 the second, then $t/$t would evaluate to 1.5. This has been fixed [perl #87708].
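The fixed behaviour can be demonstrated with a tied scalar whose FETCH returns a new value each call (the Counter class here is invented for the example):

```perl
use strict;
use warnings;

package Counter;
sub TIESCALAR { my $class = shift; my $n = 0; bless \$n, $class }
sub FETCH     { my $self = shift; ++$$self }   # returns 1, 2, 3, ...
sub STORE     { }

package main;
tie my $t, 'Counter';

# Each operand performs its own FETCH.  With the fix the operands are
# not swapped, so the earlier (smaller) FETCH supplies the numerator
# and the ratio is below 1; the pre-fix swap gave a value above 1.
my $ratio = $t / $t;
print "$ratio\n";
```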
String eval now detects taintedness of overloaded or tied
arguments [perl #75716].
String eval and regular expression matches against objects with string
overloading no longer cause memory corruption or crashes [perl #77084].
readline EXPR now honors <> overloading on tied arguments.
<expr> always respects overloading now if the expression is overloaded.
Because "<> as glob" was parsed differently from "<> as filehandle" from 5.6 onwards, something like <$foo[0]> did not handle overloading, even if $foo[0] was an overloaded object. This was contrary to the documentation for overload, and meant that <> could not be used as a general overloaded iterator operator.
The fallback behaviour of overloading on binary operators was asymmetric [perl #71286].
Magic applied to variables in the main package no longer affects other packages. See Magic variables outside the main package above [perl #76138].
Sometimes magic (ties, taintedness, etc.) attached to variables could cause an object to last longer than it should, or cause a crash if a tied variable were freed from within a tie method. These have been fixed [perl #81230].
DESTROY methods of objects implementing ties are no longer able to crash by accessing the tied variable through a weak reference [perl #86328].
Fixed a regression of kill() when a match variable is used for the process ID to kill [perl #75812].
$AUTOLOAD used to remain tainted forever if it ever became tainted. Now it is correctly untainted if an autoloaded method is called and the method name was not tainted.
sprintf now dies when passed a tainted scalar for the format. It did
already die for arbitrary expressions, but not for simple scalars
[perl #82250].
lc, uc, lcfirst, and ucfirst no longer return untainted strings
when the argument is tainted. This has been broken since perl 5.8.9
[perl #87336].
The Perl debugger now also works in taint mode [perl #76872].
Subroutine redefinition works once more in the debugger [perl #48332].
When -d is used on the shebang (#!) line, the debugger now has access to the lines of the main program. In the past, this sometimes worked and sometimes did not, depending on the order in which things happened to be arranged in memory [perl #71806].
A possible memory leak when using caller EXPR to set @DB::args has been fixed (5.12.2).
Perl no longer stomps on $DB::single, $DB::trace, and $DB::signal if these variables already have values when $^P is assigned to [perl #72422].
#line directives in string evals were not properly updating the arrays of lines of code (@{"_< ..."}) that the debugger (or any debugging or profiling module) uses. In threaded builds, they were not being updated at all. In non-threaded builds, the line number was ignored, so any change to the existing line number would cause the lines to be misnumbered [perl #79442].
Perl no longer accidentally clones lexicals in scope within active stack frames in the parent when creating a child thread [perl #73086].
Several memory leaks in cloning and freeing threaded Perl interpreters have been fixed [perl #77352].
Creating a new thread when directory handles were open used to cause a crash, because the handles were not cloned, but simply passed to the new thread, resulting in a double free.
Now directory handles are cloned properly on Windows
and on systems that have a fchdir
function. On other
systems, new threads simply do not inherit directory
handles from their parent threads [perl #75154].
The typeglob *, (which holds the scalar variable $,, the output field separator) had the wrong reference count in child threads.
When pipes are shared between threads, the close function (and any implicit close, such as on thread exit) no longer blocks [perl #78494].
Perl now does a timely cleanup of SVs that are cloned into a new thread but then discovered to be orphaned (that is, their owners are not cloned). This eliminates several "scalars leaked" warnings when joining threads.
Lvalue subroutines are again able to return copy-on-write scalars. This had been broken since version 5.10.0 [perl #75656] (5.12.3).
require no longer causes caller to return the wrong file name for
the scope that called require and other scopes higher up that had the
same file name [perl #68712].
sort with a ($$)-prototyped comparison routine used to cause the value of @_ to leak out of the sort. Taking a reference to @_ within the sorting routine could cause a crash [perl #72334].
Match variables (like $1) no longer persist between calls to a sort subroutine [perl #76026].
Iterating with foreach over an array returned by an lvalue sub now works [perl #23790].
$@ is now localised during calls to binmode to prevent action at a distance [perl #78844].
Calling a closure prototype (what is passed to an attribute handler for a closure) now results in a "Closure prototype called" error message instead of a crash [perl #68560].
Mentioning a read-only lexical variable from the enclosing scope in a
string eval no longer causes the variable to become writable
[perl #19135].
Within signal handlers, $! is now implicitly localized.
CHLD signals are no longer unblocked after a signal handler is called if they were blocked before by POSIX::sigprocmask [perl #82040].
A signal handler called within a signal handler could cause leaks or double-frees. Now fixed [perl #76248].
Several memory leaks when loading XS modules were fixed (5.12.2).
substr EXPR,OFFSET,LENGTH,REPLACEMENT, index STR,SUBSTR,POSITION, keys HASH, and vec EXPR,OFFSET,BITS could, when used in combination with lvalues, result in leaking the scalar value they operate on, and cause its destruction to happen too late. This has now been fixed.
The postincrement and postdecrement operators, ++ and --, used to cause leaks when used on references. This has now been fixed.
Nested map and grep blocks no longer leak memory when processing
large lists [perl #48004].
use VERSION and no VERSION no longer leak memory [perl #78436]
[perl #69050].
.= followed by <> or readline would leak memory if $/ contained characters beyond the octet range and the scalar assigned to happened to be encoded as UTF8 internally [perl #72246].
eval 'BEGIN{die}' no longer leaks memory on non-threaded builds.
glob() no longer crashes when %File::Glob:: is empty and CORE::GLOBAL::glob isn't present [perl #75464] (5.12.2).
readline() has been fixed when interrupted by signals so it no longer returns the "same thing" as before or random memory.
When assigning a list with duplicated keys to a hash, the assignment used to return garbage and/or freed values:
- @a = %h = (list with some duplicate keys);
This has now been fixed [perl #31865].
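A sketch of the now-correct behaviour: the later duplicate wins in the hash, and the assignment returns real values rather than garbage (variable names invented for the example):

```perl
use strict;
use warnings;

my (%h, @a);
@a = %h = (a => 1, b => 2, a => 3);   # key 'a' appears twice

print "$h{a}\n";                  # the later duplicate wins
print scalar(keys %h), "\n";      # two distinct keys remain
```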
The mechanism for freeing objects in globs used to leave dangling pointers to freed SVs, meaning Perl users could see corrupted state during destruction.
Perl now frees only the affected slots of the GV, rather than freeing the GV itself. This makes sure that there are no dangling refs or corrupted state during destruction.
The interpreter no longer crashes when freeing deeply-nested arrays of arrays. Hashes have not been fixed yet [perl #44225].
Concatenating long strings under use encoding no longer causes Perl to crash [perl #78674].
Calling ->import on a class lacking an import method could corrupt the stack, resulting in strange behaviour. For instance,
- push @a, "foo", $b = bar->import;
would assign "foo" to $b [perl #63790].
The recv function could crash when called with the MSG_TRUNC flag
[perl #75082].
formline no longer crashes when passed a tainted format picture. It also taints $^A now if its arguments are tainted [perl #79138].
A bug in how we process filetest operations could cause a segfault. Filetests don't always expect an op on the stack, so we now use TOPs only if we're sure that we're not stat'ing the _ filehandle. This is indicated by OPf_KIDS (as checked in ck_ftst) [perl #74542] (5.12.1).
unpack() now handles scalar context correctly for %32H and %32u, fixing a potential crash. split() would crash because the third item on the stack wasn't the regular expression it expected. unpack("%2H", ...) would return both the unpacked result and the checksum on the stack, as would unpack("%2u", ...) [perl #73814] (5.12.2).
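For reference, a %-prefixed checksum template in scalar context now yields exactly one value; a small sketch using the related %32C* template:

```perl
use strict;
use warnings;

# A %-prefixed template computes a checksum instead of unpacking
# values.  In scalar context this returns the single checksum value
# rather than leaving extra items on the stack.
my $sum = unpack "%32C*", "hello";   # 32-bit sum of the byte values

printf "checksum: %d\n", $sum;
```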
The &, |, and ^ bitwise operators no longer coerce read-only arguments [perl #20661].
Stringifying a scalar containing "-0.0" no longer has the effect of turning false into true [perl #45133].
Some numeric operators were converting integers to floating point, resulting in loss of precision on 64-bit platforms [perl #77456].
sprintf() was ignoring locales when called with constant arguments [perl #78632].
Combining the vector (%v) flag and dynamic precision would cause sprintf to confuse the order of its arguments, making it treat the string as the precision and vice-versa [perl #83194].
The C-level lex_stuff_pvn function would sometimes cause a spurious syntax error on the last line of the file if it lacked a final semicolon [perl #74006] (5.12.1).
The eval_sv and eval_pv C functions now set $@ correctly when there is a syntax error and no G_KEEPERR flag, and never set it if the G_KEEPERR flag is present [perl #3719].
The XS multicall API no longer causes subroutines to lose reference counts if called via the multicall interface from within those very subroutines. This affects modules like List::Util. Calling one of its functions with an active subroutine as the first argument could cause a crash [perl #78070].
The SvPVbyte function available to XS modules now calls magic before downgrading the SV, to avoid warnings about wide characters [perl #72398].
The ref types in the typemap for XS bindings now support magical variables [perl #72684].
sv_catsv_flags no longer calls mg_get on its second argument (the source string) if the flags passed to it do not include SV_GMAGIC. So it now matches the documentation.
my_strftime no longer leaks memory. This fixes a memory leak in POSIX::strftime [perl #73520].
XSUB.h now correctly redefines fgets under PERL_IMPLICIT_SYS [perl #55049] (5.12.1).
XS code using fputc() or fputs() on Windows could cause an error due to their arguments being swapped [perl #72704] (5.12.1).
A possible segfault in the T_PTROBJ default typemap has been fixed (5.12.2).
A bug that could cause "Unknown error" messages when call_sv(code, G_EVAL) is called from an XS destructor has been fixed (5.12.2).
This is a list of significant unresolved issues which are regressions from earlier versions of Perl or which affect widely-used CPAN modules.
List::Util::first misbehaves in the presence of a lexical $_ (typically introduced by my $_ or implicitly by given). The variable that gets set for each iteration is the package variable $_, not the lexical $_.
A similar issue may occur in other modules that provide functions which take a block as their first argument, like
- foo { ... $_ ...} list
See also: http://rt.perl.org/rt3/Public/Bug/Display.html?id=67694
See also: http://rt.perl.org/rt3/Public/Bug/Display.html?id=67694
readline() returns an empty string instead of a cached previous value when it is interrupted by a signal
The changes in prototype handling break Switch. A patch has been sent upstream and will hopefully appear on CPAN soon.
The upgrade to ExtUtils-MakeMaker-6.57_05 has caused some tests in the Module-Install distribution on CPAN to fail. (Specifically, 02_mymeta.t tests 5 and 21; 18_all_from.t tests 6 and 15; 19_authors.t tests 5, 13, 21, and 29; and 20_authors_with_special_characters.t tests 6, 15, and 23 in version 1.00 of that distribution now fail.)
On VMS, Time::HiRes tests will fail due to a bug in the CRTL's implementation of setitimer: previous timer values would be cleared if a timer expired but not if the timer was reset before expiring. HP OpenVMS Engineering have corrected the problem and will release a patch in due course (Quix case # QXCM1001115136).
On VMS, there were a handful of Module::Build test failures we didn't get to before the release; please watch CPAN for updates.
You can now use the keys(), values(), and each() builtins on arrays; previously you could use them only on hashes. See perlfunc for details. This is actually a change introduced in perl 5.12.0, but it was missed from that release's perl5120delta.
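A brief sketch of the array forms (variable names invented for the example):

```perl
use strict;
use warnings;

my @colors = ('red', 'green', 'blue');

my @indices = keys @colors;      # the indices: 0, 1, 2
my @vals    = values @colors;    # the elements themselves

# each() iterates over (index, element) pairs, mirroring hashes:
my %by_index;
while (my ($i, $color) = each @colors) {
    $by_index{$i} = $color;
}
print "$by_index{2}\n";
```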
split() no longer modifies @_ when called in scalar or void context. In void context it now produces a "Useless use of split" warning. This was also a perl 5.12.0 change that missed the perldelta.
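So counting fields in scalar context is now free of side effects on @_; a minimal sketch (the helper name is invented for the example):

```perl
use strict;
use warnings;

sub count_fields {
    # split in scalar context returns the number of fields and,
    # with this change, no longer writes the pieces into @_:
    my $n = split /,/, $_[0];
    return ($n, scalar @_);
}

my ($fields, $argc) = count_fields("a,b,c");
print "$fields fields; \@_ still has $argc element\n";
```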
Randy Kobes, creator of http://kobesearch.cpan.org/ and contributor/maintainer to several core Perl toolchain modules, passed away on September 18, 2010 after a battle with lung cancer. The community was richer for his involvement. He will be missed.
Perl 5.14.0 represents one year of development since Perl 5.12.0 and contains nearly 550,000 lines of changes across nearly 3,000 files from 150 authors and committers.
Perl continues to flourish into its third decade thanks to a vibrant community of users and developers. The following people are known to have contributed the improvements that became Perl 5.14.0:
Aaron Crane, Abhijit Menon-Sen, Abigail, Ævar Arnfjörð Bjarmason, Alastair Douglas, Alexander Alekseev, Alexander Hartmaier, Alexandr Ciornii, Alex Davies, Alex Vandiver, Ali Polatel, Allen Smith, Andreas König, Andrew Rodland, Andy Armstrong, Andy Dougherty, Aristotle Pagaltzis, Arkturuz, Arvan, A. Sinan Unur, Ben Morrow, Bo Lindbergh, Boris Ratner, Brad Gilbert, Bram, brian d foy, Brian Phillips, Casey West, Charles Bailey, Chas. Owens, Chip Salzenberg, Chris 'BinGOs' Williams, chromatic, Craig A. Berry, Curtis Jewell, Dagfinn Ilmari Mannsåker, Dan Dascalescu, Dave Rolsky, David Caldwell, David Cantrell, David Golden, David Leadbeater, David Mitchell, David Wheeler, Eric Brine, Father Chrysostomos, Fingle Nark, Florian Ragwitz, Frank Wiegand, Franz Fasching, Gene Sullivan, George Greer, Gerard Goossen, Gisle Aas, Goro Fuji, Grant McLean, gregor herrmann, H.Merijn Brand, Hongwen Qiu, Hugo van der Sanden, Ian Goodacre, James E Keenan, James Mastros, Jan Dubois, Jay Hannah, Jerry D. Hedden, Jesse Vincent, Jim Cromie, Jirka Hruška, John Peacock, Joshua ben Jore, Joshua Pritikin, Karl Williamson, Kevin Ryde, kmx, Lars Dɪᴇᴄᴋᴏᴡ 迪拉斯, Larwan Berke, Leon Brocard, Leon Timmermans, Lubomir Rintel, Lukas Mai, Maik Hentsche, Marty Pauley, Marvin Humphrey, Matt Johnson, Matt S Trout, Max Maischein, Michael Breen, Michael Fig, Michael G Schwern, Michael Parker, Michael Stevens, Michael Witten, Mike Kelly, Moritz Lenz, Nicholas Clark, Nick Cleaton, Nick Johnston, Nicolas Kaiser, Niko Tyni, Noirin Shirley, Nuno Carvalho, Paul Evans, Paul Green, Paul Johnson, Paul Marquess, Peter J. 
Holzer, Peter John Acklam, Peter Martini, Philippe Bruhat (BooK), Piotr Fusik, Rafael Garcia-Suarez, Rainer Tammer, Reini Urban, Renee Baecker, Ricardo Signes, Richard Möhn, Richard Soderberg, Rob Hoelz, Robin Barker, Ruslan Zakirov, Salvador Fandiño, Salvador Ortiz Garcia, Shlomi Fish, Sinan Unur, Sisyphus, Slaven Rezic, Steffen Müller, Steve Hay, Steven Schubiger, Steve Peters, Sullivan Beck, Tatsuhiko Miyagawa, Tim Bunce, Todd Rinaldo, Tom Christiansen, Tom Hukins, Tony Cook, Tye McQueen, Vadim Konovalov, Vernon Lyon, Vincent Pit, Walt Mankowski, Wolfram Humann, Yves Orton, Zefram, and Zsbán Ambrus.
This is woefully incomplete as it's automatically generated from version
control history. In particular, it doesn't include the names of the
(very much appreciated) contributors who reported issues in previous
versions of Perl that helped make Perl 5.14.0 better. For a more complete
list of all of Perl's historical contributors, please see the AUTHORS
file in the Perl 5.14.0 distribution.
Many of the changes included in this version originated in the CPAN modules included in Perl's core. We're grateful to the entire CPAN community for helping Perl to flourish.
If you find what you think is a bug, you might check the articles recently posted to the comp.lang.perl.misc newsgroup and the Perl bug database at http://rt.perl.org/perlbug/ . There may also be information at http://www.perl.org/ , the Perl Home Page.
If you believe you have an unreported bug, please run the perlbug program included with your release. Be sure to trim your bug down to a tiny but sufficient test case. Your bug report, along with the output of perl -V, will be sent off to perlbug@perl.org to be analysed by the Perl porting team.
If the bug you are reporting has security implications, which make it inappropriate to send to a publicly archived mailing list, then please send it to perl5-security-report@perl.org. This points to a closed subscription unarchived mailing list, which includes all the core committers, who are able to help assess the impact of issues, figure out a resolution, and help co-ordinate the release of patches to mitigate or fix the problem across all platforms on which Perl is supported. Please use this address for security issues in the Perl core only, not for modules independently distributed on CPAN.
The Changes file for an explanation of how to view exhaustive details on what changed.
The INSTALL file for how to build Perl.
The README file for general stuff.
The Artistic and Copying files for copyright information.
perl5141delta - what is new for perl v5.14.1
This document describes differences between the 5.14.0 release and the 5.14.1 release.
If you are upgrading from an earlier release such as 5.12.0, first read perl5140delta, which describes differences between 5.12.0 and 5.14.0.
No changes since 5.14.0.
No changes since 5.14.0.
There are no changes intentionally incompatible with 5.14.0. If any exist, they are bugs and reports are welcome.
There have been no deprecations since 5.14.0.
None
B::Deparse has been upgraded from version 1.03 to 1.04, to address two regressions in Perl 5.14.0:
Deparsing of the glob operator and its diamond (<>) form now works again. [perl #90898]
The presence of subroutines named :::: or :::::: no longer causes B::Deparse to hang.
Pod::Perldoc has been upgraded from version 3.15_03 to 3.15_04.
It corrects the search paths on VMS. [perl #90640]
None
None
given, when, and default are now listed in perlfunc.
Documentation for use now includes a pointer to if.pm.
perllol has been expanded with examples using the new push $scalar syntax introduced in Perl 5.14.0.
The explanation of bitwise operators has been expanded to explain how they work on Unicode strings.
The section on the triple-dot or yada-yada operator has been moved up, as it used to separate two closely related sections about the comma operator.
More examples for m//g have been added.
The <<\FOO here-doc syntax has been documented.
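A backslashed terminator behaves like a single-quoted one, suppressing interpolation; a short sketch:

```perl
use strict;
use warnings;

my $name = 'world';

# <<\EOT is equivalent to <<'EOT': the body is not interpolated.
my $raw = <<\EOT;
hello $name
EOT

print $raw;   # the literal text "$name" survives
```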
perlrun has undergone a significant clean-up. Most notably, the -0x... form of the -0 flag has been clarified, and the final section on environment variables has been corrected and expanded.
The invocation documentation for WIFEXITED, WEXITSTATUS, WIFSIGNALED, WTERMSIG, WIFSTOPPED, and WSTOPSIG was corrected.
The following additions or changes have been made to diagnostic output, including warnings and fatal error messages. For the complete list of diagnostic messages, see perldiag.
None
None
None
regexp.h has been modified for compatibility with GCC's -Werror option, as used by some projects that include perl's header files.
Some test failures in dist/Locale-Maketext/t/09_compile.t that could occur depending on the environment have been fixed. [perl #89896]
A watchdog timer for t/re/re.t was lengthened to accommodate SH-4 systems which were unable to complete the tests before the previous timer ran out.
None
None
Documentation listing the Solaris packages required to build Perl on Solaris 9 and Solaris 10 has been corrected.
The lib/locale.t test script has been updated to work on the upcoming Lion release.
Mac OS X specific compilation instructions have been clarified.
The ODBM_File installation process has been updated with the new library paths on Ubuntu natty.
The compiled representation of formats is now stored via the mg_ptr of their PERL_MAGIC_fm. Previously it was stored in the string buffer, beyond SvLEN(), the regular end of the string. SvCOMPILED() and SvCOMPILED_{on,off}() now exist solely for compatibility with XS code. The first is always 0, and the other two are now no-ops.
A bug has been fixed that would cause a "Use of freed value in iteration" error if the next two hash elements that would be iterated over are deleted. [perl #85026]
Passing the same constant subroutine to both index and formline no longer causes one or the other to fail. [perl #89218]
5.14.0 introduced some memory leaks in regular expression character classes such as [\w\s], which have now been fixed.
An edge case in regular expression matching could potentially loop.
This happened only under /i in bracketed character classes that have
characters with multi-character folds, and the target string to match
against includes the first portion of the fold, followed by another
character that has a multi-character fold that begins with the remaining
portion of the fold, plus some more.
- "s\N{U+DF}" =~ /[\x{DF}foo]/i
is one such case. \xDF folds to "ss".
Several Unicode case-folding bugs have been fixed.
The new (in 5.14.0) regular expression modifier /a, when repeated like /aa, forbids the characters outside the ASCII range that match characters inside that range from matching under /i. This did not work under some circumstances, all involving alternation, such as:
- "\N{KELVIN SIGN}" =~ /k|foo/iaa;
succeeded inappropriately. This is now fixed.
Fixed a case where it was possible that a freed buffer may have been read from when parsing a here document.
Perl 5.14.1 represents approximately four weeks of development since Perl 5.14.0 and contains approximately 3500 lines of changes across 38 files from 17 authors.
Perl continues to flourish into its third decade thanks to a vibrant community of users and developers. The following people are known to have contributed the improvements that became Perl 5.14.1:
Bo Lindbergh, Claudio Ramirez, Craig A. Berry, David Leadbeater, Father Chrysostomos, Jesse Vincent, Jim Cromie, Justin Case, Karl Williamson, Leo Lapworth, Nicholas Clark, Nobuhiro Iwamatsu, smash, Tom Christiansen, Ton Hospel, Vladimir Timofeev, and Zsbán Ambrus.
If you find what you think is a bug, you might check the articles recently posted to the comp.lang.perl.misc newsgroup and the perl bug database at http://rt.perl.org/perlbug/ . There may also be information at http://www.perl.org/ , the Perl Home Page.
If you believe you have an unreported bug, please run the perlbug program included with your release. Be sure to trim your bug down to a tiny but sufficient test case. Your bug report, along with the output of perl -V, will be sent off to perlbug@perl.org to be analysed by the Perl porting team.
If the bug you are reporting has security implications, which make it inappropriate to send to a publicly archived mailing list, then please send it to perl5-security-report@perl.org. This points to a closed subscription unarchived mailing list, which includes all the core committers, who are able to help assess the impact of issues, figure out a resolution, and help co-ordinate the release of patches to mitigate or fix the problem across all platforms on which Perl is supported. Please use this address only for security issues in the Perl core, not for modules independently distributed on CPAN.
The Changes file for an explanation of how to view exhaustive details on what changed.
The INSTALL file for how to build Perl.
The README file for general stuff.
The Artistic and Copying files for copyright information.
perl5142delta - what is new for perl v5.14.2
This document describes differences between the 5.14.1 release and the 5.14.2 release.
If you are upgrading from an earlier release such as 5.14.0, first read perl5141delta, which describes differences between 5.14.0 and 5.14.1.
No changes since 5.14.0.
File::Glob::bsd_glob() memory error with GLOB_ALTDIRFUNC (CVE-2011-2728).
Calling File::Glob::bsd_glob with the unsupported flag GLOB_ALTDIRFUNC would cause an access violation / segfault. A Perl program that accepts a flags value from an external source could expose itself to denial-of-service or arbitrary-code-execution attacks. There are no known exploits in the wild. The problem has been corrected by explicitly disabling all unsupported flags and setting unused function pointers to null. Bug reported by Clément Lecigne.
Encode decode_xs n-byte heap-overflow (CVE-2011-2939).
A bug in Encode could, on certain inputs, cause the heap to overflow. This problem has been corrected. Bug reported by Robert Zacek.
There are no changes intentionally incompatible with 5.14.0. If any exist, they are bugs and reports are welcome.
There have been no deprecations since 5.14.0.
None
CPAN has been upgraded from version 1.9600 to version 1.9600_01.
CPAN::Distribution has been upgraded from version 1.9602 to 1.9602_01.
Backported bugfixes from CPAN version 1.9800 ensure proper detection of configure_requires prerequisites from CPAN Meta files in the case where dynamic_config is true [rt.cpan.org #68835]. They also ensure that configure_requires is only checked in META files, not MYMETA files, to protect against MYMETA generation that drops configure_requires.
Encode has been upgraded from version 2.42 to 2.42_01.
See Security.
File::Glob has been upgraded from version 1.12 to version 1.13.
See Security.
PerlIO::scalar has been upgraded from version 0.11 to 0.11_01.
It fixes a problem with open my $fh, ">", \$scalar not working if $scalar is a copy-on-write scalar.
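In-memory filehandles of this kind look like the following sketch; after the fix the target scalar may be a copy-on-write string, such as a fresh copy of another scalar (variable names invented for the example):

```perl
use strict;
use warnings;

my $source = "shared string";
my $buffer = $source;          # may be copy-on-write internally

# Open an in-memory filehandle onto the scalar and write through it:
open my $fh, '>', \$buffer or die "open: $!";
print $fh "line 1\n";
close $fh;

print $buffer;
```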
None
None
None
A fix to correct the socketsize now makes the test suite pass on HP-UX PA-RISC for 64bitall builds.
The build system has been updated to work with the build tools under Mac OS X 10.7.
In @INC filters (subroutines returned by subroutines in @INC), $_ used to misbehave: if returned from a subroutine, it would not be copied, but the variable itself would be returned; and freeing $_ (e.g., with undef *_) would cause perl to crash. This has been fixed [perl #91880].
Perl 5.10.0 introduced some faulty logic that made "U*" in the middle of a pack template equivalent to "U0" if the input string was empty. This has been fixed [perl #90160].
caller no longer leaks memory when called from the DB package if @DB::args was assigned to after the first call to caller. Carp was triggering this bug [perl #97010].
utf8::decode had a nasty bug that would modify copy-on-write scalars' string buffers in place (i.e., skipping the copy). This could result in hashes having two elements with the same key [perl #91834].
Localising a tied variable used to make it read-only if it contained a copy-on-write string.
Elements of restricted hashes (see the fields pragma) containing copy-on-write values couldn't be deleted, nor could such hashes be cleared (%hash = ()).
Locking a hash element that is a glob copy no longer causes subsequent assignment to it to corrupt the glob.
A panic involving the combination of the regular expression modifiers /aa introduced in 5.14.0 and the \b escape sequence has been fixed [perl #95964].
This is a list of some significant unfixed bugs, which are regressions from 5.12.0.
PERL_GLOBAL_STRUCT is broken.
Since perl 5.14.0, building with -DPERL_GLOBAL_STRUCT hasn't been possible. This means that perl currently doesn't work on any platforms that require it to be built this way, including Symbian. While PERL_GLOBAL_STRUCT now works again on recent development versions of perl, whether it actually works on Symbian again hasn't been verified. We'd be very interested in hearing from anyone working with Perl on Symbian.
Perl 5.14.2 represents approximately three months of development since Perl 5.14.1 and contains approximately 1200 lines of changes across 61 files from 9 authors.
Perl continues to flourish into its third decade thanks to a vibrant community of users and developers. The following people are known to have contributed the improvements that became Perl 5.14.2:
Craig A. Berry, David Golden, Father Chrysostomos, Florian Ragwitz, H.Merijn Brand, Karl Williamson, Nicholas Clark, Pau Amma and Ricardo Signes.
If you find what you think is a bug, you might check the articles recently posted to the comp.lang.perl.misc newsgroup and the perl bug database at http://rt.perl.org/perlbug/ . There may also be information at http://www.perl.org/ , the Perl Home Page.
If you believe you have an unreported bug, please run the perlbug program included with your release. Be sure to trim your bug down to a tiny but sufficient test case. Your bug report, along with the output of perl -V, will be sent off to perlbug@perl.org to be analysed by the Perl porting team.
If the bug you are reporting has security implications, which make it inappropriate to send to a publicly archived mailing list, then please send it to perl5-security-report@perl.org. This points to a closed subscription unarchived mailing list, which includes all the core committers, who are able to help assess the impact of issues, figure out a resolution, and help co-ordinate the release of patches to mitigate or fix the problem across all platforms on which Perl is supported. Please use this address only for security issues in the Perl core, not for modules independently distributed on CPAN.
The Changes file for an explanation of how to view exhaustive details on what changed.
The INSTALL file for how to build Perl.
The README file for general stuff.
The Artistic and Copying files for copyright information.
perl5143delta - what is new for perl v5.14.3
This document describes differences between the 5.14.2 release and the 5.14.3 release.
If you are upgrading from an earlier release such as 5.12.0, first read perl5140delta, which describes differences between 5.12.0 and 5.14.0.
No changes since 5.14.0.
Digest unsafe use of eval (CVE-2011-3597).
The Digest->new() function did not properly sanitize input before using it in an eval() call, which could lead to the injection of arbitrary Perl code.
In order to exploit this flaw, the attacker would need to be able to set the algorithm name used, or be able to execute arbitrary Perl code already.
This problem has been fixed.
Poorly written perl code that allows an attacker to specify the count to perl's 'x' string repeat operator can already cause a memory exhaustion denial-of-service attack. A flaw in versions of perl before 5.15.5 can escalate that into a heap buffer overrun; coupled with versions of glibc before 2.16, it possibly allows the execution of arbitrary code.
This problem has been fixed.
There are no changes intentionally incompatible with 5.14.0. If any exist, they are bugs and reports are welcome.
There have been no deprecations since 5.14.0.
None
PerlIO::scalar was updated to fix a bug in which opening a filehandle to a glob copy caused assertion failures (under debugging) or hangs or other erratic behaviour without debugging.
ODBM_File and NDBM_File were updated to allow building on GNU/Hurd.
IPC::Open3 has been updated to fix a regression introduced in perl 5.12, which broke IPC::Open3::open3($in, $out, $err, '-'). [perl #95748]
Digest has been upgraded from version 1.16 to 1.16_01.
See Security.
Module::CoreList has been updated to version 2.49_04 to add data for this release.
None
None
perlcheat was updated to 5.14.
h2ph was updated to correctly search gcc include directories on platforms such as Debian with multi-architecture support.
In Configure, the test for procselfexe was refactored into a loop.
None
None
The FreeBSD hints file was corrected to be compatible with FreeBSD 10.0.
Configure was updated for "procselfexe" support on Solaris and NetBSD.
README.hpux was updated to note the existence of a broken header in HP-UX 11.00.
libutil is no longer used when compiling on Linux platforms, which avoids warnings being emitted.
The system gcc (rather than any other gcc which might be in the compiling user's path) is now used when searching for libraries such as -lm.
The locale tests were updated to reflect the behaviour of locales in Mountain Lion.
Various build and test fixes were included for GNU/Hurd.
LFS support was enabled in GNU/Hurd.
The NetBSD hints file was corrected to be compatible with NetBSD 6.*
A regression has been fixed that was introduced in 5.14, in /i
regular expression matching, in which a match improperly fails if the
pattern is in UTF-8, the target string is not, and a Latin-1 character
precedes a character in the string that should match the pattern. [perl
#101710]
In case-insensitive regular expression pattern matching on UTF-8 encoded strings, the scan for the start of a match no longer looks only at the first possible position. Previously, that shortcut caused matches such as "f\x{FB00}" =~ /ff/i to fail.
The sitecustomize support was made aware of relocatableinc, so that -Dusesitecustomize and -Duserelocatableinc may be used together.
The smartmatch operator (~~) was changed so that the right-hand side takes precedence during Any ~~ Object operations.
A bug has been fixed in the tainting support, in which an index()
operation on a tainted constant would cause all other constants to become
tainted. [perl #64804]
A regression has been fixed that was introduced in perl 5.12, whereby
tainting errors were not correctly propagated through die().
[perl #111654]
A regression has been fixed that was introduced in perl 5.14, in which
/[[:lower:]]/i
and /[[:upper:]]/i
no longer matched the opposite case.
[perl #101970]
Perl 5.14.3 represents approximately 12 months of development since Perl 5.14.2 and contains approximately 2,300 lines of changes across 64 files from 22 authors.
Perl continues to flourish into its third decade thanks to a vibrant community of users and developers. The following people are known to have contributed the improvements that became Perl 5.14.3:
Abigail, Andy Dougherty, Carl Hayter, Chris 'BinGOs' Williams, Dave Rolsky, David Mitchell, Dominic Hargreaves, Father Chrysostomos, Florian Ragwitz, H.Merijn Brand, Jilles Tjoelker, Karl Williamson, Leon Timmermans, Michael G Schwern, Nicholas Clark, Niko Tyni, Pino Toscano, Ricardo Signes, Salvador Fandiño, Samuel Thibault, Steve Hay, Tony Cook.
The list above is almost certainly incomplete as it is automatically generated from version control history. In particular, it does not include the names of the (very much appreciated) contributors who reported issues to the Perl bug tracker.
Many of the changes included in this version originated in the CPAN modules included in Perl's core. We're grateful to the entire CPAN community for helping Perl to flourish.
For a more complete list of all of Perl's historical contributors, please see the AUTHORS file in the Perl source distribution.
If you find what you think is a bug, you might check the articles recently posted to the comp.lang.perl.misc newsgroup and the perl bug database at http://rt.perl.org/perlbug/ . There may also be information at http://www.perl.org/ , the Perl Home Page.
If you believe you have an unreported bug, please run the perlbug
program included with your release. Be sure to trim your bug down
to a tiny but sufficient test case. Your bug report, along with the
output of perl -V, will be sent off to perlbug@perl.org to be analysed by the Perl porting team.
If the bug you are reporting has security implications which make it inappropriate to send to a publicly archived mailing list, then please send it to perl5-security-report@perl.org. This points to a closed-subscription, unarchived mailing list, which includes all the core committers, who will be able to help assess the impact of issues, figure out a resolution, and help co-ordinate the release of patches to mitigate or fix the problem across all platforms on which Perl is supported. Please only use this address for security issues in the Perl core, not for modules independently distributed on CPAN.
The Changes file for an explanation of how to view exhaustive details on what changed.
The INSTALL file for how to build Perl.
The README file for general stuff.
The Artistic and Copying files for copyright information.
perl5144delta - what is new for perl v5.14.4
This document describes differences between the 5.14.3 release and the 5.14.4 release.
If you are upgrading from an earlier release such as 5.12.0, first read perl5140delta, which describes differences between 5.12.0 and 5.14.0.
No changes since 5.14.0.
This release contains one major, one medium, and a number of minor security fixes. The minor fixes are included mainly to allow the test suite to pass cleanly with the clang compiler's address sanitizer facility.
With a carefully crafted set of hash keys (for example arguments on a URL), it is possible to cause a hash to consume a large amount of memory and CPU, and thus possibly to achieve a Denial-of-Service.
This problem has been fixed.
The UTF-8 encoding implementation in Encode.xs had a memory leak which has been fixed.
A read buffer overflow could occur when copying sockaddr buffers. Fairly harmless.
This problem has been fixed.
An extra byte was being copied for some string literals. Fairly harmless.
This problem has been fixed.
A string literal was being used that included two bytes beyond the end of the string. Fairly harmless.
This problem has been fixed.
Under debugging builds, while marking optimised-out regex nodes as type OPTIMIZED, it could treat blocks of exact text as if they were nodes, and thus SEGV. Fairly harmless.
This problem has been fixed.
The statement local $[;, when preceded by an eval, and when not part of an assignment, could crash. Fairly harmless.
This problem has been fixed.
Reading or writing strings greater than 2**31 bytes in size could segfault due to integer wraparound.
This problem has been fixed.
There are no changes intentionally incompatible with 5.14.0. If any exist, they are bugs and reports are welcome.
There have been no deprecations since 5.14.0.
None
The following modules have just the minor code fixes as listed above in Security (version numbers have not changed):
Encode has been upgraded from version 2.42_01 to version 2.42_02.
Module::CoreList has been updated to version 2.49_06 to add data for this release.
None.
None.
None.
No new or changed diagnostics.
None
No changes.
None.
None.
5.14.3 failed to compile on VMS due to incomplete application of a patch series that allowed userelocatableinc and usesitecustomize to be used simultaneously. Other platforms were not affected and the problem has now been corrected.
In Perl 5.14.0, $tainted ~~ @array
stopped working properly. Sometimes
it would erroneously fail (when $tainted
contained a string that occurs
in the array after the first element) or erroneously succeed (when
undef occurred after the first element) [perl #93590].
None.
Perl 5.14.4 represents approximately 5 months of development since Perl 5.14.3 and contains approximately 1,700 lines of changes across 49 files from 12 authors.
Perl continues to flourish into its third decade thanks to a vibrant community of users and developers. The following people are known to have contributed the improvements that became Perl 5.14.4:
Andy Dougherty, Chris 'BinGOs' Williams, Christian Hansen, Craig A. Berry, Dave Rolsky, David Mitchell, Dominic Hargreaves, Father Chrysostomos, Florian Ragwitz, Reini Urban, Ricardo Signes, Yves Orton.
The list above is almost certainly incomplete as it is automatically generated from version control history. In particular, it does not include the names of the (very much appreciated) contributors who reported issues to the Perl bug tracker.
For a more complete list of all of Perl's historical contributors, please see the AUTHORS file in the Perl source distribution.
If you find what you think is a bug, you might check the articles recently posted to the comp.lang.perl.misc newsgroup and the perl bug database at http://rt.perl.org/perlbug/ . There may also be information at http://www.perl.org/ , the Perl Home Page.
If you believe you have an unreported bug, please run the perlbug
program included with your release. Be sure to trim your bug down
to a tiny but sufficient test case. Your bug report, along with the
output of perl -V, will be sent off to perlbug@perl.org to be analysed by the Perl porting team.
If the bug you are reporting has security implications which make it inappropriate to send to a publicly archived mailing list, then please send it to perl5-security-report@perl.org. This points to a closed-subscription, unarchived mailing list, which includes all the core committers, who will be able to help assess the impact of issues, figure out a resolution, and help co-ordinate the release of patches to mitigate or fix the problem across all platforms on which Perl is supported. Please only use this address for security issues in the Perl core, not for modules independently distributed on CPAN.
The Changes file for an explanation of how to view exhaustive details on what changed.
The INSTALL file for how to build Perl.
The README file for general stuff.
The Artistic and Copying files for copyright information.
perl5160delta - what is new for perl v5.16.0
This document describes differences between the 5.14.0 release and the 5.16.0 release.
If you are upgrading from an earlier release such as 5.12.0, first read perl5140delta, which describes differences between 5.12.0 and 5.14.0.
Some bug fixes in this release have been backported to later releases of 5.14.x. Those are indicated with the 5.14.x version in parentheses.
With the release of Perl 5.16.0, the 5.12.x series of releases is now out of its support period. There may be future 5.12.x releases, but only in the event of a critical security issue. Users of Perl 5.12 or earlier should consider upgrading to a more recent release of Perl.
This policy is described in greater detail in perlpolicy.
use VERSION
As of this release, version declarations like use v5.16 now disable all features before enabling the new feature bundle. This means that the following holds true:
use v5.12 and higher continue to enable strict, but explicit use strict and no strict now override the version declaration, even when they come first:
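A minimal sketch of the new precedence rule (behaviour as of 5.16; the variable name is invented for illustration):

```perl
no strict 'vars';
use v5.12;          # would normally enable strict...

$undeclared = 42;   # ...but the explicit "no strict" above still wins,
                    # so this compiles and runs without a declaration
print $undeclared, "\n";
```

Before 5.16, the later use v5.12 would have re-enabled strict and this would have died at compile time with "Global symbol requires explicit package name".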
There is a new ":default" feature bundle that represents the set of features enabled before any version declaration or use feature has been seen. Version declarations below 5.10 now enable the ":default" feature set. This does not actually change the behavior of use v5.8, because features added to the ":default" set are those that were traditionally enabled by default, before they could be turned off.
no feature now resets to the default feature set. To disable all features (which is likely to be a pretty special-purpose request, since it presumably won't match any named set of semantics) you can now write no feature ':all'.
$[ is now disabled under use v5.16. It is part of the default feature set and can be turned on or off explicitly with use feature 'array_base'.
__SUB__
The new __SUB__ token, available under the current_sub feature (see feature) or use v5.16, returns a reference to the current subroutine, making it easier to write recursive closures.
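A short sketch of such a recursive closure (the factorial example is illustrative):

```perl
use v5.16;   # enables the current_sub feature (and say)

# A recursive anonymous subroutine: __SUB__ refers to the sub
# currently executing, so no named helper or outer variable holding
# the code ref is needed.
my $fact = sub {
    my ($n) = @_;
    return $n <= 1 ? 1 : $n * __SUB__->($n - 1);
};

say $fact->(5);   # 120
```

Before __SUB__, the usual workaround was to store the code ref in a lexical and weaken it to avoid a reference cycle; __SUB__ sidesteps both problems.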
eval
The eval operator sometimes treats a string argument as a sequence of characters and sometimes as a sequence of bytes, depending on the internal encoding. The internal encoding is not supposed to make any difference, but there is code that relies on this inconsistency.
The new unicode_eval and evalbytes features (enabled under use 5.16.0) resolve this. The unicode_eval feature causes eval $string to always treat the string as Unicode. The evalbytes feature provides a function, itself called evalbytes, which always evaluates its argument as a string of bytes.
These features also fix oddities with source filters leaking to outer dynamic scopes.
See feature for more detail.
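A small sketch of the distinction (assuming a perl of at least 5.16; the é literal is the two-byte UTF-8 sequence 0xC3 0xA9):

```perl
use v5.16;   # enables unicode_eval and evalbytes (and say)
use utf8;    # this source file contains UTF-8 literals

# Under unicode_eval, eval STRING always treats its argument as
# characters, regardless of the string's internal representation.
say eval q{length "é"};               # one character

# evalbytes always treats its argument as bytes.
say evalbytes q{length "\xc3\xa9"};   # two bytes
```

The same source text therefore gives 1 under character semantics and 2 under byte semantics, instead of varying with the internal encoding.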
substr lvalue revamp
When substr is called in lvalue or potential lvalue context with two or three arguments, a special lvalue scalar is returned that modifies the original string (the first argument) when assigned to.
Previously, the offsets (the second and third arguments) passed to
substr would be converted immediately to match the string, negative
offsets being translated to positive and offsets beyond the end of the
string being truncated.
Now, the offsets are recorded without modification in the special
lvalue scalar that is returned, and the original string is not even
looked at by substr itself, but only when the returned lvalue is
read or modified.
This results in one incompatible change:
If the original string changes length after the call to substr but
before assignment to its return value, negative offsets will remember
their position from the end of the string, affecting code like this:
The same thing happens with an omitted third argument. The returned lvalue will always extend to the end of the string, even if the string becomes longer.
Since this change also allowed many bugs to be fixed (see The substr operator), and since the behavior of negative offsets has never been specified, the change was deemed acceptable.
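The incompatibility is easiest to see with a negative offset. A sketch of the 5.16 behaviour (variable names are illustrative):

```perl
use v5.16;

my $s  = "abcdef";
my $lv = \substr($s, -2);   # lvalue meaning "the last two characters"

$s .= "gh";                 # the string grows after the substr() call

# The offset -2 is resolved only now, at assignment time, so this
# replaces the current last two characters "gh", not the old "ef".
$$lv = "XY";

print $s, "\n";             # abcdefXY
```

Under the pre-5.16 behaviour the offset would have been fixed at the time of the substr() call, and the assignment would have overwritten "ef" instead.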
tied
The value returned by tied on a tied variable is now the actual scalar that holds the object to which the variable is tied. This lets ties be weakened with Scalar::Util::weaken(tied $tied_variable).
Besides the addition of whole new scripts, and new characters in
existing scripts, this new version of Unicode, as always, makes some
changes to existing characters. One change that may trip up some
applications is that the General Category of two characters in the
Latin-1 range, PILCROW SIGN and SECTION SIGN, has been changed from
Other_Symbol to Other_Punctuation. The same change has been made for
a character in each of Tibetan, Ethiopic, and Aegean.
The code points U+3248..U+324F (CIRCLED NUMBER TEN ON BLACK SQUARE
through CIRCLED NUMBER EIGHTY ON BLACK SQUARE) have had their General
Category changed from Other_Symbol to Other_Numeric. The Line Break
property has changes for Hebrew and Japanese; and because of
other changes in 6.1, the Perl regular expression construct \X
now
works differently for some characters in Thai and Lao.
New aliases (synonyms) have been defined for many property values; these, along with the previously existing ones, are all cross-indexed in perluniprops.
The return value of charnames::viacode()
is affected by other
changes:
- Code point Old Name New Name
- U+000A LINE FEED (LF) LINE FEED
- U+000C FORM FEED (FF) FORM FEED
- U+000D CARRIAGE RETURN (CR) CARRIAGE RETURN
- U+0085 NEXT LINE (NEL) NEXT LINE
- U+008E SINGLE-SHIFT 2 SINGLE-SHIFT-2
- U+008F SINGLE-SHIFT 3 SINGLE-SHIFT-3
- U+0091 PRIVATE USE 1 PRIVATE USE-1
- U+0092 PRIVATE USE 2 PRIVATE USE-2
- U+2118 SCRIPT CAPITAL P WEIERSTRASS ELLIPTIC FUNCTION
Perl will accept any of these names as input, but
charnames::viacode()
now returns the new name of each pair. The
change for U+2118 is considered by Unicode to be a correction; that is, the original name was a mistake (but again, it will remain forever valid to use it to refer to U+2118). But most of these changes are the
fallout of the mistake Unicode 6.0 made in naming a character used in
Japanese cell phones to be "BELL", which conflicts with the longstanding
industry use of (and Unicode's recommendation to use) that name
to mean the ASCII control character at U+0007. Therefore, that name
has been deprecated in Perl since v5.14, and any use of it will raise a
warning message (unless turned off). The name "ALERT" is now the
preferred name for this code point, with "BEL" an acceptable short
form. The name for the new cell phone character, at code point U+1F514,
remains undefined in this version of Perl (hence we don't
implement quite all of Unicode 6.1), but starting in v5.18, BELL will mean
this character, and not U+0007.
Unicode has taken steps to make sure that this sort of mistake does not happen again. The Standard now includes all generally accepted names and abbreviations for control characters, whereas previously it didn't (though there were recommended names for most of them, which Perl used). This means that most of those recommended names are now officially in the Standard. Unicode did not recommend names for the four code points listed above between U+008E and U+008F, and in standardizing them Unicode subtly changed the names that Perl had previously given them, by replacing the final blank in each name by a hyphen. Unicode also officially accepts names that Perl had deprecated, such as FILE SEPARATOR. Now the only deprecated name is BELL. Finally, Perl now uses the new official names instead of the old (now considered obsolete) names for the first four code points in the list above (the ones which have the parentheses in them).
Now that the names have been placed in the Unicode standard, these kinds of changes should not happen again, though corrections, such as to U+2118, are still possible.
Unicode also added some name abbreviations, which Perl now accepts: SP for SPACE; TAB for CHARACTER TABULATION; NEW LINE, END OF LINE, NL, and EOL for LINE FEED; LOCKING-SHIFT ONE for SHIFT OUT; LOCKING-SHIFT ZERO for SHIFT IN; and ZWNBSP for ZERO WIDTH NO-BREAK SPACE.
More details on this version of Unicode are provided in http://www.unicode.org/versions/Unicode6.1.0/.
use charnames is no longer needed for \N{name}
When \N{name} is encountered, the charnames module is now automatically loaded when needed, as if the :full and :short options had been specified. See charnames for more information.
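In other words, a script like the following now works with no explicit use charnames line anywhere:

```perl
use v5.16;

# No "use charnames" here: the module is loaded automatically the
# first time a \N{name} escape is compiled.
print "\N{LATIN SMALL LETTER A}\N{DIGIT ONE}\n";   # prints "a1"
```

Earlier perls would instead die at compile time with "Constant(\N{...}) unknown" unless charnames had been loaded first.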
\N{...} can now have Unicode loose name matching
This is described in the charnames item in Updated Modules and Pragmata below.
Perl now has proper support for Unicode in symbol names. It used to be
that *{$foo}
would ignore the internal UTF8 flag and use the bytes of
the underlying representation to look up the symbol. That meant that
*{"\x{100}"}
and *{"\xc4\x80"}
would return the same thing. All
these parts of Perl have been fixed to account for Unicode:
Method names (including those passed to use overload
)
Typeglob names (including names of variables, subroutines, and filehandles)
Package names
Symbolic dereferencing
Return value of ref()
Subroutine prototypes
Attributes
Various warnings and error messages that mention variable names or values, methods, etc.
In addition, a parsing bug has been fixed that prevented *{é}
from
implicitly quoting the name, but instead interpreted it as *{+é}
, which
would cause a strict violation.
*{"*a::b"}
automatically strips off the * if it is followed by an ASCII
letter. That has been extended to all Unicode identifier characters.
One-character non-ASCII non-punctuation variables (like $é
) are now
subject to "Used only once" warnings. They used to be exempt, as they
were treated as punctuation variables.
Also, single-character Unicode punctuation variables (like $‰
) are now
supported [perl #69032].
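The *{"\x{100}"} versus *{"\xc4\x80"} distinction mentioned above can be sketched with symbolic references (behaviour as of 5.16):

```perl
use strict;
use warnings;
no strict 'refs';   # symbolic dereferencing below is deliberate

# U+0100 as one character, and the same code point's UTF-8 byte
# sequence as two separate characters. Before 5.16 both lookups hit
# the same glob, because only the underlying bytes were compared.
${"\x{100}"}  = "one character";
${"\xc4\x80"} = "two characters";

print ${"\x{100}"} eq "one character" ? "distinct\n" : "aliased\n";
</```perl
```

On a 5.16+ perl the first variable survives the second assignment, showing that the two names now refer to distinct symbols.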
An optional parameter has been added to use locale
- use locale ':not_characters';
which tells Perl to use all but the LC_CTYPE
and LC_COLLATE
portions of the current locale. Instead, the character set is assumed
to be Unicode. This lets locales and Unicode be seamlessly mixed,
including the increasingly frequent UTF-8 locales. When using this
hybrid form of locales, the :locale
layer to the open pragma can
be used to interface with the file system, and there are CPAN modules
available for ARGV and environment variable conversions.
Full details are in perllocale.
fc and corresponding escape sequence \F for Unicode foldcase
Unicode foldcase is an extension to lowercase that gives better results
when comparing two strings case-insensitively. It has long been used
internally in regular expression /i matching. Now it is available
explicitly through the new fc function call (enabled by
"use feature 'fc'"
, or use v5.16
, or explicitly callable via
CORE::fc
) or through the new \F
sequence in double-quotish
strings.
Full details are in fc.
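A quick sketch of why foldcase beats lc for case-insensitive comparison: "ß" folds to "ss", which no single-character case mapping can capture.

```perl
use v5.16;   # makes fc available (and say)
use utf8;    # UTF-8 literals in this source file

my $x = "tschüß";
my $y = "TSCHÜSS";

# fc maps both strings to "tschüss", so they compare equal.
say fc($x) eq fc($y) ? "equal under fc" : "different under fc";

# lc leaves "ß" alone, so the same comparison fails.
say lc($x) eq lc($y) ? "equal under lc" : "different under lc";
```

The same mapping is what /i matching has used internally all along; fc and \F simply expose it.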
The Script_Extensions property is now supported.
New in Unicode 6.0, this is an improved Script property. Details are in Scripts in perlunicode.
Most XS authors will know there is a longstanding bug in the
OUTPUT typemap for T_AVREF (AV*
), T_HVREF (HV*
), T_CVREF (CV*
),
and T_SVREF (SVREF
or \$foo
) that requires manually decrementing
the reference count of the return value instead of the typemap taking
care of this. For backwards-compatibility, this cannot be changed in the
default typemaps. But we now provide additional typemaps
T_AVREF_REFCOUNT_FIXED
, etc. that do not exhibit this bug. Using
them in your extension is as simple as having one line in your
TYPEMAP
section:
- HV* T_HVREF_REFCOUNT_FIXED
is_utf8_char()
The XS-callable function is_utf8_char()
, when presented with
malformed UTF-8 input, can read up to 12 bytes beyond the end of the
string. This cannot be fixed without changing its API, and so its
use is now deprecated. Use is_utf8_char_buf()
(described just below)
instead.
is_utf8_char_buf()
This function is designed to replace the deprecated is_utf8_char() function. It includes an extra parameter to make sure it doesn't read past the end of the input buffer.
is_utf8_foo() functions, as well as utf8_to_foo(), etc.
Most other XS-callable functions that take UTF-8 encoded input
implicitly assume that the UTF-8 is valid (not malformed) with respect to
buffer length. Do not do things such as change a character's case or
see if it is alphanumeric without first being sure that it is valid
UTF-8. This can be safely done for a whole string by using one of the
functions is_utf8_string()
, is_utf8_string_loc()
, and
is_utf8_string_loclen()
.
Many new functions have been added to the API for manipulating lexical pads. See Pad Data Structures in perlapi for more information.
$$ can be assigned to
$$ was made read-only in Perl 5.8.0. But only sometimes: local $$
would make it writable again. Some CPAN modules were using local $$
or
XS code to bypass the read-only check, so there is no reason to keep $$
read-only. (This change also allowed a bug to be fixed while maintaining
backward compatibility.)
$^X converted to an absolute path on FreeBSD, OS X and Solaris
$^X is now converted to an absolute path on OS X, FreeBSD (without
needing /proc mounted) and Solaris 10 and 11. This augments the
previous approach of using /proc on Linux, FreeBSD, and NetBSD
(in all cases, where mounted).
This makes relocatable perl installations more useful on these platforms. (See "Relocatable @INC" in INSTALL)
The current Perl's feature bundle is now enabled for commands entered in the interactive debugger.
The t command in the debugger, which toggles tracing mode, now accepts a numeric argument that determines how many levels of subroutine calls to trace.
enable and disable
The debugger now has disable and enable commands for disabling existing breakpoints and re-enabling them. See perldebug.
The debugger's "b" command for setting breakpoints now lets a line number be prefixed with a file name. See b [file]:[line] [condition] in perldebug.
The CORE Namespace
The CORE:: prefix
The CORE:: prefix can now be used on keywords enabled by feature.pm, even outside the scope of use feature.
The CORE namespace
Many Perl keywords are now available as subroutines in the CORE namespace. This lets them be aliased:
- BEGIN { *entangle = \&CORE::tie }
- entangle $variable, $package, @args;
And for prototypes to be bypassed:
Some of these cannot be called through references or via &foo
syntax,
but must be called as barewords.
See CORE for details.
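The prototype-bypassing example appears to have been lost in conversion; the idea is that a wrapper sub can forward its already-flattened argument list with goto. A sketch (the Counter class and the mytie name are invented for illustration):

```perl
use strict;
use warnings;

package Counter;
sub TIESCALAR { my $n = 0; bless \$n, shift }
sub FETCH     { ${ $_[0] }++ }

package main;

# The wrapper's own prototype takes its scalar by reference; the
# goto then forwards @_ (a plain list, no prototype processing) to
# the &CORE::tie subroutine.
sub mytie (\$@) {
    goto &CORE::tie;
}

mytie my $x, 'Counter';
print $x, " ", $x, "\n";   # each read of $x calls FETCH: 0 1
```

As the surrounding text notes, not every keyword is reachable this way; some CORE:: subroutines must still be called as barewords.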
Automatically generated file handles are now named __ANONIO__ when the variable name cannot be determined, rather than $__ANONIO__.
Custom sort subroutines can now be autoloaded [perl #30661]:
- sub AUTOLOAD { ... }
- @sorted = sort foo @list; # uses AUTOLOAD
continue no longer requires the "switch" feature
The continue keyword has two meanings. It can introduce a continue block after a loop, or it can exit the current when
block. Up to now,
the latter meaning was valid only with the "switch" feature enabled, and
was a syntax error otherwise. Since the main purpose of feature.pm is to
avoid conflicts with user-defined subroutines, there is no reason for
continue to depend on it.
The phase-change
probes will fire when the interpreter's phase
changes, which tracks the ${^GLOBAL_PHASE}
variable. arg0
is
the new phase name; arg1
is the old one. This is useful
for limiting your instrumentation to one or more of: compile time,
run time, or destruct time.
__FILE__() Syntax
The __FILE__, __LINE__ and __PACKAGE__ tokens can now be written
with an empty pair of parentheses after them. This makes them parse the
same way as time, fork and other built-in functions.
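For instance, the following now parses (a trivial sketch; before this change the empty parentheses were a syntax error):

```perl
use v5.16;

# The empty parentheses are now accepted, so these tokens parse the
# same way as argumentless built-ins such as time().
say __PACKAGE__();
say defined __FILE__() ? "have a file name" : "no file name";
```

This mainly matters for code generators and for people who habitually write all nullary calls with parentheses.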
\$ prototype accepts any scalar lvalue
The \$ and \[$] subroutine prototypes now accept any scalar lvalue
argument. Previously they accepted only scalars beginning with $
and
hash and array elements. This change makes them consistent with the way
the built-in read and recv functions (among others) parse their
arguments. This means that one can override the built-in functions with
custom subroutines that parse their arguments the same way.
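A sketch of what is newly accepted (the bump sub is invented; the ternary argument is the kind of lvalue that a \$ prototype used to reject):

```perl
use strict;
use warnings;

# A sub with a \$ prototype: the caller's scalar arrives by reference.
sub bump (\$) { ${ $_[0] }++ }

my ($left, $right) = (1, 10);
my $pick_left = 1;

bump $left;                            # plain scalar: always worked
bump( $pick_left ? $left : $right );   # ternary lvalue: accepted as of 5.16

print "$left $right\n";                # 3 10
```

This mirrors how built-ins like read and recv have always parsed their buffer argument, so an override can now be a drop-in replacement.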
_ in subroutine prototypes
The _ character in subroutine prototypes is now allowed before @ or %.
is_utf8_char_buf() and not is_utf8_char()
The latter function is now deprecated because its API is insufficient to guarantee that it doesn't read (up to 12 bytes in the worst case) beyond the end of its input string. See is_utf8_char_buf().
Two new XS-accessible functions, utf8_to_uvchr_buf()
and
utf8_to_uvuni_buf()
are now available to prevent this, and the Perl
core has been converted to use them.
See Internal Changes.
File::Glob::bsd_glob() memory error with GLOB_ALTDIRFUNC (CVE-2011-2728)
Calling File::Glob::bsd_glob with the unsupported flag
GLOB_ALTDIRFUNC would cause an access violation / segfault. A Perl
program that accepts a flags value from an external source could expose
itself to denial of service or arbitrary code execution attacks. There
are no known exploits in the wild. The problem has been corrected by
explicitly disabling all unsupported flags and setting unused function
pointers to null. Bug reported by Clément Lecigne. (5.14.2)
$(
A hypothetical bug (probably unexploitable in practice) caused by incorrect setting of the effective group ID while setting $( has been fixed. The bug would have affected only systems that have setresgid() but not setregid(), but no such systems are known to exist.
It is now deprecated to directly read the Unicode data base files. These are stored in the lib/unicore directory. Instead, you should use the new functions in Unicode::UCD. These provide a stable API, and give complete information.
Perl may at some point in the future change or remove these files. The file which applications were most likely to have used is lib/unicore/ToDigit.pl. prop_invmap() in Unicode::UCD can be used to get at its data instead.
is_utf8_char(), utf8_to_uvchr() and utf8_to_uvuni()
These functions are deprecated because they could read beyond the end of the input string. Use the new is_utf8_char_buf(), utf8_to_uvchr_buf() and utf8_to_uvuni_buf() instead.
This section serves as a notice of features that are likely to be removed or deprecated in the next release of perl (5.18.0). If your code depends on these features, you should contact the Perl 5 Porters via the mailing list or perlbug to explain your use case and inform the deprecation process.
These modules may be marked as deprecated from the core. This only means that they will no longer be installed by default with the core distribution, but will remain available on the CPAN.
CPANPLUS
Filter::Simple
PerlIO::mmap
Pod::LaTeX
Pod::Parser
SelfLoader
Text::Soundex
Thread.pm
These platforms will probably have their special build support removed during the 5.17.0 development series.
BeOS
djgpp
dgux
EPOC
MPE/iX
Rhapsody
UTS
VM/ESA
Swapping of $< and $>
For more information about this future deprecation, see the relevant RT ticket.
sfio, stdio
Perl supports being built without PerlIO proper, using a stdio or sfio wrapper instead. A perl build like this will not support IO layers and thus Unicode IO, making it rather handicapped.
PerlIO supports a stdio layer if stdio use is desired, and similarly a sfio layer could be produced.
Unescaped literal "{" in regular expressions
Starting with v5.20, it is planned to require a literal "{" to be escaped, for example by preceding it with a backslash. In v5.18, a deprecation warning message will be emitted for all such uses.
This affects only patterns that are to match a literal "{"
. Other
uses of this character, such as part of a quantifier or sequence as in
those below, are completely unaffected:
- /foo{3,5}/
- /\p{Alphabetic}/
- /\N{DIGIT ZERO}/
Removing this will permit extensions to Perl's pattern syntax and better error checking for existing syntax. See Quantifiers in perlre for an example.
Revamping "\Q" semantics in double-quotish strings when combined with other escapes
There are several bugs and inconsistencies involving combinations
of \Q
and escapes like \x
, \L
, etc., within a \Q...\E
pair.
These need to be fixed, and doing so will necessarily change current
behavior. The changes have not yet been settled.
Special blocks (BEGIN
, CHECK
, INIT
, UNITCHECK
, END
) are now
called in void context. This avoids wasteful copying of the result of the
last statement [perl #108794].
The overloading pragma and regexp objects
With no overloading, regular expression objects returned by qr// are now stringified as "Regexp=REGEXP(0xbe600d)" instead of the regular expression itself [perl #108780].
Two presumably unused XS typemap entries have been removed from the core typemap: T_DATAUNIT and T_CALLBACK. If you are, against all odds, a user of these, please see the instructions on how to restore them in perlxstypemap.
These are detailed in Supports (almost) Unicode 6.1 above. You can compile this version of Perl to use Unicode 6.0. See Hacking Perl to work on earlier Unicode versions (for very serious hackers only) in perlunicode.
All support for the Borland compiler has been dropped. The code had not worked for a long time anyway.
Perl should never have exposed certain Unicode properties that are used by Unicode internally and not meant to be publicly available. Use of these has generated deprecated warning messages since Perl 5.12. The removed properties are Other_Alphabetic, Other_Default_Ignorable_Code_Point, Other_Grapheme_Extend, Other_ID_Continue, Other_ID_Start, Other_Lowercase, Other_Math, and Other_Uppercase.
Perl may be recompiled to include any or all of them; instructions are given in Unicode character properties that are NOT accepted by Perl in perluniprops.
The *{...} operator, when passed a reference to an IO thingy (as in *{*STDIN{IO}}), creates a new typeglob containing just that IO object. Previously, it would stringify as an empty string, but some operators would treat it as undefined, producing an "uninitialized" warning. Now it stringifies as __ANONIO__ [perl #96326].
This feature was deprecated in Perl 5.14, and has now been removed. The CPAN module Unicode::Casing provides better functionality without the drawbacks that this feature had, as are detailed in the 5.14 documentation: http://perldoc.perl.org/5.14.0/perlunicode.html#User-Defined-Case-Mappings-%28for-serious-hackers-only%29
XSUB C functions are now 'static', that is, they are not visible from outside the compilation unit. Users can use the new XS_EXTERNAL(name) and XS_INTERNAL(name) macros to pick the desired linking behavior. The ordinary XS(name) declaration for XSUBs will continue to declare non-'static' XSUBs for compatibility, but the XS compiler, ExtUtils::ParseXS (xsubpp), will emit 'static' XSUBs by default. ExtUtils::ParseXS's behavior can be reconfigured from XS using the EXPORT_XSUB_SYMBOLS keyword. See perlxs for details.
Weakening read-only references is no longer permitted. It should never have worked anyway, and could sometimes result in crashes.
Attempting to tie a scalar after a typeglob was assigned to it would instead tie the handle in the typeglob's IO slot. This meant that it was impossible to tie the scalar itself. Similar problems affected tied and untie: tied $scalar would return false on a tied scalar if the last thing returned was a typeglob, and untie $scalar on such a tied scalar would do nothing.
We fixed this problem before Perl 5.14.0, but it caused problems with some CPAN modules, so we put in a deprecation cycle instead.
Now the deprecation has been removed and this bug has been fixed. So tie $scalar will always tie the scalar, not the handle it holds. To tie the handle, use tie *$scalar (with an explicit asterisk). The same applies to tied *$scalar and untie *$scalar.
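A sketch of the new behavior, using an invented CountingScalar tie class for illustration:

```perl
use strict;
use warnings;

# Hypothetical tie class: every read of the tied scalar returns an
# incrementing counter.
package CountingScalar;
sub TIESCALAR { my $count = 0; bless \$count, shift }
sub FETCH     { my $self = shift; ++$$self }
sub STORE     { }

package main;
my $var = *STDOUT;            # the scalar now holds a typeglob
tie $var, 'CountingScalar';   # ties the scalar itself, not the IO slot
my $first  = $var;            # FETCH is called: 1
my $second = $var;            # FETCH again: 2
```

To tie the handle held in the glob instead, the explicit-asterisk form tie *$var, 'SomeHandleClass' would be used.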
xfork(), xclose_on_exec() and xpipe_anon()
All three functions were private, undocumented, and unexported. They do not appear to be used by any code on CPAN. Two have been inlined and one deleted entirely.
$$ no longer caches PID
Previously, if one called fork(3) from C, Perl's notion of $$ could go out of sync with what getpid() returns. By always fetching the value of $$ via getpid(), this potential bug is eliminated. Code that depends on the caching behavior will break. As described in Core Enhancements, $$ is now writable, but it will be reset during a fork.
$$ and getppid() no longer emulate POSIX semantics under LinuxThreads
The POSIX emulation of $$ and getppid() under the obsolete LinuxThreads implementation has been removed. This only impacts users of Linux 2.4 and users of Debian GNU/kFreeBSD up to and including 6.0, not the vast majority of Linux installations that use NPTL threads.
This means that getppid(), like $$, is now always guaranteed to return the OS's idea of the current state of the process, not perl's cached version of it.
See the documentation for $$ for details.
$<, $>, $( and $) are no longer cached
Similarly to the changes to $$ and getppid(), the internal caching of $<, $>, $( and $) has been removed.
When we cached these values, our idea of what they were would drift out of sync with reality if someone (e.g., someone embedding perl) called sete?[ug]id() without updating PL_e?[ug]id. Having to deal with this complexity wasn't worth it given how cheap the gete?[ug]id() system calls are.
This change will break a handful of CPAN modules that use the XS-level PL_uid, PL_gid, PL_euid, or PL_egid variables.
The fix for those breakages is to use PerlProc_gete?[ug]id() to retrieve them (e.g., PerlProc_getuid()), and not to assign to PL_e?[ug]id if you change the UID/GID/EUID/EGID. There is no longer any need to do so, since perl will always retrieve the up-to-date version of those values from the OS.
Which non-ASCII characters get quoted by quotemeta and \Q has changed
This is unlikely to result in a real problem, as Perl does not attach special meaning to any non-ASCII character, so it is currently irrelevant which are quoted or not. This change fixes bug [perl #77654] and brings Perl's behavior more into line with Unicode's recommendations. See quotemeta.
Improved performance for Unicode properties in regular expressions
Matching a code point against a Unicode property is now done via a
binary search instead of linear. This means for example that the worst
case for a 1000 item property is 10 probes instead of 1000. This
inefficiency has been compensated for in the past by permanently storing
in a hash the results of a given probe plus the results for the adjacent
64 code points, under the theory that near-by code points are likely to
be searched for. A separate hash was used for each mention of a Unicode
property in each regular expression. Thus, qr/\p{foo}abc\p{foo}/
would generate two hashes. Any probes in one instance would be unknown
to the other, and the hashes could expand separately to be quite large
if the regular expression were used on many different widely-separated
code points.
Now, however, there is just one hash shared by all instances of a given
property. This means that if \p{foo}
is matched against "A" in one
regular expression in a thread, the result will be known immediately to
all regular expressions, and the relentless march of using up memory is
slowed considerably.
Version declarations with the use keyword (e.g., use 5.012) are now faster, as they enable features without loading feature.pm.
local $_ is faster now, as it no longer iterates through magic that it is not going to copy anyway.
Perl 5.12.0 sped up the destruction of objects whose classes define empty DESTROY methods (to prevent autoloading), by simply not calling such empty methods. This release takes this optimization a step further, by not calling any DESTROY method that begins with a return statement. This can be useful for destructors that are only used for debugging:
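A minimal sketch of such a debugging-only destructor (the Widget package and DEBUG constant are illustrative stand-ins):

```perl
use strict;
use warnings;

package Widget;
use constant DEBUG => 0;   # flip to 1 while debugging

# Because the body begins with a return statement once DEBUG is
# constant-folded to 0, perl 5.16 skips calling this method entirely.
sub DESTROY {
    return unless DEBUG;
    warn "Destroying $_[0]\n";
}

package main;
{
    my $w = bless {}, 'Widget';   # destroyed silently at end of scope
}
```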
Constant-folding will reduce the first statement to a bare return if DEBUG is set to 0, triggering this optimization.
Assigning to a variable that holds a typeglob or copy-on-write scalar is now much faster. Previously the typeglob would be stringified or the copy-on-write scalar would be copied before being clobbered.
Assignment to substr in void context is now more than twice its
previous speed. Instead of creating and returning a special lvalue
scalar that is then assigned to, substr modifies the original string
itself.
substr no longer calculates a value to return when called in void
context.
Due to changes in File::Glob, Perl's glob function and its <...>
equivalent are now much faster. The splitting of the pattern
into words has been rewritten in C, resulting in speed-ups of 20% for
some cases.
This does not affect glob on VMS, as it does not use File::Glob.
The short-circuiting operators &&, ||, and //, when chained (such as $a || $b || $c), are now considerably faster to short-circuit, due to reduced optree traversal.
The implementation of s///r makes one fewer copy of the scalar's value.
Recursive calls to lvalue subroutines in lvalue scalar context use less memory.
Version::Requirements is now DEPRECATED; use CPAN::Meta::Requirements, which is a drop-in replacement. It will be deleted from perl.git blead in v5.17.0.
arybase -- this new module implements the $[ variable.
PerlIO::mmap 0.010 has been added to the Perl core.
The mmap
PerlIO layer is no longer implemented by perl itself, but has
been moved out into the new PerlIO::mmap module.
This is only an overview of selected module updates. For a complete list of updates, run:
- $ corelist --diff 5.14.0 5.16.0
You can substitute your favorite version in place of 5.14.0, too.
Archive::Extract has been upgraded from version 0.48 to 0.58.
Includes a fix for FreeBSD to only use unzip if it is located in /usr/local/bin, as FreeBSD 9.0 will ship with a limited unzip in /usr/bin.
Archive::Tar has been upgraded from version 1.76 to 1.82.
Adjustments to handle files >8gb (>0777777777777 octal) and a feature to return the MD5SUM of files in the archive.
base has been upgraded from version 2.16 to 2.18.
base no longer sets a module's $VERSION to "-1" when a module it loads does not define a $VERSION. This change has been made because "-1" is not a valid version number under the new "lax" criteria used internally by UNIVERSAL::VERSION. (See version for more on "lax" version criteria.)
base no longer internally skips loading modules it has already loaded and instead relies on require to inspect %INC. This fixes a bug when base is used with code that clears %INC to force a module to be reloaded.
Carp has been upgraded from version 1.20 to 1.26.
It now includes last read filehandle info and puts a dot after the file
and line number, just like errors from die [perl #106538].
charnames has been updated from version 1.18 to 1.30.
charnames can now be invoked with a new option, :loose, which is like the existing :full option, but enables Unicode loose name matching. Details are in LOOSE MATCHES in charnames.
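A brief sketch of loose matching (requires perl 5.16 or later):

```perl
use v5.16;
use charnames ":loose";   # like :full, but case and certain separators
                          # in \N{...} names are ignored

my $exact = "\N{LATIN SMALL LETTER A}";
my $loose = "\N{latin small letter a}";   # same character under :loose
```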
B::Deparse has been upgraded from version 1.03 to 1.14. This fixes numerous deparsing bugs.
CGI has been upgraded from version 3.52 to 3.59.
It uses the public and documented FCGI.pm API in CGI::Fast. CGI::Fast was using an FCGI API that was deprecated and removed from documentation more than ten years ago. Usage of this deprecated API with FCGI >= 0.70 or FCGI <= 0.73 introduces a security issue. https://rt.cpan.org/Public/Bug/Display.html?id=68380 http://web.nvd.nist.gov/view/vuln/detail?vulnId=CVE-2011-2766
Things that may break your code:
url() was fixed to return PATH_INFO when it is explicitly requested with either the path=>1 or path_info=>1 flag.
If your code is running under mod_rewrite (or compatible) and you are calling self_url(), or you are calling url() and passing path_info=>1, these methods will actually be returning PATH_INFO now, as you have explicitly requested or self_url() has requested on your behalf.
The PATH_INFO
has been omitted in such URLs since the issue was
introduced in the 3.12 release in December, 2005.
This bug is so old that your application may have come to depend on it or work around it. Check for application breakage before upgrading to this release.
Examples of affected method calls:
- $q->url(-absolute => 1, -query => 1, -path_info => 1);
- $q->url(-path=>1);
- $q->url(-full=>1,-path=>1);
- $q->url(-rewrite=>1,-path=>1);
- $q->self_url();
We no longer read from STDIN when the Content-Length is not set, preventing requests with no Content-Length from sometimes freezing. This is consistent with the CGI RFC 3875, and is also consistent with CGI::Simple. However, the old behavior may have been expected by some command-line uses of CGI.pm.
In addition, the DELETE HTTP verb is now supported.
Compress::Zlib has been upgraded from version 2.035 to 2.048.
IO::Compress::Zip and IO::Uncompress::Unzip now have support for LZMA (method 14). A CRC issue in IO::Uncompress::Unzip has been fixed, and it now supports streamed Stored content. A Zip64 issue in IO::Compress::Zip when the content size was exactly 0xFFFFFFFF has also been fixed.
Digest::SHA has been upgraded from version 5.61 to 5.71.
Added BITS mode to the addfile method and shasum. This makes partial-byte inputs possible via files/STDIN and lets shasum check all 8074 NIST Msg vectors, where previously special programming was required to do this.
Encode has been upgraded from version 2.42 to 2.44.
Missing aliases added, a deep recursion error fixed and various documentation updates.
Addressed 'decode_xs n-byte heap-overflow' security bug in Unicode.xs (CVE-2011-2939). (5.14.2)
ExtUtils::CBuilder updated from version 0.280203 to 0.280206.
The new version appends CFLAGS and LDFLAGS to their Config.pm counterparts.
ExtUtils::ParseXS has been upgraded from version 2.2210 to 3.16.
Much of ExtUtils::ParseXS, the module behind the XS compiler xsubpp, was rewritten and cleaned up. It has been made somewhat more extensible and now finally uses strictures.
The typemap logic has been moved into a separate module, ExtUtils::Typemaps. See New Modules and Pragmata, above.
For a complete set of changes, please see the ExtUtils::ParseXS changelog, available on the CPAN.
File::Glob has been upgraded from version 1.12 to 1.17.
On Windows, tilde (~) expansion now checks the USERPROFILE environment variable, after checking HOME.
It has a new :bsd_glob export tag, intended to replace :glob. Like :glob it overrides glob with a function that does not split the glob pattern into words, but, unlike :glob, it iterates properly in scalar context, instead of returning the last file.
There are other changes affecting Perl's own glob operator (which uses
File::Glob internally, except on VMS). See Performance Enhancements
and Selected Bug Fixes.
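A small sketch of the :bsd_glob difference (the file name is invented; File::Temp is used only to get a scratch directory):

```perl
use v5.16;
use File::Glob ':bsd_glob';   # glob() no longer splits patterns on spaces
use File::Temp qw(tempdir);

my $dir = tempdir(CLEANUP => 1);

# Under csh-style :glob semantics the space would split this into two
# patterns; under :bsd_glob it is an ordinary character.
open my $fh, '>', "$dir/my notes.txt" or die $!;
close $fh;

my @found = glob("$dir/my notes*");
```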
FindBin updated from version 1.50 to 1.51.
It no longer returns a wrong result if a script of the same name as the current one exists in the path and is executable.
HTTP::Tiny has been upgraded from version 0.012 to 0.017.
Added support for using $ENV{http_proxy} to set the default proxy host.
Adds additional shorthand methods for all common HTTP verbs, a post_form() method for POST-ing x-www-form-urlencoded data, and a www_form_urlencode() utility method.
IO has been upgraded from version 1.25_04 to 1.25_06, and IO::Handle from version 1.31 to 1.33.
Together, these upgrades fix a problem with IO::Handle's getline and getlines methods. When these methods are called on the special ARGV handle, the next file is automatically opened, as happens with the built-in <> and readline functions. But, unlike the built-ins, these methods were not respecting the caller's use of the open pragma and applying the appropriate I/O layers to the newly-opened file [rt.cpan.org #66474].
IPC::Cmd has been upgraded from version 0.70 to 0.76.
Capturing of command output (both STDOUT and STDERR) is now supported using IPC::Open3 on MSWin32 without requiring IPC::Run.
IPC::Open3 has been upgraded from version 1.09 to 1.12.
Fixes a bug which prevented use of open3 on Windows when *STDIN, *STDOUT or *STDERR had been localized.
Fixes a bug which prevented duplicating numeric file descriptors on Windows.
open3 with "-" for the program name works once more. This was broken in version 1.06 (and hence in Perl 5.14.0) [perl #95748].
Locale::Codes has been upgraded from version 3.16 to 3.21.
Added Language Extension codes (langext) and Language Variation codes (langvar) as defined in the IANA language registry.
Added language codes from ISO 639-5.
Added language/script codes from the IANA language subtag registry.
Fixed an uninitialized value warning [rt.cpan.org #67438].
Fixed the return value for the all_XXX_codes and all_XXX_names functions [rt.cpan.org #69100].
Reorganized modules to move Locale::MODULE to Locale::Codes::MODULE to allow for cleaner future additions. The original four modules (Locale::Language, Locale::Currency, Locale::Country, Locale::Script) will continue to work, but all new sets of codes will be added in the Locale::Codes namespace.
The code2XXX, XXX2code, all_XXX_codes, and all_XXX_names functions now support retired codes. All codesets may be specified by a constant or by their name now. Previously, they were specified only by a constant.
The alias_code function exists for backward compatibility. It has been replaced by rename_country_code. The alias_code function will be removed some time after September, 2013.
All work is now done in the central module (Locale::Codes). Previously, some was still done in the wrapper modules (Locale::Codes::*). Added Language Family codes (langfam) as defined in ISO 639-5.
Math::BigFloat has been upgraded from version 1.993 to 1.997.
The numify method has been corrected to return a normalized Perl number (the result of 0 + $thing), instead of a string [rt.cpan.org #66732].
Math::BigInt has been upgraded from version 1.994 to 1.998.
It provides a new bsgn method that complements the babs method.
It fixes the internal objectify function's handling of "foreign objects" so they are converted to the appropriate class (Math::BigInt or Math::BigFloat).
Math::BigRat has been upgraded from version 0.2602 to 0.2603.
int() on a Math::BigRat object containing -1/2 now creates a
Math::BigInt containing 0, rather than -0. Math::BigInt does not even
support negative zero, so the resulting object was actually malformed
[perl #95530].
Math::Complex has been upgraded from version 1.56 to 1.59 and Math::Trig from version 1.2 to 1.22.
Fixes include: correct copy constructor usage; fix polarwise formatting with
numeric format specifier; and more stable great_circle_direction
algorithm.
Module::CoreList has been upgraded from version 2.51 to 2.66.
The corelist utility now understands the -r option for displaying Perl release dates and the --diff option to print the set of modlib changes between two perl distributions.
Module::Metadata has been upgraded from version 1.000004 to 1.000009.
Adds a provides method to generate a CPAN META provides data structure correctly; use of package_versions_from_directory is discouraged.
ODBM_File has been upgraded from version 1.10 to 1.12.
The XS code is now compiled with PERL_NO_GET_CONTEXT, which will aid performance under ithreads.
open has been upgraded from version 1.08 to 1.10.
It no longer turns off layers on standard handles when invoked without the ":std" directive. Similarly, when invoked with the ":std" directive, it now clears layers on STDERR before applying the new ones, and not just on STDIN and STDOUT [perl #92728].
overload has been upgraded from version 1.13 to 1.18.
overload::Overloaded no longer calls can on the class, but uses another means to determine whether the object has overloading. It was never correct for it to call can, as overloading does not respect AUTOLOAD. So classes that autoload methods and implement can no longer have to account for overloading [perl #40333].
A warning is now produced for invalid arguments. See New Diagnostics.
PerlIO::scalar has been upgraded from version 0.11 to 0.14.
(This is the module that implements open $fh, '>', \$scalar.)
It fixes a problem with open my $fh, ">", \$scalar not working if $scalar is a copy-on-write scalar. (5.14.2)
It also fixes a hang that occurs with readline or <$fh> if a typeglob has been assigned to $scalar [perl #92258].
It no longer assumes during seek that $scalar is a string internally.
If it didn't crash, it was close to doing so [perl #92706]. Also, the
internal print routine no longer assumes that the position set by seek
is valid, but extends the string to that position, filling the intervening
bytes (between the old length and the seek position) with nulls
[perl #78980].
Printing to an in-memory handle now works if the $scalar holds a reference, stringifying the reference before modifying it. References used to be treated as empty strings.
Printing to an in-memory handle no longer crashes if the $scalar happens to hold a number internally, but no string buffer.
Printing to an in-memory handle no longer creates scalars that confuse the regular expression engine [perl #108398].
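For reference, a minimal sketch of the in-memory handles this module provides:

```perl
use strict;
use warnings;

# Open a write handle onto a plain scalar instead of a file.
my $buffer = '';
open my $out, '>', \$buffer or die "open: $!";
print {$out} "hello, world\n";
close $out;

# Read it back through an in-memory read handle.
open my $in, '<', \$buffer or die "open: $!";
my $line = <$in>;
close $in;
```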
Pod::Functions has been upgraded from version 1.04 to 1.05.
Functions.pm is now generated at perl build time from annotations in perlfunc.pod. This will ensure that Pod::Functions and perlfunc remain in synchronisation.
Pod::Html has been upgraded from version 1.11 to 1.1502.
This is an extensive rewrite of Pod::Html to use Pod::Simple under the hood. The output has changed significantly.
Pod::Perldoc has been upgraded from version 3.15_03 to 3.17.
It corrects the search paths on VMS [perl #90640]. (5.14.1)
The -v option now fetches the right section for $0.
This upgrade has numerous significant fixes. Consult its changelog on the CPAN for more information.
POSIX has been upgraded from version 1.24 to 1.30.
POSIX no longer uses AutoLoader. Any code which was relying on this implementation detail was buggy, and may fail because of this change. The module's Perl code has been considerably simplified, roughly halving the number of lines, with no change in functionality. The XS code has been refactored to reduce the size of the shared object by about 12%, with no change in functionality. More POSIX functions now have tests.
sigsuspend and pause now run signal handlers before returning, as the whole point of these two functions is to wait until a signal has arrived and then return after it has been triggered. Delayed, or "safe", signals were preventing that from happening, possibly resulting in race conditions [perl #107216].
POSIX::sleep is now a direct call into the underlying OS sleep function, instead of being a Perl wrapper on CORE::sleep.
POSIX::dup2 now returns the correct value on Win32 (i.e., the file descriptor). POSIX::SigSet's sigsuspend and sigpending, as well as POSIX::pause, now dispatch safe signals immediately before returning to their caller.
POSIX::Termios::setattr now defaults the third argument to TCSANOW, instead of 0. On most platforms TCSANOW is defined to be 0, but on some, 0 is not a valid parameter, which caused a call with defaults to fail.
Socket has been upgraded from version 1.94 to 2.001.
It has new functions and constants for handling IPv6 sockets:
- pack_ipv6_mreq
- unpack_ipv6_mreq
- IPV6_ADD_MEMBERSHIP
- IPV6_DROP_MEMBERSHIP
- IPV6_MTU
- IPV6_MTU_DISCOVER
- IPV6_MULTICAST_HOPS
- IPV6_MULTICAST_IF
- IPV6_MULTICAST_LOOP
- IPV6_UNICAST_HOPS
- IPV6_V6ONLY
Storable has been upgraded from version 2.27 to 2.34.
It no longer turns copy-on-write scalars into read-only scalars when freezing and thawing.
Sys::Syslog has been upgraded from version 0.27 to 0.29.
This upgrade closes many outstanding bugs.
Term::ANSIColor has been upgraded from version 3.00 to 3.01.
Only interpret an initial array reference as a list of colors, not any initial reference, allowing the colored function to work properly on objects with stringification defined.
Term::ReadLine has been upgraded from version 1.07 to 1.09.
Term::ReadLine now supports any event loop, including unpublished ones and simple IO::Select loops, without the need to rewrite existing code for any particular framework [perl #108470].
threads::shared has been upgraded from version 1.37 to 1.40.
Destructors on shared objects used to be ignored sometimes if the objects were referenced only by shared data structures. This has been mostly fixed, but destructors may still be ignored if the objects still exist at global destruction time [perl #98204].
Unicode::Collate has been upgraded from version 0.73 to 0.89.
Updated to CLDR 1.9.1
Locales updated to CLDR 2.0: mk, mt, nb, nn, ro, ru, sk, sr, sv, uk, zh__pinyin, zh__stroke
Newly supported locales: bn, fa, ml, mr, or, pa, sa, si, si__dictionary, sr_Latn, sv__reformed, ta, te, th, ur, wae.
Tailored compatibility ideographs as well as unified ideographs for the locales: ja, ko, zh__big5han, zh__gb2312han, zh__pinyin, zh__stroke.
Locale/*.pl files are now searched for in @INC.
Unicode::Normalize has been upgraded from version 1.10 to 1.14.
Fixes for the removal of unicore/CompositionExclusions.txt from core.
Unicode::UCD has been upgraded from version 0.32 to 0.43.
This adds four new functions: prop_aliases() and prop_value_aliases(), which are used to find all Unicode-approved synonyms for property names, or to convert from one name to another; prop_invlist, which returns all code points matching a given Unicode binary property; and prop_invmap, which returns the complete specification of a given Unicode property.
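A short sketch of two of the new functions (the property names used are ordinary examples):

```perl
use v5.16;
use Unicode::UCD qw(prop_aliases prop_invlist);

# Unicode-approved synonyms for a property: the short name comes
# first, then the full name, then any others.
my @names = prop_aliases("gc");   # General_Category aliases

# Inversion list for a binary property: entries at even indices begin
# ranges of matching code points; odd indices begin non-matching ranges.
my @ws = prop_invlist("White_Space");
```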
Win32API::File has been upgraded from version 0.1101 to 0.1200.
Added SetStdHandle and GetStdHandle functions
As promised in Perl 5.14.0's release notes, the following modules have been removed from the core distribution, and if needed should be installed from CPAN instead.
Devel::DProf has been removed from the Perl core. Prior version was 20110228.00.
Shell has been removed from the Perl core. Prior version was 0.72_01.
Several old perl4-style libraries which have been deprecated with 5.14 are now removed:
- abbrev.pl assert.pl bigfloat.pl bigint.pl bigrat.pl cacheout.pl
- complete.pl ctime.pl dotsh.pl exceptions.pl fastcwd.pl flush.pl
- getcwd.pl getopt.pl getopts.pl hostname.pl importenv.pl
- lib/find{,depth}.pl look.pl newgetopt.pl open2.pl open3.pl
- pwd.pl shellwords.pl stat.pl tainted.pl termcap.pl timelocal.pl
They can be found on CPAN as Perl4::CoreLibs.
perldtrace describes Perl's DTrace support, listing the provided probes and giving examples of their use.
This document is intended to provide a list of experimental features in Perl. It is still a work in progress.
This is a new OO tutorial. It focuses on basic OO concepts, and then recommends that readers choose an OO framework from CPAN.
The new manual describes the XS typemapping mechanism in unprecedented detail and combines new documentation with information extracted from perlxs and the previously unofficial list of all core typemaps.
The HV API has long accepted negative lengths to show that the key is in UTF8. This is now documented.
The boolSV() macro is now documented.
dbmopen treats a 0 mode as a special case that prevents a nonexistent file from being created. This has been the case since Perl 5.000, but was never documented anywhere. Now the perlfunc entry mentions it [perl #90064].
As an accident of history, open $fh, '<:', ... applies the default layers for the platform (:raw on Unix, :crlf on Windows), ignoring whatever is declared by open.pm. This seems such a useful feature that it has been documented in perlfunc's open entry and in the open pragma's documentation.
The entry for split has been rewritten. It is now far clearer than
before.
A new section, Autoloading with XSUBs, has been added, which explains the two APIs for accessing the name of the autoloaded sub.
Some function descriptions in perlguts were confusing, as it was not clear whether they referred to the function above or below the description. This has been clarified [perl #91790].
This document has been rewritten from scratch, and its coverage of various OO concepts has been expanded.
Documentation of the smartmatch operator has been reworked and moved from perlsyn to perlop where it belongs.
It has also been corrected for the case of undef on the left-hand
side. The list of different smart match behaviors had an item in the
wrong place.
Documentation of the ellipsis statement (...) has been reworked and moved from perlop to perlsyn.
The explanation of bitwise operators has been expanded to explain how they work on Unicode strings (5.14.1).
More examples for m//g have been added (5.14.1).
The <<\FOO here-doc syntax has been documented (5.14.1).
There is now a standard convention for naming keys in the %^H, documented under Key naming.
The example function for checking for taintedness contained a subtle
error. $@
needs to be localized to prevent its changing this
global's value outside the function. The preferred method to check for
this remains tainted in Scalar::Util.
perllol has been expanded with examples using the new push $scalar
syntax introduced in Perl 5.14.0 (5.14.1).
perlmod now states explicitly that some types of explicit symbol table manipulation are not supported. This codifies what was effectively already the case [perl #78074].
The tips on which formatting codes to use have been corrected and greatly expanded.
There are now a couple of example one-liners for previewing POD files after they have been edited.
The (*COMMIT)
directive is now listed in the right section
(Verbs without an argument).
perlrun has undergone a significant clean-up. Most notably, the -0x... form of the -0 flag has been clarified, and the final section on environment variables has been corrected and expanded (5.14.1).
The ($;) prototype syntax, which has existed for rather a long time, is now documented in perlsub. It lets a unary function have the same precedence as a list operator.
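A sketch contrasting ($;) with the named-unary ($) prototype (the sub names are invented for illustration):

```perl
use strict;
use warnings;

sub listop ($;) { $_[0] + 10 }   # list-operator precedence (lowest)
sub unary  ($)  { $_[0] + 10 }   # named-unary precedence

# List operators bind looser than >, named unaries bind tighter:
my $via_listop = listop 5 > 3;   # parsed as listop(5 > 3) -> listop(1) -> 11
my $via_unary  = unary 5 > 3;    # parsed as unary(5) > 3  -> 15 > 3  -> true
```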
The required syntax for tying handles has been documented.
The documentation for $! has been corrected and clarified. It used to state that $! could be undef, which is not the case. It was also unclear whether system calls set C's errno or Perl's $! [perl #91614].
Documentation for $$ has been amended with additional cautions regarding changing the process ID.
perlxs was extended with documentation on inline typemaps.
perlref has a new Circular References section explaining how circularities may not be freed and how to solve that with weak references.
Parts of perlapi were clarified, and Perl equivalents of some C functions have been added as an additional mode of exposition.
A few parts of perlre and perlrecharclass were clarified.
The old OO tutorials, perltoot, perltooc, and perlboot, have been removed. The perlbot (bag of object tricks) document has been removed as well.
The perldelta files for development releases are no longer packaged with perl. These can still be found in the perl source code repository.
The following additions or changes have been made to diagnostic output, including warnings and fatal error messages. For the complete list of diagnostic messages, see perldiag.
This error occurs when caller tries to set @DB::args but finds it tied. Before this error was added, it used to crash instead.
This error is part of a safety check that the tie operator does before tying a special array like @_. You should never see this message.
&CORE::%s cannot be called directly
This occurs when a subroutine in the CORE:: namespace is called with &foo syntax or through a reference. Some subroutines in this package cannot yet be called that way, but must be called as barewords. See Subroutines in the CORE namespace, above.
Source filters apply only to byte streams
This new error occurs when you try to activate a source filter (usually by loading a source filter module) within a string passed to eval under the unicode_eval feature.
The long-deprecated defined(@array) now also warns for package variables.
Previously it issued a warning for lexical variables only.
This new warning occurs when length is used on an array or hash, instead of scalar(@array) or scalar(keys %hash).
lvalue attribute %s already-defined subroutine
attributes.pm now emits this warning when the :lvalue attribute is applied to a Perl subroutine that has already been defined, as doing so can have unexpected side-effects.
This warning, in the "overload" category, is produced when the overload pragma is given an argument it doesn't recognize, presumably a mistyped operator.
$[ used in %s (did you mean $] ?)
This new warning exists to catch the mistaken use of $[ in version checks. $], not $[, contains the version number.
Useless assignment to a temporary
Assigning to a temporary scalar returned from an lvalue subroutine now produces this warning [perl #31946].
\E does nothing unless preceded by \Q, \L or \U.
"sort is now a reserved word"
This error used to occur when sort was called without arguments, followed by ; or ). (E.g., sort; would die, but {sort} was OK.) This error message was added in Perl 3 to catch code like close(sort), which would no longer work. More than two decades later, this message is no longer appropriate. Now sort without arguments is always allowed, and returns an empty list, as it did in those cases where it was already allowed [perl #90030].
The "Applying pattern match..." or similar warning produced when an array or hash is on the left-hand side of the =~ operator now mentions the name of the variable.
The "Attempt to free non-existent shared string" warning has had the spelling of "non-existent" corrected to "nonexistent". It was already listed with the correct spelling in perldiag.
The error messages for using default and when outside a topicalizer have been standardized to match the messages for continue and loop controls. They now read 'Can't "default" outside a topicalizer' and 'Can't "when" outside a topicalizer'. They both used to be 'Can't use when() outside a topicalizer' [perl #91514].
The message, "Code point 0x%X is not Unicode, no properties match it; all inverse properties do" has been changed to "Code point 0x%X is not Unicode, all \p{} matches fail; all \P{} matches succeed".
Redefinition warnings for constant subroutines used to be mandatory, even occurring under no warnings. Now they respect the warnings pragma.
The "glob failed" warning message is now suppressible via no warnings [perl #111656].
The Invalid version format error message now says "negative version number" within the parentheses, rather than "non-numeric data", for negative numbers.
The two warnings "Possible attempt to put comments in qw() list" and "Possible attempt to separate words with commas" are no longer mutually exclusive: the same qw construct may produce both.
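A construct that can now draw both warnings at once (the list contents are illustrative):

```perl
use strict;
use warnings;

# The comma and the '#' inside qw() are kept as literal characters,
# which is rarely what was intended -- hence the two warnings.
my @words = qw(
    alpha,  # not actually a comment inside qw()
    beta
);
print "@words\n";
```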
The uninitialized warning for y///r when $_ is implicit and undefined now mentions the variable name, just like the non-/r variation of the operator.
The 'Use of "foo" without parentheses is ambiguous' warning has been extended to apply also to user-defined subroutines with a (;$) prototype, and not just to built-in functions.
Warnings that mention the names of lexical (my) variables with Unicode characters in them now respect the presence or absence of the :utf8 layer on the output handle, instead of outputting UTF8 regardless. Also, the correct names are included in the strings passed to $SIG{__WARN__} handlers, rather than the raw UTF8 bytes.
h2ph used to generate code of the form
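(A sketch of the problematic pattern; the name FOO and the value 42 are illustrative:)

```perl
unless (defined &FOO) {
    # 'sub FOO () {...}' is a compile-time declaration, so FOO exists
    # whether or not the unless() guard is true at run time.
    sub FOO () { 42 }
}
```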
But the subroutine is a compile-time declaration, and is hence unaffected by the condition. h2ph has now been corrected to emit a string eval around the subroutine [perl #99368].
splain no longer emits backtraces with the first line number repeated.
This:
- Uncaught exception from user code:
- Cannot fwiddle the fwuddle at -e line 1.
- at -e line 1
- main::baz() called at -e line 1
- main::bar() called at -e line 1
- main::foo() called at -e line 1
has become this:
- Uncaught exception from user code:
- Cannot fwiddle the fwuddle at -e line 1.
- main::baz() called at -e line 1
- main::bar() called at -e line 1
- main::foo() called at -e line 1
Some error messages consist of multiple lines that are listed as separate entries in perldiag. splain has been taught to find the separate entries in these cases, instead of simply failing to find the message.
zipdetails is a new utility, included as part of an IO::Compress::Base upgrade. It displays information about the internal record structure of a zip file. It is not concerned with displaying any details of the compressed data stored in the zip file.
regexp.h has been modified for compatibility with GCC's -Werror option, as used by some projects that include perl's header files (5.14.1).
USE_LOCALE{,_COLLATE,_CTYPE,_NUMERIC} have been added to the output of perl -V, as they affect the behavior of the interpreter binary (albeit in only a small area).
The code and tests for IPC::Open2 have been moved from ext/IPC-Open2 into ext/IPC-Open3, as IPC::Open2::open2() is implemented as a thin wrapper around IPC::Open3::_open3(), and hence is very tightly coupled to it.
The magic types and magic vtables are now generated from data in a new script regen/mg_vtable.pl, instead of being maintained by hand. As different EBCDIC variants can't agree on the code point for '~', the character to code point conversion is done at build time by generate_uudmap to a new generated header mg_data.h. PL_vtbl_bm and PL_vtbl_fm are now defined by the pre-processor as PL_vtbl_regexp, instead of being distinct C variables. PL_vtbl_sig has been removed.
Building with -DPERL_GLOBAL_STRUCT works again. This configuration is not generally used.
Perl configured with MAD now correctly frees MADPROP structures when OPs are freed. MADPROPs are now allocated with PerlMemShared_malloc().
makedef.pl has been refactored. This should have no noticeable effect on any of the platforms that use it as part of their build (AIX, VMS, Win32).
useperlio can no longer be disabled.
The file global.sym is no longer needed, and has been removed. It contained a list of all exported functions, one of the files generated by regen/embed.pl from data in embed.fnc and regen/opcodes. The code has been refactored so that the only user of global.sym, makedef.pl, now reads embed.fnc and regen/opcodes directly, removing the need to store the list of exported functions in an intermediate file.
As global.sym was never installed, this change should not be visible outside the build process.
pod/buildtoc, used by the build process to build perltoc, has been refactored and simplified. It now contains only code to build perltoc; the code to regenerate Makefiles has been moved to Porting/pod_rules.pl. It's a bug if this change has any material effect on the build process.
pod/roffitall is now built by pod/buildtoc, instead of being shipped with the distribution. Its list of manpages is now generated (and therefore current). See also RT #103202 for an unresolved related issue.
The man page for XS::Typemap
is no longer installed. XS::Typemap
is a test module which is not installed, hence installing its
documentation makes no sense.
The -Dusesitecustomize and -Duserelocatableinc options now work together properly.
Since version 1.7, Cygwin supports native UTF-8 paths. If Perl is built under that environment, directory and filenames will be UTF-8 encoded.
Cygwin does not initialize all original Win32 environment variables. See README.cygwin for a discussion of the newly-added Cygwin::sync_winenv() function [perl #110190] and for further links.
HP-UX PA-RISC/64 now supports gcc-4.x
A fix to correct the socketsize now makes the test suite pass on HP-UX PA-RISC for 64bitall builds. (5.14.2)
Unnecessary includes have been removed, miscellaneous compiler warnings fixed, and some unclosed comments closed in vms/vms.c.
The sockadapt layer has been removed from the VMS build.
Explicit support for VMS versions before v7.0 and DEC C versions before v6.0 has been removed.
Since Perl 5.10.1, the home-grown stat wrapper has been unable to
distinguish between a directory name containing an underscore and an
otherwise-identical filename containing a dot in the same position
(e.g., t/test_pl as a directory and t/test.pl as a file). This problem
has been corrected.
The build on VMS now permits names of the resulting symbols in C code for Perl longer than 31 characters. Symbols like Perl__it_was_the_best_of_times_it_was_the_worst_of_times can now be created freely without causing the VMS linker to seize up.
Numerous build and test failures on GNU/Hurd have been resolved with hints for building DBM modules, detection of the library search path, and enabling of large file support.
Perl is now built with dynamic linking on OpenVOS, the minimum supported version of which is now Release 17.1.0.
The CC workshop C++ compiler is now detected and used on systems that ship without cc.
The compiled representation of formats is now stored via the mg_ptr of their PERL_MAGIC_fm. Previously it was stored in the string buffer, beyond SvLEN(), the regular end of the string. SvCOMPILED() and SvCOMPILED_{on,off}() now exist solely for compatibility with XS code. The first is always 0, and the other two are now no-ops. (5.14.1)
Some global variables have been marked const, members in the interpreter structure have been re-ordered, and the opcodes have been re-ordered. The op OP_AELEMFAST has been split into OP_AELEMFAST and OP_AELEMFAST_LEX.
When emptying a hash of its elements (e.g., via undef(%h) or %h=()), the HvARRAY field is no longer temporarily zeroed. Any destructors called on the freed elements see the remaining elements. Thus, %h=() becomes more like delete $h{$_} for keys %h.
Boyer-Moore compiled scalars are now PVMGs, and the Boyer-Moore tables are now stored via the mg_ptr of their PERL_MAGIC_bm. Previously they were PVGVs, with the tables stored in the string buffer, beyond SvLEN(). This eliminates the last place where the core stores data beyond SvLEN().
Simplified logic in Perl_sv_magic() introduces a small change of behavior for error cases involving unknown magic types. Previously, if Perl_sv_magic() was passed a magic type unknown to it, it would:
- croak "Modification of a read-only value attempted" if read-only;
- return without error if the SV happened to already have this magic;
- otherwise croak "Don't know how to handle magic of type \\%o".
Now it will always croak "Don't know how to handle magic of type \\%o", even on read-only values, or SVs which already have the unknown magic type.
The experimental fetch_cop_label function has been renamed to cop_fetch_label. The cop_store_label function has been added to the API, but is experimental.
embedvar.h has been simplified, and one level of macro indirection for PL_* variables has been removed for the default (non-multiplicity) configuration. PERLVAR*() macros now directly expand their arguments to tokens such as PL_defgv, instead of expanding to PL_Idefgv, with embedvar.h defining a macro to map PL_Idefgv to PL_defgv. XS code which has unwarranted chumminess with the implementation may need updating.
An API has been added to explicitly choose whether to export XSUB symbols. More detail can be found in the comments for commit e64345f8.
The is_gv_magical_sv function has been eliminated and merged with gv_fetchpvn_flags. It used to be called to determine whether a GV should be autovivified in rvalue context. Now it has been replaced with a new GV_ADDMG flag (not part of the API).
The returned code point from the function utf8n_to_uvuni() when the input is malformed UTF-8, malformations are allowed, and utf8 warnings are off is now the Unicode REPLACEMENT CHARACTER whenever the malformation is such that no well-defined code point can be computed. Previously the returned value was essentially garbage. The only malformations that have well-defined values are a zero-length string (0 is the return) and overlong UTF-8 sequences.
Padlists are now marked AvREAL
; i.e., reference-counted. They have
always been reference-counted, but were not marked real, because pad.c
did its own clean-up, instead of using the usual clean-up code in sv.c.
That caused problems in thread cloning, so now the AvREAL
flag is on,
but is turned off in pad.c right before the padlist is freed (after
pad.c has done its custom freeing of the pads).
All C files that make up the Perl core have been converted to UTF-8.
These new functions have been added as part of the work on Unicode symbols:
- HvNAMELEN
- HvNAMEUTF8
- HvENAMELEN
- HvENAMEUTF8
- gv_init_pv
- gv_init_pvn
- gv_init_pvsv
- gv_fetchmeth_pv
- gv_fetchmeth_pvn
- gv_fetchmeth_sv
- gv_fetchmeth_pv_autoload
- gv_fetchmeth_pvn_autoload
- gv_fetchmeth_sv_autoload
- gv_fetchmethod_pv_flags
- gv_fetchmethod_pvn_flags
- gv_fetchmethod_sv_flags
- gv_autoload_pv
- gv_autoload_pvn
- gv_autoload_sv
- newGVgen_flags
- sv_derived_from_pv
- sv_derived_from_pvn
- sv_derived_from_sv
- sv_does_pv
- sv_does_pvn
- sv_does_sv
- whichsig_pv
- whichsig_pvn
- whichsig_sv
- newCONSTSUB_flags
The gv_fetchmethod_*_flags functions, like gv_fetchmethod_flags, are experimental and may change in a future release.
The following functions were added. These are not part of the API:
- GvNAMEUTF8
- GvENAMELEN
- GvENAME_HEK
- CopSTASH_flags
- CopSTASH_flags_set
- PmopSTASH_flags
- PmopSTASH_flags_set
- sv_sethek
- HEKfARG
There is also a HEKf macro corresponding to SVf, for interpolating HEKs in formatted strings.
sv_catpvn_flags takes a couple of new internal-only flags, SV_CATBYTES and SV_CATUTF8, which tell it whether the char array to be concatenated is UTF8. This allows for more efficient concatenation than creating temporary SVs to pass to sv_catsv.
For XS AUTOLOAD subs, $AUTOLOAD is set once more, as it was in 5.6.0. This is in addition to setting SvPVX(cv), for compatibility with 5.8 to 5.14. See Autoloading with XSUBs in perlguts.
Perl now checks whether the array (the linearized isa) returned by a MRO
plugin begins with the name of the class itself, for which the array was
created, instead of assuming that it does. This prevents the first element
from being skipped during method lookup. It also means that
mro::get_linear_isa
may return an array with one more element than the
MRO plugin provided [perl #94306].
PL_curstash is now reference-counted.
There are now feature bundle hints in PL_hints ($^H) that version declarations use, to avoid having to load feature.pm. One setting of the hint bits indicates a "custom" feature bundle, which means that the entries in %^H still apply. feature.pm uses that.
The HINT_FEATURE_MASK macro is defined in perl.h along with other hints. Other macros for setting and testing features and bundles are in the new feature.h. FEATURE_IS_ENABLED (which has moved to feature.h) is no longer used throughout the codebase; more specific macros defined in feature.h, e.g., FEATURE_SAY_IS_ENABLED, are used instead.
lib/feature.pm is now a generated file, created by the new regen/feature.pl script, which also generates feature.h.
Tied arrays are now always AvREAL. If @_ or DB::args is tied, it is reified first, to make sure this is always the case.
Two new functions utf8_to_uvchr_buf() and utf8_to_uvuni_buf() have been added. These are the same as utf8_to_uvchr and utf8_to_uvuni (which are now deprecated), but take an extra parameter that is used to guard against reading beyond the end of the input string. See utf8_to_uvchr_buf in perlapi and utf8_to_uvuni_buf in perlapi.
The regular expression engine now does TRIE case insensitive matches under Unicode. This may change the output of use re 'debug', and will speed up various things.
There is a new wrap_op_checker() function, which provides a thread-safe alternative to writing to PL_check directly.
A bug has been fixed that would cause a "Use of freed value in iteration" error if the next two hash elements that would be iterated over are deleted [perl #85026]. (5.14.1)
Deleting the current hash iterator (the hash element that would be returned
by the next call to each) in void context used not to free it
[perl #85026].
Deletion of methods via delete $Class::{method}
syntax used to update
method caches if called in void context, but not scalar or list context.
When hash elements are deleted in void context, the internal hash entry is now freed before the value is freed, to prevent destructors called by that latter freeing from seeing the hash in an inconsistent state. It was possible to cause double-frees if the destructor freed the hash itself [perl #100340].
A keys optimization in Perl 5.12.0 to make it faster on empty hashes
caused each not to reset the iterator if called after the last element
was deleted.
Freeing deeply nested hashes no longer crashes [perl #44225].
It is possible from XS code to create hashes with elements that have no values. The hash element and slice operators used to crash when handling these in lvalue context. They now produce a "Modification of non-creatable hash value attempted" error message.
If list assignment to a hash or array triggered destructors that freed the hash or array itself, a crash would ensue. This is no longer the case [perl #107440].
It used to be possible to free the typeglob of a localized array or hash (e.g., local @{"x"}; delete $::{x}), resulting in a crash on scope exit.
Some core bugs affecting Hash::Util have been fixed: locking a hash element that is a glob copy no longer causes the next assignment to it to corrupt the glob (5.14.2), and unlocking a hash element that holds a copy-on-write scalar no longer causes modifications to that scalar to modify other scalars that were sharing the same string buffer.
The newHVhv
XS function now works on tied hashes, instead of crashing or
returning an empty hash.
The SvIsCOW
C macro now returns false for read-only copies of typeglobs,
such as those created by:
- $hash{elem} = *foo;
- Hash::Util::lock_value %hash, 'elem';
It used to return true.
The SvPVutf8
C function no longer tries to modify its argument,
resulting in errors [perl #108994].
SvPVutf8
now works properly with magical variables.
SvPVbyte now works properly with non-PVs.
When presented with malformed UTF-8 input, the XS-callable functions
is_utf8_string()
, is_utf8_string_loc()
, and
is_utf8_string_loclen()
could read beyond the end of the input
string by up to 12 bytes. This no longer happens. [perl #32080].
However, currently, is_utf8_char()
still has this defect, see
is_utf8_char() above.
The C-level pregcomp
function could become confused about whether the
pattern was in UTF8 if the pattern was an overloaded, tied, or otherwise
magical scalar [perl #101940].
Tying %^H
no longer causes perl to crash or ignore the contents of
%^H
when entering a compilation scope [perl #106282].
eval $string
and require used not to
localize %^H
during compilation if it
was empty at the time the eval call itself was compiled. This could
lead to scary side effects, like use re "/m"
enabling other flags that
the surrounding code was trying to enable for its caller [perl #68750].
eval $string and require no longer localize hints ($^H and %^H) at run time, but only during compilation of the $string or required file.
This makes BEGIN { $^H{foo}=7 }
equivalent to
BEGIN { eval '$^H{foo}=7' }
[perl #70151].
Creating a BEGIN block from XS code (via newXS
or newATTRSUB
) would,
on completion, make the hints of the current compiling code the current
hints. This could cause warnings to occur in a non-warning scope.
Copy-on-write or shared hash key scalars
were introduced in 5.8.0, but most Perl code
did not encounter them (they were used mostly internally). Perl
5.10.0 extended them, such that assigning __PACKAGE__
or a
hash key to a scalar would make it copy-on-write. Several parts
of Perl were not updated to account for them, but have now been fixed.
utf8::decode
had a nasty bug that would modify copy-on-write scalars'
string buffers in place (i.e., skipping the copy). This could result in
hashes having two elements with the same key [perl #91834]. (5.14.2)
Lvalue subroutines were not allowing COW scalars to be returned. This was fixed for lvalue scalar context in Perl 5.12.3 and 5.14.0, but list context was not fixed until this release.
Elements of restricted hashes (see the fields pragma) containing copy-on-write values couldn't be deleted, nor could such hashes be cleared (%hash = ()). (5.14.2)
Localizing a tied variable used to make it read-only if it contained a copy-on-write string. (5.14.2)
Assigning a copy-on-write string to a stash element no longer causes a double free. Regardless of this change, the results of such assignments are still undefined.
Assigning a copy-on-write string to a tied variable no longer stops that variable from being tied if it happens to be a PVMG or PVLV internally.
Doing a substitution on a tied variable returning a copy-on-write scalar used to cause an assertion failure or an "Attempt to free nonexistent shared string" warning.
This one is a regression from 5.12: In 5.14.0, the bitwise assignment operators |=, ^= and &= started leaving the left-hand side undefined if it happened to be a copy-on-write string [perl #108480].
Storable, Devel::Peek and PerlIO::scalar had similar problems. See Updated Modules and Pragmata, above.
dumpvar.pl, and therefore the x
command in the debugger, have been
fixed to handle objects blessed into classes whose names contain "=". The
contents of such objects used not to be dumped [perl #101814].
The "R" command for restarting a debugger session has been fixed to work on
Windows, or any other system lacking a POSIX::_SC_OPEN_MAX
constant
[perl #87740].
The #line 42 foo
directive used not to update the arrays of lines used
by the debugger if it occurred in a string eval. This was partially fixed
in 5.14, but it worked only for a single #line 42 foo
in each eval. Now
it works for multiple.
When subroutine calls are intercepted by the debugger, the name of the subroutine or a reference to it is stored in $DB::sub, for the debugger to access. Sometimes (such as $foo = *bar; undef *bar; &$foo) $DB::sub would be set to a name that could not be used to find the subroutine, and so the debugger's attempt to call it would fail. Now the check to see whether a reference is needed is more robust, so those problems should not happen anymore [rt.cpan.org #69862].
Every subroutine has a filename associated with it that the debugger uses. The one associated with constant subroutines used to be misallocated when cloned under threads. Consequently, debugging threaded applications could result in memory corruption [perl #96126].
defined(${"..."}), defined(*{"..."}), etc., used to return true for most, but not all built-in variables, if they had not been used yet. This bug affected ${^GLOBAL_PHASE} and ${^UTF8CACHE}, among others. It also used to return false if the package name was given as well (${"::!"}) [perl #97978, #97492].
Perl 5.10.0 introduced a similar bug: defined(*{"foo"}) where "foo"
represents the name of a built-in global variable used to return false if
the variable had never been used before, but only on the first call.
This, too, has been fixed.
Since 5.6.0, *{ ... } has been inconsistent in how it treats undefined values. It would die in strict mode or lvalue context for most undefined values, but would be treated as the empty string (with a warning) for the specific scalar returned by undef() (&PL_sv_undef internally). This has been corrected. undef() is now treated like other undefined scalars, as in Perl 5.005.
Perl has an internal variable that stores the last filehandle to be
accessed. It is used by $.
and by tell and eof without
arguments.
It used to be possible to set this internal variable to a glob copy and then modify that glob copy to be something other than a glob, and still have the last-accessed filehandle associated with the variable after assigning a glob to it again:
- my $foo = *STDOUT; # $foo is a glob copy
- <$foo>; # $foo is now the last-accessed handle
- $foo = 3; # no longer a glob
- $foo = *STDERR; # still the last-accessed handle
Now the $foo = 3
assignment unsets that internal variable, so there
is no last-accessed filehandle, just as if <$foo>
had never
happened.
This also prevents some unrelated handle from becoming the last-accessed handle if $foo falls out of scope and the same internal SV gets used for another handle [perl #97988].
A regression in 5.14 caused some statements of this form not to set that internal variable.
This is now fixed, but tell *{ *$fh }
still has the problem, and it
is not clear how to fix it [perl #106536].
stat
The term "filetests" refers to the operators that consist of a hyphen followed by a single letter: -r, -x, -M, etc. The term "stacked" when applied to filetests means followed by another filetest operator sharing the same operand, as in -r -x -w $foo.
stat produces more consistent warnings. It no longer warns for "_"
[perl #71002] and no longer skips the warning at times for other unopened
handles. It no longer warns about an unopened handle when the operating
system's fstat
function fails.
stat would sometimes return negative numbers for large inode numbers,
because it was using the wrong internal C type. [perl #84590]
lstat is documented to fall back to stat (with a warning) when given
a filehandle. When passed an IO reference, it was actually doing the
equivalent of stat _
and ignoring the handle.
-T _
with no preceding stat used to produce a
confusing "uninitialized" warning, even though there
is no visible uninitialized value to speak of.
-T, -B, -l and -t now work when stacked with other filetest operators [perl #77388].
In 5.14.0, filetest ops (-r, -x, etc.) started calling FETCH on a tied argument belonging to the previous argument to a list operator, if called with a bareword argument or no argument at all. This has been fixed, so push @foo, $tied, -r no longer calls FETCH on $tied.
In Perl 5.6, -l followed by anything other than a bareword would treat its argument as a file name. That was changed in 5.8 for glob references (\*foo), but not for globs themselves (*foo). -l started returning undef for glob references without setting the last stat buffer that the "_" handle uses, but only if warnings were turned on. With warnings off, it was the same as 5.6. In other words, it was simply buggy and inconsistent. Now the 5.6 behavior has been restored.
-l followed by a bareword no longer "eats" the previous argument to the list operator in whose argument list it resides. Hence, print "bar", -l foo now actually prints "bar", because -l no longer eats it.
Perl keeps several internal variables to keep track of the last stat buffer, from which file(handle) it originated, what type it was, and whether the last stat succeeded.
There were various cases where these could get out of synch, resulting in
inconsistent or erratic behavior in edge cases (every mention of -T
applies to -B
as well):
-T HANDLE, even though it does a stat, was not resetting the last
stat type, so an lstat _
following it would merrily return the wrong
results. Also, it was not setting the success status.
Freeing the handle last used by stat or a filetest could result in
-T _
using an unrelated handle.
stat with an IO reference would not reset the stat type or record the
filehandle for -T _
to use.
Fatal warnings could cause the stat buffer not to be reset
for a filetest operator on an unopened filehandle or -l
on any handle.
Fatal warnings also stopped -T from setting $!.
When the last stat was on an unreadable file, -T _
is supposed to
return undef, leaving the last stat buffer unchanged. But it was
setting the stat type, causing lstat _
to stop working.
-T FILENAME was not resetting the internal stat buffers for
unreadable files.
These have all been fixed.
Several edge cases have been fixed with formats and formline; in particular, where the format itself is potentially variable (such as with ties and overloading), and where the format and data differ in their encoding. In both these cases, it used to be possible for the output to be corrupted [perl #91032].
formline no longer converts its argument into a string in-place. So
passing a reference to formline no longer destroys the reference
[perl #79532].
Assignment to $^A
(the format output accumulator) now recalculates
the number of lines output.
given and when
given was not scoping its implicit $_ properly, resulting in memory leaks or "Variable is not available" warnings [perl #94682].
given was not calling set-magic on the implicit lexical $_ that it uses. This meant, for example, that pos would be remembered from one execution of the same given block to the next, even if the input were a different variable [perl #84526].
when blocks are now capable of returning variables declared inside the enclosing given block [perl #93548].
glob operator
On OSes other than VMS, Perl's glob operator (and the <...> form) use File::Glob underneath. File::Glob splits the pattern into words, before feeding each word to its bsd_glob function.
There were several inconsistencies in the way the split was done. Now quotation marks (' and ") are always treated as shell-style word delimiters (that allow whitespace as part of a word) and backslashes are always preserved, unless they exist to escape quotation marks. Before, those would only sometimes be the case, depending on whether the pattern contained whitespace. Also, escaped whitespace at the end of the pattern is no longer stripped [perl #40470].
CORE::glob
now works as a way to call the default globbing function. It
used to respect overrides, despite the CORE::
prefix.
Under miniperl (used to configure modules when perl itself is built),
glob now clears %ENV before calling csh, since the latter croaks on some
systems if it does not like the contents of the LS_COLORS environment
variable [perl #98662].
Explicit return now returns the actual argument passed to return, instead of copying it [perl #72724, #72706].
Lvalue subroutines used to enforce lvalue syntax (i.e., whatever can go on the left-hand side of =) for the last statement and the arguments to return. Since lvalue subroutines are not always called in lvalue context, this restriction has been lifted.
Lvalue subroutines are less restrictive about what values can be returned.
It used to croak on values returned by shift and delete and from
other subroutines, but no longer does so [perl #71172].
Empty lvalue subroutines (sub :lvalue {}) used to return @_ in list context. All subroutines used to do this, but regular subs were fixed in Perl 5.8.2. Now lvalue subroutines have been likewise fixed.
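A minimal sketch of the fixed behavior (the sub name is illustrative):

```perl
use strict;
use warnings;

sub nothing :lvalue {}

# Used to yield (1, 2, 3) -- i.e. @_ -- in list context;
# now it yields the empty list, like ordinary empty subs.
my @got = nothing(1, 2, 3);
print scalar(@got), "\n";   # 0
```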
Autovivification now works on values returned from lvalue subroutines
[perl #7946], as does returning keys in lvalue context.
Lvalue subroutines used to copy their return values in rvalue context. Not
only was this a waste of CPU cycles, but it also caused bugs. A ($)
prototype would cause an lvalue sub to copy its return value [perl #51408],
and while(lvalue_sub() =~ m/.../g) { ... }
would loop endlessly
[perl #78680].
When called in potential lvalue context (e.g., subroutine arguments or a list passed to for), lvalue subroutines used to copy any read-only value that was returned. E.g., sub :lvalue { $] } would not return $], but a copy of it.
When called in potential lvalue context, an lvalue subroutine returning
arrays or hashes used to bind the arrays or hashes to scalar variables,
resulting in bugs. This was fixed in 5.14.0 if an array were the first
thing returned from the subroutine (but not for $scalar, @array
or
hashes being returned). Now a more general fix has been applied
[perl #23790].
Method calls whose arguments were all surrounded with my() or our()
(as in $object->method(my($a,$b))
) used to force lvalue context on
the subroutine. This would prevent lvalue methods from returning certain
values.
Lvalue sub calls that are not determined to be such at compile time
(&$name
or &{"name"}) are no longer exempt from strict refs if they
occur in the last statement of an lvalue subroutine [perl #102486].
Sub calls whose subs are not visible at compile time, if they occurred in the last statement of an lvalue subroutine, would reject non-lvalue subroutines and die with "Can't modify non-lvalue subroutine call" [perl #102486].
Non-lvalue sub calls whose subs are visible at compile time exhibited the opposite bug. If the call occurred in the last statement of an lvalue subroutine, there would be no error when the lvalue sub was called in lvalue context. Perl would blindly assign to the temporary value returned by the non-lvalue subroutine.
AUTOLOAD
routines used to take precedence over the actual sub being
called (i.e., when autoloading wasn't needed), for sub calls in lvalue or
potential lvalue context, if the subroutine was not visible at compile
time.
Applying the :lvalue
attribute to an XSUB or to an aliased subroutine
stub with sub foo :lvalue;
syntax stopped working in Perl 5.12.
This has been fixed.
Applying the :lvalue attribute to a subroutine that is already defined does not work properly, as the attribute changes the way the sub is compiled. Hence, Perl 5.12 began warning when an attempt is made to apply the attribute to an already defined sub. In such cases, the attribute is discarded.
But the change in 5.12 missed the case where custom attributes are also
present: that case still silently and ineffectively applied the attribute.
That omission has now been corrected. sub foo :lvalue :Whatever
(when
foo
is already defined) now warns about the :lvalue attribute, and does
not apply it.
A bug affecting lvalue context propagation through nested lvalue subroutine calls has been fixed. Previously, returning a value in nested rvalue context would be treated as lvalue context by the inner subroutine call, resulting in some values (such as read-only values) being rejected.
Arithmetic assignment ($left += $right) involving overloaded objects that rely on the 'nomethod' override no longer segfaults when the left operand is not overloaded.
Errors that occur when methods cannot be found during overloading now mention the correct package name, as they did in 5.8.x, instead of erroneously mentioning the "overload" package, as they have since 5.10.0.
Undefining %overload::
no longer causes a crash.
The prototype function no longer dies for the __FILE__
, __LINE__
and __PACKAGE__
directives. It now returns an empty-string prototype
for them, because they are syntactically indistinguishable from nullary
functions like time.
prototype now returns undef for all overridable infix operators,
such as eq
, which are not callable in any way resembling functions.
It used to return incorrect prototypes for some and die for others
[perl #94984].
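On a perl that includes these fixes (5.16 or later), both prototype changes can be checked directly; a minimal sketch:

```perl
use strict;
use warnings;

# The nullary directives now report an empty-string prototype ...
print prototype("CORE::__FILE__") // "undef", "\n";   # empty string, not a die

# ... while overridable infix operators such as eq report undef,
# since they are not callable like functions.
print defined prototype("CORE::eq") ? "defined" : "undef", "\n";
```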
The prototypes of several built-in functions--getprotobynumber, lock,
not
and select--have been corrected, or at least are now closer to
reality than before.
/[[:ascii:]]/
and /[[:blank:]]/
now use locale rules under
use locale
when the platform supports that. Previously, they used
the platform's native character set.
m/[[:ascii:]]/i and /\p{ASCII}/i
now match identically (when not
under a differing locale). This fixes a regression introduced in 5.14
in which the first expression could match characters outside of ASCII,
such as the KELVIN SIGN.
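Assuming perl 5.16 or later, the restored identity can be verified with the KELVIN SIGN (U+212A), which folds to "k" but lies outside ASCII:

```perl
use strict;
use warnings;

my $kelvin = "\x{212A}";   # KELVIN SIGN, which case-folds to "k"

# Under the fix, neither expression matches: KELVIN SIGN is not ASCII.
print $kelvin =~ /[[:ascii:]]/i ? "posix: match\n" : "posix: no match\n";
print $kelvin =~ /\p{ASCII}/i   ? "prop: match\n"  : "prop: no match\n";
```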
/.*/g
would sometimes refuse to match at the end of a string that ends
with "\n". This has been fixed [perl #109206].
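With the fix, /.*/g also produces the empty match at the very end of a "\n"-terminated string; a minimal check:

```perl
use strict;
use warnings;

# Three matches: "a", "" (before the final "\n"), and "" (at the end
# of the string, which used to be refused).
my @matches = ("a\n" =~ /.*/g);
print scalar @matches, "\n";   # 3
```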
Starting with 5.12.0, Perl used to get its internal bookkeeping muddled up
after assigning ${ qr// }
to a hash element and locking it with
Hash::Util. This could result in double frees, crashes, or erratic
behavior.
The new (in 5.14.0) regular expression modifier /a
when repeated like
/aa
forbids the characters outside the ASCII range that match
characters inside that range from matching under /i. This did not
work under some circumstances, all involving alternation, such as:
- "\N{KELVIN SIGN}" =~ /k|foo/iaa;
succeeded inappropriately. This is now fixed.
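The fixed behavior can be demonstrated with the KELVIN SIGN (U+212A), which folds to "k" under full Unicode rules but must not cross the ASCII boundary under /aa:

```perl
use strict;
use warnings;

my $kelvin = "\x{212A}";   # KELVIN SIGN

# /iaa forbids the cross-ASCII fold, even inside an alternation ...
print $kelvin =~ /k|foo/iaa ? "match\n" : "no match\n";   # no match

# ... whereas plain /i still applies full Unicode folding.
print $kelvin =~ /k|foo/i   ? "match\n" : "no match\n";   # match
```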
5.14.0 introduced some memory leaks in regular expression character
classes such as [\w\s]
, which have now been fixed. (5.14.1)
An edge case in regular expression matching could potentially loop.
This happened only under /i in bracketed character classes that have
characters with multi-character folds, and the target string to match
against includes the first portion of the fold, followed by another
character that has a multi-character fold that begins with the remaining
portion of the fold, plus some more.
- "s\N{U+DF}" =~ /[\x{DF}foo]/i
is one such case. \xDF
folds to "ss"
. (5.14.1)
A few characters in regular expression pattern matches did not
match correctly in some circumstances, all involving /i. The
affected characters are:
COMBINING GREEK YPOGEGRAMMENI,
GREEK CAPITAL LETTER IOTA,
GREEK CAPITAL LETTER UPSILON,
GREEK PROSGEGRAMMENI,
GREEK SMALL LETTER IOTA WITH DIALYTIKA AND OXIA,
GREEK SMALL LETTER IOTA WITH DIALYTIKA AND TONOS,
GREEK SMALL LETTER UPSILON WITH DIALYTIKA AND OXIA,
GREEK SMALL LETTER UPSILON WITH DIALYTIKA AND TONOS,
LATIN SMALL LETTER LONG S,
LATIN SMALL LIGATURE LONG S T,
and
LATIN SMALL LIGATURE ST.
A memory leak regression in regular expression compilation under threading has been fixed.
A regression introduced in 5.14.0 has been fixed. This involved an inverted bracketed character class in a regular expression that consisted solely of a Unicode property. That property wasn't getting inverted outside the Latin1 range.
Three problematic Unicode characters now work better in regex pattern matching under /i.
In the past, three Unicode characters:
LATIN SMALL LETTER SHARP S,
GREEK SMALL LETTER IOTA WITH DIALYTIKA AND TONOS,
and
GREEK SMALL LETTER UPSILON WITH DIALYTIKA AND TONOS,
along with the sequences that they fold to
(including "ss" for LATIN SMALL LETTER SHARP S),
did not properly match under /i. 5.14.0 fixed some of these cases,
but introduced others, including a panic when one of the characters or
sequences was used in the (?(DEFINE)
regular expression predicate.
The known bugs that were introduced in 5.14 have now been fixed; as well
as some other edge cases that have never worked until now. These all
involve using the characters and sequences outside bracketed character
classes under /i. This closes [perl #98546].
There remain known problems when using certain characters with multi-character folds inside bracketed character classes, including such constructs as qr/[\N{LATIN SMALL LETTER SHARP S}a-z]/i. These remaining bugs are addressed in [perl #89774].
RT #78266: The regex engine has been leaking memory when accessing named captures that weren't matched as part of a regex ever since 5.10 when they were introduced; for example, repeatedly accessing an unmatched named capture could consume over a hundred MB of memory.
In 5.14, /[[:lower:]]/i
and /[[:upper:]]/i
no longer matched the
opposite case. This has been fixed [perl #101970].
A regular expression match with an overloaded object on the right-hand side would sometimes stringify the object too many times.
A regression has been fixed that was introduced in 5.14, in /i
regular expression matching, in which a match improperly fails if the
pattern is in UTF-8, the target string is not, and a Latin-1 character
precedes a character in the string that should match the pattern.
[perl #101710]
In case-insensitive regular expression pattern matching on UTF-8 encoded strings, the scan for the start of a match no longer looks only at the first possible position. Previously this caused matches such as
"f\x{FB00}" =~ /ff/i
to fail.
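Assuming perl 5.16 or later, the once-failing match now succeeds; a minimal check:

```perl
use strict;
use warnings;

# LATIN SMALL LIGATURE FF (U+FB00) folds to "ff"; the match starts at
# the ligature, i.e., at the second possible position, not the first "f".
print "f\x{FB00}" =~ /ff/i ? "match\n" : "no match\n";   # match
```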
The regexp optimizer no longer crashes on debugging builds when merging fixed-string nodes with inconvenient contents.
A panic involving the combination of the regular expression modifiers
/aa
and the \b
escape sequence introduced in 5.14.0 has been
fixed [perl #95964]. (5.14.2)
The combination of the regular expression modifiers /aa
and the \b
and \B
escape sequences did not work properly on UTF-8 encoded
strings. All non-ASCII characters under /aa
should be treated as
non-word characters, but what was happening was that Unicode rules were
used to determine wordness/non-wordness for non-ASCII characters. This
is now fixed [perl #95968].
(?foo: ...)
no longer loses passed in character set.
The trie optimization used to have problems with alternations containing
an empty (?:), causing "x" =~ /\A(?>(?:(?:)A|B|C?x))\z/
not to
match, whereas it should [perl #111842].
Use of lexical (my) variables in code blocks embedded in regular
expressions will no longer result in memory corruption or crashes.
Nevertheless, these code blocks are still experimental, as there are still
problems with the wrong variables being closed over (in loops for instance)
and with abnormal exiting (e.g., die) causing memory corruption.
The \h
, \H
, \v
and \V
regular expression metacharacters used to
cause a panic error message when trying to match at the end of the
string [perl #96354].
The abbreviations for four C1 control characters, MW, PM, RI, and ST, were previously unrecognized by \N{}, vianame(), and string_vianame().
Mentioning a variable named "&" other than $&
(i.e., @&
or %&
) no
longer stops $&
from working. The same applies to variables named "'"
and "`" [perl #24237].
Creating a UNIVERSAL::AUTOLOAD
sub no longer stops %+
, %-
and
%!
from working some of the time [perl #105024].
~~
now correctly handles the precedence of Any~~Object, and is not tricked
by an overloaded object on the left-hand side.
In Perl 5.14.0, $tainted ~~ @array
stopped working properly. Sometimes
it would erroneously fail (when $tainted
contained a string that occurs
in the array after the first element) or erroneously succeed (when
undef occurred after the first element) [perl #93590].
sort was not treating sub {} and sub {()} as equivalent when such a sub was provided as the comparison routine. It used to croak on sub {()}.
sort now works once more with custom sort routines that are XSUBs. It
stopped working in 5.10.0.
sort with a constant for a custom sort routine, although it produces
unsorted results, no longer crashes. It started crashing in 5.10.0.
Warnings emitted by sort when a custom comparison routine returns a
non-numeric value now contain "in sort" and show the line number of the
sort operator, rather than the last line of the comparison routine. The
warnings also now occur only if warnings are enabled in the scope where
sort occurs. Previously the warnings would occur if enabled in the
comparison routine's scope.
sort { $a <=> $b }
, which is optimized internally, now produces
"uninitialized" warnings for NaNs (not-a-number values), since <=>
returns undef for those. This brings it in line with
sort { 1; $a <=> $b }
and other more complex cases, which are not
optimized [perl #94390].
Tied (and otherwise magical) variables are no longer exempt from the "Attempt to use reference as lvalue in substr" warning.
That warning now occurs when the returned lvalue is assigned to, not
when substr itself is called. This makes a difference only if the
return value of substr is referenced and later assigned to.
Passing a substring of a read-only value or a typeglob to a function (potential lvalue context) no longer causes an immediate "Can't coerce" or "Modification of a read-only value" error. That error occurs only if the passed value is assigned to.
The same thing happens with the "substr outside of string" error. If
the lvalue is only read from, not written to, it is now just a warning, as
with rvalue substr.
substr assignments no longer call FETCH twice if the first argument
is a tied variable, just once.
Some parts of Perl did not work correctly with nulls (chr 0
) embedded in
strings. That meant that, for instance, $m = "a\0b"; foo->$m
would
call the "a" method, instead of the actual method name contained in $m.
These parts of perl have been fixed to support nulls:
Method names
Typeglob names (including filehandle and subroutine names)
Package names, including the return value of ref()
Typeglob elements (*foo{"THING\0stuff"}
)
Signal names
Various warnings and error messages that mention variable names or values, methods, etc.
One side effect of these changes is that blessing into "\0" no longer
causes ref() to return false.
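The typeglob and method-name cases can be exercised directly; a minimal sketch, where the package name "Demo" is purely illustrative:

```perl
use strict;
use warnings;

# Install a method whose name contains an embedded null byte,
# then call it through a variable.
{
    no strict 'refs';
    *{"Demo::a\0b"} = sub { "ok" };
}
my $method = "a\0b";
print Demo->$method, "\n";   # calls the a\0b method, not a method named "a"
```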
Typeglobs returned from threads are no longer cloned if the parent thread already has a glob with the same name. This means that returned subroutines will now assign to the right package variables [perl #107366].
Some cases of threads crashing due to memory allocation during cloning have been fixed [perl #90006].
Thread joining would sometimes emit "Attempt to free unreferenced scalar"
warnings if caller had been used from the DB
package before thread
creation [perl #98092].
Locking a subroutine (via lock &sub
) is no longer a compile-time error
for regular subs. For lvalue subroutines, it no longer tries to return the
sub as a scalar, resulting in strange side effects like ref \$_
returning "CODE" in some instances.
lock &sub
is now a run-time error if threads::shared is loaded (a
no-op otherwise), but that may be rectified in a future version.
Various cases in which FETCH was being ignored or called too many times have been fixed:
PerlIO::get_layers
[perl #97956]
$tied =~ y/a/b/
, chop $tied
and chomp $tied
when $tied holds a
reference.
When calling local $_
[perl #105912]
Four-argument select
A tied buffer passed to sysread
$tied .= <>
Three-argument open, the third being a tied file handle
(as in open $fh, ">&", $tied
)
sort with a reference to a tied glob for the comparison routine.
..
and ...
in list context [perl #53554].
${$tied}
, @{$tied}
, %{$tied}
and *{$tied}
where the tied
variable returns a string (&{}
was unaffected)
defined ${ $tied_variable }
Various functions that take a filehandle argument in rvalue context
(close, readline, etc.) [perl #97482]
Some cases of dereferencing a complex expression, such as
${ (), $tied } = 1
, used to call FETCH
multiple times, but now call
it once.
$tied->method
where $tied returns a package name--even resulting in
a failure to call the method, due to memory corruption
Assignments like *$tied = \&{"..."}
and *glob = $tied
chdir, chmod, chown, utime, truncate, stat, lstat and
the filetest ops (-r
, -x
, etc.)
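Bugs of this kind can be spotted with a tiny tie class that counts its FETCH calls; a minimal sketch (the class name CountFetch is ours, purely illustrative):

```perl
use strict;
use warnings;

package CountFetch;
sub TIESCALAR { my ($class, $val) = @_; bless { val => $val, fetches => 0 }, $class }
sub FETCH     { my $self = shift; $self->{fetches}++; $self->{val} }
sub STORE     { my ($self, $val) = @_; $self->{val} = $val }

package main;
tie my $tied, 'CountFetch', "hello";
my $copy = $tied;                      # a plain read: exactly one FETCH
print tied($tied)->{fetches}, "\n";    # 1
```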
caller sets @DB::args
to the subroutine arguments when called from
the DB package. It used to crash when doing so if @DB::args
happened to
be tied. Now it croaks instead.
Tying an element of %ENV or %^H
and then deleting that element would
result in a call to the tie object's DELETE method, even though tying the
element itself is supposed to be equivalent to tying a scalar (the element
is, of course, a scalar) [perl #67490].
When Perl autovivifies an element of a tied array or hash (which entails calling STORE with a new reference), it now calls FETCH immediately after the STORE, instead of assuming that FETCH would have returned the same reference. This can make it easier to implement tied objects [perl #35865, #43011].
Four-argument select no longer produces its "Non-string passed as
bitmask" warning on tied or tainted variables that are strings.
Localizing a tied scalar that returns a typeglob no longer stops it from being tied till the end of the scope.
Attempting to goto out of a tied handle method used to cause memory
corruption or crashes. Now it produces an error message instead
[perl #8611].
A bug has been fixed that occurs when a tied variable is used as a
subroutine reference: if the last thing assigned to or returned from the
variable was a reference or typeglob, the \&$tied
could either crash or
return the wrong subroutine. The reference case is a regression introduced
in Perl 5.10.0. For typeglobs, it has probably never worked till now.
The bitwise complement operator (and possibly other operators, too) when
passed a vstring would leave vstring magic attached to the return value,
even though the string had changed. This meant that
version->new(~v1.2.3)
would create a version looking like "v1.2.3"
even though the string passed to version->new
was actually
"\376\375\374". This also caused B::Deparse to deparse ~v1.2.3
incorrectly, without the ~
[perl #29070].
Assigning a vstring to a magic (e.g., tied, $!
) variable and then
assigning something else used to blow away all magic. This meant that
tied variables would come undone, $!
would stop getting updated on
failed system calls, $|
would stop setting autoflush, and other
mischief would take place. This has been fixed.
version->new("version")
and printf "%vd", "version"
no longer
crash [perl #102586].
Version comparisons, such as those that happen implicitly with use
v5.43
, no longer cause locale settings to change [perl #105784].
Version objects no longer cause memory leaks in boolean context [perl #109762].
Subroutines from the autouse
namespace are once more exempt from
redefinition warnings. This used to work in 5.005, but was broken in
5.6 for most subroutines. For subs created via XS that redefine
subroutines from the autouse
package, this stopped working in 5.10.
New XSUBs now produce redefinition warnings if they overwrite existing
subs, as they did in 5.8.x. (The autouse
logic was reversed in
5.10-14. Only subroutines from the autouse
namespace would warn
when clobbered.)
newCONSTSUB used to use compile-time warning hints, instead of run-time hints. As a result, code that should never produce a redefinition warning could do so when newCONSTSUB redefined an existing subroutine.
Redefinition warnings for constant subroutines are on by default (they are among the severe warnings listed in perldiag). Previously this was true only when the warning was triggered by a glob assignment or by the declaration of a Perl subroutine; if the creation of an XSUB triggered the warning, it was not a default warning. This has been corrected.
The internal check to see whether a redefinition warning should occur used to emit "uninitialized" warnings in some cases.
Various functions that take a filehandle argument in rvalue context
(close, readline, etc.) used to warn twice for an undefined handle
[perl #97482].
dbmopen now only warns once, rather than three times, if the mode
argument is undef [perl #90064].
The +=
operator does not usually warn when the left-hand side is
undef, but it was doing so for tied variables. This has been fixed
[perl #44895].
A bug fix in Perl 5.14 introduced a new bug, causing "uninitialized"
warnings to report the wrong variable if the operator in question had
two operands and one was %{...}
or @{...}
. This has been fixed
[perl #103766].
..
and ...
in list context now mention the name of the variable in
"uninitialized" warnings for string (as opposed to numeric) ranges.
Weakening the first argument to an automatically-invoked DESTROY
method
could result in erroneous "DESTROY created new reference" errors or
crashes. Now it is an error to weaken a read-only reference.
Weak references to lexical hashes going out of scope were not going stale (becoming undefined), but continued to point to the hash.
Weak references to lexical variables going out of scope are now broken before any magical methods (e.g., DESTROY on a tie object) are called. This prevents such methods from modifying the variable that will be seen the next time the scope is entered.
Creating a weak reference to an @ISA array or accessing the array index
($#ISA
) could result in confused internal bookkeeping for elements
later added to the @ISA array. For instance, creating a weak
reference to the element itself could push that weak reference on to @ISA;
and elements added after use of $#ISA
would be ignored by method lookup
[perl #85670].
quotemeta now quotes consistently the same non-ASCII characters under
use feature 'unicode_strings'
, regardless of whether the string is
encoded in UTF-8 or not, hence fixing the last vestiges (we hope) of the
notorious The Unicode Bug in perlunicode. [perl #77654].
Which of these code points is quoted has changed, based on Unicode's recommendations. See quotemeta for details.
study is now a no-op, presumably fixing all outstanding bugs related to
study causing regex matches to behave incorrectly!
When one writes open foo || die
, which used to work in Perl 4, a
"Precedence problem" warning is produced. This warning used erroneously to
apply to fully-qualified bareword handle names not followed by ||. This
has been corrected.
After package aliasing (*foo:: = *bar::
), select with 0 or 1 argument
would sometimes return a name that could not be used to refer to the
filehandle, or sometimes it would return undef even when a filehandle
was selected. Now it returns a typeglob reference in such cases.
PerlIO::get_layers
no longer ignores some arguments that it thinks are
numeric, while treating others as filehandle names. It is now consistent
for flat scalars (i.e., not references).
Unrecognized switches on #!
line
If a switch, such as -x, that cannot occur on the #!
line is used
there, perl dies with "Can't emulate...".
It used to produce the same message for switches that perl did not
recognize at all, whether on the command line or the #!
line.
Now it produces the "Unrecognized switch" error message [perl #104288].
system now temporarily blocks the SIGCHLD signal handler, to prevent the
signal handler from stealing the exit status [perl #105700].
The %n formatting code for printf and sprintf, which causes the number
of characters to be assigned to the next argument, now actually
assigns the number of characters, instead of the number of bytes.
It also works now with special lvalue functions like substr and with
nonexistent hash and array elements [perl #3471, #103492].
Perl skips copying values returned from a subroutine, for the sake of
speed, if doing so would make no observable difference. Because of faulty
logic, this would happen with the
result of delete, shift or splice, even if the result was
referenced elsewhere. It also did so with tied variables about to be freed
[perl #91844, #95548].
utf8::decode
now refuses to modify read-only scalars [perl #91850].
Freeing $_ inside a grep or map block, a code block embedded in a
regular expression, or an @INC filter (a subroutine returned by a
subroutine in @INC) used to result in double frees or crashes
[perl #91880, #92254, #92256].
eval returns undef in scalar context or an empty list in list
context when there is a run-time error. When eval was passed a
string in list context and a syntax error occurred, it used to return a
list containing a single undefined element. Now it returns an empty
list in list context for all errors [perl #80630].
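A minimal check of the fixed behavior, assuming perl 5.16 or later:

```perl
use strict;
use warnings;

my @result = eval "1 +";          # syntax error inside the string eval
print scalar @result, "\n";       # 0: an empty list, not a one-element (undef)
print $@ ? "error set\n" : "no error\n";
```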
goto &func
no longer crashes, but produces an error message, when
the unwinding of the current subroutine's scope fires a destructor that
undefines the subroutine being "goneto" [perl #99850].
Perl now holds an extra reference count on the package that code is currently compiling in. This means that the following code no longer crashes [perl #101486]:
- package Foo;
- BEGIN {*Foo:: = *Bar::}
- sub foo;
The x
repetition operator no longer crashes on 64-bit builds with large
repeat counts [perl #94560].
Calling require on an implicit $_
when *CORE::GLOBAL::require
has
been overridden does not segfault anymore, and $_
is now passed to the
overriding subroutine [perl #78260].
use and require are no longer affected by the I/O layers active in
the caller's scope (enabled by open.pm) [perl #96008].
our $::é; $é
(which is invalid) no longer produces the "Compilation
error at lib/utf8_heavy.pl..." error message, which it started emitting in
5.10.0 [perl #99984].
On 64-bit systems, read() now understands large string offsets beyond
the 32-bit range.
Errors that occur when processing subroutine attributes no longer cause the subroutine's op tree to leak.
Passing the same constant subroutine to both index and formline no
longer causes one or the other to fail [perl #89218]. (5.14.1)
List assignment to lexical variables declared with attributes in the same
statement (my ($x,@y) : blimp = (72,94)
) stopped working in Perl 5.8.0.
It has now been fixed.
Perl 5.10.0 introduced some faulty logic that made "U*" in the middle of a pack template equivalent to "U0" if the input string was empty. This has been fixed [perl #90160]. (5.14.2)
Destructors on objects were not called during global destruction on objects
that were not referenced by any scalars. This could happen if an array
element were blessed (e.g., bless \$a[0]
) or if a closure referenced a
blessed variable (bless \my @a; sub foo { @a }
).
Now there is an extra pass during global destruction to fire destructors on any objects that might be left after the usual passes that check for objects referenced by scalars [perl #36347].
Fixed a case where it was possible that a freed buffer may have been read from when parsing a here document [perl #90128]. (5.14.1)
each(ARRAY) is now wrapped in defined(...), like each(HASH),
inside a while
condition [perl #90888].
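The implicit defined() matters because index 0 is false; a minimal sketch:

```perl
use strict;
use warnings;

my @animals = qw(cat dog);
my $seen = 0;

# Implicitly: while (defined(my $i = each @animals)) { ... }
# so the loop no longer stops at index 0.
while (my $i = each @animals) {
    $seen++;
}
print $seen, "\n";   # 2
```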
A problem with context propagation when a do block is an argument to
return has been fixed. It used to cause undef to be returned in
certain cases of a return inside an if
block which itself is followed by
another return.
Calling index with a tainted constant no longer causes constants in
subsequently compiled code to become tainted [perl #64804].
Infinite loops like 1 while 1
used to stop strict 'subs'
mode from
working for the rest of the block.
For list assignments like ($a,$b) = ($b,$a), Perl has to make a copy of the items on the right-hand side before assigning them to the left. For efficiency's sake, it assigns the values on the right straight to the items on the left if no variable is mentioned on both sides, as in ($a,$b) = ($c,$d). The logic for determining when it can cheat was faulty, in that && and || on the right-hand side could fool it. So ($a,$b) = $some_true_value && ($b,$a) would end up assigning the value of $b to both scalars.
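With the faulty commonality check fixed, both the plain swap and the && form behave correctly; a minimal sketch:

```perl
use strict;
use warnings;

my ($x, $y) = (1, 2);

# The right side is copied first, so a plain swap works ...
($x, $y) = ($y, $x);
print "$x $y\n";   # 2 1

# ... and the once-problematic && form now also assigns correctly,
# swapping back instead of duplicating one value.
($x, $y) = 1 && ($y, $x);
print "$x $y\n";   # 1 2
```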
Perl no longer tries to apply lvalue context to the string in
("string", $variable) ||= 1
(which used to be an error). Since the
left-hand side of ||=
is evaluated in scalar context, that's a scalar
comma operator, which gives all but the last item void context. There is
no such thing as void lvalue context, so it was a mistake for Perl to try
to force it [perl #96942].
caller no longer leaks memory when called from the DB package if
@DB::args
was assigned to after the first call to caller. Carp
was triggering this bug [perl #97010]. (5.14.2)
close and similar filehandle functions, when called on built-in global
variables (like $+
), used to die if the variable happened to hold the
undefined value, instead of producing the usual "Use of uninitialized
value" warning.
When autovivified file handles were introduced in Perl 5.6.0, readline
was inadvertently made to autovivify when called as readline($foo) (but
not as <$foo>
). It has now been fixed never to autovivify.
Calling an undefined anonymous subroutine (e.g., what $x holds after
undef &{$x = sub{}}
) used to cause a "Not a CODE reference" error, which
has been corrected to "Undefined subroutine called" [perl #71154].
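The corrected error message can be observed directly; a minimal check:

```perl
use strict;
use warnings;

my $code = sub { "hi" };
undef &$code;                 # strip the body but keep the reference
eval { $code->() };
print $@ =~ /Undefined subroutine/ ? "ok\n" : "unexpected: $@";
```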
Causing @DB::args
to be freed between uses of caller no longer
results in a crash [perl #93320].
setpgrp($foo) used to be equivalent to ($foo, setpgrp)
, because
setpgrp was ignoring its argument if there was just one. Now it is
equivalent to setpgrp($foo,0).
shmread was not setting the scalar flags correctly when reading from
shared memory, causing the existing cached numeric representation in the
scalar to persist [perl #98480].
++
and --
now work on copies of globs, instead of dying.
splice() doesn't warn when truncating
You can now limit the size of an array using splice(@a,MAX_LEN) without
worrying about warnings.
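A minimal check that an offset past the end of the array is now silent (MAX_LEN here is just an illustrative limit):

```perl
use strict;
use warnings;

my @a   = (1 .. 3);
my $max = 5;                 # larger than the array
my @warnings;
local $SIG{__WARN__} = sub { push @warnings, @_ };

splice(@a, $max);            # offset past the end: no warning, no change
print scalar @warnings, "\n";   # 0
print scalar @a, "\n";          # 3
```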
$$
is no longer tainted. Since this value comes directly from
getpid()
, it is always safe.
The parser no longer leaks a filehandle if STDIN was closed before parsing started [perl #37033].
die; with a non-reference, non-string, or magical (e.g., tainted)
value in $@ now properly propagates that value [perl #111654].
On Solaris, we have two kinds of failure.
If make is Sun's make, we get an error about a badly formed macro assignment in the Makefile. That happens when ./Configure tries to make depends. Configure then exits 0, but further make-ing fails.
If make is gmake, Configure completes, but then we get errors related to /usr/include/stdbool.h.
On Win32, a number of tests hang unless STDERR is redirected. The cause of this is still under investigation.
When building as root with a umask that prevents files from being other-readable, t/op/filetest.t will fail. This is a test bug, not a bug in perl's behavior.
Configuring with a recent gcc and link-time-optimization, such as
Configure -Doptimize='-O2 -flto'
fails
because the optimizer optimizes away some of Configure's tests. A
workaround is to omit the -flto
flag when running Configure, but add
it back in while actually building, something like
- sh Configure -Doptimize=-O2
- make OPTIMIZE='-O2 -flto'
The following CPAN modules have test failures with perl 5.16. Patches have been submitted for all of these, so hopefully there will be new releases soon:
Date::Pcalc version 6.1
Module::CPANTS::Analyse version 0.85
This fails due to problems in Module::Find 0.10 and File::MMagic 1.27.
PerlIO::Util version 0.72
Perl 5.16.0 represents approximately 12 months of development since Perl 5.14.0 and contains approximately 590,000 lines of changes across 2,500 files from 139 authors.
Perl continues to flourish into its third decade thanks to a vibrant community of users and developers. The following people are known to have contributed the improvements that became Perl 5.16.0:
Aaron Crane, Abhijit Menon-Sen, Abigail, Alan Haggai Alavi, Alberto Simões, Alexandr Ciornii, Andreas König, Andy Dougherty, Aristotle Pagaltzis, Bo Johansson, Bo Lindbergh, Breno G. de Oliveira, brian d foy, Brian Fraser, Brian Greenfield, Carl Hayter, Chas. Owens, Chia-liang Kao, Chip Salzenberg, Chris 'BinGOs' Williams, Christian Hansen, Christopher J. Madsen, chromatic, Claes Jacobsson, Claudio Ramirez, Craig A. Berry, Damian Conway, Daniel Kahn Gillmor, Darin McBride, Dave Rolsky, David Cantrell, David Golden, David Leadbeater, David Mitchell, Dee Newcum, Dennis Kaarsemaker, Dominic Hargreaves, Douglas Christopher Wilson, Eric Brine, Father Chrysostomos, Florian Ragwitz, Frederic Briere, George Greer, Gerard Goossen, Gisle Aas, H.Merijn Brand, Hojung Youn, Ian Goodacre, James E Keenan, Jan Dubois, Jerry D. Hedden, Jesse Luehrs, Jesse Vincent, Jilles Tjoelker, Jim Cromie, Jim Meyering, Joel Berger, Johan Vromans, Johannes Plunien, John Hawkinson, John P. Linderman, John Peacock, Joshua ben Jore, Juerd Waalboer, Karl Williamson, Karthik Rajagopalan, Keith Thompson, Kevin J. Woolley, Kevin Ryde, Laurent Dami, Leo Lapworth, Leon Brocard, Leon Timmermans, Louis Strous, Lukas Mai, Marc Green, Marcel Grünauer, Mark A. 
Stratman, Mark Dootson, Mark Jason Dominus, Martin Hasch, Matthew Horsfall, Max Maischein, Michael G Schwern, Michael Witten, Mike Sheldrake, Moritz Lenz, Nicholas Clark, Niko Tyni, Nuno Carvalho, Pau Amma, Paul Evans, Paul Green, Paul Johnson, Perlover, Peter John Acklam, Peter Martini, Peter Scott, Phil Monsen, Pino Toscano, Rafael Garcia-Suarez, Rainer Tammer, Reini Urban, Ricardo Signes, Robin Barker, Rodolfo Carvalho, Salvador Fandiño, Sam Kimbrel, Samuel Thibault, Shawn M Moore, Shigeya Suzuki, Shirakata Kentaro, Shlomi Fish, Sisyphus, Slaven Rezic, Spiros Denaxas, Steffen Müller, Steffen Schwigon, Stephen Bennett, Stephen Oberholtzer, Stevan Little, Steve Hay, Steve Peters, Thomas Sibley, Thorsten Glaser, Timothe Litt, Todd Rinaldo, Tom Christiansen, Tom Hukins, Tony Cook, Vadim Konovalov, Vincent Pit, Vladimir Timofeev, Walt Mankowski, Yves Orton, Zefram, Zsbán Ambrus, Ævar Arnfjörð Bjarmason.
The list above is almost certainly incomplete as it is automatically generated from version control history. In particular, it does not include the names of the (very much appreciated) contributors who reported issues to the Perl bug tracker.
Many of the changes included in this version originated in the CPAN modules included in Perl's core. We're grateful to the entire CPAN community for helping Perl to flourish.
For a more complete list of all of Perl's historical contributors, please see the AUTHORS file in the Perl source distribution.
If you find what you think is a bug, you might check the articles recently posted to the comp.lang.perl.misc newsgroup and the perl bug database at http://rt.perl.org/perlbug/. There may also be information at http://www.perl.org/, the Perl Home Page.
If you believe you have an unreported bug, please run the perlbug
program included with your release. Be sure to trim your bug down
to a tiny but sufficient test case. Your bug report, along with the
output of perl -V
, will be sent off to perlbug@perl.org to be
analysed by the Perl porting team.
If the bug you are reporting has security implications, which make it inappropriate to send to a publicly archived mailing list, then please send it to perl5-security-report@perl.org. This points to a closed subscription unarchived mailing list, which includes all core committers, who will be able to help assess the impact of issues, figure out a resolution, and help co-ordinate the release of patches to mitigate or fix the problem across all platforms on which Perl is supported. Please use this address only for security issues in the Perl core, not for modules independently distributed on CPAN.
The Changes file for an explanation of how to view exhaustive details on what changed.
The INSTALL file for how to build Perl.
The README file for general stuff.
The Artistic and Copying files for copyright information.
perl5161delta - what is new for perl v5.16.1
This document describes differences between the 5.16.0 release and the 5.16.1 release.
If you are upgrading from an earlier release such as 5.14.0, first read perl5160delta, which describes differences between 5.14.0 and 5.16.0.
The bugfix was in Scalar-List-Util 1.23_04, and perl 5.16.1 includes Scalar-List-Util 1.25.
There are no changes intentionally incompatible with 5.16.0. If any exist, they are bugs, and we request that you submit a report. See Reporting Bugs below.
Scalar::Util and List::Util have been upgraded from version 1.23 to version 1.25.
B::Deparse has been updated from version 1.14 to 1.14_01. An "uninitialized" warning emitted by B::Deparse has been squashed [perl #113464].
Building perl with some Windows compilers used to fail due to a problem with miniperl's glob operator (which uses the perlglob program) deleting the PATH environment variable [perl #113798].
All C header files from the top-level directory of the distribution are now installed on VMS, providing consistency with a long-standing practice on other platforms. Previously only a subset were installed, which broke non-core extension builds for extensions that depended on the missing include files.
A regression introduced in Perl v5.16.0 involving tr/SEARCHLIST/REPLACEMENTLIST/ has been fixed. Only the first instance is supposed to be meaningful if a character appears more than once in SEARCHLIST. Under some circumstances, the final instance was overriding all earlier ones. [perl #113584]
B::COP::stashlen has been added. This provides access to an internal field added in perl 5.16 under threaded builds. It was broken at the last minute before 5.16 was released [perl #113034].
The re pragma will no longer clobber $_. [perl #113750]
Unicode 6.1 published an incorrect alias for one of the Canonical_Combining_Class property's values (which range between 0 and 254). The alias CCC133 should have been CCC132. Perl now overrides the data file furnished by Unicode to give the correct value.
Duplicating scalar filehandles works again. [perl #113764]
Under threaded perls, a runtime code block in a regular expression could
corrupt the package name stored in the op tree, resulting in bad reads
in caller, and possibly crashes [perl #113060].
For efficiency's sake, many operators and built-in functions return the same scalar each time. Lvalue subroutines and subroutines in the CORE:: namespace were allowing this implementation detail to leak through. print &CORE::uc("a"), &CORE::uc("b") used to print "BB". The same thing would happen with an lvalue subroutine returning the return value of uc. Now the value is copied in such cases [perl #113044].
__SUB__ now works in special blocks (BEGIN, END, etc.).
Formats that reference lexical variables from outside no longer result in crashes.
There are no new known problems, but consult Known Problems in perl5160delta to see those identified in the 5.16.0 release.
Perl 5.16.1 represents approximately 2 months of development since Perl 5.16.0 and contains approximately 14,000 lines of changes across 96 files from 8 authors.
Perl continues to flourish into its third decade thanks to a vibrant community of users and developers. The following people are known to have contributed the improvements that became Perl 5.16.1:
Chris 'BinGOs' Williams, Craig A. Berry, Father Chrysostomos, Karl Williamson, Paul Johnson, Reini Urban, Ricardo Signes, Tony Cook.
The list above is almost certainly incomplete as it is automatically generated from version control history. In particular, it does not include the names of the (very much appreciated) contributors who reported issues to the Perl bug tracker.
Many of the changes included in this version originated in the CPAN modules included in Perl's core. We're grateful to the entire CPAN community for helping Perl to flourish.
For a more complete list of all of Perl's historical contributors, please see the AUTHORS file in the Perl source distribution.
If you find what you think is a bug, you might check the articles recently posted to the comp.lang.perl.misc newsgroup and the perl bug database at http://rt.perl.org/perlbug/. There may also be information at http://www.perl.org/, the Perl Home Page.
If you believe you have an unreported bug, please run the perlbug program included with your release. Be sure to trim your bug down to a tiny but sufficient test case. Your bug report, along with the output of perl -V, will be sent off to perlbug@perl.org to be analysed by the Perl porting team.
If the bug you are reporting has security implications, which make it inappropriate to send to a publicly archived mailing list, then please send it to perl5-security-report@perl.org. This points to a closed subscription unarchived mailing list, which includes all the core committers, who will be able to help assess the impact of issues, figure out a resolution, and help co-ordinate the release of patches to mitigate or fix the problem across all platforms on which Perl is supported. Please only use this address for security issues in the Perl core, not for modules independently distributed on CPAN.
The Changes file for an explanation of how to view exhaustive details on what changed.
The INSTALL file for how to build Perl.
The README file for general stuff.
The Artistic and Copying files for copyright information.
perl5162delta - what is new for perl v5.16.2
This document describes differences between the 5.16.1 release and the 5.16.2 release.
If you are upgrading from an earlier release such as 5.16.0, first read perl5161delta, which describes differences between 5.16.0 and 5.16.1.
There are no changes intentionally incompatible with 5.16.0. If any exist, they are bugs, and we request that you submit a report. See Reporting Bugs below.
Module::CoreList has been upgraded from version 2.70 to version 2.76.
Configure now always adds -qlanglvl=extc99 to the CC flags on AIX when using xlC. This will make it easier to compile a number of XS-based modules that assume C99 [perl #113778].
There are no new known problems.
Perl 5.16.2 represents approximately 2 months of development since Perl 5.16.1 and contains approximately 740 lines of changes across 20 files from 9 authors.
Perl continues to flourish into its third decade thanks to a vibrant community of users and developers. The following people are known to have contributed the improvements that became Perl 5.16.2:
Andy Dougherty, Craig A. Berry, Darin McBride, Dominic Hargreaves, Karen Etheridge, Karl Williamson, Peter Martini, Ricardo Signes, Tony Cook.
The list above is almost certainly incomplete as it is automatically generated from version control history. In particular, it does not include the names of the (very much appreciated) contributors who reported issues to the Perl bug tracker.
For a more complete list of all of Perl's historical contributors, please see the AUTHORS file in the Perl source distribution.
If you find what you think is a bug, you might check the articles recently posted to the comp.lang.perl.misc newsgroup and the perl bug database at http://rt.perl.org/perlbug/. There may also be information at http://www.perl.org/, the Perl Home Page.
If you believe you have an unreported bug, please run the perlbug program included with your release. Be sure to trim your bug down to a tiny but sufficient test case. Your bug report, along with the output of perl -V, will be sent off to perlbug@perl.org to be analysed by the Perl porting team.
If the bug you are reporting has security implications, which make it inappropriate to send to a publicly archived mailing list, then please send it to perl5-security-report@perl.org. This points to a closed subscription unarchived mailing list, which includes all the core committers, who will be able to help assess the impact of issues, figure out a resolution, and help co-ordinate the release of patches to mitigate or fix the problem across all platforms on which Perl is supported. Please only use this address for security issues in the Perl core, not for modules independently distributed on CPAN.
The Changes file for an explanation of how to view exhaustive details on what changed.
The INSTALL file for how to build Perl.
The README file for general stuff.
The Artistic and Copying files for copyright information.
perl5163delta - what is new for perl v5.16.3
This document describes differences between the 5.16.2 release and the 5.16.3 release.
If you are upgrading from an earlier release such as 5.16.1, first read perl5162delta, which describes differences between 5.16.1 and 5.16.2.
No changes since 5.16.0.
This release contains one major and a number of minor security fixes. These latter are included mainly to allow the test suite to pass cleanly with the clang compiler's address sanitizer facility.
With a carefully crafted set of hash keys (for example arguments on a URL), it is possible to cause a hash to consume a large amount of memory and CPU, and thus possibly to achieve a Denial-of-Service.
This problem has been fixed.
Reading or writing strings greater than 2**31 bytes in size could segfault due to integer wraparound.
This problem has been fixed.
The UTF-8 encoding implementation in Encode.xs had a memory leak which has been fixed.
There are no changes intentionally incompatible with 5.16.0. If any exist, they are bugs and reports are welcome.
There have been no deprecations since 5.16.0.
Encode has been upgraded from version 2.44 to version 2.44_01.
Module::CoreList has been upgraded from version 2.76 to version 2.76_02.
XS::APItest has been upgraded from version 0.38 to version 0.39.
None.
Perl 5.16.3 represents approximately 4 months of development since Perl 5.16.2 and contains approximately 870 lines of changes across 39 files from 7 authors.
Perl continues to flourish into its third decade thanks to a vibrant community of users and developers. The following people are known to have contributed the improvements that became Perl 5.16.3:
Andy Dougherty, Chris 'BinGOs' Williams, Dave Rolsky, David Mitchell, Michael Schroeder, Ricardo Signes, Yves Orton.
The list above is almost certainly incomplete as it is automatically generated from version control history. In particular, it does not include the names of the (very much appreciated) contributors who reported issues to the Perl bug tracker.
For a more complete list of all of Perl's historical contributors, please see the AUTHORS file in the Perl source distribution.
If you find what you think is a bug, you might check the articles recently posted to the comp.lang.perl.misc newsgroup and the perl bug database at http://rt.perl.org/perlbug/. There may also be information at http://www.perl.org/, the Perl Home Page.
If you believe you have an unreported bug, please run the perlbug program included with your release. Be sure to trim your bug down to a tiny but sufficient test case. Your bug report, along with the output of perl -V, will be sent off to perlbug@perl.org to be analysed by the Perl porting team.
If the bug you are reporting has security implications, which make it inappropriate to send to a publicly archived mailing list, then please send it to perl5-security-report@perl.org. This points to a closed subscription unarchived mailing list, which includes all the core committers, who will be able to help assess the impact of issues, figure out a resolution, and help co-ordinate the release of patches to mitigate or fix the problem across all platforms on which Perl is supported. Please only use this address for security issues in the Perl core, not for modules independently distributed on CPAN.
The Changes file for an explanation of how to view exhaustive details on what changed.
The INSTALL file for how to build Perl.
The README file for general stuff.
The Artistic and Copying files for copyright information.
perl5180delta - what is new for perl v5.18.0
This document describes differences between the v5.16.0 release and the v5.18.0 release.
If you are upgrading from an earlier release such as v5.14.0, first read perl5160delta, which describes differences between v5.14.0 and v5.16.0.
Newly-added experimental features will now require this incantation:
There is a new warnings category, called "experimental", containing warnings that the feature pragma emits when enabling experimental features.
Newly-added experimental features will also be given special warning IDs, which consist of "experimental::" followed by the name of the feature. (The plan is to extend this mechanism eventually to all warnings, to allow them to be enabled or disabled individually, and not just by category.)
By saying
- no warnings "experimental::feature_name";
you are taking responsibility for any breakage that future changes to, or removal of, the feature may cause.
Since some features (like ~~ or my $_) now emit experimental warnings, and you may want to disable them in code that is also run on perls that do not recognize these warning categories, consider using the if pragma like this:
Existing experimental features may begin emitting these warnings, too. Please consult perlexperiment for information on which features are considered experimental.
Changes to the implementation of hashes in perl v5.18.0 will be one of the most visible changes to the behavior of existing code.
By default, two distinct hash variables with identical keys and values may now provide their contents in a different order where it was previously identical.
When encountering these changes, the key to cleaning up from them is to accept that hashes are unordered collections and to act accordingly.
The seed used by Perl's hash function is now random. This means that the order in which keys/values are returned from functions like keys(), values(), and each() will differ from run to run.
This change was introduced to make Perl's hashes more robust to algorithmic complexity attacks, and also because we discovered that it exposes hash ordering dependency bugs and makes them easier to track down.
Toolchain maintainers might want to invest in additional infrastructure to test for things like this. Running tests several times in a row and then comparing results will make it easier to spot hash order dependencies in code. Authors are strongly encouraged not to expose the key order of Perl's hashes to insecure audiences.
Further, every hash has its own iteration order, which should make it much more difficult to determine what the current hash seed is.
Perl v5.18 includes support for multiple hash functions and changes the default (to ONE_AT_A_TIME_HARD); you can choose a different algorithm by defining a symbol at compile time. For a current list, consult the INSTALL document. Note that as of Perl v5.18 we can only recommend use of the default or SIPHASH. All the others are known to have security issues and are for research purposes only.
PERL_HASH_SEED no longer accepts an integer as a parameter; instead the value is expected to be a binary value encoded in a hex string, such as "0xf5867c55039dc724". This is to make the infrastructure support hash seeds of arbitrary lengths, which might exceed that of an integer. (SipHash uses a 16 byte seed.)
The PERL_PERTURB_KEYS environment variable allows one to control the level of randomization applied to keys and friends.

When PERL_PERTURB_KEYS is 0, perl will not randomize the key order at all. The chance that key order changes due to an insert will be the same as in previous perls, basically only when the bucket size is changed.

When PERL_PERTURB_KEYS is 1, perl will randomize keys in a non-repeatable way. The chance that key order changes due to an insert will be very high. This is the most secure mode, and the default.

When PERL_PERTURB_KEYS is 2, perl will randomize keys in a repeatable way. Repeated runs of the same program should produce the same output every time.

PERL_HASH_SEED implies a non-default PERL_PERTURB_KEYS setting. Setting PERL_HASH_SEED=0 (exactly one 0) implies PERL_PERTURB_KEYS=0 (hash key randomization disabled); setting PERL_HASH_SEED to any other value implies PERL_PERTURB_KEYS=2 (deterministic and repeatable hash key randomization). Specifying PERL_PERTURB_KEYS explicitly to a different level overrides this behavior.
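The implication described above can be checked from the shell (a sketch; on perls built without this support the variables are simply ignored, and key order is deterministic anyway):

```shell
# PERL_HASH_SEED=0 implies PERL_PERTURB_KEYS=0, so two runs of the
# same program should print the keys in the same order.
PERL_HASH_SEED=0 perl -E 'my %h = map { $_ => 1 } 1 .. 10; say join ",", keys %h'
PERL_HASH_SEED=0 perl -E 'my %h = map { $_ => 1 } 1 .. 10; say join ",", keys %h'
```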
Hash::Util::hash_seed() now returns a string instead of an integer. This is to make the infrastructure support hash seeds of arbitrary lengths which might exceed that of an integer. (SipHash uses a 16 byte seed.)
The environment variable PERL_HASH_SEED_DEBUG now makes perl show both the hash function perl was built with, and the seed, in hex, in use for that process. Code parsing this output, should it exist, must change to accommodate the new format. Example of the new format:
- $ PERL_HASH_SEED_DEBUG=1 ./perl -e1
- HASH_FUNCTION = MURMUR3 HASH_SEED = 0x1476bb9f
Perl now supports Unicode 6.2. A list of changes from Unicode 6.1 is at http://www.unicode.org/versions/Unicode6.2.0.
It is possible to define your own names for characters for use in \N{...}, charnames::vianame(), etc. These names can now be composed of characters from the whole Unicode range. This allows names to be in your native language, and not just English. Certain restrictions apply to the characters that may be used (you can't define a name that has punctuation in it, for example). See CUSTOM ALIASES in charnames.
The following new DTrace probes have been added:
op-entry
loading-file
loaded-file
${^LAST_FH}
This new variable provides access to the filehandle that was last read. This is the handle used by $. and by tell and eof without arguments.
This is an experimental feature to allow matching against the union,
intersection, etc., of sets of code points, similar to
Unicode::Regex::Set. It can also be used to extend /x processing
to [bracketed] character classes, and as a replacement of user-defined
properties, allowing more complex expressions than they do. See
Extended Bracketed Character Classes in perlrecharclass.
This new feature is still considered experimental. To enable it:
You can now declare subroutines with state sub foo, my sub foo, and our sub foo. (state sub requires that the "state" feature be enabled, unless you write it as CORE::state sub foo.)

state sub creates a subroutine visible within the lexical scope in which it is declared. The subroutine is shared between calls to the outer sub.

my sub declares a lexical subroutine that is created each time the enclosing block is entered. state sub is generally slightly faster than my sub.

our sub declares a lexical alias to the package subroutine of the same name.
For more information, see Lexical Subroutines in perlsub.
The loop controls next, last and redo, and the special dump operator, now allow arbitrary expressions to be used to compute labels at run time. Previously, any argument that was not a constant was treated as the empty string.
Several more built-in functions have been added as subroutines to the CORE:: namespace, namely those non-overridable keywords that can be implemented without custom parsers: defined, delete, exists, glob, pos, prototype, scalar, split, study, and undef.

As some of these have prototypes, prototype('CORE::...') has been changed to not make a distinction between overridable and non-overridable keywords. This is to make prototype('CORE::pos') consistent with prototype(&CORE::pos).
kill with negative signal names
kill has always allowed a negative signal number, which kills the process group instead of a single process. It has also allowed signal names. But it did not behave consistently, because negative signal names were treated as 0. Now negative signal names like -INT are supported and treated the same way as -2 [perl #112990].
Some of the changes in the hash overhaul were made to enhance security. Please read that section.
Storable security warning in documentation
The documentation for Storable now includes a section which warns readers of the danger of accepting Storable documents from untrusted sources. The short version is that deserializing certain types of data can lead to loading modules and other code execution. This behavior is documented and intended, but it opens an attack vector for malicious entities.
Locale::Maketext allowed code injection via a malicious template
If users could provide a translation string to Locale::Maketext, this could be used to invoke arbitrary Perl subroutines available in the current process.

This has been fixed, but it is still possible to invoke any method provided by Locale::Maketext itself or a subclass that you are using. One of these methods in turn will invoke the Perl core's sprintf subroutine.
In summary, allowing users to provide translation strings without auditing them is a bad idea.
This vulnerability is documented in CVE-2012-6329.
Poorly written perl code that allows an attacker to specify the count to perl's x string repeat operator can already cause a memory exhaustion denial-of-service attack. A flaw in versions of perl before v5.15.5 can escalate that into a heap buffer overrun; coupled with versions of glibc before 2.16, it possibly allows the execution of arbitrary code.

The flaw addressed by this fix has been assigned identifier CVE-2012-5195 and was researched by Tim Brown.
Some of the changes in the hash overhaul are not fully compatible with previous versions of perl. Please read that section.
An unknown character name in \N{...} is now a syntax error
Previously, it warned, and the Unicode REPLACEMENT CHARACTER was substituted. Unicode now recommends that this situation be a syntax error. Also, the previous behavior led to some confusing warnings and behaviors, and since the REPLACEMENT CHARACTER has no use other than as a stand-in for some unknown character, any code that has this problem is buggy.
\N{} character name aliases are now errors
Since v5.12.0, it has been deprecated to use certain characters in user-defined \N{...} character names. These now cause a syntax error. For example, it is now an error to begin a name with a digit, such as in
- my $undraftable = "\N{4F}"; # Syntax error!
or to have commas anywhere in the name. See CUSTOM ALIASES in charnames.
\N{BELL} now refers to U+1F514 instead of U+0007
Unicode 6.0 reused the name "BELL" for a different code point than it traditionally had meant. Since Perl v5.14, use of this name still referred to U+0007, but would raise a deprecation warning. Now, "BELL" refers to U+1F514, and the name for U+0007 is "ALERT". All the functions in charnames have been correspondingly updated.
Unicode has now withdrawn their previous recommendation for regular expressions to automatically handle cases where a single character can match multiple characters case-insensitively, for example, the letter LATIN SMALL LETTER SHARP S and the sequence ss. This is because it turns out to be impracticable to do this correctly in all circumstances. Because Perl has tried to do this as best it can, it will continue to do so. (We are considering an option to turn it off.)

However, a new restriction is being added on such matches when they occur in [bracketed] character classes. People were specifying things such as /[\0-\xff]/i, and being surprised that it matches the two character sequence ss (since LATIN SMALL LETTER SHARP S occurs in this range). This behavior is also inconsistent with using a property instead of a range: \p{Block=Latin1} also includes LATIN SMALL LETTER SHARP S, but /[\p{Block=Latin1}]/i does not match ss.
The new rule is that for there to be a multi-character case-insensitive match within a bracketed character class, the character must be explicitly listed, and not as an end point of a range. This more closely obeys the Principle of Least Astonishment. See Bracketed Character Classes in perlrecharclass. Note that a bug [perl #89774], now fixed as part of this change, prevented the previous behavior from working fully.
Due to an oversight, single character variable names in v5.16 were
completely unrestricted. This opened the door to several kinds of
insanity. As of v5.18, these now follow the rules of other identifiers,
in addition to accepting characters that match the \p{POSIX_Punct}
property.
There is no longer any difference in the parsing of identifiers specified by using braces versus without braces. For instance, perl used to allow ${foo:bar} (with a single colon) but not $foo:bar. Now that both are handled by a single code path, they are both treated the same way: both are forbidden. Note that this change is about the range of permissible literal identifiers, not other expressions.
No one could recall why \s didn't match \cK, the vertical tab. Now it does. Given the extreme rarity of that character, very little breakage is expected. That said, here's what it means:
\s in a regex now matches a vertical tab in all circumstances.
Literal vertical tabs in a regex literal are ignored when the /x
modifier is used.
Leading vertical tabs, alone or mixed with other whitespace, are now ignored when interpreting a string as a number. For example:
/(?{})/ and /(??{})/ have been heavily reworked
The implementation of this feature has been almost completely rewritten. Although its main intent is to fix bugs, some behaviors, especially related to the scope of lexical variables, will have changed. This is described more fully in the Selected Bug Fixes section.
It is no longer possible to abuse the way the parser parses s///e like
this:
- %_=(_,"Just another ");
- $_="Perl hacker,\n";
- s//_}->{_/e;print
given now aliases the global $_
Instead of assigning to an implicit lexical $_, given now makes the global $_ an alias for its argument, just like foreach. However, it still uses a lexical $_ if there is one in scope (again, just like foreach) [perl #114020].
Smart match, added in v5.10.0 and significantly revised in v5.10.1, has been a regular point of complaint. Although there are a number of ways in which it is useful, it has also proven problematic and confusing for both users and implementors of Perl. There have been a number of proposals on how to best address the problem. It is clear that smartmatch is almost certainly either going to change or go away in the future. Relying on its current behavior is not recommended.
Warnings will now be issued when the parser sees ~~, given, or when.
To disable these warnings, you can add this line to the appropriate scope:
- no warnings "experimental::smartmatch";
Consider, though, replacing the use of these features, as they may change behavior again before becoming stable.
Lexical $_ is now experimental
Since it was introduced in Perl v5.10, it has caused much confusion with no obvious solution:
Various modules (e.g., List::Util) expect callback routines to use the global $_. use List::Util 'first'; my $_; first { $_ == 1 } @list does not work as one would expect.

A my $_ declaration earlier in the same file can cause confusing closure warnings.

The "_" subroutine prototype character allows called subroutines to access your lexical $_, so it is not really private after all.

Nevertheless, subroutines with a "(@)" prototype and methods cannot access the caller's lexical $_, unless they are written in XS.

But even XS routines cannot access a lexical $_ declared, not in the calling subroutine, but in an outer scope, iff that subroutine happened not to mention $_ or use any operators that default to $_.
It is our hope that lexical $_ can be rehabilitated, but this may cause changes in its behavior. Please use it with caution until it becomes stable.
$/ = \N now reads N characters, not N bytes
Previously, when reading from a stream with I/O layers such as encoding, the readline() function, otherwise known as the <> operator, would read N bytes from the top-most layer. [perl #79960] Now, N characters are read instead.
There is no change in behaviour when reading from streams with no extra layers, since bytes map exactly to characters.
glob is now passed one argument
glob overrides used to be passed a magical undocumented second argument that identified the caller. Nothing on CPAN was using this, and it got in the way of a bug fix, so it was removed. If you really need to identify the caller, see Devel::Callsite on CPAN.
The body of a here document inside a quote-like operator now always begins on the line after the "<<foo" marker. Previously, it was documented to begin on the line following the containing quote-like operator, but that was only sometimes the case [perl #114040].
You may no longer write something like:
- m/a/and 1
Instead you must write
- m/a/ and 1
with whitespace separating the operator from the closing delimiter of the regular expression. Not having whitespace has resulted in a deprecation warning since Perl v5.14.0.
qw lists used to fool the parser into thinking they were always surrounded by parentheses. This permitted some surprising constructions such as foreach $x qw(a b c) {...}, which should really be written foreach $x (qw(a b c)) {...}. These would sometimes get the lexer into the wrong state, so they didn't fully work, and the similar foreach qw(a b c) {...} that one might expect to be permitted never worked at all.
This side effect of qw has now been abolished. It has been deprecated since Perl v5.13.11. It is now necessary to use real parentheses everywhere that the grammar calls for them.
Turning on any lexical warnings used first to disable all default warnings if lexical warnings were not already enabled:
- $*; # deprecation warning
- use warnings "void";
- $#; # void warning; no deprecation warning
Now, the debugging, deprecated, glob, inplace and malloc warnings categories are left on when turning on lexical warnings (unless they are turned off by no warnings, of course).
This may cause deprecation warnings to occur in code that used to be free of warnings.
Those are the only categories consisting only of default warnings. Default warnings in other categories are still disabled by use warnings "category", as we do not yet have the infrastructure for controlling individual warnings.
state sub and our sub
Due to an accident of history, state sub and our sub were equivalent to a plain sub, so one could even create an anonymous sub with our sub { ... }. These are now disallowed outside of the "lexical_subs" feature. Under the "lexical_subs" feature they have new meanings described in Lexical Subroutines in perlsub.
A value stored in an environment variable has always been stringified. In this release, it is converted to be only a byte string. First, it is forced to be only a string. Then if the string is utf8 and the equivalent of utf8::downgrade() works, that result is used; otherwise, the equivalent of utf8::encode() is used, and a warning is issued about wide characters (Diagnostics).
require dies for unreadable files
When require encounters an unreadable file, it now dies. It used to ignore the file and continue searching the directories in @INC [perl #113422].
gv_fetchmeth_* and SUPER
The various gv_fetchmeth_* XS functions used to treat a package whose name ended with ::SUPER specially. A method lookup on the Foo::SUPER package would be treated as a SUPER method lookup on the Foo package. This is no longer the case. To do a SUPER lookup, pass the Foo stash and the GV_SUPER flag.
split's first argument is more consistently interpreted
After some changes earlier in v5.17, split's behavior has been simplified: if the PATTERN argument evaluates to a string containing one space, it is treated the way that a literal string containing one space once was.
The following modules will be removed from the core distribution in a future release, and will at that time need to be installed from CPAN. Distributions on CPAN which require these modules will need to list them as prerequisites.
The core versions of these modules will now issue "deprecated"-category warnings to alert you to this fact. To silence these deprecation warnings, install the modules in question from CPAN.
Note that these are (with rare exceptions) fine modules that you are encouraged to continue to use. Their disinclusion from core primarily hinges on their necessity to bootstrapping a fully functional, CPAN-capable Perl installation, not usually on concerns over their design.
The use of this pragma is now strongly discouraged. It conflates the encoding of source text with the encoding of I/O data, reinterprets escape sequences in source text (a questionable choice), and introduces the UTF-8 bug to all runtime handling of character strings. It is broken as designed and beyond repair.
For using non-ASCII literal characters in source text, please refer to utf8. For dealing with textual I/O data, please refer to Encode and open.
CPANPLUS::* modules
The following utilities will be removed from the core distribution in a future release as their associated modules have been deprecated. They will remain available with the applicable CPAN distribution.
cpanp-run-perl
These items are part of the CPANPLUS
distribution.
This item is part of the Pod::LaTeX
distribution.
This interpreter-global variable used to track the total number of Perl objects in the interpreter. It is no longer maintained and will be removed altogether in Perl v5.20.
/x
When a regular expression pattern is compiled with /x, Perl treats 6 characters as white space to ignore, such as SPACE and TAB. However, Unicode recommends 11 characters be treated thusly. We will conform with this in a future Perl version. In the meantime, use of any of the missing characters will raise a deprecation warning, unless turned off.
The five characters are:
- U+0085 NEXT LINE
- U+200E LEFT-TO-RIGHT MARK
- U+200F RIGHT-TO-LEFT MARK
- U+2028 LINE SEPARATOR
- U+2029 PARAGRAPH SEPARATOR
A user-defined character name with trailing or multiple spaces in a row is likely a typo. This now generates a warning when defined, on the assumption that uses of it will be unlikely to include the excess whitespace.
All the functions used to classify characters will be removed from a future version of Perl, and should not be used. With participating C compilers (e.g., gcc), compiling any file that uses any of these will generate a warning. These were not intended for public use; there are equivalent, faster, macros for most of them.
See Character classes in perlapi. The complete list is:
is_uni_alnum, is_uni_alnumc, is_uni_alnumc_lc, is_uni_alnum_lc, is_uni_alpha, is_uni_alpha_lc, is_uni_ascii, is_uni_ascii_lc, is_uni_blank, is_uni_blank_lc, is_uni_cntrl, is_uni_cntrl_lc, is_uni_digit, is_uni_digit_lc, is_uni_graph, is_uni_graph_lc, is_uni_idfirst, is_uni_idfirst_lc, is_uni_lower, is_uni_lower_lc, is_uni_print, is_uni_print_lc, is_uni_punct, is_uni_punct_lc, is_uni_space, is_uni_space_lc, is_uni_upper, is_uni_upper_lc, is_uni_xdigit, is_uni_xdigit_lc, is_utf8_alnum, is_utf8_alnumc, is_utf8_alpha, is_utf8_ascii, is_utf8_blank, is_utf8_char, is_utf8_cntrl, is_utf8_digit, is_utf8_graph, is_utf8_idcont, is_utf8_idfirst, is_utf8_lower, is_utf8_mark, is_utf8_perl_space, is_utf8_perl_word, is_utf8_posix_digit, is_utf8_print, is_utf8_punct, is_utf8_space, is_utf8_upper, is_utf8_xdigit, is_utf8_xidcont, is_utf8_xidfirst.
In addition, these three functions that have never worked properly are deprecated: to_uni_lower_lc, to_uni_title_lc, and to_uni_upper_lc.
There are three pairs of characters that Perl recognizes as metacharacters in regular expression patterns: {}, [], and ().
These can be used as well to delimit patterns, as in:
- m{foo}
- s(foo)(bar)
Since they are metacharacters, they have special meaning to regular expression patterns, and it turns out that you can't turn off that special meaning by the normal means of preceding them with a backslash, if you use them, paired, within a pattern delimited by them. For example, in
- m{foo\{1,3\}}
the backslashes do not change the behavior, and this matches "fo" followed by one to three more occurrences of "o".
Usages like this, where they are interpreted as metacharacters, are exceedingly rare; we think there are none, for example, in all of CPAN. Hence, this deprecation should affect very little code. It does give notice, however, that any such code needs to change, which will in turn allow us to change the behavior in future Perl versions so that the backslashes do have an effect, and without fear that we are silently breaking any existing code.
(? and (* in regular expressions
A deprecation warning is now raised if the ( and ? are separated by white space or comments in (?...) regular expression constructs. Similarly, if the ( and * are separated in (*VERB...) constructs.
In theory, you can currently build perl without PerlIO. Instead, you'd use a wrapper around stdio or sfio. In practice, this isn't very useful. It's not well tested, and without any support for IO layers or (thus) Unicode, it's not much of a perl. Building without PerlIO will most likely be removed in the next version of perl.
PerlIO supports a stdio layer if stdio use is desired. Similarly, a sfio layer could be produced in the future, if needed.
Platforms without support infrastructure
Both Windows CE and z/OS have been historically under-maintained, and are
currently neither successfully building nor regularly being smoke tested.
Efforts are underway to change this situation, but it should not be taken for
granted that the platforms are safe and supported. If they do not become
buildable and regularly smoked, support for them may be actively removed in
future releases. If you have an interest in these platforms and you can lend
your time, expertise, or hardware to help support these platforms, please let
the perl development effort know by emailing perl5-porters@perl.org.
Some platforms that appear otherwise entirely dead are also on the short list for removal between now and v5.20.0:
We also think it likely that current versions of Perl will no longer build on AmigaOS, DJGPP, NetWare (natively), OS/2 and Plan 9. If you are using Perl on such a platform and have an interest in ensuring Perl's future on them, please contact us.
We believe that Perl has long been unable to build on mixed endian architectures (such as PDP-11s), and intend to remove any remaining support code. Similarly, code supporting the long unmaintained GNU dld will be removed soon if no-one makes themselves known as an active user.
Swapping of $< and $>
Perl has supported the idiom of swapping $< and $> (and likewise $( and $)) to temporarily drop permissions since 5.0, like this:
- ($<, $>) = ($>, $<);
However, this idiom modifies the real user/group id, which can have undesirable side-effects, is no longer useful on any platform perl supports and complicates the implementation of these variables and list assignment in general.
As an alternative, assignment only to $> is recommended:
- local $> = $<;
See also: Setuid Demystified.
microperl, long broken and of unclear present purpose, will be removed.
Revamping "\Q"
semantics in double-quotish strings when combined with
other escapes.
There are several bugs and inconsistencies involving combinations of \Q and escapes like \x, \L, etc., within a \Q...\E pair. These need to be fixed, and doing so will necessarily change current behavior. The changes have not yet been settled.
Use of $x, where x stands for any actual (non-printing) C0 control character, will be disallowed in a future Perl version. Use ${x} instead (where again x stands for a control character), or better, $^A, where ^ is a caret (CIRCUMFLEX ACCENT) and A stands for any of the characters listed at the end of OPERATOR DIFFERENCES in perlebcdic.
Lists of lexical variable declarations (my($x, $y)) are now optimised down to a single op and are hence faster than before.
A new C preprocessor define NO_TAINT_SUPPORT
was added that, if set,
disables Perl's taint support altogether. Using the -T or -t command
line flags will cause a fatal error. Beware that both core tests as
well as many a CPAN distribution's tests will fail with this change. On
the upside, it provides a small performance benefit due to reduced
branching.
Do not enable this unless you know exactly what you are getting yourself into.
pack with constant arguments is now constant folded in most cases
[perl #113470].
Speed up in regular expression matching against Unicode properties. The
largest gain is for \X
, the Unicode "extended grapheme cluster." The
gain for it is about 35% - 40%. Bracketed character classes, e.g.,
[0-9\x{100}]
containing code points above 255 are also now faster.
On platforms supporting it, several former macros are now implemented as static inline functions. This should speed things up slightly on non-GCC platforms.
The optimisation of hashes in boolean context has been extended to affect scalar(%hash), %hash ? ... : ..., and sub { %hash || ... }.
Filetest operators manage the stack in a fractionally more efficient manner.
Globs used in a numeric context are now numified directly in most cases, rather than being numified via stringification.
The x
repetition operator is now folded to a single constant at compile
time if called in scalar context with constant operands and no parentheses
around the left operand.
Config::Perl::V version 0.16 has been added as a dual-lifed module.
It provides structured data retrieval of perl -V
output including
information only known to the perl
binary and not available via Config.
For a complete list of updates, run:
- $ corelist --diff 5.16.0 5.18.0
You can substitute your favorite version in place of 5.16.0, too.
Archive::Extract has been upgraded to 0.68.
Work around an edge case on Linux with Busybox's unzip.
Archive::Tar has been upgraded to 1.90.
ptar now supports the -T option as well as dashless options [rt.cpan.org #75473], [rt.cpan.org #75475].
Auto-encode filenames marked as UTF-8 [rt.cpan.org #75474].
Don't use tell on IO::Zlib handles [rt.cpan.org #64339].
Don't try to chown on symlinks.
autodie has been upgraded to 2.13.
autodie
now plays nicely with the 'open' pragma.
B has been upgraded to 1.42.
The stashoff
method of COPs has been added. This provides access to an
internal field added in perl 5.16 under threaded builds [perl #113034].
B::COP::stashpv
now supports UTF-8 package names and embedded NULs.
All CVf_*
and GVf_*
and more SV-related flag values are now provided as constants in the B::
namespace and available for export. The default export list has not changed.
This makes the module work with the new pad API.
B::Concise has been upgraded to 0.95.
The -nobanner
option has been fixed, and formats can now be dumped.
When passed a sub name to dump, it will check also to see whether it
is the name of a format. If a sub and a format share the same name,
it will dump both.
This adds support for the new OPpMAYBE_TRUEBOOL and OPpTRUEBOOL flags.
B::Debug has been upgraded to 1.18.
This adds support (experimentally) for B::PADLIST
, which was
added in Perl 5.17.4.
B::Deparse has been upgraded to 1.20.
Avoid warning when run under perl -w.
It now deparses
loop controls with the correct precedence, and multiple statements in a
format line are also now deparsed correctly.
This release suppresses trailing semicolons in formats.
This release adds stub deparsing for lexical subroutines.
It no longer dies when deparsing sort without arguments. It now correctly omits the comma for system $prog @args and exec $prog @args.
bignum, bigint and bigrat have been upgraded to 0.33.
The overrides for hex and oct have been rewritten, eliminating
several problems, and making one incompatible change:
Formerly, whichever of use bigint
or use bigrat
was compiled later
would take precedence over the other, causing hex and oct not to
respect the other pragma when in scope.
Using any of these three pragmata would cause hex and oct anywhere
else in the program to evaluate their arguments in list context and prevent
them from inferring $_ when called without arguments.
Using any of these three pragmata would make oct("1234") return 1234
(for any number not beginning with 0) anywhere in the program. Now "1234"
is translated from octal to decimal, whether within the pragma's scope or
not.
The global overrides that facilitate lexical use of hex and oct now respect any existing overrides that were in place before the new overrides were installed, falling back to them outside of the scope of use bignum.
use bignum "hex", use bignum "oct" and similar invocations for bigint and bigrat now export a hex or oct function, instead of providing a global override.
Carp has been upgraded to 1.29.
Carp is no longer confused when caller returns undef for a package that
has been deleted.
The longmess()
and shortmess()
functions are now documented.
CGI has been upgraded to 3.63.
Unrecognized HTML escape sequences are now handled better, problematic
trailing newlines are no longer inserted after <form> tags
by startform()
or start_form()
, and bogus "Insecure Dependency"
warnings appearing with some versions of perl are now worked around.
Class::Struct has been upgraded to 0.64.
The constructor now respects overridden accessor methods [perl #29230].
Compress::Raw::Bzip2 has been upgraded to 2.060.
The misuse of Perl's "magic" API has been fixed.
Compress::Raw::Zlib has been upgraded to 2.060.
Upgrade bundled zlib to version 1.2.7.
Fix build failures on Irix, Solaris, and Win32, and also when building as C++ [rt.cpan.org #69985], [rt.cpan.org #77030], [rt.cpan.org #75222].
The misuse of Perl's "magic" API has been fixed.
compress(), uncompress(), memGzip() and memGunzip() have been speeded up by making parameter validation more efficient.
CPAN::Meta::Requirements has been upgraded to 2.122.
Treat undef requirements to from_string_hash
as 0 (with a warning).
Added requirements_for_module
method.
CPANPLUS has been upgraded to 0.9135.
Allow adding blib/script to PATH.
Save the history between invocations of the shell.
Handle multiple makemakerargs
and makeflags
arguments better.
This resolves issues with the SQLite source engine.
Data::Dumper has been upgraded to 2.145.
It has been optimized to only build a seen-scalar hash as necessary, thereby speeding up serialization drastically.
Additional tests were added in order to improve statement, branch, condition and subroutine coverage. On the basis of the coverage analysis, some of the internals of Dumper.pm were refactored. Almost all methods are now documented.
DB_File has been upgraded to 1.827.
The main Perl module no longer uses the "@_" construct.
Devel::Peek has been upgraded to 1.11.
This fixes compilation with C++ compilers and makes the module work with the new pad API.
Digest::MD5 has been upgraded to 2.52.
Fix Digest::Perl::MD5
OO fallback [rt.cpan.org #66634].
Digest::SHA has been upgraded to 5.84.
This fixes a double-free bug, which might have caused vulnerabilities in some cases.
DynaLoader has been upgraded to 1.18.
This is due to a minor code change in the XS for the VMS implementation.
This fixes warnings about using CODE
sections without an OUTPUT
section.
Encode has been upgraded to 2.49.
The Mac alias x-mac-ce has been added, and various bugs have been fixed in Encode::Unicode, Encode::UTF7 and Encode::GSM0338.
Env has been upgraded to 1.04.
Its SPLICE implementation no longer misbehaves in list context.
ExtUtils::CBuilder has been upgraded to 0.280210.
Manifest files are now correctly embedded for those versions of VC++ which make use of them. [perl #111782, #111798].
A list of symbols to export can now be passed to link() when on
Windows, as on other OSes [perl #115100].
ExtUtils::ParseXS has been upgraded to 3.18.
The generated C code now avoids unnecessarily incrementing
PL_amagic_generation
on Perl versions where it's done automatically
(or on current Perl where the variable no longer exists).
This avoids a bogus warning for initialised XSUB non-parameters [perl #112776].
File::Copy has been upgraded to 2.26.
copy()
no longer zeros files when copying into the same directory,
and also now fails (as it has long been documented to do) when attempting
to copy a file over itself.
File::DosGlob has been upgraded to 1.10.
The internal cache of file names that it keeps for each caller is now
freed when that caller is freed. This means
use File::DosGlob 'glob'; eval 'scalar <*>'
no longer leaks memory.
File::Fetch has been upgraded to 0.38.
Added the 'file_default' option for URLs that do not have a file component.
Use File::HomeDir
when available, and provide PERL5_CPANPLUS_HOME
to
override the autodetection.
Always re-fetch CHECKSUMS if fetchdir
is set.
File::Find has been upgraded to 1.23.
This fixes inconsistent unixy path handling on VMS.
Individual files may now appear in list of directories to be searched [perl #59750].
File::Glob has been upgraded to 1.20.
File::Glob has had exactly the same fix as File::DosGlob. Since it is
what Perl's own glob operator itself uses (except on VMS), this means
eval 'scalar <*>'
no longer leaks.
A space-separated list of patterns that returns a long list of results no longer causes memory corruption or crashes. This bug was introduced in Perl 5.16.0 [perl #114984].
File::Spec::Unix has been upgraded to 3.40.
abs2rel
could produce incorrect results when given two relative paths or
the root directory twice [perl #111510].
File::stat has been upgraded to 1.07.
File::stat
ignores the filetest pragma, and warns when used in
combination therewith. But it was not warning for -r
. This has been
fixed [perl #111640].
-p
now works, and does not return false for pipes [perl #111638].
Previously File::stat
's overloaded -x
and -X operators did not give
the correct results for directories or executable files when running as
root. They had been treating executable permissions for root just like for
any other user, performing group membership tests etc for files not owned
by root. They now follow the correct Unix behaviour - for a directory they
are always true, and for a file if any of the three execute permission bits
are set then they report that root can execute the file. Perl's builtin
-x
and -X operators have always been correct.
File::Temp has been upgraded to 0.23.
Fixes various bugs involving directory removal. Defers unlinking tempfiles if the initial unlink fails, which fixes problems on NFS.
GDBM_File has been upgraded to 1.15.
The undocumented optional fifth parameter to TIEHASH
has been
removed. This was intended to provide control of the callback used by
gdbm*
functions in case of fatal errors (such as filesystem problems),
but did not work (and could never have worked). No code on CPAN even
attempted to use it. The callback is now always the previous default, croak. Problems on some platforms with how the C croak function is called have also been resolved.
Hash::Util has been upgraded to 0.15.
hash_unlocked and hashref_unlocked now return true if the hash is unlocked, instead of always returning false [perl #112126].
hash_unlocked, hashref_unlocked, lock_hash_recurse and unlock_hash_recurse are now exportable [perl #112126].
Two new functions, hash_locked and hashref_locked, have been added. Oddly enough, these two functions were already exported, even though they did not exist [perl #112126].
HTTP::Tiny has been upgraded to 0.025.
Add SSL verification features [github #6], [github #9].
Include the final URL in the response hashref.
Add local_address
option.
This improves SSL support.
IO has been upgraded to 1.28.
sync()
can now be called on read-only file handles [perl #64772].
IO::Socket tries harder to cache or otherwise fetch socket information.
IPC::Cmd has been upgraded to 0.80.
Use POSIX::_exit
instead of exit in run_forked
[rt.cpan.org #76901].
IPC::Open3 has been upgraded to 1.13.
The open3()
function no longer uses POSIX::close()
to close file
descriptors since that breaks the ref-counting of file descriptors done by
PerlIO in cases where the file descriptors are shared by PerlIO streams,
leading to attempts to close the file descriptors a second time when
any such PerlIO streams are closed later on.
Locale::Codes has been upgraded to 3.25.
It includes some new codes.
Memoize has been upgraded to 1.03.
Fix the MERGE
cache option.
Module::Build has been upgraded to 0.4003.
Fixed bug where modules without $VERSION
might have a version of '0' listed
in 'provides' metadata, which will be rejected by PAUSE.
Fixed bug in PodParser to allow numerals in module names.
Fixed bug where giving arguments twice led to them becoming arrays, resulting in install paths like ARRAY(0xdeadbeef)/lib/Foo.pm.
A minor bug fix allows markup to be used around the leading "Name" in a POD "abstract" line, and some documentation improvements have been made.
Module::CoreList has been upgraded to 2.90.
Version information is now stored as a delta, which greatly reduces the size of the CoreList.pm file.
This restores compatibility with older versions of perl and cleans up the corelist data for various modules.
Module::Load::Conditional has been upgraded to 0.54.
Fix use of requires
on perls installed to a path with spaces.
Various enhancements include the new use of Module::Metadata.
Module::Metadata has been upgraded to 1.000011.
The creation of a Module::Metadata object for a typical module file has been sped up by about 40%, and some spurious warnings about $VERSIONs have been suppressed.
Module::Pluggable has been upgraded to 4.7.
Amongst other changes, triggers are now allowed on events, which gives a powerful way to modify behaviour.
Net::Ping has been upgraded to 2.41.
This fixes some test failures on Windows.
Opcode has been upgraded to 1.25.
Reflect the removal of the boolkeys opcode and the addition of the clonecv, introcv and padcv opcodes.
overload has been upgraded to 1.22.
no overload now warns for invalid arguments, just like use overload.
PerlIO::encoding has been upgraded to 0.16.
This is the module implementing the ":encoding(...)" I/O layer. It no longer corrupts memory or crashes when the encoding back-end reallocates the buffer or gives it a typeglob or shared hash key scalar.
PerlIO::scalar has been upgraded to 0.16.
The buffer scalar supplied may now only contain code points 0xFF or lower [perl #109828].
Perl::OSType has been upgraded to 1.003.
This fixes a bug detecting the VOS operating system.
Pod::Html has been upgraded to 1.18.
The option --libpods
has been reinstated. It is deprecated, and its use
does nothing other than issue a warning that it is no longer supported.
Since the HTML files generated by pod2html claim to have a UTF-8 charset, actually write the files out using UTF-8 [perl #111446].
Pod::Simple has been upgraded to 3.28.
Numerous improvements have been made, mostly to Pod::Simple::XHTML,
which also has a compatibility change: the codes_in_verbatim
option
is now disabled by default. See cpan/Pod-Simple/ChangeLog for the
full details.
re has been upgraded to 0.23.
Single character [class]es like /[s]/
or /[s]/i
are now optimized
as if they did not have the brackets, i.e. /s/
or /s/i
.
See note about op_comp
in the Internal Changes section below.
Safe has been upgraded to 2.35.
Fix interactions with Devel::Cover.
Don't eval code under no strict.
Scalar::Util has been upgraded to version 1.27.
Fix an overloading issue with sum.
first and reduce now check the callback first (so &first(1) is disallowed).
Fix tainted
on magical values [rt.cpan.org #55763].
Fix sum
on previously magical values [rt.cpan.org #61118].
Fix reading past the end of a fixed buffer [rt.cpan.org #72700].
Search::Dict has been upgraded to 1.07.
No longer require stat on filehandles.
Use fc for casefolding.
Socket has been upgraded to 2.009.
Constants and functions required for IP multicast source group membership have been added.
unpack_sockaddr_in()
and unpack_sockaddr_in6()
now return just the IP
address in scalar context, and inet_ntop()
now guards against incorrect
length scalars being passed in.
This fixes an uninitialized memory read.
Storable has been upgraded to 2.41.
Modifying $_[0]
within STORABLE_freeze
no longer results in crashes
[perl #112358].
An object whose class implements STORABLE_attach
is now thawed only once
when there are multiple references to it in the structure being thawed
[perl #111918].
Restricted hashes were not always thawed correctly [perl #73972].
Storable would croak when freezing a blessed REF object with a
STORABLE_freeze()
method [perl #113880].
It can now freeze and thaw vstrings correctly. This causes a slight incompatible change in the storage format, so the format version has increased to 2.9.
This contains various bugfixes, including compatibility fixes for older versions of Perl and vstring handling.
Sys::Syslog has been upgraded to 0.32.
This contains several bug fixes relating to getservbyname(),
setlogsock()
and log levels in syslog()
, together with fixes for
Windows, Haiku-OS and GNU/kFreeBSD. See cpan/Sys-Syslog/Changes
for the full details.
Term::ANSIColor has been upgraded to 4.02.
Add support for italics.
Improve error handling.
Term::ReadLine has been upgraded to 1.10. This fixes the use of the cpan and cpanp shells on Windows in the event that the current drive happens to contain a \dev\tty file.
Test::Harness has been upgraded to 3.26.
Fix glob semantics on Win32 [rt.cpan.org #49732].
Don't use Win32::GetShortPathName
when calling perl [rt.cpan.org #47890].
Ignore -T when reading shebang [rt.cpan.org #64404].
Handle the case where we don't know the wait status of the test more gracefully.
Make the test summary 'ok' line overridable so that it can be changed to a plugin to make the output of prove idempotent.
Don't run world-writable files.
Text::Tabs and Text::Wrap have been upgraded to 2012.0818. Support for Unicode combining characters has been added to them both.
threads::shared has been upgraded to 1.31.
This adds the option to warn about or ignore attempts to clone structures that can't be cloned, as opposed to just unconditionally dying in that case.
This adds support for dual-valued values as created by Scalar::Util::dualvar.
Tie::StdHandle has been upgraded to 4.3.
READ
now respects the offset argument to read [perl #112826].
Time::Local has been upgraded to 1.2300.
Seconds values greater than 59 but less than 60 no longer cause
timegm()
and timelocal()
to croak.
Unicode::UCD has been upgraded to 0.53.
This adds a function all_casefolds() that returns all the casefolds.
Win32 has been upgraded to 0.47.
New APIs have been added for getting and setting the current code page.
Version::Requirements has been removed from the core distribution. It is available under a different name: CPAN::Meta::Requirements.
perlcheat has been reorganized, and a few new sections were added.
Now explicitly documents the behaviour of hash initializer lists that contain duplicate keys.
The explanation of symbolic references being prevented by "strict refs" now doesn't assume that the reader knows what symbolic references are.
perlfaq has been synchronized with version 5.0150040 from CPAN.
Loop control verbs (dump, goto, next, last and redo) have always
had the same precedence as assignment operators, but this was not documented
until now.
The following additions or changes have been made to diagnostic output, including warnings and fatal error messages. For the complete list of diagnostic messages, see perldiag.
Unterminated delimiter for here document
This message now occurs when a here document label has an initial quotation mark but the final quotation mark is missing.
This replaces a bogus and misleading error message about not finding the label itself [perl #114104].
panic: child pseudo-process was never scheduled
This error is thrown when a child pseudo-process in the ithreads implementation on Windows was not scheduled within the time period allowed and therefore was not able to initialize properly [perl #88840].
Group name must start with a non-digit word character in regex; marked by <-- HERE in m/%s/
This error has been added for (?&0)
, which is invalid. It used to
produce an incomprehensible error message [perl #101666].
Can't use an undefined value as a subroutine reference
Calling an undefined value as a subroutine now produces this error message. It used to, but was accidentally disabled, first in Perl 5.004 for non-magical variables, and then in Perl v5.14 for magical (e.g., tied) variables. It has now been restored. In the mean time, undef was treated as an empty string [perl #113576].
Experimental %s subs not enabled
To use lexical subs, you must first enable them:
'Strings with code points over 0xFF may not be mapped into in-memory file handles'
'Trailing white-space in a charnames alias definition is deprecated'
'A sequence of multiple spaces in a charnames alias definition is deprecated'
Subroutine &%s is not available
(W closure) During compilation, an inner named subroutine or eval is attempting to capture an outer lexical subroutine that is not currently available. This can happen for one of two reasons. First, the lexical subroutine may be declared in an outer anonymous subroutine that has not yet been created. (Remember that named subs are created at compile time, while anonymous subs are created at run-time.) For example,
At the time that f is created, it can't capture the current "a" sub, since the anonymous subroutine hasn't been created yet. Conversely, the following won't give a warning since the anonymous subroutine has by now been created and is live:
The second situation is caused by an eval accessing a variable that has gone out of scope, for example,
Here, when the '\&a' in the eval is being compiled, f() is not currently being executed, so its &a is not available for capture.
%s subroutine &%s masks earlier declaration in same %s
(W misc) A "my" or "state" subroutine has been redeclared in the current scope or statement, effectively eliminating all access to the previous instance. This is almost always a typographical error. Note that the earlier subroutine will still exist until the end of the scope or until all closure references to it are destroyed.
The %s feature is experimental
(S experimental) This warning is emitted if you enable an experimental
feature via use feature
. Simply suppress the warning if you want
to use the feature, but know that in doing so you are taking the risk
of using an experimental feature which may change or be removed in a
future Perl version:
(W overflow) You called sleep with a number that was larger than it can
reliably handle and sleep probably slept for less time than requested.
Attempts to put wide characters into environment variables via %ENV
now
provoke this warning.
"Invalid negative number (%s) in chr"
chr() now warns when passed a negative value [perl #83048].
srand() now warns when passed a value that doesn't fit in a UV
(since the
value will be truncated rather than overflowing) [perl #40605].
"-i used with no filenames on the command line, reading from STDIN"
Running perl with the -i
flag now warns if no input files are provided on
the command line [perl #113410].
The warning that use of $*
and $#
is no longer supported is now
generated for every location that references them. Previously it would fail
to be generated if another variable using the same typeglob was seen first
(e.g. @*
before $*
), and would not be generated for the second and
subsequent uses. (It's hard to fix the failure to generate warnings at all
without also generating them every time, and warning every time is
consistent with the warnings that $[
used to generate.)
The warnings for \b{ and \B{ were added. They are deprecation warnings, which can be turned off by that category alone; one should not have to turn off regular regexp warnings as well to get rid of them.
Constant(%s): Call to &{$^H{%s}} did not return a defined value
Constant overloading that returns undef results in this error message.
For numeric constants, it used to say "Constant(undef)". "undef" has been
replaced with the number itself.
The error produced when a module cannot be loaded now includes a hint that the module may need to be installed: "Can't locate hopping.pm in @INC (you may need to install the hopping module) (@INC contains: ...)"
vector argument not supported with alpha versions
This warning was not suppressible, even with no warnings. Now it is suppressible, and has been moved from the "internal" category to the "printf" category.
Can't do {n,m} with n > m in regex; marked by <-- HERE in m/%s/
This fatal error has been turned into a warning that reads:
Quantifier {n,m} with n > m can't match in regex
(W regexp) Minima should be less than or equal to maxima. If you really want your regexp to match something 0 times, just put {0}.
The "Runaway prototype" warning that occurs in bizarre cases has been removed as being unhelpful and inconsistent.
The "Not a format reference" error has been removed, as the only case in which it could be triggered was a bug.
The "Unable to create sub named %s" error has been removed for the same reason.
The 'Can't use "my %s" in sort comparison' error has been downgraded to a warning, '"my %s" used in sort comparison' (with 'state' instead of 'my' for state variables). In addition, the heuristics for guessing whether lexical $a or $b has been misused have been improved to generate fewer false positives. Lexical $a and $b are no longer disallowed if they are outside the sort block. Also, a named unary or list operator inside the sort block no longer causes the $a or $b to be ignored [perl #86136].
h2xs no longer produces invalid code for empty defines. [perl #20636]
Added the useversionedarchname option to Configure.
When set, it includes 'api_versionstring' in 'archname', e.g. x86_64-linux-5.13.6-thread-multi. It is unset by default.
This feature was requested by Tim Bunce, who observed that INSTALL_BASE creates a library structure that does not differentiate by perl version. Instead, it places architecture-specific files in "$install_base/lib/perl5/$archname". This makes it difficult to use a common INSTALL_BASE library path with multiple versions of perl.
By setting -Duseversionedarchname, the $archname will be distinct for architecture and API version, allowing mixed use of INSTALL_BASE.
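A sketch of a Configure invocation using this option (the prefix path is illustrative, and -des just accepts the remaining defaults):

```shell
# build a perl whose $archname embeds the API version,
# so several perl versions can share one INSTALL_BASE tree
sh Configure -des -Dprefix=/opt/perl -Duseversionedarchname
```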
Added a PERL_NO_INLINE_FUNCTIONS option. If PERL_NO_INLINE_FUNCTIONS is defined, "inline.h" is not included.
This permits test code to include the perl headers for definitions without creating a link dependency on the perl library (which may not exist yet).
Configure will honour the external MAILDOMAIN environment variable, if set.
installman no longer ignores the silent option.
Both META.yml and META.json files are now included in the distribution.
Configure will now correctly detect isblank() when compiling with a C++ compiler.
The pager detection in Configure has been improved to allow responses which specify options after the program name, e.g. /usr/bin/less -R, if the user accepts the default value. This helps perldoc when handling ANSI escapes [perl #72156].
The test suite now has a section for tests that require very large amounts of memory. These tests won't run by default; they can be enabled by setting the PERL_TEST_MEMORY environment variable to the number of gibibytes of memory that may be safely used.
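For example, to allow such tests to use up to 4 GiB (the harness paths are illustrative; run from the top of a built perl source tree):

```shell
# opt in to the memory-hungry tests with a 4 GiB budget
PERL_TEST_MEMORY=4 ./perl -Ilib t/harness t/bigmem/*.t
```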
Support for BeOS has been removed. BeOS was an operating system for personal computers developed by Be Inc., initially for their BeBox hardware. The OS Haiku was written as an open-source replacement for and continuation of BeOS, and its perl port is current and actively maintained.
Support code relating to UTS global has been removed. UTS was a mainframe version of System V created by Amdahl, subsequently sold to UTS Global. The port has not been touched since before Perl v5.8.0, and UTS Global is now defunct.
Support for VM/ESA has been removed. The port was tested on 2.3.0, which IBM ended service on in March 2002. 2.4.0 ended service in June 2003, and was superseded by z/VM. The current version of z/VM is V6.2.0, scheduled for end of service on 2015/04/30.
Support for MPE/IX has been removed.
Support code relating to EPOC has been removed. EPOC was a family of operating systems developed by Psion for mobile devices. It was the predecessor of Symbian. The port was last updated in April 2002.
Support for Rhapsody has been removed.
Configure now always adds -qlanglvl=extc99 to the CC flags on AIX when using xlC. This will make it easier to compile a number of XS-based modules that assume C99 [perl #113778].
There is now a workaround for a compiler bug that prevented compiling with clang++ since Perl v5.15.7 [perl #112786].
When compiling the Perl core as C++ (which is only semi-supported), the mathom functions are now compiled as extern "C", to ensure proper binary compatibility. (However, binary compatibility isn't generally guaranteed anyway in the situations where this would matter.)
Alignment on 8-byte boundaries is no longer hardcoded, fixing builds that use -Dusemorebits.
Perl should now work out of the box on Haiku R1 Alpha 4.
libc_r was removed from recent versions of MidnightBSD, and older versions work better with pthread. Threading is now enabled using pthread, which corrects build errors with threading enabled on 0.4-CURRENT.
In Configure, avoid running sed commands with flags not supported on Solaris.
Where possible, the case of filenames and command-line arguments is now preserved by enabling the CRTL features DECC$EFS_CASE_PRESERVE and DECC$ARGV_PARSE_STYLE at start-up time. The latter only takes effect when extended parse is enabled in the process from which Perl is run.
The character set for Extended Filename Syntax (EFS) is now enabled by default on VMS. Among other things, this provides better handling of dots in directory names, multiple dots in filenames, and spaces in filenames. To obtain the old behavior, set the logical name DECC$EFS_CHARSET to DISABLE.
Fixed linking on builds configured with -Dusemymalloc=y.
Experimental support for building Perl with the HP C++ compiler is available by configuring with -Dusecxx.
All C header files from the top-level directory of the distribution are now installed on VMS, providing consistency with a long-standing practice on other platforms. Previously only a subset were installed, which broke non-core extension builds for extensions that depended on the missing include files.
Quotes are now removed from the command verb (but not the parameters) for
commands spawned via system, backticks, or a piped open. Previously,
quotes on the verb were passed through to DCL, which would fail to recognize
the command. Also, if the verb is actually a path to an image or command
procedure on an ODS-5 volume, quoting it now allows the path to contain spaces.
The a2p build has been fixed for the HP C++ compiler on OpenVMS.
Perl can now be built using Microsoft's Visual C++ 2012 compiler by specifying CCTYPE=MSVC110 (or MSVC110FREE if you are using the free Express edition for Windows Desktop) in win32/Makefile.
The option to build without USE_SOCKETS_AS_HANDLES has been removed.
Fixed a problem where perl could crash while cleaning up threads (including the main thread) in threaded debugging builds on Win32 and possibly other platforms [perl #114496].
A rare race condition that would lead to sleep taking more time than requested, and possibly even hanging, has been fixed [perl #33096].
link on Win32 now attempts to set $! to more appropriate values based on the Win32 API error code. [perl #112272]
Perl no longer mangles the environment block, e.g. when launching a new sub-process, when the environment contains non-ASCII characters. Known problems still remain, however, when the environment contains characters outside of the current ANSI codepage (e.g. see the item about Unicode in %ENV in http://perl5.git.perl.org/perl.git/blob/HEAD:/Porting/todo.pod). [perl #113536]
Building perl with some Windows compilers used to fail due to a problem with miniperl's glob operator (which uses the perlglob program) deleting the PATH environment variable [perl #113798].
A new makefile option, USE_64_BIT_INT, has been added to the Windows makefiles. Set this to "define" when building a 32-bit perl if you want it to use 64-bit integers.
Machine code size reductions, already made to the DLLs of XS modules in Perl v5.17.2, have now been extended to the perl DLL itself.
Building with VC++ 6.0 was inadvertently broken in Perl v5.17.2 but has now been fixed again.
Building on WinCE is now possible once again, although more work is required to fully restore a clean build.
Synonyms for the misleadingly named av_len() have been created: av_top_index() and av_tindex(). All three return the number of the highest index in the array, not the number of elements it contains.
SvUPGRADE() is no longer an expression. Originally this macro (and its underlying function, sv_upgrade()) were documented as boolean, although in reality they always croaked on error and never returned false. In 2005 the documentation was updated to specify a void return value, but SvUPGRADE() was left always returning 1 for backwards compatibility. This has now been removed, and SvUPGRADE() is now a statement with no return value.
So this is now a syntax error:
- if (!SvUPGRADE(sv)) { croak(...); }
If you have code like that, simply replace it with
- SvUPGRADE(sv);
or to avoid compiler warnings with older perls, possibly
- (void)SvUPGRADE(sv);
Perl has a new copy-on-write mechanism that allows any SvPOK scalar to be upgraded to a copy-on-write scalar. A reference count on the string buffer is stored in the string buffer itself. This feature is not enabled by default.
It can be enabled in a perl build by running Configure with -Accflags=-DPERL_NEW_COPY_ON_WRITE, and we would encourage XS authors to try their code with such an enabled perl, and provide feedback. Unfortunately, there is not yet a good guide to updating XS code to cope with COW. Until such a document is available, consult the perl5-porters mailing list.
It breaks a few XS modules by allowing copy-on-write scalars to go through code paths that never encountered them before.
Copy-on-write no longer uses the SvFAKE and SvREADONLY flags. Hence, SvREADONLY indicates a true read-only SV.
Use the SvIsCOW macro (as before) to identify a copy-on-write scalar.
PL_glob_index is gone.
The private Perl_croak_no_modify has had its context parameter removed. It now has a void prototype. Users of the public API croak_no_modify remain unaffected.
Copy-on-write (shared hash key) scalars are no longer marked read-only. SvREADONLY returns false on such an SV, but SvIsCOW still returns true.
A new op type, OP_PADRANGE, has been introduced. The perl peephole optimiser will, where possible, substitute a single padrange op for a pushmark followed by one or more pad ops, and possibly also skipping list and nextstate ops. In addition, the op can carry out the tasks associated with the RHS of a my(...) = @_ assignment, so those ops may be optimised away too.
Case-insensitive matching inside a [bracketed] character class with a multi-character fold no longer excludes one of the possibilities in the circumstances that it used to. [perl #89774].
PL_formfeed has been removed.
The regular expression engine no longer reads one byte past the end of the target string. While for all internally well-formed scalars this should never have been a problem, this change facilitates clever tricks with string buffers in CPAN modules. [perl #73542]
Inside a BEGIN block, PL_compcv now points to the currently-compiling subroutine, rather than the BEGIN block itself.
mg_length has been deprecated.
sv_len now always returns a byte count and sv_len_utf8 a character count. Previously, sv_len and sv_len_utf8 were both buggy and would sometimes return bytes and sometimes characters. sv_len_utf8 no longer assumes that its argument is in UTF-8. Neither of these creates UTF-8 caches for tied or overloaded values or for non-PVs any more.
sv_mortalcopy now copies string buffers of shared hash key scalars when called from XS modules [perl #79824].
RXf_SPLIT and RXf_SKIPWHITE are no longer used. They are now #defined as 0.
The new RXf_MODIFIES_VARS flag can be set by custom regular expression engines to indicate that the execution of the regular expression may cause variables to be modified. This lets s/// know to skip certain optimisations. Perl's own regular expression engine sets this flag for the special backtracking verbs that set $REGMARK and $REGERROR.
The APIs for accessing lexical pads have changed considerably.
PADLISTs are no longer AVs, but their own type instead. PADLISTs now contain a PAD and a PADNAMELIST of PADNAMEs, rather than AVs for the pad and the list of pad names. PADs, PADNAMELISTs, and PADNAMEs are to be accessed as such through the newly added pad API instead of the plain AV and SV APIs. See perlapi for details.
In the regex API, the numbered capture callbacks are passed an index indicating what match variable is being accessed. There are special index values for the $`, $&, $' variables. Previously the same three values were used to retrieve ${^PREMATCH}, ${^MATCH}, ${^POSTMATCH} too, but these have now been assigned three separate values. See Numbered capture callbacks in perlreapi.
PL_sawampersand was previously a boolean indicating that any of $`, $&, $' had been seen; it now contains three one-bit flags indicating the presence of each of the variables individually.
The CV * typemap entry now supports &{} overloading and typeglobs, just like &{...} [perl #96872].
The SVf_AMAGIC flag to indicate overloading is now on the stash, not the object. It is now set automatically whenever a method or @ISA changes, so its meaning has changed, too. It now means "potentially overloaded". When the overload table is calculated, the flag is automatically turned off if there is no overloading, so there should be no noticeable slowdown.
The staleness of the overload tables is now checked when overload methods are invoked, rather than during bless.
"A" magic is gone. The changes to the handling of the SVf_AMAGIC flag eliminate the need for it.
PL_amagic_generation has been removed as no longer necessary. For XS modules, it is now a macro alias to PL_na.
The fallback overload setting is now stored in a stash entry separate from overloadedness itself.
The character-processing code has been cleaned up in places. The changes should be operationally invisible.
The study function was made a no-op in v5.16. It was simply disabled via
a return statement; the code was left in place. Now the code supporting
what study used to do has been removed.
Under threaded perls, there is no longer a separate PV allocated for every COP to store its package name (cop->stashpv). Instead, there is an offset (cop->stashoff) into the new PL_stashpad array, which holds stash pointers.
In the pluggable regex API, the regexp_engine struct has acquired a new field, op_comp, which is currently just for perl's internal use and should be initialized to NULL by other regex plugin modules.
A new function, alloccopstash, has been added to the API, but is considered experimental. See perlapi.
Perl used to implement get magic in a way that would sometimes hide bugs in code that could call mg_get() too many times on magical values. This hiding of errors no longer occurs, so long-standing bugs may become visible now. If you see magic-related errors in XS code, check to make sure it, together with the Perl API functions it uses, calls mg_get() only once on SvGMAGICAL() values.
OP allocation for CVs now uses a slab allocator. This simplifies memory management for OPs allocated to a CV, so cleaning up after a compilation error is simpler and safer [perl #111462][perl #112312].
PERL_DEBUG_READONLY_OPS has been rewritten to work with the new slab allocator, allowing it to catch more violations than before.
The old slab allocator for ops, which was only enabled for PERL_IMPLICIT_SYS and PERL_DEBUG_READONLY_OPS, has been retired.
Here document terminators no longer require a terminating newline character when they occur at the end of a file. This was already the case at the end of a string eval [perl #65838].
-DPERL_GLOBAL_STRUCT builds now free the global struct after they've finished using it.
A trailing '/' on a path in @INC will no longer have an additional '/' appended.
The :crlf layer now works when unread data doesn't fit into its own buffer [perl #112244].
ungetc() now handles UTF-8 encoded data [perl #116322].
A bug in the core typemap caused any C types that map to the T_BOOL core typemap entry not to be set, updated, or modified when the T_BOOL variable was used in an OUTPUT: section, with an exception for RETVAL. T_BOOL in an INPUT: section was not affected, nor was using a T_BOOL return type for an XSUB (RETVAL). A side effect of fixing this bug is that if a T_BOOL is specified in the OUTPUT: section (which previously did nothing to the SV) and a read-only SV (a literal) is passed to the XSUB, a croak such as "Modification of a read-only value attempted" will occur. [perl #115796]
On many platforms, providing a directory name as the script name caused perl to do nothing and report success. It should now universally report an error and exit nonzero. [perl #61362]
sort {undef} ... under fatal warnings no longer crashes. It had begun crashing in Perl v5.16.
Stashes blessed into each other (bless \%Foo::, 'Bar'; bless \%Bar::, 'Foo') no longer result in double frees. This bug started happening in Perl v5.16.
Numerous memory leaks have been fixed, mostly involving fatal warnings and syntax errors.
Some failed regular expression matches such as 'f' =~ /../g were not resetting pos. Also, "match-once" patterns (m?...?g) failed to reset it, too, when invoked a second time [perl #23180].
Several bugs involving local *ISA and local *Foo:: causing stale MRO caches have been fixed.
Defining a subroutine when its typeglob has been aliased no longer results in stale method caches. This bug was introduced in Perl v5.10.
Localising a typeglob containing a subroutine when the typeglob's package has been deleted from its parent stash no longer produces an error. This bug was introduced in Perl v5.14.
Under some circumstances, local *method=... would fail to reset method caches upon scope exit.
/[.foo.]/ is no longer an error, but produces a warning (as before) and is treated as /[.fo]/ [perl #115818].
goto $tied_var now calls FETCH before deciding what type of goto (subroutine or label) this is.
Renaming packages through glob assignment (*Foo:: = *Bar::; *Bar:: = *Baz::) in combination with m?...? and reset no longer makes threaded builds crash.
A number of bugs related to assigning a list to a hash have been fixed. Many of these involve lists with repeated keys like (1, 1, 1, 1).
The expression scalar(%h = (1, 1, 1, 1)) now returns 4, not 2.
The return value of %h = (1, 1, 1) in list context was wrong. Previously this would return (1, undef, 1); now it returns (1, undef).
Perl now issues the same warning on ($s, %h) = (1, {}) as it does for (%h) = ({}): "Reference found where even-sized list expected".
A number of additional edge cases in list assignment to hashes were corrected. For more details see commit 23b7025ebc.
Attributes applied to lexical variables no longer leak memory. [perl #114764]
dump, goto, last, next, redo or require followed by a bareword (or version) and then an infix operator is no longer a syntax error. It used to be for those infix operators (like +) that have a different meaning where a term is expected. [perl #105924]
require a::b . 1 and require a::b + 1 no longer produce erroneous ambiguity warnings. [perl #107002]
Class method calls are now allowed on any string, and not just strings beginning with an alphanumeric character. [perl #105922]
An empty pattern created with qr// used in m/// no longer triggers
the "empty pattern reuses last pattern" behaviour. [perl #96230]
Tying a hash during iteration no longer results in a memory leak.
Freeing a tied hash during iteration no longer results in a memory leak.
List assignment to a tied array or hash that dies on STORE no longer results in a memory leak.
If the hint hash (%^H) is tied, compile-time scope entry (which copies the hint hash) no longer leaks memory if FETCH dies. [perl #107000]
Constant folding no longer inappropriately triggers the special split " " behaviour. [perl #94490]
defined scalar(@array), defined do { &foo }, and similar constructs now treat the argument to defined as a simple scalar. [perl #97466]
Running a custom debugger that defines no *DB::DB glob or provides a subroutine stub for &DB::DB no longer results in a crash, but an error instead. [perl #114990]
reset "" now matches its documentation. reset only resets m?...? patterns when called with no argument. An empty string for an argument now does nothing. (It used to be treated as no argument.) [perl #97958]
printf with an argument returning an empty list no longer reads past the end of the stack, resulting in erratic behaviour. [perl #77094]
--subname no longer produces erroneous ambiguity warnings. [perl #77240]
v10 is now allowed as a label or package name. This was inadvertently broken when v-strings were added in Perl v5.6. [perl #56880]
length, pos, substr and sprintf could be confused by ties,
overloading, references and typeglobs if the stringification of such
changed the internal representation to or from UTF-8. [perl #114410]
utf8::encode now calls FETCH and STORE on tied variables. utf8::decode now calls STORE (it was already calling FETCH).
$tied =~ s/$non_utf8/$utf8/
no longer loops infinitely if the tied
variable returns a Latin-1 string, shared hash key scalar, or reference or
typeglob that stringifies as ASCII or Latin-1. This was a regression from
v5.12.
s/// without /e is now better at detecting when it needs to forego
certain optimisations, fixing some buggy cases:
Match variables in certain constructs (&&, ||, .. and others) in the replacement part; e.g., s/(.)/$l{$a||$1}/g. [perl #26986]
Aliases to match variables in the replacement.
$REGERROR or $REGMARK in the replacement. [perl #49190]
An empty pattern (s//$foo/) that causes the last-successful pattern to
be used, when that pattern contains code blocks that modify the variables
in the replacement.
The taintedness of the replacement string no longer affects the taintedness
of the return value of s///e.
The $| autoflush variable is created on the fly when needed. If this happened (e.g., if it was mentioned in a module or eval) when the currently selected filehandle was a typeglob with an empty IO slot, it used to crash. [perl #115206]
Line numbers at the end of a string eval are no longer off by one. [perl #114658]
@INC filters (subroutines returned by subroutines in @INC) that set $_ to a copy-on-write scalar no longer cause the parser to modify that string buffer in place.
length($object) no longer returns the undefined value if the object has
string overloading that returns undef. [perl #115260]
The use of PL_stashcache, the stash name lookup cache for method calls, has been restored. Commit da6b625f78f5f133 in August 2011 inadvertently broke the code that looks up values in PL_stashcache. As it's only a cache, everything quite correctly carried on working without it.
The error "Can't localize through a reference" had disappeared in v5.16.0 when local %$ref appeared on the last line of an lvalue subroutine. This error disappeared for \local %$ref in perl v5.8.1. It has now been restored.
The parsing of here-docs has been improved significantly, fixing several parsing bugs and crashes and one memory leak, and correcting wrong subsequent line numbers under certain conditions.
Inside an eval, the error message for an unterminated here-doc no longer has a newline in the middle of it [perl #70836].
A substitution inside a substitution pattern (s/${s|||}//) no longer
confuses the parser.
It may be an odd place to allow comments, but s//"" # hello/e has always worked, unless there happens to be a null character before the first #. Now it works even in the presence of nulls.
An invalid range in tr/// or y/// no longer results in a memory leak.
String eval no longer treats a semicolon-delimited quote-like operator at the very end (eval 'q;;') as a syntax error.
warn {$_ => 1} + 1 is no longer a syntax error. The parser used to get confused with certain list operators followed by an anonymous hash and then an infix operator that shares its form with a unary operator.
(caller $n)[6] (which gives the text of the eval) used to return the actual parser buffer. Modifying it could result in crashes. Now it always returns a copy. The string returned no longer has "\n;" tacked on to the end. The returned text also includes here-doc bodies, which used to be omitted.
The UTF-8 position cache is now reset when accessing magical variables, to avoid the string buffer and the UTF-8 position cache getting out of sync [perl #114410].
Various cases of get magic being called twice for magical UTF-8 strings have been fixed.
This code (when not in the presence of $& etc.)
- $_ = 'x' x 1_000_000;
- 1 while /(.)/;
used to skip the buffer copy for performance reasons, but suffered from $1 etc. changing if the original string changed. That has now been fixed.
Perl doesn't use PerlIO anymore to report out of memory messages, as PerlIO might attempt to allocate more memory.
In a regular expression, if something is quantified with {n,m} where n > m, it can't possibly match. Previously this was a fatal error, but now it is merely a warning (and that something won't match). [perl #82954]
It used to be possible for formats defined in subroutines that have subsequently been undefined and redefined to close over variables in the wrong pad (the newly-defined enclosing sub), resulting in crashes or "Bizarre copy" errors.
Redefinition of XSUBs at run time could produce warnings with the wrong line number.
The %vd sprintf format does not support version objects for alpha versions. It used to output the format itself (%vd) when passed an alpha version, and also emit an "Invalid conversion in printf" warning. It no longer does, but produces the empty string in the output. It also no longer leaks memory in this case.
$obj->SUPER::method calls in the main package could fail if the SUPER package had already been accessed by other means.
Stash aliasing (*foo:: = *bar::) no longer causes SUPER calls to ignore changes to methods or @ISA, or use the wrong package.
Method calls on packages whose names end in ::SUPER are no longer treated as SUPER method calls, resulting in failure to find the method. Furthermore, defining subroutines in such packages no longer causes them to be found by SUPER method calls on the containing package [perl #114924].
\w now matches the code points U+200C (ZERO WIDTH NON-JOINER) and U+200D (ZERO WIDTH JOINER). \W no longer matches these. This change is because Unicode corrected their definition of what \w should match.
dump LABEL no longer leaks its label.
Constant folding no longer changes the behaviour of functions like stat() and truncate() that can take either filenames or handles.
stat 1 ? foo : bar now treats its argument as a file name (since it is an arbitrary expression), rather than the handle "foo".
truncate FOO, $len no longer falls back to treating "FOO" as a file name if the filehandle has been deleted. This was broken in Perl v5.16.0.
Subroutine redefinitions after sub-to-glob and glob-to-glob assignments no longer cause double frees or panic messages.
s/// now turns vstrings into plain strings when performing a substitution,
even if the resulting string is the same (s/a/a/).
Prototype mismatch warnings no longer erroneously treat constant subs as having no prototype when they actually have "".
Constant subroutines and forward declarations no longer prevent prototype mismatch warnings from omitting the sub name.
undef on a subroutine now clears call checkers.
The ref operator started leaking memory on blessed objects in Perl v5.16.0.
This has been fixed [perl #114340].
use no longer tries to parse its arguments as a statement, making use constant { () }; a syntax error [perl #114222].
On debugging builds, "uninitialized" warnings inside formats no longer cause assertion failures.
On debugging builds, subroutines nested inside formats no longer cause assertion failures [perl #78550].
Formats and use statements are now permitted inside formats.
print $x and sub { print $x }->() now always produce the same output. It was possible for the latter to refuse to close over $x if the variable was not active; e.g., if it was defined outside a currently-running named subroutine.
Similarly, print $x and print eval '$x' now produce the same output. This also allows "my $x if 0" variables to be seen in the debugger [perl #114018].
Formats called recursively no longer stomp on their own lexical variables; instead, each recursive call has its own set of lexicals.
Attempting to free an active format or the handle associated with it no longer results in a crash.
Format parsing no longer gets confused by braces, semicolons and low-precedence operators. It used to be possible to use braces as format delimiters (instead of = and .), but only sometimes. Semicolons and low-precedence operators in format argument lines no longer confuse the parser into ignoring the line's return value. In format argument lines, braces can now be used for anonymous hashes, instead of always being treated as do blocks.
Formats can now be nested inside code blocks in regular expressions and other quoted constructs (/(?{...})/ and qq/${...}/) [perl #114040].
Formats are no longer created after compilation errors.
Under debugging builds, the -DA command line option started crashing in Perl v5.16.0. It has been fixed [perl #114368].
A potential deadlock scenario involving the premature termination of a pseudo- forked child in a Windows build with ithreads enabled has been fixed. This resolves the common problem of the t/op/fork.t test hanging on Windows [perl #88840].
The code which generates errors from require() could potentially read one or two bytes before the start of the filename for filenames less than three bytes long and ending in /\.p?\z/. This has now been fixed. Note that it could never have happened with module names given to use() or require() anyway.
The handling of pathnames of modules given to require() has been made
thread-safe on VMS.
Non-blocking sockets have been fixed on VMS.
Pod can now be nested in code inside a quoted construct outside of a string eval. This used to work only within string evals [perl #114040].
goto '' now looks for an empty label, producing the "goto must have label" error message, instead of exiting the program [perl #111794].
goto "\0" now dies with "Can't find label" instead of "goto must have label".
The C function hv_store used to result in crashes when used on %^H [perl #111000].
A call checker attached to a closure prototype via cv_set_call_checker is now copied to closures cloned from it, so cv_set_call_checker now works inside an attribute handler for a closure.
Writing to $^N used to have no effect. Now it croaks with "Modification of a read-only value" by default, but that can be overridden by a custom regular expression engine, as with $1 [perl #112184].
undef on a control character glob (undef *^H) no longer emits an erroneous warning about ambiguity [perl #112456].
For efficiency's sake, many operators and built-in functions return the same scalar each time. Lvalue subroutines and subroutines in the CORE:: namespace were allowing this implementation detail to leak through. print &CORE::uc("a"), &CORE::uc("b") used to print "BB". The same thing would happen with an lvalue subroutine returning the return value of uc. Now the value is copied in such cases.
method {} syntax with an empty block or a block returning an empty list used to crash or use some random value left on the stack as its invocant. Now it produces an error.
vec now works with extremely large offsets (>2 GB) [perl #111730].
Changes to overload settings now take effect immediately, as do changes to
inheritance that affect overloading. They used to take effect only after
bless.
Objects that were created before a class had any overloading used to remain
non-overloaded even if the class gained overloading through use overload
or @ISA changes, and even after bless. This has been fixed
[perl #112708].
Classes with overloading can now inherit fallback values.
Overloading was not respecting a fallback value of 0 if there were
overloaded objects on both sides of an assignment operator like +=
[perl #111856].
pos now croaks with hash and array arguments, instead of producing
erroneous warnings.
while(each %h) now implies while(defined($_ = each %h)), like readline and readdir.
Subs in the CORE:: namespace no longer crash after undef *_
when called
with no argument list (&CORE::time
with no parentheses).
unpack no longer produces the "'/' must follow a numeric type in unpack"
error when it is the data that are at fault [perl #60204].
join and "@array" now call FETCH only once on a tied $" [perl #8931].
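The once-only FETCH can be observed by tying $" to a counting class (the class name here is illustrative, not from the source). On v5.18+ a single interpolation should call FETCH exactly once, however many elements the array has.

```perl
use strict;

# A minimal tie class that counts FETCH calls and always returns ","
package OnceCounter;
sub TIESCALAR { my $n = 0; bless \$n, shift }
sub FETCH     { my $self = shift; $$self++; return "," }
sub STORE     { }

package main;
tie $", 'OnceCounter';
my @a = (1, 2, 3);
my $s = "@a";                 # interpolation uses $" between elements
my $count = ${ tied $" };     # tied() returns the object without FETCHing
print "joined: $s, FETCH calls: $count\n";
```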
Some subroutine calls generated by compiling core ops affected by a
CORE::GLOBAL
override had op checking performed twice. The checking
is always idempotent for pure Perl code, but the double checking can
matter when custom call checkers are involved.
A race condition used to exist around fork that could cause a signal sent to the parent to be handled by both parent and child. Signals are now blocked briefly around fork to prevent this from happening [perl #82580].
The implementation of code blocks in regular expressions, such as (?{}) and (??{}), has been heavily reworked to eliminate a whole slew of bugs.
The main user-visible changes are:
Code blocks within patterns are now parsed in the same pass as the surrounding code; in particular it is no longer necessary to have balanced braces: this now works:
- /(?{ $x='{' })/
This means that this error message is no longer generated:
- Sequence (?{...}) not terminated or not {}-balanced in regex
but a new error may be seen:
- Sequence (?{...}) not terminated with ')'
In addition, literal code blocks within run-time patterns are only compiled once, at perl compile-time:
Lexical variables are now sane as regards scope, recursion, and closure behavior. In particular, /A(?{B})C/ behaves (from a closure viewpoint) exactly like /A/ && do { B } && /C/, while qr/A(?{B})C/ is like sub { /A/ && do { B } && /C/ }. So this code now works how you might expect, creating three regexes that match 0, 1, and 2:
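The example itself was lost in extraction; a sketch of the idiom being described (variable names illustrative) would be: each compiled regex closes over its own copy of $i, so on v5.18+ $r[0] matches only "0", $r[1] only "1", and so on.

```perl
use strict;

my @r;
for my $i (0 .. 2) {
    # Literal code block, so no "use re 'eval'" is needed; each qr//
    # closes over the $i of its own loop iteration on v5.18+.
    push @r, qr/^(??{ $i })$/;
}
print "1" =~ $r[1] ? "match\n" : "no match\n";
```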
The use re 'eval' pragma is now required only for code blocks defined at runtime; in particular, in the following, the text of the $r pattern is still interpolated into the new pattern and recompiled, but the individual compiled code blocks within $r are reused rather than being recompiled, and use re 'eval' isn't needed any more:
- my $r = qr/abc(?{....})def/;
- /xyz$r/;
Flow control operators no longer crash. Each code block runs in a new
dynamic scope, so next etc. will not see
any enclosing loops. return returns a value
from the code block, not from any enclosing subroutine.
Perl normally caches the compilation of run-time patterns, and doesn't recompile if the pattern hasn't changed, but this is now disabled if required for the correct behavior of closures. For example:
The /msix and (?msix) etc. flags are now propagated into the return value from (??{}); this now works:
- "AB" =~ /a(??{'b'})/i;
Warnings and errors will appear to come from the surrounding code (or, for run-time code blocks, from an eval) rather than from an re_eval:
- use re 'eval'; $c = '(?{ warn "foo" })'; /$c/;
- /(?{ warn "foo" })/;
formerly gave:
- foo at (re_eval 1) line 1.
- foo at (re_eval 2) line 1.
and now gives:
- foo at (eval 1) line 1.
- foo at /some/prog line 2.
Perl now can be recompiled to use any Unicode version. In v5.16, it worked on Unicodes 6.0 and 6.1, but there were various bugs if earlier releases were used; the older the release the more problems.
vec no longer produces "uninitialized" warnings in lvalue context
[perl #9423].
An optimization involving fixed strings in regular expressions could cause a severe performance penalty in edge cases. This has been fixed [perl #76546].
In certain cases, empty subpatterns within a regular expression (such as (?:) or (?:|)) could disable some optimizations. This has been fixed.
The "Can't find an opnumber" message that prototype produces when passed
a string like "CORE::nonexistent_keyword" now passes UTF-8 and embedded
NULs through unchanged [perl #97478].
prototype now treats magical variables like $1
the same way as
non-magical variables when checking for the CORE:: prefix, instead of
treating them as subroutine names.
Under threaded perls, a runtime code block in a regular expression could
corrupt the package name stored in the op tree, resulting in bad reads
in caller, and possibly crashes [perl #113060].
Referencing a closure prototype (\&{$_[1]}
in an attribute handler for a
closure) no longer results in a copy of the subroutine (or assertion
failures on debugging builds).
eval '__PACKAGE__'
now returns the right answer on threaded builds if
the current package has been assigned over (as in
*ThisPackage:: = *ThatPackage::
) [perl #78742].
If a package is deleted by code that it calls, it is possible for caller
to see a stack frame belonging to that deleted package. caller could
crash if the stash's memory address was reused for a scalar and a
substitution was performed on the same scalar [perl #113486].
UNIVERSAL::can
no longer treats its first argument differently
depending on whether it is a string or number internally.
open with <&
for the mode checks to see whether the third argument is
a number, in determining whether to treat it as a file descriptor or a handle
name. Magical variables like $1
were always failing the numeric check and
being treated as handle names.
warn's handling of magical variables ($1
, ties) has undergone several
fixes. FETCH
is only called once now on a tied argument or a tied $@
[perl #97480]. Tied variables returning objects that stringify as "" are
no longer ignored. A tied $@
that happened to return a reference the
previous time it was used is no longer ignored.
warn "" now treats $@ with a number in it the same way, regardless of whether it happened via $@=3 or $@="3". It used to ignore the former. Now it appends "\t...caught", as it has always done with $@="3".
Numeric operators on magical variables (e.g., $1 + 1
) used to use
floating point operations even where integer operations were more appropriate,
resulting in loss of accuracy on 64-bit platforms [perl #109542].
Unary negation no longer treats a string as a number if the string happened
to be used as a number at some point. So, if $x
contains the string "dogs",
-$x
returns "-dogs" even if $y=0+$x
has happened at some point.
In Perl v5.14, -'-10'
was fixed to return "10", not "+10". But magical
variables ($1
, ties) were not fixed till now [perl #57706].
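A minimal sketch of the negation behavior described above (variable names illustrative; the results shown assume v5.18+ semantics, where a cached numeric value no longer changes how a string negates):

```perl
use strict;

my $x = "dogs";
my $y = 0 + $x;       # numifies $x behind the scenes (0; warns under "use warnings")
print -$x, "\n";      # still "-dogs", despite the cached numeric value
print -'-10', "\n";   # "10" since v5.14 (previously "+10")
```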
Unary negation now treats strings consistently, regardless of the internal UTF8 flag.
A regression introduced in Perl v5.16.0 involving
tr/SEARCHLIST/REPLACEMENTLIST/ has been fixed. Only the first
instance is supposed to be meaningful if a character appears more than
once in SEARCHLIST. Under some circumstances, the final instance
was overriding all earlier ones. [perl #113584]
Regular expressions like qr/\87/ previously silently inserted a NUL
character, thus matching as if it had been written qr/\00087/. Now it
matches as if it had been written as qr/87/, with a message that the
sequence "\8"
is unrecognized.
__SUB__ now works in special blocks (BEGIN, END, etc.).
Thread creation on Windows could theoretically result in a crash if done
inside a BEGIN
block. It still does not work properly, but it no longer
crashes [perl #111610].
\&{''}
(with the empty string) now autovivifies a stub like any other
sub name, and no longer produces the "Unable to create sub" error
[perl #94476].
A regression introduced in v5.14.0 has been fixed, in which some calls
to the re
module would clobber $_
[perl #113750].
do FILE
now always either sets or clears $@
, even when the file can't be
read. This ensures that testing $@
first (as recommended by the
documentation) always returns the correct result.
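The documented idiom this fix makes reliable can be sketched as follows (the file name is hypothetical and assumed not to exist): because do FILE now clears $@ even on a read failure, testing $@ first distinguishes compile/runtime errors from I/O errors.

```perl
use strict;

# Hypothetical file that does not exist
my $ret = do "./no_such_helper_xyz.pl";
if ($@) {
    warn "compilation or runtime error: $@";
}
elsif (!defined $ret && $!) {
    warn "could not read file: $!";
}
```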
The array iterator used for the each @array
construct is now correctly
reset when @array
is cleared [perl #75596]. This happens, for example, when
the array is globally assigned to, as in @array = (...)
, but not when its
values are assigned to. In terms of the XS API, it means that av_clear()
will now reset the iterator.
This mirrors the behaviour of the hash iterator when the hash is cleared.
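The iterator reset can be seen directly (array contents illustrative; behavior as described for v5.18+): a whole-array assignment clears the array, so the next each starts over at index 0.

```perl
use strict;

my @a = (1, 2, 3);
my ($i) = each @a;     # first iteration: index 0
@a = (10, 20, 30);     # global assignment clears @a, resetting the iterator
my ($j) = each @a;     # starts over at index 0 again
print "$i $j\n";       # prints "0 0"
```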
$class->can, $class->isa, and $class->DOES now return correct results, regardless of whether the package referred to by $class exists [perl #47113].
Arriving signals no longer clear $@
[perl #45173].
Allow my ()
declarations with an empty variable list [perl #113554].
During parsing, subs declared after errors no longer leave stubs [perl #113712].
Closures containing no string evals no longer hang on to their containing subroutines, allowing variables closed over by outer subroutines to be freed when the outer sub is freed, even if the inner sub still exists [perl #89544].
Duplication of in-memory filehandles by opening with a "<&=" or ">&=" mode stopped working properly in v5.16.0. It was causing the new handle to reference a different scalar variable. This has been fixed [perl #113764].
qr// expressions no longer crash with custom regular expression engines
that do not set offs
at regular expression compilation time
[perl #112962].
delete local
no longer crashes with certain magical arrays and hashes
[perl #112966].
local on elements of certain magical arrays and hashes used not to
arrange to have the element deleted on scope exit, even if the element did
not exist before local.
scalar(write) no longer returns multiple items [perl #73690].
String to floating point conversions no longer misparse certain strings under
use locale
[perl #109318].
@INC
filters that die no longer leak memory [perl #92252].
The implementations of overloaded operations are now called in the correct
context. This allows, among other things, being able to properly override
<>
[perl #47119].
Specifying only the fallback
key when calling use overload
now behaves
properly [perl #113010].
sub foo { my $a = 0; while ($a) { ... } }
and
sub foo { while (0) { ... } }
now return the same thing [perl #73618].
String negation now behaves the same under use integer;
as it does
without [perl #113012].
chr now returns the Unicode replacement character (U+FFFD) for -1,
regardless of the internal representation. -1 used to wrap if the argument
was tied or a string internally.
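A sketch of the fixed behavior (assuming the warning for a negative chr argument is in the "utf8" category, which we silence here): chr(-1) yields the replacement character rather than a wrapped code point.

```perl
use strict;
no warnings 'utf8';            # chr(-1) would otherwise warn

my $c = chr(-1);
printf "U+%04X\n", ord($c);    # prints U+FFFD, the Unicode replacement character
```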
Using a format after its enclosing sub was freed could crash as of
perl v5.12.0, if the format referenced lexical variables from the outer sub.
Using a format after its enclosing sub was undefined could crash as of
perl v5.10.0, if the format referenced lexical variables from the outer sub.
Using a format defined inside a closure, where the format references lexical variables from outside, never really worked unless the write call was directly inside the closure. In v5.10.0 it even started crashing. Now the copy of that closure nearest the top of the call stack is used to find those variables.
Formats that close over variables in special blocks no longer crash if a stub exists with the same name as the special block before the special block is compiled.
The parser no longer gets confused, treating eval foo ()
as a syntax
error if preceded by print; [perl #16249].
The return value of syscall is no longer truncated on 64-bit platforms
[perl #113980].
Constant folding no longer causes print 1 ? FOO : BAR
to print to the
FOO handle [perl #78064].
do subname
now calls the named subroutine and uses the file name it
returns, instead of opening a file named "subname".
Subroutines looked up by rv2cv check hooks (registered by XS modules) are
now taken into consideration when determining whether foo bar
should be
the sub call foo(bar)
or the method call "bar"->foo
.
CORE::foo::bar
is no longer treated specially, allowing global overrides
to be called directly via CORE::GLOBAL::uc(...)
[perl #113016].
Calling an undefined sub whose typeglob has been undefined now produces the customary "Undefined subroutine called" error, instead of "Not a CODE reference".
Two bugs involving @ISA have been fixed. *ISA = *glob_without_array
and
undef *ISA; @{*ISA}
would prevent future modifications to @ISA from
updating the internal caches used to look up methods. The
*glob_without_array case was a regression from Perl v5.12.
Regular expression optimisations sometimes caused $
with /m to
produce failed or incorrect matches [perl #114068].
__SUB__ now works in a sort block when the enclosing subroutine is
predeclared with sub foo;
syntax [perl #113710].
Unicode properties only apply to Unicode code points, which leads to
some subtleties when regular expressions are matched against
above-Unicode code points. There is a warning generated to draw your
attention to this. However, this warning was being generated
inappropriately in some cases, such as when a program was being parsed.
Non-Unicode matches such as \w
and [:word:] should not generate the
warning, as their definitions don't limit them to apply to only Unicode
code points. Now the message is only generated when matching against
\p{}
and \P{}
. There remains a bug, [perl #114148], for the very
few properties in Unicode that match just a single code point. The
warning is not generated if they are matched against an above-Unicode
code point.
Uninitialized warnings mentioning hash elements would only mention the element name if it was not in the first bucket of the hash, due to an off-by-one error.
A regular expression optimizer bug could cause multiline "^" to behave
incorrectly in the presence of line breaks, such that
"/\n\n" =~ m#\A(?:^/$)#im
would not match [perl #115242].
Failed fork in list context no longer corrupts the stack.
@a = (1, 2, fork, 3)
used to gobble up the 2 and assign (1, undef, 3)
if the fork call failed.
Numerous memory leaks have been fixed, mostly involving tied variables that die, regular expression character classes and code blocks, and syntax errors.
Assigning a regular expression (${qr//}
) to a variable that happens to
hold a floating point number no longer causes assertion failures on
debugging builds.
Assigning a regular expression to a scalar containing a number no longer causes subsequent numification to produce random numbers.
Assigning a regular expression to a magic variable no longer wipes away the magic. This was a regression from v5.10.
Assigning a regular expression to a blessed scalar no longer results in crashes. This was also a regression from v5.10.
Regular expressions can now be assigned to tied hash and array elements with flattening into strings.
Numifying a regular expression no longer results in an uninitialized warning.
Negative array indices no longer cause EXISTS methods of tied variables to be ignored. This was a regression from v5.12.
Negative array indices no longer result in crashes on arrays tied to non-objects.
$byte_overload .= $utf8
no longer results in doubly-encoded UTF-8 if the
left-hand scalar happened to have produced a UTF-8 string the last time
overloading was invoked.
goto &sub
now uses the current value of @_, instead of using the array
the subroutine was originally called with. This means
local @_ = (...); goto &sub
now works [perl #43077].
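The new local @_ behavior can be sketched like this (sub names illustrative): on v5.18+, goto &sub passes whatever @_ currently holds, so localizing @_ before the goto replaces the argument list the target sees.

```perl
use strict;

sub inner { return join " ", @_ }

sub outer {
    local @_ = ("x", "y");   # replace the argument list...
    goto &inner;             # ...and goto now passes the current @_ (v5.18+)
}

print outer(1, 2), "\n";     # prints "x y", not "1 2"
```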
If a debugger is invoked recursively, it no longer stomps on its own lexical variables. Formerly under recursion all calls would share the same set of lexical variables [perl #115742].
*_{ARRAY}
returned from a subroutine no longer spontaneously
becomes empty.
When using say to print to a tied filehandle, the value of $\
is
correctly localized, even if it was previously undef. [perl #119927]
UTF8-flagged strings in %ENV
on HP-UX 11.00 are buggy
The interaction of UTF8-flagged strings and %ENV
on HP-UX 11.00 is
currently dodgy in some not-yet-fully-diagnosed way. Expect test
failures in t/op/magic.t, followed by unknown behavior when storing
wide characters in the environment.
Hojung Yoon (AMORETTE), 24, of Seoul, South Korea, went to his long rest on May 8, 2013 with llama figurine and autographed TIMTOADY card. He was a brilliant young Perl 5 & 6 hacker and a devoted member of Seoul.pm. He programmed Perl, talked Perl, ate Perl, and loved Perl. We believe that he is still programming in Perl with his broken IBM laptop somewhere. He will be missed.
Perl v5.18.0 represents approximately 12 months of development since Perl v5.16.0 and contains approximately 400,000 lines of changes across 2,100 files from 113 authors.
Perl continues to flourish into its third decade thanks to a vibrant community of users and developers. The following people are known to have contributed the improvements that became Perl v5.18.0:
Aaron Crane, Aaron Trevena, Abhijit Menon-Sen, Adrian M. Enache, Alan Haggai Alavi, Alexandr Ciornii, Andrew Tam, Andy Dougherty, Anton Nikishaev, Aristotle Pagaltzis, Augustina Blair, Bob Ernst, Brad Gilbert, Breno G. de Oliveira, Brian Carlson, Brian Fraser, Charlie Gonzalez, Chip Salzenberg, Chris 'BinGOs' Williams, Christian Hansen, Colin Kuskie, Craig A. Berry, Dagfinn Ilmari Mannsåker, Daniel Dragan, Daniel Perrett, Darin McBride, Dave Rolsky, David Golden, David Leadbeater, David Mitchell, David Nicol, Dominic Hargreaves, E. Choroba, Eric Brine, Evan Miller, Father Chrysostomos, Florian Ragwitz, François Perrad, George Greer, Goro Fuji, H.Merijn Brand, Herbert Breunung, Hugo van der Sanden, Igor Zaytsev, James E Keenan, Jan Dubois, Jasmine Ahuja, Jerry D. Hedden, Jess Robinson, Jesse Luehrs, Joaquin Ferrero, Joel Berger, John Goodyear, John Peacock, Karen Etheridge, Karl Williamson, Karthik Rajagopalan, Kent Fredric, Leon Timmermans, Lucas Holt, Lukas Mai, Marcus Holland-Moritz, Markus Jansen, Martin Hasch, Matthew Horsfall, Max Maischein, Michael G Schwern, Michael Schroeder, Moritz Lenz, Nicholas Clark, Niko Tyni, Oleg Nesterov, Patrik Hägglund, Paul Green, Paul Johnson, Paul Marquess, Peter Martini, Rafael Garcia-Suarez, Reini Urban, Renee Baecker, Rhesa Rozendaal, Ricardo Signes, Robin Barker, Ronald J. Kimball, Ruslan Zakirov, Salvador Fandiño, Sawyer X, Scott Lanning, Sergey Alekseev, Shawn M Moore, Shirakata Kentaro, Shlomi Fish, Sisyphus, Smylers, Steffen Müller, Steve Hay, Steve Peters, Steven Schubiger, Sullivan Beck, Sven Strickroth, Sébastien Aperghis-Tramoni, Thomas Sibley, Tobias Leich, Tom Wyant, Tony Cook, Vadim Konovalov, Vincent Pit, Volker Schatz, Walt Mankowski, Yves Orton, Zefram.
The list above is almost certainly incomplete as it is automatically generated from version control history. In particular, it does not include the names of the (very much appreciated) contributors who reported issues to the Perl bug tracker.
Many of the changes included in this version originated in the CPAN modules included in Perl's core. We're grateful to the entire CPAN community for helping Perl to flourish.
For a more complete list of all of Perl's historical contributors, please see the AUTHORS file in the Perl source distribution.
If you find what you think is a bug, you might check the articles recently posted to the comp.lang.perl.misc newsgroup and the perl bug database at http://rt.perl.org/perlbug/ . There may also be information at http://www.perl.org/ , the Perl Home Page.
If you believe you have an unreported bug, please run the perlbug program included with your release. Be sure to trim your bug down to a tiny but sufficient test case. Your bug report, along with the output of perl -V, will be sent off to perlbug@perl.org to be analysed by the Perl porting team.
If the bug you are reporting has security implications, which make it inappropriate to send to a publicly archived mailing list, then please send it to perl5-security-report@perl.org. This points to a closed subscription unarchived mailing list, which includes all the core committers, who will be able to help assess the impact of issues, figure out a resolution, and help co-ordinate the release of patches to mitigate or fix the problem across all platforms on which Perl is supported. Please only use this address for security issues in the Perl core, not for modules independently distributed on CPAN.
The Changes file for an explanation of how to view exhaustive details on what changed.
The INSTALL file for how to build Perl.
The README file for general stuff.
The Artistic and Copying files for copyright information.
perl5181delta - what is new for perl v5.18.1
This document describes differences between the 5.18.0 release and the 5.18.1 release.
If you are upgrading from an earlier release such as 5.16.0, first read perl5180delta, which describes differences between 5.16.0 and 5.18.0.
There are no changes intentionally incompatible with 5.18.0. If any exist, they are bugs, and we request that you submit a report. See Reporting Bugs below.
B has been upgraded from 1.42 to 1.42_01, fixing bugs related to lexical subroutines.
Digest::SHA has been upgraded from 5.84 to 5.84_01, fixing a crashing bug. [RT #118649]
Module::CoreList has been upgraded from 2.89 to 2.96.
A rarely-encountered configuration bug in the AIX hints file has been corrected.
After a patch to the relevant hints file, perl should now build correctly on MidnightBSD 0.4-RELEASE.
Starting in v5.18.0, a construct like /[#](?{})/x
would have its #
incorrectly interpreted as a comment. The code block would be skipped,
unparsed. This has been corrected.
A number of memory leaks related to the new, experimental regexp bracketed character class feature have been plugged.
The OP allocation code now returns correctly aligned memory in all cases
for struct pmop
. Previously it could return memory only aligned to a
4-byte boundary, which is not correct for an ithreads build with 64 bit IVs
on some 32 bit platforms. Notably, this caused the build to fail completely
on sparc GNU/Linux. [RT #118055]
The debugger's man command has been fixed. It was broken in the v5.18.0 release. The man command is aliased to the names doc and perldoc; all now work again.
@_
is now correctly visible in the debugger, fixing a regression
introduced in v5.18.0's debugger. [RT #118169]
Fixed a small number of regexp constructions that could either fail to match or crash perl when the string being matched against was allocated above the 2GB line on 32-bit systems. [RT #118175]
Perl v5.16 inadvertently introduced a bug whereby calls to XSUBs that were not visible at compile time were treated as lvalues and could be assigned to, even when the subroutine was not an lvalue sub. This has been fixed. [perl #117947]
Perl v5.18 inadvertently introduced a bug whereby the truthiness of dual-vars (i.e. variables with both string and numeric values, such as $!) was determined by the numeric value rather than the string value. This has been fixed. [RT #118159]
Perl v5.18 inadvertently introduced a bug whereby interpolating mixed up- and down-graded UTF-8 strings in a regex could result in malformed UTF-8 in the pattern: specifically, if a downgraded character in the range \x80..\xff followed a UTF-8 string [perl #118297].
Lexical constants (my sub a() { 42 }
) no longer crash when inlined.
Parameter prototypes attached to lexical subroutines are now respected when compiling sub calls without parentheses. Previously, the prototypes were honoured only for calls with parentheses. [RT #116735]
Syntax errors in lexical subroutines in combination with calls to the same subroutines no longer cause crashes at compile time.
The dtrace sub-entry probe now works with lexical subs, instead of crashing [perl #118305].
Undefining an inlinable lexical subroutine (my sub foo() { 42 }; undef &foo) would result in a crash if warnings were turned on.
Deep recursion warnings no longer crash lexical subroutines. [RT #118521]
Perl 5.18.1 represents approximately 2 months of development since Perl 5.18.0 and contains approximately 8,400 lines of changes across 60 files from 12 authors.
Perl continues to flourish into its third decade thanks to a vibrant community of users and developers. The following people are known to have contributed the improvements that became Perl 5.18.1:
Chris 'BinGOs' Williams, Craig A. Berry, Dagfinn Ilmari Mannsåker, David Mitchell, Father Chrysostomos, Karl Williamson, Lukas Mai, Nicholas Clark, Peter Martini, Ricardo Signes, Shlomi Fish, Tony Cook.
The list above is almost certainly incomplete as it is automatically generated from version control history. In particular, it does not include the names of the (very much appreciated) contributors who reported issues to the Perl bug tracker.
Many of the changes included in this version originated in the CPAN modules included in Perl's core. We're grateful to the entire CPAN community for helping Perl to flourish.
For a more complete list of all of Perl's historical contributors, please see the AUTHORS file in the Perl source distribution.
If you find what you think is a bug, you might check the articles recently posted to the comp.lang.perl.misc newsgroup and the perl bug database at http://rt.perl.org/perlbug/ . There may also be information at http://www.perl.org/ , the Perl Home Page.
If you believe you have an unreported bug, please run the perlbug program included with your release. Be sure to trim your bug down to a tiny but sufficient test case. Your bug report, along with the output of perl -V, will be sent off to perlbug@perl.org to be analysed by the Perl porting team.
If the bug you are reporting has security implications, which make it inappropriate to send to a publicly archived mailing list, then please send it to perl5-security-report@perl.org. This points to a closed subscription unarchived mailing list, which includes all the core committers, who will be able to help assess the impact of issues, figure out a resolution, and help co-ordinate the release of patches to mitigate or fix the problem across all platforms on which Perl is supported. Please only use this address for security issues in the Perl core, not for modules independently distributed on CPAN.
The Changes file for an explanation of how to view exhaustive details on what changed.
The INSTALL file for how to build Perl.
The README file for general stuff.
The Artistic and Copying files for copyright information.
perldelta - what is new for perl v5.18.2
This document describes differences between the 5.18.1 release and the 5.18.2 release.
If you are upgrading from an earlier release such as 5.18.0, first read perl5181delta, which describes differences between 5.18.0 and 5.18.1.
B has been upgraded from version 1.42_01 to 1.42_02.
The fix for [perl #118525] introduced a regression in the behaviour of B::CV::GV, changing the return value from a B::SPECIAL object on a NULL CvGV to undef. B::CV::GV again returns a B::SPECIAL object in this case. [perl #119413]
B::Concise has been upgraded from version 0.95 to 0.95_01.
This fixes a bug in dumping unexpected SPECIALs.
English has been upgraded from version 1.06 to 1.06_01. This fixes an error about the performance of $`, $&, and $'.
File::Glob has been upgraded from version 1.20 to 1.20_01.
perlrepository has been restored with a pointer to more useful pages.
perlhack has been updated with the latest changes from blead.
Perl 5.18.1 introduced a regression along with a bugfix for lexical subs. Some B::SPECIAL results from B::CV::GV became undefs instead. This broke Devel::Cover among other libraries. This has been fixed. [perl #119351]
Perl 5.18.0 introduced a regression whereby [:^ascii:]
, if used in the same
character class as other qualifiers, would fail to match characters in the
Latin-1 block. This has been fixed. [perl #120799]
Perl 5.18.0 introduced a regression when using ->SUPER::method with AUTOLOAD, by looking up AUTOLOAD from the current package rather than the current package's superclass. This has been fixed. [perl #120694]
Perl 5.18.0 introduced a regression whereby -bareword
was no longer
permitted under the strict
and integer
pragmata when used together. This
has been fixed. [perl #120288]
Previously, PerlIOBase_dup didn't check whether pushing the new layer succeeded before (optionally) setting the utf8 flag. This could cause null-pointer segfaults. This has been fixed.
A buffer overflow with very long identifiers has been fixed.
A regression from 5.16 in the handling of padranges led to assertion failures if a keyword plugin declined to handle the second "my", but only after creating a padop. This affected, at least, Devel::CallParser under threaded builds. This has been fixed.
The construct $r=qr/.../; /$r/p is now handled properly, an issue which had been worsened by changes in 5.18.0. [perl #118213]
Perl 5.18.2 represents approximately 3 months of development since Perl 5.18.1 and contains approximately 980 lines of changes across 39 files from 4 authors.
Perl continues to flourish into its third decade thanks to a vibrant community of users and developers. The following people are known to have contributed the improvements that became Perl 5.18.2:
Craig A. Berry, David Mitchell, Ricardo Signes, Tony Cook.
The list above is almost certainly incomplete as it is automatically generated from version control history. In particular, it does not include the names of the (very much appreciated) contributors who reported issues to the Perl bug tracker.
Many of the changes included in this version originated in the CPAN modules included in Perl's core. We're grateful to the entire CPAN community for helping Perl to flourish.
For a more complete list of all of Perl's historical contributors, please see the AUTHORS file in the Perl source distribution.
If you find what you think is a bug, you might check the articles recently posted to the comp.lang.perl.misc newsgroup and the perl bug database at http://rt.perl.org/perlbug/ . There may also be information at http://www.perl.org/ , the Perl Home Page.
If you believe you have an unreported bug, please run the perlbug program included with your release. Be sure to trim your bug down to a tiny but sufficient test case. Your bug report, along with the output of perl -V, will be sent off to perlbug@perl.org to be analysed by the Perl porting team.
If the bug you are reporting has security implications, which make it inappropriate to send to a publicly archived mailing list, then please send it to perl5-security-report@perl.org. This points to a closed subscription unarchived mailing list, which includes all the core committers, who will be able to help assess the impact of issues, figure out a resolution, and help co-ordinate the release of patches to mitigate or fix the problem across all platforms on which Perl is supported. Please only use this address for security issues in the Perl core, not for modules independently distributed on CPAN.
The Changes file for an explanation of how to view exhaustive details on what changed.
The INSTALL file for how to build Perl.
The README file for general stuff.
The Artistic and Copying files for copyright information.
perl561delta - what's new for perl v5.6.1
This document describes differences between the 5.005 release and the 5.6.1 release.
This section contains a summary of the changes between the 5.6.0 release and the 5.6.1 release. More details about the changes mentioned here may be found in the Changes files that accompany the Perl source distribution. See perlhack for pointers to online resources where you can inspect the individual patches described by these changes.
suidperl will not run /bin/mail anymore, because some platforms have a /bin/mail that is vulnerable to buffer overflow attacks.
Note that suidperl is neither built nor installed by default in any recent version of perl. Use of suidperl is highly discouraged. If you think you need it, try alternatives such as sudo first. See http://www.courtesan.com/sudo/ .
This is not an exhaustive list. It is intended to cover only the significant user-visible changes.
UNIVERSAL::isa()
A bug in the caching mechanism used by UNIVERSAL::isa()
that affected
base.pm has been fixed. The bug has existed since the 5.005 releases,
but wasn't tickled by base.pm in those releases.
Various cases of memory leaks and attempts to access uninitialized memory have been cured. See Known Problems below for further issues.
Numeric conversions did not recognize changes in the string value properly in certain circumstances.
In other situations, large unsigned numbers (those above 2**31) could sometimes lose their unsignedness, causing bogus results in arithmetic operations.
Integer modulus on large unsigned integers sometimes returned incorrect values.
Perl 5.6.0 generated "not a number" warnings on certain conversions where previous versions didn't.
These problems have all been rectified.
Infinity is now recognized as a number.
In Perl 5.6.0, qw(a\\b) produced a string with two backslashes instead of one, in a departure from the behavior in previous versions. The older behavior has been reinstated.
caller() could cause core dumps in certain situations. Carp was sometimes affected by this problem.
Pattern matches on overloaded values are now handled correctly.
Perl 5.6.0 parsed m/\x{ab}/ incorrectly, leading to spurious warnings. This has been corrected.
The RE engine found in Perl 5.6.0 accidentally pessimised certain kinds of simple pattern matches. These are now handled better.
Regular expression debug output (whether through use re 'debug' or via -Dr) now looks better.
Multi-line matches like "a\nxb\n" =~ /(?!\A)x/m were flawed. The bug has been fixed.
Use of $& could trigger a core dump under some situations. This is now avoided.
Match variables $1 et al. weren't being unset when a pattern match was backtracking, and the anomaly showed up inside /...(?{ ... }).../ etc. These variables are now tracked correctly.
pos() did not return the correct value within s///ge in earlier versions. This is now handled correctly.
readline() on files opened in "slurp" mode could return an extra "" at the end in certain situations. This has been corrected.
Autovivification of symbolic references of special variables described in perlvar (as in ${$num}) was accidentally disabled. This works again now.
Lexical warnings now propagate correctly into eval "...".
use warnings qw(FATAL all) did not work as intended. This has been corrected.
Lexical warnings could leak into other scopes in some situations. This is now fixed.
warnings::enabled() now reports the state of $^W correctly if the caller isn't using lexical warnings.
Perl 5.6.0 could emit spurious warnings about redefinition of dl_error() when statically building extensions into perl. This has been corrected.
"our" variables could result in bogus "Variable will not stay shared" warnings. This is now fixed.
"our" variables of the same name declared in two sibling blocks resulted in bogus warnings about "redeclaration" of the variables. The problem has been corrected.
Compatibility of the builtin glob() with old csh-based glob has been improved with the addition of the GLOB_ALPHASORT option. See File::Glob.
File::Glob::glob() has been renamed to File::Glob::bsd_glob() because the name clashes with the builtin glob(). The older name is still available for compatibility, but is deprecated.
Spurious syntax errors generated in certain situations, when glob() caused File::Glob to be loaded for the first time, have been fixed.
Some cases of inconsistent taint propagation (such as within hash values) have been fixed.
The tainting behavior of sprintf() has been rationalized. It does not taint the result of floating point formats anymore, making the behavior consistent with that of string interpolation.
Arguments to sort() weren't being provided the right wantarray() context. The comparison block is now run in scalar context, and the arguments to be sorted are always provided list context.
sort() is also fully reentrant, in the sense that the sort function can itself call sort(). This did not work reliably in previous releases.
#line directives now work correctly when they appear at the very beginning of eval "...".
The (\&) prototype now works properly.
map() could get pathologically slow when the result list it generates is larger than the source list. The performance has been improved for common scenarios.
Debugger exit code now reflects the script exit code.
Condition "0" in breakpoints is now treated correctly.
The d command now checks the line number.
$. is no longer corrupted by the debugger.
All debugger output now correctly goes to the socket if RemotePort is set.
PERL5OPT can be set to more than one switch group. Previously, it used to be limited to one group of options only.
chop(@list) in list context returned the characters chopped in reverse order. This has been reversed to be in the right order.
Unicode support has seen a large number of incremental improvements, but continues to be highly experimental. It is not expected to be fully supported in the 5.6.x maintenance releases.
substr(), join(), repeat(), reverse(), quotemeta() and string concatenation were all handling Unicode strings incorrectly in Perl 5.6.0. This has been corrected.
Support for tr///CU and tr///UC etc., has been removed since we realized the interface is broken. For similar functionality, see pack.
The Unicode Character Database has been updated to version 3.0.1 with additions made available to the public as of August 30, 2000.
The Unicode character classes \p{Blank} and \p{SpacePerl} have been added. "Blank" is like C's isblank(), that is, it contains only "horizontal whitespace" (the space character is included, the newline isn't), and "SpacePerl" is the Unicode equivalent of \s (\p{Space} isn't, since that class includes the vertical tabulator character, whereas \s doesn't).
If you are experimenting with Unicode support in perl, the development versions of Perl may have more to offer. In particular, I/O layers are now available in the development track, but not in the maintenance track, primarily due to backward compatibility issues. Unicode support is also evolving rapidly on a daily basis in the development track--the maintenance track only reflects the most conservative of these changes.
Support for 64-bit platforms has been improved, but continues to be experimental. The level of support varies greatly among platforms.
The B Compiler and its various backends have had many incremental improvements, but they continue to remain highly experimental. Use in production environments is discouraged.
The perlcc tool has been rewritten so that the user interface is much more like that of a C compiler.
The perlbc tool has been removed. Use perlcc -B instead.
There have been various bugfixes to support lvalue subroutines better. However, the feature still remains experimental.
IO::Socket::INET failed to open the specified port if the service name was not known. It now correctly uses the supplied port number as is.
File::Find now chdir()s correctly when chasing symbolic links.
xsubpp now tolerates embedded POD sections.
no Module;
does not produce an error even if Module does not have an unimport() method. This parallels the behavior of use vis-a-vis import.
A large number of tests have been added.
untie() will now call an UNTIE() hook if it exists. See perltie for details.
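A minimal sketch of the new hook (the class name CountingScalar is illustrative): UNTIE receives the number of extra references still pointing at the tied object, which lets a class warn about or refuse an unsafe untie().

```perl
package CountingScalar;

sub TIESCALAR { my ($class, $val) = @_; bless \$val, $class }
sub FETCH     { ${ $_[0] } }
sub STORE     { ${ $_[0] } = $_[1] }

# Called by untie(); $extra is the count of remaining extra references
# to the tied object.
sub UNTIE {
    my ($self, $extra) = @_;
    warn "untie with $extra extra references\n" if $extra;
}

package main;
tie my $x, 'CountingScalar', 0;
$x = 42;
print "$x\n";
untie $x;    # invokes CountingScalar::UNTIE before the object is freed
```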
The -DT command line switch outputs copious tokenizing information. See perlrun.
Arrays are now always interpolated in double-quotish strings. Previously, "foo@bar.com" used to be a fatal error at compile time, if an array @bar was not used or declared. This transitional behavior was intended to help migrate perl4 code, and is deemed to be no longer useful. See Arrays now always interpolate into double-quoted strings.
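A short sketch of the consequence (the array name @bar is illustrative): the at-sign must now be escaped, or the array declared, wherever it appears in a double-quoted string.

```perl
my @bar = ('alpha', 'beta');
my $interp  = "list: @bar";          # arrays always interpolate now
my $literal = "mail: foo\@bar.com";  # escape the @ for a literal at-sign
print "$interp\n$literal\n";
```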
keys(), each(), pop(), push(), shift(), splice() and unshift() can all be overridden now.
my __PACKAGE__ $obj now does the expected thing.
On some systems (IRIX and Solaris among them) the system malloc is demonstrably better. While the defaults haven't been changed in order to retain binary compatibility with earlier releases, you may be better off building perl with Configure -Uusemymalloc ... as discussed in the INSTALL file.
Configure has been enhanced in various ways:
Minimizes use of temporary files.
By default, does not link perl with libraries not used by it, such as the various dbm libraries. SunOS 4.x hints preserve behavior on that platform.
Support for pdp11-style memory models has been removed due to obsolescence.
Building outside the source tree is supported on systems that have symbolic links. This is done by running
- sh /path/to/source/Configure -Dmksymlinks ...
- make all test install
in a directory other than the perl source directory. See INSTALL.
Configure -S can be run non-interactively.
README.aix, README.solaris and README.macos have been added. README.posix-bc has been renamed to README.bs2000. These are installed as perlaix, perlsolaris, perlmacos, and perlbs2000 respectively.
The following pod documents are brand new:
- perlclib Internal replacements for standard C library functions
- perldebtut Perl debugging tutorial
- perlebcdic Considerations for running Perl on EBCDIC platforms
- perlnewmod Perl modules: preparing a new module for distribution
- perlrequick Perl regular expressions quick start
- perlretut Perl regular expressions tutorial
- perlutil utilities packaged with the Perl distribution
The INSTALL file has been expanded to cover various issues, such as 64-bit support.
A longer list of contributors has been added to the source distribution. See the file AUTHORS.
Numerous other changes have been made to the included documentation and FAQs.
The following modules have been added.
Walks Perl syntax tree, printing concise info about ops. See B::Concise.
Returns name and handle of a temporary file safely. See File::Temp.
Converts Pod data to formatted LaTeX. See Pod::LaTeX.
Converts POD data to formatted overstrike text. See Pod::Text::Overstrike.
The following modules have been upgraded.
CGI v2.752 is now included.
CPAN v1.59_54 is now included.
Various bugfixes have been added.
DB_File v1.75 supports newer Berkeley DB versions, among other improvements.
Devel::Peek has been enhanced to support dumping of memory statistics, when perl is built with the included malloc().
File::Find now supports pre and post-processing of the files in order to sort() them, etc.
Getopt::Long v2.25 is included.
Various bug fixes have been included.
IPC::Open3 allows use of numeric file descriptors.
The fmod() function supports modulus operations. Various bug fixes have also been included.
Math::Complex handles inf, NaN etc., better.
ping() could fail on odd number of data bytes, and when the echo service isn't running. This has been corrected.
A memory leak has been fixed.
Version 1.13 of the Pod::Parser suite is included.
Pod::Text and related modules have been upgraded to the versions in podlators suite v2.08.
On dosish platforms, some keys went missing because of lack of support for files with "holes". A workaround for the problem has been added.
Various bug fixes have been included.
Now supports Tie::RefHash::Nestable to automagically tie hashref values.
Various bug fixes have been included.
The following new ports are now available.
Perl now builds under Amdahl UTS.
Perl has also been verified to build under Amiga OS.
Support for EPOC has been much improved. See README.epoc.
Building perl with -Duseithreads or -Duse5005threads now works under HP-UX 10.20 (previously it only worked under 10.30 or later). You will need a thread library package installed. See README.hpux.
Long doubles should now work under Linux.
Mac OS Classic is now supported in the mainstream source package. See README.macos.
Support for MPE/iX has been updated. See README.mpeix.
Support for OS/2 has been improved. See os2/Changes
and README.os2.
Dynamic loading on z/OS (formerly OS/390) has been improved. See README.os390.
Support for VMS has seen many incremental improvements, including
better support for operators like backticks and system(), and better
%ENV handling. See README.vms
and perlvms.
Support for Stratus VOS has been improved. See vos/Changes
and README.vos.
Support for Windows has been improved.
fork() emulation has been improved in various ways, but still continues to be experimental. See perlfork for known bugs and caveats.
%SIG has been enabled under USE_ITHREADS, but its use is completely unsupported under all configurations.
Borland C++ v5.5 is now a supported compiler that can build Perl. However, the generated binaries continue to be incompatible with those generated by the other supported compilers (GCC and Visual C++).
Non-blocking waits for child processes (or pseudo-processes) are supported via waitpid($pid, &POSIX::WNOHANG).
A memory leak in accept() has been fixed.
wait(), waitpid() and backticks now return the correct exit status under Windows 9x.
Trailing new %ENV entries weren't propagated to child processes. This is now fixed.
Current directory entries in %ENV are now correctly propagated to child processes.
Duping socket handles with open(F, ">&MYSOCK") now works under Windows 9x.
The makefiles now provide a single switch to bulk-enable all the features enabled in ActiveState ActivePerl (a popular binary distribution).
Win32::GetCwd() correctly returns C:\ instead of C: when at the drive root. Other bugs in chdir() and Cwd::cwd() have also been fixed.
fork() correctly returns undef and sets EAGAIN when it runs out of pseudo-process handles.
ExtUtils::MakeMaker now uses $ENV{LIB} to search for libraries.
UNC path handling is better when perl is built to support fork().
A handle leak in socket handling has been fixed.
send() works from within a pseudo-process.
Unless specifically qualified otherwise, the remainder of this document covers changes between the 5.005 and 5.6.0 releases.
Perl 5.6.0 introduces the beginnings of support for running multiple interpreters concurrently in different threads. In conjunction with the perl_clone() API call, which can be used to selectively duplicate the state of any given interpreter, it is possible to compile a piece of code once in an interpreter, clone that interpreter one or more times, and run all the resulting interpreters in distinct threads.
On the Windows platform, this feature is used to emulate fork() at the interpreter level. See perlfork for details about that.
This feature is still in evolution. It is eventually meant to be used to selectively clone a subroutine and data reachable from that subroutine in a separate interpreter and run the cloned subroutine in a separate thread. Since there is no shared data between the interpreters, little or no locking will be needed (unless parts of the symbol table are explicitly shared). This is obviously intended to be an easy-to-use replacement for the existing threads support.
Support for cloning interpreters and interpreter concurrency can be enabled using the -Dusethreads Configure option (see win32/Makefile for how to enable it on Windows.) The resulting perl executable will be functionally identical to one that was built with -Dmultiplicity, but the perl_clone() API call will only be available in the former.
-Dusethreads enables the cpp macro USE_ITHREADS by default, which in turn enables Perl source code changes that provide a clear separation between the op tree and the data it operates with. The former is immutable, and can therefore be shared between an interpreter and all of its clones, while the latter is considered local to each interpreter, and is therefore copied for each clone.
Note that building Perl with the -Dusemultiplicity Configure option is adequate if you wish to run multiple independent interpreters concurrently in different threads. -Dusethreads only provides the additional functionality of the perl_clone() API call and other support for running cloned interpreters concurrently.
- NOTE: This is an experimental feature. Implementation details are
- subject to change.
You can now control the granularity of warnings emitted by perl at a finer level using the use warnings pragma. warnings and perllexwarn have copious documentation on this feature.
Perl now uses UTF-8 as its internal representation for character strings. The utf8 and bytes pragmas are used to control this support in the current lexical scope. See perlunicode, utf8 and bytes for more information.
This feature is expected to evolve quickly to support some form of I/O disciplines that can be used to specify the kind of input and output data (bytes or characters). Until that happens, additional modules from CPAN will be needed to complete the toolkit for dealing with Unicode.
- NOTE: This should be considered an experimental feature. Implementation
- details are subject to change.
The new \N escape interpolates named characters within strings. For example, "Hi! \N{WHITE SMILING FACE}" evaluates to a string with a Unicode smiley face at the end.
An "our" declaration introduces a value that can be best understood as a lexically scoped symbolic alias to a global variable in the package that was current where the variable was declared. This is mostly useful as an alternative to the vars pragma, but also provides the opportunity to introduce typing and other attributes for such variables. See our.
Literals of the form v1.2.3.4 are now parsed as a string composed of characters with the specified ordinals. This is an alternative, more readable way to construct (possibly Unicode) strings instead of interpolating characters, as in "\x{1}\x{2}\x{3}\x{4}". The leading v may be omitted if there are more than two ordinals, so 1.2.3 is parsed the same as v1.2.3.
Strings written in this form are also useful to represent version "numbers". It is easy to compare such version "numbers" (which are really just plain strings) using any of the usual string comparison operators eq, ne, lt, gt, etc., or to perform bitwise string operations on them using |, &, etc.
In conjunction with the new $^V magic variable (which contains the perl version as a string), such literals can be used as a readable way to check if you're running a particular version of Perl:
- # this will parse in older versions of Perl also
- if ($^V and $^V gt v5.6.0) {
- # new features supported
- }
require and use also have some special magic to support such literals.
They will be interpreted as a version rather than as a module name:
Alternatively, the v may be omitted if there is more than one dot:
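The two forms presumably looked like this (a sketch; the version numbers are illustrative):

```perl
require v5.6.1;    # parsed as a minimum version, not a module name
use 5.6.1;         # leading v omitted: more than one dot, still a version
print "version requirement satisfied\n";
```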
Also, sprintf and printf support the Perl-specific format flag %v to print ordinals of characters in arbitrary strings:
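For instance (a sketch; the values are illustrative):

```perl
# %vd prints the ordinal of each character, joined with dots:
printf "%vd\n", v1.22.333;               # 1.22.333
my $ip = sprintf "%vd", v127.0.0.1;      # works for any "vector" string
die unless $ip eq '127.0.0.1';
print "$ip\n";
```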
See Scalar value constructors in perldata for additional information.
Beginning with Perl version 5.6.0, the version number convention has been changed to a "dotted integer" scheme that is more commonly found in open source projects.
Maintenance versions of v5.6.0 will be released as v5.6.1, v5.6.2 etc. The next development series following v5.6.0 will be numbered v5.7.x, beginning with v5.7.0, and the next major production release following v5.6.0 will be v5.8.0.
The English module now sets $PERL_VERSION to $^V (a string value) rather than $] (a numeric value). (This is a potential incompatibility. Send us a report via perlbug if you are affected by this.)
The v1.2.3 syntax is also now legal in Perl. See Support for strings represented as a vector of ordinals for more on that.
To cope with the new versioning system's use of at least three significant digits for each version component, the method used for incrementing the subversion number has also changed slightly. We assume that versions older than v5.6.0 have been incrementing the subversion component in multiples of 10. Versions after v5.6.0 will increment them by 1. Thus, using the new notation, 5.005_03 is the "same" as v5.5.30, and the first maintenance version following v5.6.0 will be v5.6.1 (which should be read as being equivalent to a floating point value of 5.006_001 in the older format, stored in $]).
Formerly, if you wanted to mark a subroutine as being a method call or as requiring an automatic lock() when it is entered, you had to declare that with a use attrs pragma in the body of the subroutine. That can now be accomplished with declaration syntax. (Note how only the first : is mandatory, and whitespace surrounding the : is optional.)
AutoSplit.pm and SelfLoader.pm have been updated to keep the attributes with the stubs they provide. See attributes.
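A minimal sketch of the declaration syntax (the :locked attribute is tied to the old threading model, so only :method is shown here; the Counter class is illustrative):

```perl
use attributes ();

package Counter;

# The attribute list follows a single colon after the subroutine name:
sub new : method {
    my $class = shift;
    bless { count => 0 }, $class;
}

package main;
my @attrs = attributes::get(\&Counter::new);
print "@attrs\n";    # the 'method' attribute is retrievable at run time
```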
Similar to how constructs such as $x->[0] autovivify a reference, handle constructors (open(), opendir(), pipe(), socketpair(), sysopen(), socket(), and accept()) now autovivify a file or directory handle if the handle passed to them is an uninitialized scalar variable. This allows constructs such as open(my $fh, ...) and open(local $fh, ...) to be used to create filehandles that will conveniently be closed automatically when the scope ends, provided there are no other references to them. This largely eliminates the need for typeglobs when opening filehandles that must be passed around, as in the following example:
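A sketch of the idiom (the function name first_line is illustrative):

```perl
use strict;

# open() autovivifies the lexical $fh; the handle is closed automatically
# when $fh goes out of scope at the end of the sub.
sub first_line {
    my ($path) = @_;
    open(my $fh, "< $path") or die "can't open $path: $!";
    return scalar <$fh>;
}
```

The handle can also be returned or passed to other subs like any scalar.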
If open() is passed three arguments instead of two, the second argument is used as the mode and the third argument is taken to be the file name. This is primarily useful for protecting against unintended magic behavior of the traditional two-argument form. See open.
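A sketch of the three-argument form (the filename is illustrative):

```perl
# With the mode as a separate argument, a filename containing leading
# whitespace or mode-like characters cannot be misparsed:
my $name = 'plain file.txt';
open(my $out, '>', $name) or die "can't write $name: $!";
print $out "data\n";
close $out;

open(my $in, '<', $name) or die "can't read $name: $!";
print scalar <$in>;
close $in;
unlink $name;
```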
Any platform that has 64-bit integers either
- (1) natively as longs or ints
- (2) via special compiler flags
- (3) using long long or int64_t
is able to use "quads" (64-bit integers) as follows:
constants (decimal, hexadecimal, octal, binary) in the code
arguments to oct() and hex()
arguments to print(), printf() and sprintf() (flag prefixes ll, L, q)
printed as such
pack() and unpack() "q" and "Q" formats
in basic arithmetics: + - * / % (NOTE: operating close to the limits of the integer values may produce surprising results)
in bit arithmetics: & | ^ ~ << >> (NOTE: these used to be forced to be 32 bits wide but now operate on the full native width.)
vec()
Note that unless you have case (1) you will have to configure and compile Perl using the -Duse64bitint Configure flag.
- NOTE: The Configure flags -Duselonglong and -Duse64bits have been
- deprecated. Use -Duse64bitint instead.
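A quick sketch of quads in action (assumes a perl built with 64-bit integers):

```perl
# Assumes 64-bit IVs (use64bitint or a natively 64-bit platform).
my $quad = 0xFFFF_FFFF_FFFF_FFFF;   # 64-bit hexadecimal constant
printf "%u\n", $quad;               # full unsigned value, no truncation
print 1 << 40, "\n";                # shifts use the full native width
```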
There are actually two modes of 64-bitness: the first one is achieved using Configure -Duse64bitint and the second one using Configure -Duse64bitall. The difference is that the first one is minimal and the second one maximal. The first works in more places than the second.
use64bitint does only as much as is required to get 64-bit integers into Perl (this may mean, for example, using "long longs") while your memory may still be limited to 2 gigabytes (because your pointers could still be 32-bit). Note that the name 64bitint does not imply that your C compiler will be using 64-bit ints (it might, but it doesn't have to): use64bitint means that you will be able to have 64-bit-wide scalar values.
use64bitall goes all the way by attempting to switch integers (if it can), longs, and pointers to being 64-bit. This may create an even more binary-incompatible Perl than -Duse64bitint: the resulting executable may not run at all on a 32-bit box, or you may have to reboot/reconfigure/rebuild your operating system to be 64-bit aware.
Natively 64-bit systems like Alpha and Cray need neither -Duse64bitint nor -Duse64bitall.
Last but not least: note that due to Perl's habit of always using floating point numbers, the quads are still not true integers. When quads overflow their limits (0...18_446_744_073_709_551_615 unsigned, -9_223_372_036_854_775_808...9_223_372_036_854_775_807 signed), they are silently promoted to floating point numbers, after which they will start losing precision (in their lower digits).
- NOTE: 64-bit support is still experimental on most platforms.
- Existing support only covers the LP64 data model. In particular, the
- LLP64 data model is not yet supported. 64-bit libraries and system
- APIs on many platforms have not stabilized--your mileage may vary.
If you have filesystems that support "large files" (files larger than 2 gigabytes), you may now also be able to create and access them from Perl.
- NOTE: The default action is to enable large file support, if
- available on the platform.
If the large file support is on, and you have a Fcntl constant O_LARGEFILE, the O_LARGEFILE is automatically added to the flags of sysopen().
Beware that unless your filesystem also supports "sparse files" seeking to umpteen petabytes may be inadvisable.
Note that in addition to requiring a proper file system to do large files you may also need to adjust your per-process (or your per-system, or per-process-group, or per-user-group) maximum filesize limits before running Perl scripts that try to handle large files, especially if you intend to write such files.
Finally, in addition to your process/process group maximum filesize limits, you may have quota limits on your filesystems that stop you (your user id or your user group id) from using large files.
Adjusting your process/user/group/file system/operating system limits is outside the scope of Perl core language. For process limits, you may try increasing the limits using your shell's limits/limit/ulimit command before running Perl. The BSD::Resource extension (not included with the standard Perl distribution) may also be of use, it offers the getrlimit/setrlimit interface that can be used to adjust process resource usage limits, including the maximum filesize limit.
In some systems you may be able to use long doubles to enhance the range and precision of your double precision floating point numbers (that is, Perl's numbers). Use Configure -Duselongdouble to enable this support (if it is available).
You can "Configure -Dusemorebits" to turn on both the 64-bit support and the long double support.
Perl subroutines with a prototype of ($$), and XSUBs in general, can now be used as sort subroutines. In either case, the two elements to be compared are passed as normal parameters in @_. See sort.
For unprototyped sort subroutines, the historical behavior of passing the elements to be compared as the global variables $a and $b remains unchanged.
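A short sketch of the prototyped form (the sub name by_num is illustrative):

```perl
# With a ($$) prototype the two elements arrive in @_, not in $a/$b:
sub by_num ($$) {
    my ($left, $right) = @_;
    $left <=> $right;
}

my @sorted = sort by_num 10, 2, 33, 4;
print "@sorted\n";   # 2 4 10 33
```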
sort $coderef @foo allowed
sort() did not accept a subroutine reference as the comparison function in earlier versions. This is now permitted.
Perl now uses the File::Glob implementation of the glob() operator automatically. This avoids using an external csh process and the problems associated with it.
- NOTE: This is currently an experimental feature. Interfaces and
- implementation are subject to change.
In addition to BEGIN, INIT, END, DESTROY and AUTOLOAD, subroutines named CHECK are now special. These are queued up during compilation and behave similarly to END blocks, except they are called at the end of compilation rather than at the end of execution. They cannot be called directly.
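A sketch of the ordering (run as a standalone script; the @phases array is illustrative):

```perl
our @phases;
BEGIN { push @phases, 'begin' }   # runs as soon as it is compiled
CHECK { push @phases, 'check' }   # queued; runs at the end of compilation
push @phases, 'run';              # ordinary runtime code, runs last
print join(' ', @phases), "\n";   # begin check run
```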
For example to match alphabetic characters use /[[:alpha:]]/. See perlre for details.
In 5.005_0x and earlier, perl's rand() function used the C library rand(3) function. As of 5.005_52, Configure tests for drand48(), random(), and rand() (in that order) and picks the first one it finds.
These changes should result in better random numbers from rand().
qw// operator
The qw// operator is now evaluated at compile time into a true list instead of being replaced with a run time call to split(). This removes the confusing misbehaviour of qw// in scalar context, which had inherited that behaviour from split().
Thus:
- $foo = ($bar) = qw(a b c); print "$foo|$bar\n";
now correctly prints "3|a", instead of "2|a".
Small changes in the hashing algorithm have been implemented in order to improve the distribution of lower order bits in the hashed value. This is expected to yield better performance on keys that are repeated sequences.
The new format type 'Z' is useful for packing and unpacking null-terminated strings. See pack.
The new format type modifier '!' is useful for packing and unpacking native shorts, ints, and longs. See pack.
The template character '/' can be used to specify a counted string type to be packed or unpacked. See pack.
The '#' character in a template introduces a comment up to end of the line. This facilitates documentation of pack() templates.
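A sketch exercising the three new template features described above:

```perl
# 'Z' packs a null-terminated (ASCIIZ) string:
my $z = pack('Z8', 'hi');                    # "hi" plus six NUL bytes
die unless unpack('Z8', $z) eq 'hi';

# '/' packs a counted string: a count format, then the string format:
my $counted = pack('N/a*', 'hello');         # 32-bit length, then the bytes
die unless unpack('N/a*', $counted) eq 'hello';

# '#' starts a comment that runs to the end of the template line:
my ($n) = unpack('n   # one big-endian 16-bit value', pack('n', 513));
die unless $n == 513;

print "pack template checks passed\n";
```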
In previous versions of Perl, you couldn't cache objects so as to allow them to be deleted if the last reference from outside the cache is deleted. The reference in the cache would hold a reference count on the object and the objects would never be destroyed.
Another familiar problem is with circular references. When an object references itself, its reference count would never go down to zero, and it would not get destroyed until the program is about to exit.
Weak references solve this by allowing you to "weaken" any reference, that is, make it not count towards the reference count. When the last non-weak reference to an object is deleted, the object is destroyed and all the weak references to the object are automatically undef-ed.
To use this feature, you need the Devel::WeakRef package from CPAN, which contains additional documentation.
- NOTE: This is an experimental feature. Details are subject to change.
Binary numbers are now supported as literals, in s?printf formats, and oct():
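For example (a sketch; the values are illustrative):

```perl
my $answer = 0b101010;                    # binary literal
printf "%b\n", $answer;                   # prints 101010
die unless oct('0b101010') == 42;         # oct() understands the 0b prefix
die unless sprintf('%b', 10) eq '1010';   # %b formats in binary
```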
Subroutines can now return modifiable lvalues. See Lvalue subroutines in perlsub.
- NOTE: This is an experimental feature. Details are subject to change.
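A minimal sketch (the sub name count is illustrative; note that in this form the last evaluated expression, not a return statement, is the assignable lvalue):

```perl
my $count = 0;
sub count : lvalue { $count }   # last expression is the lvalue

count() = 7;                    # assigns through the subroutine call
print count(), "\n";            # 7
```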
Perl now allows the arrow to be omitted in many constructs involving subroutine calls through references. For example, $foo[10]->('foo') may now be written $foo[10]('foo'). This is rather similar to how the arrow may be omitted from $foo[10]->{'foo'}. Note however, that the arrow is still required for foo(10)->('bar').
Constructs such as ($a ||= 2) += 1 are now allowed.
The exists() builtin now works on subroutine names. A subroutine is considered to exist if it has been declared (even if implicitly). See exists for examples.
The exists() and delete() builtins now work on simple arrays as well. The behavior is similar to that on hash elements.
exists() can be used to check whether an array element has been initialized. This avoids autovivifying array elements that don't exist. If the array is tied, the EXISTS() method in the corresponding tied package will be invoked.
delete() may be used to remove an element from the array and return it. The array element at that position returns to its uninitialized state, so that testing for the same element with exists() will return false. If the element happens to be the one at the end, the size of the array also shrinks up to the highest element that tests true for exists(), or 0 if none such is found. If the array is tied, the DELETE() method in the corresponding tied package will be invoked.
See exists and delete for examples.
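A sketch covering both builtins on an ordinary array:

```perl
my @a = (1, 2, 3);
die unless exists $a[1];          # element is initialized
die if     exists $a[9];          # checking does not autovivify

my $last = delete $a[2];          # removing the last element...
die unless $last == 3;
die unless @a == 2;               # ...shrinks the array

print "array exists/delete checks passed\n";
```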
Dereferencing some types of reference values in a pseudo-hash, such as $ph->{foo}[1], was accidentally disallowed. This has been corrected.
When applied to a pseudo-hash element, exists() now reports whether the specified value exists, not merely if the key is valid.
delete() now works on pseudo-hashes. When given a pseudo-hash element or slice it deletes the values corresponding to the keys (but not the keys themselves). See Pseudo-hashes: Using an array as a hash in perlref.
Pseudo-hash slices with constant keys are now optimized to array lookups at compile-time.
List assignments to pseudo-hash slices are now supported.
The fields pragma now provides ways to create pseudo-hashes, via fields::new() and fields::phash(). See fields.
- NOTE: The pseudo-hash data type continues to be experimental.
- Limiting oneself to the interface elements provided by the
- fields pragma will provide protection from any future changes.
fork(), exec(), system(), qx//, and pipe open()s now flush buffers of all files opened for output when the operation was attempted. This mostly eliminates confusing buffering mishaps suffered by users unaware of how Perl internally handles I/O.
This is not supported on some platforms like Solaris where a suitably correct implementation of fflush(NULL) isn't available.
Constructs such as open(<FH>) and close(<FH>) are compile time errors. Attempting to read from filehandles that were opened only for writing will now produce warnings (just as writing to read-only filehandles does).
open(NEW, "<&OLD") now attempts to discard any data that was previously read and buffered in OLD before duping the handle. On platforms where doing this is allowed, the next read operation on NEW will return the same data as the corresponding operation on OLD. Formerly, it would have returned the data from the start of the following disk block instead.
eof() would return true if no attempt to read from <> had yet been made. eof() has been changed to have a little magic of its own: it now opens the <> files.
binmode() now accepts a second argument that specifies a discipline for the handle in question. The two pseudo-disciplines ":raw" and ":crlf" are currently supported on DOS-derivative platforms. See binmode and open.
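A minimal sketch of the new two-argument form (the filename is invented; on Unix the discipline is a no-op):

```perl
# Open a file and request the ":raw" pseudo-discipline, so no CRLF
# translation happens on DOS-derivative platforms.
open(FH, ">demo.bin") or die "can't open: $!";
binmode(FH, ":raw");      # new second argument: a discipline name
print FH "line\n";        # "\n" is written as a single LF byte
close(FH);
unlink "demo.bin";        # tidy up after the demonstration
```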
The algorithm used for the -T filetest has been enhanced to correctly identify UTF-8 encoded files as "text".
On Unix and similar platforms, system(), qx() and open(FOO, "cmd |") etc., are implemented via fork() and exec(). When the underlying exec() fails, earlier versions did not report the error properly, since the exec() happened to be in a different process.
The child process now communicates with the parent about the error in launching the external command, which allows these constructs to return with their usual error value and set $!.
Line numbers are no longer suppressed (under most likely circumstances) during the global destruction phase.
Diagnostics emitted from code running in threads other than the main thread are now accompanied by the thread ID.
Embedded null characters in diagnostics now actually show up. They used to truncate the message in prior versions.
$foo::a and $foo::b are now exempt from "possible typo" warnings only if sort() is encountered in package foo.
Unrecognized alphabetic escapes encountered when parsing quote constructs now generate a warning, since they may take on new semantics in later versions of Perl.
Many diagnostics now report the internal operation in which the warning was provoked, like so:
- Use of uninitialized value in concatenation (.) at (eval 1) line 1.
- Use of uninitialized value in print at (eval 1) line 1.
Diagnostics that occur within eval may also report the file and line number where the eval is located, in addition to the eval sequence number and the line number within the evaluated text itself. For example:
- Not enough arguments for scalar at (eval 4)[newlib/perl5db.pl:1411] line 2, at EOF
Diagnostic output now goes to whichever file the STDERR handle is pointing at, instead of always going to the underlying C runtime library's stderr.
On systems that support a close-on-exec flag on filehandles, the flag is now set for any handles created by pipe(), socketpair(), socket(), and accept(), if that is warranted by the value of $^F that may be in effect. Earlier versions neglected to set the flag for handles created with these operators. See pipe, socketpair, socket, accept, and $^F in perlvar.
The length argument of syswrite() has become optional.
Expressions such as:
- print defined(&foo, &bar, &baz);
- print uc("foo", "bar", "baz");
- undef($foo, &bar);
used to be accidentally allowed in earlier versions, and produced unpredictable behaviour. Some produced ancillary warnings when used in this way; others silently did the wrong thing.
The parenthesized forms of most unary operators that expect a single argument now ensure that they are not called with more than one argument, making the cases shown above syntax errors. The usual behaviour of:
- print defined &foo, &bar, &baz;
- print uc "foo", "bar", "baz";
- undef $foo, &bar;
remains unchanged. See perlop.
The bit operators (& | ^ ~ << >>) now operate on the full native integral width (the exact size of which is available in $Config{ivsize}). For example, if your platform is either natively 64-bit or if Perl has been configured to use 64-bit integers, these operations apply to 8 bytes (as opposed to 4 bytes on 32-bit platforms). For portability, be sure to mask off the excess bits in the result of unary ~, e.g., ~$x & 0xffffffff.
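A minimal illustration of the masking idiom:

```perl
# ~ flips every bit of the native integer, so on a 64-bit Perl the top
# 32 bits end up set as well; masking keeps the result portable.
my $x      = 0x0f;
my $masked = ~$x & 0xffffffff;
printf "0x%08x\n", $masked;    # 0xfffffff0 on both 32- and 64-bit builds
```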
More potentially unsafe operations taint their results for improved security.
The passwd and shell fields returned by getpwent(), getpwnam(), and getpwuid() are now tainted, because the user can affect their own encrypted password and login shell.
The variable modified by shmread(), and messages returned by msgrcv() (and its object-oriented interface IPC::SysV::Msg::rcv) are also tainted, because other untrusted processes can modify messages and shared memory segments for their own nefarious purposes.
Bareword prototypes have been rationalized to enable them to be used
to override builtins that accept barewords and interpret them in
a special way, such as require or do.
Arguments prototyped as * will now be visible within the subroutine as either a simple scalar or as a reference to a typeglob. See Prototypes in perlsub.
require and do 'file' operations may be overridden locally by importing subroutines of the same name into the current package (or globally by importing them into the CORE::GLOBAL:: namespace). Overriding require will also affect use, provided the override is visible at compile-time. See Overriding Built-in Functions in perlsub.
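A sketch of a global override; the logging side effect is invented for illustration:

```perl
BEGIN {
    # Install a global override: every require (and use) compiled after
    # this point goes through it before the real require runs.
    *CORE::GLOBAL::require = sub {
        my ($file) = @_;
        print STDERR "loading $file\n";   # illustrative side effect
        return CORE::require($file);      # delegate to the builtin
    };
}
use File::Basename;   # compiled later, so the override sees "File/Basename.pm"
print File::Basename::basename("/usr/bin/perl"), "\n";   # perl
```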
Formerly, $^X was synonymous with ${"\cX"}, but $^XY was a syntax error. Now variable names that begin with a control character may be arbitrarily long. However, for compatibility reasons, these variables must be written with explicit braces, as ${^XY} for example. ${^XYZ} is synonymous with ${"\cXYZ"}. Variable names with more than one control character, such as ${^XY^Z}, are illegal.
The old syntax has not changed. As before, `^X' may be either a literal control-X character or the two-character sequence `caret' plus `X'. When braces are omitted, the variable name stops after the control character. Thus "$^XYZ" continues to be synonymous with $^X . "YZ" as before.
As before, lexical variables may not have names beginning with control
characters. As before, variables whose names begin with a control
character are always forced to be in package `main'. All such variables
are reserved for future extensions, except those that begin with
^_, which may be used by user programs and are guaranteed not to
acquire special meaning in any future version of Perl.
$^C has a boolean value that reflects whether perl is being run in compile-only mode (i.e. via the -c switch). Since BEGIN blocks are executed under such conditions, this variable enables perl code to determine whether actions that make sense only during normal running are warranted. See perlvar.
$^V contains the Perl version number as a string composed of characters whose ordinals match the version numbers, i.e. v5.6.0. This may be used in string comparisons. See Support for strings represented as a vector of ordinals for an example.
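For instance:

```perl
# $^V is a v-string, so a plain string comparison orders versions correctly.
if ($^V ge v5.6.0) {
    printf "running perl %vd\n", $^V;   # %vd prints the ordinals as numbers
}
```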
If Perl is built with the cpp macro PERL_Y2KWARN defined, it emits optional warnings when concatenating the number 19 with another number.
This behavior must be specifically enabled when running Configure. See INSTALL and README.Y2K.
In double-quoted strings, arrays now interpolate, no matter what. The behavior in earlier versions of perl 5 was that arrays would interpolate into strings if the array had been mentioned before the string was compiled, and otherwise Perl would raise a fatal compile-time error. In versions 5.000 through 5.003, the error was
- Literal @example now requires backslash
In versions 5.004_01 through 5.6.0, the error was
- In string, @example now must be written as \@example
The idea here was to get people into the habit of writing "fred\@example.com" when they wanted a literal @ sign, just as they have always written "Give me back my \$5" when they wanted a literal $ sign.
Starting with 5.6.1, when Perl sees an @ sign in a double-quoted string, it always attempts to interpolate an array, regardless of whether or not the array has been used or declared already. The fatal error has been downgraded to an optional warning:
- Possible unintended interpolation of @example in string
This warns you that "fred@example.com" is going to turn into fred.com if you don't backslash the @.
See http://perl.plover.com/at-error.html for more details
about the history here.
The new magic variables @- and @+ provide the starting and ending offsets, respectively, of $&, $1, $2, etc. See perlvar for details.
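A short example of the new offset arrays (pattern and string are invented):

```perl
"foobar" =~ /o(ba)/ or die "no match";
# $-[0] and $+[0] bound the whole match ($&);
# $-[1] and $+[1] bound the first capture ($1).
print "match at $-[0]..$+[0]\n";    # match at 2..5
print "group at $-[1]..$+[1]\n";    # group at 3..5
```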
While used internally by Perl as a pragma, this module also provides a way to fetch subroutine and variable attributes. See attributes.
The Perl Compiler suite has been extensively reworked for this release. More of the standard Perl test suite passes when run under the Compiler, but there is still a significant way to go to achieve production quality compiled executables.
- NOTE: The Compiler suite remains highly experimental. The
- generated code may not be correct, even when it manages to execute
- without errors.
Overall, Benchmark results exhibit lower average error and better timing accuracy.
You can now run tests for n seconds instead of guessing the right number of tests to run: e.g., timethese(-5, ...) will run each code for at least 5 CPU seconds. Zero as the "number of repetitions" means "for at least 3 CPU seconds". The output format has also changed. For example:
will now output something like this:
- Benchmark: running a, b, each for at least 5 CPU seconds...
- a: 5 wallclock secs ( 5.77 usr + 0.00 sys = 5.77 CPU) @ 200551.91/s (n=1156516)
- b: 4 wallclock secs ( 5.00 usr + 0.02 sys = 5.02 CPU) @ 159605.18/s (n=800686)
New features: "each for at least N CPU seconds...", "wallclock secs", and the "@ operations/CPU second (n=operations)".
timethese() now returns a reference to a hash of Benchmark objects containing the test results, keyed on the names of the tests.
timethis() now returns the iterations field in the Benchmark result object instead of 0.
timethese(), timethis(), and the new cmpthese() (see below) can also take a format specifier of 'none' to suppress output.
A new function countit() is just like timeit() except that it takes a TIME instead of a COUNT.
A new function cmpthese() prints a chart comparing the results of each test returned from a timethese() call. For each possible pair of tests, the percentage speed difference (iters/sec or seconds/iter) is shown.
For other details, see Benchmark.
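A sketch combining the new pieces (the snippet names are invented):

```perl
use Benchmark qw(timethese cmpthese);

# Run each snippet for at least one CPU second, suppress the per-test
# report with the 'none' format, then print a comparison chart.
my $results = timethese(-1, {
    concat => sub { my $s = ''; $s .= 'x' for 1 .. 100 },
    join   => sub { my $s = join '', ('x') x 100 },
}, 'none');
cmpthese($results);
```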
The ByteLoader is a dedicated extension to generate and run Perl bytecode. See ByteLoader.
References can now be used.
The new version also allows a leading underscore in constant names, but disallows a double leading underscore (as in "__LINE__"). Some other names are disallowed or warned against, including BEGIN, END, etc. Some names which were forced into main:: used to fail silently in some cases; now they're fatal (outside of main::) and an optional warning (inside of main::). The ability to detect whether a constant had been set with a given name has been added.
See constant.
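A short sketch of the new capabilities (the constant names are invented):

```perl
use constant PI       => 4 * atan2(1, 1);
use constant WEEKDAYS => [qw(Mon Tue Wed Thu Fri)];   # a reference as a constant
use constant _SECRET  => 42;                          # leading underscore now allowed

print WEEKDAYS->[0], "\n";    # Mon
printf "%.5f\n", PI;          # 3.14159
```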
This pragma implements the \N string escape. See charnames.
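For example:

```perl
use charnames ':full';
# \N{...} interpolates the character with the given Unicode name.
my $alpha = "\N{GREEK SMALL LETTER ALPHA}";
printf "U+%04X\n", ord $alpha;    # U+03B1
```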
A Maxdepth setting can be specified to avoid venturing too deeply into deep data structures. See Data::Dumper.
The XSUB implementation of Dump() is now automatically called if the Useqq setting is not in use.
Dumping qr// objects works correctly.
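A minimal sketch of the Maxdepth setting (the data structure is invented):

```perl
use Data::Dumper;

my $deep = { a => { b => { c => 1 } } };
$Data::Dumper::Maxdepth = 2;    # stop descending after two levels
print Dumper($deep);            # the innermost hash appears as 'HASH(0x...)'
```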
DB is an experimental module that exposes a clean abstraction to Perl's debugging API.
DB_File can now be built with Berkeley DB versions 1, 2 or 3. See ext/DB_File/Changes.
Devel::DProf, a Perl source code profiler, has been added. See Devel::DProf and dprofpp.
The Devel::Peek module provides access to the internal representation of Perl variables and data. It is a data debugging tool for the XS programmer.
The Dumpvalue module provides screen dumps of Perl data.
DynaLoader now supports a dl_unload_file() function on platforms that support unloading shared objects using dlclose().
Perl can also optionally arrange to unload all extension shared objects loaded by Perl. To enable this, build Perl with the Configure option -Accflags=-DDL_UNLOAD_ALL_AT_EXIT. (This may be useful if you are using Apache with mod_perl.)
$PERL_VERSION now stands for $^V (a string value) rather than for $] (a numeric value).
Env now supports accessing environment variables like PATH as array variables.
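For example (assuming a Unix-style colon-separated PATH):

```perl
use Env qw(@PATH);                  # ties @PATH to $ENV{PATH}
print scalar(@PATH), " directories in PATH\n";
unshift @PATH, '/usr/local/bin';    # updates $ENV{PATH} itself
print $ENV{PATH} =~ m{^/usr/local/bin} ? "prepended\n" : "?\n";
```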
More Fcntl constants added: F_SETLK64, F_SETLKW64, O_LARGEFILE for large file (more than 4GB) access (NOTE: the O_LARGEFILE is automatically added to sysopen() flags if large file support has been configured, as is the default), Free/Net/OpenBSD locking behaviour flags F_FLOCK, F_POSIX, Linux F_SHLCK, and O_ACCMODE: the combined mask of O_RDONLY, O_WRONLY, and O_RDWR. The seek()/sysseek() constants SEEK_SET, SEEK_CUR, and SEEK_END are available via the :seek tag. The chmod()/stat() S_IF* constants and S_IS* functions are available via the :mode tag.
A compare_text() function has been added, which allows custom comparison functions. See File::Compare.
File::Find now works correctly when the wanted() function is either autoloaded or is a symbolic reference.
A bug that caused File::Find to lose track of the working directory when pruning top-level directories has been fixed.
File::Find now also supports several other options to control its behavior. It can follow symbolic links if the follow option is specified. Enabling the no_chdir option will make File::Find skip changing the current directory when walking directories. The untaint flag can be useful when running with taint checks enabled. See File::Find.
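The options are passed as a hash reference in place of the wanted() function; a minimal sketch:

```perl
use File::Find;

my @entries;
find({ wanted   => sub { push @entries, $File::Find::name },
       no_chdir => 1 },     # stay in the starting directory while walking
     '.');
print scalar(@entries), " entries found under .\n";
```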
This extension implements BSD-style file globbing. By default, it will also be used for the internal implementation of the glob() operator. See File::Glob.
New methods have been added to the File::Spec module: devnull() returns the name of the null device (/dev/null on Unix) and tmpdir() the name of the temp directory (normally /tmp on Unix). There are now also methods to convert between absolute and relative filenames: abs2rel() and rel2abs(). For compatibility with operating systems that specify volume names in file paths, the splitpath(), splitdir(), and catdir() methods have been added.
The new File::Spec::Functions module provides a function interface to the File::Spec module, allowing the shorthand
- $fullname = catfile($dir1, $dir2, $file);
instead of
- $fullname = File::Spec->catfile($dir1, $dir2, $file);
Getopt::Long licensing has changed to allow the Perl Artistic License as well as the GPL. It used to be GPL only, which got in the way of non-GPL applications that wanted to use Getopt::Long.
Getopt::Long encourages the use of Pod::Usage to produce help messages. For example:
- use Getopt::Long;
- use Pod::Usage;
- my $man = 0;
- my $help = 0;
- GetOptions('help|?' => \$help, man => \$man) or pod2usage(2);
- pod2usage(1) if $help;
- pod2usage(-exitstatus => 0, -verbose => 2) if $man;
- __END__
- =head1 NAME
- sample - Using Getopt::Long and Pod::Usage
- =head1 SYNOPSIS
- sample [options] [file ...]
- Options:
- -help brief help message
- -man full documentation
- =head1 OPTIONS
- =over 8
- =item B<-help>
- Print a brief help message and exits.
- =item B<-man>
- Prints the manual page and exits.
- =back
- =head1 DESCRIPTION
- B<This program> will read the given input file(s) and do something
- useful with the contents thereof.
- =cut
See Pod::Usage for details.
A bug that prevented the non-option call-back <> from being specified as the first argument has been fixed.
To specify the characters < and > as option starters, use ><. Note, however, that changing option starters is strongly deprecated.
write() and syswrite() will now accept a single-argument form of the call, for consistency with Perl's syswrite().
You can now create a TCP-based IO::Socket::INET without forcing a connect attempt. This allows you to configure its options (like making it non-blocking) and then call connect() manually.
A bug that prevented the IO::Socket::protocol() accessor from ever returning the correct value has been corrected.
IO::Socket::connect now uses non-blocking IO instead of alarm() to do connect timeouts.
IO::Socket::accept now uses select() instead of alarm() for doing timeouts.
IO::Socket::INET->new now sets $! correctly on failure. $@ is still set for backwards compatibility.
Java Perl Lingo is now distributed with Perl. See jpl/README for more information.
use lib now weeds out any trailing duplicate entries. no lib removes all named entries.
The bitwise operations <<, >>, &, |, and ~ are now supported on bigints.
The accessor methods Re, Im, arg, abs, rho, and theta can now also act as mutators (accessor $z->Re(), mutator $z->Re(3)).
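For example:

```perl
use Math::Complex;

my $z = cplx(3, 4);
print $z->Re, "+", $z->Im, "i\n";   # 3+4i
$z->Re(5);                          # the accessor doubles as a mutator
print abs($z), "\n";                # magnitude, sqrt(5**2 + 4**2)
```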
The class method display_format and the corresponding object method display_format, in addition to accepting just one argument, can now also accept a parameter hash. Recognized keys of a parameter hash are "style", which corresponds to the old one-parameter case, and two new parameters: "format", a printf()-style format string (usually defaulting to "%.15g"; you can revert to the default by setting the format string to undef) used for both parts of a complex number, and "polar_pretty_print" (defaults to true), which controls whether an attempt is made to recognize small multiples and rationals of pi (2pi, pi/2) in the argument (angle) of a polar complex number.
The potentially disruptive change is that in list context both methods now return the parameter hash, instead of only the value of the "style" parameter.
A little bit of radial trigonometry (cylindrical and spherical), radial coordinate conversions, and the great circle distance were added.
Pod::Parser is a base class for parsing and selecting sections of pod documentation from an input stream. This module takes care of identifying pod paragraphs and commands in the input and hands off the parsed paragraphs and commands to user-defined methods which are free to interpret or translate them as they see fit.
Pod::InputObjects defines some input objects needed by Pod::Parser, and for advanced users of Pod::Parser who need more information about a command besides its name and text.
As of release 5.6.0 of Perl, Pod::Parser is now the officially sanctioned "base parser code" recommended for use by all pod2xxx translators. Pod::Text (pod2text) and Pod::Man (pod2man) have already been converted to use Pod::Parser and efforts to convert Pod::HTML (pod2html) are already underway. For any questions or comments about pod parsing and translating issues and utilities, please use the pod-people@perl.org mailing list.
For further information, please see Pod::Parser and Pod::InputObjects.
This utility checks pod files for correct syntax, according to perlpod. Obvious errors are flagged as such, while warnings are printed for mistakes that can be handled gracefully. The checklist is not complete yet. See Pod::Checker.
These modules provide a set of gizmos that are useful mainly for pod translators. Pod::Find traverses directory structures and returns found pod files, along with their canonical names (like File::Spec::Unix). Pod::ParseUtils contains Pod::List (useful for storing pod list information), Pod::Hyperlink (for parsing the contents of L<> sequences) and Pod::Cache (for caching information about pod files, e.g., link nodes).
Pod::Select is a subclass of Pod::Parser which provides a function named "podselect()" to filter out user-specified sections of raw pod documentation from an input stream. podselect is a script that provides access to Pod::Select from other scripts to be used as a filter. See Pod::Select.
Pod::Usage provides the function "pod2usage()" to print usage messages for a Perl script based on its embedded pod documentation. The pod2usage() function is generally useful to all script authors since it lets them write and maintain a single source (the pods) for documentation, thus removing the need to create and maintain redundant usage message text consisting of information already in the pods.
There is also a pod2usage script which can be used from other kinds of scripts to print usage messages from pods (even for non-Perl scripts with pods embedded in comments).
For details and examples, please see Pod::Usage.
Pod::Text has been rewritten to use Pod::Parser. While pod2text() is still available for backwards compatibility, the module now has a new preferred interface. See Pod::Text for the details. The new Pod::Text module is easily subclassed for tweaks to the output, and two such subclasses (Pod::Text::Termcap for man-page-style bold and underlining using termcap information, and Pod::Text::Color for markup with ANSI color sequences) are now standard.
pod2man has been turned into a module, Pod::Man, which also uses Pod::Parser. In the process, several outstanding bugs related to quotes in section headers, quoting of code escapes, and nested lists have been fixed. pod2man is now a wrapper script around this module.
An EXISTS method has been added to this module (and sdbm_exists() has been added to the underlying sdbm library), so one can now call exists on an SDBM_File tied hash and get the correct result, rather than a runtime error.
A bug that may have caused data loss when more than one disk block happens to be read from the database in a single FETCH() has been fixed.
Sys::Syslog now uses XSUBs to access facilities from syslog.h so it no longer requires syslog.ph to exist.
Sys::Hostname now uses XSUBs to call the C library's gethostname() or uname() if they exist.
Term::ANSIColor is a very simple module to provide easy and readable access to the ANSI color and highlighting escape sequences, supported by most ANSI terminal emulators. It is now included standard.
The timelocal() and timegm() functions used to silently return bogus results when the date fell outside the machine's integer range. They now consistently croak() if the date falls in an unsupported range.
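A short illustration of both points (the out-of-range year is invented):

```perl
use Time::Local;

# timegm(sec, min, hour, mday, mon, year): months count from 0,
# years from 1900 (here 100 == year 2000).
my $t = timegm(0, 0, 0, 1, 0, 100);
print "$t\n";    # 946684800, i.e. 2000-01-01 00:00:00 UTC

# Dates outside the supported range now croak() instead of
# silently returning bogus results:
eval { timegm(0, 0, 0, 1, 0, 1e12) };
print "croaked: $@" if $@;
```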
The error return value in list context has been changed for all functions that return a list of values. Previously these functions returned a list with a single element undef if an error occurred. Now these functions return the empty list in these situations. This applies to the following functions:
- Win32::FsType
- Win32::GetOSVersion
The remaining functions are unchanged and continue to return undef on error even in list context.
The Win32::SetLastError(ERROR) function has been added as a complement to the Win32::GetLastError() function.
The new Win32::GetFullPathName(FILENAME) returns the full absolute pathname for FILENAME in scalar context. In list context it returns a two-element list containing the fully qualified directory name and the filename. See Win32.
The XSLoader extension is a simpler alternative to DynaLoader. See XSLoader.
A new feature called "DBM Filters" has been added to all the DBM modules--DB_File, GDBM_File, NDBM_File, ODBM_File, and SDBM_File. DBM Filters add four new methods to each DBM module:
- filter_store_key
- filter_store_value
- filter_fetch_key
- filter_fetch_value
These can be used to filter key-value pairs before the pairs are written to the database or just after they are read from the database. See perldbmfilter for further information.
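A sketch following the classic NUL-termination use case (the database name is invented):

```perl
use SDBM_File;
use Fcntl;

my %h;
my $db = tie(%h, 'SDBM_File', 'filter_demo', O_RDWR | O_CREAT, 0640)
    or die "tie failed: $!";

# Append a trailing NUL when storing (e.g. so C programs can read the
# same database), and strip it again when fetching.
$db->filter_store_value(sub { $_ .= "\0" });
$db->filter_fetch_value(sub { s/\0$// });

$h{greeting} = "hello";
print $h{greeting}, "\n";    # hello  (the NUL round-trips invisibly)

untie %h;
unlink 'filter_demo.pag', 'filter_demo.dir';    # tidy up
```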
use attrs is now obsolete, and is only provided for backward-compatibility. It's been replaced by the sub : attributes syntax. See Subroutine Attributes in perlsub and attributes.
The lexical warnings pragma, use warnings;, controls optional warnings. See perllexwarn.
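For example:

```perl
use warnings;

my $x;
{
    no warnings 'uninitialized';   # lexically disable one category
    my $quiet = $x + 1;            # no warning here
}
my $noisy = $x + 1;                # warns: Use of uninitialized value
```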
use filetest controls the behaviour of filetests (-r, -w, ...). Currently only one subpragma is implemented, "use filetest 'access';", which uses access(2) or equivalent to check permissions instead of using stat(2) as usual. This matters on filesystems where there are ACLs (access control lists): the stat(2) might lie, but access(2) knows better.
The open pragma can be used to specify default disciplines for handle constructors (e.g. open()) and for qx//. The two pseudo-disciplines :raw and :crlf are currently supported on DOS-derivative platforms (i.e. where binmode is not a no-op). See also "binmode() can be used to set :crlf and :raw modes".
dprofpp is used to display profile data generated using Devel::DProf. See dprofpp.
The find2perl utility now uses the enhanced features of the File::Find module. The -depth and -follow options are supported. Pod documentation is also included in the script.
The h2xs tool can now work in conjunction with C::Scan (available from CPAN) to automatically parse real-life header files. The -M, -a, -k, and -o options are new.
perlcc now supports the C and Bytecode backends. By default, it generates output from the simple C backend rather than the optimized C backend.
Support for non-Unix platforms has been improved.
perldoc has been reworked to avoid possible security holes. It will not by default let itself be run as the superuser, but you may still use the -U switch to try to make it drop privileges first.
Many bug fixes and enhancements were added to perl5db.pl, the Perl debugger. The help documentation was rearranged and should be viewable once again if you're using less as your pager. New commands include < ?, > ?, and { ? to list out current actions, man docpage to run your doc viewer on some perl docset, and support for quoted options. A serious security hole was plugged--you should immediately remove all older versions of the Perl debugger installed in previous releases, all the way back to perl3, from your system to avoid being bitten by this.
Many of the platform-specific README files are now part of the perl installation. See perl for the complete list.
The official list of public Perl API functions.
A tutorial for beginners on object-oriented Perl.
An introduction to using the Perl Compiler suite.
A howto document on using the DBM filter facility.
All material unrelated to running the Perl debugger, plus all low-level guts-like details that risked crushing the casual user of the debugger, have been relocated from the old manpage to the next entry below.
This new manpage contains excessively low-level material not related to the Perl debugger, but slightly related to debugging Perl itself. It also contains some arcane internal details of how the debugging process works that may only be of interest to developers of Perl debuggers.
Notes on the fork() emulation currently available for the Windows platform.
An introduction to writing Perl source filters.
Some guidelines for hacking the Perl source code.
A list of internal functions in the Perl source code. (List is currently empty.)
Introduction and reference information about lexically scoped warning categories.
Detailed information about numbers as they are represented in Perl.
A tutorial on using open() effectively.
A tutorial that introduces the essentials of references.
A tutorial on managing class data for object modules.
Discussion of the most often wanted features that may someday be supported in Perl.
An introduction to Unicode support features in Perl.
Many common sort() operations using a simple inlined block are now optimized for faster performance.
Certain operations in the RHS of assignment statements have been optimized to directly set the lexical variable on the LHS, eliminating redundant copying overheads.
Minor changes in how subroutine calls are handled internally provide marginal improvements in performance.
The hash values returned by delete(), each(), values() and hashes in a list context are the actual values in the hash, instead of copies. This results in significantly better performance, because it eliminates needless copying in most situations.
The -Dusethreads flag now enables the experimental interpreter-based thread support by default. To get the flavor of experimental threads that was in 5.005 instead, you need to run Configure with "-Dusethreads -Duse5005threads".
As of v5.6.0, interpreter-threads support is still lacking a way to create new threads from Perl (i.e., use Thread; will not work with interpreter threads). use Thread; continues to be available when you specify the -Duse5005threads option to Configure, bugs and all.
- NOTE: Support for threads continues to be an experimental feature.
- Interfaces and implementation are subject to sudden and drastic changes.
The following new flags may be enabled on the Configure command line by running Configure with -Dflag.
- usemultiplicity
- usethreads useithreads (new interpreter threads: no Perl API yet)
- usethreads use5005threads (threads as they were in 5.005)
- use64bitint (equal to now deprecated 'use64bits')
- use64bitall
- uselongdouble
- usemorebits
- uselargefiles
- usesocks (only SOCKS v5 supported)
The Configure options enabling the use of threads and the use of 64-bitness are now more daring, in the sense that they no longer carry an explicit list of operating systems with known threads/64-bit capabilities. In other words: if your operating system has the necessary APIs and datatypes, you should be able to just go ahead and use them, for threads by Configure -Dusethreads, and for 64 bits either explicitly by Configure -Duse64bitint or implicitly if your system has 64-bit wide datatypes. See also 64-bit support.
Some platforms have "long doubles", floating point numbers of even larger range than ordinary "doubles". To enable using long doubles for Perl's scalars, use -Duselongdouble.
You can enable both -Duse64bitint and -Duselongdouble with -Dusemorebits. See also 64-bit support.
Some platforms support system APIs that are capable of handling large files (typically, files larger than two gigabytes). Perl will try to use these APIs if you ask for -Duselargefiles.
See Large file support for more information.
You can use "Configure -Uinstallusrbinperl", which causes installperl to skip installing perl as /usr/bin/perl. This is useful if you prefer not to modify /usr/bin for some reason or another, but can be harmful because many scripts expect to find Perl in /usr/bin/perl.
You can use "Configure -Dusesocks" which causes Perl to probe for the SOCKS proxy protocol library (v5, not v4). For more information on SOCKS, see:
- http://www.socks.nec.com/
You can "post-edit" the Configure variables using the Configure -A switch. The editing happens immediately after the platform-specific hints files have been processed but before the actual configuration process starts. Run Configure -h to find out the full -A syntax.
The installation structure has been enriched to improve the support for maintaining multiple versions of perl, to provide locations for vendor-supplied modules, scripts, and manpages, and to ease maintenance of locally-added modules, scripts, and manpages. See the section on Installation Directories in the INSTALL file for complete details. For most users building and installing from source, the defaults should be fine.
If you previously used Configure -Dsitelib or -Dsitearch to set special values for library directories, you might wish to consider using the new -Dsiteprefix setting instead. Also, if you wish to re-use a config.sh file from an earlier version of perl, you should be sure to check that Configure makes sensible choices for the new directories. See INSTALL for complete details.
In many platforms the vendor-supplied 'cc' is too stripped-down to build Perl (basically, the 'cc' doesn't do ANSI C). If this seems to be the case and the 'cc' does not seem to be the GNU C compiler 'gcc', an automatic attempt is made to find and use 'gcc' instead.
The Mach CThreads (NEXTSTEP, OPENSTEP) are now supported by the Thread extension.
GNU/Hurd is now supported.
Rhapsody/Darwin is now supported.
EPOC is now supported (on Psion 5).
The cygwin port (formerly cygwin32) has been greatly improved.
Perl now works with djgpp 2.02 (and 2.03 alpha).
Environment variable names are not converted to uppercase any more.
Incorrect exit codes from backticks have been fixed.
This port continues to use its own builtin globbing (not File::Glob).
Support for this EBCDIC platform has not been renewed in this release. There are difficulties in reconciling Perl's standardization on UTF-8 as its internal representation for characters with the EBCDIC character set, because the two are incompatible.
It is unclear whether future versions will renew support for this platform, but the possibility exists.
Numerous revisions and extensions to configuration, build, testing, and installation process to accommodate core changes and VMS-specific options.
Expand %ENV-handling code to allow runtime mapping to logical names, CLI symbols, and CRTL environ array.
Extension of subprocess invocation code to accept filespecs as command "verbs".
Add to Perl command line processing the ability to use default file types and to recognize Unix-style 2>&1.
Expansion of File::Spec::VMS routines, and integration into ExtUtils::MM_VMS.
Extension of ExtUtils::MM_VMS to handle complex extensions more flexibly.
Barewords at start of Unix-syntax paths may be treated as text rather than only as logical names.
Optional secure translation of several logical names used internally by Perl.
Miscellaneous bugfixing and porting of new core code to VMS.
Thanks are gladly extended to the many people who have contributed VMS patches, testing, and ideas.
Perl can now emulate fork() internally, using multiple interpreters running in different concurrent threads. This support must be enabled at build time. See perlfork for detailed information.
When given a pathname that consists only of a drivename, such as A:, opendir() and stat() now use the current working directory for the drive rather than the drive root.
The builtin XSUB functions in the Win32:: namespace are documented. See Win32.
$^X now contains the full path name of the running executable.
A Win32::GetLongPathName() function is provided to complement Win32::GetFullPathName() and Win32::GetShortPathName(). See Win32.
POSIX::uname() is supported.
system(1,...) now returns true process IDs rather than process handles. kill() accepts any real process id, rather than strictly return values from system(1,...).
For better compatibility with Unix, kill(0, $pid)
can now be used to
test whether a process exists.
The Shell
module is supported.
Better support for building Perl under command.com in Windows 95 has been added.
Scripts are read in binary mode by default to allow ByteLoader (and the filter mechanism in general) to work properly. For compatibility, the DATA filehandle will be set to text mode if a carriage return is detected at the end of the line containing the __END__ or __DATA__ token; if not, the DATA filehandle will be left open in binary mode. Earlier versions always opened the DATA filehandle in text mode.
The glob() operator is implemented via the File::Glob extension, which supports the glob syntax of the C shell. This increases the flexibility of the glob() operator, but there may be compatibility issues for programs that relied on the older globbing syntax. If you want to preserve compatibility with the older syntax, you might want to run perl with -MFile::DosGlob. For details and compatibility information, see File::Glob.
With $/ set to undef, "slurping" an empty file returns a string of zero length (instead of undef, as it used to) the first time the HANDLE is read after $/ is set to undef. Further reads yield undef.
This means that the following will append "foo" to an empty file (it used to do nothing):
- perl -0777 -pi -e 's/^/foo/' empty_file
The behaviour of:
- perl -pi -e 's/^/foo/' empty_file
is unchanged (it continues to leave the file empty).
eval '...' improvements
Line numbers (as reflected by caller() and most diagnostics) within eval '...' were often incorrect where here documents were involved. This has been corrected.
Lexical lookups for variables appearing in eval '...'
within
functions that were themselves called within an eval '...'
were
searching the wrong place for lexicals. The lexical search now
correctly ends at the subroutine's block boundary.
The use of return within eval {...}
caused $@ not to be reset
correctly when no exception occurred within the eval. This has
been fixed.
Parsing of here documents used to be flawed when they appeared as the replacement expression in eval 's/.../.../e'. This has been fixed.
Some "errors" encountered at compile time were by necessity generated as warnings followed by eventual termination of the program. This enabled more such errors to be reported in a single run, rather than causing a hard stop at the first error that was encountered.
The mechanism for reporting such errors has been reimplemented
to queue compile-time errors and report them at the end of the
compilation as true errors rather than as warnings. This fixes
cases where error messages leaked through in the form of warnings
when code was compiled at run time using eval STRING
, and
also allows such errors to be reliably trapped using eval "..."
.
Sometimes implicitly closed filehandles (as when they are localized, and Perl automatically closes them on exiting the scope) could inadvertently set $? or $!. This has been corrected.
When taking a slice of a literal list (as opposed to a slice of an array or hash), Perl used to return an empty list if the result happened to be composed of all undef values.
The new behavior is to produce an empty list if (and only if) the original list was empty. Consider the following example:
- @a = (1,undef,undef)[2,1,2];
The old behavior would have resulted in @a having no elements. The new behavior ensures it has three undefined elements.
Note in particular that the behavior of slices of the following cases remains unchanged:
- @a = ()[1,2];
- @a = (getpwent)[7,0];
- @a = (anything_returning_empty_list())[2,1,2];
- @a = @b[2,1,2];
- @a = @c{'a','b','c'};
See perldata.
(\$) prototype and $foo{a}
A scalar reference prototype now correctly allows a hash or array element in that slot.
goto &sub and AUTOLOAD
The goto &sub construct works correctly when &sub happens to be autoloaded.
-bareword allowed under use integer
The autoquoting of barewords preceded by - did not work in prior versions when the integer pragma was enabled. This has been fixed.
When code in a destructor threw an exception, it went unnoticed in earlier versions of Perl, unless someone happened to be looking in $@ just after the point the destructor happened to run. Such failures are now visible as warnings when warnings are enabled.
printf() and sprintf() previously reset the numeric locale back to the default "C" locale. This has been fixed.
Numbers formatted according to the local numeric locale (such as using a decimal comma instead of a decimal dot) caused "isn't numeric" warnings, even while the operations accessing those numbers produced correct results. These warnings have been discontinued.
The eval 'return sub {...}'
construct could sometimes leak
memory. This has been fixed.
Operations that aren't filehandle constructors used to leak memory when used on invalid filehandles. This has been fixed.
Constructs that modified @_
could fail to deallocate values
in @_
and thus leak memory. This has been corrected.
Perl could sometimes create empty subroutine stubs when a subroutine was not found in the package. Such cases stopped later method lookups from progressing into base packages. This has been corrected.
-U
When running in unsafe mode, taint violations could sometimes cause silent failures. This has been fixed.
-c switch
Prior versions used to run BEGIN and END blocks when Perl was run in compile-only mode. Since this is typically not the expected behavior, END blocks are not executed anymore when the -c switch is used, or if compilation fails.
See Support for CHECK blocks for how to run things when the compile phase ends.
Using the __DATA__
token creates an implicit filehandle to
the file that contains the token. It is the program's
responsibility to close it when it is done reading from it.
This caveat is now better explained in the documentation. See perldata.
(W misc) A "my" or "our" variable has been redeclared in the current scope or statement, effectively eliminating all access to the previous instance. This is almost always a typographical error. Note that the earlier variable will still exist until the end of the scope or until all closure referents to it are destroyed.
(F) Lexically scoped subroutines are not yet implemented. Don't try that yet.
(W misc) You seem to have already declared the same global once before in the current lexical scope.
(F) The '!' is allowed in pack() and unpack() only after certain types. See pack.
(F) You had an unpack template indicating a counted-length string, but you have also specified an explicit size for the string. See pack.
(F) You had an unpack template indicating a counted-length string, which must be followed by one of the letters a, A or Z to indicate what sort of string is to be unpacked. See pack.
(F) You had a pack template indicating a counted-length string. Currently the only things that can have their length counted are a*, A* or Z*. See pack.
(F) You had an unpack template that contained a '#', but this did not follow some numeric unpack specification. See pack.
(W regexp) You used a backslash-character combination which is not recognized by Perl. This combination appears in an interpolated variable or a single-quote-delimited (m'...') regular expression. The character was understood literally.
(W regexp) You used a backslash-character combination which is not recognized by Perl inside character classes. The character was understood literally.
(W syntax) You have used a pattern where Perl expected to find a string,
as in the first argument to join. Perl will treat the true
or false result of matching the pattern against $_ as the string,
which is probably not what you had in mind.
(W prototype) You've called a function that has a prototype before the parser saw a definition or declaration for it, and Perl could not check that the call conforms to the prototype. You need to either add an early prototype declaration for the subroutine in question, or move the subroutine definition ahead of the call to get proper prototype checking. Alternatively, if you are certain that you're calling the function correctly, you may put an ampersand before the name to avoid the warning. See perlsub.
(F) The argument to exists() must be a hash or array element, such as:
- $foo{$bar}
- $ref->{"susie"}[12]
(F) The argument to delete() must be either a hash or array element, such as:
- $foo{$bar}
- $ref->{"susie"}[12]
or a hash or array slice, such as:
- @foo[$bar, $baz, $xyzzy]
- @{$ref->[12]}{"susie", "queue"}
(F) The argument to exists() for exists &sub
must be a subroutine
name, and not a subroutine call. exists &sub()
will generate this error.
(W reserved) A lowercase attribute name was used that had a package-specific handler. That name might have a meaning to Perl itself some day, even though it doesn't yet. Perhaps you should use a mixed-case attribute name, instead. See attributes.
(W misc) This prefix usually indicates that a DESTROY() method raised the indicated exception. Since destructors are usually called by the system at arbitrary points during execution, and often a vast number of times, the warning is issued only once for any number of failures that would otherwise result in the same message being repeated.
Failure of user callbacks dispatched using the G_KEEPERR
flag
could also result in this warning. See G_KEEPERR in perlcall.
(F) You wrote require <file> when you should have written require 'file'.
(F) You tried to join a thread from within itself, which is an impossible task. You may be joining the wrong thread, or you may need to move the join() to some other thread.
(F) You've used the /e switch to evaluate the replacement for a substitution, but perl found a syntax error in the code to evaluate, most likely an unexpected right brace '}'.
(S) An internal routine called realloc() on something that had never been
malloc()ed in the first place. Mandatory, but can be disabled by
setting environment variable PERL_BADFREE
to 1.
(W bareword) The compiler found a bareword where it expected a conditional, which often indicates that an || or && was parsed as part of the last argument of the previous construct, for example:
- open FOO || die;
It may also indicate a misspelled constant that has been interpreted as a bareword:
- use constant TYPO => 1;
- if (TYOP) { print "foo" }
The strict pragma is useful in avoiding such errors.
(W portable) The binary number you specified is larger than 2**32-1 (4294967295) and therefore non-portable between systems. See perlport for more on portability concerns.
(W portable) Using bit vector sizes larger than 32 is non-portable.
(W internal) A warning peculiar to VMS. While Perl was preparing to iterate over %ENV, it encountered a logical name or symbol definition which was too long, so it was truncated to the string shown.
(P) For some reason you can't check the filesystem of the script for nosuid.
(S) Currently, only scalar variables can be declared with a specific class qualifier in a "my" or "our" declaration. The semantics may be extended for other types of variables in the future.
(F) Only scalar, array, and hash variables may be declared as "my" or "our" variables. They must have ordinary identifiers as names.
(W signal) Perl has detected that it is being run with the SIGCHLD signal (sometimes known as SIGCLD) disabled. Since disabling this signal will interfere with proper determination of exit status of child processes, Perl has reset the signal to its default value. This situation typically indicates that the parent program under which Perl may be running (e.g., cron) is being very careless.
(F) Subroutines meant to be used in lvalue context should be declared as such, see Lvalue subroutines in perlsub.
(S) A warning peculiar to VMS. Perl tried to read an element of %ENV from the CRTL's internal environment array and discovered the array was missing. You need to figure out where your CRTL misplaced its environ or define PERL_ENV_TABLES (see perlvms) so that environ is not searched.
(S) You requested an inplace edit without creating a backup file. Perl was unable to remove the original file to replace it with the modified file. The file was left unmodified.
(F) Perl detected an attempt to return illegal lvalues (such as temporary or readonly values) from a subroutine used as an lvalue. This is not allowed.
(F) You attempted to weaken something that was not a reference. Only references can be weakened.
(F) The class in the character class [: :] syntax is unknown. See perlre.
(W unsafe) The character class constructs [: :], [= =], and [. .] go inside character classes, the [] are part of the construct, for example: /[012[:alpha:]345]/. Note that [= =] and [. .] are not currently implemented; they are simply placeholders for future extensions.
(F) A constant value (perhaps declared using the use constant
pragma)
is being dereferenced, but it amounts to the wrong type of reference. The
message indicates the type of reference that was expected. This usually
indicates a syntax error in dereferencing the constant value.
See Constant Functions in perlsub and constant.
(F) The parser found inconsistencies either while attempting to define an
overloaded constant, or when trying to find the character name specified
in the \N{...}
escape. Perhaps you forgot to load the corresponding
overload
or charnames
pragma? See charnames and overload.
(F) The CORE:: namespace is reserved for Perl keywords.
(D) defined() is not usually useful on arrays because it checks for an
undefined scalar value. If you want to see if the array is empty,
just use if (@array) { # not empty } for example.
(D) defined() is not usually useful on hashes because it checks for an
undefined scalar value. If you want to see if the hash is empty,
just use if (%hash) { # not empty } for example.
See Server error.
(W misc) Remember that "our" does not localize the declared global variable. You have declared it again in the same lexical scope, which seems superfluous.
See Server error.
(F) While under the use filetest
pragma, switching the real and
effective uids or gids failed.
(W regexp) A character class range must start and end at a literal character, not
another character class like \d
or [:alpha:]. The "-" in your false
range is interpreted as a literal "-". Consider quoting the "-", "\-".
See perlre.
(W io) You tried to read from a filehandle opened only for writing. If you intended it to be a read/write filehandle, you needed to open it with "+<" or "+>" or "+>>" instead of with "<" or nothing. If you intended only to read from the file, use "<". See open.
(W closed) The filehandle you're attempting to flock() got itself closed some time before now. Check your logic flow. flock() operates on filehandles. Are you attempting to call flock() on a dirhandle by the same name?
(F) You've said "use strict vars", which indicates that all variables must either be lexically scoped (using "my"), declared beforehand using "our", or explicitly qualified to say which package the global variable is in (using "::").
(W portable) The hexadecimal number you specified is larger than 2**32-1 (4294967295) and therefore non-portable between systems. See perlport for more on portability concerns.
(W internal) A warning peculiar to VMS. Perl tried to read the CRTL's internal
environ array, and encountered an element without the =
delimiter
used to separate keys from values. The element is ignored.
(W internal) A warning peculiar to VMS. Perl tried to read a logical name or CLI symbol definition when preparing to iterate over %ENV, and didn't see the expected delimiter between key and value, so the line was ignored.
(F) You used a digit other than 0 or 1 in a binary number.
(W digit) You may have tried to use a digit other than 0 or 1 in a binary number. Interpretation of the binary number stopped before the offending digit.
(F) The number of bits in vec() (the third argument) must be a power of two from 1 to 32 (or 64, if your platform supports that).
(W overflow) The hexadecimal, octal or binary number you have specified either as a literal or as an argument to hex() or oct() is too big for your architecture, and has been converted to a floating point number. On a 32-bit architecture the largest hexadecimal, octal or binary number representable without overflow is 0xFFFFFFFF, 037777777777, or 0b11111111111111111111111111111111 respectively. Note that Perl transparently promotes all numbers to a floating point representation internally--subject to loss of precision errors in subsequent operations.
The indicated attribute for a subroutine or variable was not recognized by Perl or by a user-supplied handler. See attributes.
The indicated attributes for a subroutine or variable were not recognized by Perl or by a user-supplied handler. See attributes.
The offending range is now explicitly displayed.
(F) Something other than a colon or whitespace was seen between the elements of an attribute list. If the previous attribute had a parenthesised parameter list, perhaps that list was terminated too soon. See attributes.
(F) Something other than a colon or whitespace was seen between the elements of a subroutine attribute list. If the previous attribute had a parenthesised parameter list, perhaps that list was terminated too soon.
(F) While under the use filetest
pragma, switching the real and
effective uids or gids failed.
(F) Due to limitations in the current implementation, array and hash values cannot be returned in subroutines used in lvalue context. See Lvalue subroutines in perlsub.
See Server error.
(F) Wrong syntax of character name literal \N{charname}
within
double-quotish context.
(W pipe) You used the open(FH, "| command")
or open(FH, "command |")
construction, but the command was missing or blank.
(F) The reserved syntax for lexically scoped subroutines requires that they have a name with which they can be found.
(F) The indicated command line switch needs a mandatory argument, but you haven't specified one.
(F) Fully qualified variable names are not allowed in "our" declarations, because that doesn't make much sense under existing semantics. Such syntax is reserved for future extensions.
(F) The argument to the indicated command line switch must follow immediately after the switch, without intervening spaces.
(S) A warning peculiar to VMS. Perl was unable to find the local timezone offset, so it's assuming that local system time is equivalent to UTC. If it's not, define the logical name SYS$TIMEZONE_DIFFERENTIAL to translate to the number of seconds which need to be added to UTC to get local time.
(W portable) The octal number you specified is larger than 2**32-1 (4294967295) and therefore non-portable between systems. See perlport for more on portability concerns.
See also perlport for writing portable code.
(P) Failed an internal consistency check while trying to reset a weak reference.
(F) forked child returned an incomprehensible message about its errno.
(P) Failed an internal consistency check while trying to reset all weak references to an object.
(W parenthesis) You said something like
- my $foo, $bar = @_;
when you meant
- my ($foo, $bar) = @_;
Remember that "my", "our", and "local" bind tighter than comma.
(W ambiguous) It used to be that Perl would try to guess whether you wanted an array interpolated or a literal @. It no longer does this; arrays are now always interpolated into strings. This means that if you try something like:
- print "fred@example.com";
and the array @example doesn't exist, Perl is going to print fred.com, which is probably not what you wanted. To get a literal @ sign in a string, put a backslash before it, just as you would to get a literal $ sign.
(W y2k) You are concatenating the number 19 with another number, which could be a potential Year 2000 problem.
(W deprecated) You have written something like this:
- sub doit
- {
- use attrs qw(locked);
- }
You should use the new declaration syntax instead.
- sub doit : locked
- {
- ...
- }
The use attrs
pragma is now obsolete, and is only provided for
backward-compatibility. See Subroutine Attributes in perlsub.
See Server error.
(F) You can't specify a repeat count so large that it overflows your signed integers. See pack.
(F) You can't specify a repeat count so large that it overflows your signed integers. See unpack.
(S) An internal routine called realloc() on something that had already been freed.
(W misc) You have attempted to weaken a reference that is already weak. Doing so has no effect.
(F) Your system has the setpgrp() from BSD 4.2, which takes no arguments, unlike POSIX setpgid(), which takes a process ID and process group ID.
(W regexp) You applied a regular expression quantifier in a place where it makes no sense, such as on a zero-width assertion. Try putting the quantifier inside the assertion instead. For example, the way to match "abc" provided that it is followed by three repetitions of "xyz" is /abc(?=(?:xyz){3})/, not /abc(?=xyz){3}/.
(F) While under the use filetest
pragma, we cannot switch the
real and effective uids or gids.
(W internal) Warnings peculiar to VMS. You tried to change or delete an element of the CRTL's internal environ array, but your copy of Perl wasn't built with a CRTL that contained the setenv() function. You'll need to rebuild Perl with a CRTL that does, or redefine PERL_ENV_TABLES (see perlvms) so that the environ array isn't the target of the change to %ENV which produced the warning.
(W void) A CHECK or INIT block is being defined during run time proper,
when the opportunity to run them has already passed. Perhaps you are
loading a file with require or do when you should be using
use instead. Or perhaps you should put the require or do
inside a BEGIN block.
(F) The second argument of 3-argument open() is not among the list of valid modes: <, >, >>, +<, +>, +>>, -|, |-.
(P) An error peculiar to VMS. Perl was reading values for %ENV before iterating over it, and someone else stuck a message in the stream of data Perl expected. Someone's very confused, or perhaps trying to subvert Perl's population of %ENV for nefarious purposes.
(W misc) You used a backslash-character combination which is not recognized by Perl. The character was understood literally.
(F) The lexer saw an opening (left) parenthesis character while parsing an attribute list, but the matching closing (right) parenthesis character was not found. You may need to add (or remove) a backslash character to get your parentheses to balance. See attributes.
(F) The lexer found something other than a simple identifier at the start of an attribute, and it wasn't a semicolon or the start of a block. Perhaps you terminated the parameter list of the previous attribute too soon. See attributes.
(F) The lexer saw an opening (left) parenthesis character while parsing a subroutine attribute list, but the matching closing (right) parenthesis character was not found. You may need to add (or remove) a backslash character to get your parentheses to balance.
(F) The lexer found something other than a simple identifier at the start of a subroutine attribute, and it wasn't a semicolon or the start of a block. Perhaps you terminated the parameter list of the previous attribute too soon.
(W misc) A warning peculiar to VMS. Perl tried to read the value of an %ENV element from a CLI symbol table, and found a resultant string longer than 1024 characters. The return value has been truncated to 1024 characters.
(P) The attempt to translate a use Module n.n LIST
statement into
its equivalent BEGIN
block found an internal inconsistency with
the version number.
Compatibility tests for sub : attrs vs the older use attrs.
Tests for new environment scalar capability (e.g., use Env qw($BAR);).
Tests for new environment array capability (e.g., use Env qw(@PATH);).
IO constants (SEEK_*, _IO*).
Directory-related IO methods (new, read, close, rewind, tied delete).
INET sockets with multi-homed hosts.
IO poll().
UNIX sockets.
Regression tests for my ($x,@y,%z) : attrs and sub : attrs.
File test operators.
Verify operations that access pad objects (lexicals and temporaries).
Verify exists &sub
operations.
Beware that any new warnings that have been added or old ones that have been enhanced are not considered incompatible changes.
Since all new warnings must be explicitly requested via the -w
switch or the warnings
pragma, it is ultimately the programmer's
responsibility to ensure that warnings are enabled judiciously.
All subroutine definitions named CHECK are now special. See Support for CHECK blocks for more information.
There is a potential incompatibility in the behavior of list slices that are comprised entirely of undefined values. See Behavior of list slices is more consistent.
The English module now sets $PERL_VERSION to $^V (a string value) rather
than $]
(a numeric value). This is a potential incompatibility.
Send us a report via perlbug if you are affected by this.
See Improved Perl version numbering system for the reasons for this change.
Literals of the form 1.2.3 parse differently
Previously, numeric literals with more than one dot in them were interpreted as a floating point number concatenated with one or more numbers. Such "numbers" are now parsed as strings composed of the specified ordinals.
For example, print 97.98.99 used to output 97.9899 in earlier versions, but now prints abc.
See Support for strings represented as a vector of ordinals.
Perl programs that depend on reproducing a specific set of pseudo-random numbers may now produce different output due to improvements made to the rand() builtin. You can use sh Configure -Drandfunc=rand to obtain the old behavior.
Even though Perl hashes are not order preserving, the apparently random order encountered when iterating on the contents of a hash is actually determined by the hashing algorithm used. Improvements in the algorithm may yield a random order that is different from that of previous versions, especially when iterating on hashes.
See Better worst-case behavior of hashes for additional information.
undef fails on read-only values
Using the undef operator on a readonly value (such as $1) has
the same effect as assigning undef to the readonly value--it
throws an exception.
Pipe and socket handles are also now subject to the close-on-exec behavior determined by the special variable $^F.
"$$1"
to mean "${$}1"
is unsupported
Perl 5.004 deprecated the interpretation of $$1 and similar within interpolated strings to mean $$ . "1", but still allowed it. In Perl 5.6.0 and later, "$$1" always means "${$1}".
delete(), each(), values() and \(%h) operate on aliases to values, not copies
delete(), each(), values() and hashes (e.g. \(%h)
)
in a list context return the actual
values in the hash, instead of copies (as they used to in earlier
versions). Typical idioms for using these constructs copy the
returned values, but this can make a significant difference when
creating references to the returned values. Keys in the hash are still
returned as copies when iterating on a hash.
See also delete(), each(), values() and hash iteration are faster.
vec() generates a run-time error if the BITS argument is not a valid power-of-two integer.
Most references to internal Perl operations in diagnostics have been changed to be more descriptive. This may be an issue for programs that may incorrectly rely on the exact text of diagnostics for proper functioning.
%@
has been removed
The undocumented special variable %@
that used to accumulate
"background" errors (such as those that happen in DESTROY())
has been removed, because it could potentially result in memory
leaks.
The not
operator now falls under the "if it looks like a function,
it behaves like a function" rule.
As a result, the parenthesized form can be used with grep and map.
The following construct used to be a syntax error before, but it works
as expected now:
- grep not($_), @things;
On the other hand, using not
with a literal list slice may not
work. The following previously allowed construct:
- print not (1,2,3)[0];
needs to be written with additional parentheses now:
- print not((1,2,3)[0]);
The behavior remains unaffected when not
is not followed by parentheses.
Semantics of bareword prototype (*) have changed
The semantics of the bareword prototype *
have changed. Perl 5.005
always coerced simple scalar arguments to a typeglob, which wasn't useful
in situations where the subroutine must distinguish between a simple
scalar and a typeglob. The new behavior is to not coerce bareword
arguments to a typeglob. The value will always be visible as either
a simple scalar or as a reference to a typeglob.
If your platform is either natively 64-bit or if Perl has been configured to use 64-bit integers, i.e., $Config{ivsize} is 8, there may be a potential incompatibility in the behavior of bitwise numeric operators (& | ^ ~ << >>). These operators used to strictly operate on the lower 32 bits of integers in previous versions, but now operate over the entire native integral width. In particular, note that unary ~ will produce different results on platforms that have different $Config{ivsize}. For portability, be sure to mask off the excess bits in the result of unary ~, e.g., ~$x & 0xffffffff.
As described in Improved security features, there may be more sources of taint in a Perl program.
To avoid these new tainting behaviors, you can build Perl with the Configure option -Accflags=-DINCOMPLETE_TAINTS. Beware that the ensuing perl binary may be insecure.
PERL_POLLUTE
Release 5.005 grandfathered old global symbol names by providing preprocessor macros for extension source compatibility. As of release 5.6.0, these preprocessor definitions are not available by default. You need to explicitly compile perl with -DPERL_POLLUTE to get these definitions. For extensions still using the old symbols, this option can be specified via MakeMaker:
- perl Makefile.PL POLLUTE=1
PERL_IMPLICIT_CONTEXT
This new build option provides a set of macros for all API functions such that an implicit interpreter/thread context argument is passed to every API function. As a result, something like sv_setsv(foo,bar) amounts to a macro invocation that actually translates to something like Perl_sv_setsv(my_perl,foo,bar). While this is generally expected not to have any significant source compatibility issues, the difference between a macro and a real function call will need to be considered.
This means that there is a source compatibility issue as a result of this if your extensions attempt to use pointers to any of the Perl API functions.
Note that the above issue is not relevant to the default build of Perl, whose interfaces continue to match those of prior versions (but subject to the other options described here).
See Background and PERL_IMPLICIT_CONTEXT in perlguts for detailed information on the ramifications of building Perl with this option.
- NOTE: PERL_IMPLICIT_CONTEXT is automatically enabled whenever Perl is built
- with one of -Dusethreads, -Dusemultiplicity, or both. It is not
- intended to be enabled by users at this time.
PERL_POLLUTE_MALLOC
Enabling Perl's malloc in release 5.005 and earlier caused the namespace of the system's malloc family of functions to be usurped by the Perl versions, since by default they used the same names. Besides causing problems on platforms that do not allow these functions to be cleanly replaced, this also meant that the system versions could not be called in programs that used Perl's malloc. Previous versions of Perl have allowed this behaviour to be suppressed with the HIDEMYMALLOC and EMBEDMYMALLOC preprocessor definitions.
As of release 5.6.0, Perl's malloc family of functions have default names distinct from the system versions. You need to explicitly compile perl with -DPERL_POLLUTE_MALLOC to get the older behaviour. HIDEMYMALLOC and EMBEDMYMALLOC have no effect, since the behaviour they enabled is now the default.
Note that these functions do not constitute Perl's memory allocation API. See Memory Allocation in perlguts for further information about that.
PATCHLEVEL is now PERL_VERSION
The cpp macros PERL_REVISION, PERL_VERSION, and PERL_SUBVERSION are now available by default from perl.h, and reflect the base revision, patchlevel, and subversion respectively. PERL_REVISION had no prior equivalent, while PERL_VERSION and PERL_SUBVERSION were previously available as PATCHLEVEL and SUBVERSION.
The new names cause less pollution of the cpp namespace and reflect what the numbers have come to stand for in common practice. For compatibility, the old names are still supported when patchlevel.h is explicitly included (as required before), so there is no source incompatibility from the change.
In general, the default build of this release is expected to be binary compatible for extensions built with the 5.005 release or its maintenance versions. However, specific platforms may have broken binary compatibility due to changes in the defaults used in hints files. Therefore, please be sure to always check the platform-specific README files for any notes to the contrary.
The usethreads or usemultiplicity builds are not binary compatible with the corresponding builds in 5.005.
On platforms that require an explicit list of exports (AIX, OS/2 and Windows, among others), purely internal symbols such as parser functions and the run time opcodes are not exported by default. Perl 5.005 used to export all functions irrespective of whether they were considered part of the public API or not.
For the full list of public API functions, see perlapi.
As of the 5.6.1 release, there is a known leak when code such as this is executed:
64-bit builds
Subtest #15 of lib/b.t may fail under 64-bit builds on platforms such as HP-UX PA64 and Linux IA64. The issue is still being investigated.
The lib/io_multihomed test may hang in HP-UX if Perl has been configured to be 64-bit. Because other 64-bit platforms do not hang in this test, HP-UX is suspect. All other tests pass in 64-bit HP-UX. The test attempts to create and connect to "multihomed" sockets (sockets which have multiple IP addresses).
Note that 64-bit support is still experimental.
Failure of Thread tests
The subtests 19 and 20 of lib/thr5005.t test are known to fail due to fundamental problems in the 5.005 threading implementation. These are not new failures--Perl 5.005_0x has the same bugs, but didn't have these tests. (Note that support for 5.005-style threading remains experimental.)
NEXTSTEP 3.3 POSIX test failure
In NEXTSTEP 3.3p2 the implementation of strftime(3) in the operating system libraries is buggy: the %j format numbers the days of a month starting from zero, which, while being logical to programmers, causes subtests 19 to 27 of the lib/posix test to fail.
Tru64 (aka Digital UNIX, aka DEC OSF/1) lib/sdbm test failure with gcc
If compiled with gcc 2.95, the lib/sdbm test will fail (dump core). The cure is to use the vendor cc, which comes with the operating system and produces good code.
In earlier releases of Perl, EBCDIC environments like OS390 (also known as Open Edition MVS) and VM-ESA were supported. Due to changes required by the UTF-8 (Unicode) support, the EBCDIC platforms are not supported in Perl 5.6.0.
The 5.6.1 release improves support for EBCDIC platforms, but they are not fully supported yet.
In UNICOS/mk the following errors may appear during the Configure run:
- Guessing which symbols your C compiler and preprocessor define...
- CC-20 cc: ERROR File = try.c, Line = 3
- ...
- bad switch yylook 79bad switch yylook 79bad switch yylook 79bad switch yylook 79#ifdef A29K
- ...
- 4 errors detected in the compilation of "try.c".
The culprit is the broken awk of UNICOS/mk. The effect is fortunately rather mild: Perl itself is not adversely affected by the error, only the h2ph utility coming with Perl, and that is rather rarely needed these days.
When the left argument to the arrow operator -> is an array, or the scalar operator operating on an array, the result of the operation must be considered erroneous. For example:
- @x->[2]
- scalar(@x)->[2]
These expressions will get run-time errors in some future release of Perl.
As discussed above, many features are still experimental. Interfaces and implementation of these features are subject to change, and in extreme cases, even subject to removal in some future release of Perl. These features include the following:
(?{ code }) and (??{ code })
(W) Within regular expression character classes ([]) the syntax beginning with "[:" and ending with ":]" is reserved for future extensions. If you need to represent those character sequences inside a regular expression character class, just quote the square brackets with the backslash: "\[:" and ":\]".
(W) A warning peculiar to VMS. A logical name was encountered when preparing to iterate over %ENV which violates the syntactic rules governing logical names. Because it cannot be translated normally, it is skipped, and will not appear in %ENV. This may be a benign occurrence, as some software packages might directly modify logical name tables and introduce nonstandard names, or it may indicate that a logical name table has been corrupted.
The description of this error used to say:
- (Someday it will simply assume that an unbackslashed @
- interpolates an array.)
That day has come, and this fatal error has been removed. It has been replaced by a non-fatal warning instead. See Arrays now always interpolate into double-quoted strings for details.
(W) The compiler found a bareword where it expected a conditional, which often indicates that an || or && was parsed as part of the last argument of the previous construct, for example:
(F) The current implementation of regular expressions uses shorts as address offsets within a string. Unfortunately this means that if the regular expression compiles to longer than 32767, it'll blow up. Usually when you want a regular expression this big, there is a better way to do it with multiple statements. See perlre.
(D) Perl versions before 5.004 misinterpreted any type marker followed by "$" and a digit. For example, "$$0" was incorrectly taken to mean "${$}0" instead of "${$0}". This bug is (mostly) fixed in Perl 5.004.
However, the developers of Perl 5.004 could not fix this bug completely, because at least two widely-used modules depend on the old meaning of "$$0" in a string. So Perl 5.004 still interprets "$$<digit>" in the old (broken) way inside strings; but it generates this message as a warning. And in Perl 5.005, this special treatment will cease.
If you find what you think is a bug, you might check the articles recently posted to the comp.lang.perl.misc newsgroup. There may also be information at http://www.perl.com/ , the Perl Home Page.
If you believe you have an unreported bug, please run the perlbug program included with your release. Be sure to trim your bug down to a tiny but sufficient test case. Your bug report, along with the output of perl -V, will be sent off to perlbug@perl.org to be analysed by the Perl porting team.
The Changes file for exhaustive details on what changed.
The INSTALL file for how to build Perl.
The README file for general stuff.
The Artistic and Copying files for copyright information.
Written by Gurusamy Sarathy <gsar@ActiveState.com>, with many contributions from The Perl Porters.
Send omissions or corrections to <perlbug@perl.org>.
perl56delta - what's new for perl v5.6.0
This document describes differences between the 5.005 release and the 5.6.0 release.
Perl 5.6.0 introduces the beginnings of support for running multiple interpreters concurrently in different threads. In conjunction with the perl_clone() API call, which can be used to selectively duplicate the state of any given interpreter, it is possible to compile a piece of code once in an interpreter, clone that interpreter one or more times, and run all the resulting interpreters in distinct threads.
On the Windows platform, this feature is used to emulate fork() at the interpreter level. See perlfork for details about that.
This feature is still in evolution. It is eventually meant to be used to selectively clone a subroutine and data reachable from that subroutine in a separate interpreter and run the cloned subroutine in a separate thread. Since there is no shared data between the interpreters, little or no locking will be needed (unless parts of the symbol table are explicitly shared). This is obviously intended to be an easy-to-use replacement for the existing threads support.
Support for cloning interpreters and interpreter concurrency can be enabled using the -Dusethreads Configure option (see win32/Makefile for how to enable it on Windows.) The resulting perl executable will be functionally identical to one that was built with -Dmultiplicity, but the perl_clone() API call will only be available in the former.
-Dusethreads enables the cpp macro USE_ITHREADS by default, which in turn enables Perl source code changes that provide a clear separation between the op tree and the data it operates with. The former is immutable, and can therefore be shared between an interpreter and all of its clones, while the latter is considered local to each interpreter, and is therefore copied for each clone.
Note that building Perl with the -Dusemultiplicity Configure option is adequate if you wish to run multiple independent interpreters concurrently in different threads. -Dusethreads only provides the additional functionality of the perl_clone() API call and other support for running cloned interpreters concurrently.
- NOTE: This is an experimental feature. Implementation details are
- subject to change.
You can now control the granularity of warnings emitted by perl at a finer level using the use warnings pragma. warnings and perllexwarn have copious documentation on this feature.
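A minimal sketch of lexical warning control; the $SIG{__WARN__} hook is used here only to capture the warning text for demonstration:

```perl
use strict;
use warnings;

my $caught = '';
local $SIG{__WARN__} = sub { $caught .= $_[0] };  # capture warnings for the demo

my $x;  # deliberately undefined
{
    no warnings 'uninitialized';   # disable one category in this scope only
    my $y = "val: $x";             # no warning here
}
my $z = "val: $x";                 # warns: Use of uninitialized value
print $caught;
```

Because the pragma is lexical, the inner block is exempt while the outer statement still warns.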
Perl now uses UTF-8 as its internal representation for character strings. The utf8 and bytes pragmas are used to control this support in the current lexical scope. See perlunicode, utf8 and bytes for more information.
This feature is expected to evolve quickly to support some form of I/O disciplines that can be used to specify the kind of input and output data (bytes or characters). Until that happens, additional modules from CPAN will be needed to complete the toolkit for dealing with Unicode.
- NOTE: This should be considered an experimental feature. Implementation
- details are subject to change.
The new \N escape interpolates named characters within strings. For example, "Hi! \N{WHITE SMILING FACE}" evaluates to a string with a Unicode smiley face at the end.
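For example (the charnames pragma supplies the name table; later perls load it automatically for \N{...}):

```perl
use charnames ':full';               # make \N{...} names available
my $s = "Hi! \N{WHITE SMILING FACE}";
# the string is 5 characters long; the last one is U+263A
printf "%d U+%04X\n", length($s), ord(substr($s, -1));
```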
An "our" declaration introduces a value that can be best understood as a lexically scoped symbolic alias to a global variable in the package that was current where the variable was declared. This is mostly useful as an alternative to the vars pragma, but also provides the opportunity to introduce typing and other attributes for such variables. See our.
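A small sketch: under use strict, our gives unqualified access to a package global (the Counter package and $count variable are hypothetical names for the example):

```perl
use strict;

package Counter;
our $count = 0;          # alias to $Counter::count, usable without qualification
sub bump { $count++ }

package main;
Counter::bump();
print "$Counter::count\n";    # prints: 1
```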
Literals of the form v1.2.3.4 are now parsed as a string composed of characters with the specified ordinals. This is an alternative, more readable way to construct (possibly Unicode) strings instead of interpolating characters, as in "\x{1}\x{2}\x{3}\x{4}". The leading v may be omitted if there are more than two ordinals, so 1.2.3 is parsed the same as v1.2.3.
Strings written in this form are also useful to represent version "numbers". It is easy to compare such version "numbers" (which are really just plain strings) using any of the usual string comparison operators eq, ne, lt, gt, etc., or perform bitwise string operations on them using |, &, etc.
In conjunction with the new $^V magic variable (which contains the perl version as a string), such literals can be used as a readable way to check if you're running a particular version of Perl:
- # this will parse in older versions of Perl also
- if ($^V and $^V gt v5.6.0) {
- # new features supported
- }
require and use also have some special magic to support such
literals, but this particular usage should be avoided because it leads to
misleading error messages under versions of Perl which don't support vector
strings. Using a true version number will ensure correct behavior in all
versions of Perl:
Also, sprintf and printf support the Perl-specific format flag %v to print ordinals of characters in arbitrary strings:
See Scalar value constructors in perldata for additional information.
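Putting the pieces together, a sketch of a v-string literal, plain string comparison, and the %vd format:

```perl
my $vs = v1.22.333;                    # three characters with ords 1, 22, 333
print length($vs), "\n";               # prints: 3
printf "%vd\n", $vs;                   # prints: 1.22.333
print "newer\n" if v5.6.1 gt v5.6.0;   # ordinary string comparison
printf "%vd\n", $^V;                   # the running perl's own version
```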
Beginning with Perl version 5.6.0, the version number convention has been changed to a "dotted integer" scheme that is more commonly found in open source projects.
Maintenance versions of v5.6.0 will be released as v5.6.1, v5.6.2 etc. The next development series following v5.6.0 will be numbered v5.7.x, beginning with v5.7.0, and the next major production release following v5.6.0 will be v5.8.0.
The English module now sets $PERL_VERSION to $^V (a string value) rather than $] (a numeric value). (This is a potential incompatibility. Send us a report via perlbug if you are affected by this.)
The v1.2.3 syntax is also now legal in Perl. See Support for strings represented as a vector of ordinals for more on that.
To cope with the new versioning system's use of at least three significant digits for each version component, the method used for incrementing the subversion number has also changed slightly. We assume that versions older than v5.6.0 have been incrementing the subversion component in multiples of 10. Versions after v5.6.0 will increment them by 1. Thus, using the new notation, 5.005_03 is the "same" as v5.5.30, and the first maintenance version following v5.6.0 will be v5.6.1 (which should be read as being equivalent to a floating point value of 5.006_001 in the older format, stored in $]).
Formerly, if you wanted to mark a subroutine as being a method call or as requiring an automatic lock() when it is entered, you had to declare that with a use attrs pragma in the body of the subroutine. That can now be accomplished with declaration syntax, like this:
(Note how only the first : is mandatory, and whitespace surrounding the : is optional.)
AutoSplit.pm and SelfLoader.pm have been updated to keep the attributes with the stubs they provide. See attributes.
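A sketch of the declaration syntax using the builtin :method attribute; attributes::get() retrieves what was attached to the subroutine:

```perl
use attributes ();

sub mymethod : method {     # only the first ':' is required
    return 42;
}
my @attrs = attributes::get(\&mymethod);
print "@attrs\n";           # prints: method
```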
Similar to how constructs such as $x->[0] autovivify a reference, handle constructors (open(), opendir(), pipe(), socketpair(), sysopen(), socket(), and accept()) now autovivify a file or directory handle if the handle passed to them is an uninitialized scalar variable. This allows constructs such as open(my $fh, ...) and open(local $fh,...) to be used to create filehandles that will conveniently be closed automatically when the scope ends, provided there are no other references to them. This largely eliminates the need for typeglobs when opening filehandles that must be passed around, as in the following example:
If open() is passed three arguments instead of two, the second argument is used as the mode and the third argument is taken to be the file name. This is primarily useful for protecting against unintended magic behavior of the traditional two-argument form. See open.
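A sketch combining both features; the scratch file name demo_$$.txt is made up for the example:

```perl
use strict;
use warnings;

my $file = "demo_$$.txt";                      # hypothetical scratch file
open(my $out, '>', $file) or die "open: $!";   # 3-arg open, autovivified handle
print $out "hello\n";
close $out;                # would also close automatically when $out goes away

open(my $in, '<', $file) or die "open: $!";
my $line = <$in>;
close $in;
unlink $file;
print $line;                                   # prints: hello
```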
Any platform that has 64-bit integers either
- (1) natively as longs or ints
- (2) via special compiler flags
- (3) using long long or int64_t
is able to use "quads" (64-bit integers) as follows:
constants (decimal, hexadecimal, octal, binary) in the code
arguments to oct() and hex()
arguments to print(), printf() and sprintf() (flag prefixes ll, L, q)
printed as such
pack() and unpack() "q" and "Q" formats
- in basic arithmetic: + - * / % (NOTE: operating close to the limits of the integer values may produce surprising results)
- in bit operations: & | ^ ~ << >> (NOTE: these used to be forced to be 32 bits wide but now operate on the full native width.)
vec()
Note that unless you have case (1) you will have to configure and compile Perl using the -Duse64bitint Configure flag.
- NOTE: The Configure flags -Duselonglong and -Duse64bits have been
- deprecated. Use -Duse64bitint instead.
There are actually two modes of 64-bitness: the first one is achieved using Configure -Duse64bitint and the second one using Configure -Duse64bitall. The difference is that the first one is minimal and the second one maximal. The first works in more places than the second.
The use64bitint option does only as much as is required to get 64-bit integers into Perl (this may mean, for example, using "long longs") while your memory may still be limited to 2 gigabytes (because your pointers could still be 32-bit). Note that the name 64bitint does not imply that your C compiler will be using 64-bit ints (it might, but it doesn't have to): use64bitint means that you will be able to have 64-bit-wide scalar values.
The use64bitall option goes all the way by attempting to switch integers (if it can), longs, and pointers to being 64-bit. This may create an even more binary incompatible Perl than -Duse64bitint: the resulting executable may not run at all on a 32-bit box, or you may have to reboot/reconfigure/rebuild your operating system to be 64-bit aware.
Natively 64-bit systems like Alpha and Cray need neither -Duse64bitint nor -Duse64bitall.
Last but not least: note that due to Perl's habit of always using floating point numbers, the quads are still not true integers. When quads overflow their limits (0...18_446_744_073_709_551_615 unsigned, -9_223_372_036_854_775_808...9_223_372_036_854_775_807 signed), they are silently promoted to floating point numbers, after which they will start losing precision (in their lower digits).
- NOTE: 64-bit support is still experimental on most platforms.
- Existing support only covers the LP64 data model. In particular, the
- LLP64 data model is not yet supported. 64-bit libraries and system
- APIs on many platforms have not stabilized--your mileage may vary.
If you have filesystems that support "large files" (files larger than 2 gigabytes), you may now also be able to create and access them from Perl.
- NOTE: The default action is to enable large file support, if
- available on the platform.
If the large file support is on, and you have a Fcntl constant O_LARGEFILE, the O_LARGEFILE is automatically added to the flags of sysopen().
Beware that unless your filesystem also supports "sparse files", seeking to umpteen petabytes may be inadvisable.
Note that in addition to requiring a proper file system to do large files you may also need to adjust your per-process (or your per-system, or per-process-group, or per-user-group) maximum filesize limits before running Perl scripts that try to handle large files, especially if you intend to write such files.
Finally, in addition to your process/process group maximum filesize limits, you may have quota limits on your filesystems that stop you (your user id or your user group id) from using large files.
Adjusting your process/user/group/file system/operating system limits is outside the scope of the Perl core language. For process limits, you may try increasing the limits using your shell's limits/limit/ulimit command before running Perl. The BSD::Resource extension (not included with the standard Perl distribution) may also be of use; it offers the getrlimit/setrlimit interface that can be used to adjust process resource usage limits, including the maximum filesize limit.
In some systems you may be able to use long doubles to enhance the range and precision of your double precision floating point numbers (that is, Perl's numbers). Use Configure -Duselongdouble to enable this support (if it is available).
You can "Configure -Dusemorebits" to turn on both the 64-bit support and the long double support.
Perl subroutines with a prototype of ($$), and XSUBs in general, can now be used as sort subroutines. In either case, the two elements to be compared are passed as normal parameters in @_. See sort.
For unprototyped sort subroutines, the historical behavior of passing the elements to be compared as the global variables $a and $b remains unchanged.
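A sketch of a prototyped sort subroutine (by_num is a hypothetical name):

```perl
sub by_num ($$) {
    my ($x, $y) = @_;      # elements arrive in @_, not in the globals $a/$b
    return $x <=> $y;
}
my @sorted = sort by_num 10, 2, 33, 4;
print "@sorted\n";         # prints: 2 4 10 33
```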
sort $coderef @foo allowed
sort() did not accept a subroutine reference as the comparison function in earlier versions. This is now permitted.
Perl now uses the File::Glob implementation of the glob() operator automatically. This avoids using an external csh process and the problems associated with it.
- NOTE: This is currently an experimental feature. Interfaces and
- implementation are subject to change.
In addition to BEGIN, INIT, END, DESTROY and AUTOLOAD, subroutines named CHECK are now special. These are queued up during compilation and behave similarly to END blocks, except they are called at the end of compilation rather than at the end of execution. They cannot be called directly.
For example, to match alphabetic characters use /[[:alpha:]]/. See perlre for details.
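For instance:

```perl
# keep only strings made entirely of alphabetic characters
my @words = grep { /^[[:alpha:]]+$/ } qw(perl 5.6 foo_bar abc);
print "@words\n";    # prints: perl abc
```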
In 5.005_0x and earlier, perl's rand() function used the C library rand(3) function. As of 5.005_52, Configure tests for drand48(), random(), and rand() (in that order) and picks the first one it finds.
These changes should result in better random numbers from rand().
qw// operator
The qw// operator is now evaluated at compile time into a true list instead of being replaced with a run time call to split(). This removes the confusing misbehaviour of qw// in scalar context, which had inherited that behaviour from split().
Thus:
- $foo = ($bar) = qw(a b c); print "$foo|$bar\n";
now correctly prints "3|a", instead of "2|a".
Small changes in the hashing algorithm have been implemented in order to improve the distribution of lower order bits in the hashed value. This is expected to yield better performance on keys that are repeated sequences.
The new format type 'Z' is useful for packing and unpacking null-terminated strings. See pack.
The new format type modifier '!' is useful for packing and unpacking native shorts, ints, and longs. See pack.
The template character '/' can be used to specify a counted string type to be packed or unpacked. See pack.
The '#' character in a template introduces a comment up to end of the line. This facilitates documentation of pack() templates.
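A short sketch of the 'Z' and '/' template features:

```perl
my $z = pack 'Z6', 'perl';            # null-padded: "perl\0\0"
print unpack('Z6', $z), "\n";         # prints: perl

my $rec = pack 'N/a*', 'hello';       # 32-bit length prefix, then the bytes
my ($str) = unpack 'N/a*', $rec;
print "$str\n";                       # prints: hello
```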
In previous versions of Perl, you couldn't cache objects so as to allow them to be deleted if the last reference from outside the cache is deleted. The reference in the cache would hold a reference count on the object and the objects would never be destroyed.
Another familiar problem is with circular references. When an object references itself, its reference count would never go down to zero, and it would not get destroyed until the program is about to exit.
Weak references solve this by allowing you to "weaken" any reference, that is, make it not count towards the reference count. When the last non-weak reference to an object is deleted, the object is destroyed and all the weak references to the object are automatically undef-ed.
To use this feature, you need the Devel::WeakRef package from CPAN, which contains additional documentation.
- NOTE: This is an experimental feature. Details are subject to change.
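In later perls the same facility is exposed in core as Scalar::Util::weaken(); a sketch of the cache scenario using that interface:

```perl
use Scalar::Util qw(weaken);

my $obj = { name => 'entry' };
my %cache = (key => $obj);
weaken($cache{key});           # the cache no longer keeps the object alive

undef $obj;                    # drop the last strong reference
print defined $cache{key} ? "alive\n" : "reaped\n";   # prints: reaped
```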
Binary numbers are now supported as literals, in s?printf formats, and
oct():
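A sketch of each form:

```perl
my $n = 0b1010;                  # binary literal
printf "%d %b\n", $n, $n;        # prints: 10 1010
print oct('0b1101'), "\n";       # prints: 13
```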
Subroutines can now return modifiable lvalues. See Lvalue subroutines in perlsub.
- NOTE: This is an experimental feature. Details are subject to change.
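A minimal lvalue subroutine sketch:

```perl
my $val = 5;
sub store :lvalue { $val }   # the last expression is returned as an lvalue
store() = 42;                # assign through the subroutine call
print "$val\n";              # prints: 42
```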
Perl now allows the arrow to be omitted in many constructs involving subroutine calls through references. For example, $foo[10]->('foo') may now be written $foo[10]('foo'). This is rather similar to how the arrow may be omitted from $foo[10]->{'foo'}. Note, however, that the arrow is still required for foo(10)->('bar').
Constructs such as ($a ||= 2) += 1 are now allowed.
The exists() builtin now works on subroutine names. A subroutine is considered to exist if it has been declared (even if implicitly). See exists for examples.
The exists() and delete() builtins now work on simple arrays as well. The behavior is similar to that on hash elements.
exists() can be used to check whether an array element has been initialized. This avoids autovivifying array elements that don't exist. If the array is tied, the EXISTS() method in the corresponding tied package will be invoked.
delete() may be used to remove an element from the array and return it. The array element at that position returns to its uninitialized state, so that testing for the same element with exists() will return false. If the element happens to be the one at the end, the size of the array also shrinks up to the highest element that tests true for exists(), or 0 if none such is found. If the array is tied, the DELETE() method in the corresponding tied package will be invoked.
See exists and delete for examples.
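A sketch on a plain array:

```perl
my @a = (1, 2, 3);
print exists $a[1] ? "yes\n" : "no\n";   # prints: yes
my $gone = delete $a[2];                 # last element: the array shrinks
print "$gone ", scalar(@a), "\n";        # prints: 3 2
print exists $a[2] ? "yes\n" : "no\n";   # prints: no
```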
Dereferencing some types of reference values in a pseudo-hash, such as $ph->{foo}[1], was accidentally disallowed. This has been corrected.
When applied to a pseudo-hash element, exists() now reports whether the specified value exists, not merely if the key is valid.
delete() now works on pseudo-hashes. When given a pseudo-hash element or slice it deletes the values corresponding to the keys (but not the keys themselves). See Pseudo-hashes: Using an array as a hash in perlref.
Pseudo-hash slices with constant keys are now optimized to array lookups at compile-time.
List assignments to pseudo-hash slices are now supported.
The fields pragma now provides ways to create pseudo-hashes, via fields::new() and fields::phash(). See fields.
- NOTE: The pseudo-hash data type continues to be experimental.
- Limiting oneself to the interface elements provided by the
- fields pragma will provide protection from any future changes.
fork(), exec(), system(), qx//, and pipe open()s now flush buffers of all files opened for output when the operation was attempted. This mostly eliminates confusing buffering mishaps suffered by users unaware of how Perl internally handles I/O.
This is not supported on some platforms like Solaris where a suitably correct implementation of fflush(NULL) isn't available.
Constructs such as open(<FH>) and close(<FH>) are compile time errors. Attempting to read from filehandles that were opened only for writing will now produce warnings (just as writing to read-only filehandles does).
open(NEW, "<&OLD") now attempts to discard any data that was previously read and buffered in OLD before duping the handle. On platforms where doing this is allowed, the next read operation on NEW will return the same data as the corresponding operation on OLD. Formerly, it would have returned the data from the start of the following disk block instead.
eof() would return true if no attempt to read from <> had yet been made. eof() has been changed to have a little magic of its own: it now opens the <> files.
binmode() now accepts a second argument that specifies a discipline for the handle in question. The two pseudo-disciplines ":raw" and ":crlf" are currently supported on DOS-derivative platforms. See binmode and open.
-T filetest recognizes UTF-8 encoded files as "text"
The algorithm used for the -T filetest has been enhanced to correctly identify UTF-8 content as "text".
On Unix and similar platforms, system(), qx() and open(FOO, "cmd |") etc., are implemented via fork() and exec(). When the underlying exec() fails, earlier versions did not report the error properly, since the exec() happened to be in a different process.
The child process now communicates with the parent about the error in launching the external command, which allows these constructs to return with their usual error value and set $!.
Line numbers are no longer suppressed (under most likely circumstances) during the global destruction phase.
Diagnostics emitted from code running in threads other than the main thread are now accompanied by the thread ID.
Embedded null characters in diagnostics now actually show up. They used to truncate the message in prior versions.
$foo::a and $foo::b are now exempt from "possible typo" warnings only if sort() is encountered in package foo.
Unrecognized alphabetic escapes encountered when parsing quote constructs now generate a warning, since they may take on new semantics in later versions of Perl.
Many diagnostics now report the internal operation in which the warning was provoked, like so:
- Use of uninitialized value in concatenation (.) at (eval 1) line 1.
- Use of uninitialized value in print at (eval 1) line 1.
Diagnostics that occur within eval may also report the file and line number where the eval is located, in addition to the eval sequence number and the line number within the evaluated text itself. For example:
- Not enough arguments for scalar at (eval 4)[newlib/perl5db.pl:1411] line 2, at EOF
Diagnostic output now goes to whichever file the STDERR handle is pointing at, instead of always going to the underlying C runtime library's stderr.
On systems that support a close-on-exec flag on filehandles, the flag is now set for any handles created by pipe(), socketpair(), socket(), and accept(), if that is warranted by the value of $^F that may be in effect. Earlier versions neglected to set the flag for handles created with these operators. See pipe, socketpair, socket, accept, and $^F in perlvar.
The length argument of syswrite() has become optional.
Expressions such as:
used to be accidentally allowed in earlier versions, and produced unpredictable behaviour. Some produced ancillary warnings when used in this way; others silently did the wrong thing.
The parenthesized forms of most unary operators that expect a single argument now ensure that they are not called with more than one argument, making the cases shown above syntax errors. The usual behaviour of:
remains unchanged. See perlop.
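This can be demonstrated by compiling such an expression under string eval; sin is a named unary operator, so giving it two arguments inside parentheses is rejected at compile time:

```perl
my $ok = eval q{ my $x = sin(1, 2); 1 };   # two arguments to a unary operator
print $ok ? "compiled\n" : "rejected\n";   # prints: rejected
print $@;                                  # the compile-time error message
```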
The bit operators (& | ^ ~ << >>) now operate on the full native integral width (the exact size of which is available in $Config{ivsize}). For example, if your platform is either natively 64-bit or if Perl has been configured to use 64-bit integers, these operations apply to 8 bytes (as opposed to 4 bytes on 32-bit platforms). For portability, be sure to mask off the excess bits in the result of unary ~, e.g., ~$x & 0xffffffff.
More potentially unsafe operations taint their results for improved security.
The passwd and shell fields returned by getpwent(), getpwnam(), and getpwuid() are now tainted, because the user can affect their own encrypted password and login shell.
The variable modified by shmread(), and messages returned by msgrcv() (and its object-oriented interface IPC::SysV::Msg::rcv) are also tainted, because other untrusted processes can modify messages and shared memory segments for their own nefarious purposes.
Bareword prototypes have been rationalized to enable them to be used to override builtins that accept barewords and interpret them in a special way, such as require or do. Arguments prototyped as * will now be visible within the subroutine as either a simple scalar or as a reference to a typeglob. See Prototypes in perlsub.
require and do may be overridden

require and do 'file' operations may be overridden locally by importing subroutines of the same name into the current package (or globally by importing them into the CORE::GLOBAL:: namespace). Overriding require will also affect use, provided the override is visible at compile-time. See Overriding Built-in Functions in perlsub.
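A hedged sketch of a global require override; the logging is illustrative, and CORE::require forwards to the real builtin:

```perl
use strict;
use warnings;

# The override must be installed at compile time, before anything is required.
BEGIN {
    *CORE::GLOBAL::require = sub {
        my ($file) = @_;
        print "loading $file\n";
        return CORE::require $file;   # delegate to the builtin
    };
}

require File::Basename;               # goes through the override
print File::Basename::basename('/tmp/x.txt'), "\n";
```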
Formerly, $^X was synonymous with ${"\cX"}, but $^XY was a syntax error. Now variable names that begin with a control character may be arbitrarily long. However, for compatibility reasons, these variables must be written with explicit braces, as ${^XY} for example. ${^XYZ} is synonymous with ${"\cXYZ"}. Variable names with more than one control character, such as ${^XY^Z}, are illegal.

The old syntax has not changed. As before, `^X' may be either a literal control-X character or the two-character sequence `caret' plus `X'. When braces are omitted, the variable name stops after the control character. Thus "$^XYZ" continues to be synonymous with $^X . "YZ" as before.

As before, lexical variables may not have names beginning with control characters. As before, variables whose names begin with a control character are always forced to be in package `main'. All such variables are reserved for future extensions, except those that begin with ^_, which may be used by user programs and are guaranteed not to acquire special meaning in any future version of Perl.
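The rules above can be sketched as follows (the variable name ${^_DEMO} is an arbitrary example of the user-reserved ^_ namespace):

```perl
use strict;
use warnings;

# Caret variables need explicit braces and always live in package main.
${^_DEMO} = 42;                # names beginning with ^_ are for user programs
print ${^_DEMO}, "\n";         # 42

# Without braces the name still stops at the control character:
print "unbraced stops early\n" if "$^XYZ" eq $^X . "YZ";
```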
$^C reflects the -c switch

$^C has a boolean value that reflects whether perl is being run in compile-only mode (i.e. via the -c switch). Since BEGIN blocks are executed under such conditions, this variable enables perl code to determine whether actions that make sense only during normal running are warranted. See perlvar.
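A minimal sketch of the idiom:

```perl
# $^C is true only under `perl -c`; BEGIN blocks still run in that mode,
# so they can skip work that only makes sense at normal runtime.
BEGIN {
    if ($^C) {
        print "compile-only check, skipping setup\n";
    } else {
        print "normal run\n";
    }
}
```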
$^V contains the Perl version number as a string composed of characters whose ordinals match the version numbers, i.e. v5.6.0. This may be used in string comparisons. See Support for strings represented as a vector of ordinals for an example.
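A short sketch of both uses (string comparison and %vd formatting):

```perl
use strict;
use warnings;

# $^V is a v-string: compare with string operators, print with %vd.
if ($^V ge v5.6.0) {
    printf "this is Perl %vd\n", $^V;
}
```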
If Perl is built with the cpp macro PERL_Y2KWARN
defined,
it emits optional warnings when concatenating the number 19
with another number.
This behavior must be specifically enabled when running Configure. See INSTALL and README.Y2K.
In double-quoted strings, arrays now interpolate, no matter what. The behavior in earlier versions of perl 5 was that arrays would interpolate into strings if the array had been mentioned before the string was compiled, and otherwise Perl would raise a fatal compile-time error. In versions 5.000 through 5.003, the error was
- Literal @example now requires backslash
In versions 5.004_01 through 5.6.0, the error was
- In string, @example now must be written as \@example
The idea here was to get people into the habit of writing "fred\@example.com" when they wanted a literal @ sign, just as they have always written "Give me back my \$5" when they wanted a literal $ sign.
Starting with 5.6.1, when Perl sees an @ sign in a double-quoted string, it always attempts to interpolate an array, regardless of whether or not the array has been used or declared already. The fatal error has been downgraded to an optional warning:
- Possible unintended interpolation of @example in string
This warns you that "fred@example.com" is going to turn into fred.com if you don't backslash the @.
See http://perl.plover.com/at-error.html for more details about the history here.
The new magic variables @- and @+ provide the starting and ending offsets, respectively, of $&, $1, $2, etc. See perlvar for details.
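A small sketch of the offset arrays in action:

```perl
use strict;
use warnings;

my $s = 'hello world';
if ($s =~ /(w\w+)/) {
    # $-[1] and $+[1] are the start and end offsets of $1:
    print "match from $-[1] to $+[1]\n";            # match from 6 to 11
    print substr($s, $-[0], $+[0] - $-[0]), "\n";   # world (same as $&)
}
```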
While used internally by Perl as a pragma, this module also provides a way to fetch subroutine and variable attributes. See attributes.
The Perl Compiler suite has been extensively reworked for this release. More of the standard Perl test suite passes when run under the Compiler, but there is still a significant way to go to achieve production quality compiled executables.
- NOTE: The Compiler suite remains highly experimental. The
- generated code may not be correct, even when it manages to execute
- without errors.
Overall, Benchmark results exhibit lower average error and better timing accuracy.
You can now run tests for n seconds instead of guessing the right number of tests to run: e.g., timethese(-5, ...) will run each code for at least 5 CPU seconds. Zero as the "number of repetitions" means "for at least 3 CPU seconds". The output format has also changed. For example:
will now output something like this:
- Benchmark: running a, b, each for at least 5 CPU seconds...
- a: 5 wallclock secs ( 5.77 usr + 0.00 sys = 5.77 CPU) @ 200551.91/s (n=1156516)
- b: 4 wallclock secs ( 5.00 usr + 0.02 sys = 5.02 CPU) @ 159605.18/s (n=800686)
New features: "each for at least N CPU seconds...", "wallclock secs", and the "@ operations/CPU second (n=operations)".
timethese() now returns a reference to a hash of Benchmark objects containing the test results, keyed on the names of the tests.
timethis() now returns the iterations field in the Benchmark result object instead of 0.
timethese(), timethis(), and the new cmpthese() (see below) can also take a format specifier of 'none' to suppress output.
A new function countit() is just like timeit() except that it takes a TIME instead of a COUNT.
A new function cmpthese() prints a chart comparing the results of each test returned from a timethese() call. For each possible pair of tests, the percentage speed difference (iters/sec or seconds/iter) is shown.
For other details, see Benchmark.
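The new functions compose naturally; a hedged sketch (the test names are arbitrary, and -1 means "run each for at least one CPU second"):

```perl
use strict;
use warnings;
use Benchmark qw(timethese cmpthese);

# The 'none' format suppresses the per-test report; cmpthese() then
# prints the comparison chart from the returned hash of result objects.
my $results = timethese(-1, {
    concat => sub { my $s = 'foo' . 'bar' },
    join   => sub { my $s = join '', 'foo', 'bar' },
}, 'none');
cmpthese($results);
```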
The ByteLoader is a dedicated extension to generate and run Perl bytecode. See ByteLoader.
References can now be used.
The new version also allows a leading underscore in constant names, but disallows a double leading underscore (as in "__LINE__"). Some other names are disallowed or warned against, including BEGIN, END, etc. Some names which were forced into main:: used to fail silently in some cases; now they're fatal (outside of main::) and an optional warning (inside of main::). The ability to detect whether a constant had been set with a given name has been added.
See constant.
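A brief sketch of the new capabilities (constant names are illustrative):

```perl
use strict;
use warnings;

# References may now be used as constant values:
use constant ORIGIN => [0, 0];
use constant PI     => 4 * atan2(1, 1);
use constant _CACHE => {};        # a single leading underscore is allowed

print ORIGIN->[0], "\n";          # 0
printf "%.2f\n", PI;              # 3.14
```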
This pragma implements the \N string escape. See charnames.
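A one-line sketch of the escape:

```perl
use strict;
use warnings;
use charnames ':full';

# \N{...} resolves official Unicode character names at compile time:
my $a = "\N{LATIN CAPITAL LETTER A}";   # same as "A"
print $a, "\n";
```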
A Maxdepth setting can be specified to avoid venturing too deeply into deep data structures. See Data::Dumper.

The XSUB implementation of Dump() is now automatically called if the Useqq setting is not in use.
Dumping qr// objects works correctly.
DB is an experimental module that exposes a clean abstraction to Perl's debugging API.

DB_File can now be built with Berkeley DB versions 1, 2 or 3. See ext/DB_File/Changes.
Devel::DProf, a Perl source code profiler, has been added. See Devel::DProf and dprofpp.
The Devel::Peek module provides access to the internal representation of Perl variables and data. It is a data debugging tool for the XS programmer.
The Dumpvalue module provides screen dumps of Perl data.
DynaLoader now supports a dl_unload_file() function on platforms that support unloading shared objects using dlclose().
Perl can also optionally arrange to unload all extension shared objects loaded by Perl. To enable this, build Perl with the Configure option -Accflags=-DDL_UNLOAD_ALL_AT_EXIT. (This may be useful if you are using Apache with mod_perl.)
$PERL_VERSION now stands for $^V (a string value) rather than for $] (a numeric value).
Env now supports accessing environment variables like PATH as array variables.
More Fcntl constants added: F_SETLK64, F_SETLKW64, O_LARGEFILE for large file (more than 4GB) access (NOTE: O_LARGEFILE is automatically added to sysopen() flags if large file support has been configured, as is the default), Free/Net/OpenBSD locking behaviour flags F_FLOCK, F_POSIX, Linux F_SHLCK, and O_ACCMODE: the combined mask of O_RDONLY, O_WRONLY, and O_RDWR. The seek()/sysseek() constants SEEK_SET, SEEK_CUR, and SEEK_END are available via the :seek tag. The chmod()/stat() S_IF* constants and S_IS* functions are available via the :mode tag.
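A small sketch of the new export tags (output shown for a typical POSIX system):

```perl
use strict;
use warnings;
use Fcntl qw(:mode :seek O_ACCMODE);

my @st = stat('.');
print "is a directory\n" if S_ISDIR($st[2]);   # S_IS* functions from :mode
print "SEEK_SET=", SEEK_SET, "\n";             # seek constants from :seek
```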
A compare_text() function has been added, which allows custom comparison functions. See File::Compare.
File::Find now works correctly when the wanted() function is either autoloaded or is a symbolic reference.
A bug that caused File::Find to lose track of the working directory when pruning top-level directories has been fixed.
File::Find now also supports several other options to control its behavior. It can follow symbolic links if the follow option is specified. Enabling the no_chdir option will make File::Find skip changing the current directory when walking directories. The untaint flag can be useful when running with taint checks enabled. See File::Find.
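The options above are passed in a hash reference; a minimal sketch:

```perl
use strict;
use warnings;
use File::Find;

my @found;
# The hash-reference form enables the newer options; no_chdir leaves the
# current directory alone and passes full paths in $File::Find::name.
find({
    wanted   => sub { push @found, $File::Find::name if -f },
    no_chdir => 1,
}, '.');
print scalar(@found), " files found\n";
```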
This extension implements BSD-style file globbing. By default, it will also be used for the internal implementation of the glob() operator. See File::Glob.
New methods have been added to the File::Spec module: devnull() returns the name of the null device (/dev/null on Unix) and tmpdir() the name of the temp directory (normally /tmp on Unix). There are now also methods to convert between absolute and relative filenames: abs2rel() and rel2abs(). For compatibility with operating systems that specify volume names in file paths, the splitpath(), splitdir(), and catdir() methods have been added.
The new File::Spec::Functions module provides a function interface to the File::Spec module, allowing the shorthand
- $fullname = catfile($dir1, $dir2, $file);
instead of
- $fullname = File::Spec->catfile($dir1, $dir2, $file);
Getopt::Long licensing has changed to allow the Perl Artistic License as well as the GPL. It used to be GPL only, which got in the way of non-GPL applications that wanted to use Getopt::Long.
Getopt::Long encourages the use of Pod::Usage to produce help messages. For example:
- use Getopt::Long;
- use Pod::Usage;
- my $man = 0;
- my $help = 0;
- GetOptions('help|?' => \$help, man => \$man) or pod2usage(2);
- pod2usage(1) if $help;
- pod2usage(-exitstatus => 0, -verbose => 2) if $man;
- __END__
- =head1 NAME
- sample - Using Getopt::Long and Pod::Usage
- =head1 SYNOPSIS
- sample [options] [file ...]
- Options:
- -help brief help message
- -man full documentation
- =head1 OPTIONS
- =over 8
- =item B<-help>
- Print a brief help message and exits.
- =item B<-man>
- Prints the manual page and exits.
- =back
- =head1 DESCRIPTION
- B<This program> will read the given input file(s) and do something
- useful with the contents thereof.
- =cut
See Pod::Usage for details.
A bug that prevented the non-option call-back <> from being specified as the first argument has been fixed.
To specify the characters < and > as option starters, use ><. Note, however, that changing option starters is strongly deprecated.
write() and syswrite() will now accept a single-argument form of the call, for consistency with Perl's syswrite().
You can now create a TCP-based IO::Socket::INET without forcing a connect attempt. This allows you to configure its options (like making it non-blocking) and then call connect() manually.
A bug that prevented the IO::Socket::protocol() accessor from ever returning the correct value has been corrected.
IO::Socket::connect now uses non-blocking IO instead of alarm() to do connect timeouts.
IO::Socket::accept now uses select() instead of alarm() for doing timeouts.
IO::Socket::INET->new now sets $! correctly on failure. $@ is still set for backwards compatibility.
Java Perl Lingo is now distributed with Perl. See jpl/README for more information.
use lib now weeds out any trailing duplicate entries. no lib removes all named entries.
The bitwise operations <<, >>, &, |, and ~ are now supported on bigints.
The accessor methods Re, Im, arg, abs, rho, and theta can now also act as mutators (accessor $z->Re(), mutator $z->Re(3)).
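A short sketch of the accessor/mutator behavior:

```perl
use strict;
use warnings;
use Math::Complex;

my $z = cplx(3, 4);
print $z->Re, ' ', $z->Im, "\n";   # 3 4
$z->Re(0);                          # the accessor now doubles as a mutator
print abs($z), "\n";                # 4 (the modulus of 0+4i)
```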
The class method display_format and the corresponding object method display_format, in addition to accepting just one argument, now can also accept a parameter hash. Recognized keys of a parameter hash are "style", which corresponds to the old one-parameter case, and two new parameters: "format", which is a printf()-style format string (defaults usually to "%.15g"; you can revert to the default by setting the format string to undef) used for both parts of a complex number, and "polar_pretty_print" (defaults to true), which controls whether an attempt is made to recognize small multiples and rationals of pi (2pi, pi/2) as the argument (angle) of a polar complex number.

The potentially disruptive change is that in list context both methods now return the parameter hash, instead of only the value of the "style" parameter.
A little bit of radial trigonometry (cylindrical and spherical), radial coordinate conversions, and the great circle distance were added.
Pod::Parser is a base class for parsing and selecting sections of pod documentation from an input stream. This module takes care of identifying pod paragraphs and commands in the input and hands off the parsed paragraphs and commands to user-defined methods which are free to interpret or translate them as they see fit.
Pod::InputObjects defines some input objects needed by Pod::Parser, and for advanced users of Pod::Parser that need more about a command besides its name and text.
As of release 5.6.0 of Perl, Pod::Parser is now the officially sanctioned "base parser code" recommended for use by all pod2xxx translators. Pod::Text (pod2text) and Pod::Man (pod2man) have already been converted to use Pod::Parser and efforts to convert Pod::HTML (pod2html) are already underway. For any questions or comments about pod parsing and translating issues and utilities, please use the pod-people@perl.org mailing list.
For further information, please see Pod::Parser and Pod::InputObjects.
This utility checks pod files for correct syntax, according to perlpod. Obvious errors are flagged as such, while warnings are printed for mistakes that can be handled gracefully. The checklist is not complete yet. See Pod::Checker.
These modules provide a set of gizmos that are useful mainly for pod translators. Pod::Find traverses directory structures and returns found pod files, along with their canonical names (like File::Spec::Unix). Pod::ParseUtils contains Pod::List (useful for storing pod list information), Pod::Hyperlink (for parsing the contents of L<> sequences) and Pod::Cache (for caching information about pod files, e.g., link nodes).
Pod::Select is a subclass of Pod::Parser which provides a function named "podselect()" to filter out user-specified sections of raw pod documentation from an input stream. podselect is a script that provides access to Pod::Select from other scripts to be used as a filter. See Pod::Select.
Pod::Usage provides the function "pod2usage()" to print usage messages for a Perl script based on its embedded pod documentation. The pod2usage() function is generally useful to all script authors since it lets them write and maintain a single source (the pods) for documentation, thus removing the need to create and maintain redundant usage message text consisting of information already in the pods.
There is also a pod2usage script which can be used from other kinds of scripts to print usage messages from pods (even for non-Perl scripts with pods embedded in comments).
For details and examples, please see Pod::Usage.
Pod::Text has been rewritten to use Pod::Parser. While pod2text() is still available for backwards compatibility, the module now has a new preferred interface. See Pod::Text for the details. The new Pod::Text module is easily subclassed for tweaks to the output, and two such subclasses (Pod::Text::Termcap for man-page-style bold and underlining using termcap information, and Pod::Text::Color for markup with ANSI color sequences) are now standard.
pod2man has been turned into a module, Pod::Man, which also uses Pod::Parser. In the process, several outstanding bugs related to quotes in section headers, quoting of code escapes, and nested lists have been fixed. pod2man is now a wrapper script around this module.
An EXISTS method has been added to this module (and sdbm_exists() has been added to the underlying sdbm library), so one can now call exists on an SDBM_File tied hash and get the correct result, rather than a runtime error.
A bug that may have caused data loss when more than one disk block happens to be read from the database in a single FETCH() has been fixed.
Sys::Syslog now uses XSUBs to access facilities from syslog.h so it no longer requires syslog.ph to exist.
Sys::Hostname now uses XSUBs to call the C library's gethostname() or uname() if they exist.
Term::ANSIColor is a very simple module to provide easy and readable access to the ANSI color and highlighting escape sequences, supported by most ANSI terminal emulators. It is now included standard.
The timelocal() and timegm() functions used to silently return bogus results when the date fell outside the machine's integer range. They now consistently croak() if the date falls in an unsupported range.
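A minimal sketch of the interface (note the 0-based month and 1900-based year):

```perl
use strict;
use warnings;
use Time::Local;

# timegm(sec, min, hour, mday, mon, year): mon is 0-based, year is 1900-based.
my $t = timegm(0, 0, 0, 1, 0, 100);   # 2000-01-01 00:00:00 UTC
print $t, "\n";                        # 946684800
```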
The error return value in list context has been changed for all functions that return a list of values. Previously these functions returned a list with a single element undef if an error occurred. Now these functions return the empty list in these situations. This applies to the following functions:
- Win32::FsType
- Win32::GetOSVersion
The remaining functions are unchanged and continue to return undef on error even in list context.
The Win32::SetLastError(ERROR) function has been added as a complement to the Win32::GetLastError() function.
The new Win32::GetFullPathName(FILENAME) returns the full absolute pathname for FILENAME in scalar context. In list context it returns a two-element list containing the fully qualified directory name and the filename. See Win32.
The XSLoader extension is a simpler alternative to DynaLoader. See XSLoader.
A new feature called "DBM Filters" has been added to all the DBM modules--DB_File, GDBM_File, NDBM_File, ODBM_File, and SDBM_File. DBM Filters add four new methods to each DBM module:
- filter_store_key
- filter_store_value
- filter_fetch_key
- filter_fetch_value
These can be used to filter key-value pairs before the pairs are written to the database or just after they are read from the database. See perldbmfilter for further information.
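A hedged sketch using SDBM_File (the filter here NUL-terminates values on store, as a C program would expect, and strips the NUL on fetch; the temp path is illustrative):

```perl
use strict;
use warnings;
use Fcntl;
use SDBM_File;
use File::Temp qw(tempdir);

my $dir = tempdir(CLEANUP => 1);
my %h;
my $db = tie %h, 'SDBM_File', "$dir/demo", O_RDWR | O_CREAT, 0666
    or die "tie failed: $!";

# Filters operate on $_ as the key/value passes through:
$db->filter_store_value(sub { $_ .= "\0" });    # add NUL on store
$db->filter_fetch_value(sub { s/\0\z// });      # strip it on fetch

$h{greeting} = 'hello';
print $h{greeting}, "\n";   # hello
```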
use attrs is now obsolete, and is only provided for backward compatibility. It's been replaced by the sub : attributes syntax. See Subroutine Attributes in perlsub and attributes.
The lexical warnings pragma, use warnings;, controls optional warnings. See perllexwarn.
use filetest controls the behaviour of filetests (-r -w ...). Currently only one subpragma is implemented, "use filetest 'access';", which uses access(2) or equivalent to check permissions instead of using stat(2) as usual. This matters in filesystems where there are ACLs (access control lists): the stat(2) might lie, but access(2) knows better.
The open pragma can be used to specify default disciplines for handle constructors (e.g. open()) and for qx//. The two pseudo-disciplines :raw and :crlf are currently supported on DOS-derivative platforms (i.e. where binmode is not a no-op). See also binmode() can be used to set :crlf and :raw modes.
dprofpp is used to display profile data generated using Devel::DProf. See dprofpp.
The find2perl utility now uses the enhanced features of the File::Find module. The -depth and -follow options are supported. Pod documentation is also included in the script.
The h2xs tool can now work in conjunction with C::Scan (available from CPAN) to automatically parse real-life header files. The -M, -a, -k, and -o options are new.
perlcc now supports the C and Bytecode backends. By default, it generates output from the simple C backend rather than the optimized C backend.
Support for non-Unix platforms has been improved.
perldoc has been reworked to avoid possible security holes. By default it will not let itself be run as the superuser, but you may still use the -U switch to try to make it drop privileges first.
Many bug fixes and enhancements were added to perl5db.pl, the Perl debugger. New commands include < ?, > ?, and { ? to list out current actions, man docpage to run your doc viewer on some perl docset, and support for quoted options. The help information was rearranged, and should be viewable once again if you're using less as your pager. A serious security hole was plugged--you should immediately remove all older versions of the Perl debugger as installed in previous releases, all the way back to perl3, from your system to avoid being bitten by this.
Many of the platform-specific README files are now part of the perl installation. See perl for the complete list.
The official list of public Perl API functions.
A tutorial for beginners on object-oriented Perl.
An introduction to using the Perl Compiler suite.
A howto document on using the DBM filter facility.
All material unrelated to running the Perl debugger, plus all low-level guts-like details that risked crushing the casual user of the debugger, have been relocated from the old manpage to the next entry below.
This new manpage contains excessively low-level material not related to the Perl debugger, but slightly related to debugging Perl itself. It also contains some arcane internal details of how the debugging process works that may only be of interest to developers of Perl debuggers.
Notes on the fork() emulation currently available for the Windows platform.
An introduction to writing Perl source filters.
Some guidelines for hacking the Perl source code.
A list of internal functions in the Perl source code. (List is currently empty.)
Introduction and reference information about lexically scoped warning categories.
Detailed information about numbers as they are represented in Perl.
A tutorial on using open() effectively.
A tutorial that introduces the essentials of references.
A tutorial on managing class data for object modules.
Discussion of the most often wanted features that may someday be supported in Perl.
An introduction to Unicode support features in Perl.
Many common sort() operations using a simple inlined block are now optimized for faster performance.
Certain operations in the RHS of assignment statements have been optimized to directly set the lexical variable on the LHS, eliminating redundant copying overheads.
Minor changes in how subroutine calls are handled internally provide marginal improvements in performance.
The hash values returned by delete(), each(), values() and hashes in a list context are the actual values in the hash, instead of copies. This results in significantly better performance, because it eliminates needless copying in most situations.
The -Dusethreads flag now enables the experimental interpreter-based thread support by default. To get the flavor of experimental threads that was in 5.005 instead, you need to run Configure with "-Dusethreads -Duse5005threads".
As of v5.6.0, interpreter-threads support is still lacking a way to create new threads from Perl (i.e., use Thread; will not work with interpreter threads). use Thread; continues to be available when you specify the -Duse5005threads option to Configure, bugs and all.
- NOTE: Support for threads continues to be an experimental feature.
- Interfaces and implementation are subject to sudden and drastic changes.
The following new flags may be enabled on the Configure command line by running Configure with -Dflag.
- usemultiplicity
- usethreads useithreads (new interpreter threads: no Perl API yet)
- usethreads use5005threads (threads as they were in 5.005)
- use64bitint (equal to now deprecated 'use64bits')
- use64bitall
- uselongdouble
- usemorebits
- uselargefiles
- usesocks (only SOCKS v5 supported)
The Configure options enabling the use of threads and the use of 64-bitness are now more daring in the sense that they no longer have an explicit list of operating systems with known threads/64-bit capabilities. In other words: if your operating system has the necessary APIs and datatypes, you should be able to just go ahead and use them, for threads by Configure -Dusethreads, and for 64 bits either explicitly by Configure -Duse64bitint or implicitly if your system has 64-bit wide datatypes. See also 64-bit support.
Some platforms have "long doubles", floating point numbers of even larger range than ordinary "doubles". To enable using long doubles for Perl's scalars, use -Duselongdouble.
You can enable both -Duse64bitint and -Duselongdouble with -Dusemorebits. See also 64-bit support.
Some platforms support system APIs that are capable of handling large files (typically, files larger than two gigabytes). Perl will try to use these APIs if you ask for -Duselargefiles.
See Large file support for more information.
You can use "Configure -Uinstallusrbinperl", which causes installperl to skip installing perl also as /usr/bin/perl. This is useful if you prefer not to modify /usr/bin for some reason or another, but potentially harmful because many scripts assume they will find Perl in /usr/bin/perl.
You can use "Configure -Dusesocks" which causes Perl to probe for the SOCKS proxy protocol library (v5, not v4). For more information on SOCKS, see:
- http://www.socks.nec.com/
-A flag

You can "post-edit" the Configure variables using the Configure -A switch. The editing happens immediately after the platform-specific hints files have been processed but before the actual configuration process starts. Run Configure -h to find out the full -A syntax.
The installation structure has been enriched to improve the support for maintaining multiple versions of perl, to provide locations for vendor-supplied modules, scripts, and manpages, and to ease maintenance of locally-added modules, scripts, and manpages. See the section on Installation Directories in the INSTALL file for complete details. For most users building and installing from source, the defaults should be fine.
If you previously used Configure -Dsitelib or -Dsitearch to set special values for library directories, you might wish to consider using the new -Dsiteprefix setting instead. Also, if you wish to re-use a config.sh file from an earlier version of perl, you should be sure to check that Configure makes sensible choices for the new directories. See INSTALL for complete details.
The Mach CThreads (NEXTSTEP, OPENSTEP) are now supported by the Thread extension.
GNU/Hurd is now supported.
Rhapsody/Darwin is now supported.
EPOC is now supported (on Psion 5).
The cygwin port (formerly cygwin32) has been greatly improved.
Perl now works with djgpp 2.02 (and 2.03 alpha).
Environment variable names are not converted to uppercase any more.
Incorrect exit codes from backticks have been fixed.
This port continues to use its own builtin globbing (not File::Glob).
Support for this EBCDIC platform has not been renewed in this release. There are difficulties in reconciling Perl's standardization on UTF-8 as its internal representation for characters with the EBCDIC character set, because the two are incompatible.
It is unclear whether future versions will renew support for this platform, but the possibility exists.
Numerous revisions and extensions to configuration, build, testing, and installation process to accommodate core changes and VMS-specific options.
Expand %ENV-handling code to allow runtime mapping to logical names, CLI symbols, and CRTL environ array.
Extension of subprocess invocation code to accept filespecs as command "verbs".
Add to Perl command line processing the ability to use default file types and to recognize Unix-style 2>&1.
Expansion of File::Spec::VMS routines, and integration into ExtUtils::MM_VMS.
Extension of ExtUtils::MM_VMS to handle complex extensions more flexibly.
Barewords at start of Unix-syntax paths may be treated as text rather than only as logical names.
Optional secure translation of several logical names used internally by Perl.
Miscellaneous bugfixing and porting of new core code to VMS.
Thanks are gladly extended to the many people who have contributed VMS patches, testing, and ideas.
Perl can now emulate fork() internally, using multiple interpreters running in different concurrent threads. This support must be enabled at build time. See perlfork for detailed information.
When given a pathname that consists only of a drivename, such as A:, opendir() and stat() now use the current working directory for the drive rather than the drive root.
The builtin XSUB functions in the Win32:: namespace are documented. See Win32.
$^X now contains the full path name of the running executable.
A Win32::GetLongPathName() function is provided to complement Win32::GetFullPathName() and Win32::GetShortPathName(). See Win32.
POSIX::uname() is supported.
system(1,...) now returns true process IDs rather than process handles. kill() accepts any real process id, rather than strictly return values from system(1,...).
For better compatibility with Unix, kill(0, $pid) can now be used to test whether a process exists.

The Shell module is supported.
Better support for building Perl under command.com in Windows 95 has been added.
Scripts are read in binary mode by default to allow ByteLoader (and the filter mechanism in general) to work properly. For compatibility, the DATA filehandle will be set to text mode if a carriage return is detected at the end of the line containing the __END__ or __DATA__ token; if not, the DATA filehandle will be left open in binary mode. Earlier versions always opened the DATA filehandle in text mode.
The glob() operator is implemented via the File::Glob extension, which supports glob syntax of the C shell. This increases the flexibility of the glob() operator, but there may be compatibility issues for programs that relied on the older globbing syntax. If you want to preserve compatibility with the older syntax, you might want to run perl with -MFile::DosGlob. For details and compatibility information, see File::Glob.
With $/ set to undef, "slurping" an empty file returns a string of zero length (instead of undef, as it used to) the first time the HANDLE is read after $/ is set to undef. Further reads yield undef.
This means that the following will append "foo" to an empty file (it used to do nothing):
- perl -0777 -pi -e 's/^/foo/' empty_file
The behaviour of:
- perl -pi -e 's/^/foo/' empty_file
is unchanged (it continues to leave the file empty).
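The slurp behavior can be observed directly (the temp file is illustrative):

```perl
use strict;
use warnings;
use File::Temp qw(tempfile);

# Create an empty file to slurp:
my ($tmp, $file) = tempfile(UNLINK => 1);
close $tmp;

open my $in, '<', $file or die "open: $!";
local $/;                  # undef $/: slurp mode
my $first  = <$in>;        # "" the first time (used to be undef)
my $second = <$in>;        # undef on further reads
print defined $first && $first eq '' ? "empty string\n" : "unexpected\n";
print defined $second ? "unexpected\n" : "then undef\n";
```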
eval '...' improvements

Line numbers (as reflected by caller() and most diagnostics) within eval '...' were often incorrect where here documents were involved. This has been corrected.
Lexical lookups for variables appearing in eval '...' within functions that were themselves called within an eval '...' were searching the wrong place for lexicals. The lexical search now correctly ends at the subroutine's block boundary.
The use of return within eval {...} caused $@ not to be reset correctly when no exception occurred within the eval. This has been fixed.
Parsing of here documents used to be flawed when they appeared as the replacement expression in eval 's/.../.../e'. This has been fixed.
Some "errors" encountered at compile time were by necessity generated as warnings followed by eventual termination of the program. This enabled more such errors to be reported in a single run, rather than causing a hard stop at the first error that was encountered.
The mechanism for reporting such errors has been reimplemented to queue compile-time errors and report them at the end of the compilation as true errors rather than as warnings. This fixes cases where error messages leaked through in the form of warnings when code was compiled at run time using eval STRING, and also allows such errors to be reliably trapped using eval "...".
Sometimes implicitly closed filehandles (as when they are localized, and Perl automatically closes them on exiting the scope) could inadvertently set $? or $!. This has been corrected.
When taking a slice of a literal list (as opposed to a slice of an array or hash), Perl used to return an empty list if the result happened to be composed of all undef values.
The new behavior is to produce an empty list if (and only if) the original list was empty. Consider the following example:
The old behavior would have resulted in @a having no elements. The new behavior ensures it has three undefined elements.
Note in particular that the behavior of slices of the following cases remains unchanged:
- @a = ()[1,2];
- @a = (getpwent)[7,0];
- @a = (anything_returning_empty_list())[2,1,2];
- @a = @b[2,1,2];
- @a = @c{'a','b','c'};
See perldata.
(\$) prototype and $foo{a}
A scalar reference prototype now correctly allows a hash or array element in that slot.
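A short sketch of what this allows; the sub name bump is hypothetical:

```perl
# A (\$) prototype makes Perl pass a reference to the argument.
# A hash element in that slot is now accepted at compile time.
sub bump (\$) { my $ref = shift; ${$ref}++ }

my %h = (a => 1);
bump($h{a});          # older perls rejected a hash element here
print $h{a}, "\n";    # the element itself was incremented
```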
goto &sub and AUTOLOAD
The goto &sub construct works correctly when &sub happens to be autoloaded.
-bareword allowed under use integer
The autoquoting of barewords preceded by - did not work in prior versions when the integer pragma was enabled. This has been fixed.
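For illustration (the key name -verbose is arbitrary), the now-working autoquoting looks like this:

```perl
use integer;                  # the pragma under which -bareword used to break
my %opt = (-verbose => 1);    # -verbose autoquotes to the string "-verbose"
print $opt{-verbose}, "\n";
```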
When code in a destructor threw an exception, it went unnoticed in earlier versions of Perl, unless someone happened to be looking in $@ just after the point the destructor happened to run. Such failures are now visible as warnings when warnings are enabled.
printf() and sprintf() previously reset the numeric locale back to the default "C" locale. This has been fixed.
Numbers formatted according to the local numeric locale (such as using a decimal comma instead of a decimal dot) caused "isn't numeric" warnings, even while the operations accessing those numbers produced correct results. These warnings have been discontinued.
The eval 'return sub {...}' construct could sometimes leak memory. This has been fixed.
Operations that aren't filehandle constructors used to leak memory when used on invalid filehandles. This has been fixed.
Constructs that modified @_ could fail to deallocate values in @_ and thus leak memory. This has been corrected.
Perl could sometimes create empty subroutine stubs when a subroutine was not found in the package. Such cases stopped later method lookups from progressing into base packages. This has been corrected.
-U
When running in unsafe mode, taint violations could sometimes cause silent failures. This has been fixed.
-c switch
Prior versions used to run BEGIN and END blocks when Perl was run in compile-only mode. Since this is typically not the expected behavior, END blocks are no longer executed when the -c switch is used, or if compilation fails.
See Support for CHECK blocks for how to run things when the compile phase ends.
Using the __DATA__ token creates an implicit filehandle to the file that contains the token. It is the program's responsibility to close it when it is done reading from it.
This caveat is now better explained in the documentation. See perldata.
(W misc) A "my" or "our" variable has been redeclared in the current scope or statement, effectively eliminating all access to the previous instance. This is almost always a typographical error. Note that the earlier variable will still exist until the end of the scope or until all closure referents to it are destroyed.
(F) Lexically scoped subroutines are not yet implemented. Don't try that yet.
(W misc) You seem to have already declared the same global once before in the current lexical scope.
(F) The '!' is allowed in pack() and unpack() only after certain types. See pack.
(F) You had an unpack template indicating a counted-length string, but you have also specified an explicit size for the string. See pack.
(F) You had an unpack template indicating a counted-length string, which must be followed by one of the letters a, A or Z to indicate what sort of string is to be unpacked. See pack.
(F) You had a pack template indicating a counted-length string. Currently the only things that can have their length counted are a*, A* or Z*. See pack.
(F) You had an unpack template that contained a '#', but this did not follow some numeric unpack specification. See pack.
(W regexp) You used a backslash-character combination which is not recognized by Perl. This combination appears in an interpolated variable or a '-delimited regular expression. The character was understood literally.
(W regexp) You used a backslash-character combination which is not recognized by Perl inside character classes. The character was understood literally.
(W syntax) You have used a pattern where Perl expected to find a string,
as in the first argument to join. Perl will treat the true
or false result of matching the pattern against $_ as the string,
which is probably not what you had in mind.
(W prototype) You've called a function that has a prototype before the parser saw a definition or declaration for it, and Perl could not check that the call conforms to the prototype. You need to either add an early prototype declaration for the subroutine in question, or move the subroutine definition ahead of the call to get proper prototype checking. Alternatively, if you are certain that you're calling the function correctly, you may put an ampersand before the name to avoid the warning. See perlsub.
(F) The argument to exists() must be a hash or array element, such as:
- $foo{$bar}
- $ref->{"susie"}[12]
(F) The argument to delete() must be either a hash or array element, such as:
- $foo{$bar}
- $ref->{"susie"}[12]
or a hash or array slice, such as:
- @foo[$bar, $baz, $xyzzy]
- @{$ref->[12]}{"susie", "queue"}
(F) The argument to exists() for exists &sub must be a subroutine name, and not a subroutine call. exists &sub() will generate this error.
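A small sketch of the permitted form (the sub names greet and no_such are hypothetical):

```perl
# exists &sub tests whether the named subroutine has been declared,
# without calling it. exists &sub() is the fatal form described above.
sub greet { "hi" }
my $have = (exists &greet)   ? 1 : 0;   # 1: greet is declared
my $lack = (exists &no_such) ? 1 : 0;   # 0: no such sub
print "$have $lack\n";
```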
(W reserved) A lowercase attribute name was used that had a package-specific handler. That name might have a meaning to Perl itself some day, even though it doesn't yet. Perhaps you should use a mixed-case attribute name, instead. See attributes.
(W misc) This prefix usually indicates that a DESTROY() method raised the indicated exception. Since destructors are usually called by the system at arbitrary points during execution, and often a vast number of times, the warning is issued only once for any number of failures that would otherwise result in the same message being repeated.
Failure of user callbacks dispatched using the G_KEEPERR flag could also result in this warning. See G_KEEPERR in perlcall.
(F) You wrote require <file> when you should have written require 'file'.
(F) You tried to join a thread from within itself, which is an impossible task. You may be joining the wrong thread, or you may need to move the join() to some other thread.
(F) You've used the /e switch to evaluate the replacement for a substitution, but perl found a syntax error in the code to evaluate, most likely an unexpected right brace '}'.
(S) An internal routine called realloc() on something that had never been malloc()ed in the first place. Mandatory, but can be disabled by setting environment variable PERL_BADFREE to 1.
(W bareword) The compiler found a bareword where it expected a conditional, which often indicates that an || or && was parsed as part of the last argument of the previous construct, for example:
- open FOO || die;
It may also indicate a misspelled constant that has been interpreted as a bareword:
- use constant TYPO => 1;
- if ($foo == TYPO) { ... }
The strict pragma is useful in avoiding such errors.
(W portable) The binary number you specified is larger than 2**32-1 (4294967295) and therefore non-portable between systems. See perlport for more on portability concerns.
(W portable) Using bit vector sizes larger than 32 is non-portable.
(W internal) A warning peculiar to VMS. While Perl was preparing to iterate over %ENV, it encountered a logical name or symbol definition which was too long, so it was truncated to the string shown.
(P) For some reason you can't check the filesystem of the script for nosuid.
(S) Currently, only scalar variables can be declared with a specific class qualifier in a "my" or "our" declaration. The semantics may be extended for other types of variables in the future.
(F) Only scalar, array, and hash variables may be declared as "my" or "our" variables. They must have ordinary identifiers as names.
(W signal) Perl has detected that it is being run with the SIGCHLD signal (sometimes known as SIGCLD) disabled. Since disabling this signal will interfere with proper determination of exit status of child processes, Perl has reset the signal to its default value. This situation typically indicates that the parent program under which Perl may be running (e.g., cron) is being very careless.
(F) Subroutines meant to be used in lvalue context should be declared as such, see Lvalue subroutines in perlsub.
(S) A warning peculiar to VMS. Perl tried to read an element of %ENV from the CRTL's internal environment array and discovered the array was missing. You need to figure out where your CRTL misplaced its environ or define PERL_ENV_TABLES (see perlvms) so that environ is not searched.
(S) You requested an inplace edit without creating a backup file. Perl was unable to remove the original file to replace it with the modified file. The file was left unmodified.
(F) Perl detected an attempt to return illegal lvalues (such as temporary or readonly values) from a subroutine used as an lvalue. This is not allowed.
(F) You attempted to weaken something that was not a reference. Only references can be weakened.
(F) The class in the character class [: :] syntax is unknown. See perlre.
(W unsafe) The character class constructs [: :], [= =], and [. .] go inside character classes, the [] are part of the construct, for example: /[012[:alpha:]345]/. Note that [= =] and [. .] are not currently implemented; they are simply placeholders for future extensions.
(F) A constant value (perhaps declared using the use constant pragma) is being dereferenced, but it amounts to the wrong type of reference. The message indicates the type of reference that was expected. This usually indicates a syntax error in dereferencing the constant value. See Constant Functions in perlsub and constant.
(F) The parser found inconsistencies either while attempting to define an overloaded constant, or when trying to find the character name specified in the \N{...} escape. Perhaps you forgot to load the corresponding overload or charnames pragma? See charnames and overload.
(F) The CORE:: namespace is reserved for Perl keywords.
(D) defined() is not usually useful on arrays because it checks for an
undefined scalar value. If you want to see if the array is empty,
just use if (@array) { # not empty } for example.
(D) defined() is not usually useful on hashes because it checks for an
undefined scalar value. If you want to see if the hash is empty,
just use if (%hash) { # not empty } for example.
See Server error.
(W misc) Remember that "our" does not localize the declared global variable. You have declared it again in the same lexical scope, which seems superfluous.
See Server error.
(F) While under the use filetest pragma, switching the real and effective uids or gids failed.
(W regexp) A character class range must start and end at a literal character, not another character class like \d or [:alpha:]. The "-" in your false range is interpreted as a literal "-". Consider quoting the "-", "\-". See perlre.
(W io) You tried to read from a filehandle opened only for writing. If you intended it to be a read/write filehandle, you needed to open it with "+<" or "+>" or "+>>" instead of with "<" or nothing. If you intended only to read from the file, use "<". See open.
(W closed) The filehandle you're attempting to flock() got itself closed some time before now. Check your logic flow. flock() operates on filehandles. Are you attempting to call flock() on a dirhandle by the same name?
(F) You've said "use strict vars", which indicates that all variables must either be lexically scoped (using "my"), declared beforehand using "our", or explicitly qualified to say which package the global variable is in (using "::").
(W portable) The hexadecimal number you specified is larger than 2**32-1 (4294967295) and therefore non-portable between systems. See perlport for more on portability concerns.
(W internal) A warning peculiar to VMS. Perl tried to read the CRTL's internal environ array, and encountered an element without the = delimiter used to separate keys from values. The element is ignored.
(W internal) A warning peculiar to VMS. Perl tried to read a logical name or CLI symbol definition when preparing to iterate over %ENV, and didn't see the expected delimiter between key and value, so the line was ignored.
(F) You used a digit other than 0 or 1 in a binary number.
(W digit) You may have tried to use a digit other than 0 or 1 in a binary number. Interpretation of the binary number stopped before the offending digit.
(F) The number of bits in vec() (the third argument) must be a power of two from 1 to 32 (or 64, if your platform supports that).
(W overflow) The hexadecimal, octal or binary number you have specified either as a literal or as an argument to hex() or oct() is too big for your architecture, and has been converted to a floating point number. On a 32-bit architecture the largest hexadecimal, octal or binary number representable without overflow is 0xFFFFFFFF, 037777777777, or 0b11111111111111111111111111111111 respectively. Note that Perl transparently promotes all numbers to a floating point representation internally--subject to loss of precision errors in subsequent operations.
The indicated attribute for a subroutine or variable was not recognized by Perl or by a user-supplied handler. See attributes.
The indicated attributes for a subroutine or variable were not recognized by Perl or by a user-supplied handler. See attributes.
The offending range is now explicitly displayed.
(F) Something other than a colon or whitespace was seen between the elements of an attribute list. If the previous attribute had a parenthesised parameter list, perhaps that list was terminated too soon. See attributes.
(F) Something other than a colon or whitespace was seen between the elements of a subroutine attribute list. If the previous attribute had a parenthesised parameter list, perhaps that list was terminated too soon.
(F) While under the use filetest pragma, switching the real and effective uids or gids failed.
(F) Due to limitations in the current implementation, array and hash values cannot be returned in subroutines used in lvalue context. See Lvalue subroutines in perlsub.
See Server error.
(F) Wrong syntax of character name literal \N{charname} within double-quotish context.
(W pipe) You used the open(FH, "| command") or open(FH, "command |") construction, but the command was missing or blank.
(F) The reserved syntax for lexically scoped subroutines requires that they have a name with which they can be found.
(F) The indicated command line switch needs a mandatory argument, but you haven't specified one.
(F) Fully qualified variable names are not allowed in "our" declarations, because that doesn't make much sense under existing semantics. Such syntax is reserved for future extensions.
(F) The argument to the indicated command line switch must follow immediately after the switch, without intervening spaces.
(S) A warning peculiar to VMS. Perl was unable to find the local timezone offset, so it's assuming that local system time is equivalent to UTC. If it's not, define the logical name SYS$TIMEZONE_DIFFERENTIAL to translate to the number of seconds which need to be added to UTC to get local time.
(W portable) The octal number you specified is larger than 2**32-1 (4294967295) and therefore non-portable between systems. See perlport for more on portability concerns.
See also perlport for writing portable code.
(P) Failed an internal consistency check while trying to reset a weak reference.
(F) forked child returned an incomprehensible message about its errno.
(P) Failed an internal consistency check while trying to reset all weak references to an object.
(W parenthesis) You said something like
- my $foo, $bar = @_;
when you meant
- my ($foo, $bar) = @_;
Remember that "my", "our", and "local" bind tighter than comma.
(W ambiguous) It used to be that Perl would try to guess whether you wanted an array interpolated or a literal @. It no longer does this; arrays are now always interpolated into strings. This means that if you try something like:
- print "fred@example.com";
and the array @example doesn't exist, Perl is going to print fred.com, which is probably not what you wanted. To get a literal @ sign in a string, put a backslash before it, just as you would to get a literal $ sign.
(W y2k) You are concatenating the number 19 with another number, which could be a potential Year 2000 problem.
(W deprecated) You have written something like this:
- sub doit
- {
- use attrs qw(locked);
- }
You should use the new declaration syntax instead.
- sub doit : locked
- {
- ...
- }
The use attrs pragma is now obsolete, and is only provided for backward-compatibility. See Subroutine Attributes in perlsub.
See Server error.
(F) You can't specify a repeat count so large that it overflows your signed integers. See pack.
(F) You can't specify a repeat count so large that it overflows your signed integers. See unpack.
(S) An internal routine called realloc() on something that had already been freed.
(W misc) You have attempted to weaken a reference that is already weak. Doing so has no effect.
(F) Your system has the setpgrp() from BSD 4.2, which takes no arguments, unlike POSIX setpgid(), which takes a process ID and process group ID.
(W regexp) You applied a regular expression quantifier in a place where it makes no sense, such as on a zero-width assertion. Try putting the quantifier inside the assertion instead. For example, the way to match "abc" provided that it is followed by three repetitions of "xyz" is /abc(?=(?:xyz){3})/, not /abc(?=xyz){3}/.
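A runnable sketch of the corrected pattern from the text (the test string is made up):

```perl
# Quantifier placed inside the zero-width lookahead, as advised:
my $s = "abcxyzxyzxyz";
print $s =~ /abc(?=(?:xyz){3})/ ? "match\n" : "no match\n";   # match
```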
(F) While under the use filetest pragma, we cannot switch the real and effective uids or gids.
(W internal) Warnings peculiar to VMS. You tried to change or delete an element of the CRTL's internal environ array, but your copy of Perl wasn't built with a CRTL that contained the setenv() function. You'll need to rebuild Perl with a CRTL that does, or redefine PERL_ENV_TABLES (see perlvms) so that the environ array isn't the target of the change to %ENV which produced the warning.
(W void) A CHECK or INIT block is being defined during run time proper, when the opportunity to run them has already passed. Perhaps you are loading a file with require or do when you should be using use instead. Or perhaps you should put the require or do inside a BEGIN block.
(F) The second argument of 3-argument open() is not among the list of valid modes: <, >, >>, +<, +>, +>>, -|, |-.
(P) An error peculiar to VMS. Perl was reading values for %ENV before iterating over it, and someone else stuck a message in the stream of data Perl expected. Someone's very confused, or perhaps trying to subvert Perl's population of %ENV for nefarious purposes.
(W misc) You used a backslash-character combination which is not recognized by Perl. The character was understood literally.
(F) The lexer saw an opening (left) parenthesis character while parsing an attribute list, but the matching closing (right) parenthesis character was not found. You may need to add (or remove) a backslash character to get your parentheses to balance. See attributes.
(F) The lexer found something other than a simple identifier at the start of an attribute, and it wasn't a semicolon or the start of a block. Perhaps you terminated the parameter list of the previous attribute too soon. See attributes.
(F) The lexer saw an opening (left) parenthesis character while parsing a subroutine attribute list, but the matching closing (right) parenthesis character was not found. You may need to add (or remove) a backslash character to get your parentheses to balance.
(F) The lexer found something other than a simple identifier at the start of a subroutine attribute, and it wasn't a semicolon or the start of a block. Perhaps you terminated the parameter list of the previous attribute too soon.
(W misc) A warning peculiar to VMS. Perl tried to read the value of an %ENV element from a CLI symbol table, and found a resultant string longer than 1024 characters. The return value has been truncated to 1024 characters.
(P) The attempt to translate a use Module n.n LIST statement into its equivalent BEGIN block found an internal inconsistency with the version number.
Compatibility tests for sub : attrs vs the older use attrs.
Tests for new environment scalar capability (e.g., use Env qw($BAR);).
Tests for new environment array capability (e.g., use Env qw(@PATH);).
IO constants (SEEK_*, _IO*).
Directory-related IO methods (new, read, close, rewind, tied delete).
INET sockets with multi-homed hosts.
IO poll().
UNIX sockets.
Regression tests for my ($x,@y,%z) : attrs and sub : attrs.
File test operators.
Verify operations that access pad objects (lexicals and temporaries).
Verify exists &sub operations.
Beware that any new warnings that have been added or old ones that have been enhanced are not considered incompatible changes.
Since all new warnings must be explicitly requested via the -w switch or the warnings pragma, it is ultimately the programmer's responsibility to ensure that warnings are enabled judiciously.
All subroutine definitions named CHECK are now special. See "Support for CHECK blocks" for more information.
There is a potential incompatibility in the behavior of list slices that are comprised entirely of undefined values. See Behavior of list slices is more consistent.
The English module now sets $PERL_VERSION to $^V (a string value) rather than $] (a numeric value). This is a potential incompatibility.
Send us a report via perlbug if you are affected by this.
See Improved Perl version numbering system for the reasons for this change.
Literals of the form 1.2.3 parse differently
Previously, numeric literals with more than one dot in them were interpreted as a floating point number concatenated with one or more numbers. Such "numbers" are now parsed as strings composed of the specified ordinals.
For example, print 97.98.99 used to output 97.9899 in earlier versions, but now prints abc.
See Support for strings represented as a vector of ordinals.
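The new parsing can be sketched as follows; the %vd format prints such a string back as its ordinals:

```perl
# A literal with two or more dots is now a string of the given ordinals.
my $v = 97.98.99;             # chr(97).chr(98).chr(99), i.e. "abc"
print $v, "\n";               # prints "abc"
printf "%vd\n", $v;           # prints "97.98.99"
```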
Perl programs that depend on reproducing a specific set of pseudo-random numbers may now produce different output due to improvements made to the rand() builtin. You can use sh Configure -Drandfunc=rand to obtain the old behavior.
Even though Perl hashes are not order preserving, the apparently random order encountered when iterating on the contents of a hash is actually determined by the hashing algorithm used. Improvements in the algorithm may yield a random order that is different from that of previous versions, especially when iterating on hashes.
See Better worst-case behavior of hashes for additional information.
undef fails on read only values
Using the undef operator on a readonly value (such as $1) has
the same effect as assigning undef to the readonly value--it
throws an exception.
Pipe and socket handles are also now subject to the close-on-exec behavior determined by the special variable $^F.
"$$1" to mean "${$}1" is unsupported
Perl 5.004 deprecated the interpretation of $$1 and similar within interpolated strings to mean $$ . "1", but still allowed it.
In Perl 5.6.0 and later, "$$1" always means "${$1}".
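A sketch of the new interpretation; the variable name alpha is made up, and the symbolic dereference needs no strict 'refs':

```perl
# "$$1" now interpolates ${$1}: $1 is dereferenced, not $$ . "1".
no strict 'refs';
our $alpha = "first";
"alpha" =~ /(\w+)/;           # $1 is now "alpha"
my $interp = "$$1";           # ${$1} -> $alpha -> "first"
print $interp, "\n";
```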
delete(), each(), values() and \(%h) operate on aliases to values, not copies
delete(), each(), values() and hashes (e.g. \(%h)) in a list context return the actual values in the hash, instead of copies (as they used to in earlier versions). Typical idioms for using these constructs copy the returned values, but this can make a significant difference when creating references to the returned values. Keys in the hash are still returned as copies when iterating on a hash.
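The difference matters when taking references; a minimal sketch with a one-element hash:

```perl
# \(%h) distributes \ over the flattened hash: the value reference
# points at the actual value stored in %h, not a copy.
my %h = (a => 1);
my ($kref, $vref) = \(%h);    # $kref: ref to a key copy; $vref: ref to the value
$$vref = 42;                  # modifies the value inside %h itself
print $h{a}, "\n";            # 42
```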
See also delete(), each(), values() and hash iteration are faster.
vec() generates a run-time error if the BITS argument is not a valid power-of-two integer.
Most references to internal Perl operations in diagnostics have been changed to be more descriptive. This may be an issue for programs that may incorrectly rely on the exact text of diagnostics for proper functioning.
%@ has been removed
The undocumented special variable %@ that used to accumulate "background" errors (such as those that happen in DESTROY()) has been removed, because it could potentially result in memory leaks.
The not operator now falls under the "if it looks like a function, it behaves like a function" rule. As a result, the parenthesized form can be used with grep and map. The following construct used to be a syntax error, but now works as expected:
- grep not($_), @things;
On the other hand, using not with a literal list slice may not work. The following previously allowed construct:
- print not (1,2,3)[0];
needs to be written with additional parentheses now:
- print not((1,2,3)[0]);
The behavior remains unaffected when not is not followed by parentheses.
Semantics of bareword prototype (*) have changed
The semantics of the bareword prototype * have changed. Perl 5.005 always coerced simple scalar arguments to a typeglob, which wasn't useful in situations where the subroutine must distinguish between a simple scalar and a typeglob. The new behavior is to not coerce bareword arguments to a typeglob. The value will always be visible as either a simple scalar or as a reference to a typeglob.
If your platform is either natively 64-bit or if Perl has been configured to use 64-bit integers, i.e., $Config{ivsize} is 8, there may be a potential incompatibility in the behavior of bitwise numeric operators (& | ^ ~ << >>). These operators used to strictly operate on the lower 32 bits of integers in previous versions, but now operate over the entire native integral width. In particular, note that unary ~ will produce different results on platforms that have different $Config{ivsize}. For portability, be sure to mask off the excess bits in the result of unary ~, e.g., ~$x & 0xffffffff.
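The masking idiom from the text, as a runnable sketch:

```perl
# Masking the result of unary ~ gives the same 32-bit answer
# whether $Config{ivsize} is 4 or 8.
my $x = 1;
my $masked = ~$x & 0xffffffff;
printf "0x%x\n", $masked;     # 0xfffffffe on either integer width
```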
As described in Improved security features, there may be more sources of taint in a Perl program.
To avoid these new tainting behaviors, you can build Perl with the Configure option -Accflags=-DINCOMPLETE_TAINTS. Beware that the ensuing perl binary may be insecure.
PERL_POLLUTE
Release 5.005 grandfathered old global symbol names by providing preprocessor macros for extension source compatibility. As of release 5.6.0, these preprocessor definitions are not available by default. You need to explicitly compile perl with -DPERL_POLLUTE to get these definitions. For extensions still using the old symbols, this option can be specified via MakeMaker:
- perl Makefile.PL POLLUTE=1
PERL_IMPLICIT_CONTEXT
This new build option provides a set of macros for all API functions such that an implicit interpreter/thread context argument is passed to every API function. As a result of this, something like sv_setsv(foo,bar) amounts to a macro invocation that actually translates to something like Perl_sv_setsv(my_perl,foo,bar). While this is generally expected to not have any significant source compatibility issues, the difference between a macro and a real function call will need to be considered.
This means that there is a source compatibility issue as a result of this if your extensions attempt to use pointers to any of the Perl API functions.
Note that the above issue is not relevant to the default build of Perl, whose interfaces continue to match those of prior versions (but subject to the other options described here).
See Background and PERL_IMPLICIT_CONTEXT in perlguts for detailed information on the ramifications of building Perl with this option.
- NOTE: PERL_IMPLICIT_CONTEXT is automatically enabled whenever Perl is built
- with one of -Dusethreads, -Dusemultiplicity, or both. It is not
- intended to be enabled by users at this time.
PERL_POLLUTE_MALLOC
Enabling Perl's malloc in release 5.005 and earlier caused the namespace of the system's malloc family of functions to be usurped by the Perl versions, since by default they used the same names. Besides causing problems on platforms that do not allow these functions to be cleanly replaced, this also meant that the system versions could not be called in programs that used Perl's malloc. Previous versions of Perl have allowed this behaviour to be suppressed with the HIDEMYMALLOC and EMBEDMYMALLOC preprocessor definitions.
As of release 5.6.0, Perl's malloc family of functions have default names distinct from the system versions. You need to explicitly compile perl with -DPERL_POLLUTE_MALLOC to get the older behaviour. HIDEMYMALLOC and EMBEDMYMALLOC have no effect, since the behaviour they enabled is now the default.
Note that these functions do not constitute Perl's memory allocation API. See Memory Allocation in perlguts for further information about that.
PATCHLEVEL is now PERL_VERSION
The cpp macros PERL_REVISION, PERL_VERSION, and PERL_SUBVERSION are now available by default from perl.h, and reflect the base revision, patchlevel, and subversion respectively. PERL_REVISION had no prior equivalent, while PERL_VERSION and PERL_SUBVERSION were previously available as PATCHLEVEL and SUBVERSION.
The new names cause less pollution of the cpp namespace and reflect what the numbers have come to stand for in common practice. For compatibility, the old names are still supported when patchlevel.h is explicitly included (as required before), so there is no source incompatibility from the change.
In general, the default build of this release is expected to be binary compatible for extensions built with the 5.005 release or its maintenance versions. However, specific platforms may have broken binary compatibility due to changes in the defaults used in hints files. Therefore, please be sure to always check the platform-specific README files for any notes to the contrary.
The usethreads or usemultiplicity builds are not binary compatible with the corresponding builds in 5.005.
On platforms that require an explicit list of exports (AIX, OS/2 and Windows, among others), purely internal symbols such as parser functions and the run time opcodes are not exported by default. Perl 5.005 used to export all functions irrespective of whether they were considered part of the public API or not.
For the full list of public API functions, see perlapi.
The subtests 19 and 20 of lib/thr5005.t test are known to fail due to fundamental problems in the 5.005 threading implementation. These are not new failures--Perl 5.005_0x has the same bugs, but didn't have these tests.
In earlier releases of Perl, EBCDIC environments like OS390 (also known as Open Edition MVS) and VM-ESA were supported. Due to changes required by the UTF-8 (Unicode) support, the EBCDIC platforms are not supported in Perl 5.6.0.
The lib/io_multihomed test may hang in HP-UX if Perl has been configured to be 64-bit. Because other 64-bit platforms do not hang in this test, HP-UX is suspect. All other tests pass in 64-bit HP-UX. The test attempts to create and connect to "multihomed" sockets (sockets which have multiple IP addresses).
In NEXTSTEP 3.3p2 the implementation of strftime(3) in the operating system libraries is buggy: the %j format numbers the days of a month starting from zero, which, while logical to programmers, may cause subtests 19 to 27 of the lib/posix test to fail.
If compiled with gcc 2.95 the lib/sdbm test will fail (dump core). The cure is to use the vendor cc; it comes with the operating system and produces good code.
In UNICOS/mk the following errors may appear during the Configure run:
- Guessing which symbols your C compiler and preprocessor define...
- CC-20 cc: ERROR File = try.c, Line = 3
- ...
- bad switch yylook 79bad switch yylook 79bad switch yylook 79bad switch yylook 79#ifdef A29K
- ...
- 4 errors detected in the compilation of "try.c".
The culprit is the broken awk of UNICOS/mk. The effect is fortunately rather mild: Perl itself is not adversely affected by the error, only the h2ph utility coming with Perl, and that is rather rarely needed these days.
When the left argument to the arrow operator -> is an array, or the scalar operator operating on an array, the result of the operation must be considered erroneous. For example:
- @x->[2]
- scalar(@x)->[2]
These expressions will get run-time errors in some future release of Perl.
As discussed above, many features are still experimental. Interfaces and implementation of these features are subject to change, and in extreme cases, even subject to removal in some future release of Perl. These features include the following:
(?{ code }) and (??{ code })
(W) Within regular expression character classes ([]) the syntax beginning with "[:" and ending with ":]" is reserved for future extensions. If you need to represent those character sequences inside a regular expression character class, just quote the square brackets with the backslash: "\[:" and ":\]".
(W) A warning peculiar to VMS. A logical name was encountered when preparing to iterate over %ENV which violates the syntactic rules governing logical names. Because it cannot be translated normally, it is skipped, and will not appear in %ENV. This may be a benign occurrence, as some software packages might directly modify logical name tables and introduce nonstandard names, or it may indicate that a logical name table has been corrupted.
The description of this error used to say:
- (Someday it will simply assume that an unbackslashed @
- interpolates an array.)
That day has come, and this fatal error has been removed. It has been replaced by a non-fatal warning instead. See Arrays now always interpolate into double-quoted strings for details.
(W) The compiler found a bareword where it expected a conditional, which often indicates that an || or && was parsed as part of the last argument of the previous construct, for example:
(F) The current implementation of regular expressions uses shorts as address offsets within a string. Unfortunately this means that if the regular expression compiles to longer than 32767, it'll blow up. Usually when you want a regular expression this big, there is a better way to do it with multiple statements. See perlre.
(D) Perl versions before 5.004 misinterpreted any type marker followed by "$" and a digit. For example, "$$0" was incorrectly taken to mean "${$}0" instead of "${$0}". This bug is (mostly) fixed in Perl 5.004.
However, the developers of Perl 5.004 could not fix this bug completely, because at least two widely-used modules depend on the old meaning of "$$0" in a string. So Perl 5.004 still interprets "$$<digit>" in the old (broken) way inside strings; but it generates this message as a warning. And in Perl 5.005, this special treatment will cease.
If you find what you think is a bug, you might check the articles recently posted to the comp.lang.perl.misc newsgroup. There may also be information at http://www.perl.com/perl/ , the Perl Home Page.
If you believe you have an unreported bug, please run the perlbug program included with your release. Be sure to trim your bug down to a tiny but sufficient test case. Your bug report, along with the output of perl -V, will be sent off to perlbug@perl.org to be analysed by the Perl porting team.
The Changes file for exhaustive details on what changed.
The INSTALL file for how to build Perl.
The README file for general stuff.
The Artistic and Copying files for copyright information.
Written by Gurusamy Sarathy <gsar@activestate.com>, with many contributions from The Perl Porters.
Send omissions or corrections to <perlbug@perl.org>.
perl581delta - what is new for perl v5.8.1
This document describes differences between the 5.8.0 release and the 5.8.1 release.
If you are upgrading from an earlier release such as 5.6.1, first read perl58delta, which describes differences between 5.6.0 and 5.8.0.
In case you are wondering about 5.6.1, it was bug-fix-wise rather identical to the development release 5.7.1. Confused? This timeline hopefully helps a bit: it lists the new major releases, their maintenance releases, and the development releases.
- New Maintenance Development
- 5.6.0 2000-Mar-22
- 5.7.0 2000-Sep-02
- 5.6.1 2001-Apr-08
- 5.7.1 2001-Apr-09
- 5.7.2 2001-Jul-13
- 5.7.3 2002-Mar-05
- 5.8.0 2002-Jul-18
- 5.8.1 2003-Sep-25
Mainly due to security reasons, the "random ordering" of hashes has been made even more random. Previously while the order of hash elements from keys(), values(), and each() was essentially random, it was still repeatable. Now, however, the order varies between different runs of Perl.
Perl has never guaranteed any ordering of the hash keys, and the ordering has already changed several times during the lifetime of Perl 5. Also, the ordering of hash keys has always been, and continues to be, affected by the insertion order.
The added randomness may affect applications. One possible scenario is when the output of an application has included hash data. For example, if you have used the Data::Dumper module to dump data into different files and then compared the files to see whether the data has changed, you will now get false positives, since the order in which hashes are dumped varies. In general the cure is to sort the keys (or the values); in particular, for Data::Dumper, use the Sortkeys option. If some particular order is really important, use tied hashes: for example the Tie::IxHash module, which by default preserves the order in which the hash elements were added.
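The Sortkeys cure can be sketched in a few lines; the hash contents here are made up, but Sortkeys is the real Data::Dumper option:

```perl
use strict;
use warnings;
use Data::Dumper;

my %h = ( banana => 2, apple => 1, cherry => 3 );

# Without Sortkeys the key order may differ between runs of Perl;
# with Sortkeys the dump is deterministic and therefore safe to diff.
local $Data::Dumper::Sortkeys = 1;
print Dumper(\%h);
```

With Sortkeys set, the keys always appear in sorted order (apple, banana, cherry), so dumps of unchanged data compare equal across runs.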
A more subtle problem is reliance on the order of "global destruction". That is what happens at the end of execution: Perl destroys all data structures, including user data. If your destructors (the DESTROY subroutines) have assumed any particular ordering to the global destruction, there might be problems ahead. For example, in a destructor of one object you cannot assume that objects of any other class are still available, unless you hold a reference to them. If the environment variable PERL_DESTRUCT_LEVEL is set to a non-zero value, or if Perl is exiting a spawned thread, it will also destruct the ordinary references and the symbol tables that are no longer in use. You can't call a class method or an ordinary function on a class that has been collected that way.
The hash randomisation is certain to reveal hidden assumptions about some particular ordering of hash elements, and outright bugs: it revealed a few bugs in the Perl core and core modules.
To disable the hash randomisation at runtime, set the environment variable PERL_HASH_SEED to 0 (zero) before running Perl (for more information see PERL_HASH_SEED in perlrun); to disable the feature completely at compile time, compile with -DNO_HASH_SEED (see INSTALL).
See Algorithmic Complexity Attacks in perlsec for the original rationale behind this change.
In Perl 5.8.0 all filehandles, including the standard filehandles, were implicitly set to be in Unicode UTF-8 if the locale settings indicated the use of UTF-8. This feature caused too many problems, so the feature was turned off and redesigned: see Core Enhancements.
The version strings or v-strings (see Version Strings in perldata) feature introduced in Perl 5.6.0 has been a source of some confusion, especially when the user did not want to use it but Perl thought it knew better. Particularly troublesome has been the rule that before a "=>" a version string (a "v" followed by digits) was interpreted as a v-string instead of a string literal. In other words:
- %h = ( v65 => 42 );
has meant since Perl 5.6.0
- %h = ( 'A' => 42 );
(at least on platforms of ASCII progeny). Perl 5.8.1 restores the more natural interpretation:
- %h = ( 'v65' => 42 );
The multi-number v-strings like v65.66 and 65.66.67 still continue to be v-strings in Perl 5.8.
The -C switch has changed in an incompatible way. The old semantics of this switch only made sense in Win32 and only in the "use utf8" universe in 5.6.x releases, and do not make sense for the Unicode implementation in 5.8.0. Since this switch could not have been used by anyone, it has been repurposed. The behavior that this switch enabled in 5.6.x releases may be supported in a transparent, data-dependent fashion in a future release.
For the new life of this switch, see UTF-8 no longer default under UTF-8 locales, and -C in perlrun.
Perl 5.8.1 uses the /d switch when running the cmd.exe shell internally for system(), backticks, and when opening pipes to external programs. The extra switch disables the execution of AutoRun commands from the registry, which is generally considered undesirable when running external programs. If you wish to retain compatibility with the older behavior, set PERL5SHELL in your environment to cmd /x/c.
In Perl 5.8.0 many Unicode features were introduced. One of them was found to be of more nuisance than benefit: the automagic (and silent) "UTF-8-ification" of filehandles, including the standard filehandles, if the user's locale settings indicated use of UTF-8.
For example, if you had en_US.UTF-8 as your locale, your STDIN and STDOUT were automatically "UTF-8"; in other words, an implicit binmode(..., ":utf8") was made. This meant that trying to print, say, chr(0xff) ended up printing the bytes 0xc3 0xbf. Hardly what you had in mind unless you were aware of this feature of Perl 5.8.0. The problem is that the vast majority of people weren't: for example, in RedHat releases 8 and 9 the default locale setting is UTF-8, so all RedHat users got UTF-8 filehandles, whether they wanted them or not. The pain was intensified by the Unicode implementation of Perl 5.8.0 (still) having nasty bugs, especially related to the use of s/// and tr///. (Bugs that have been fixed in 5.8.1.)
Therefore a decision was made to backtrack the feature and change it from an implicit silent default to an explicit conscious option. The new Perl command line option -C and its counterpart environment variable PERL_UNICODE can now be used to control how Perl and Unicode interact at interfaces like I/O and, for example, the command line arguments. See -C in perlrun and PERL_UNICODE in perlrun for more information.
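The byte expansion described above can be demonstrated without touching the locale, by applying the :utf8 layer to an in-memory filehandle (a 5.8 feature) in place of STDOUT; the variable names are illustrative only:

```perl
use strict;
use warnings;

# Open an in-memory file and apply what the UTF-8 locale used to
# apply implicitly to the standard handles.
open my $fh, '>', \my $buf or die $!;
binmode $fh, ':utf8';
print $fh chr(0xff);     # one character, U+00FF
close $fh;

# The single character came out as the two-byte UTF-8 sequence.
printf "%d byte(s): %vd\n", length($buf), $buf;   # 2 byte(s): 195.191
```

This is exactly the surprise 5.8.0 users hit: one chr(0xff) character silently became the bytes 0xc3 0xbf on output.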
In Perl 5.8.0 the so-called "safe signals" were introduced. This means that Perl no longer handles signals immediately but instead "between opcodes", when it is safe to do so. The earlier immediate handling easily could corrupt the internal state of Perl, resulting in mysterious crashes.
However, the new safer model has its problems too. Because an opcode, a basic unit of Perl execution, is now never interrupted but instead allowed to run to completion, certain operations that can take a long time now really do take a long time. For example, certain network operations have their own blocking and timeout mechanisms, and being able to interrupt them immediately would be nice.
Therefore perl 5.8.1 introduces a "backdoor" to restore the pre-5.8.0 (pre-5.7.3, really) signal behaviour. Just set the environment variable PERL_SIGNALS to unsafe, and the old immediate (and unsafe) signal handling behaviour returns. See PERL_SIGNALS in perlrun and Deferred Signals (Safe Signals) in perlipc.
In completely unrelated news, you can now use safe signals with POSIX::SigAction. See POSIX::SigAction in POSIX.
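A minimal sketch of the POSIX::SigAction route, assuming a POSIX platform with SIGUSR1; the handler and counter are made up:

```perl
use strict;
use warnings;
use POSIX ();

my $hits = 0;

# Build a sigaction with an ordinary Perl handler, and request
# Perl's deferred ("safe") delivery via the new safe() method.
my $action = POSIX::SigAction->new(sub { $hits++ });
$action->safe(1);                                  # deliver between opcodes
POSIX::sigaction(POSIX::SIGUSR1, $action) or die "sigaction failed: $!";

kill 'USR1', $$;                                   # signal ourselves
my $spin = 0;
$spin++ for 1 .. 10;                               # cross some opcode boundaries
print "handled $hits signal(s)\n";
```

Because delivery is deferred, the handler runs at an opcode boundary after the kill, not in the middle of one.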
Formerly, the indices passed to the FETCH, STORE, EXISTS, and DELETE methods of a tied array class were always non-negative. If the actual argument was negative, Perl would call FETCHSIZE implicitly and add the result to the index before passing the result to the tied array method. This behaviour is now optional. If the tied array class contains a package variable named $NEGATIVE_INDICES which is set to a true value, negative values will be passed to FETCH, STORE, EXISTS, and DELETE unchanged.
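A minimal sketch of the opt-in, with a made-up class name; only the methods needed for the demonstration are implemented:

```perl
use strict;
use warnings;

package NegArray;
our $NEGATIVE_INDICES = 1;      # receive negative indices unchanged

sub TIEARRAY  { bless { seen => [], data => {} }, shift }
sub FETCH     { my ($s, $i) = @_; push @{ $s->{seen} }, $i; $s->{data}{$i} }
sub STORE     { my ($s, $i, $v) = @_; push @{ $s->{seen} }, $i; $s->{data}{$i} = $v }
sub FETCHSIZE { 10 }            # would be consulted only without the opt-in

package main;
tie my @a, 'NegArray';
$a[-1] = 'last';                # STORE sees -1, not FETCHSIZE() - 1 == 9
print "indices seen: @{ (tied @a)->{seen} }\n";   # indices seen: -1
```

Without $NEGATIVE_INDICES, the same assignment would have reached STORE as index 9.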
The syntaxes
now do localise variables, provided that $x is a valid variable name.
The copy of the Unicode Character Database included in Perl 5.8 has been updated to 4.0.0 from 3.2.0. This means for example that the Unicode character properties are as in Unicode 4.0.0.
There is one new feature deprecation. Perl 5.8.0 forgot to add some deprecation warnings; these warnings have now been added. Finally, a reminder of an impending feature removal.
Pseudo-hashes were deprecated in Perl 5.8.0 and will be removed in Perl 5.10.0; see perl58delta for details. Each attempt to access pseudo-hashes will trigger the warning Pseudo-hashes are deprecated.
If you really want to continue using pseudo-hashes but not to see the
deprecation warnings, use:
- no warnings 'deprecated';
Or you can continue to use the fields pragma, but please don't expect the data structures to be pseudohashes any more.
5.005-style threads (activated by use Thread;) were deprecated in Perl 5.8.0 and will be removed after Perl 5.8; see perl58delta for details. Each 5.005-style thread creation will trigger the warning 5.005 threads are deprecated. If you really want to continue using the 5.005 threads but not to see the deprecation warnings, use:
- no warnings 'deprecated';
The $* variable controlling multi-line matching has been deprecated and will be removed after 5.8. The variable has been deprecated for a long time, and a deprecation warning Use of $* is deprecated is given; now the variable will just finally be removed. The functionality has been supplanted by the /s and /m modifiers on pattern matching. If you really want to continue using the $* variable but not to see the deprecation warnings, use:
- no warnings 'deprecated';
map in void context is no longer expensive. map is now context aware, and will not construct a list if called in void context.
If a socket gets closed by the server while printing to it, the client now gets a SIGPIPE. While this new feature was not planned, it fell naturally out of PerlIO changes, and is to be considered an accidental feature.
PerlIO::get_layers(FH) returns the names of the PerlIO layers active on a filehandle.
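For example (the exact layer names printed depend on how your Perl was built):

```perl
use strict;
use warnings;

# Inspect the PerlIO layer stack of a freshly opened handle.
open my $fh, '<', $0 or die $!;            # read this script itself
my @layers = PerlIO::get_layers($fh);
print "layers: @layers\n";                 # e.g. "unix perlio"

binmode $fh, ':utf8';                      # the utf8 pseudo-layer appears
print "now: @{[ PerlIO::get_layers($fh) ]}\n";
close $fh;
```

This is handy for debugging I/O problems: you can see at runtime exactly which layers (and the utf8 flag) a handle carries.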
PerlIO::via layers can now have an optional UTF8 method to indicate whether the layer wants to "auto-:utf8" the stream.
utf8::is_utf8() has been added as a quick way to test whether a scalar is encoded internally in UTF-8 (Unicode).
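A short illustration; note that is_utf8() reports the internal representation, not whether the data is conceptually Unicode:

```perl
use strict;
use warnings;

my $bytes = "caf\x{e9}";    # U+00E9 fits in one byte: stored as native bytes
my $wide  = "caf\x{2603}";  # U+2603 forces the internal UTF-8 representation

print utf8::is_utf8($bytes) ? "bytes: utf8\n" : "bytes: native\n";  # native
print utf8::is_utf8($wide)  ? "wide: utf8\n"  : "wide: native\n";   # utf8
```

The utf8:: functions are built in, so no use utf8 or import is needed.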
The following modules and pragmata have been updated since Perl 5.8.0:
In much better shape than it used to be. Still far from perfect, but maybe worth a try.
An optional feature, :hireswallclock, now allows for high resolution wall clock times (uses Time::HiRes).
See B::Bytecode.
Now has bytes::substr.
One can now have custom character name aliases.
There is now a simple command line frontend to the CPAN.pm module called cpan.
A new option, Pair, allows choosing the separator between hash keys and values.
Significant updates on the encoding pragma functionality (tr/// and the DATA filehandle, formats).
If a filehandle has been marked as to have an encoding, unmappable characters are detected already during input, not later (when the corrupted data is being used).
The ISO 8859-6 conversion table has been corrected (the 0x30..0x39 erroneously mapped to U+0660..U+0669, instead of U+0030..U+0039). The GSM 03.38 conversion did not handle escape sequences correctly. The UTF-7 encoding has been added (making Encode feature-complete with Unicode::String).
A lot of bugs have been fixed since v1.60, the version included in Perl v5.8.0. Especially noteworthy are the bug in Calc that caused div and mod to fail for some large values, and the fixes to the handling of bad inputs.
Some new features were added, e.g. the broot() method, you can now pass parameters to config() to change some settings at runtime, and it is now possible to trap the creation of NaN and infinity.
As usual, some optimizations took place and made the math overall a tad faster. In some cases, quite a lot faster, actually. Especially alternative libraries like Math::BigInt::GMP benefit from this. In addition, a lot of the quite clunky routines like fsqrt() and flog() are now much much faster.
Diamond inheritance now works.
Reading from non-string scalars (like the special variables, see perlvar) now works.
Complete rewrite. As a side-effect, no longer refuses to startup when run by root.
New utilities: refaddr, isvstring, looks_like_number, set_prototype.
Can now store code references (via B::Deparse, so not foolproof).
Earlier versions of the strict pragma did not check the parameters implicitly passed to its "import" (use) and "unimport" (no) routines. This allowed false idioms such as:
- use strict qw(@ISA);
- @ISA = qw(Foo);
This however (probably) raised the false expectation that the strict refs, vars and subs were being enforced (and that @ISA was somehow "declared"). But the strict refs, vars, and subs are not enforced when using this false idiom.
Starting from Perl 5.8.1, the above will cause an error to be raised. This may cause programs which used to execute seemingly correctly without warnings and errors to fail when run under 5.8.1. This happens because
- use strict qw(@ISA);
will now fail with the error:
- Unknown 'strict' tag(s) '@ISA'
The remedy to this problem is to replace this code with the correct idiom:
- use strict;
- use vars qw(@ISA);
- @ISA = qw(Foo);
Now much more picky about extra or missing output from test scripts.
Use of nanosleep(), if available, allows mixing subsecond sleeps with alarms.
Several fixes, for example for join() problems and memory leaks. In some platforms (like Linux) that use glibc the minimum memory footprint of one ithread has been reduced by several hundred kilobytes.
Many memory leaks have been fixed.
Now returns extra information.
The h2xs utility now produces a more modern layout: Foo-Bar/lib/Foo/Bar.pm instead of Foo/Bar/Bar.pm. Also, the boilerplate test is now called t/Foo-Bar.t instead of t/1.t.
The Perl debugger (lib/perl5db.pl) has now been extensively documented and bugs found while documenting have been fixed.
perldoc has been rewritten from scratch to be more robust and feature rich.
perlcc -B now works at least somewhat better, while perlcc -c is rather more broken. (The Perl compiler suite as a whole continues to be experimental.)
perl573delta has been added to list the differences between the (now quite obsolete) development releases 5.7.2 and 5.7.3.
perl58delta has been added: it is the perldelta of 5.8.0, detailing the differences between 5.6.0 and 5.8.0.
perlartistic has been added: it is the Artistic License in pod format, making it easier for modules to refer to it.
perlcheat has been added: it is a Perl cheat sheet.
perlgpl has been added: it is the GNU General Public License in pod format, making it easier for modules to refer to it.
perlmacosx has been added to tell about the installation and use of Perl in Mac OS X.
perlos400 has been added to tell about the installation and use of Perl in OS/400 PASE.
perlreref has been added: it is a regular expressions quick reference.
The Unix standard Perl location, /usr/bin/perl, is no longer overwritten by default if it exists. This change was very prudent because so many Unix vendors already provide a /usr/bin/perl, but simultaneously many system utilities may depend on that exact version of Perl, so better not to overwrite it.
One can now specify installation directories for site and vendor man and HTML pages, and site and vendor scripts. See INSTALL.
One can now specify a destination directory for Perl installation by specifying the DESTDIR variable for make install. (This feature is slightly different from the previous Configure -Dinstallprefix=....) See INSTALL.
gcc versions 3.x introduced a new warning that caused a lot of noise during Perl compilation: gcc -Ialreadyknowndirectory (warning: changing search order). This warning is now avoided by having Configure weed out such directories before the compilation.
One can now build subsets of the Perl core modules by using the Configure flags -Dnoextensions=... and -Donlyextensions=...; see INSTALL.
In Cygwin, Perl can now be built with threads (Configure -Duseithreads). This works with both Cygwin 1.3.22 and Cygwin 1.5.3.
In newer FreeBSD releases Perl 5.8.0 compilation failed because of trying to use malloc.h, which in FreeBSD is just a dummy file, and a fatal error to even try to use. Now malloc.h is not used.
Perl is now known to build also in Hitachi HI-UXMPP.
Perl is now known to build again in LynxOS.
Mac OS X now installs with the Perl version number embedded in installation directory names for easier upgrading of user-compiled Perl, and the installation directories in general are more standard. In other words, the default installation no longer breaks the Apple-provided Perl. On the other hand, with Configure -Dprefix=/usr you can now really replace the Apple-supplied Perl (please be careful).
Mac OS X now builds Perl statically by default. This change was made mainly for faster startup times. The Apple-provided Perl is still dynamically linked and shared, and you can enable the sharedness for your own Perl builds by Configure -Duseshrplib.
Perl has been ported to IBM's OS/400 PASE environment. The best way to build a Perl for PASE is to use an AIX host as a cross-compilation environment. See README.os400.
Yet another cross-compilation option has been added: Perl now builds on OpenZaurus, a Linux distribution based on Mandrake + Embedix for the Sharp Zaurus PDA. See the Cross/README file.
Tru64 when using gcc 3 drops the optimisation for toke.c to -O2 because of gigantic memory use with the default -O3.
Tru64 can now build Perl with the newer Berkeley DBs.
Building Perl on WinCE has been much enhanced, see README.ce and README.perlce.
There have been many fixes in the area of anonymous subs, lexicals and closures. Although this means that Perl is now more "correct", it is possible that some existing code will break that happens to rely on the faulty behaviour. In practice this is unlikely unless your code contains a very complex nesting of anonymous subs, evals and lexicals.
If an input filehandle is marked :utf8 and Perl sees illegal UTF-8 coming in when doing <FH>, if warnings are enabled a warning is immediately given, instead of Perl being silent about it and unhappy about the broken data later. (The :encoding(utf8) layer also works the same way.)
binmode(SOCKET, ":utf8") only worked on the input side, not on the output side of the socket. Now it works both ways.
For threaded Perls certain system database functions like getpwent() and getgrent() now grow their result buffer dynamically, instead of failing. This means that at sites with lots of users and groups the functions no longer fail by returning only partial results.
Perl 5.8.0 had accidentally broken the capability for users to define their own uppercase<->lowercase Unicode mappings (as advertised by the Camel). This feature has been fixed and is also documented better.
In 5.8.0 this
- $some_unicode .= <FH>;
didn't work correctly but instead corrupted the data. This has now been fixed.
Tied methods like FETCH etc. may now safely access tied values, i.e. resulting in a recursive call to FETCH etc. Remember to break the recursion, though.
At startup Perl blocks the SIGFPE signal away since there isn't much Perl can do about it. Previously this blocking was in effect also for programs executed from within Perl. Now Perl restores the original SIGFPE handling routine, whatever it was, before running external programs.
Line numbers in Perl scripts may now be greater than 65536, or 2**16. (Perl scripts have always been able to be larger than that; it's just that the line numbers reported for errors and warnings "wrapped around".) While scripts that large usually indicate a need to rethink your code a bit, such Perl scripts do exist, for example as results from generated code. Now line numbers can go all the way to 4294967296, or 2**32.
Linux
Setting $0 works again (with certain limitations that Perl cannot do much about: see $0 in perlvar)
HP-UX
Setting $0 now works.
VMS
Configuration now tests for the presence of poll(), and IO::Poll now uses the vendor-supplied function if detected.
A rare access violation at Perl start-up could occur if the Perl image was installed with privileges or if there was an identifier with the subsystem attribute set in the process's rightslist. Either of these circumstances triggered tainting code that contained a pointer bug. The faulty pointer arithmetic has been fixed.
The length limit on values (not keys) in the %ENV hash has been raised from 255 bytes to 32640 bytes (except when the PERL_ENV_TABLES setting overrides the default use of logical names for %ENV). If it is necessary to access these long values from outside Perl, be aware that they are implemented using search list logical names that store the value in pieces, each 255-byte piece (up to 128 of them) being an element in the search list. When doing a lookup in %ENV from within Perl, the elements are combined into a single value. The existing VMS-specific ability to access individual elements of a search list logical name via the $ENV{'foo;N'} syntax (where N is the search list index) is unimpaired.
The piping implementation now uses local rather than global DCL symbols for inter-process communication.
File::Find could become confused when navigating to a relative directory whose name collided with a logical name. This problem has been corrected by adding directory syntax to relative path names, thus preventing logical name translation.
Win32
A memory leak in the fork() emulation has been fixed.
The return value of the ioctl() built-in function was accidentally broken in 5.8.0. This has been corrected.
The internal message loop executed by perl during blocking operations sometimes interfered with messages that were external to Perl. This often resulted in blocking operations terminating prematurely or returning incorrect results, when Perl was executing under environments that could generate Windows messages. This has been corrected.
Pipes and sockets are now automatically in binary mode.
The four-argument form of select() did not preserve $! (errno) properly when there were errors in the underlying call. This is now fixed.
The "CR CR LF" problem has been fixed; binmode(FH, ":crlf") is now effectively a no-op.
All the warnings related to pack() and unpack() were made more informative and consistent.
The old version
- A thread exited while %d other threads were still running
was misleading because the "other" also included the thread giving the warning.
It is not illegal to clear a restricted hash, so the warning was removed.
You must specify the block of code for sub.
The old version
- Invalid [] range "%s" in transliteration operator
was simply wrong because there are no "[] ranges" in tr///.
Self-explanatory.
The padding spaces would appear after the newline, which is probably not what you had in mind.
If you think this
- $x & $y == 0
tests whether the bitwise AND of $x and $y is zero, you will like this warning.
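A sketch of the precedence trap, with values chosen so the two parses give different answers:

```perl
use strict;
use warnings;

my ($x, $y) = (1, 2);

# == binds tighter than &, so the "surprise" line parses as $x & ($y == 0).
no warnings 'precedence';          # silence the very warning being described
my $surprise = $x & $y == 0;       # 1 & 0          =>  0
my $intended = ($x & $y) == 0;     # (1 & 2) == 0   =>  true, i.e. 1
print "surprise=$surprise intended=$intended\n";  # surprise=0 intended=1
```

Parenthesizing the bitwise AND, as in the second line, expresses the test the programmer almost certainly meant.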
This warning should have been already in 5.8.0, since they are.
You cannot read() (or sysread()) from a closed or unopened filehandle.
This warning should have been already in 5.8.0, since they are.
Something pulled the plug on a live tied variable, Perl plays safe by bailing out.
An illegal user-defined Unicode casemapping was specified.
Something modified the values being iterated over. This is not good.
This news matters to you only if you write XS code, like to know about or hack Perl internals (using Devel::Peek or any of the B:: modules counts), or like to run Perl with the -D option.
The embedding examples of perlembed have been reviewed to be up to date and consistent: for example, the correct use of PERL_SYS_INIT3() and PERL_SYS_TERM().
Extensive reworking of the pad code (the code responsible for lexical variables) has been conducted by Dave Mitchell.
Extensive work on the v-strings by John Peacock.
UTF-8 length and position cache: to speed up the handling of Unicode (UTF-8) scalars, a cache was introduced. Potential problems exist if an extension bypasses the official APIs and directly modifies the PV of an SV: the UTF-8 cache does not get cleared as it should.
APIs obsoleted in Perl 5.8.0, like sv_2pv, sv_catpvn, sv_catsv, sv_setsv, are again available.
Certain Perl core C APIs like cxinc and regatom are no longer available at all to code outside the Perl core or the Perl core extensions. This is intentional. They never should have been available with the shorter names, and if your application depends on them, you should (be ashamed and) contact perl5-porters to discuss what the proper APIs are.
Certain Perl core C APIs like Perl_list are no longer available without their Perl_ prefix. If your XS module stops working because some functions cannot be found, in many cases a simple fix is to add the Perl_ prefix to the function and the thread context aTHX_ as the first argument of the function call. This is also how it should always have been done: letting the Perl_-less forms leak from the core was an accident. For cleaner embedding you can also force this for all APIs by defining at compile time the cpp define PERL_NO_SHORT_NAMES.
Perl_save_bool() has been added.
Regexp objects (those created with qr) now have S-magic rather than R-magic. This fixed regexps of the form /...(??{...;$x})/ to no longer ignore changes made to $x. The S-magic avoids dropping the caching optimization and making (??{...}) constructs obscenely slow (and consequently useless). See also Magic Variables in perlguts. Regexp::Copy was affected by this change.
The Perl internal debugging macros DEBUG() and DEB() have been renamed to PERL_DEBUG() and PERL_DEB() to avoid namespace conflicts.
-DL removed (the leaktest had been broken and unsupported for years; use alternative debugging mallocs or tools like valgrind and Purify).
Verbose modifier v added for -DXv and -Dsv; see perlrun.
In Perl 5.8.0 there were about 69000 separate tests in about 700 test files, in Perl 5.8.1 there are about 77000 separate tests in about 780 test files. The exact numbers depend on the Perl configuration and on the operating system platform.
The hash randomisation mentioned in Incompatible Changes is definitely problematic: it will wake dormant bugs and shake out bad assumptions.
If you want to use mod_perl 2.x with Perl 5.8.1, you will need mod_perl-1.99_10 or higher. Earlier versions of mod_perl 2.x do not work with the randomised hashes. (mod_perl 1.x works fine.) You will also need Apache::Test 1.04 or higher.
Many of the rarer platforms that worked 100% or pretty close to it with perl 5.8.0 have been left a little bit untended since their maintainers have been otherwise busy lately, and therefore there will be more failures on those platforms. Such platforms include Mac OS Classic, IBM z/OS (and other EBCDIC platforms), and NetWare. The most common Perl platforms (Unix and Unix-like, Microsoft platforms, and VMS) have large enough testing and expert population that they are doing well.
Tied hashes do not currently return anything useful in scalar context, for example when used as boolean tests:
- if (%tied_hash) { ... }
The current nonsensical behaviour is always to return false, regardless of whether the hash is empty or has elements.
The root cause is that there is no interface for the implementors of tied hashes to implement the behaviour of a hash in scalar context.
The subtests 9 and 18 of lib/Net/Ping/t/450_service.t, and the subtest 2 of lib/Net/Ping/t/510_ping_udp.t might fail if you have an unusual networking setup. For example in the latter case the test is trying to send a UDP ping to the IP address 127.0.0.1.
The C-generating compiler backend B::C (the frontend being perlcc -c) is even more broken than it used to be because of the extensive lexical variable changes. (The good news is that B::Bytecode and ByteLoader are better than they used to be.)
IBM z/OS and other EBCDIC platforms continue to be problematic regarding Unicode support. Many Unicode tests are skipped when they really should be fixed.
In Cygwin 1.5 the io/tell and op/sysio tests fail for some as yet unknown reason. In 1.5.5 the threads tests stress_cv, stress_re, and stress_string fail unless the environment variable PERLIO is set to "perlio" (which also makes the io/tell failure go away).
Perl 5.8.1 does build and work well with Cygwin 1.3: with (uname -a)
CYGWIN_NT-5.0 ... 1.3.22(0.78/3/2) 2003-03-18 09:20 i686 ...
a 100% "make test" was achieved with Configure -des -Duseithreads.
With certain HP C compiler releases (e.g. B.11.11.02) you will get many warnings like this (lines wrapped for easier reading):
- cc: "/usr/include/sys/socket.h", line 504: warning 562:
- Redeclaration of "sendfile" with a different storage class specifier:
- "sendfile" will have internal linkage.
- cc: "/usr/include/sys/socket.h", line 505: warning 562:
- Redeclaration of "sendpath" with a different storage class specifier:
- "sendpath" will have internal linkage.
The warnings show up both during the build of Perl and during certain lib/ExtUtils tests that invoke the C compiler. The warning, however, is not serious and can be ignored.
The test t/uni/tr_7jis.t is known to report failure under 'make test' or the test harness with certain releases of IRIX (at least IRIX 6.5 and MIPSpro Compilers Version 7.3.1.1m), but if run manually the test fully passes.
The Perl malloc (-Dusemymalloc) does not work at all in Mac OS X. This is not that serious, though, since the native malloc works just fine.
In the latest Tru64 releases (e.g. v5.1B or later) gcc cannot be used to compile a threaded Perl (-Duseithreads) because the system <pthread.h> file doesn't know about gcc.
As of the 5.8.0 release, sysopen()/sysread()/syswrite() do not behave like they used to in 5.6.1 and earlier with respect to "text" mode. These built-ins now always operate in "binary" mode (even if sysopen() was passed the O_TEXT flag, or if binmode() was used on the file handle). Note that this issue should only make a difference for disk files, as sockets and pipes have always been in "binary" mode in the Windows port. As this behavior is currently considered a bug, compatible behavior may be re-introduced in a future release. Until then, the use of sysopen(), sysread() and syswrite() is not supported for "text" mode operations.
The following things might happen in future. The first publicly available releases having these characteristics will be the developer releases Perl 5.9.x, culminating in the Perl 5.10.0 release. These are our best guesses at the moment: we reserve the right to rethink.
PerlIO will become The Default. Currently (in Perl 5.8.x) the stdio library is still used if Perl thinks it can use certain tricks to make stdio go really fast. For future releases our goal is to make PerlIO go even faster.
A new feature called assertions will be available. This means that one can have code called assertions sprinkled in the code: usually they are optimised away, but they can be enabled with the -A option.
A new operator // (defined-or) will be available. This means that one will be able to say
- $a // $b
instead of
- defined $a ? $a : $b
and
- $c //= $d;
instead of
- $c = $d unless defined $c;
The operator will have the same precedence and associativity as ||.
A source code patch against the Perl 5.8.1 sources will be available in CPAN as authors/id/H/HM/HMBRAND/dor-5.8.1.diff.
unpack() will default to unpacking $_.
Various Copy-On-Write techniques will be investigated in hopes of speeding up Perl.
CPANPLUS, Inline, and Module::Build will become core modules.
The ability to write true lexically scoped pragmas will be introduced.
Work will continue on the bytecompiler and byteloader.
v-strings as they currently exist are scheduled to be deprecated. The v-less form (1.2.3) will become a "version object" when used with use, require, and $VERSION. $^V will also be a "version object" so the printf("%vd",...) construct will no longer be needed. The v-ful version (v1.2.3) will become obsolete. The equivalence of strings and v-strings (e.g. that currently 5.8.0 is equal to "\5\8\0") will go away. There may be no deprecation warning for v-strings, though: it is quite hard to detect when v-strings are being used safely, and when they are not.
5.005 Threads Will Be Removed
The $* Variable Will Be Removed (it was deprecated a long time ago)
Pseudohashes Will Be Removed
If you find what you think is a bug, you might check the articles recently posted to the comp.lang.perl.misc newsgroup and the perl bug database at http://bugs.perl.org/ . There may also be information at http://www.perl.com/ , the Perl Home Page.
If you believe you have an unreported bug, please run the perlbug program included with your release. Be sure to trim your bug down to a tiny but sufficient test case. Your bug report, along with the output of perl -V, will be sent off to perlbug@perl.org to be analysed by the Perl porting team. You can browse and search the Perl 5 bugs at http://bugs.perl.org/.
The Changes file for exhaustive details on what changed.
The INSTALL file for how to build Perl.
The README file for general stuff.
The Artistic and Copying files for copyright information.
perl582delta - what is new for perl v5.8.2
This document describes differences between the 5.8.1 release and the 5.8.2 release.
If you are upgrading from an earlier release such as 5.6.1, first read the perl58delta, which describes differences between 5.6.0 and 5.8.0, and the perl581delta, which describes differences between 5.8.0 and 5.8.1.
For threaded builds for modules calling certain re-entrant system calls, binary compatibility was accidentally lost between 5.8.0 and 5.8.1. Binary compatibility with 5.8.0 has been restored in 5.8.2, which necessitates breaking compatibility with 5.8.1. We see this as the lesser of two evils.
This will only affect people who have a threaded perl 5.8.1, and compiled modules which use these calls, and now attempt to run the compiled modules with 5.8.2. The fix is to re-compile and re-install the modules using 5.8.2.
The hash randomisation introduced with 5.8.1 has been amended. It transpired that although the implementation introduced in 5.8.1 was source compatible with 5.8.0, it was not binary compatible in certain cases. 5.8.2 contains an improved implementation which is both source and binary compatible with both 5.8.0 and 5.8.1, and remains robust against the form of attack which prompted the change for 5.8.1.
We are grateful to the Debian project for their input in this area. See Algorithmic Complexity Attacks in perlsec for the original rationale behind this change.
Several memory leaks associated with variables shared between threads have been fixed.
The following modules and pragmata have been updated since Perl 5.8.1:
Documentation improved
Documentation improved
Documentation improved
Some syntax errors involving unrecognized filetest operators are now handled correctly by the parser.
Interpreter initialization is more complete when -DMULTIPLICITY is off. This should resolve problems with initializing and destroying the Perl interpreter more than once in a single process.
Dynamic linker flags have been tweaked for Solaris and OS X, which should solve problems seen while building some XS modules.
Bugs in OS/2 sockets and tmpfile have been fixed.
In OS X setreuid and friends are troublesome; perl will now work around their problems as best it can.
Starting with 5.8.3 we intend to make more frequent maintenance releases, with a smaller number of changes in each. The intent is to propagate bug fixes out to stable releases more rapidly and make upgrading stable releases less of an upheaval. This should give end users more flexibility in their choice of upgrade timing, and allow them easier assessment of the impact of upgrades. The current plan is for code freezes as follows:
5.8.3 23:59:59 GMT, Wednesday December 31st 2003
5.8.4 23:59:59 GMT, Wednesday March 31st 2004
5.8.5 23:59:59 GMT, Wednesday June 30th 2004
with the release following soon after, when testing is complete.
See Future Directions in perl581delta for more soothsaying.
If you find what you think is a bug, you might check the articles recently posted to the comp.lang.perl.misc newsgroup and the perl bug database at http://bugs.perl.org/. There may also be information at http://www.perl.com/, the Perl Home Page.
If you believe you have an unreported bug, please run the perlbug program included with your release. Be sure to trim your bug down to a tiny but sufficient test case. Your bug report, along with the output of perl -V, will be sent off to perlbug@perl.org to be analysed by the Perl porting team. You can browse and search the Perl 5 bugs at http://bugs.perl.org/.
The Changes file for exhaustive details on what changed.
The INSTALL file for how to build Perl.
The README file for general stuff.
The Artistic and Copying files for copyright information.
perl583delta - what is new for perl v5.8.3
This document describes differences between the 5.8.2 release and the 5.8.3 release.
If you are upgrading from an earlier release such as 5.6.1, first read the perl58delta, which describes differences between 5.6.0 and 5.8.0, and the perl581delta and perl582delta, which describe differences between 5.8.0, 5.8.1 and 5.8.2.
There are no changes incompatible with 5.8.2.
A SCALAR method is now available for tied hashes. This is called when a tied hash is used in scalar context, such as
- if (%tied_hash) {
- ...
- }
The old behaviour was that %tied_hash would return whatever would have been returned for that hash before the hash was tied (so usually 0). The new behaviour in the absence of a SCALAR method is to return TRUE if in the middle of an each iteration, and otherwise call FIRSTKEY to check if the hash is empty (making sure that a subsequent each will also begin by calling FIRSTKEY). Please see SCALAR in perltie for the full details and caveats.
A function again is provided to resolve problems where modules in different directories wish to use FindBin.
You can now weaken references to read only values.
cond_wait has a new two argument form. cond_timedwait has been added.
find2perl now assumes -print as a default action. Previously, it needed to be specified explicitly.
A new utility, prove, makes it easy to run an individual regression test at the command line. prove is part of Test::Harness, which users of earlier Perl versions can install from CPAN.
The documentation has been revised in places to produce more standard manpages.
The documentation for the special code blocks (BEGIN, CHECK, INIT, END) has been improved.
Perl now builds on OpenVMS I64
Using substr() on a UTF8 string could cause subsequent accesses on that string to return garbage. This was due to incorrect UTF8 offsets being cached, and is now fixed.
join() could return garbage when the same join() statement was used to process 8 bit data having earlier processed UTF8 data, due to the flags on that statement's temporary workspace not being reset correctly. This is now fixed.
$a .. $b will now work as expected when either $a or $b is undef.
Using Unicode keys with tied hashes should now work correctly.
Reading $^E now preserves $!. Previously, the C code implementing $^E did not preserve errno, so reading $^E could cause errno and therefore $! to change unexpectedly.
Reentrant functions will (once more) work with C++. 5.8.2 introduced a bugfix which accidentally broke the compilation of Perl extensions written in C++.
The fatal error "DESTROY created new reference to dead object" is now documented in perldiag.
The hash code has been refactored to reduce source duplication. The external interface is unchanged, and aside from the bug fixes described above, there should be no change in behaviour.
hv_clear_placeholders is now part of the perl API.
Some C macros have been tidied. In particular macros which create temporary local variables now name these variables more defensively, which should avoid bugs where names clash.
<signal.h> is now always included.
Configure now invokes callbacks regardless of the value of the variable they are called for. Previously callbacks were only invoked when the variable was set to $define. This change should only affect platform maintainers writing configuration hints files.
The regression test ext/threads/shared/t/wait.t fails on early RedHat 9 and HP-UX 10.20 due to bugs in their threading implementations. RedHat users should see https://rhn.redhat.com/errata/RHBA-2003-136.html and consider upgrading their glibc.
Detached threads aren't supported on Windows yet, as they may lead to memory access violation problems.
There is a known race condition opening scripts in suidperl. suidperl is neither built nor installed by default, and has been deprecated since perl 5.8.0. You are advised to replace use of suidperl with tools such as sudo (http://www.courtesan.com/sudo/).
We have a backlog of unresolved bugs. Dealing with bugs and bug reports is unglamorous work; not something ideally suited to volunteer labour, but that is all that we have.
The perl5 development team are implementing changes to help address this problem, which should go live in early 2004.
Code freeze for the next maintenance release (5.8.4) is on March 31st 2004, with release expected by mid April. Similarly 5.8.5's freeze will be at the end of June, with release by mid July.
Iain 'Spoon' Truskett, Perl hacker, author of perlreref and contributor to CPAN, died suddenly on 29th December 2003, aged 24. He will be missed.
If you find what you think is a bug, you might check the articles recently posted to the comp.lang.perl.misc newsgroup and the perl bug database at http://bugs.perl.org. There may also be information at http://www.perl.org, the Perl Home Page.
If you believe you have an unreported bug, please run the perlbug program included with your release. Be sure to trim your bug down to a tiny but sufficient test case. Your bug report, along with the output of perl -V, will be sent off to perlbug@perl.org to be analysed by the Perl porting team. You can browse and search the Perl 5 bugs at http://bugs.perl.org/.
The Changes file for exhaustive details on what changed.
The INSTALL file for how to build Perl.
The README file for general stuff.
The Artistic and Copying files for copyright information.
perl584delta - what is new for perl v5.8.4
This document describes differences between the 5.8.3 release and the 5.8.4 release.
Many minor bugs have been fixed. Scripts which happen to rely on previously erroneous behaviour will consider these fixes as incompatible changes :-) You are advised to perform sufficient acceptance testing on this release to satisfy yourself that this does not affect you, before putting this release into production.
The diagnostic output of Carp has been changed slightly, to add a space after the comma between arguments. This makes it much easier for tools such as web browsers to wrap it, but might confuse any automatic tools which perform detailed parsing of Carp output.
The internal dump output has been improved, so that non-printable characters such as newline and backspace are output in \x notation, rather than octal. This might just confuse non-robust tools which parse the output of modules such as Devel::Peek.
Perl can now be built to detect attempts to assign pathologically large chunks of memory. Previously such assignments would suffer from integer wrap-around during size calculations causing a misallocation, which would crash perl, and could theoretically be used for "stack smashing" attacks. The wrapping defaults to enabled on platforms where we know it works (most AIX configurations, BSDi, Darwin, DEC OSF/1, FreeBSD, HP/UX, GNU Linux, OpenBSD, Solaris, VMS and most Win32 compilers) and defaults to disabled on other platforms.
The copy of the Unicode Character Database included in Perl 5.8 has been updated to 4.0.1 from 4.0.0.
Paul Szabo has analysed and patched suidperl to remove existing known insecurities. Currently there are no known holes in suidperl, but previous experience shows that we cannot be confident that these were the last. You may no longer invoke the set uid perl directly, so to preserve backwards compatibility with scripts that invoke #!/usr/bin/suidperl the only set uid binary is now sperl5.8.n (sperl5.8.4 for this release). suidperl is installed as a hard link to perl; both suidperl and perl will automatically invoke sperl5.8.4, the set uid binary, so this change should be completely transparent.
For new projects the core perl team would strongly recommend that you use dedicated, single purpose security tools such as sudo in preference to suidperl.
In addition to bug fixes, format's features have been enhanced. See perlform.
The (mis)use of /tmp in core modules and documentation has been tidied up.
Some modules available both within the perl core and independently from CPAN ("dual-life modules") have not yet had these changes applied; the changes will be integrated into future stable perl releases as the modules are updated on CPAN.
There is experimental support for Linux abstract Unix domain sockets.
Synced with its CPAN version 2.10
syslog() can now use numeric constants for facility names and priorities, in addition to strings.
Win32.pm/Win32.xs has moved from the libwin32 module to core Perl
Detached threads are now also supported on Windows.
In-place sort optimised (e.g. @a = sort @a)
Unnecessary assignment optimised away in
- my $x = undef;
- my @x = ();
- my %x = ();
Optimised map in scalar context
The Perl debugger (lib/perl5db.pl) can now save all debugger commands for sourcing later, and can display the parent inheritance tree of a given class.
The build process on both VMS and Windows has had several minor improvements made. On Windows Borland's C compiler can now compile perl with PerlIO and/or USE_LARGE_FILES enabled.
perl.exe on Windows now has a "Camel" logo icon. The use of a camel with the topic of Perl is a trademark of O'Reilly and Associates Inc., and is used with their permission (i.e. distribution of the source, compiling a Windows executable from it, and using that executable locally). Use of the supplied camel for anything other than a perl executable's icon is specifically not covered, and anyone wishing to redistribute perl binaries with the icon should check directly with O'Reilly beforehand.
Perl should build cleanly on Stratus VOS once more.
More utf8 bugs fixed, notably in how chomp, chop, send, and syswrite interact with utf8 data. Concatenation now works correctly when use bytes; is in scope.
Pragmata are now correctly propagated into (?{...}) constructions in regexps. Code such as
- my $x = qr{ ... (??{ $x }) ... };
will now (correctly) fail under use strict. (The inner $x has always referred to $::x.)
The "const in void context" warning has been suppressed for a constant in an optimised-away boolean expression such as 5 || print;
perl -i could fchmod(stdin) by mistake. This is serious if stdin is attached to a terminal and perl is running as root. Now fixed.
Carp and the internal diagnostic routines used by Devel::Peek have been made clearer, as described in Incompatible Changes.
Some bugs have been fixed in the hash internals. Restricted hashes and their place holders are now allocated and deleted at slightly different times, but this should not be visible to user code.
Code freeze for the next maintenance release (5.8.5) will be on 30th June 2004, with release by mid July.
This release is known not to build on Windows 95.
If you find what you think is a bug, you might check the articles recently posted to the comp.lang.perl.misc newsgroup and the perl bug database at http://bugs.perl.org. There may also be information at http://www.perl.org, the Perl Home Page.
If you believe you have an unreported bug, please run the perlbug program included with your release. Be sure to trim your bug down to a tiny but sufficient test case. Your bug report, along with the output of perl -V, will be sent off to perlbug@perl.org to be analysed by the Perl porting team. You can browse and search the Perl 5 bugs at http://bugs.perl.org/.
The Changes file for exhaustive details on what changed.
The INSTALL file for how to build Perl.
The README file for general stuff.
The Artistic and Copying files for copyright information.
perl585delta - what is new for perl v5.8.5
This document describes differences between the 5.8.4 release and the 5.8.5 release.
There are no changes incompatible with 5.8.4.
Perl's regular expression engine now contains support for matching on the intersection of two Unicode character classes. You can also now refer to user-defined character classes from within other user defined character classes.
Carp improved to work nicely with Safe. Carp's message reporting should now be anomaly free - it will always print out line number information.
CGI upgraded to version 3.05
charnames now avoids clobbering $_
Digest upgraded to version 1.08
Encode upgraded to version 2.01
FileCache upgraded to version 1.04
libnet upgraded to version 1.19
Pod::Parser upgraded to version 1.28
Pod::Perldoc upgraded to version 3.13
Pod::LaTeX upgraded to version 0.57
Safe now works properly with Carp
Scalar-List-Utils upgraded to version 1.14
Shell's documentation has been re-written, and its historical partial auto-quoting of command arguments can now be disabled.
Test upgraded to version 1.25
Test::Harness upgraded to version 2.42
Time::Local upgraded to version 1.10
Unicode::Collate upgraded to version 0.40
Unicode::Normalize upgraded to version 0.30
The debugger can now emulate stepping backwards, by restarting and rerunning all bar the last command from a saved command history.
h2ph is now able to understand a very limited set of C inline functions -- basically, the inline functions that look like CPP macros. This has been introduced to deal with some of the headers of the newest versions of the glibc. The standard warning still applies; to quote h2ph's documentation, you may need to dicker with the files produced.
Perl 5.8.5 should build cleanly from source on LynxOS.
The in-place sort optimisation introduced in 5.8.4 had a bug. For example, in code such as
- @a = sort ($b, @a)
the result would omit the value $b. This is now fixed.
The optimisation for unnecessary assignments introduced in 5.8.4 could give spurious warnings. This has been fixed.
Perl should now correctly detect and read BOM-marked and (BOMless) UTF-16 scripts of either endianness.
Creating a new thread when weak references exist was buggy, and would often cause warnings at interpreter destruction time. The known bug is now fixed.
Several obscure bugs involving manipulating Unicode strings with substr have been fixed.
Previously if Perl's file globbing function encountered a directory that it did not have permission to open it would return immediately, leading to unexpected truncation of the list of results. This has been fixed, to be consistent with Unix shells' globbing behaviour.
Thread creation time could vary wildly between identical runs. This was caused by a poor hashing algorithm in the thread cloning routines, which has now been fixed.
The internals of the ithreads implementation were not checking whether OS-level thread creation had failed. threads->create() now returns undef if thread creation fails, instead of crashing perl.
perl -V has several improvements:
correctly outputs local patch names that contain embedded code snippets or other characters that used to confuse it.
arguments to -V that look like regexps will give multiple lines of output.
a trailing colon suppresses the linefeed and ';' terminator, allowing embedding of queries into shell commands.
a leading colon removes the 'name=' part of the response, allowing mapping to any name.
When perl fails to find the specified script, it now outputs a second line suggesting that the user use the -S flag:
- $ perl5.8.5 missing.pl
- Can't open perl script "missing.pl": No such file or directory.
- Use -S to search $PATH for it.
The Unicode character class files used by the regular expression engine are now built at build time from the supplied Unicode consortium data files, instead of being shipped prebuilt. This makes the compressed Perl source tarball about 200K smaller. A side effect is that the layout of files inside lib/unicore has changed.
The regression test t/uni/class.t is now performing considerably more tests, and can take several minutes to run even on a fast machine.
This release is known not to build on Windows 95.
If you find what you think is a bug, you might check the articles recently posted to the comp.lang.perl.misc newsgroup and the perl bug database at http://bugs.perl.org. There may also be information at http://www.perl.org, the Perl Home Page.
If you believe you have an unreported bug, please run the perlbug program included with your release. Be sure to trim your bug down to a tiny but sufficient test case. Your bug report, along with the output of perl -V, will be sent off to perlbug@perl.org to be analysed by the Perl porting team. You can browse and search the Perl 5 bugs at http://bugs.perl.org/.
The Changes file for exhaustive details on what changed.
The INSTALL file for how to build Perl.
The README file for general stuff.
The Artistic and Copying files for copyright information.
perl586delta - what is new for perl v5.8.6
This document describes differences between the 5.8.5 release and the 5.8.6 release.
There are no changes incompatible with 5.8.5.
The perl interpreter is now more tolerant of UTF-16-encoded scripts.
On Win32, Perl can now use non-IFS compatible LSPs, which allows Perl to work in conjunction with firewalls such as McAfee Guardian. For full details see the file README.win32, particularly if you're running Win95.
With the base pragma, an intermediate class with no fields used to mess up private fields in the base class. This has been fixed.
Cwd upgraded to version 3.01 (as part of the new PathTools distribution)
Devel::PPPort upgraded to version 3.03
File::Spec upgraded to version 3.01 (as part of the new PathTools distribution)
Encode upgraded to version 2.08
ExtUtils::MakeMaker remains at version 6.17, as later stable releases currently available on CPAN have some issues with core modules on some core platforms.
I18N::LangTags upgraded to version 0.35
Math::BigInt upgraded to version 1.73
Math::BigRat upgraded to version 0.13
MIME::Base64 upgraded to version 3.05
POSIX::sigprocmask function can now retrieve the current signal mask without also setting it.
Time::HiRes upgraded to version 1.65
Perl has a new -dt command-line flag, which enables threads support in the debugger.
reverse sort ... is now optimized to sort in reverse, avoiding the generation of a temporary intermediate list.
for (reverse @foo) now iterates in reverse, avoiding the generation of a temporary reversed list.
The regexp engine is now more robust when given invalid utf8 input, as is sometimes generated by buggy XS modules.
foreach on a threads::shared array used to be able to crash Perl. This bug has now been fixed.
A regexp in STDOUT's destructor used to coredump, because the regexp pad was already freed. This has been fixed.
goto &amp; is now more robust: bugs in deep recursion and chained goto &amp; have been fixed.
Using delete on an array no longer leaks memory. A pop of an item from a shared array reference no longer causes a leak.
eval_sv() failing a taint test could corrupt the stack; this has been fixed.
On platforms with 64 bit pointers numeric comparison operators used to erroneously compare the addresses of references that are overloaded, rather than using the overloaded values. This has been fixed.
read into a UTF8-encoded buffer with an offset off the end of the buffer no longer mis-calculates buffer lengths.
Although Perl has promised since version 5.8 that sort() would be stable, the two cases sort {$b cmp $a} and sort {$b <=> $a} could produce non-stable sorts. This is corrected in perl5.8.6.
Localising $^D no longer generates a diagnostic message about valid -D flags.
For -t and -T, the message Too late for "-T" option has been changed to the more informative "-T" is on the #! line, it must also be used on the command line.
From now on all applications embedding perl will behave as if perl were compiled with -DPERL_USE_SAFE_PUTENV. See "Environment access" in the INSTALL file for details.
Most C source files now have comments at the top explaining their purpose, which should help anyone wishing to get an overview of the implementation.
There are significantly more tests for the B suite of modules.
If you find what you think is a bug, you might check the articles recently posted to the comp.lang.perl.misc newsgroup and the perl bug database at http://bugs.perl.org. There may also be information at http://www.perl.org, the Perl Home Page.
If you believe you have an unreported bug, please run the perlbug program included with your release. Be sure to trim your bug down to a tiny but sufficient test case. Your bug report, along with the output of perl -V, will be sent off to perlbug@perl.org to be analysed by the Perl porting team. You can browse and search the Perl 5 bugs at http://bugs.perl.org/.
The Changes file for exhaustive details on what changed.
The INSTALL file for how to build Perl.
The README file for general stuff.
The Artistic and Copying files for copyright information.
perl587delta - what is new for perl v5.8.7
This document describes differences between the 5.8.6 release and the 5.8.7 release.
There are no changes incompatible with 5.8.6.
The copy of the Unicode Character Database included in Perl 5.8 has been updated to 4.1.0 from 4.0.1. See http://www.unicode.org/versions/Unicode4.1.0/#NotableChanges for the notable changes.
A pair of exploits in suidperl involving debugging code have been closed.
For new projects the core perl team strongly recommends that you use dedicated, single purpose security tools such as sudo in preference to suidperl.
The perl interpreter can be built to allow the use of a site customization script. By default this is not enabled, to be consistent with previous perl releases. To use this, add -Dusesitecustomize to the command line flags when running the Configure script. See also -f in perlrun.
Config.pm is now much smaller: about 3K rather than 32K, with the infrequently used code and %Config values loaded on demand. This is transparent to the programmer, but means that most code will save parsing and loading 29K of script (for example, code that uses File::Find).
B upgraded to version 1.09
base upgraded to version 2.07
bignum upgraded to version 0.17
bytes upgraded to version 1.02
Carp upgraded to version 1.04
CGI upgraded to version 3.10
Class::ISA upgraded to version 0.33
Data::Dumper upgraded to version 2.121_02
DB_File upgraded to version 1.811
Devel::PPPort upgraded to version 3.06
Digest upgraded to version 1.10
Encode upgraded to version 2.10
FileCache upgraded to version 1.05
File::Path upgraded to version 1.07
File::Temp upgraded to version 0.16
IO::File upgraded to version 1.11
IO::Socket upgraded to version 1.28
Math::BigInt upgraded to version 1.77
Math::BigRat upgraded to version 0.15
overload upgraded to version 1.03
PathTools upgraded to version 3.05
Pod::HTML upgraded to version 1.0503
Pod::Perldoc upgraded to version 3.14
Pod::LaTeX upgraded to version 0.58
Pod::Parser upgraded to version 1.30
Symbol upgraded to version 1.06
Term::ANSIColor upgraded to version 1.09
Test::Harness upgraded to version 2.48
Test::Simple upgraded to version 0.54
Text::Wrap upgraded to version 2001.09293, to fix a bug when wrap() was called with a non-space separator.
threads::shared upgraded to version 0.93
Time::HiRes upgraded to version 1.66
Time::Local upgraded to version 1.11
Unicode::Normalize upgraded to version 0.32
utf8 upgraded to version 1.05
Win32 upgraded to version 0.24, which provides Win32::GetFileVersion
find2perl has new options -iname, -path and -ipath.
The internal pointer mapping hash used during ithreads cloning now uses an arena for memory allocation. In tests this reduced ithreads cloning time by about 10%.
The Win32 "dmake" makefile.mk has been updated to make it compatible with the latest versions of dmake.
PERL_MALLOC, DEBUG_MSTATS, PERL_HASH_SEED_EXPLICIT and NO_HASH_SEED should now work in Win32 makefiles.
The socket() function on Win32 has been fixed so that it is able to use transport providers which specify a protocol of 0 (meaning any protocol is allowed) once more. (This was broken in 5.8.6, and typically caused the use of ICMP sockets to fail.)
Another obscure bug involving substr and UTF-8 caused by bad internal
offset caching has been identified and fixed.
A bug involving the loading of UTF-8 tables by the regexp engine has been fixed: code such as "\x{100}" =~ /[[:print:]]/ will no longer give corrupt results.
Case conversion operations such as uc on a long Unicode string could
exhaust memory. This has been fixed.
index/rindex were buggy for some combinations of Unicode and
non-Unicode data. This has been fixed.
read (and presumably sysread) would expose the UTF-8 internals when
reading from a byte oriented file handle into a UTF-8 scalar. This has
been fixed.
Using closures with ithreads could cause perl to crash. This was due to failure to correctly lock internal OP structures, and has been fixed.
The return value of close now correctly reflects any file errors that
occur while flushing the handle's data, instead of just giving failure if
the actual underlying file close operation failed.
not() || 1 used to segfault. not() now behaves like not(0), which was the pre-5.6.0 behaviour.
h2ph has various enhancements to cope with constructs in header files that used to result in incorrect or invalid output.
There is a new taint error, "%ENV is aliased to %s". This error is thrown when taint checks are enabled and *ENV has been aliased, so that %ENV has no env-magic anymore and hence the environment cannot be verified as taint-free.
The internals of pack and unpack have been updated. All legitimate
templates should work as before, but there may be some changes in the error
reported for complex failure cases. Any behaviour changes for non-error cases
are bugs, and should be reported.
There has been a fair amount of refactoring of the C source code, partly to make it tidier and more maintainable. The resulting object code and the perl binary may well be smaller than in 5.8.6, and hopefully faster in some cases, but apart from this there should be no user-detectable changes.
${^UTF8LOCALE} has been added to give Perl-space access to PL_utf8locale.
The size of the arenas used to allocate SV heads and most SV bodies can now be changed at compile time. The old size was 1008 bytes, the new default size is 4080 bytes.
Unicode strings returned from overloaded operators can be buggy. This is a long standing bug reported since 5.8.6 was released, but we do not yet have a suitable fix for it.
On UNICOS, lib/Math/BigInt/t/bigintc.t hangs burning CPU. ext/B/t/bytecode.t and ext/Socket/t/socketpair.t both fail tests. These are unlikely to be resolved, as our valiant UNICOS porter's last Cray is being decommissioned.
If you find what you think is a bug, you might check the articles recently posted to the comp.lang.perl.misc newsgroup and the perl bug database at http://bugs.perl.org. There may also be information at http://www.perl.org, the Perl Home Page.
If you believe you have an unreported bug, please run the perlbug program included with your release. Be sure to trim your bug down to a tiny but sufficient test case. Your bug report, along with the output of perl -V, will be sent off to perlbug@perl.org to be analysed by the Perl porting team. You can browse and search the Perl 5 bugs at http://bugs.perl.org/.
The Changes file for exhaustive details on what changed.
The INSTALL file for how to build Perl.
The README file for general stuff.
The Artistic and Copying files for copyright information.
perl588delta - what is new for perl v5.8.8
This document describes differences between the 5.8.7 release and the 5.8.8 release.
There are no changes intentionally incompatible with 5.8.7. If any exist, they are bugs and reports are welcome.
chdir, chmod and chown can now work on filehandles as well as filenames, if the system supports fchdir, fchmod and fchown respectively, thanks to a patch provided by Gisle Aas.
Attribute::Handlers upgraded to version 0.78_02: documentation typo fix.
attrs upgraded to version 1.02: internal cleanup only.
autouse upgraded to version 1.05: simplified implementation.
B upgraded to version 1.09_01: the inheritance hierarchy of the B:: modules has been corrected; B::NV now inherits from B::SV (instead of B::IV).
blib upgraded to version 1.03: documentation typo fix.
ByteLoader upgraded to version 0.06: internal cleanup.
CGI upgraded to version 3.15:
Extraneous "?" from self_url() removed
scrolling_list() select attribute fixed
virtual_port now works properly with the https protocol
upload_hook() and append() now work in function-oriented mode
POST_MAX doesn't cause the client to hang any more
Automatic tab indexes are now disabled, and a new -tabindex pragma has been added to turn automatic indexes back on
end_form() doesn't emit an empty (and non-validating) <div>
CGI::Carp works better in certain mod_perl configurations
Setting $CGI::TMPDIRECTORY is now effective
Enhanced documentation
charnames upgraded to version 1.05: viacode() now accepts hex strings and has been optimized.
CPAN upgraded to version 1.76_02: one minor bug fix for Win32.
Cwd upgraded to version 3.12:
canonpath() on Win32 now collapses foo\.. sections correctly.
Improved behaviour on Symbian OS.
Enhanced documentation and typo fixes
Internal cleanup
Data::Dumper upgraded to version 2.121_08:
A problem where Data::Dumper would sometimes update the iterator state of hashes has been fixed
Numeric labels now work
Internal cleanup
DB upgraded to version 1.01: a problem where the state of the regexp engine would sometimes get clobbered when running under the debugger has been fixed.
DB_File upgraded to version 1.814: adds support for Berkeley DB 4.4.
Devel::DProf upgraded to version 20050603.00: internal cleanup.
Devel::Peek upgraded to version 1.03: internal cleanup.
Devel::PPPort upgraded to version 3.06_01:
--compat-version argument checking has been improved
Files passed on the command line are filtered by default
--nofilter option to override the filtering has been added
Enhanced documentation
diagnostics upgraded to version 1.15: documentation typo fix.
Digest upgraded to version 1.14:
The constructor now knows which module implements SHA-224
Documentation tweaks and typo fixes
Digest::MD5 upgraded to version 2.36:
XSLoader is now used for faster loading
Enhanced documentation, including recently discovered MD5 weaknesses
Dumpvalue upgraded to version 1.12: documentation fix.
DynaLoader upgraded, but unfortunately we're not able to increment its version number :-(
Implements dl_unload_file on Win32
Internal cleanup
XSLoader 0.06 incorporated; small optimisation for calling bootstrap_inherit() and documentation enhancements.
Encode upgraded to version 2.12:
A coderef is now acceptable for CHECK!
3 new characters added to the ISO-8859-7 encoding
New encoding MIME-Header-ISO_2022_JP added
Problem with partial characters and encoding(utf-8-strict) fixed
Documentation enhancements and typo fixes
English upgraded to version 1.02: the $COMPILING variable has been added.
ExtUtils::Constant upgraded to version 0.17: improved compatibility with older versions of perl.
ExtUtils::MakeMaker upgraded to version 6.30 (was 6.17): too much to list here; see http://search.cpan.org/dist/ExtUtils-MakeMaker/Changes
File::Basename upgraded to version 2.74, with changes contributed by Michael Schwern:
Documentation clarified and errors corrected.
basename now strips trailing path separators before processing the name.
basename now returns / for the parameter /, to make basename consistent with the shell utility of the same name.
The suffix is no longer stripped if it is identical to the remaining characters in the name, again for consistency with the shell utility.
Some internal code cleanup.
File::Copy upgraded to version 2.09:
Copying a file onto itself used to fail.
Moving a file between file systems now preserves the access and modification time stamps
File::Find upgraded to version 1.10:
Win32 portability fixes
Enhanced documentation
File::Glob upgraded to version 1.05: internal cleanup.
File::Path upgraded to version 1.08: mkpath now preserves errno when mkdir fails.
File::Spec upgraded to version 3.12:
File::Spec->rootdir() now returns \ on Win32, instead of /
$^O could sometimes become tainted. This has been fixed.
canonpath on Win32 now collapses foo/.. (or foo\..) sections correctly, rather than doing the "misguided" work it was previously doing. Note that canonpath on Unix still does not collapse these sections, as doing so would be incorrect.
Some documentation improvements
Some internal code cleanup
FileCache upgraded to version 1.06: POD formatting errors in the documentation fixed.
Filter::Simple upgraded to version 0.82
FindBin upgraded to version 1.47: now works better with directories where access rights are more restrictive than usual.
GDBM_File upgraded to version 1.08: internal cleanup.
Getopt::Long upgraded to version 2.35:
prefix_pattern has now been complemented by a new configuration option long_prefix_pattern that allows the user to specify what prefix patterns should have long option style semantics applied.
Options can now take multiple values at once (experimental)
Various bug fixes
if upgraded to version 0.05:
Give more meaningful error messages from if when invoked with a condition in list context.
Restore backwards compatibility with earlier versions of perl
IO upgraded to version 1.22: enhanced documentation; internal cleanup.
IPC::Open2 upgraded to version 1.02: enhanced documentation.
IPC::Open3 upgraded to version 1.02: enhanced documentation.
List::Util upgraded to version 1.18 (was 1.14):
Fix pure-perl version of refaddr to avoid blessing an un-blessed reference
Use XSLoader for faster loading
Fixed various memory leaks
Internal cleanup and portability fixes
Math::Complex upgraded to version 1.35:
atan2(0, i) now works, as do all the (computable) complex argument cases
Fixes for certain bugs in make and emake
Support returning the kth root directly
Support [2,-3pi/8] in emake
Support inf for make/emake
Document make/emake more visibly
Math::Trig upgraded to version 1.03: adds more great circle routines: great_circle_waypoint and great_circle_destination.
MIME::Base64 upgraded to version 3.07:
Use XSLoader for faster loading
Enhanced documentation
Internal cleanup
NDBM_File upgraded to version 1.06: enhanced documentation.
ODBM_File upgraded to version 1.06: documentation typo fixed; internal cleanup.
Opcode upgraded to version 1.06: enhanced documentation; internal cleanup.
open upgraded to version 1.05: enhanced documentation.
overload upgraded to version 1.04: enhanced documentation.
PerlIO upgraded to version 1.04:
PerlIO::via now iterates over layers properly
PerlIO::scalar understands $/ = "" now
encoding(utf-8-strict) with partial characters now works
Enhanced documentation
Internal cleanup
Pod::Functions upgraded to version 1.03: documentation typos fixed.
Pod::Html upgraded to version 1.0504:
HTML output will now correctly link to =items on the same page, and should be valid XHTML.
Variable names are recognized as intended
Documentation typos fixed
Pod::Parser upgraded to version 1.32:
Allow files that start with =head on the first line
Win32 portability fix
Exit status of pod2usage fixed
New -noperldoc switch for pod2usage
Arbitrary URL schemes now allowed
Documentation typos fixed
POSIX upgraded to version 1.09: documentation typos fixed; internal cleanup.
re upgraded to version 0.05: documentation typo fixed.
Safe upgraded to version 2.12: minor documentation enhancement.
SDBM_File upgraded to version 1.05: documentation typo fixed; internal cleanup.
Socket upgraded to version 1.78: internal cleanup.
Storable upgraded to version 2.15: this includes the STORABLE_attach hook functionality added by Adam Kennedy, and more frugal memory requirements when storing under ithreads, by using the ithreads cloning tracking code.
Switch upgraded to version 2.10_01: documentation typos fixed.
Sys::Syslog upgraded to version 0.13:
Now provides numeric macros and meaningful Exporter tags.
No longer uses Sys::Hostname, as it may provide useless values in unconfigured network environments; instead uses INADDR_LOOPBACK directly.
syslog() now uses a local timestamp.
setlogmask() now behaves like its C counterpart.
setlogsock() will now croak() as documented.
Improved error and warning messages.
Improved documentation.
Term::ANSIColor upgraded to version 1.10: fixes a bug in colored when $EACHLINE is set that caused it not to color lines consisting solely of 0 (a literal zero). Improved tests.
Term::ReadLine upgraded to version 1.02: documentation tweaks.
Test::Harness upgraded to version 2.56 (was 2.48):
The Test::Harness timer is now off by default.
Now shows elapsed time in milliseconds.
Various bug fixes
Test::Simple upgraded to version 0.62 (was 0.54):
is_deeply() no longer fails to work for many cases
Various minor bug fixes
Documentation enhancements
Text::Tabs upgraded to version 2005.0824: provides a faster implementation of expand.
Text::Wrap upgraded to version 2005.082401: adds $Text::Wrap::separator2, which allows you to preserve existing newlines but add line breaks with some other string.
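The $Text::Wrap::separator2 addition can be used like this (column width and separator string here are illustrative):

```perl
# $Text::Wrap::separator2: new line breaks use this string, while any
# newlines already present in the input are preserved
use Text::Wrap qw(wrap);

$Text::Wrap::columns    = 12;
$Text::Wrap::separator2 = "\n| ";
my $wrapped = wrap('', '', 'alpha beta gamma delta');
print $wrapped, "\n";
```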
threads upgraded to version 1.07
threads::shared upgraded to version 0.94: documentation changes only. Note: an improved implementation of threads::shared is available on CPAN; this will be merged into 5.8.9 if it proves stable.
Tie::Hash upgraded to version 1.02: documentation typo fixed.
Time::HiRes upgraded to version 1.86 (was 1.66):
clock_nanosleep() and clock() functions added
Support for the POSIX clock_gettime() and clock_getres() has been added
Return undef or an empty list if the C gettimeofday() function fails
Improved nanosleep detection
Internal cleanup
Enhanced documentation
Unicode::Collate upgraded to version 0.52:
Now implements UCA Revision 14 (based on Unicode 4.1.0).
The Unicode::Collate->new method no longer overwrites the user's $_
Enhanced documentation
Unicode::UCD upgraded to version 0.24: documentation typos fixed.
User::grent upgraded to version 1.01: documentation typo fixed.
utf8 upgraded to version 1.06: documentation typos fixed.
vmsish upgraded to version 1.02: documentation typos fixed.
warnings upgraded to version 1.05:
Gentler messing with Carp:: internals
Internal cleanup
Documentation update
Win32 upgraded to version 0.2601: provides Windows Vista support to Win32::GetOSName. Documentation enhancements.
XS::Typemap upgraded to version 0.02: internal cleanup.
h2xs enhancements:
h2xs implements a new option --use-xsloader to force use of XSLoader even in backwards compatible modules.
The handling of authors' names that had apostrophes has been fixed.
Any enums with negative values are now skipped.
perlivp enhancements:
perlivp implements a new option -a and will no longer check for *.ph files by default. Use the -a option to run all tests.
The perlglossary manpage is a glossary of terms used in the Perl documentation, technical and otherwise, kindly provided by O'Reilly Media, inc.
Weak reference creation is now O(1) rather than O(n), courtesy of Nicholas Clark. Weak reference deletion remains O(n), but if deletion only happens at program exit, it may be skipped completely.
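The change concerns weak references as created by Scalar::Util::weaken; for reference, the operation being optimised looks like this:

```perl
# Creating a weak reference (the operation that is now O(1))
use Scalar::Util qw(weaken);

my $strong = { payload => 42 };
my $weak   = $strong;
weaken($weak);          # $weak no longer keeps the hash alive
undef $strong;          # drop the last strong reference
print defined $weak ? "alive\n" : "reaped\n";
```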
Salvador Fandiño provided improvements to reduce the memory usage of sort and to speed up some cases.
Jarkko Hietaniemi and Andy Lester worked to mark as much data as possible in the C source files as static, to increase the proportion of the executable file that the operating system can share between processes, and thus reduce real memory usage on multi-user systems.
Parallel makes should work properly now, although there may still be problems if make test is instructed to run in parallel.
Building with Borland's compilers on Win32 should work more smoothly. In particular Steve Hay has worked to side step many warnings emitted by their compilers and at least one C compiler internal error.
Configure will now detect clearenv and unsetenv, thanks to a patch from Alan Burlison. It will also probe for futimes and for whether sprintf correctly returns the length of the formatted string; both results will be used in perl 5.8.9.
There are improved hints for next-3.0, vmesa, IX, Darwin, Solaris, Linux, DEC/OSF, HP-UX and MPE/iX
Perl extensions on Windows now can be statically built into the Perl DLL, thanks to a work by Vadim Konovalov. (This improvement was actually in 5.8.7, but was accidentally omitted from perl587delta).
Previously, when running with warnings enabled globally via -w, selective disabling of specific warning categories would actually turn off all warnings. This is now fixed: no warnings 'io'; will turn off warnings only in the io class. This bug fix may cause some programs to start correctly issuing warnings.
Perl 5.8.4 introduced a change so that assignments of undef to a
scalar, or of an empty list to an array or a hash, were optimised away. As
this could cause problems when goto jumps were involved, this change
has been backed out.
Using the sprintf() function with some formats could lead to a buffer overflow in some specific cases. This has been fixed, along with several other bugs, notably in bounds checking.
In related fixes, it was possible for badly written code that did not follow the documentation of Sys::Syslog to have formatting vulnerabilities. Sys::Syslog has been changed to protect people from poor quality third party code.
It had been reported that running under perl's debugger when processing Unicode data could cause unexpectedly large slowdowns. The most likely cause of this was identified and fixed by Nicholas Clark.
FindBin now works better with directories where access rights are more restrictive than usual.
Several memory leaks in ithreads were closed. An improved implementation of threads::shared is available on CPAN; this will be merged into 5.8.9 if it proves stable.
Trailing spaces are now trimmed from $! and $^E.
Operations that require perl to read a process's list of groups, such as reads of $( and $), now dynamically allocate memory rather than using a fixed sized array. The fixed size array could cause C stack exhaustion on systems configured to use large numbers of groups.
PerlIO::scalar now works better with non-default $/ settings.
You can now use the x operator to repeat a qw// list. This used to raise a syntax error.
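For example, the following is accepted from 5.8.8 onwards (it was a syntax error before):

```perl
# Repeating a qw// list with the x operator
my @states = qw(on off) x 3;
print "@states\n";        # on off on off on off
```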
The debugger now traces correctly execution in eval("")uated code that contains #line directives.
The value of the open pragma is no longer ignored for three-argument opens.
The optimisation of for (reverse @a) introduced in perl 5.8.6 could misbehave when the array had undefined elements and was used in LVALUE context. Dave Mitchell provided a fix.
Some case insensitive matches between UTF-8 encoded data and 8 bit regexps, and vice versa, could give malformed character warnings. These have been fixed by Dave Mitchell and Yves Orton.
lcfirst and ucfirst could corrupt the string in certain cases where the length of the UTF-8 encoding of the string in lower case, upper case or title case differed. This was fixed by Nicholas Clark.
Perl will now use the C library calls unsetenv and clearenv, if present, to delete keys from %ENV and to delete %ENV entirely, thanks to a patch from Alan Burlison.
This is a new warning, produced in situations such as this:
This is a new warning, produced when a number has been passed as an argument to select(), instead of a bitmask.
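The warning targets code that passes a plain number; the correct approach is to build the bitmask with vec(), as in this sketch (polling STDIN with a zero timeout):

```perl
# Correct 4-argument select(): the first three arguments are bitmasks
# built with vec(), not plain numbers
my $rin = '';
vec($rin, fileno(STDIN), 1) = 1;    # watch STDIN for readability
my $rout = $rin;
my $nfound = select($rout, undef, undef, 0);   # poll, zero timeout
```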
This syntax error indicates that the lexer couldn't find the final delimiter of a ?PATTERN? construct. Mentioning the ternary operator in this error message makes it easier to diagnose syntax errors.
There has been a fair amount of refactoring of the C source code, partly to make it tidier and more maintainable. The resulting object code and the perl binary may well be smaller than in 5.8.7, in particular due to a change contributed by Dave Mitchell which reworked the warnings code to be significantly smaller. Apart from being smaller and possibly faster, there should be no user-detectable changes.
Andy Lester supplied many improvements to determine which function parameters and local variables could actually be declared const to the C compiler. Steve Peters provided new *_set macros and reworked the core to use these rather than assigning to macros in LVALUE context.
Dave Mitchell improved the lexer debugging output under -DT.
Nicholas Clark changed the string buffer allocation so that it is now rounded up to the next multiple of 4 (or 8 on platforms with 64 bit pointers). This should reduce the number of calls to realloc without actually using any extra memory.
The HV's array of HE*s is now allocated at the correct (minimal) size, thanks to another change by Nicholas Clark. Compile with -DPERL_USE_LARGE_HV_ALLOC to use the old, sloppier, default.
For XS or embedding debugging purposes, if perl is compiled with -DDEBUG_LEAKING_SCALARS_FORK_DUMP in addition to -DDEBUG_LEAKING_SCALARS, then a child process is forked just before global destruction, and is used to display the values of any scalars found to have leaked at the end of global destruction. Without this, the scalars have already been freed sufficiently at the point of detection that it is impossible to produce any meaningful dump of their contents. This feature was implemented by the indefatigable Nicholas Clark, based on an idea by Mike Giroux.
The optimiser on HP-UX 11.23 (Itanium 2) is currently partly disabled (scaled down to +O1) when using HP C-ANSI-C; the cause of problems at higher optimisation levels is still unclear.
There are a handful of remaining test failures on VMS, mostly due to test fixes and minor module tweaks with too many dependencies to integrate into this release from the development stream, where they have all been corrected. The following is a list of expected failures with the patch number of the fix where that is known:
- ext/Devel/PPPort/t/ppphtest.t #26913
- ext/List/Util/t/p_tainted.t #26912
- lib/ExtUtils/t/PL_FILES.t #26813
- lib/ExtUtils/t/basic.t #26813
- t/io/fs.t
- t/op/cmp.t
If you find what you think is a bug, you might check the articles recently posted to the comp.lang.perl.misc newsgroup and the perl bug database at http://bugs.perl.org. There may also be information at http://www.perl.org, the Perl Home Page.
If you believe you have an unreported bug, please run the perlbug program included with your release. Be sure to trim your bug down to a tiny but sufficient test case. Your bug report, along with the output of perl -V, will be sent off to perlbug@perl.org to be analysed by the Perl porting team. You can browse and search the Perl 5 bugs at http://bugs.perl.org/.
The Changes file for exhaustive details on what changed.
The INSTALL file for how to build Perl.
The README file for general stuff.
The Artistic and Copying files for copyright information.
perl589delta - what is new for perl v5.8.9
This document describes differences between the 5.8.8 release and the 5.8.9 release.
The 5.8.9 release will be the last significant release of the 5.8.x series. Any future releases of 5.8.x will likely only be to deal with security issues, and platform build failures. Hence you should look to migrating to 5.10.x, if you have not started already. See Known Problems for more information.
A particular construction in the source code of extensions written in C++ may need changing. See Changed Internals for more details. All extensions written in C, most written in C++, and all existing compiled extensions are unaffected. This was necessary to improve C++ support.
Other than this, there are no changes intentionally incompatible with 5.8.8. If any exist, they are bugs and reports are welcome.
The copy of the Unicode Character Database included in Perl 5.8 has been updated to 5.1.0 from 4.1.0. See http://www.unicode.org/versions/Unicode5.1.0/#NotableChanges for the notable changes.
It is now possible to call stat and the -X filestat operators on directory handles. As both directory and file handles are barewords, there can be ambiguities over which was intended. In these situations the file handle semantics are preferred. Both also treat *FILE{IO} filehandles like *FILE filehandles.
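A quick illustration of the new behaviour:

```perl
# stat and filetest operators on a directory handle (5.8.9 onwards)
opendir my $dh, '.' or die "opendir: $!";
my @st = stat $dh;                 # same 13-element list as stat '.'
print "is a directory\n" if -d $dh;
```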
It's possible to enhance the mechanism of subroutine hooks in @INC by adding a source filter on top of the filehandle opened and returned by the hook. This feature was planned a long time ago, but wasn't quite working until now. See require for details. (Nicholas Clark)
The constant folding routine is now wrapped in an exception handler, and if folding throws an exception (such as attempting to evaluate 0/0), perl now retains the current optree, rather than aborting the whole program. Without this change, programs would not compile if they had expressions that happened to generate exceptions, even though those expressions were in code that could never be reached at runtime. (Nicholas Clark, Dave Mitchell)
no VERSION: you can now use no followed by a version number to specify that you want to use a version of perl older than the specified one.
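For example, under any modern perl the following eval'd declaration fails at compile time of the eval'd code, because the running interpreter is at or above the stated version:

```perl
# 'no VERSION' dies if the running interpreter is at or above VERSION
eval 'no 5.006; 1';
print $@ ? "perl is too new: $@" : "perl predates 5.6.0\n";
```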
The code that caches calculated UTF-8 byte offsets for character offsets for a string has been re-written. Several bugs have been located and eliminated, and the code now makes better use of the information it has, so should be faster. In particular, it doesn't scan to the end of a string before calculating an offset within the string, which should speed up some operations on long strings. It is now possible to disable the caching code at run time, to verify that it is not the cause of suspected problems.
There is now Configure support for creating a perl tree that is relocatable at run time. See Relocatable installations.
${^CHILD_ERROR_NATIVE}: this variable gives the native status returned by the last pipe close, backtick command, successful call to wait or waitpid, or from the system operator. See perlvar for details. (Contributed by Gisle Aas.)
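For example, the native status can be decoded with the POSIX wait macros (the child command here is illustrative):

```perl
# ${^CHILD_ERROR_NATIVE} holds the raw status from wait(); the POSIX
# W* macros decode it portably
use POSIX qw(WIFEXITED WEXITSTATUS);

system($^X, '-e', 'exit 7');            # child perl exits with status 7
my $native = ${^CHILD_ERROR_NATIVE};
print "exit status: ", WEXITSTATUS($native), "\n" if WIFEXITED($native);
```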
${^UTF8CACHE}: this variable controls the state of the internal UTF-8 offset caching code: 1 for on (the default), 0 for off, -1 to debug the caching code by checking all its results against linear scans, panicking on any discrepancy.
readpipe is now overridable: the built-in function readpipe is now overridable. Overriding it also permits overriding its operator counterpart, qx// (also known as ``).
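A sketch of the new override hook; the stub body is purely illustrative:

```perl
# Overriding readpipe also intercepts qx// and backticks
BEGIN { *CORE::GLOBAL::readpipe = sub { "captured: $_[0]" } }

my $out = qx/uptime/;    # no external command is run
print $out, "\n";
```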
Perl 5.8.9 (and 5.10.0 onwards) now provides a couple of macros to do very basic exception handling in XS modules. You can use these macros if you call code that may croak, but you need to do some cleanup before giving control back to Perl. See Exception Handling in perlguts for more details.
-Dq suppresses the EXECUTING... message when running under -D.
-Dl logs runops loop entry and exit, and jump level popping.
-Dv displays the process id as part of the trace output.
Some pure-perl code that the regexp engine was using to retrieve Unicode properties and transliteration mappings has been reimplemented in XS for faster execution. (SADAHIRO Tomoyuki)
The interpreter internals now support a far more memory efficient form of inlineable constants. Storing a reference to a constant value in a symbol table is equivalent to a full typeglob referencing a constant subroutine, but using about 400 bytes less memory. This proxy constant subroutine is automatically upgraded to a real typeglob with subroutine if necessary. The approach taken is analogous to the existing space optimisation for subroutine stub declarations, which are stored as plain scalars in place of the full typeglob.
However, to aid backwards compatibility of existing code, which (wrongly) does not expect anything other than typeglobs in symbol tables, nothing in core uses this feature, other than the regression tests.
Stubs for prototyped subroutines have been stored in symbol tables as plain strings, and stubs for unprototyped subroutines as the number -1, since 5.005, so code which assumes that the core only places typeglobs in symbol tables has been making incorrect assumptions for over 10 years.
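A minimal sketch of the mechanism described above (the package name is illustrative): storing a reference to a read-only value directly in a stash yields a callable constant, upgraded to a full typeglob on demand.

```perl
# A scalar reference stored straight into a symbol table acts as a
# proxy constant subroutine (package name is illustrative)
$Example::{ANSWER} = \42;
print Example::ANSWER(), "\n";   # the slot is upgraded if needed
```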
Compile support added for:
DragonFlyBSD
MidnightBSD
MirOS BSD
RISC OS
Cray XT4/Catamount
Module::Pluggable is a simple framework to create modules that accept pluggable sub-modules. The bundled version is 3.8.
Module::CoreList is a hash of hashes that is keyed on perl version as indicated in $]. The bundled version is 2.17.
Win32API::File is now available in core on Microsoft Windows. The bundled version is 0.1001_01.
Devel::InnerPackage finds all the packages defined by a single file. It is part of the Module::Pluggable distribution. The bundled version is 0.3.
attributes upgraded to version 0.09
AutoLoader upgraded to version 5.67
AutoSplit upgraded to version 1.06
autouse upgraded to version 1.06
B upgraded from version 1.09_01 to 1.19
provides new pad related abstraction macros B::NV::COP_SEQ_RANGE_LOW, B::NV::COP_SEQ_RANGE_HIGH, B::NV::PARENT_PAD_INDEX, B::NV::PARENT_FAKELEX_FLAGS, which hide the difference in storage in 5.10.0 and later.
provides B::sub_generation, which exposes PL_sub_generation
provides B::GV::isGV_with_GP, which on pre-5.10 perls always returns true.
New type B::HE added, with methods VAL, HASH and SVKEY_force
The B::GVf_IMPORTED_CV flag is now set correctly when a proxy constant subroutine is imported.
Bugs fixed in the handling of PMOPs.
B::BM::PREVIOUS now returns U32, not U16.
B::CV::START and B::CV::ROOT now return NULL on an XSUB; B::CV::XSUB and B::CV::XSUBANY return 0 on a non-XSUB.
B::C upgraded to version 1.05
B::Concise upgraded to version 0.76:
New option -src causes the rendering of each statement (starting with the nextstate OP) to be preceded by the first line of source code that generates it.
New option -stash="somepackage" requires "somepackage", and then renders each function defined in its namespace.
Now has documentation of detailed hint symbols.
B::Debug upgraded to version 1.05
B::Deparse upgraded to version 0.87:
now handles ''->(), ::(), sub :: {}, etc. correctly [RT #43010]. All bugs in parsing these kinds of syntax are now fixed:
- perl -MO=Deparse -e '"my %h = "->()'
- perl -MO=Deparse -e '::->()'
- perl -MO=Deparse -e 'sub :: {}'
- perl -MO=Deparse -e 'package a; sub a::b::c {}'
- perl -MO=Deparse -e 'sub the::main::road {}'
does not deparse $^H{v_string}, which is automatically set by the internals.
B::Lint upgraded to version 1.11
B::Terse upgraded to version 1.05
base upgraded to version 2.13:
loading a module via base.pm would mask a global $SIG{__DIE__} in that module.
now pushes all classes at once into @ISA
Benchmark upgraded to version 1.10
bigint upgraded to version 0.23
bignum upgraded to version 0.23
bigrat upgraded to version 0.23
blib upgraded to version 0.04
Carp upgraded to version 1.10: the argument backtrace code now shows undef as undef, instead of the string "undef".
CGI upgraded to version 3.42
charnames upgraded to version 1.06
constant upgraded to version 1.17
CPAN upgraded to version 1.9301
Cwd upgraded to version 3.29, with some platform specific improvements (including for VMS).
Data::Dumper upgraded to version 2.121_17:
Fixes hash iterator current position with the pure Perl version [RT #40668]
Performance enhancements, which will be most evident on platforms where repeated calls to C's realloc() are slow, such as Win32.
DB_File upgraded to version 1.817
DB_Filter upgraded to version 0.02
Devel::DProf upgraded to version 20080331.00
Devel::Peek upgraded to version 1.04
Devel::PPPort upgraded to version 3.14
diagnostics upgraded to version 1.16
Digest upgraded to version 1.15
Digest::MD5 upgraded to version 2.37
DirHandle upgraded to version 1.02: now localises $., $@, $!, $^E, and $? before closing the directory handle, to suppress leaking any side effects of warnings about it already being closed.
DynaLoader upgraded to version 1.09: DynaLoader can now dynamically load a loadable object from a file with a non-default file extension.
Encode upgraded to version 2.26: Encode::Alias includes a fix for encoding "646" on Solaris (better known as ASCII).
English upgraded to version 1.03
Errno upgraded to version 1.10
Exporter upgraded to version 5.63
ExtUtils::Command upgraded to version 1.15
ExtUtils::Constant upgraded to version 0.21
ExtUtils::Embed upgraded to version 1.28
ExtUtils::Install upgraded to version 1.50_01
ExtUtils::Installed upgraded to version 1.43
ExtUtils::MakeMaker upgraded to version 6.48: support for INSTALLSITESCRIPT and INSTALLVENDORSCRIPT configuration.
ExtUtils::Manifest upgraded to version 1.55
ExtUtils::ParseXS upgraded to version 2.19
Fatal upgraded to version 1.06: allows built-ins in CORE::GLOBAL to be made fatal.
Fcntl
upgraded to version 1.06
fields
upgraded to version 2.12
File::Basename
upgraded to version 2.77
FileCache
upgraded to version 1.07
File::Compare
upgraded to 1.1005
File::Copy
upgraded to 2.13
now uses 3-arg open.
File::DosGlob
upgraded to 1.01
File::Find
upgraded to version 1.13
File::Glob
upgraded to version 1.06
fixes spurious results with brackets inside braces.
File::Path
upgraded to version 2.07_02
File::Spec
upgraded to version 3.29
improved handling of bad arguments.
some platform specific improvements (including for VMS and Cygwin), with
an optimisation on abs2rel
when both arguments are relative.
File::stat
upgraded to version 1.01
File::Temp
upgraded to version 0.20
filetest
upgraded to version 1.02
Filter::Util::Call
upgraded to version 1.07
Filter::Simple
upgraded to version 0.83
FindBin
upgraded to version 1.49
GDBM_File
upgraded to version 1.09
Getopt::Long
upgraded to version 2.37
Getopt::Std
upgraded to version 1.06
Hash::Util
upgraded to version 0.06
if
upgraded to version 0.05
IO
upgraded to version 1.23
Reduced number of calls to getpeername in IO::Socket
IPC::Open2
upgraded to version 1.03
IPC::Open3
upgraded to version 1.03
IPC::SysV
upgraded to version 2.00
lib
upgraded to version 0.61
avoid warning about loading .par files.
libnet
upgraded to version 1.22
List::Util
upgraded to 1.19
Locale::Maketext
upgraded to 1.13
Math::BigFloat
upgraded to version 1.60
Math::BigInt
upgraded to version 1.89
Math::BigRat
upgraded to version 0.22
implements new as_float
method.
Math::Complex
upgraded to version 1.54.
Math::Trig
upgraded to version 1.18.
NDBM_File
upgraded to version 1.07
improve g++ handling for systems using GDBM compatibility headers.
Net::Ping
upgraded to version 2.35
NEXT
upgraded to version 0.61
fixed several bugs in NEXT
when used with AUTOLOAD
, eval blocks, and
overloaded stringification.
ODBM_File
upgraded to 1.07
open upgraded to 1.06
ops
upgraded to 1.02
PerlIO::encoding
upgraded to version 0.11
PerlIO::scalar
upgraded to version 0.06
fixed [RT #40267]: PerlIO::scalar
did not respect read-only scalars.
PerlIO::via
upgraded to version 0.05
Pod::Html
upgraded to version 1.09
Pod::Parser
upgraded to version 1.35
Pod::Usage
upgraded to version 1.35
POSIX
upgraded to version 1.15
POSIX
constants that duplicate those in Fcntl
are now imported from
Fcntl
and re-exported, rather than being duplicated by POSIX.
POSIX::remove
can remove empty directories.
POSIX::setlocale
is now safer to call multiple times.
POSIX::SigRt
added, which provides access to POSIX realtime signal
functionality on systems that support it.
re
upgraded to version 0.06_01
Safe
upgraded to version 2.16
Scalar::Util
upgraded to 1.19
SDBM_File
upgraded to version 1.06
SelfLoader
upgraded to version 1.17
Shell
upgraded to version 0.72
sigtrap
upgraded to version 1.04
Socket
upgraded to version 1.81
this fixes an optimistic use of gethostbyname
Storable
upgraded to 2.19
Switch
upgraded to version 2.13
Sys::Syslog
upgraded to version 0.27
Term::ANSIColor
upgraded to version 1.12
Term::Cap
upgraded to version 1.12
Term::ReadLine
upgraded to version 1.03
Test::Builder
upgraded to version 0.80
Test::Harness
upgraded to version 2.64
this makes it able to handle newlines.
Test::More
upgraded to version 0.80
Test::Simple
upgraded to version 0.80
Text::Balanced
upgraded to version 1.98
Text::ParseWords
upgraded to version 3.27
Text::Soundex
upgraded to version 3.03
Text::Tabs
upgraded to version 2007.1117
Text::Wrap
upgraded to version 2006.1117
Thread
upgraded to version 2.01
Thread::Semaphore
upgraded to version 2.09
Thread::Queue
upgraded to version 2.11
added capability to add complex structures (e.g., hash of hashes) to queues.
added capability to dequeue multiple items at once.
added new methods to inspect and manipulate queues: peek
, insert
and
extract
Tie::Handle
upgraded to version 4.2
Tie::Hash
upgraded to version 1.03
Tie::Memoize
upgraded to version 1.1
Tie::Memoize::EXISTS
now correctly caches its results.
Tie::RefHash
upgraded to version 1.38
Tie::Scalar
upgraded to version 1.01
Tie::StdHandle
upgraded to version 4.2
Time::gmtime
upgraded to version 1.03
Time::Local
upgraded to version 1.1901
Time::HiRes
upgraded to version 1.9715 with various build improvements
(including VMS) and minor platform-specific bug fixes (including
for HP-UX 11 ia64).
threads
upgraded to 1.71
threads::shared
upgraded to version 1.27
smaller and faster implementation that eliminates one internal structure and the consequent level of indirection.
user locks are now stored in a safer manner.
new function shared_clone
creates a copy of an object leaving
shared elements as-is and deep-cloning non-shared elements.
added new is_shared
method.
Unicode::Normalize
upgraded to version 1.02
Unicode::UCD
upgraded to version 0.25
warnings
upgraded to version 1.05_01
Win32
upgraded to version 0.38
added new function GetCurrentProcessId
which returns the regular Windows
process identifier of the current process, even when called from within a fork.
XSLoader
upgraded to version 0.10
XS::APItest
and XS::Typemap
are for internal use only and hence
no longer installed. Many more tests have been added to XS::APItest
.
Andreas König contributed two functions to save and load the debugger history.
NEXT::AUTOLOAD
no longer emits warnings under the debugger.
The debugger should now correctly find the tty device on OS X 10.5 and VMS
when the program forks.
LVALUE subs now work inside the debugger.
Perl 5.8.9 adds a new utility perlthanks, which is a variant of perlbug, but for sending non-bug-reports to the authors and maintainers of Perl. Getting nothing but bug reports can become a bit demoralising - we'll see if this changes things.
perlbug now checks if you're reporting about a non-core module and suggests you report it to the CPAN author instead.
won't define an empty string as a constant [RT #25366]
has examples for h2xs -X
now attempts to deal sensibly with the difference in path implications
between ""
and <>
quoting in #include
statements.
now generates correct code for #if defined A || defined B
[RT #39130]
As usual, the documentation received its share of corrections, clarifications
and other nit fixes. More
tags were added for indexing.
perlunitut is a tutorial written by Juerd Waalboer on Unicode-related terminology and how to correctly handle Unicode in Perl scripts.
perlunicode has been updated in the section on user-defined properties.
perluniintro has been updated in its example of detecting data that is not valid in a particular encoding.
perlcommunity provides an overview of the Perl Community along with further resources.
CORE documents the pseudo-namespace for Perl's core routines.
perlglossary adds entries on deprecated modules and features, and on modules to be dropped.
perlhack has been updated, with added resources on smoke testing.
The Perl FAQs (perlfaq1..perlfaq9) have been updated.
perlcheat is updated with better details on \w
, \d
, and \s.
perldebug is updated with information on how to call the debugger.
perldiag has been updated regarding calling a subroutine with an ampersand as
the argument to exists and delete, along with several terminology updates in
the warnings entries.
perlfork documents the limitation of exec inside pseudo-processes.
The alarm function documentation now mentions Time::HiRes::ualarm
in preference
to select.
The -X documentation now notes that filetest operators have the same precedence
as unary operators, but behave differently with respect to parsing and
parentheses (spotted by Eirik Berg Hanssen).
reverse function documentation received scalar context examples.
perllocale documentation is adjusted for number localization and
POSIX::setlocale
to fix Debian bug #379463.
perlmodlib is updated with CPAN::API::HOWTO
and
Sys::Syslog::win32::Win32
perlre documentation updated to reflect the differences between
[[:xxxxx:]] and \p{IsXxxxx}
matches. Also added section on /g and
/c modifiers.
perlreguts describes the internals of the regular expression engine. It was contributed by Yves Orton.
perlrebackslash describes all Perl regular expression backslash and escape sequences.
perlrecharclass describes the syntax and use of character classes in Perl Regular Expressions.
perlrun has been updated to clarify the hash seed PERL_HASH_SEED, with more
information on the -x
and -u
options.
The perlsub example has been updated to use a lexical variable with opendir.
perlvar fixes confusion about real GID $(
and effective GID $)
.
The Perl thread tutorial example in the section Queues: Passing Data Around has been fixed in perlthrtut and perlothrtut.
perlhack documentation extensively improved by Jarkko Hietaniemi and others.
perltoot provides information on modifying @UNIVERSAL::ISA
.
perlport documentation extended to cover the different kill(-9, ...)
semantics on Windows. It also clearly states that dump is not supported on Win32
and Cygwin.
INSTALL has been updated and modernised.
The default since perl 5.000 has been for perl to create an empty scalar
with every new typeglob. The increased use of lexical variables means that
most are now unused. Thanks to Nicholas Clark's efforts, Perl can now be
compiled with -DPERL_DONT_CREATE_GVSV
to avoid creating these empty scalars.
This will significantly decrease the number of scalars allocated for all
configurations, and the number of scalars that need to be copied for ithread
creation. Whilst this option is binary compatible with existing perl
installations, it does change a long-standing assumption about the
internals, hence it is not enabled by default, as some third party code may
rely on the old behaviour.
We would recommend testing with this configuration on new deployments of perl, particularly for multi-threaded servers, to see whether all third party code is compatible with it, as this configuration may give useful performance improvements. For existing installations we would not recommend changing to this configuration unless thorough testing is performed before deployment.
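For reference, such a build is configured by appending the flag to the compiler flags. The invocation below is an illustrative sketch of the standard Configure mechanism, not a prescribed command:

```shell
# Build a perl without the per-typeglob empty scalar.
# Binary compatible with existing installations, but test all
# third-party XS code before deploying.
sh Configure -des -Accflags='-DPERL_DONT_CREATE_GVSV'
make && make test
```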
diagnostics
no longer uses $&
, which results in large speedups
for regexp matching in all code using it.
Regular expression classes of a single character are now treated the same as if the character had been used as a literal, meaning that code that uses char-classes as an escaping mechanism will see a speedup. (Yves Orton)
Creating anonymous array and hash references (i.e. []
and {}
) now incurs
no more overhead than creating an anonymous list or hash. Nicholas Clark
provided changes with a saving of two ops and one stack push, which was measured
as a slightly better than 5% improvement for these operations.
Many calls to strlen()
have been eliminated, either because the length was
already known, or by adopting or enhancing APIs that pass lengths. This has
been aided by the adoption of a my_sprintf()
wrapper, which returns the
correct C89 value - the length of the formatted string. Previously we could
not rely on the return value of sprintf(), because on some ancient but
extant platforms it still returns char *
.
index is now faster if the search string is stored in UTF-8 but only contains
characters in the Latin-1 range.
The Unicode swatch cache inside the regexp engine is now used. (the lookup had a key mismatch, present since the initial implementation). [RT #42839]
There is now Configure support for creating a relocatable perl tree. If
you Configure with -Duserelocatableinc
, then the paths in @INC
(and
everything else in %Config
) can be optionally located via the path of the
perl executable.
At start time, any paths in @INC
or %Config
that Configure marked
as relocatable (by starting them with ".../"
) are prefixed with the
directory of $^X
. This allows relocation to be configured on a
per-directory basis, although the default with -Duserelocatableinc
is that
everything is relocated. The initial install is still done to the original
configured prefix.
Configure is now better at removing temporary files. Tom Callaway
(from RedHat) also contributed patches that complete the set of flags
passed to the compiler and the linker, in particular that -fPIC
is now
enabled on Linux. It will also croak when your /dev/null isn't a device.
A new configuration variable d_pseudofork
has been added to Configure, and is
available as $Config{d_pseudofork}
in the Config
module. This
distinguishes real fork support from the pseudofork emulation used on
Windows platforms.
Config.pod and config.sh are now placed correctly for cross-compilation.
$Config{useshrplib}
is now 'true' rather than 'yes' when using a shared perl
library.
Parallel makes should work properly now, although there may still be problems
if make test
is instructed to run in parallel.
Many compilation warnings have been cleaned up. A very stubborn compiler
warning in S_emulate_eaccess()
was killed after six attempts.
g++ support has been tuned, especially for FreeBSD.
mkppport has been integrated, and all ppport.h files in the core will now be autogenerated at build time (and removed during cleanup).
installman now works with -Duserelocatableinc
and DESTDIR
.
installperl no longer installs:
static library files of statically linked extensions when a shared perl library is being used. (They are not needed. See Windows below).
SIGNATURE and PAUSE*.pub (CPAN files)
NOTES and PATCHING (ExtUtils files)
perlld and ld2 (Cygwin files)
There are improved hints for AIX, Cygwin, DEC/OSF, FreeBSD, HP/UX, Irix 6, Linux, MachTen, NetBSD, OS/390, QNX, SCO, Solaris, SunOS, System V Release 5.x (UnixWare 7, OpenUNIX 8), Ultrix, UMIPS, uts and VOS.
Drop -std=c89
and -ansi
if using long long
as the main integral type,
else in FreeBSD 6.2 (and perhaps other releases), system headers do not
declare some functions required by perl.
Starting with Solaris 10, we do not want versioned shared libraries, because
those often indicate a library for private use only. These problems were often
triggered when SUNWbdb (Berkeley DB) was installed. Hence if Solaris 10
is detected, ignore_versioned_solibs=y is set.
Allow IEEE math to be deselected on OpenVMS I64 (but it remains the default).
Record IEEE usage in config.h
Help older VMS compilers by using ccflags
when building munchconfig.exe
.
Don't try to build old Thread
extension on VMS when -Duseithreads
has
been chosen.
Passing a raw string of "NaN" to nawk causes a core dump, so the string has been changed to "*NaN*".
t/op/stat.t tests will now test hard links on VMS if they are supported.
When using a shared perl library installperl no longer installs static library files, import library files and export library files (of statically linked extensions) and empty bootstrap files (of dynamically linked extensions). This fixes a problem building PAR-Packer on Win32 with a debug build of perl.
Various improvements to the win32 build process, including support for Visual C++ 2005 Express Edition (aka Visual C++ 8.x).
perl.exe will now have an icon if built with MinGW or Borland.
Improvements to the perl-static.exe build process.
Add Win32 makefile option to link all extensions statically.
The WinCE directory has been merged into the Win32 directory.
setlocale
tests have been re-enabled for Windows XP onwards.
Many many bugs related to the internal Unicode implementation (UTF-8) have
been fixed. In particular, long standing bugs related to returning Unicode
via tie, overloading or $@
are now gone, some of which were never
reported.
unpack will internally convert the string back from UTF-8 on numeric types.
This is a compromise between the full consistency now in 5.10, and the current
behaviour, which is often used as a "feature" on string types.
Using :crlf
and UTF-16
IO layers together will now work.
Fixed problems with split, Unicode /\s+/
and / \0/
.
Fixed bug RT #40641 - encoding of Unicode characters in regular expressions.
Fixed a bug where using certain patterns in a regexp led to a panic. [RT #45337]
Perl no longer segfaults (due to infinite internal recursion) if the locale's character set is not UTF-8. [RT #41442]
Inconsistencies have been fixed in the reference counting PerlIO uses to keep
track of Unix file descriptors, and in the API used by XS code to manage
getting and releasing FILE *s.
Several bugs have been fixed in Magic, the internal system used to implement
features such as tie, tainting and threads sharing.
undef @array
on a tied array now correctly calls the CLEAR
method.
Some of the bitwise ops were not checking whether their arguments were magical before using them. [RT #24816]
Magic is no longer invoked twice by the expression \&$x
A bug with assigning large numbers and tainting has been resolved. [RT #40708]
A new entry has been added to the MAGIC vtable - svt_local
. This is used
when copying magic to the new value during local, allowing certain problems
with localising shared variables to be resolved.
For the implementation details, see Magic Virtual Tables in perlguts.
Internally, perl object-ness is on the referent, not the reference, even though methods can only be called via a reference. However, the original implementation of overloading stored flags related to overloading on the reference, relying on the flags being copied when the reference was copied, or set at the creation of a new reference. This manifested as a bug: if you reblessed an object from a class that has overloading into one that does not, any other existing references would still believe they pointed to an overloaded object, take the overloading code paths, and then throw errors. Analogously, blessing into an overloaded class while other references existed would result in those references not using overloading.
The implementation has been fixed for 5.10, but this fix changes the semantics of flag bits, so is not binary compatible, so can't be applied to 5.8.9. However, 5.8.9 has a work-around that implements the same bug fix. If the referent has multiple references, then all the other references are located and corrected. A full search is avoided whenever possible by scanning lexicals outwards from the current subroutine, and the argument stack.
A certain well known Linux vendor applied incomplete versions of this bug fix to their /usr/bin/perl and then prematurely closed bug reports about performance issues without consulting back upstream. This not being enough, they then proceeded to ignore the necessary fixes to these unreleased changes for 11 months, until massive pressure was applied by their long-suffering paying customers, catalysed by the failings being featured on a prominent blog and Slashdot.
strict
now propagates correctly into string evals. Under 5.8.8 and earlier:
- $ perl5.8.8 -e 'use strict; eval "use foo bar" or die $@'
- Can't locate foo.pm in @INC (@INC contains: ... .) at (eval 1) line 2.
- BEGIN failed--compilation aborted at (eval 1) line 2.
Under 5.8.9 and later:
- $ perl5.8.9 -e 'use strict; eval "use foo bar" or die $@'
- Bareword "bar" not allowed while "strict subs" in use at (eval 1) line 1.
This may cause problems with programs that parse the error message and rely on the buggy behaviour.
The tokenizer no longer treats =cute
(and other words beginning
with =cut
) as a synonym for =cut
.
Calling CORE::require
CORE::require
and CORE::do
were always parsed as require and do
when they were overridden. This is now fixed.
Stopped memory leak on long /etc/groups entries.
while (my $x ...) { ...; redo }
shouldn't undef $x
.
In the presence of my in the conditional of a while()
, until()
,
or for(;;)
loop, we now add an extra scope to the body so that redo
doesn't undef the lexical.
The encoding
pragma now correctly ignores anything following an @
character in the LC_ALL
and LANG
environment variables. [RT # 49646]
A segfault observed with some gcc 3.3 optimisations is resolved.
A possible segfault when unpack used in scalar context with ()
groups
is resolved. [RT #50256]
Resolved issue where $!
could be changed by a signal handler interrupting
a system call.
Fixed bug RT #37886, symbolic dereferencing was allowed in the argument of
defined even under the influence of use strict 'refs'
.
Fixed bug RT #43207, where lc/uc inside sort affected the return
value.
Fixed bug RT #45607, where *{"BONK"} = \&{"BONK"}
didn't work correctly.
Fixed bug RT #35878: croaking from an XSUB called via goto &xsub
corrupted perl
internals.
Fixed bug RT #32539: DynaLoader.o is moved into libperl.so to avoid the need to statically link DynaLoader into the stub perl executable. With this, libperl.so provides everything needed to run a functional embedded perl interpreter.
Fix bug RT #36267 so that assigning to a tied hash doesn't change the underlying hash.
Fix bug RT #6006, where regexp replaces using large replacement variables
failed some of the time, i.e. when the substitution contained something
like ${10}
(note the braces) instead of just $10
.
Fix bug RT #45053, Perl_newCONSTSUB()
is now thread safe.
Various improvements to 64 bit builds.
Mutex protection added in PerlIOStdio_close()
to avoid race conditions.
Hopefully this fixes failures in the threads tests free.t and blocks.t.
Added forked terminal support to the debugger, with the ability to update the window title.
A build problem with specifying USE_MULTI
and USE_ITHREADS
but without
USE_IMP_SYS
has been fixed.
OS2::REXX
upgraded to version 1.04
Aligned floating point build policies for cc and gcc.
Revisited a patch from 5.6.1 for RH7.2 for Intel's icc [RT #7916], added an
additional check for $Config{gccversion}
.
Use -DPTR_IS_LONG
when using 64 bit integers
Fixed PerlIO::Scalar
in-memory file record-style reads.
pipe shutdown at process exit should now be more robust.
Bugs in VMS exit handling tickled by Test::Harness
2.64 have been fixed.
Fix fcntl() locking capability test in configure.com.
Replaced shrplib='define'
with useshrplib='true'
on VMS.
File::Find
used to fail when the target directory is a bare drive letter and
no_chdir
is 1 (the default is 0). [RT #41555]
A build problem with specifying USE_MULTI
and USE_ITHREADS
but without
USE_IMP_SYS
has been fixed.
The process id is no longer truncated to 16 bits on some Windows platforms ( http://bugs.activestate.com/show_bug.cgi?id=72443 )
Fixed bug RT #54828 in perlio.c where calling binmode on Win32 and Cygwin
may cause a segmentation fault.
It is now possible to overload eq
when using nomethod
.
Various problems using overload
with 64 bit integers corrected.
The reference count of PerlIO
file descriptors is now correctly handled.
On VMS, escaped dots will be preserved when converted to Unix syntax.
keys %+
no longer throws an 'ambiguous'
warning.
Using #!perl -d
could trigger an assertion, which has been fixed.
Don't stringify tied code references in @INC
when calling require.
Code references in @INC
report the correct file name when __FILE__
is
used.
Width and precision in sprintf didn't handle characters above 255 correctly. [RT #40473]
List slices with indices out of range now work more consistently. [RT #39882]
A change introduced with perl 5.8.1 broke the parsing of arguments of the form
-foo=bar
with the -s
switch on the #! line. This has been fixed. See
http://bugs.activestate.com/show_bug.cgi?id=43483
tr/// is now threadsafe. Previously it was storing a swash inside its OP,
rather than in a pad.
pod2html labels anchors more consistently and handles nested definition lists better.
threads
cleanup veto has been extended to include perl_free()
and
perl_destruct()
On some systems, changes to $ENV{TZ}
would not always be
respected by the underlying calls to localtime_r()
. Perl now
forces the inspection of the environment on these systems.
The special variable $^R
is now more consistently set when executing
regexps using the (?{...}) construct. In particular, it will still
be set even if backreferences or optional sub-patterns (?:...)? are
used.
This new fatal error occurs when the C routine Perl_sv_chop()
is passed a
position that is not within the scalar's string buffer. This is caused by
buggy XS code, and at this point recovery is not possible.
This new fatal error occurs when the perl process has to abort due to too many pending signals, which is bound to prevent perl from being able to handle further incoming signals safely.
This new fatal error occurs when the ACL version file test operator is used where it is not available on the current platform. Earlier checks mean that it should never be possible to get this.
New error indicating that a tied array has claimed to have a negative number of elements.
Previously the internal error from the SV upgrade code was the less informative Can't upgrade that kind of scalar. It now reports the current internal type, and the new type requested.
This error, thrown if an invalid argument is provided to exists now
correctly includes "or a subroutine". [RT #38955]
This error in Fatal
previously did not show the name of the builtin in
question (now represented by %s above).
This error previously did not state the column.
This can now also be generated by a seek on a file handle using
PerlIO::scalar
.
New error, introduced as part of the fix to RT #40641 to handle encoding of Unicode characters in regular expression comments.
A more informative fatal error issued when calling dump on Win32 and
Cygwin. (Given that the purpose of dump is to abort with a core dump,
and core dumps can't be produced on these platforms, this is more useful than
silently exiting.)
The perl sources can now be compiled with a C++ compiler instead of a C
compiler. A necessary implementation detail is that under C++, the macro
XS
used to define XSUBs now includes an extern "C"
definition. A side
effect of this is that C++ code that used the construction
- typedef XS(SwigPerlWrapper);
now needs to be written
- typedef XSPROTO(SwigPerlWrapper);
using the new XSPROTO
macro, in order to compile. C extensions are
unaffected, although C extensions are encouraged to use XSPROTO
too.
This change was present in the 5.10.0 release of perl, so any actively
maintained code that happened to use this construction should already have
been adapted. Code that needs changing will fail with a compilation error.
set
magic on localizing/assigning to a magic variable will now only
trigger for container magics, i.e. it will trigger for %ENV
or %SIG
but not for $#array
.
The new API macro newSVpvs()
can be used in place of constructions such as
newSVpvn("ISA", 3)
. It takes a single string constant, and at C compile
time determines its length.
The new API function Perl_newSV_type()
can be used as a more efficient
replacement of the common idiom
- sv = newSV(0);
- sv_upgrade(sv, type);
Similarly Perl_newSVpvn_flags()
can be used to combine
Perl_newSVpv()
with Perl_sv_2mortal()
or the equivalent
Perl_sv_newmortal()
with Perl_sv_setpvn().
Two new macros mPUSHs()
and mXPUSHs()
are added, to make it easier to
push mortal SVs onto the stack. They were then used to fix several bugs where
values on the stack had not been mortalised.
A Perl_signbit()
function was added to test the sign of an NV
. It
maps to the system one when available.
Perl_av_reify()
, Perl_lex_end()
, Perl_mod()
, Perl_op_clear()
,
Perl_pop_return()
, Perl_qerror()
, Perl_setdefout()
,
Perl_vivify_defelem()
and Perl_yylex()
are now visible to extensions.
This was required to allow Data::Alias
to work on Windows.
Perl_find_runcv()
is now visible to perl core extensions. This was required
to allow Sub::Current
to work on Windows.
ptr_table*
functions are now available in unthreaded perl. Storable
takes advantage of this.
There have been many small cleanups made to the internals. In particular,
Perl_sv_upgrade()
has been simplified considerably, with a straight-through
code path that uses memset()
and memcpy()
to initialise the new body,
rather than assignment via multiple temporary variables. It has also
benefited from simplification and de-duplication of the arena management
code.
A lot of small improvements in the code base were made due to reports from the Coverity static code analyzer.
Corrected use and documentation of Perl_gv_stashpv()
, Perl_gv_stashpvn()
,
Perl_gv_stashsv()
functions (last parameter is a bitmask, not boolean).
PERL_SYS_INIT
, PERL_SYS_INIT3
and PERL_SYS_TERM
macros have been
changed into functions.
PERL_SYS_TERM
no longer requires a context. PerlIO_teardown()
is now called without a context, and debugging output in this function has
been disabled because that required that an interpreter was present, an invalid
assumption at termination time.
All compile time options which affect binary compatibility have been grouped
together into a global variable (PL_bincompat_options
).
The values of PERL_REVISION
, PERL_VERSION
and PERL_SUBVERSION
are
now baked into global variables (and hence into any shared perl library).
Additionally under MULTIPLICITY
, the perl executable now records the size of
the interpreter structure (total, and for this version). Coupled with
PL_bincompat_options
this will allow 5.8.10 (and later), when compiled with a
shared perl library, to perform sanity checks in main()
to verify that the
shared library is indeed binary compatible.
Symbolic references can now have embedded NULs. The new public function
Perl_get_cvn_flags()
can be used in extensions if you have to handle them.
The core code, and XS code in ext that is not dual-lived on CPAN, no longer
uses the macros PL_na
, NEWSV()
, Null()
, Nullav
, Nullcv
,
Nullhv
, Nullsv
etc. Their use is discouraged in new code,
particularly PL_na
, which incurs a small performance hit.
Many modules updated from CPAN incorporate new tests. Some core specific tests have been added:
Tests for the DynaLoader
module.
Tests for compile-time constant folding.
Tests incorporated from 5.10.0 which check that there is no unexpected
interaction between the internal types PVBM
and PVGV
.
Tests for the new form of constant subroutines.
Tests for Attribute::Handlers
.
Tests for dbmopen.
Runs all tests in t/op/inccode.t after first tying @INC
.
Tests for source filters returned from code references in @INC
.
Tests for RT #30970.
Tests for RT #41484.
Tests for the qr// construct.
Tests for the qr// construct within another regexp.
Tests for the qr// construct.
Tests for RT #32840.
Tests for study on tied scalars.
Tests for subst
run under -T
mode.
Tests for undef and delete on stash entries that are bound to
subroutines or methods.
Tests for Perl_sv_upgrade()
.
MRO tests for isa
and package aliases.
Tests for calling Pod::Parser
twice.
Tests for inheriting file descriptors across exec (close-on-exec).
Tests for the UTF-8 caching code.
Test that strange encodings do not upset Perl_pp_chr()
.
Tests for RT #40641.
Tests for RT #40641.
Tests for returning Unicode from overloaded values.
Tests for returning Unicode from tied variables.
There are no known new bugs.
However, programs that rely on bugs that have been fixed will have problems. Also, many bug fixes present in 5.10.0 can't be back-ported to the 5.8.x branch, because they require changes that are binary incompatible, or because the code changes are too large and hence too risky to incorporate.
We have only limited volunteer labour, and the maintenance burden is getting increasingly complex. Hence this will be the last significant release of the 5.8.x series. Any future releases of 5.8.x will likely only be to deal with security issues, and platform build failures. Hence you should look to migrating to 5.10.x, if you have not started already. Alternatively, if business requirements constrain you to continue to use 5.8.x, you may wish to consider commercial support from firms such as ActiveState.
readdir(), cwd(), $^X and @INC now use the alternate (short) filename if the long name is outside the current codepage (Jan Dubois).
Win32 upgraded to version 0.38. Now has a documented 'WinVista' response from GetOSName and support for Vista's privilege elevation in IsAdminUser.
Support for Unicode characters in path names. Improved cygwin and Win64 compatibility.
Win32API updated to 0.1001_01
killpg() support added to MSWin32 (Jan Dubois).
File::Spec::Win32 upgraded to version 3.2701
OS2::Process upgraded to 1.03. Ilya Zakharevich has added and documented several Window* and Clipbrd* functions.
OS2::REXX::DLL, OS2::REXX updated to version 1.03
DCLsym upgraded to version 1.03
Stdio upgraded to version 2.4
VMS::XSSymSet upgraded to 1.1.
Nick Ing-Simmons, long time Perl hacker, author of the Tk and Encode modules, perlio.c in the core, and 5.003_02 pumpking, died of a heart attack on 25th September 2006. He will be missed.
Some of the work in this release was funded by a TPF grant.
Steve Hay worked behind the scenes to track down the causes of the differences between core modules, their CPAN releases, and previous core releases, and the best way to rectify them. He doesn't want to do it again. I know this feeling, and I'm very glad he did it this time, instead of me.
Paul Fenwick assembled a team of 18 volunteers, who broke the back of writing this document. In particular, Bradley Dean, Eddy Tan, and Vincent Pit provided half the team's contribution.
Schwern verified the list of updated module versions, correcting quite a few errors that I (and everyone else) had missed, both wrongly stated module versions, and changed modules that had not been listed.
The crack Berlin-based QA team of Andreas König and Slaven Rezic tirelessly re-built snapshots, tested most of CPAN against them, and then identified the changes responsible for any module regressions, ensuring that several show-stopper bugs were stomped before the first release candidate was cut.
The other core committers contributed most of the changes, and applied most of the patches sent in by the hundreds of contributors listed in AUTHORS.
And obviously, Larry Wall, without whom we wouldn't have Perl.
If you find what you think is a bug, you might check the articles recently posted to the comp.lang.perl.misc newsgroup and the perl bug database at http://bugs.perl.org. There may also be information at http://www.perl.org, the Perl Home Page.
If you believe you have an unreported bug, please run the perlbug program included with your release. Be sure to trim your bug down to a tiny but sufficient test case. Your bug report, along with the output of perl -V, will be sent off to perlbug@perl.org to be analysed by the Perl porting team. You can browse and search the Perl 5 bugs at http://bugs.perl.org/
If the bug you are reporting has security implications, which make it inappropriate to send to a publicly archived mailing list, then please send it to perl5-security-report@perl.org. This points to a closed subscription unarchived mailing list, which includes all the core committers, who will be able to help assess the impact of issues, figure out a resolution, and help co-ordinate the release of patches to mitigate or fix the problem across all platforms on which Perl is supported. Please only use this address for security issues in the Perl core, not for modules independently distributed on CPAN.
The Changes file for exhaustive details on what changed.
The INSTALL file for how to build Perl.
The README file for general stuff.
The Artistic and Copying files for copyright information.
perl58delta - what is new for perl v5.8.0
This document describes differences between the 5.6.0 release and the 5.8.0 release.
Many of the bug fixes in 5.8.0 were already seen in the 5.6.1 maintenance release since the two releases were kept closely coordinated (while 5.8.0 was still called 5.7.something).
Changes that were integrated into the 5.6.1 release are marked [561]. Many of these changes have been further developed since 5.6.1 was released; those are marked [561+].
You can see the list of changes in the 5.6.1 release (both from the 5.005_03 release and the 5.6.0 release) by reading perl561delta.
Better Unicode support
New IO Implementation
New Thread Implementation
Better Numeric Accuracy
Safe Signals
Many New Modules
More Extensive Regression Testing
Perl 5.8 is not binary compatible with earlier releases of Perl.
You have to recompile your XS modules.
(Pure Perl modules should continue to work.)
The major reason for the discontinuity is the new IO architecture called PerlIO. PerlIO is the default configuration because without it many new features of Perl 5.8 cannot be used. In other words: you just have to recompile your modules containing XS code, sorry about that.
In future releases of Perl, non-PerlIO aware XS modules may become completely unsupported. This shouldn't be too difficult for module authors, however: PerlIO has been designed as a drop-in replacement (at the source code level) for the stdio interface.
Depending on your platform, there are also other reasons why we decided to break binary compatibility, please read on.
If your pointers are 64 bits wide, the Perl malloc is no longer being used because it does not work well with 8-byte pointers. Also, usually the system mallocs on such platforms are much better optimized for such large memory models than the Perl malloc. Some memory-hungry Perl applications like the PDL don't work well with Perl's malloc. Finally, other applications than Perl (such as mod_perl) tend to prefer the system malloc. Such platforms include Alpha and 64-bit HPPA, MIPS, PPC, and Sparc.
The AIX dynaloading now uses, in AIX releases 4.3 and newer, the native dlopen interface of AIX instead of the old emulated interface. This change will probably break backward compatibility with compiled modules. The change was made to make Perl more compliant with other applications like mod_perl which are using the AIX native interface.
my variables now handled at run-time
The my EXPR : ATTRS syntax now applies variable attributes at run-time. (Subroutine and our variables still get attributes applied at compile-time.) See attributes for additional details. In particular, however, this allows variable attributes to be useful for tie interfaces, which was a deficiency of earlier releases. Note that the new semantics doesn't work with the Attribute::Handlers module (as of version 0.76).
The Socket extension is now dynamically loaded instead of being statically built in. This may or may not be a problem with ancient TCP/IP stacks of VMS: we do not know since we weren't able to test Perl in such configurations.
Perl now uses IEEE format (T_FLOAT) as the default internal floating point format on OpenVMS Alpha, potentially breaking binary compatibility with external libraries or existing data. G_FLOAT is still available as a configuration option. The default on VAX (D_FLOAT) has not changed.
"use utf8" is no longer needed (well, almost)
Previously in Perl 5.6 to use Unicode one would say "use utf8" and then the operations (like string concatenation) were Unicode-aware in that lexical scope.
This was found to be an inconvenient interface, and in Perl 5.8 the Unicode model has completely changed: now the "Unicodeness" is bound to the data itself, and for most of the time "use utf8" is not needed at all. The only remaining use of "use utf8" is when the Perl script itself has been written in the UTF-8 encoding of Unicode. (UTF-8 has not been made the default since there are many Perl scripts out there that are using various national eight-bit character sets, which would be illegal in UTF-8.)
See perluniintro for the explanation of the current model, and utf8 for the current use of the utf8 pragma.
Unicode scripts are now supported. Scripts are similar to (and superior to) Unicode blocks. The difference between scripts and blocks is that scripts are the glyphs used by a language or a group of languages, while the blocks are more artificial groupings of (mostly) 256 characters based on the Unicode numbering.
In general, scripts are more inclusive, but not universally so. For example, while the script Latin includes all the Latin characters and their various diacritic-adorned versions, it does not include the various punctuation or digits (since they are not solely Latin).
A number of other properties are now supported, including \p{L&}, \p{Any}, \p{Assigned}, \p{Unassigned}, \p{Blank} [561] and \p{SpacePerl} [561] (along with their \P{...} versions, of course).
See perlunicode for details, and more additions.
The In or Is prefix to names used with the \p{...} and \P{...} constructs is now almost always optional. The only exception is that an In prefix is required to signify a Unicode block when a block name conflicts with a script name. For example, \p{Tibetan} refers to the script, while \p{InTibetan} refers to the block. When there is no name conflict, you can omit the In from the block name (e.g. \p{BraillePatterns}), but to be safe, it's probably best to always use the In.
A reference to a reference now stringifies as "REF(0x81485ec)" instead of "SCALAR(0x81485ec)" in order to be more consistent with the return value of ref().
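A small illustration of the change (the addresses in the stringified form will of course vary):

```perl
my $scalar = 42;
my $ref    = \$scalar;   # reference to a scalar
my $refref = \$ref;      # reference to a reference

# ref() and stringification now agree for a reference-to-reference:
print ref($refref), "\n";   # prints "REF"
print "$refref\n";          # "REF(0x...)", no longer "SCALAR(0x...)"
```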
The undocumented pack/unpack template letters D/F have been recycled for better use: now they stand for long double (if supported by the platform) and NV (Perl internal floating point type). (They used to be aliases for d/f, but you never knew that.)
The list of filenames from glob() (or <...>) is now by default sorted alphabetically to be csh-compliant (which is what happened before in most Unix platforms). (bsd_glob() does still sort platform natively, ASCII or EBCDIC, unless GLOB_ALPHASORT is specified.) [561]
The semantics of bless(REF, REF) were unclear and until someone proves it to make some sense, it is forbidden.
The obsolete chat2 library that should never have been allowed to escape the laboratory has been decommissioned.
Using chdir("") or chdir(undef) instead of an explicit chdir() is doubtful. A failure (think chdir(some_function())) can lead to an unintended chdir() to the home directory, so this behaviour is deprecated.
The builtin dump() function has probably outlived most of its usefulness. The core-dumping functionality will remain available in future releases as an explicit call to CORE::dump(), but in future releases the behaviour of an unqualified dump() call may change.
The very dusty examples in the eg/ directory have been removed. Suggestions for new shiny examples welcome, but the main issue is that the examples need to be documented, tested, and (most importantly) maintained.
The (bogus) escape sequences \8 and \9 now give an optional warning ("Unrecognized escape passed through"). There is no need to \-escape any \w character.
The *glob{FILEHANDLE} syntax is deprecated; use *glob{IO} instead.
The package; syntax (package without an argument) has been deprecated. Its semantics were never that clear, and its implementation even less so. If you have used that feature to disallow all but fully qualified variables, use strict; instead.
The unimplemented POSIX regex features [[.cc.]] and [[=c=]] are still recognised but now cause fatal errors. The previous behaviour of ignoring them by default and warning if requested was unacceptable since it, in a way, falsely promised that the features could be used.
In future releases, non-PerlIO aware XS modules may become completely unsupported. Since PerlIO is a drop-in replacement for stdio at the source code level, this shouldn't be that drastic a change.
Previous versions of perl and some readings of some sections of Camel III implied that the :raw "discipline" was the inverse of :crlf. Turning off "crlfness" is no longer enough to make a stream truly binary. So the PerlIO :raw layer (or "discipline", to use the Camel book's older terminology) is now formally defined as being equivalent to binmode(FH) - which is in turn defined as doing whatever is necessary to pass each byte as-is without any translation. In particular binmode(FH) - and hence :raw - will now turn off both CRLF and UTF-8 translation and remove other layers (e.g. :encoding()) which would modify the byte stream.
The current user-visible implementation of pseudo-hashes (the weird use of the first array element) is deprecated starting from Perl 5.8.0 and will be removed in Perl 5.10.0, and the feature will be implemented differently. Not only is the current interface rather ugly, but the current implementation slows down normal array and hash use quite noticeably. The fields pragma interface will remain available. The restricted hashes interface is expected to be the replacement interface (see Hash::Util). If your existing programs depend on the underlying implementation, consider using Class::PseudoHash from CPAN.
The syntaxes @a->[...] and %h->{...} have now been deprecated.
After years of trying, suidperl is considered to be too complex to ever be considered truly secure. The suidperl functionality is likely to be removed in a future release.
The 5.005 threads model (module Thread) is deprecated and expected to be removed in Perl 5.10. Multithreaded code should be migrated to the new ithreads model (see threads, threads::shared and perlthrtut).
The long deprecated uppercase aliases for the string comparison operators (EQ, NE, LT, LE, GE, GT) have now been removed.
The tr///C and tr///U features have been removed and will not return; the interface was a mistake. Sorry about that. For similar functionality, see pack('U0', ...) and pack('C0', ...). [561]
Earlier Perls treated "sub foo (@bar)" as equivalent to "sub foo (@)". The prototypes are now checked better at compile-time for invalid syntax. An optional warning is generated ("Illegal character in prototype...") but this may be upgraded to a fatal error in a future release.
The exec LIST and system LIST operations now produce warnings on tainted data, and in some future release they will produce fatal errors.
The existing behaviour when localising tied arrays and hashes is wrong, and will be changed in a future release, so do not rely on the existing behaviour. See Localising Tied Arrays and Hashes Is Broken.
Unicode in general should be now much more usable than in Perl 5.6.0 (or even in 5.6.1). Unicode can be used in hash keys, Unicode in regular expressions should work now, Unicode in tr/// should work now, Unicode in I/O should work now. See perluniintro for introduction and perlunicode for details.
The Unicode Character Database coming with Perl has been upgraded to Unicode 3.2.0. For more information, see http://www.unicode.org/ . [561+] (5.6.1 has UCD 3.0.1.)
For developers interested in enhancing Perl's Unicode capabilities: almost all the UCD files are included with the Perl distribution in the lib/unicore subdirectory. The most notable omission, for space considerations, is the Unihan database.
The properties \p{Blank} and \p{SpacePerl} have been added. "Blank" is like C's isblank(): it matches only "horizontal whitespace" (the space character is included, the newline isn't). "SpacePerl" is the Unicode equivalent of \s (\p{Space} isn't, since that includes the vertical tabulator character, whereas \s doesn't.)
See "New Unicode Properties" earlier in this document for additional information on changes with Unicode properties.
IO is now by default done via PerlIO rather than system's "stdio". PerlIO allows "layers" to be "pushed" onto a file handle to alter the handle's behaviour. Layers can be specified at open time via 3-arg form of open:
- open($fh,'>:crlf :utf8', $path) || ...
or on already opened handles via extended binmode:
- binmode($fh,':encoding(iso-8859-7)');
The built-in layers are: unix (low level read/write), stdio (as in previous Perls), perlio (re-implementation of stdio buffering in a portable manner), crlf (does CRLF <=> "\n" translation as on Win32, but available on any platform). A mmap layer may be available if platform supports it (mostly Unixes).
Layers to be applied by default may be specified via the 'open' pragma.
See Installation and Configuration Improvements for the effects of PerlIO on your architecture name.
If your platform supports fork(), you can use the list form of open for pipes. For example:
- open(KID_PS, "-|", "ps", "aux") || die "cannot run ps: $!";
forks the ps(1) command (without spawning a shell, as there are more than three arguments to open()), and reads its standard output via the KID_PS filehandle. See perlipc.
File handles can be marked as accepting Perl's internal encoding of Unicode (UTF-8 or UTF-EBCDIC depending on platform) by a pseudo layer ":utf8" :
- open($fh,">:utf8","Uni.txt");
Note for EBCDIC users: the pseudo layer ":utf8" is erroneously named for you since it's not UTF-8 what you will be getting but instead UTF-EBCDIC. See perlunicode, utf8, and http://www.unicode.org/unicode/reports/tr16/ for more information. In future releases this naming may change. See perluniintro for more information about UTF-8.
If your environment variables (LC_ALL, LC_CTYPE, LANG) look like you want to use UTF-8 (any of the variables match /utf-?8/i), your STDIN, STDOUT, STDERR handles and the default open layer (see open) are marked as UTF-8. (This feature, like other new features that combine Unicode and I/O, works only if you are using PerlIO, but that's the default.)
Note that after this Perl really does assume that everything is UTF-8: for example if some input handle is not, Perl will probably very soon complain about the input data like this "Malformed UTF-8 ..." since any old eight-bit data is not legal UTF-8.
Note for code authors: if you want to enable your users to use UTF-8 as their default encoding but in your code still have eight-bit I/O streams (such as images or zip files), you need to explicitly open() or binmode() with :bytes (see open and binmode), or you can just use binmode(FH) (nice for pre-5.8.0 backward compatibility).
File handles can translate character encodings from/to Perl's internal Unicode form on read/write via the ":encoding()" layer.
File handles can be opened to "in memory" files held in Perl scalars via:
- open($fh,'>', \$variable) || ...
Anonymous temporary files are available without need to 'use FileHandle' or other module via
- open($fh, '+>', undef) || ...
That is a literal undef, not an undefined value.
The new interpreter threads ("ithreads" for short) implementation of multithreading, by Arthur Bergman, replaces the old "5.005 threads" implementation. In the ithreads model any data sharing between threads must be explicit, as opposed to the model where data sharing was implicit. See threads and threads::shared, and perlthrtut.
As a part of the ithreads implementation Perl will also use any necessary and detectable reentrant libc interfaces.
A restricted hash is restricted to a certain set of keys, no keys outside the set can be added. Also individual keys can be restricted so that the key cannot be deleted and the value cannot be changed. No new syntax is involved: the Hash::Util module is the interface.
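A small sketch of that interface (lock_keys() and unlock_keys() are the documented entry points in Hash::Util):

```perl
use strict;
use warnings;
use Hash::Util qw(lock_keys unlock_keys);

my %h = (name => 'perl', version => '5.8.0');

# Restrict %h to its current set of keys:
lock_keys(%h);

$h{version} = '5.8.1';          # fine: existing key, value still writable
eval { $h{vendor} = 'CPAN' };   # fatal: key outside the restricted set
print "new key rejected\n" if $@;

unlock_keys(%h);                # lift the restriction again
$h{vendor} = 'CPAN';            # now allowed
```

Individual keys can additionally be locked against deletion and value changes with lock_value() and friends; see Hash::Util.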
Perl used to be fragile in that signals arriving at inopportune moments could corrupt Perl's internal state. Now Perl postpones handling of signals until it's safe (between opcodes).
This change may have surprising side effects because signals no longer interrupt Perl instantly. Perl will now first finish whatever it was doing, like finishing an internal operation (like sort()) or an external operation (like an I/O operation), and only then look at any arrived signals (and before starting the next operation). No more corrupt internal state since the current operation is always finished first, but the signal may take more time to get heard. Note that breaking out from potentially blocking operations should still work, though.
In general a lot of fixing has happened in the area of Perl's understanding of numbers, both integer and floating point. Since in many systems the standard number parsing functions like strtoul() and atof() seem to have bugs, Perl tries to work around their deficiencies. This hopefully results in more accurate numbers.
Perl now tries internally to use integer values in numeric conversions and basic arithmetic (+ - * /) if the arguments are integers, and tries also to keep the results stored internally as integers. This change often leads to slightly faster, and always less lossy, arithmetic. (Previously Perl always preferred floating point numbers in its math.)
In double-quoted strings, arrays now interpolate, no matter what. The behavior in earlier versions of perl 5 was that arrays would interpolate into strings if the array had been mentioned before the string was compiled, and otherwise Perl would raise a fatal compile-time error. In versions 5.000 through 5.003, the error was
- Literal @example now requires backslash
In versions 5.004_01 through 5.6.0, the error was
- In string, @example now must be written as \@example
The idea here was to get people into the habit of writing "fred\@example.com" when they wanted a literal @ sign, just as they have always written "Give me back my \$5" when they wanted a literal $ sign.
Starting with 5.6.1, when Perl sees an @ sign in a double-quoted string, it always attempts to interpolate an array, regardless of whether or not the array has been used or declared already. The fatal error has been downgraded to an optional warning:
- Possible unintended interpolation of @example in string
This warns you that "fred@example.com" is going to turn into fred.com if you don't backslash the @.
See http://perl.plover.com/at-error.html for more details about the history here.
AUTOLOAD is now lvaluable, meaning that you can add the :lvalue attribute to AUTOLOAD subroutines and you can assign to the AUTOLOAD return value.
The $Config{byteorder} (and corresponding BYTEORDER in config.h) was previously wrong in platforms if sizeof(long) was 4, but sizeof(IV) was 8. The byteorder was only sizeof(long) bytes long (1234 or 4321), but now it is correctly sizeof(IV) bytes long, (12345678 or 87654321). (This problem didn't affect Windows platforms.)
Also, $Config{byteorder} is now computed dynamically--this is more robust with "fat binaries" where an executable image contains binaries for more than one binary platform, and when cross-compiling.
perl -d:Module=arg,arg,arg now works (previously one couldn't pass in multiple arguments).
do followed by a bareword now ensures that this bareword isn't a keyword (to avoid a bug where do q(foo.pl) tried to call a subroutine called q). This means that for example instead of do format() you must write do &format().
The builtin dump() now gives an optional warning dump() better written as CORE::dump(), meaning that by default dump(...) is resolved as the builtin dump() which dumps core and aborts, not as a (possibly) user-defined sub dump. To call the latter, qualify the call as &dump(...). (The whole dump() feature is considered deprecated, and may be removed/changed in future releases.)
chomp() and chop() are now overridable. Note, however, that their prototype (as given by prototype("CORE::chomp")) is undefined, because it cannot be expressed and therefore one cannot really write replacements to override these builtins.
END blocks are now run even if you exit/die in a BEGIN block. Internally, the execution of END blocks is now controlled by PL_exit_flags & PERL_EXIT_DESTRUCT_END. This enables the new behaviour for Perl embedders. This will default in 5.10. See perlembed.
Formats now support zero-padded decimal fields.
Although "you shouldn't do that", it was possible to write code that depends on Perl's hashed key order (Data::Dumper does this). The new algorithm "One-at-a-Time" produces a different hashed key order. More details are in Performance Enhancements.
lstat(FILEHANDLE) now gives a warning because the operation makes no sense. In future releases this may become a fatal error.
Spurious syntax errors generated in certain situations, when glob() caused File::Glob to be loaded for the first time, have been fixed. [561]
Lvalue subroutines can now return undef in list context. However, the lvalue subroutine feature still remains experimental. [561+]
A lost warning "Can't declare ... dereference in my" has been restored (Perl had it earlier but it became lost in later releases.)
A new special regular expression variable has been introduced: $^N, which contains the most-recently closed group (submatch).
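For example, a short sketch: after a match, $^N holds whatever the most recently closed capturing group matched, and inside a (?{...}) code block it refers to the group that just closed at that point of the match:

```perl
"foobar" =~ /(foo)(bar)/;
print "$^N\n";    # "bar": group 2 closed most recently

# Collect each (\w+) as it closes, without caring about group numbers:
my @words;
"one two three" =~ /(?:(\w+)(?{ push @words, $^N })\s*)+/;
print "@words\n"; # "one two three"
```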
no Module; does not produce an error even if Module does not have an unimport() method. This parallels the behavior of use vis-a-vis import. [561]
The numerical comparison operators return undef if either operand is a NaN. Previously the behaviour was unspecified.
our can now have an experimental optional attribute unique that affects how global variables are shared among multiple interpreters; see our.
The following builtin functions are now overridable: each(), keys(), pop(), push(), shift(), splice(), unshift(). [561]
pack() / unpack() can now group template letters with () and then apply repetition/count modifiers on the groups.
pack() / unpack() can now process the Perl internal numeric types: IVs, UVs, NVs -- and also long doubles, if supported by the platform. The template letters are j, J, F, and D.
pack('U0a*', ...) can now be used to force a string to UTF-8.
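A short sketch of the new template features:

```perl
# Grouping with () plus a repeat count: "(A2)3" is three two-byte
# ASCII fields, which previously had to be spelled "A2A2A2".
my $packed = pack '(A2)3', 'ab', 'cd', 'ef';
print "$packed\n";                     # "abcdef"
my @fields = unpack '(A2)3', $packed;  # ("ab", "cd", "ef")

# j/J pack Perl's internal signed/unsigned IV types directly:
my $iv = unpack 'j', pack 'j', -42;
print "$iv\n";                         # -42

# 'U0' mode: treat the packed bytes as UTF-8 (here, U+00E9):
my $utf8 = pack 'U0a*', "\xc3\xa9";
```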
my __PACKAGE__ $obj now works. [561]
POSIX::sleep() now returns the number of unslept seconds (as the POSIX standard says), as opposed to CORE::sleep() which returns the number of slept seconds.
printf() and sprintf() now support parameter reordering using the
%\d+\$ and *\d+\$ syntaxes. For example
- printf "%2\$s %1\$s\n", "foo", "bar";
will print "bar foo\n". This feature helps in writing internationalised software, and in general when the order of the parameters can vary.
The (\&) prototype now works properly. [561]
prototype(\[$@%&]) is now available to implicitly create references (useful for example if you want to emulate the tie() interface).
A new command-line option, -t, is available. It is the little brother of -T: instead of dying on taint violations, lexical warnings are given. This is only meant as a temporary debugging aid while securing the code of old legacy applications. This is not a substitute for -T.
In other taint news, the exec LIST and system LIST forms have now been considered too risky (think exec @ARGV: it can start any program with any arguments), and the said forms now cause a warning under lexical warnings. You should carefully launder the arguments to guarantee their validity. In future releases of Perl these forms will become fatal errors, so consider starting laundering now.
Tied hash interfaces are now required to have the EXISTS and DELETE methods (either own or inherited).
If tr/// is just counting characters, it doesn't attempt to modify its target.
untie() will now call an UNTIE() hook if it exists. See perltie for details. [561]
utime now supports utime undef, undef, @files to change the file timestamps to the current time.
The rules for allowing underscores (underbars) in numeric constants have been relaxed and simplified: now you can have an underscore simply between digits.
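A small illustration of the relaxed rule (an underscore is legal between any two digits):

```perl
my $million = 1_000_000;     # same as 1000000
my $mask    = 0xFF_FF;       # underscores work in hex constants too
my $pi      = 3.14_159;      # and between digits of a fractional part
print "$million $mask\n";    # "1000000 65535"
```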
Rather than relying on C's argv[0] (which may not contain a full pathname) where possible $^X is now set by asking the operating system. (eg by reading /proc/self/exe on Linux, /proc/curproc/file on FreeBSD)
A new variable, ${^TAINT}, indicates whether taint mode is enabled.
You can now override the readline() builtin, and this overrides also the <FILEHANDLE> angle bracket operator.
The command-line options -s and -F are now recognized on the shebang (#!) line.
Use of the /c match modifier without an accompanying /g modifier elicits a new warning: Use of /c modifier is meaningless without /g. Use of /c in substitutions, even with /g, elicits Use of /c modifier is meaningless in s///. Use of /g with split elicits Use of /g modifier is meaningless in split.
Support for the CLONE special subroutine has been added. With ithreads, when a new thread is created, all Perl data is cloned; however, non-Perl data cannot be cloned automatically. In CLONE you can do whatever you need to do, like for example handle the cloning of non-Perl data, if necessary. CLONE will be executed once for every package that has it defined or inherited. It will be called in the context of the new thread, so all modifications are made in the new area. See perlmod.
Attribute::Handlers, originally by Damian Conway and now maintained by Arthur Bergman, allows a class to define attribute handlers.
Both variables and routines can have attribute handlers. Handlers can be specific to type (SCALAR, ARRAY, HASH, or CODE), or specific to the exact compilation phase (BEGIN, CHECK, INIT, or END). See Attribute::Handlers.
B::Concise, by Stephen McCamant, is a new compiler backend for walking the Perl syntax tree, printing concise info about ops. The output is highly customisable. See B::Concise. [561+]
The new bignum, bigint, and bigrat pragmas, by Tels, implement transparent bignum support (using the Math::BigInt, Math::BigFloat, and Math::BigRat backends).
Class::ISA, by Sean Burke, is a module for reporting the search path for a class's ISA tree. See Class::ISA.
Cwd now has a split personality: if possible, an XS extension is used (this will hopefully be faster, more secure, and more robust), but if not possible, the familiar Perl implementation is used.
Devel::PPPort, originally by Kenneth Albanowski and now maintained by Paul Marquess, has been added. It is primarily used by h2xs to enhance portability of XS modules between different versions of Perl. See Devel::PPPort.
Digest, a frontend module for calculating digests (checksums), from Gisle Aas, has been added. See Digest.
Digest::MD5 for calculating MD5 digests (checksums) as defined in RFC 1321, from Gisle Aas, has been added. See Digest::MD5. NOTE: the MD5 backward compatibility module is deliberately not included since its further use is discouraged.
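A short sketch of both Digest::MD5 interfaces (the digest of "abc" is the RFC 1321 test-vector value):

```perl
use Digest::MD5 qw(md5_hex);

# Functional interface: digest a string directly.
my $hex = md5_hex('abc');
print "$hex\n";   # "900150983cd24fb0d6963f7d28e17f72"

# OO interface: feed data in chunks, e.g. while reading a large file.
my $md5 = Digest::MD5->new;
$md5->add('a', 'b');
$md5->add('c');
my $chunked = $md5->hexdigest;   # same digest as above
```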
Encode, originally by Nick Ing-Simmons and now maintained by Dan Kogai, provides a mechanism to translate between different character encodings. Support for Unicode, ISO-8859-1, and ASCII is compiled into the module. Several other encodings (like the rest of the ISO-8859 family, CP*/Win*, Mac, KOI8-R, three EBCDIC variants, and Chinese, Japanese, and Korean encodings) are included and can be loaded at runtime. (For space considerations, the largest Chinese encodings have been separated into their own CPAN module, Encode::HanExtra, which Encode will use if available). See Encode.
Any encoding supported by the Encode module is also available to the ":encoding()" layer if PerlIO is used.
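A minimal sketch of Encode's two core functions, encode() and decode():

```perl
use Encode qw(encode decode);

# Perl's internal string: one character, U+00E9 (e with acute accent).
my $char = "\x{e9}";

# encode(): internal characters -> bytes in the named encoding.
my $utf8   = encode('UTF-8',      $char);   # two bytes: 0xC3 0xA9
my $latin1 = encode('ISO-8859-1', $char);   # one byte:  0xE9

# decode(): bytes in the named encoding -> internal characters.
my $back = decode('UTF-8', $utf8);
print length($utf8), " ", length($back), "\n";   # "2 1"
```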
Hash::Util is the interface to the new restricted hashes feature. (Implemented by Jeffrey Friedl, Nick Ing-Simmons, and Michael Schwern.) See Hash::Util.
I18N::Langinfo can be used to query locale information. See I18N::Langinfo.
I18N::LangTags, by Sean Burke, has functions for dealing with RFC3066-style language tags. See I18N::LangTags.
ExtUtils::Constant, by Nicholas Clark, is a new tool for extension writers for generating XS code to import C header constants. See ExtUtils::Constant.
Filter::Simple, by Damian Conway, is an easy-to-use frontend to Filter::Util::Call. See Filter::Simple.
- # in MyFilter.pm:
- package MyFilter;
- use Filter::Simple sub {
- while (my ($from, $to) = splice @_, 0, 2) {
- s/$from/$to/g;
- }
- };
- 1;
- # in user's code:
- use MyFilter qr/red/ => 'green';
- print "red\n"; # this code is filtered, will print "green\n"
- print "bored\n"; # this code is filtered, will print "bogreen\n"
- no MyFilter;
- print "red\n"; # this code is not filtered, will print "red\n"
File::Temp, by Tim Jenness, allows one to create temporary files and directories in an easy, portable, and secure way. See File::Temp. [561+]
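A small sketch of the tempfile()/tempdir() interface:

```perl
use File::Temp qw(tempfile tempdir);

# tempfile() returns an opened handle plus the file's name; the file
# is created safely and, with UNLINK, removed automatically at exit.
my ($fh, $filename) = tempfile(UNLINK => 1);
print $fh "scratch data\n";
close $fh;
print "wrote $filename\n" if -e $filename;

# tempdir() likewise creates a fresh directory:
my $dir = tempdir(CLEANUP => 1);
```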
Filter::Util::Call
, by Paul Marquess, provides you with the
framework to write source filters in Perl. For most uses, the
frontend Filter::Simple is to be preferred. See Filter::Util::Call.
if
, by Ilya Zakharevich, is a new pragma for conditional inclusion
of modules.
libnet, by Graham Barr, is a collection of perl5 modules related to network programming. See Net::FTP, Net::NNTP, Net::Ping (not part of libnet, but related), Net::POP3, Net::SMTP, and Net::Time.
Perl installation leaves libnet unconfigured; use libnetcfg to configure it.
List::Util
, by Graham Barr, is a selection of general-utility
list subroutines, such as sum(), min(), first(), and shuffle().
See List::Util.
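The listed subroutines in use, as a minimal sketch:

```perl
use List::Util qw(sum min first shuffle);

my @nums  = (3, 1, 4, 1, 5, 9);
my $total = sum(@nums);              # 23
my $least = min(@nums);              # 1
my $big   = first { $_ > 4 } @nums;  # 5: first element greater than 4
my @mixed = shuffle(@nums);          # same elements, random order
```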
Locale::Constants
, Locale::Country
, Locale::Currency
, Locale::Language
, and Locale::Script, by Neil Bowers, have
been added. They provide the codes for various locale standards, such
as "fr" for France, "usd" for US Dollar, and "ja" for Japanese.
- use Locale::Country;
- $country = code2country('jp'); # $country gets 'Japan'
- $code = country2code('Norway'); # $code gets 'no'
See Locale::Constants, Locale::Country, Locale::Currency, and Locale::Language.
Locale::Maketext
, by Sean Burke, is a localization framework. See
Locale::Maketext, and Locale::Maketext::TPJ13. The latter is an
article about software localization, originally published in The Perl
Journal #13, and republished here with kind permission.
Math::BigRat
for big rational numbers, to accompany Math::BigInt and
Math::BigFloat, from Tels. See Math::BigRat.
Memoize
can make your functions faster by trading space for time,
from Mark-Jason Dominus. See Memoize.
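The space-for-time trade in one small sketch:

```perl
use Memoize;

# A naive recursive Fibonacci recomputes the same subproblems
# exponentially often; memoize() wraps the sub with a transparent
# result cache, making repeated work effectively linear.
sub fib {
    my $n = shift;
    return $n if $n < 2;
    return fib($n - 1) + fib($n - 2);
}
memoize('fib');

print fib(30), "\n";   # 832040, returned quickly
```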
MIME::Base64
, by Gisle Aas, allows you to encode data in base64,
as defined in RFC 2045 - MIME (Multipurpose Internet Mail
Extensions).
See MIME::Base64.
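A round-trip sketch:

```perl
use MIME::Base64 qw(encode_base64 decode_base64);

my $encoded = encode_base64("Hello, world!");  # ends with a newline
my $decoded = decode_base64($encoded);         # round-trips exactly
```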
MIME::QuotedPrint
, by Gisle Aas, allows you to encode data
in quoted-printable encoding, as defined in RFC 2045 - MIME
(Multipurpose Internet Mail Extensions).
See also PerlIO::via::QuotedPrint.
NEXT
, by Damian Conway, is a pseudo-class for method redispatch.
See NEXT.
open is a new pragma for setting the default I/O layers
for open().
PerlIO::scalar
, by Nick Ing-Simmons, provides the implementation
of IO to "in memory" Perl scalars as discussed above. It also serves
as an example of a loadable PerlIO layer. Other future possibilities
include PerlIO::Array and PerlIO::Code. See PerlIO::scalar.
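The in-memory IO mentioned above is reached through the ordinary open() builtin by passing a scalar reference in place of a filename:

```perl
# Write to a scalar as if it were a file.
my $buffer = '';
open my $out, '>', \$buffer or die "open: $!";
print $out "written to memory\n";
close $out;

# Read it back through a second in-memory handle.
open my $in, '<', \$buffer or die "open: $!";
my $line = <$in>;                 # "written to memory\n"
close $in;
```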
PerlIO::via
, by Nick Ing-Simmons, acts as a PerlIO layer and wraps
PerlIO layer functionality provided by a class (typically implemented
in Perl code).
PerlIO::via::QuotedPrint
, by Elizabeth Mattijsen, is an example
of a PerlIO::via
class:
- use PerlIO::via::QuotedPrint;
- open($fh, ">:via(QuotedPrint)", $path);
This will automatically convert everything output to $fh to Quoted-Printable. See PerlIO::via and PerlIO::via::QuotedPrint.
Pod::ParseLink
, by Russ Allbery, has been added,
to parse L<> links in pods as described in the new
perlpodspec.
Pod::Text::Overstrike
, by Joe Smith, has been added.
It converts POD data to formatted overstrike text.
See Pod::Text::Overstrike. [561+]
Scalar::Util
is a selection of general-utility scalar subroutines,
such as blessed(), reftype(), and tainted(). See Scalar::Util.
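A minimal sketch of the named subroutines:

```perl
use Scalar::Util qw(blessed reftype looks_like_number);

my $obj = bless {}, 'My::Class';
print blessed($obj), "\n";    # "My::Class"; undef for non-objects
print reftype($obj), "\n";    # "HASH": the underlying reference type
print looks_like_number('3.14') ? "number\n" : "not a number\n";
```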
sort is a new pragma for controlling the behaviour of sort().
Storable
gives persistence to Perl data structures by allowing the
storage and retrieval of Perl data to and from files in a fast and
compact binary format. Because in effect Storable does serialisation
of Perl data structures, with it you can also clone deep, hierarchical
datastructures. Storable was originally created by Raphael Manfredi,
but it is now maintained by Abhijit Menon-Sen. Storable has been
enhanced to understand the two new hash features, Unicode keys and
restricted hashes. See Storable.
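Serialisation and deep cloning in one sketch:

```perl
use Storable qw(freeze thaw dclone);

my $data = { name => 'camel', humps => [1, 2] };

my $bytes = freeze($data);    # serialise to a compact binary string
my $back  = thaw($bytes);     # reconstruct an equivalent structure

my $clone = dclone($data);    # deep copy in one call
$clone->{humps}[0] = 99;      # leaves $data untouched
```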
Switch
, by Damian Conway, has been added. Just by saying
- use Switch;
you have switch
and case
available in Perl.
- use Switch;
- switch ($val) {
- case 1 { print "number 1" }
- case "a" { print "string a" }
- case [1..10,42] { print "number in list" }
- case (@array) { print "number in list" }
- case /\w+/ { print "pattern" }
- case qr/\w+/ { print "pattern" }
- case (%hash) { print "entry in hash" }
- case (\%hash) { print "entry in hash" }
- case (\&sub) { print "arg to subroutine" }
- else { print "previous case not true" }
- }
See Switch.
Test::More
, by Michael Schwern, is yet another framework for writing
test scripts, more extensive than Test::Simple. See Test::More.
Test::Simple
, by Michael Schwern, has basic utilities for writing
tests. See Test::Simple.
Text::Balanced
, by Damian Conway, has been added, for extracting
delimited text sequences from strings.
- use Text::Balanced 'extract_delimited';
- ($a, $b) = extract_delimited("'never say never', he never said", "'", '');
$a will be "'never say never'", $b will be ', he never said'.
In addition to extract_delimited(), there are also extract_bracketed(), extract_quotelike(), extract_codeblock(), extract_variable(), extract_tagged(), extract_multiple(), gen_delimited_pat(), and gen_extract_tagged(). With these, you can implement rather advanced parsing algorithms. See Text::Balanced.
threads
, by Arthur Bergman, is an interface to interpreter threads.
Interpreter threads (ithreads) are the new thread model introduced in
Perl 5.6, but previously available only as an internal interface for extension
writers (and for Win32 Perl for fork() emulation). See threads,
threads::shared, and perlthrtut.
threads::shared
, by Arthur Bergman, allows data sharing for
interpreter threads. See threads::shared.
Tie::File
, by Mark-Jason Dominus, associates a Perl array with the
lines of a file. See Tie::File.
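A sketch of the tie (the filename here is illustrative):

```perl
use Tie::File;

# Each array element corresponds to one line of the file; changing
# the array rewrites the file in place. 'records.txt' is created if
# it does not already exist.
tie my @lines, 'Tie::File', 'records.txt' or die "tie failed: $!";
push @lines, 'a new last line';       # appended to the file
$lines[0] = 'replaced first line';    # rewrites line 1
untie @lines;
```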
Tie::Memoize
, by Ilya Zakharevich, provides on-demand loaded hashes.
See Tie::Memoize.
Tie::RefHash::Nestable
, by Edward Avis, allows storing hash
references (unlike the standard Tie::RefHash). The module is contained
within Tie::RefHash. See Tie::RefHash.
Time::HiRes
, by Douglas E. Wegscheid, provides high resolution
timing (ualarm, usleep, and gettimeofday). See Time::HiRes.
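The three named functions in a minimal timing sketch:

```perl
use Time::HiRes qw(gettimeofday tv_interval usleep);

my $t0 = [gettimeofday];         # [seconds, microseconds]
usleep(250_000);                 # sleep for a quarter of a second
my $elapsed = tv_interval($t0);  # elapsed time in fractional seconds
```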
Unicode::UCD
offers a querying interface to the Unicode Character
Database. See Unicode::UCD.
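A sketch of a single-character query with charinfo():

```perl
use Unicode::UCD qw(charinfo);

my $info = charinfo(0x41);        # query by code point
print $info->{name}, "\n";        # "LATIN CAPITAL LETTER A"
print $info->{category}, "\n";    # "Lu": uppercase letter
```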
Unicode::Collate
, by SADAHIRO Tomoyuki, implements the UCA
(Unicode Collation Algorithm) for sorting Unicode strings.
See Unicode::Collate.
Unicode::Normalize
, by SADAHIRO Tomoyuki, implements the various
Unicode normalization forms. See Unicode::Normalize.
XS::APItest
, by Tim Jenness, is a test extension that exercises XS
APIs. Currently only printf() is tested: how to output various
basic data types from XS.
XS::Typemap
, by Tim Jenness, is a test extension that exercises
XS typemaps. Nothing gets installed, but the code is worth studying
for extension writers.
The following independently supported modules have been updated to the newest versions from CPAN: CGI, CPAN, DB_File, File::Spec, File::Temp, Getopt::Long, Math::BigFloat, Math::BigInt, the podlators bundle (Pod::Man, Pod::Text), Pod::LaTeX [561+], Pod::Parser, Storable, Term::ANSIColor, Test, Text-Tabs+Wrap.
attributes::reftype() now works on tied arguments.
AutoLoader can now be disabled with no AutoLoader;.
B::Deparse has been significantly enhanced by Robin Houston. It can now deparse almost all of the standard test suite (so that the tests still succeed). There is a make target "test.deparse" for trying this out.
Carp now has better interface documentation, and the @CARP_NOT interface has been added to get optional control over where errors are reported independently of @ISA, by Ben Tilly.
Class::Struct can now define the classes at compile time.
Class::Struct now assigns the array/hash element if the accessor is called with an array/hash element as the sole argument.
The return value of Cwd::fastcwd() is now tainted.
Data::Dumper now has an option to sort hashes.
Data::Dumper now has an option to dump code references using B::Deparse.
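Both options are controlled through package variables, Sortkeys and Deparse:

```perl
use Data::Dumper;

local $Data::Dumper::Sortkeys = 1;  # emit hash keys in sorted order
local $Data::Dumper::Deparse  = 1;  # decompile coderefs via B::Deparse

print Dumper({ beta => 2, alpha => sub { 42 } });
```

Sorted keys make dumps stable across runs, which is what you want when diffing two dumps.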
DB_File now supports newer Berkeley DB versions, among other improvements.
Devel::Peek now has an interface for the Perl memory statistics (this works only if you are using perl's malloc, and if you have compiled with debugging).
The English module can now be used without the infamous performance hit by saying
- use English '-no_match_vars';
(Assuming, of course, that you don't need the troublesome variables
$`
, $&
, or $'
.) Also introduced were the @LAST_MATCH_START and @LAST_MATCH_END English aliases for @- and @+.
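The new array aliases in a minimal sketch:

```perl
use English '-no_match_vars';

"foobar" =~ /(o+)/;
# @LAST_MATCH_START and @LAST_MATCH_END alias @- and @+: the string
# offsets where capture group 1 started and ended.
my $start = $LAST_MATCH_START[1];   # 1
my $end   = $LAST_MATCH_END[1];     # 3
```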
ExtUtils::MakeMaker has been significantly cleaned up and fixed. The enhanced version has also been backported to earlier releases of Perl and submitted to CPAN so that the earlier releases can enjoy the fixes.
The arguments of WriteMakefile() in Makefile.PL are now checked for sanity much more carefully than before. This may cause new warnings when modules are being installed. See ExtUtils::MakeMaker for more details.
ExtUtils::MakeMaker now uses File::Spec internally, which hopefully leads to better portability.
Fcntl, Socket, and Sys::Syslog have been rewritten by Nicholas Clark to use the new-style constant dispatch section (see ExtUtils::Constant). This means that they will be more robust and hopefully faster.
File::Find now chdir()s correctly when chasing symbolic links. [561]
File::Find now has pre- and post-processing callbacks. It also correctly changes directories when chasing symbolic links. Callbacks (naughtily) exiting with "next;" instead of "return;" now work.
File::Find is now (again) reentrant. It also has been made more portable.
The warnings issued by File::Find now belong to their own category.
You can enable/disable them with use/no warnings 'File::Find';.
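The pre- and post-processing hooks are passed in the options hash alongside the usual wanted callback; preprocess receives the directory listing and returns the (possibly filtered or sorted) list to descend into, while postprocess runs when a directory has been fully visited. A sketch:

```perl
use File::Find;

my @visited;
find({
    wanted      => sub { },    # called once per file or directory
    preprocess  => sub {       # order entries before descending
        sort @_;
    },
    postprocess => sub {       # leaving $File::Find::dir
        push @visited, $File::Find::dir;
    },
}, '.');
```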
File::Glob::glob() has been renamed to File::Glob::bsd_glob() because the name clashes with the builtin glob(). The older name is still available for compatibility, but is deprecated. [561]
File::Glob now supports the GLOB_LIMIT
constant to limit the size of
the returned list of filenames.
IPC::Open3 now allows the use of numeric file descriptors.
IO::Socket now has an atmark() method, which returns true if the socket is positioned at the out-of-band mark. The method is also exportable as a sockatmark() function.
IO::Socket::INET failed to open the specified port if the service name was not known. It now correctly uses the supplied port number as is. [561]
IO::Socket::INET has support for the ReusePort option (if your platform supports it). The Reuse option now has an alias, ReuseAddr. For clarity, you may want to prefer ReuseAddr.
IO::Socket::INET now supports a value of zero for LocalPort
(usually meaning that the operating system will make one up.)
'use lib' now works identically to @INC. Removing directories with 'no lib' now works.
Math::BigFloat and Math::BigInt have undergone a full rewrite by Tels. They are now magnitudes faster, and they support various bignum libraries such as GMP and PARI as their backends.
Math::Complex handles inf, NaN etc., better.
Net::Ping has been considerably enhanced by Rob Brown: multihoming is now supported, Win32 functionality is better, there is now time measuring functionality (optionally high-resolution using Time::HiRes), and there is now "external" protocol which uses Net::Ping::External module which runs your external ping utility and parses the output. A version of Net::Ping::External is available in CPAN.
Note that some of the Net::Ping tests are disabled when running under the Perl distribution since one cannot assume one or more of the following: enabled echo port at localhost, full Internet connectivity, or sympathetic firewalls. You can set the environment variable PERL_TEST_Net_Ping to "1" (one) before running the Perl test suite to enable all the Net::Ping tests.
POSIX::sigaction() is now much more flexible and robust. You can now install coderef handlers as well as 'DEFAULT' and 'IGNORE' handlers, and handler installation is now atomic (previously it was not).
In Safe, %INC
is now localised in a Safe compartment so that
use/require work.
In SDBM_File on dosish platforms, some keys went missing because of lack of support for files with "holes". A workaround for the problem has been added.
In Search::Dict one can now have a pre-processing hook for the lines being searched.
The Shell module now has an OO interface.
In Sys::Syslog there is now a failover mechanism that will go through alternative connection mechanisms until the message is successfully logged.
The Test module has been significantly enhanced.
Time::Local::timelocal() does not handle fractional seconds anymore. The rationale is that neither does localtime(), and timelocal() and localtime() are supposed to be inverses of each other.
The vars pragma now supports declaring fully qualified variables.
(Something that our() does not and will not support.)
The utf8::
name space (as in the pragma) provides various
Perl-callable functions to provide low level access to Perl's
internal Unicode representation. At the moment only length()
has been implemented.
Emacs perl mode (emacs/cperl-mode.el) has been updated to version 4.31.
emacs/e2ctags.pl is now much faster.
enc2xs
is a tool for people adding their own encodings to the
Encode module.
h2ph
now supports C trigraphs.
h2xs
now produces a template README.
h2xs
now uses Devel::PPPort
for better portability between
different versions of Perl.
h2xs
uses the new ExtUtils::Constant module
which will affect newly created extensions that define constants.
Since the new code is more correct (if you have two constants where the
first one is a prefix of the second one, the first constant never
got defined), less lossy (it uses integers for integer constant,
as opposed to the old code that used floating point numbers even for
integer constants), and slightly faster, you might want to consider
regenerating your extension code (the new scheme makes regenerating
easy). h2xs now also supports C trigraphs.
libnetcfg
has been added to configure libnet.
perlbug
is now much more robust. It also sends the bug report to
perl.org, not perl.com.
perlcc
has been rewritten and its user interface (that is,
command line) is much more like that of the Unix C compiler, cc.
(The perlbc tool has been removed. Use perlcc -B
instead.)
Note that perlcc is still considered very experimental and
unsupported. [561]
perlivp
is a new Installation Verification Procedure utility
for running any time after installing Perl.
piconv
is an implementation of the character conversion utility
iconv
, demonstrating the new Encode module.
pod2html
now allows specifying a cache directory.
pod2html
now produces XHTML 1.0.
pod2html
now understands POD written using different line endings
(PC-like CRLF versus Unix-like LF versus MacClassic-like CR).
s2p
has been completely rewritten in Perl. (It is in fact a full
implementation of sed in Perl: you can use the sed functionality by
using the psed
utility.)
xsubpp
now understands POD documentation embedded in the *.xs
files. [561]
xsubpp
now supports the OUT keyword.
perl56delta details the changes between the 5.005 release and the 5.6.0 release.
perlclib documents the internal replacements for standard C library functions. (Interesting only for extension writers and Perl core hackers.) [561+]
perldebtut is a Perl debugging tutorial. [561+]
perlebcdic contains considerations for running Perl on EBCDIC platforms. [561+]
perlintro is a gentle introduction to Perl.
perliol documents the internals of PerlIO with layers.
perlmodstyle is a style guide for writing modules.
perlnewmod tells about writing and submitting a new module. [561+]
perlpacktut is a pack() tutorial.
perlpod has been rewritten to be clearer and to record the best practices gathered over the years.
perlpodspec is a more formal specification of the pod format, mainly of interest for writers of pod applications, not to people writing in pod.
perlretut is a regular expression tutorial. [561+]
perlrequick is a regular expressions quick-start guide. Yes, much quicker than perlretut. [561]
perltodo has been updated.
perltootc has been renamed as perltooc (so as not to conflict with perltoot in filesystems restricted to "8.3" names).
perluniintro is an introduction to using Unicode in Perl. (perlunicode is more of a detailed reference and background information)
perlutil explains the command line utilities packaged with the Perl distribution. [561+]
The following platform-specific documents are available before the installation as README.platform, and after the installation as perlplatform:
- perlaix perlamiga perlapollo perlbeos perlbs2000
- perlce perlcygwin perldgux perldos perlepoc perlfreebsd perlhpux
- perlhurd perlirix perlmachten perlmacos perlmint perlmpeix
- perlnetware perlos2 perlos390 perlplan9 perlqnx perlsolaris
- perltru64 perluts perlvmesa perlvms perlvos perlwin32
These documents usually detail one or more of the following subjects: configuring, building, testing, installing, and sometimes also using Perl on the said platform.
Eastern Asian Perl users are now welcomed in their own languages: README.jp (Japanese), README.ko (Korean), README.cn (simplified Chinese) and README.tw (traditional Chinese), which are written in normal pod but encoded in EUC-JP, EUC-KR, EUC-CN and Big5. These will get installed as
- perljp perlko perlcn perltw
The documentation for the POSIX-BC platform is called "BS2000", to avoid confusion with the Perl POSIX module.
The documentation for the WinCE platform is called perlce (README.ce in the source code kit), to avoid confusion with the perlwin32 documentation on 8.3-restricted filesystems.
map() could get pathologically slow when the result list it generates is larger than the source list. The performance has been improved for common scenarios. [561]
sort() is also fully reentrant, in the sense that the sort function can itself call sort(). This did not work reliably in previous releases. [561]
sort() has been changed to use primarily mergesort internally as
opposed to the earlier quicksort. For very small lists this may
result in slightly slower sorting times, but in general the speedup
should be at least 20%. Additional bonuses are that the worst case
behaviour of sort() is now better (in computer science terms it now
runs in time O(N log N), as opposed to quicksort's Theta(N**2)
worst-case run time behaviour), and that sort() is now stable
(meaning that elements with identical keys will stay ordered as they
were before the sort). See the sort pragma for information.
The story in more detail: suppose you want to serve yourself a little slice of Pi.
- @digits = ( 3,1,4,1,5,9 );
A numerical sort of the digits will yield (1,1,3,4,5,9), as expected.
Which 1
comes first is hard to know, since one 1
looks pretty
much like any other. You can regard this as totally trivial,
or somewhat profound. However, if you just want to sort the even
digits ahead of the odd ones, then what will
- sort { ($a % 2) <=> ($b % 2) } @digits;
yield? The only even digit, 4
, will come first. But how about
the odd numbers, which all compare equal? With the quicksort algorithm
used to implement Perl 5.6 and earlier, the order of ties is left up
to the sort. So, as you add more and more digits of Pi, the order
in which the sorted even and odd digits appear will change.
And, for sufficiently large slices of Pi, the quicksort algorithm
in Perl 5.8 won't return the same results even if reinvoked with the
same input. The justification for this rests with quicksort's
worst case behavior. If you run
- sort { $a <=> $b } ( 1 .. $N , 1 .. $N );
(something you might approximate if you wanted to merge two sorted arrays using sort), doubling $N doesn't just double the quicksort time, it quadruples it. Quicksort has a worst case run time that can grow like N**2, so-called quadratic behaviour, and it can happen on patterns that may well arise in normal use. You won't notice this for small arrays, but you will notice it with larger arrays, and you may not live long enough for the sort to complete on arrays of a million elements. So the 5.8 quicksort scrambles large arrays before sorting them, as a statistical defence against quadratic behaviour. But that means if you sort the same large array twice, ties may be broken in different ways.
Because of the unpredictability of tie-breaking order, and the quadratic worst-case behaviour, quicksort was almost replaced completely with a stable mergesort. Stable means that ties are broken to preserve the original order of appearance in the input array. So
- sort { ($a % 2) <=> ($b % 2) } (3,1,4,1,5,9);
will yield (4,3,1,1,5,9), guaranteed. The even and odd numbers appear in the output in the same order they appeared in the input. Mergesort has worst case O(N log N) behaviour, the best value attainable. And, ironically, this mergesort does particularly well where quicksort goes quadratic: mergesort sorts (1..$N, 1..$N) in O(N) time. But quicksort was rescued at the last moment because it is faster than mergesort on certain inputs and platforms. For example, if you really don't care about the order of even and odd digits, quicksort will run in O(N) time; it's very good at sorting many repetitions of a small number of distinct elements. The quicksort divide and conquer strategy works well on platforms with relatively small, very fast, caches. Eventually, the problem gets whittled down to one that fits in the cache, from which point it benefits from the increased memory speed.
Quicksort was rescued by implementing a sort pragma to control aspects
of the sort. The stable subpragma forces stable behaviour,
regardless of algorithm. The _quicksort and _mergesort
subpragmas are heavy-handed ways to select the underlying implementation.
The leading _
is a reminder that these subpragmas may not survive
beyond 5.8. More appropriate mechanisms for selecting the implementation
exist, but they wouldn't have arrived in time to save quicksort.
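The stable subpragma described above reads, for example:

```perl
use sort 'stable';    # ties keep their input order, regardless of
                      # which underlying algorithm is used

my @digits      = (3, 1, 4, 1, 5, 9);
my @evens_first = sort { ($a % 2) <=> ($b % 2) } @digits;
# (4, 3, 1, 1, 5, 9), guaranteed under 'use sort "stable"'
```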
Hashes now use Bob Jenkins's "One-at-a-Time" hashing key algorithm ( http://burtleburtle.net/bob/hash/doobs.html ). This algorithm is reasonably fast while producing a much better spread of values than the old hashing algorithm (originally by Chris Torek, later tweaked by Ilya Zakharevich). Hash values output from the algorithm on a hash of all 3-char printable ASCII keys come much closer to passing the DIEHARD random number generation tests. According to perlbench, this change has not affected the overall speed of Perl.
unshift() should now be noticeably faster.
INSTALL now explains how you can configure Perl to use 64-bit integers even on non-64-bit platforms.
Policy.sh policy change: if you are reusing a Policy.sh file (see INSTALL) and you use Configure -Dprefix=/foo/bar and in the old Policy $prefix eq $siteprefix and $prefix eq $vendorprefix, all of them will now be changed to the new prefix, /foo/bar. (Previously only $prefix changed.) If you do not like this new behaviour, specify prefix, siteprefix, and vendorprefix explicitly.
A new optional location for Perl libraries, otherlibdirs, is available. It can be used for example for vendor add-ons without disturbing Perl's own library directories.
In many platforms, the vendor-supplied 'cc' is too stripped-down to build Perl (basically, 'cc' doesn't do ANSI C). If this seems to be the case and 'cc' does not seem to be the GNU C compiler 'gcc', an automatic attempt is made to find and use 'gcc' instead.
gcc needs to closely track the operating system release to avoid build problems. If Configure finds that gcc was built for a different operating system release than is running, it now gives a clearly visible warning that there may be trouble ahead.
Since Perl 5.8 is not binary-compatible with previous releases of Perl, Configure no longer suggests including the 5.005 modules in @INC.
Configure -S
can now run non-interactively. [561]
Configure support for pdp11-style memory models has been removed due to obsolescence. [561]
configure.gnu now works with options with whitespace in them.
installperl now outputs everything to STDERR.
Because PerlIO is now the default on most platforms, "-perlio" doesn't get appended to $Config{archname} anymore. Instead, if you explicitly choose not to use perlio (Configure command line option -Uuseperlio), you will get "-stdio" appended.
Another change related to the architecture name is that "-64all" (-Duse64bitall, or "maximally 64-bit") is appended only if your pointers are 64 bits wide. (To be exact, the use64bitall is ignored.)
In AFS installations, one can configure the root of the AFS to be
somewhere else than the default /afs by using the Configure
parameter -Dafsroot=/some/where/else.
APPLLIB_EXP, a lesser-known configuration-time definition, has been documented. It can be used to prepend site-specific directories to Perl's default search path (@INC); see INSTALL for information.
The version of Berkeley DB used when the Perl (and, presumably, the
DB_File extension) was built is now available as
@Config{qw(db_version_major db_version_minor db_version_patch)}
from Perl and as DB_VERSION_MAJOR_CFG, DB_VERSION_MINOR_CFG, and DB_VERSION_PATCH_CFG from C.
Building Berkeley DB3 for compatibility modes for DB, NDBM, and ODBM has been documented in INSTALL.
If you have CPAN access (either network or a local copy such as a CD-ROM), you can, during the build, specify extra modules for Configure to build and install with Perl using the -Dextras=... option. See INSTALL for more details.
In addition to config.over, a new override file, config.arch, is available. This file is supposed to be used by hints file writers for architecture-wide changes (as opposed to config.over which is for site-wide changes).
If your file system supports symbolic links, you can build Perl outside of the source directory by
- mkdir perl/build/directory
- cd perl/build/directory
- sh /path/to/perl/source/Configure -Dmksymlinks ...
This will create in perl/build/directory a tree of symbolic links pointing to files in /path/to/perl/source. The original files are left unaffected. After Configure has finished, you can just say
- make all test
and Perl will be built and tested, all in perl/build/directory. [561]
For Perl developers, several new make targets for profiling and debugging have been added; see perlhack.
Use of the gprof tool to profile Perl has been documented in perlhack. There is a make target called "perl.gprof" for generating a gprofiled Perl executable.
If you have GCC 3, there is a make target called "perl.gcov" for creating a gcoved Perl executable for coverage analysis. See perlhack.
If you are on IRIX or Tru64 platforms, new profiling/debugging options have been added; see perlhack for more information about pixie and Third Degree.
Guidelines of how to construct minimal Perl installations have been added to INSTALL.
The Thread extension is now not built at all under ithreads
(Configure -Duseithreads
) because it wouldn't work anyway (the
Thread extension requires being Configured with -Duse5005threads
).
Note that the 5.005 threads are unsupported and deprecated: if you have code written for the old threads you should migrate it to the new ithreads model.
The Gconvert macro ($Config{d_Gconvert}) used by perl for stringifying floating-point numbers is now more picky about using sprintf %.*g rules for the conversion. Some platforms that used to use gcvt may now resort to the slower sprintf.
The obsolete method of making a special (e.g., debugging) flavor of perl by saying
- make LIBPERL=libperld.a
has been removed. Use -DDEBUGGING instead.
For the list of platforms known to support Perl, see Supported Platforms in perlport.
AIX dynamic loading should be now better supported.
AIX should now work better with gcc, threads, and 64-bitness. Also the long doubles support in AIX should be better now. See perlaix.
AtheOS ( http://www.atheos.cx/ ) is a new platform.
BeOS has been reclaimed.
The DG/UX platform now supports 5.005-style threads. See perldgux.
The DYNIX/ptx platform (also known as dynixptx) is supported at or near osvers 4.5.2.
EBCDIC platforms (z/OS (also known as OS/390), POSIX-BC, and VM/ESA) have been regained. Many test suite tests still fail and the co-existence of Unicode and EBCDIC isn't quite settled, but the situation is much better than with Perl 5.6. See perlos390, perlbs2000 (for POSIX-BC), and perlvmesa for more information. (Note: support for VM/ESA was removed in Perl v5.18.0. The relevant information was in README.vmesa)
Building perl with -Duseithreads or -Duse5005threads now works under HP-UX 10.20 (previously it only worked under 10.30 or later). You will need a thread library package installed. See README.hpux. [561]
Mac OS Classic is now supported in the mainstream source package (MacPerl has of course been available since perl 5.004, but now the source code bases of standard Perl and MacPerl have been synchronised). [561]
Mac OS X (or Darwin) should now be able to build Perl even on HFS+ filesystems. (The case-insensitivity used to confuse the Perl build process.)
NCR MP-RAS is now supported. [561]
All the NetBSD specific patches (except for the installation specific ones) have been merged back to the main distribution.
NetWare from Novell is now supported. See perlnetware.
NonStop-UX is now supported. [561]
NEC SUPER-UX is now supported.
All the OpenBSD specific patches (except for the installation specific ones) have been merged back to the main distribution.
Perl has been tested with the GNU pth userlevel thread package ( http://www.gnu.org/software/pth/pth.html ). All thread tests of Perl now work, but not without adding some yield()s to the tests, so while pth (and other userlevel thread implementations) can be considered to be "working" with Perl ithreads, keep in mind the possible non-preemptability of the underlying thread implementation.
Stratus VOS is now supported using Perl's native build method (Configure). This is the recommended method to build Perl on VOS. The older methods, which build miniperl, are still available. See perlvos. [561+]
The Amdahl UTS Unix mainframe platform is now supported. [561]
WinCE is now supported. See perlce.
z/OS (formerly known as OS/390, formerly known as MVS OE) now has support for dynamic loading. This is not selected by default; you must specify -Dusedl in the arguments of Configure. [561]
Numerous memory leaks and uninitialized memory accesses have been hunted down. Most importantly, anonymous subs used to leak quite a bit. [561]
The autouse pragma didn't work for Multi::Part::Function::Names.
caller() could cause core dumps in certain situations. Carp was
sometimes affected by this problem. In particular, caller() now
returns a subroutine name of (unknown)
for subroutines that have
been removed from the symbol table.
chop(@list) in list context returned the characters chopped in reverse order. This has been reversed to be in the right order. [561]
Configure no longer includes the DBM libraries (dbm, gdbm, db, ndbm) when building the Perl binary. The only exception to this is SunOS 4.x, which needs them. [561]
The behaviour of non-decimal but numeric string constants such as "0x23" was platform-dependent: in some platforms that was seen as 35, in some as 0, in some as a floating point number (don't ask). This was caused by Perl's using the operating system libraries in a situation where the result of the string to number conversion is undefined: now Perl consistently handles such strings as zero in numeric contexts.
Several debugger fixes: exit code now reflects the script exit code,
condition "0"
now treated correctly, the d
command now checks
line number, $.
no longer gets corrupted, and all debugger output
now goes correctly to the socket if RemotePort is set. [561]
The debugger (perl5db.pl) has been modified to present a more consistent command interface (CommandSet=580). perl5db.t was also added to test the changes, and as a placeholder for further tests.
See perldebug.
The debugger has a new dumpDepth
option to control the maximum
depth to which nested structures are dumped. The x
command has
been extended so that x N EXPR
dumps out the value of EXPR to a
depth of at most N levels.
The debugger can now show lexical variables if you have the CPAN module PadWalker installed.
The order of DESTROYs has been made more predictable.
Perl 5.6.0 could emit spurious warnings about redefinition of dl_error() when statically building extensions into perl. This has been corrected. [561]
dprofpp -R didn't work.
*foo{FORMAT}
now works.
Infinity is now recognized as a number.
UNIVERSAL::isa no longer caches methods incorrectly. (This broke the Tk extension with 5.6.0.) [561]
Lexicals I: lexicals outside an eval "" weren't resolved correctly inside a subroutine definition inside the eval "" if they were not already referenced in the top level of the eval""ed code.
Lexicals II: lexicals leaked at file scope into subroutines that were declared before the lexicals.
Lexical warnings now propagate correctly between scopes and into eval "...".
use warnings qw(FATAL all) did not work as intended. This has been corrected. [561]
warnings::enabled() now reports the state of $^W correctly if the caller isn't using lexical warnings. [561]
Line renumbering with eval and #line now works. [561]
Fixed numerous memory leaks, especially in eval "".
Localised tied variables no longer leak memory.
Localised hash elements (and %ENV) are correctly unlocalised to not exist, if they didn't before they were localised.
As a side effect of this fix, tied hash interfaces must define the EXISTS and DELETE methods.
mkdir() now ignores trailing slashes in the directory name, as mandated by POSIX.
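A quick illustrative sketch (not from the original delta) of the POSIX-mandated behaviour, using a temporary directory:

```perl
use strict;
use warnings;
use File::Temp qw(tempdir);

my $base = tempdir(CLEANUP => 1);
# The trailing slash in the directory name is now ignored, per POSIX
mkdir("$base/newdir/") or die "mkdir failed: $!";
print -d "$base/newdir" ? "created\n" : "missing\n";
```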
Some versions of glibc have a broken modfl(). This affects builds with -Duselongdouble. This version of Perl detects this brokenness and has a workaround for it. The glibc release 2.2.2 is known to have fixed the modfl() bug.
Modulus of unsigned numbers now works (4063328477 % 65535 used to return 27406, instead of 27407). [561]
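A one-line sketch (not from the original delta) of the corrected unsigned modulus:

```perl
use strict;
use warnings;

# With correct unsigned semantics this large operand no longer loses a unit
my $r = 4063328477 % 65535;
print "$r\n";
```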
Some "not a number" warnings introduced in 5.6.0 have been eliminated to be more compatible with 5.005. Infinity is now recognised as a number. [561]
Numeric conversions did not recognize changes in the string value properly in certain circumstances. [561]
Attributes (such as :shared) didn't work with our().
our() variables will not cause bogus "Variable will not stay shared" warnings. [561]
"our" variables of the same name declared in two sibling blocks resulted in bogus warnings about "redeclaration" of the variables. The problem has been corrected. [561]
pack "Z" now correctly terminates the string with "\0".
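A brief sketch (not part of the original delta) showing the NUL termination:

```perl
use strict;
use warnings;

# "Z6" always ends the packed string with a "\0" byte
my $packed = pack("Z6", "hello");
print length($packed), "\n";        # total field width, including the NUL
print unpack("Z6", $packed), "\n";  # round-trips back to the string
```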
Fixed the password routines which on some shadow password platforms (e.g. HP-UX) caused getpwent() to return every other entry.
The PERL5OPT environment variable (for passing command line arguments to Perl) didn't work for more than a single group of options. [561]
PERL5OPT with embedded spaces didn't work.
printf() no longer resets the numeric locale to "C".
qw(a\\b) now parses correctly as 'a\\b': that is, as three characters, not four. [561]
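A small sketch (not from the original delta) demonstrating the three-character result:

```perl
use strict;
use warnings;

# qw follows single-quote escaping rules: \\ is one literal backslash
my @w = qw(a\\b);
print scalar(@w), " ", length($w[0]), "\n";   # one word, three characters
```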
pos() did not return the correct value within s///ge in earlier versions. This is now handled correctly. [561]
Printing quads (64-bit integers) with printf/sprintf now works without the q, L, or ll prefixes (assuming you are on a quad-capable platform).
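A short sketch (not part of the original delta); it assumes a quad-capable (64-bit IV) build of perl:

```perl
use strict;
use warnings;

# Plain %d now handles values beyond 32 bits; no %lld/%qd needed
my $big = 2**40;
printf "%d\n", $big;
```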
Regular expressions on references and overloaded scalars now work. [561+]
Right-hand side magic (GMAGIC) could in many cases such as string concatenation be invoked too many times.
scalar() now forces scalar context even when used in void context.
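A minimal sketch (not from the original delta) of scalar() overriding the surrounding list context:

```perl
use strict;
use warnings;

my @a   = (10, 20, 30);
# Without scalar(), the list assignment would copy the elements;
# scalar() forces scalar context, yielding the element count instead
my @out = (scalar(@a));
print "@out\n";
```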
SOCKS support is now much more robust.
sort() arguments are now compiled in the right wantarray context (they were accidentally using the context of the sort() itself). The comparison block is now run in scalar context, and the arguments to be sorted are always provided in list context. [561]
Changed the POSIX character class [[:space:]] to include the (very rarely used) vertical tab character. Added a new POSIX-ish character class [[:blank:]] which stands for horizontal whitespace (currently, the space and the tab).
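A brief sketch (not part of the original delta) contrasting the two classes:

```perl
use strict;
use warnings;

my $vt = "\x0b";   # vertical tab
print "\t" =~ /[[:blank:]]/ ? "tab is blank\n"     : "no\n";
print $vt  =~ /[[:space:]]/ ? "VT is space\n"      : "no\n";
print $vt  =~ /[[:blank:]]/ ? "yes\n"              : "VT is not blank\n";
```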
The tainting behaviour of sprintf() has been rationalized. It does not taint the result of floating point formats anymore, making the behaviour consistent with that of string interpolation. [561]
Some cases of inconsistent taint propagation (such as within hash values) have been fixed.
The RE engine found in Perl 5.6.0 accidentally pessimised certain kinds of simple pattern matches. These are now handled better. [561]
Regular expression debug output (whether through use re 'debug' or via -Dr) now looks better. [561]
Multi-line matches like "a\nxb\n" =~ /(?!\A)x/m were flawed. The bug has been fixed. [561]
Use of $& could trigger a core dump under some situations. This is now avoided. [561]
The regular expression captured submatches ($1, $2, ...) are now more consistently unset if the match fails, instead of leaving false data lying around in them. [561]
readline() on files opened in "slurp" mode could return an extra "" (blank line) at the end in certain situations. This has been corrected. [561]
Autovivification of symbolic references of special variables described in perlvar (as in ${$num}) was accidentally disabled. This works again now. [561]
Sys::Syslog ignored the LOG_AUTH constant.
$AUTOLOAD, sort(), lock(), and spawning subprocesses in multiple threads simultaneously are now thread-safe.
Tie::Array's SPLICE method was broken.
Allow a read-only string on the left-hand side of a non-modifying tr///.
If STDERR is tied, warnings caused by warn and die now correctly pass to it.
Several Unicode fixes.
BOMs (byte order marks) at the beginning of Perl files (scripts, modules) should now be transparently skipped. UTF-16 and UCS-2 encoded Perl files should now be read correctly.
The character tables have been updated to Unicode 3.2.0.
Comparing with utf8 data does not magically upgrade non-utf8 data into utf8. (This was a problem for example if you were mixing data from I/O and Unicode data: your output might have got magically encoded as UTF-8.)
Generating illegal Unicode code points such as U+FFFE, or the UTF-16 surrogates, now also generates an optional warning.
IsAlnum, IsAlpha, and IsWord now match titlecase.
Concatenation with the . operator or via variable interpolation, eq, substr, reverse, quotemeta, the x operator, substitution with s///, and single-quoted UTF-8 should now work.
The tr/// operator now works. Note that the tr///CU functionality has been removed (but see pack('U0', ...)).
eval "v200" now works.
Perl 5.6.0 parsed m/\x{ab}/ incorrectly, leading to spurious warnings. This has been corrected. [561]
Zero entries were missing from the Unicode classes such as IsDigit.
Large unsigned numbers (those above 2**31) could sometimes lose their unsignedness, causing bogus results in arithmetic operations. [561]
The Perl parser has been stress tested using both random input and Markov chain input and the few found crashes and lockups have been fixed.
BSDI 4.*
Perl now works on post-4.0 BSD/OSes.
All BSDs
Setting $0
now works (as much as possible; see perlvar for details).
Cygwin
Numerous updates; currently synchronised with Cygwin 1.3.10.
DYNIX/ptx
DYNIX/ptx previously had problems in its Configure probe for non-blocking I/O.
EPOC
EPOC is now better supported. See README.epoc. [561]
FreeBSD 3.*
Perl now works on post-3.0 FreeBSDs.
HP-UX
README.hpux updated; Configure -Duse64bitall now works; now uses HP-UX malloc instead of Perl malloc.
IRIX
Numerous compilation flag and hint enhancements; accidental mixing of 32-bit and 64-bit libraries (a doomed attempt) made much harder.
Linux
Long doubles should now work (see INSTALL). [561]
Linux previously had problems related to sockaddrlen when using accept(), recvfrom() (in Perl: recv()), getpeername(), and getsockname().
Mac OS Classic
Compilation of the standard Perl distribution in Mac OS Classic should now work if you have the Metrowerks development environment and the missing Mac-specific toolkit bits. Contact the macperl mailing list for details.
MPE/iX
MPE/iX update after Perl 5.6.0. See README.mpeix. [561]
NetBSD/threads: try installing the GNU pth (should be in the packages collection, or http://www.gnu.org/software/pth/), and Configure with -Duseithreads.
NetBSD/sparc
Perl now works on NetBSD/sparc.
OS/2
Now works with usethreads (see INSTALL). [561]
Solaris
64-bitness using the Sun Workshop compiler now works.
Stratus VOS
The native build method requires at least VOS Release 14.5.0 and GNU C++/GNU Tools 2.0.1 or later. The Perl pack function now maps overflowed values to +infinity and underflowed values to -infinity.
Tru64 (aka Digital UNIX, aka DEC OSF/1)
The operating system version letter is now recorded in $Config{osvers}. Compiling with gcc is now allowed (it was previously explicitly forbidden), but is still not recommended because gcc produces buggy code, even with gcc 2.95.2.
Unicos
Fixed various alignment problems that led to core dumps either during the build or later; no longer dies on math errors at runtime; now uses full quad integers (64 bits), where previously only 46-bit integers were used, for speed.
VMS
See Socket Extension Dynamic in VMS and IEEE-format Floating Point Default on OpenVMS Alpha for important changes not otherwise listed here.
chdir() now works better despite a CRT bug; now works with MULTIPLICITY (see INSTALL); now works with Perl's malloc.
The tainting of %ENV elements via keys or values was previously unimplemented. It now works as documented.
The waitpid emulation has been improved. The worst bug (now fixed) was that a pid of -1 would cause a wildcard search of all processes on the system.
POSIX-style signals are now emulated much better on VMS versions prior to 7.0.
The system function and backticks operator have improved functionality and better error handling. [561]
File access tests now use current process privileges rather than the user's default privileges, which could sometimes result in a mismatch between reported access and actual access. This improvement is only available on VMS v6.0 and later.
There is a new kill implementation based on sys$sigprc that allows older VMS systems (pre-7.0) to use kill to send signals rather than simply force exit. This implementation also allows later systems to call kill from within a signal handler.
Iterative logical name translations are now limited to 10 iterations in imitation of SHOW LOGICAL and other OpenVMS facilities.
Windows
Signal handling now works better than it used to. It is now implemented using a Windows message loop, and is therefore less prone to random crashes.
fork() emulation is now more robust, but still continues to have a few esoteric bugs and caveats. See perlfork for details. [561+]
A failed (pseudo)fork now returns undef and sets errno to EAGAIN. [561]
The following modules now work on Windows:
- ExtUtils::Embed [561]
- IO::Pipe
- IO::Poll
- Net::Ping
IO::File::new_tmpfile() is no longer limited to 32767 invocations per-process.
Better chdir() return value for a non-existent directory.
Compiling perl using the 64-bit Platform SDK tools is now supported.
The Win32::SetChildShowWindow() builtin can be used to control the visibility of windows created by child processes. See Win32 for details.
Non-blocking waits for child processes (or pseudo-processes) are supported via waitpid($pid, &POSIX::WNOHANG).
The behavior of system() with multiple arguments has been rationalized. Each unquoted argument will be automatically quoted to protect whitespace, and any existing whitespace in the arguments will be preserved. This improves the portability of system(@args) by avoiding the need for Windows cmd shell specific quoting in perl programs.
Note that this means that some scripts that may have relied on earlier buggy behavior may no longer work correctly. For example, system("nmake /nologo", @args) will now attempt to run the file nmake /nologo and will fail when such a file isn't found. On the other hand, perl will now execute code such as system("c:/Program Files/MyApp/foo.exe", @args) correctly.
The perl header files no longer suppress common warnings from the Microsoft Visual C++ compiler. This means that additional warnings may now show up when compiling XS code.
Borland C++ v5.5 is now a supported compiler that can build Perl. However, the generated binaries continue to be incompatible with those generated by the other supported compilers (GCC and Visual C++). [561]
Duping socket handles with open(F, ">&MYSOCK") now works under Windows 9x. [561]
Current directory entries in %ENV are now correctly propagated to child processes. [561]
New %ENV entries now propagate to subprocesses. [561]
Win32::GetCwd() correctly returns C:\ instead of C: when at the drive root. Other bugs in chdir() and Cwd::cwd() have also been fixed. [561]
The makefiles now default to the features enabled in ActiveState ActivePerl (a popular Win32 binary distribution). [561]
HTML files will now be installed in c:\perl\html instead of c:\perl\lib\pod\html.
REG_EXPAND_SZ keys are now allowed in registry settings used by perl. [561]
Can now send() from all threads, not just the first one. [561]
ExtUtils::MakeMaker now uses $ENV{LIB} to search for libraries. [561]
Less stack reserved per thread so that more threads can run concurrently. (Still 16M per thread.) [561]
File::Spec->tmpdir() now prefers C:/temp over /tmp (works better when perl is running as a service).
Better UNC path handling under ithreads. [561]
wait(), waitpid(), and backticks now return the correct exit status under Windows 9x. [561]
A socket handle leak in accept() has been fixed. [561]
Please see perldiag for more details.
Ambiguous range in the transliteration operator (like a-z-9) now gives a warning.
chdir("") and chdir(undef) now give a deprecation warning because they cause a possible unintentional chdir to the home directory. Say chdir() if you really mean that.
Two new debugging options have been added: if you have compiled your Perl with debugging, you can use the -DT [561] and -DR options to trace tokenising and to add reference counts to displaying variables, respectively.
The lexical warnings category "deprecated" is no longer a sub-category of the "syntax" category. It is now a top-level category in its own right.
Unadorned dump() will now give a warning suggesting to use explicit CORE::dump() if that's what really is meant.
The "Unrecognized escape" warning has been extended to include \8, \9, and \_. There is no need to escape any of the \w characters.
All regular expression compilation error messages are now hopefully easier to understand both because the error message now comes before the failed regex and because the point of failure is now clearly marked by a <-- HERE marker.
Various I/O (and socket) functions like binmode(), close(), and so forth now more consistently warn if they are used illogically either on a yet unopened or on an already closed filehandle (or socket).
Using lstat() on a filehandle now gives a warning. (It's a non-sensical thing to do.)
The -M and -m options now warn if you didn't supply the module name.
If you specify a required minimum version in use, modules matching the name but not defining a $VERSION will cause a fatal failure.
Using negative offset for vec() in lvalue context is now a warnable offense.
Odd number of arguments to overload::constant now elicits a warning.
Odd number of elements in anonymous hash now elicits a warning.
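A minimal sketch (not from the original delta) capturing the new warning with a $SIG{__WARN__} handler:

```perl
use strict;
use warnings;

my $warned = 0;
local $SIG{__WARN__} = sub { $warned++ };

# Three elements cannot form key/value pairs, so this now warns
my $h = { 1, 2, 3 };
print $warned ? "warned\n" : "quiet\n";
```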
The various "opened only for", "on closed", "never opened" warnings drop the main:: prefix for filehandles in the main package, for example STDIN instead of main::STDIN.
Subroutine prototypes are now checked more carefully; you may get warnings, for example, if you have used non-prototype characters.
If an attempt to use a (non-blessed) reference as an array index is made, a warning is given.
push @a; and unshift @a; (with no values to push or unshift) now give a warning. This may be a problem for generated and eval'ed code.
If you try to pack a number less than 0 or larger than 255 using the "C" format you will get an optional warning. Similarly for the "c" format and a number less than -128 or more than 127.
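A short sketch (not part of the original delta) showing both the warning and the wrap-around value:

```perl
use strict;
use warnings;

my $warned = 0;
local $SIG{__WARN__} = sub { $warned++ };

# 300 is outside the 0..255 range of "C", so an optional warning fires
my $byte = pack("C", 300);
print $warned ? "warned\n" : "quiet\n";
print unpack("C", $byte), "\n";   # the value is wrapped modulo 256
```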
pack P format now demands an explicit size.
unpack w now warns of unterminated compressed integers.
Warnings relating to the use of PerlIO have been added.
Certain regex modifiers such as (?o) make sense only if applied to the entire regex. You will get an optional warning if you try to do otherwise.
Variable length lookbehind has not yet been implemented; attempting to use it now produces a message saying so.
Using arrays or hashes as references (e.g. %foo->{bar}) has been deprecated for a while. Now you will get an optional warning.
Warnings relating to the use of the new restricted hashes feature have been added.
Self-ties of arrays and hashes are not supported; attempting one is a fatal error.
Using sort in scalar context now issues an optional warning.
This didn't do anything useful, as the sort was not performed.
Using the /g modifier in split() is meaningless and will cause a warning.
Using splice() past the end of an array now causes a warning.
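A minimal sketch (not from the original delta) that triggers the new warning:

```perl
use strict;
use warnings;

my $warned = 0;
local $SIG{__WARN__} = sub { $warned++ };

my @a = (1 .. 3);
splice(@a, 10);    # offset is past the end of the three-element array
print $warned ? "warned\n" : "quiet\n";
```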
Malformed Unicode encodings (UTF-8 and UTF-16) cause a lot of warnings, as does trying to use UTF-16 surrogates (which are unimplemented).
Trying to use Unicode characters on an I/O stream without marking the stream's encoding (using open() or binmode()) will cause "Wide character" warnings.
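A brief sketch (not part of the original delta) provoking the "Wide character" warning with an in-memory filehandle that has no encoding layer:

```perl
use strict;
use warnings;

my $warning = '';
local $SIG{__WARN__} = sub { $warning = shift };

open my $fh, '>', \my $buf or die "open: $!";
# U+263A is above 0xFF; with no :utf8/:encoding layer this warns
print $fh "\x{263A}";
close $fh;
print $warning =~ /Wide character/ ? "warned\n" : "quiet\n";
```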
Use of v-strings in use/require causes a (backward) portability warning.
Warnings relating to the use of interpreter threads and their shared data have been added.
PerlIO is now the default.
perlapi.pod (a companion to perlguts) now attempts to document the internal API.
You can now build a really minimal perl called microperl. Building microperl does not require even running Configure; make -f Makefile.micro should be enough. Beware: microperl makes many assumptions, some of which may be too bold; the resulting executable may crash or otherwise misbehave in wondrous ways. For careful hackers only.
Added rsignal(), whichsig(), do_join(), op_clear, op_null, ptr_table_clear(), ptr_table_free(), sv_setref_uv(), and several UTF-8 interfaces to the publicised API. For the full list of the available APIs see perlapi.
It is now possible to propagate customised exceptions via croak().
Now xsubs can have attributes just like subs. (Well, at least the built-in attributes.)
dTHR and djSP have been obsoleted; the former removed (because it's a no-op) and the latter replaced with dSP.
PERL_OBJECT has been completely removed.
The MAGIC constants (e.g. 'P') have been macrofied (e.g. PERL_MAGIC_TIED) for better source code readability and maintainability.
The regex compiler now maintains a structure that identifies nodes in the compiled bytecode with the corresponding syntactic features of the original regex expression. The information is attached to the new offsets member of the struct regexp. See perldebguts for more complete information.
The C code has been made much more gcc -Wall clean. Some warning messages still remain on some platforms, so if you are compiling with gcc you may see some warnings about dubious practices. The warnings are being worked on.
perly.c, sv.c, and sv.h have now been extensively commented.
Documentation on how to use the Perl source repository has been added to Porting/repository.pod.
There are now several profiling make targets.
(This change was already made in 5.7.0 but bears repeating here; 5.7.0 came out before 5.6.1 because the development branch 5.7 was released earlier than the maintenance branch 5.6.)
A potential security vulnerability in the optional suidperl component of Perl was identified in August 2000. suidperl is neither built nor installed by default. As of November 2001 the only known vulnerable platform is Linux, most likely all Linux distributions. CERT and various vendors and distributors have been alerted about the vulnerability. See http://www.cpan.org/src/5.0/sperl-2000-08-05/sperl-2000-08-05.txt for more information.
The problem was caused by Perl trying to report a suspected security exploit attempt using an external program, /bin/mail. On Linux platforms the /bin/mail program had an undocumented feature which when combined with suidperl gave access to a root shell, resulting in a serious compromise instead of reporting the exploit attempt. If you don't have /bin/mail, or if you have 'safe setuid scripts', or if suidperl is not installed, you are safe.
The exploit attempt reporting feature has been completely removed from Perl 5.8.0 (and the maintenance release 5.6.1, and it was removed also from all the Perl 5.7 releases), so that particular vulnerability isn't there anymore. However, further security vulnerabilities are, unfortunately, always possible. The suidperl functionality is most probably going to be removed in Perl 5.10. In any case, suidperl should only be used by security experts who know exactly what they are doing and why they are using suidperl instead of some other solution such as sudo ( see http://www.courtesan.com/sudo/ ).
Several new tests have been added, especially for the lib and ext subsections. There are now about 69,000 individual tests (spread over about 700 test scripts) in the regression suite; 5.6.1 had about 11,700 tests in 258 test scripts. The exact numbers depend on the platform and Perl configuration used. Many of the new tests are of course introduced by the new modules, but in general Perl is now more thoroughly tested.
Because of the large number of tests, running the regression suite will take considerably longer than it used to: expect the suite to take up to 4-5 times longer to run than in perl 5.6. On a really fast machine you can hope to finish the suite in about 6-8 minutes (wallclock time).
The tests are now reported in a different order than in earlier Perls. (This happens because the test scripts from under t/lib have been moved to be closer to the library/extension they are testing.)
The compiler suite is slowly getting better but it continues to be highly experimental. Use in production environments is discouraged.
local %tied_array; doesn't work as one would expect: the old value is restored incorrectly. This will be changed in a future release, but we don't know yet what the new semantics will exactly be. In any case, the change will break existing code that relies on the current (ill-defined) semantics, so just avoid doing this in general.
Some extensions like mod_perl are known to have issues with `largefiles', a change brought by Perl 5.6.0 in which file offsets default to 64 bits wide, where supported. Modules may fail to compile at all, or they may compile and work incorrectly. Currently, there is no good solution for the problem, but Configure now provides appropriate non-largefile ccflags, ldflags, libswanted, and libs in the %Config hash (e.g., $Config{ccflags_nolargefiles}) so the extensions that are having problems can try configuring themselves without the largefileness. This is admittedly not a clean solution, and the solution may not even work at all. One potential failure is whether one can (or, if one can, whether it's a good idea to) link together at all binaries with different ideas about file offsets; all this is platform-dependent.
for (1..5) { $_++ } works without complaint. It shouldn't. (You should be able to modify only lvalue elements inside the loops.) You can see the correct behaviour by replacing the 1..5 with 1, 2, 3, 4, 5.
Use mod_perl 1.27 or higher.
Don't panic. Read the 'make test' section of INSTALL instead.
Use libwww-perl 5.65 or later.
Use PDL 2.3.4 or later.
You may get errors like 'Undefined symbol "Perl_get_sv"' or "can't resolve symbol 'Perl_get_sv'", or the symbol may be "Perl_sv_2pv". This probably means that you are trying to use an older shared Perl library (or extensions linked with such) with a Perl 5.8.0 executable. Perl used to have such a subroutine, but that is no longer the case. Check your shared library path, and any shared Perl libraries in those directories.
Sometimes this problem may also indicate a partial Perl 5.8.0 installation, see Mac OS X dyld undefined symbols for an example and how to deal with it.
Self-tying of arrays and hashes is broken in rather deep and hard-to-fix ways. As a stop-gap measure to keep people from getting frustrated at the mysterious results (core dumps, most often), it is forbidden for now (you will get a fatal error even from an attempt).
A change to self-tying of globs has caused them to be recursively referenced (see: Two-Phased Garbage Collection in perlobj). You will now need an explicit untie to destroy a self-tied glob. This behaviour may be fixed at a later date.
Self-tying of scalars and IO thingies works.
If this test fails, it indicates that your libc (C library) is not threadsafe. This particular test stress tests the localtime() call to find out whether it is threadsafe. See perlthrtut for more information.
Note that support for 5.005-style threading is deprecated, experimental and practically unsupported. In 5.10, it is expected to be removed. You should migrate your code to ithreads.
The following tests are known to fail due to fundamental problems in the 5.005 threading implementation. These are not new failures--Perl 5.005_0x has the same bugs, but didn't have these tests.
- ../ext/B/t/xref.t 255 65280 14 12 85.71% 3-14
- ../ext/List/Util/t/first.t 255 65280 7 4 57.14% 2 5-7
- ../lib/English.t 2 512 54 2 3.70% 2-3
- ../lib/FileCache.t 5 1 20.00% 5
- ../lib/Filter/Simple/t/data.t 6 3 50.00% 1-3
- ../lib/Filter/Simple/t/filter_only. 9 3 33.33% 1-2 5
- ../lib/Math/BigInt/t/bare_mbf.t 1627 4 0.25% 8 11 1626-1627
- ../lib/Math/BigInt/t/bigfltpm.t 1629 4 0.25% 10 13 1628-
- 1629
- ../lib/Math/BigInt/t/sub_mbf.t 1633 4 0.24% 8 11 1632-1633
- ../lib/Math/BigInt/t/with_sub.t 1628 4 0.25% 9 12 1627-1628
- ../lib/Tie/File/t/31_autodefer.t 255 65280 65 32 49.23% 34-65
- ../lib/autouse.t 10 1 10.00% 4
- op/flip.t 15 1 6.67% 15
These failures are unlikely to get fixed as 5.005-style threads are considered fundamentally broken. (Basically what happens is that competing threads can corrupt shared global state, one good example being the regular expression engine's state.)
The following tests may fail intermittently because of timing problems, for example if the system is heavily loaded.
- t/op/alarm.t
- ext/Time/HiRes/HiRes.t
- lib/Benchmark.t
- lib/Memoize/t/expmod_t.t
- lib/Memoize/t/speed.t
In case of failure please try running them manually, for example
- ./perl -Ilib ext/Time/HiRes/HiRes.t
For normal arrays $foo = \$bar[1] will assign undef to $bar[1] (assuming that it didn't exist before), but for tied/magical arrays and hashes such autovivification does not happen because there is currently no way to catch the reference creation. The same problem affects slicing over non-existent indices/keys of a tied/magical array/hash.
One can have Unicode in identifier names, but not in package/class or subroutine names. While some limited functionality towards this does exist as of Perl 5.8.0, that is more accidental than designed; use of Unicode for the said purposes is unsupported.
One reason for this incompleteness is its (currently) inherent unportability: since both package names and subroutine names may need to be mapped to file and directory names, the Unicode capability of the filesystem becomes important, and there unfortunately aren't portable answers.
If using the AIX native make command, instead of just "make" issue "make all". In some setups the former has been known to spuriously also try to run "make install". Alternatively, you may want to use GNU make.
In AIX 4.2, Perl extensions that use C++ functions that use statics may have problems in that the statics are not getting initialized. In newer AIX releases, this has been solved by linking Perl with the libC_r library, but unfortunately in AIX 4.2 the said library has an obscure bug where the various functions related to time (such as time() and gettimeofday()) return broken values, and therefore in AIX 4.2 Perl is not linked against libC_r.
vac 5.0.0.0 May Produce Buggy Code For Perl
The AIX C compiler vac version 5.0.0.0 may produce buggy code, resulting in a few random tests failing when run as part of "make test", but when the failing tests are run by hand, they succeed. We suggest upgrading to at least vac version 5.0.1.0, which has been known to compile Perl correctly. "lslpp -L|grep vac.C" will tell you the vac version. See README.aix.
If building threaded Perl, you may get a compilation warning from pp_sys.c:
- "pp_sys.c", line 4651.39: 1506-280 (W) Function argument assignment between types "unsigned char*" and "const void*" is not allowed.
This is harmless; it is caused by getnetbyaddr() and getnetbyaddr_r() having slightly different types for their first argument.
If you see op/pack, op/pat, op/regexp, or ext/Storable tests failing in a Linux/alpha or *BSD/Alpha, it's probably time to upgrade your gcc. gccs prior to 2.95.3 are definitely not good enough, and gcc 3.1 may be even better. (RedHat Linux/alpha with gcc 3.1 reported no problems, as did Linux 2.4.18 with gcc 2.95.4.) (In Tru64, it is preferable to use the bundled C compiler.)
Perl 5.8.0 doesn't build in AmigaOS. It broke at some point during the ithreads work and we could not find Amiga experts to unbreak the problems. Perl 5.6.1 still works for AmigaOS (as does the 5.7.2 development release).
The following tests fail on 5.8.0 Perl in BeOS Personal 5.03:
- t/op/lfs............................FAILED at test 17
- t/op/magic..........................FAILED at test 24
- ext/Fcntl/t/syslfs..................FAILED at test 17
- ext/File/Glob/t/basic...............FAILED at test 3
- ext/POSIX/t/sigaction...............FAILED at test 13
- ext/POSIX/t/waitpid.................FAILED at test 1
(Note: more information was available in README.beos until support for BeOS was removed in Perl v5.18.0)
For example, when building the Tk extension for Cygwin, you may get an error message saying "unable to remap". This is a known problem with Cygwin, and a workaround is detailed here: http://sources.redhat.com/ml/cygwin/2001-12/msg00894.html
One can build but not install (or test the build of) NDBM_File on FAT filesystems. Installation (or build) on NTFS works fine. If one attempts the test on a FAT install (or build), the following failures are expected:
- ../ext/NDBM_File/ndbm.t 13 3328 71 59 83.10% 1-2 4 16-71
- ../ext/ODBM_File/odbm.t 255 65280 ?? ?? % ??
- ../lib/AnyDBM_File.t 2 512 12 2 16.67% 1 4
- ../lib/Memoize/t/errors.t 0 139 11 5 45.45% 7-11
- ../lib/Memoize/t/tie_ndbm.t 13 3328 4 4 100.00% 1-4
- run/fresh_perl.t 97 1 1.03% 91
NDBM_File fails and ODBM_File just coredumps.
If you intend to run only on FAT (or if using AnyDBM_File on FAT), run Configure with the -Ui_ndbm and -Ui_dbm options to prevent NDBM_File and ODBM_File being built.
- t/op/stat............................FAILED at test 29
- lib/File/Find/t/find.................FAILED at test 1
- lib/File/Find/t/taint................FAILED at test 1
- lib/h2xs.............................FAILED at test 15
- lib/Pod/t/eol........................FAILED at test 1
- lib/Test/Harness/t/strap-analyze.....FAILED at test 8
- lib/Test/Harness/t/test-harness......FAILED at test 23
- lib/Test/Simple/t/exit...............FAILED at test 1
The above failures are known as of 5.8.0 with native builds with long filenames, but there are a few more if running under dosemu because of limitations (and maybe bugs) of dosemu:
- t/comp/cpp...........................FAILED at test 3
- t/op/inccode.........................(crash)
and a few lib/ExtUtils tests, and several hundred Encode/t/Aliases.t failures that work fine with long filenames. So you really might prefer native builds and long filenames.
This is a known bug in FreeBSD 4.5's readdir_r(), it has been fixed in FreeBSD 4.6 (see perlfreebsd (README.freebsd)).
The ISO 8859-15 locales may fail the locale test 117 in FreeBSD. This is caused by the characters \xFF (y with diaeresis) and \xBE (Y with diaeresis) not behaving correctly when being matched case-insensitively. Apparently this problem has been fixed in the latest FreeBSD releases. ( http://www.freebsd.org/cgi/query-pr.cgi?pr=34308 )
IRIX with MIPSpro 7.3.1.2m or 7.3.1.3m compiler may fail the List::Util test ext/List/Util/t/shuffle.t by dumping core. This seems to be a compiler error since if compiled with gcc no core dump ensues, and no failures have been seen on the said test on any other platform.
Similarly, building the Digest::MD5 extension has been known to fail with "*** Termination code 139 (bu21)".
The cure is to drop optimization level (Configure -Doptimize=-O2).
If perl is configured with -Duse64bitall, the successful result of the subtest 10 of lib/posix may arrive before the successful result of the subtest 9, which confuses the test harness so much that it thinks the subtest 9 failed.
This is a known bug in the glibc 2.2.5 with long long integers. ( http://bugzilla.redhat.com/bugzilla/show_bug.cgi?id=65612 )
No known fix.
Please remember to set your environment variable LC_ALL to "C" (setenv LC_ALL C) before running "make test" to avoid a lot of warnings about the broken locales of Mac OS X.
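For a Bourne-style shell, the advice above can be sketched like this (the csh-style "setenv LC_ALL C" form is the one shown in the text):

```shell
# Force the portable C locale before running the test suite, to avoid
# warnings about the broken locales of Mac OS X (Bourne-style shell
# shown; csh users would use "setenv LC_ALL C" as in the text).
LC_ALL=C
export LC_ALL

# Sanity check that the variable is set as intended.
[ "$LC_ALL" = "C" ] && echo "locale forced to C"
```

After this, "make test" can be run as usual in the same shell session.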
The following tests are known to fail in Mac OS X 10.1.5 because of buggy (old) implementations of Berkeley DB included in Mac OS X:
- Failed Test Stat Wstat Total Fail Failed List of Failed
- -------------------------------------------------------------------------
- ../ext/DB_File/t/db-btree.t 0 11 ?? ?? % ??
- ../ext/DB_File/t/db-recno.t 149 3 2.01% 61 63 65
If you are building on a UFS partition, you will also probably see t/op/stat.t subtest #9 fail. This is caused by Darwin's UFS not supporting inode change time.
Also the ext/POSIX/t/posix.t subtest #10 fails but it is skipped for now because the failure is Apple's fault, not Perl's (blocked signals are lost).
If you Configure with ithreads, ext/threads/t/libc.t will fail. Again, this is not Perl's fault: the libc of Mac OS X is not threadsafe (in this particular test, the localtime() call is found to be thread-unsafe).
If after installing Perl 5.8.0 you are getting warnings about missing symbols, for example
- dyld: perl Undefined symbols
- _perl_sv_2pv
- _perl_get_sv
you probably have an old pre-Perl-5.8.0 installation (or parts of one) in /Library/Perl (the undefined symbols used to exist in pre-5.8.0 Perls). It seems that for some reason "make install" doesn't always completely overwrite the files in /Library/Perl. You can move the old Perl shared library out of the way like this:
- cd /Library/Perl/darwin/CORE
- mv libperl.dylib libperlold.dylib
and then reissue "make install". Note that the above of course is extremely disruptive for anything using the /usr/local/bin/perl. If that doesn't help, you may have to try removing all the .bundle files from beneath /Library/Perl, and again "make install"-ing.
The following tests are known to fail on OS/2 (for clarity only the failures are shown, not the full error messages):
- ../lib/ExtUtils/t/Mkbootstrap.t 1 256 18 1 5.56% 8
- ../lib/ExtUtils/t/Packlist.t 1 256 34 1 2.94% 17
- ../lib/ExtUtils/t/basic.t 1 256 17 1 5.88% 14
- lib/os2_process.t 2 512 227 2 0.88% 174 209
- lib/os2_process_kid.t 227 2 0.88% 174 209
- lib/rx_cmprt.t 255 65280 18 3 16.67% 16-18
The op/sprintf tests 91, 129, and 130 are known to fail on some platforms. Examples include any platform using sfio, and Compaq/Tandem's NonStop-UX.
Test 91 is known to fail on QNX6 (nto), because sprintf '%e', 0 incorrectly produces 0.000000e+0 instead of 0.000000e+00.
For tests 129 and 130, the failing platforms do not comply with the ANSI C Standard: lines 19ff on page 134 of ANSI X3.159 1989, to be exact. (They produce something other than "1" and "-1" when formatting 0.6 and -0.6 using the printf format "%.0f"; most often, they produce "0" and "-0".)
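On a conforming platform, the behaviour in question can be checked from the command line with printf(1), which follows the same C-library formatting rules (this assumes a printf that supports floating-point conversions, as GNU coreutils and most shells' builtins do):

```shell
# A conforming C library rounds 0.6 away from zero with "%.0f",
# giving "1" and "-1"; the failing platforms print "0" and "-0".
printf '%.0f\n' 0.6
printf '%.0f\n' -0.6
```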
The socketpair tests are known to be unhappy in SCO 3.2v5.0.4:
- ext/Socket/socketpair.t...............FAILED tests 15-45
In case you are still using Solaris 2.5 (aka SunOS 5.5), you may experience failures (the test core dumping) in lib/locale.t. The suggested cure is to upgrade your Solaris.
The following tests are known to fail in Solaris x86 with Perl configured to use 64 bit integers:
- ext/Data/Dumper/t/dumper.............FAILED at test 268
- ext/Devel/Peek/Peek..................FAILED at test 7
The following tests are known to fail on SUPER-UX:
- op/64bitint...........................FAILED tests 29-30, 32-33, 35-36
- op/arith..............................FAILED tests 128-130
- op/pack...............................FAILED tests 25-5625
- op/pow................................
- op/taint..............................# msgsnd failed
- ../ext/IO/lib/IO/t/io_poll............FAILED tests 3-4
- ../ext/IPC/SysV/ipcsysv...............FAILED tests 2, 5-6
- ../ext/IPC/SysV/t/msg.................FAILED tests 2, 4-6
- ../ext/Socket/socketpair..............FAILED tests 12
- ../lib/IPC/SysV.......................FAILED tests 2, 5-6
- ../lib/warnings.......................FAILED tests 115-116, 118-119
The op/pack failure ("Cannot compress negative numbers at op/pack.t line 126") is serious but as of yet unsolved. It points at some problems with the signedness handling of the C compiler, as do the 64bitint, arith, and pow failures. Most of the rest point at problems with SysV IPC.
Use Term::ReadKey 2.20 or later.
During Configure, the test
- Guessing which symbols your C compiler and preprocessor define...
will probably fail with error messages like
- CC-20 cc: ERROR File = try.c, Line = 3
- The identifier "bad" is undefined.
- bad switch yylook 79bad switch yylook 79bad switch yylook 79bad switch yylook 79#ifdef A29K
- ^
- CC-65 cc: ERROR File = try.c, Line = 3
- A semicolon is expected at this point.
This is caused by a bug in the awk utility of UNICOS/mk. You can ignore the error, but it does cause a slight problem: you cannot fully benefit from the h2ph utility (see h2ph), which converts C headers to Perl libraries, mainly so that constants defined using the C preprocessor, cpp, can be accessed from Perl. Because of the above error, parts of the converted headers will be invisible. Luckily, these days the need for h2ph is rare.
If building Perl with interpreter threads (ithreads), the getgrent(), getgrnam(), and getgrgid() functions cannot return the list of the group members due to a bug in the multithreaded support of UNICOS/mk. What this means is that in list context the functions will return only three values, not four.
There are a few known test failures. (Note: the relevant information was available in README.uts until support for UTS was removed in Perl v5.18.0)
When Perl is built using the native build process on VOS Release 14.5.0 and GNU C++/GNU Tools 2.0.1, all attempted tests either pass or result in TODO (ignored) failures.
There should be no reported test failures with a default configuration, though there are a number of tests marked TODO that point to areas needing further debugging and/or porting work.
In multi-CPU boxes, there are some problems with the I/O buffering: some output may appear twice.
Use XML::Parser 2.31 or later.
z/OS has quite a few test failures, but the situation is actually much better than it was in 5.6.0; it's just that so many new modules and tests have been added.
- Failed Test Stat Wstat Total Fail Failed List of Failed
- ---------------------------------------------------------------------------
- ../ext/Data/Dumper/t/dumper.t 357 8 2.24% 311 314 325 327
- 331 333 337 339
- ../ext/IO/lib/IO/t/io_unix.t 5 4 80.00% 2-5
- ../ext/Storable/t/downgrade.t 12 3072 169 12 7.10% 14-15 46-47 78-79
- 110-111 150 161
- ../lib/ExtUtils/t/Constant.t 121 30976 48 48 100.00% 1-48
- ../lib/ExtUtils/t/Embed.t 9 9 100.00% 1-9
- op/pat.t 922 7 0.76% 665 776 785 832-
- 834 845
- op/sprintf.t 224 3 1.34% 98 100 136
- op/tr.t 97 5 5.15% 63 71-74
- uni/fold.t 780 6 0.77% 61 169 196 661
- 710-711
The failures in dumper.t and downgrade.t are problems in the tests; those in io_unix and sprintf are problems in the USS (UDP sockets and printf formats, respectively). The pat, tr, and fold failures are genuine Perl problems caused by EBCDIC (and, in the pat and fold cases, by combining that with Unicode). The Constant and Embed failures are probably problems in the tests (since they test Perl's ability to build extensions, and that seems to be working reasonably well).
Though mostly working, Unicode support still has problem spots on EBCDIC platforms. One such known spot is the \p{} and \P{} regular expression constructs for code points less than 256: \p and \P test for Unicode code points, not knowing about EBCDIC.
Time::Piece (previously known as Time::Object) was removed because it was felt that it didn't have enough value in it to be a core module. It is still a useful module, though, and is available from the CPAN.
Perl 5.8 unfortunately does not build anymore on AmigaOS; this broke accidentally at some point. Since there are not that many Amiga developers available, we could not get this fixed and tested in time for 5.8.0. Perl 5.6.1 still works for AmigaOS (as does the 5.7.2 development release).
The PerlIO::Scalar and PerlIO::Via (capitalised) were renamed as PerlIO::scalar and PerlIO::via (all lowercase) just before 5.8.0. The main rationale was to have all core PerlIO layers have all-lowercase names. The "plugins" are named as usual, for example PerlIO::via::QuotedPrint.
The threads::shared::queue and threads::shared::semaphore were renamed as Thread::Queue and Thread::Semaphore just before 5.8.0. The main rationale was to have thread modules obey the normal naming, Thread:: (threads and threads::shared themselves are more pragma-like; they affect compile time, so they stay lowercase).
If you find what you think is a bug, you might check the articles recently posted to the comp.lang.perl.misc newsgroup and the perl bug database at http://bugs.perl.org/ . There may also be information at http://www.perl.com/ , the Perl Home Page.
If you believe you have an unreported bug, please run the perlbug program included with your release. Be sure to trim your bug down to a tiny but sufficient test case. Your bug report, along with the output of perl -V, will be sent off to perlbug@perl.org to be analysed by the Perl porting team.
The Changes file for exhaustive details on what changed.
The INSTALL file for how to build Perl.
The README file for general stuff.
The Artistic and Copying files for copyright information.
Written by Jarkko Hietaniemi <jhi@iki.fi>.
perlaix - Perl version 5 on IBM AIX (UNIX) systems
This document describes various features of IBM's UNIX operating system AIX that will affect how Perl version 5 (hereafter just Perl) is compiled and/or runs.
For information on compilers on older versions of AIX, see Compiling Perl 5 on older AIX versions up to 4.3.3.
When compiling Perl, you must use an ANSI C compiler. AIX does not ship an ANSI-compliant C compiler by default, but binary builds of gcc for AIX are widely available. A version of gcc is also included in the AIX Toolbox, which is shipped with AIX.
Currently all versions of IBM's "xlc", "xlc_r", "cc", "cc_r" or "vac" ANSI/C compiler will work for building Perl if that compiler works on your system.
If you plan to link Perl to any module that requires thread-support, like DBD::Oracle, it is better to use the _r version of the compiler. This will not build a threaded Perl, but a thread-enabled Perl. See also Threaded Perl later on.
As of writing (2010-09) only the IBM XL C for AIX or IBM XL C/C++ for AIX compiler is supported by IBM on AIX 5L/6.1/7.1.
The following compiler versions are currently supported by IBM:
- IBM XL C and IBM XL C/C++ V8, V9, V10, V11
The XL C for AIX is integrated in the XL C/C++ for AIX compiler and therefore also supported.
If you choose XL C/C++ V9 you need APAR IZ35785 installed, otherwise the integrated SDBM_File does not compile correctly due to an optimization bug. You can circumvent this problem by adding -qipa to the optimization flags (-Doptimize='-O -qipa'). The PTF for APAR IZ35785 which solves this problem is available from IBM (April 2009 PTF for XL C/C++ Enterprise Edition for AIX, V9.0).
If you choose XL C/C++ V11 you need the April 2010 PTF (or newer) installed, otherwise you will not get a working Perl version.
Perl can be compiled with either IBM's ANSI C compiler or with gcc. The former is recommended, as not only can it compile Perl with no difficulty, but it can also take advantage of features listed later that require the use of IBM compiler-specific command-line flags.
If you decide to use gcc, make sure your installation is recent and complete, and be sure to read the Perl INSTALL file for more gcc-specific details. Please report any hoops you had to jump through to the development team.
If an AIX Toolbox version of libgdbm older than 1.8.3-5 is installed on your system then Perl will not work. This library contains the header files /opt/freeware/include/gdbm/dbm.h and ndbm.h, which conflict with the AIX system versions. The gdbm library will be automatically removed from the wanted libraries if the presence of one of these two header files is detected. If you want to build Perl with GDBM support then please install at least gdbm-devel-1.8.3-5 (or higher).
- Perl | AIX Level | Compiler Level | w th | w/o th
- -------+---------------------+-------------------------+------+-------
- 5.12.2 |5.1 TL9 32 bit | XL C/C++ V7 | OK | OK
- 5.12.2 |5.1 TL9 64 bit | XL C/C++ V7 | OK | OK
- 5.12.2 |5.2 TL10 SP8 32 bit | XL C/C++ V8 | OK | OK
- 5.12.2 |5.2 TL10 SP8 32 bit | gcc 3.2.2 | OK | OK
- 5.12.2 |5.2 TL10 SP8 64 bit | XL C/C++ V8 | OK | OK
- 5.12.2 |5.3 TL8 SP8 32 bit | XL C/C++ V9 + IZ35785 | OK | OK
- 5.12.2 |5.3 TL8 SP8 32 bit | gcc 4.2.4 | OK | OK
- 5.12.2 |5.3 TL8 SP8 64 bit | XL C/C++ V9 + IZ35785 | OK | OK
- 5.12.2 |5.3 TL10 SP3 32 bit | XL C/C++ V11 + Apr 2010 | OK | OK
- 5.12.2 |5.3 TL10 SP3 64 bit | XL C/C++ V11 + Apr 2010 | OK | OK
- 5.12.2 |6.1 TL1 SP7 32 bit | XL C/C++ V10 | OK | OK
- 5.12.2 |6.1 TL1 SP7 64 bit | XL C/C++ V10 | OK | OK
- 5.13 |7.1 TL0 SP1 32 bit | XL C/C++ V11 + Jul 2010 | OK | OK
- 5.13 |7.1 TL0 SP1 64 bit | XL C/C++ V11 + Jul 2010 | OK | OK
- w th = with thread support
- w/o th = without thread support
- OK = tested
Successfully tested means that all "make test" runs finish with a result of 100% OK. All tests were conducted with -Duseshrplib set.
All tests were conducted on the oldest supported AIX technology level with the latest support package applied. If the tested AIX version is out of support (AIX 4.3.3, 5.1, 5.2) then the last available support level was used.
Starting from Perl 5.7.2 (and consequently 5.8.x / 5.10.x / 5.12.x) and AIX 4.3 or newer, Perl uses the AIX native dynamic loading interface in the so-called runtime linking mode, instead of the emulated interface that was used in Perl releases 5.6.1 and earlier, or for AIX releases 4.2 and earlier. This change breaks backward compatibility with compiled modules from earlier Perl releases. The change was made to make Perl more compliant with other applications, like Apache/mod_perl, which use the AIX native interface. This change also enables the use of C++ code with static constructors and destructors in Perl extensions, which was not possible using the emulated interface.
It is highly recommended to use the new interface.
Should yield no problems.
Should yield no problems with AIX 5.1 / 5.2 / 5.3 / 6.1 / 7.1.
IBM uses the AIX system Perl (V5.6.0 on AIX 5.1 and V5.8.2 on AIX 5.2 / 5.3 and 6.1; V5.8.8 on AIX 5.3 TL11 and AIX 6.1 TL4; V5.10.1 on AIX 7.1) for some AIX system scripts. If you switch the links in /usr/bin from the AIX system Perl (/usr/opt/perl5) to the newly built Perl, then you get the same features as with the IBM AIX system Perl, provided the threaded options are used.
The threaded Perl build also works on AIX 5.1, but the IBM Perl build (Perl v5.6.0) on AIX 5.1 is not threaded.
Perl 5.12 and newer is not compatible with the IBM fileset perl.libext.
If your AIX system is installed with 64-bit support, you can expect 64-bit configurations to work. If you want to use 64-bit Perl on AIX 6.1 you need an APAR for a libc.a bug which affects (n)dbm_XXX functions. The APAR number for this problem is IZ39077.
If you need more memory (larger data segment) for your Perl programs you can set:
- /etc/security/limits
- default: (or your user)
- data = -1 (default is 262144 * 512 byte)
With the default setting the size is limited to 128 MB; the -1 removes this limit. If "make test" fails, please change your /etc/security/limits as stated above.
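Before editing /etc/security/limits, you can inspect the limit currently in effect; ulimit -d reports the data segment limit in most Bourne-style shells (the AIX file and values are the ones given in the text above):

```shell
# Show the current data segment size limit (in kilobytes on most
# shells, or "unlimited"); on AIX this reflects the "data" setting
# in /etc/security/limits.
ulimit -d
```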
With the following options you get a threaded Perl version which passes all make tests in threaded 32-bit mode, which is the default configuration for the Perl builds that AIX ships with.
- rm config.sh
- ./Configure \
- -d \
- -Dcc=cc_r \
- -Duseshrplib \
- -Dusethreads \
- -Dprefix=/usr/opt/perl5_32
The -Dprefix option will install Perl in a directory parallel to the IBM AIX system Perl installation.
With the following options you get a Perl version which passes all make tests in 32-bit mode.
- rm config.sh
- ./Configure \
- -d \
- -Dcc=cc_r \
- -Duseshrplib \
- -Dprefix=/usr/opt/perl5_32
The -Dprefix option will install Perl in a directory parallel to the IBM AIX system Perl installation.
With the following options you get a threaded Perl version which passes all make tests in 64-bit mode.
- export OBJECT_MODE=64 / setenv OBJECT_MODE 64 (depending on your shell)
- rm config.sh
- ./Configure \
- -d \
- -Dcc=cc_r \
- -Duseshrplib \
- -Dusethreads \
- -Duse64bitall \
- -Dprefix=/usr/opt/perl5_64
With the following options you get a Perl version which passes all make tests in 64-bit mode.
- export OBJECT_MODE=64 / setenv OBJECT_MODE 64 (depending on your shell)
- rm config.sh
- ./Configure \
- -d \
- -Dcc=cc_r \
- -Duseshrplib \
- -Duse64bitall \
- -Dprefix=/usr/opt/perl5_64
The -Dprefix option will install Perl in a directory parallel to the IBM AIX system Perl installation.
If you choose gcc to compile 64-bit Perl then you need to add the following option:
- -Dcc='gcc -maix64'
A regression in AIX 7 causes a failure in "make test" in Time::Piece during daylight saving time. APAR IV16514 provides the fix for this. A quick test to see if it's required, assuming it is currently daylight saving time in Eastern Time, would be to run TZ=EST5 date +%Z. This will normally come back with EST, but with nothing if you have the problem.
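The quick check described above can be run directly; since the POSIX zone specification EST5 carries no DST rule, a correctly working date(1) always prints EST for it:

```shell
# On a fixed system this prints "EST"; with the AIX 7 regression it
# prints nothing.
TZ=EST5 date +%Z
```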
Because AIX 4.3.3 reached end-of-service on December 31, 2003, this information is provided as-is. Perl versions prior to 5.8.9 could be compiled on AIX up to 4.3.3 with the following settings (your mileage may vary):
When compiling Perl, you must use an ANSI C compiler. AIX does not ship an ANSI compliant C-compiler with AIX by default, but binary builds of gcc for AIX are widely available.
At the moment of writing, AIX supports two different native C compilers, for which you have to pay: xlC and vac. If you decide to use either of these two (which is quite a lot easier than using gcc), be sure to upgrade to the latest available patch level. Currently:
- xlC.C 3.1.4.10 or 3.6.6.0 or 4.0.2.2 or 5.0.2.9 or 6.0.0.3
- vac.C 4.4.0.3 or 5.0.2.6 or 6.0.0.1
Note that xlC has the OS version in the name as of version 4.0.2.0, so you will find xlC.C for AIX-5.0 as the package
- xlC.aix50.rte 5.0.2.0 or 6.0.0.3
Subversions are not the same "latest" on all OS versions. For example, the latest xlC-5 on aix41 is 5.0.2.9, while on aix43 it is 5.0.2.7.
Perl can be compiled with either IBM's ANSI C compiler or with gcc. The former is recommended, as not only can it compile Perl with no difficulty, but also can take advantage of features listed later that require the use of IBM compiler-specific command-line flags.
IBM's compiler patch levels 5.0.0.0 and 5.0.1.0 have compiler optimization bugs that affect compiling perl.c and regcomp.c, respectively. If Perl's configuration detects those compiler patch levels, optimization is turned off for the affected source files. Upgrading to at least 5.0.2.0 is recommended.
If you decide to use gcc, make sure your installation is recent and complete, and be sure to read the Perl INSTALL file for more gcc-specific details. Please report any hoops you had to jump through to the development team.
Before installing the patches to the IBM C-compiler you need to know the level of patching for the Operating System. IBM's command 'oslevel' will show the base, but is not always complete (in this example oslevel shows 4.3.NULL, whereas the system might run most of 4.3.THREE):
- # oslevel
- 4.3.0.0
- # lslpp -l | grep 'bos.rte '
- bos.rte 4.3.3.75 COMMITTED Base Operating System Runtime
- bos.rte 4.3.2.0 COMMITTED Base Operating System Runtime
- #
The same might happen on AIX 5.1 or other OS levels. As a side note, Perl cannot be built without bos.adt.syscalls and bos.adt.libm installed:
- # lslpp -l | egrep "syscalls|libm"
- bos.adt.libm 5.1.0.25 COMMITTED Base Application Development
- bos.adt.syscalls 5.1.0.36 COMMITTED System Calls Application
- #
AIX supports dynamically loadable objects as well as shared libraries. Shared libraries by convention end with the suffix .a, which is a bit misleading, as an archive can contain static as well as dynamic members. For Perl dynamically loaded objects we use the .so suffix also used on many other platforms.
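The point about .a archives holding more than one kind of member can be illustrated with ar(1); the library and file names below are hypothetical, and no compiler is involved:

```shell
# An AIX shared library is an ar(1) archive whose members may be
# shared objects. Here a tiny archive is built just to show the
# listing syntax; ar happily archives any file, so compiling is
# skipped in this sketch.
echo 'int dummy;' > dummy.c
ar rc libdemo.a dummy.c
ar t libdemo.a
```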
Note that starting from Perl 5.7.2 (and consequently 5.8.0) and AIX 4.3 or newer, Perl uses the AIX native dynamic loading interface in the so-called runtime linking mode, instead of the emulated interface that was used in Perl releases 5.6.1 and earlier, or for AIX releases 4.2 and earlier. This change breaks backward compatibility with compiled modules from earlier Perl releases. The change was made to make Perl more compliant with other applications, like Apache/mod_perl, which use the AIX native interface. This change also enables the use of C++ code with static constructors and destructors in Perl extensions, which was not possible using the emulated interface.
All defaults for Configure can be used.
If you've chosen to use vac 4, be sure to run 4.4.0.3. Older versions will turn up nasty later on. For vac 5 be sure to run at least 5.0.1.0, but vac 5.0.2.6 or up is highly recommended. Note that since IBM has removed vac 5.0.2.1 through 5.0.2.5 from the software depot, these versions should be considered obsolete.
Here's a brief guide to upgrading the compiler to the latest level. Of course this is subject to change. You can only upgrade versions from ftp-available updates if the first three digit groups are the same (you may skip intermediate levels, unlike the patches in the developer snapshots of Perl), or move up one version where the "base" is available. In other words, the AIX compiler patches are cumulative.
- vac.C.4.4.0.1 => vac.C.4.4.0.3 is OK (vac.C.4.4.0.2 not needed)
- xlC.C.3.1.3.3 => xlC.C.3.1.4.10 is NOT OK (xlC.C.3.1.4.0 is not available)
- # ftp ftp.software.ibm.com
- Connected to service.boulder.ibm.com.
- : welcome message ...
- Name (ftp.software.ibm.com:merijn): anonymous
- 331 Guest login ok, send your complete e-mail address as password.
- Password:
- ... accepted login stuff
- ftp> cd /aix/fixes/v4/
- ftp> dir other other.ll
- output to local-file: other.ll? y
- 200 PORT command successful.
- 150 Opening ASCII mode data connection for /bin/ls.
- 226 Transfer complete.
- ftp> dir xlc xlc.ll
- output to local-file: xlc.ll? y
- 200 PORT command successful.
- 150 Opening ASCII mode data connection for /bin/ls.
- 226 Transfer complete.
- ftp> bye
- ... goodbye messages
- # ls -l *.ll
- -rw-rw-rw- 1 merijn system 1169432 Nov 2 17:29 other.ll
- -rw-rw-rw- 1 merijn system 29170 Nov 2 17:29 xlc.ll
On AIX 4.2 using xlC, we continue:
- # lslpp -l | fgrep 'xlC.C '
- xlC.C 3.1.4.9 COMMITTED C for AIX Compiler
- xlC.C 3.1.4.0 COMMITTED C for AIX Compiler
- # grep 'xlC.C.3.1.4.*.bff' xlc.ll
- -rw-r--r-- 1 45776101 1 6286336 Jul 22 1996 xlC.C.3.1.4.1.bff
- -rw-rw-r-- 1 45776101 1 6173696 Aug 24 1998 xlC.C.3.1.4.10.bff
- -rw-r--r-- 1 45776101 1 6319104 Aug 14 1996 xlC.C.3.1.4.2.bff
- -rw-r--r-- 1 45776101 1 6316032 Oct 21 1996 xlC.C.3.1.4.3.bff
- -rw-r--r-- 1 45776101 1 6315008 Dec 20 1996 xlC.C.3.1.4.4.bff
- -rw-rw-r-- 1 45776101 1 6178816 Mar 28 1997 xlC.C.3.1.4.5.bff
- -rw-rw-r-- 1 45776101 1 6188032 May 22 1997 xlC.C.3.1.4.6.bff
- -rw-rw-r-- 1 45776101 1 6191104 Sep 5 1997 xlC.C.3.1.4.7.bff
- -rw-rw-r-- 1 45776101 1 6185984 Jan 13 1998 xlC.C.3.1.4.8.bff
- -rw-rw-r-- 1 45776101 1 6169600 May 27 1998 xlC.C.3.1.4.9.bff
- # wget ftp://ftp.software.ibm.com/aix/fixes/v4/xlc/xlC.C.3.1.4.10.bff
- #
On AIX 4.3 using vac, we continue:
- # lslpp -l | grep 'vac.C '
- vac.C 5.0.2.2 COMMITTED C for AIX Compiler
- vac.C 5.0.2.0 COMMITTED C for AIX Compiler
- # grep 'vac.C.5.0.2.*.bff' other.ll
- -rw-rw-r-- 1 45776101 1 13592576 Apr 16 2001 vac.C.5.0.2.0.bff
- -rw-rw-r-- 1 45776101 1 14133248 Apr 9 2002 vac.C.5.0.2.3.bff
- -rw-rw-r-- 1 45776101 1 14173184 May 20 2002 vac.C.5.0.2.4.bff
- -rw-rw-r-- 1 45776101 1 14192640 Nov 22 2002 vac.C.5.0.2.6.bff
- # wget ftp://ftp.software.ibm.com/aix/fixes/v4/other/vac.C.5.0.2.6.bff
- #
Likewise on all other OS levels. Then execute the following command, and fill in its choices
- # smit install_update
- -> Install and Update from LATEST Available Software
- * INPUT device / directory for software [ vac.C.5.0.2.6.bff ]
- [ OK ]
- [ OK ]
Follow the messages ... and you're done.
If you prefer a more web-like approach, a good starting point can be http://www14.software.ibm.com/webapp/download/downloadaz.jsp ; click "C for AIX" and follow the instructions.
If linking miniperl
- cc -o miniperl ... miniperlmain.o opmini.o perl.o ... -lm -lc ...
causes errors like this
- ld: 0711-317 ERROR: Undefined symbol: .aintl
- ld: 0711-317 ERROR: Undefined symbol: .copysignl
- ld: 0711-317 ERROR: Undefined symbol: .syscall
- ld: 0711-317 ERROR: Undefined symbol: .eaccess
- ld: 0711-317 ERROR: Undefined symbol: .setresuid
- ld: 0711-317 ERROR: Undefined symbol: .setresgid
- ld: 0711-317 ERROR: Undefined symbol: .setproctitle
- ld: 0711-345 Use the -bloadmap or -bnoquiet option to obtain more information.
you could retry with
- make realclean
- rm config.sh
- ./Configure -Dusenm ...
which makes Configure use the nm tool when scanning for library symbols, which usually is not done in AIX. Related to this, you probably should not use the -r option of Configure in AIX, because it affects how the nm tool is used.
Using gcc-3.x (tested with 3.0.4, 3.1, and 3.2) now works out of the box, as do recent gcc-2.9 builds available directly from IBM as part of their Linux compatibility packages, available here:
- http://www.ibm.com/servers/aix/products/aixos/linux/
Should yield no problems.
Threads seem to work OK, though at the moment not all tests pass when threads are used in combination with 64-bit configurations.
You may get a warning when doing a threaded build:
- "pp_sys.c", line 4640.39: 1506-280 (W) Function argument assignment
- between types "unsigned char*" and "const void*" is not allowed.
The exact line number may vary, but if the warning (W) comes from a line like this
- hent = PerlSock_gethostbyaddr(addr, (Netdb_hlen_t) addrlen, addrtype);
in the "pp_ghostent" function, you may ignore it safely. The warning is caused by the reentrant variant of gethostbyaddr() having a slightly different prototype than its non-reentrant variant, but the difference is not really significant here.
If your AIX is installed with 64-bit support, you can expect 64-bit configurations to work. In combination with threads some tests might still fail.
In AIX 4.2 Perl extensions that use C++ functions that use statics may have problems in that the statics are not getting initialized. In newer AIX releases this has been solved by linking Perl with the libC_r library, but unfortunately in AIX 4.2 the said library has an obscure bug where the various functions related to time (such as time() and gettimeofday()) return broken values, and therefore in AIX 4.2 Perl is not linked against the libC_r.
Rainer Tammer <tammer@tammer.net>
perlamiga - Perl under Amiga OS
Perl 5.8.0 cannot be built in AmigaOS. You can use either the maintenance release Perl 5.6.1 or the development release Perl 5.7.2 in AmigaOS. See PERL 5.8.0 BROKEN IN AMIGAOS if you want to help fixing this problem.
One can read this document in the following formats:
- man perlamiga
- multiview perlamiga.guide
to list some (not all may be available simultaneously), or it may be read as is: either as README.amiga, or pod/perlamiga.pod.
A recent version of perl for the Amiga can be found at the Geek Gadgets section of the Aminet:
- http://www.aminet.net/~aminet/dev/gg
You need the Unix emulation for AmigaOS, whose most important part is ixemul.library. For a minimum setup, get the latest versions of the following packages from the Aminet archives ( http://www.aminet.net/~aminet/ ):
- ixemul-bin
- ixemul-env-bin
- pdksh-bin
Note also that this is a minimum setup; you might want to add other packages of ADE (the Amiga Developers Environment).
You need at the very least AmigaOS version 2.0. Recommended is version 3.1.
Start your Perl program foo with arguments arg1 arg2 arg3 the same way as on any other platform, by
- perl foo arg1 arg2 arg3
If you want to specify perl options -my_opts to the perl itself (as opposed to your program), use
- perl -my_opts foo arg1 arg2 arg3
Alternately, you can try to get a replacement for the system's Execute command that honors the #!/usr/bin/perl syntax in scripts and set the s-Bit of your scripts. Then you can invoke your scripts like under UNIX with
- foo arg1 arg2 arg3
(Note that having the full *nixish path to perl, /usr/bin/perl, is not necessary; perl alone would be enough, but having the full path makes it easier to use your script under *nix.)
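As a sketch of the setup just described (the script name foo and its contents are hypothetical), a script with a #!/usr/bin/perl line is created and marked executable; actually invoking it this way on AmigaOS still requires the replacement Execute command mentioned above:

```shell
# Create a hypothetical script "foo" with a *nixish shebang line and
# set its executable bit (the s-Bit mentioned in the text is the
# AmigaOS equivalent).
cat > foo <<'EOF'
#!/usr/bin/perl
print "hello from foo\n";
EOF
chmod +x foo

# Confirm the script is now executable.
test -x foo && echo "foo is executable"
```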
Perl under AmigaOS lacks some features of perl under UNIX because of deficiencies in the UNIX-emulation, most notably:
fork()
some features of the UNIX filesystem regarding link count and file dates
inplace operation (the -i switch) without backup file
umask() works, but the correct permissions are only set when the file is finally close()d
Change to the installation directory (most probably ADE:), and extract the binary distribution:
- lha -mraxe x perl-$VERSION-bin.lha
or
- tar xvzpf perl-$VERSION-bin.tgz
(Of course you need lha or tar and gunzip for this.)
For installation of the Unix emulation, read the appropriate docs.
If you have man installed on your system, and you installed the perl manpages, use something like this:
- man perlfunc
- man less
- man ExtUtils.MakeMaker
to access documentation for different components of Perl. Start with
- man perl
Note: You have to modify your man.conf file to search for manpages in the /ade/lib/perl5/man/man3 directory, or the man pages for the perl library will not be found.
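As an alternative sketch to editing man.conf, many man(1) implementations also honour the MANPATH environment variable; the directory below is the one named in the note above:

```shell
# Append the Perl library manpage directory so "man" can find the
# library manpages (whether MANPATH or man.conf is consulted depends
# on your man implementation).
MANPATH="${MANPATH}:/ade/lib/perl5/man"
export MANPATH
echo "$MANPATH"
```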
Note that a dot (.) is used as a package separator in the documentation for packages, and, as usual, you sometimes need to give the section (3 above) to avoid shadowing by the less(1) manpage.
If you have a WWW browser available, you can build the HTML docs. Change to the directory with the .pod files, and proceed like this:
- cd /ade/lib/perl5/pod
- pod2html
After this you can point your browser at the file perl.html in this directory, and go ahead with reading the docs.
Alternatively you may be able to get these docs prebuilt from CPAN.
Users of Emacs would appreciate it very much, especially with CPerl mode loaded. You need to get the latest pod2info from CPAN, or, alternately, prebuilt info pages.
Can be constructed using pod2latex.
Here we discuss how to build Perl under AmigaOS.
You need to have the latest ixemul (Unix emulation for Amiga) from Aminet.
You can either get the latest perl-for-amiga source from Ninemoons and extract it with:
- tar xvzpf perl-$VERSION-src.tgz
or get the official source from CPAN:
- http://www.cpan.org/src/5.0
Extract it like this
- tar xvzpf perl-$VERSION.tar.gz
You will see a message about errors while extracting Configure. This is normal and expected. (There is a conflict with a similarly-named file configure, but it causes no harm.)
Remember to use a hefty wad of stack (I use 2000000)
- sh configure.gnu --prefix=/gg
Now type
- make depend
Now!
- make
Now run
- make test
Some tests will be skipped because they need the fork() function:
io/pipe.t, op/fork.t, lib/filehand.t, lib/open2.t, lib/open3.t, lib/io_pipe.t, lib/io_sock.t
Run
- make install
As told above, Perl 5.6.1 was still good in AmigaOS, as was 5.7.2. After Perl 5.7.2 (change #11423, see the Changes file, and the file pod/perlhack.pod for how to get the individual changes) Perl dropped its internal support for vfork(), and that was very probably the step that broke AmigaOS (since the ixemul library has only vfork). The build finally fails when the ext/DynaLoader is being built, and PERL ends up as "0" in the produced Makefile, trying to run "0" does not quite work. Also, executing miniperl in backticks seems to generate nothing: very probably related to the (v)fork problems. Fixing the breakage requires someone quite familiar with the ixemul library, and how one is supposed to run external commands in AmigaOS without fork().
Norbert Pueschel, pueschel@imsdd.meb.uni-bonn.de; Jan-Erik Karlsson, trg@privat.utfors.se
perl(1).
perlapi - autogenerated documentation for the perl public API
This file contains the documentation of the perl public API generated by embed.pl, specifically a listing of functions, macros, flags, and variables that may be used by extension writers. At the end is a list of functions which have yet to be documented. The interfaces of those are subject to change without notice. Any functions not listed here are not part of the public API, and should not be used by extension writers at all. For these reasons, blindly using functions listed in proto.h is to be avoided when writing extensions.
Note that all Perl API global variables must be referenced with the PL_ prefix. Some macros are provided for compatibility with the older, unadorned names, but this support may be disabled in a future release.
Perl was originally written to handle US-ASCII only (that is, characters whose ordinal numbers are in the range 0 - 127). Documentation and comments may still use the term ASCII when in fact the entire range from 0 - 255 is meant.
Note that Perl can be compiled and run under either EBCDIC (see perlebcdic) or ASCII. Most of the documentation (and even comments in the code) ignore the EBCDIC possibility. For almost all purposes the differences are transparent. As an example, under EBCDIC, UTF-EBCDIC is used instead of UTF-8 to encode Unicode strings, and so whenever this documentation refers to utf8 (and variants of that name, including in function names), it also (essentially transparently) means UTF-EBCDIC. But the ordinals of characters differ between ASCII, EBCDIC, and the UTF- encodings, and a string encoded in UTF-EBCDIC may occupy more bytes than in UTF-8.
The listing below is alphabetical, case insensitive.
A backward-compatible version of GIMME_V which can only return G_SCALAR or G_ARRAY; in a void context, it returns G_SCALAR. Deprecated. Use GIMME_V instead.
- U32 GIMME
The XSUB-writer's equivalent to Perl's wantarray. Returns G_VOID, G_SCALAR or G_ARRAY for void, scalar or list context, respectively. See perlcall for a usage example.
- U32 GIMME_V
Used to indicate list context. See GIMME_V, GIMME and perlcall.
Indicates that arguments returned from a callback should be discarded. See perlcall.
Used to force a Perl eval wrapper around a callback. See
perlcall.
Indicates that no arguments are being sent to a callback. See perlcall.
Used to indicate scalar context. See GIMME_V, GIMME, and perlcall.
Used to indicate void context. See GIMME_V and perlcall.
Same as av_top_index(). Deprecated, use av_top_index() instead.
- int AvFILL(AV* av)
Clears an array, making it empty. Does not free the memory the av uses to store its list of scalars. If any destructors are triggered as a result, the av itself may be freed when this function returns.
Perl equivalent: @myarray = ();.
- void av_clear(AV *av)
Push an SV onto the end of the array, creating the array if necessary. A small internal helper function to remove a commonly duplicated idiom.
NOTE: this function is experimental and may change or be removed without notice.
- void av_create_and_push(AV **const avp,
- SV *const val)
Unshifts an SV onto the beginning of the array, creating the array if necessary. A small internal helper function to remove a commonly duplicated idiom.
NOTE: this function is experimental and may change or be removed without notice.
- SV** av_create_and_unshift_one(AV **const avp,
- SV *const val)
Deletes the element indexed by key from the array, makes the element mortal, and returns it. If flags equals G_DISCARD, the element is freed and null is returned. Perl equivalent: my $elem = delete($myarray[$idx]); for the non-G_DISCARD version and a void-context delete($myarray[$idx]); for the G_DISCARD version.
- SV* av_delete(AV *av, I32 key, I32 flags)
Returns true if the element indexed by key has been initialized. This relies on the fact that uninitialized array elements are set to &PL_sv_undef. Perl equivalent: exists($myarray[$key]).
- bool av_exists(AV *av, I32 key)
Pre-extend an array. The key is the index to which the array should be extended.
- void av_extend(AV *av, I32 key)
Returns the SV at the specified index in the array. The key is the index. If lval is true, you are guaranteed to get a real SV back (in case it wasn't real before), which you can then modify. Check that the return value is non-null before dereferencing it to a SV*. See Understanding the Magic of Tied Hashes and Arrays in perlguts for more information on how to use this function on tied arrays. The rough perl equivalent is $myarray[$idx].
- SV** av_fetch(AV *av, I32 key, I32 lval)
Set the highest index in the array to the given number, equivalent to Perl's $#array = $fill;. The number of elements in the array will be fill + 1 after av_fill() returns. If the array was previously shorter, then the additional elements appended are set to PL_sv_undef. If the array was longer, then the excess elements are freed. av_fill(av, -1) is the same as av_clear(av).
- void av_fill(AV *av, I32 fill)
Same as av_top_index. Returns the highest index in the array, which, despite what the name implies, is one less than the number of elements; it hence differs in meaning from what the similarly named sv_len returns.
- I32 av_len(AV *av)
Creates a new AV and populates it with a list of SVs. The SVs are copied into the array, so they may be freed after the call to av_make. The new AV will have a reference count of 1.
Perl equivalent: my @new_array = ($scalar1, $scalar2, $scalar3...);
- AV* av_make(I32 size, SV **strp)
Removes one SV from the end of the array, reducing its size by one and returning the SV (transferring control of one reference count) to the caller. Returns &PL_sv_undef if the array is empty. Perl equivalent: pop(@myarray);
- SV* av_pop(AV *av)
Pushes an SV onto the end of the array. The array will grow automatically to accommodate the addition. This takes ownership of one reference count.
Perl equivalent: push @myarray, $elem;.
- void av_push(AV *av, SV *val)
Shifts an SV off the beginning of the array. Returns &PL_sv_undef if the array is empty. Perl equivalent: shift(@myarray);
- SV* av_shift(AV *av)
Stores an SV in an array. The array index is specified as key. The return value will be NULL if the operation failed or if the value did not need to be actually stored within the array (as in the case of tied arrays). Otherwise, it can be dereferenced to get the SV* that was stored there (= val). Note that the caller is responsible for suitably incrementing the reference count of val before the call, and decrementing it if the function returned NULL. Approximate Perl equivalent: $myarray[$key] = $val;.
See Understanding the Magic of Tied Hashes and Arrays in perlguts for more information on how to use this function on tied arrays.
- SV** av_store(AV *av, I32 key, SV *val)
Same as av_top_index().
- int av_tindex(AV* av)
Returns the highest index in the array. The number of elements in the array is av_top_index(av) + 1. Returns -1 if the array is empty. The Perl equivalent for this is $#myarray. (A slightly shorter form is av_tindex.)
- I32 av_top_index(AV *av)
Undefines the array. Frees the memory used by the av to store its list of scalars. If any destructors are triggered as a result, the av itself may be freed.
- void av_undef(AV *av)
Unshift the given number of undef values onto the beginning of the array. The array will grow automatically to accommodate the addition. You must then use av_store to assign values to these new elements. Perl equivalent: unshift @myarray, ( (undef) x $n );
- void av_unshift(AV *av, I32 num)
Returns the AV of the specified Perl global or package array with the given name (so it won't work on lexical variables). flags are passed to gv_fetchpv. If GV_ADD is set and the Perl variable does not exist then it will be created. If flags is zero and the variable does not exist then NULL is returned. Perl equivalent: @{"$name"}.
NOTE: the perl_ form of this function is deprecated.
- AV* get_av(const char *name, I32 flags)
Creates a new AV. The reference count is set to 1.
Perl equivalent: my @array;.
- AV* newAV()
Sort an array. Here is an example:
- sortsv(AvARRAY(av), av_top_index(av)+1, Perl_sv_cmp_locale);
Currently this always uses mergesort. See sortsv_flags for a more flexible routine.
- void sortsv(SV** array, size_t num_elts,
- SVCOMPARE_t cmp)
Sort an array, with various options.
- void sortsv_flags(SV** array, size_t num_elts,
- SVCOMPARE_t cmp, U32 flags)
Performs a callback to the specified named and package-scoped Perl subroutine with argv (a NULL-terminated array of strings) as arguments. See perlcall. Approximate Perl equivalent: &{"$sub_name"}(@$argv).
NOTE: the perl_ form of this function is deprecated.
- I32 call_argv(const char* sub_name, I32 flags,
- char** argv)
Performs a callback to the specified Perl method. The blessed object must be on the stack. See perlcall.
NOTE: the perl_ form of this function is deprecated.
- I32 call_method(const char* methname, I32 flags)
Performs a callback to the specified Perl sub. See perlcall.
NOTE: the perl_ form of this function is deprecated.
- I32 call_pv(const char* sub_name, I32 flags)
Performs a callback to the Perl sub whose name is in the SV. See perlcall.
NOTE: the perl_ form of this function is deprecated.
- I32 call_sv(SV* sv, VOL I32 flags)
Opening bracket on a callback. See LEAVE and perlcall.
- ENTER;
Tells Perl to eval the given string and return an SV* result.
NOTE: the perl_ form of this function is deprecated.
- SV* eval_pv(const char* p, I32 croak_on_error)
Tells Perl to eval the string in the SV. It supports the same flags as call_sv, with the obvious exception of G_EVAL. See perlcall.
NOTE: the perl_ form of this function is deprecated.
- I32 eval_sv(SV* sv, I32 flags)
Closing bracket for temporaries on a callback. See SAVETMPS and perlcall.
- FREETMPS;
Closing bracket on a callback. See ENTER and perlcall.
- LEAVE;
Opening bracket for temporaries on a callback. See FREETMPS and perlcall.
- SAVETMPS;
Converts the specified character to lowercase, if possible; otherwise returns the input character itself.
- char toLOWER(char ch)
Converts the specified character to uppercase, if possible; otherwise returns the input character itself.
- char toUPPER(char ch)
This section is about functions (really macros) that classify characters into types, such as punctuation versus alphabetic, etc. Most of these are analogous to regular expression character classes. (See POSIX Character Classes in perlrecharclass.) There are several variants for each class. (Not all macros have all variants; each item below lists the ones valid for it.) None are affected by use bytes, and only the ones with LC in the name are affected by the current locale.
The base function, e.g., isALPHA(), takes an octet (either a char or a U8) as input and returns a boolean as to whether or not the character represented by that octet is (or on non-ASCII platforms, corresponds to) an ASCII character in the named class based on platform, Unicode, and Perl rules. If the input is a number that doesn't fit in an octet, FALSE is returned. Variant isFOO_A (e.g., isALPHA_A()) is identical to the base function with no suffix "_A".
Variant isFOO_L1 imposes the Latin-1 (or EBCDIC equivalent) character set onto the platform. That is, the code points that are ASCII are unaffected, since ASCII is a subset of Latin-1. But the non-ASCII code points are treated as if they are Latin-1 characters. For example, isWORDCHAR_L1() will return true when called with the code point 0xDF, which is a word character in both ASCII and EBCDIC (though it represents different characters in each).
Variant isFOO_uni is like the isFOO_L1 variant, but accepts any UV code point as input. If the code point is larger than 255, Unicode rules are used to determine if it is in the character class. For example, isWORDCHAR_uni(0x100) returns TRUE, since 0x100 is LATIN CAPITAL LETTER A WITH MACRON in Unicode, and is a word character.
Variant isFOO_utf8 is like isFOO_uni, but the input is a pointer to a (known to be well-formed) UTF-8 encoded string (U8* or char*). The classification of just the first (possibly multi-byte) character in the string is tested.
Variant isFOO_LC is like the isFOO_A and isFOO_L1 variants, but uses the C library function that gives the named classification instead of hard-coded rules. For example, isDIGIT_LC() returns the result of calling isdigit(). This means that the result is based on the current locale, which is what LC in the name stands for. FALSE is always returned if the input won't fit into an octet.
Variant isFOO_LC_uvchr is like isFOO_LC, but is defined on any UV. It returns the same as isFOO_LC for input code points less than 256, and returns the hard-coded, not-affected-by-locale, Unicode results for larger ones.
Variant isFOO_LC_utf8 is like isFOO_LC_uvchr, but the input is a pointer to a (known to be well-formed) UTF-8 encoded string (U8* or char*). The classification of just the first (possibly multi-byte) character in the string is tested.
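Since the isFOO_LC variants defer to the C library classifiers, the locale-dependence they inherit can be illustrated in plain C. The sketch below uses only <ctype.h>; it shows the underlying C library behavior, not the Perl macros themselves, and the helper name classify() is hypothetical:

```c
#include <ctype.h>

/* Plain-C analogue of the classifiers the isFOO_LC variants consult.
   The <ctype.h> functions take an int holding an unsigned char (or
   EOF), and their answers depend on the current LC_CTYPE locale. */
int classify(unsigned char ch)
{
    if (isdigit(ch)) return 1;   /* what isDIGIT_LC() consults */
    if (isalpha(ch)) return 2;   /* what isALPHA_LC() consults */
    if (ispunct(ch)) return 3;   /* what isPUNCT_LC() consults */
    if (isspace(ch)) return 4;   /* what isSPACE_LC() consults */
    return 0;                    /* e.g. control characters    */
}
```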
Returns a boolean indicating whether the specified character is an alphabetic character, analogous to m/[[:alpha:]]/. See the top of this section for an explanation of variants isALPHA_A, isALPHA_L1, isALPHA_uni, isALPHA_utf8, isALPHA_LC, isALPHA_LC_uvchr, and isALPHA_LC_utf8.
- bool isALPHA(char ch)
Returns a boolean indicating whether the specified character is either an alphabetic character or a decimal digit, analogous to m/[[:alnum:]]/. See the top of this section for an explanation of variants isALPHANUMERIC_A, isALPHANUMERIC_L1, isALPHANUMERIC_uni, isALPHANUMERIC_utf8, isALPHANUMERIC_LC, isALPHANUMERIC_LC_uvchr, and isALPHANUMERIC_LC_utf8.
- bool isALPHANUMERIC(char ch)
Returns a boolean indicating whether the specified character is one of the 128 characters in the ASCII character set, analogous to m/[[:ascii:]]/. On non-ASCII platforms, it returns TRUE iff this character corresponds to an ASCII character. Variants isASCII_A() and isASCII_L1() are identical to isASCII(). See the top of this section for an explanation of variants isASCII_uni, isASCII_utf8, isASCII_LC, isASCII_LC_uvchr, and isASCII_LC_utf8. Note, however, that some platforms do not have the C library routine isascii(). In these cases, the variants whose names contain LC are the same as the corresponding ones without.
- bool isASCII(char ch)
Returns a boolean indicating whether the specified character is a character considered to be a blank, analogous to m/[[:blank:]]/. See the top of this section for an explanation of variants isBLANK_A, isBLANK_L1, isBLANK_uni, isBLANK_utf8, isBLANK_LC, isBLANK_LC_uvchr, and isBLANK_LC_utf8. Note, however, that some platforms do not have the C library routine isblank(). In these cases, the variants whose names contain LC are the same as the corresponding ones without.
- bool isBLANK(char ch)
Returns a boolean indicating whether the specified character is a control character, analogous to m/[[:cntrl:]]/. See the top of this section for an explanation of variants isCNTRL_A, isCNTRL_L1, isCNTRL_uni, isCNTRL_utf8, isCNTRL_LC, isCNTRL_LC_uvchr, and isCNTRL_LC_utf8. On EBCDIC platforms, you almost always want to use the isCNTRL_L1 variant.
- bool isCNTRL(char ch)
Returns a boolean indicating whether the specified character is a digit, analogous to m/[[:digit:]]/. Variants isDIGIT_A and isDIGIT_L1 are identical to isDIGIT. See the top of this section for an explanation of variants isDIGIT_uni, isDIGIT_utf8, isDIGIT_LC, isDIGIT_LC_uvchr, and isDIGIT_LC_utf8.
- bool isDIGIT(char ch)
Returns a boolean indicating whether the specified character is a graphic character, analogous to m/[[:graph:]]/. See the top of this section for an explanation of variants isGRAPH_A, isGRAPH_L1, isGRAPH_uni, isGRAPH_utf8, isGRAPH_LC, isGRAPH_LC_uvchr, and isGRAPH_LC_utf8.
- bool isGRAPH(char ch)
Returns a boolean indicating whether the specified character can be the second or succeeding character of an identifier. This is very close to, but not quite the same as, the official Unicode property XID_Continue. The difference is that this returns true only if the input character also matches isWORDCHAR. See the top of this section for an explanation of variants isIDCONT_A, isIDCONT_L1, isIDCONT_uni, isIDCONT_utf8, isIDCONT_LC, isIDCONT_LC_uvchr, and isIDCONT_LC_utf8.
- bool isIDCONT(char ch)
Returns a boolean indicating whether the specified character can be the first character of an identifier. This is very close to, but not quite the same as, the official Unicode property XID_Start. The difference is that this returns true only if the input character also matches isWORDCHAR. See the top of this section for an explanation of variants isIDFIRST_A, isIDFIRST_L1, isIDFIRST_uni, isIDFIRST_utf8, isIDFIRST_LC, isIDFIRST_LC_uvchr, and isIDFIRST_LC_utf8.
- bool isIDFIRST(char ch)
Returns a boolean indicating whether the specified character is a lowercase character, analogous to m/[[:lower:]]/. See the top of this section for an explanation of variants isLOWER_A, isLOWER_L1, isLOWER_uni, isLOWER_utf8, isLOWER_LC, isLOWER_LC_uvchr, and isLOWER_LC_utf8.
- bool isLOWER(char ch)
Returns a boolean indicating whether the specified character is an octal digit, [0-7]. The only two variants are isOCTAL_A and isOCTAL_L1; each is identical to isOCTAL.
- bool isOCTAL(char ch)
Returns a boolean indicating whether the specified character is a printable character, analogous to m/[[:print:]]/. See the top of this section for an explanation of variants isPRINT_A, isPRINT_L1, isPRINT_uni, isPRINT_utf8, isPRINT_LC, isPRINT_LC_uvchr, and isPRINT_LC_utf8.
- bool isPRINT(char ch)
(short for Posix Space) Starting in 5.18, this is identical (experimentally) in all its forms to the corresponding isSPACE() macros. ("Experimentally" means that this change may be backed out in 5.20 or 5.22 if field experience indicates that it was unwise.) The locale forms of this macro are identical to their corresponding isSPACE() forms in all Perl releases. In releases prior to 5.18, the non-locale forms differ from their isSPACE() forms only in that the isSPACE() forms don't match a Vertical Tab, and the isPSXSPC() forms do. Otherwise they are identical. Thus this macro is analogous to what m/[[:space:]]/ matches in a regular expression. See the top of this section for an explanation of variants isPSXSPC_A, isPSXSPC_L1, isPSXSPC_uni, isPSXSPC_utf8, isPSXSPC_LC, isPSXSPC_LC_uvchr, and isPSXSPC_LC_utf8.
- bool isPSXSPC(char ch)
Returns a boolean indicating whether the specified character is a punctuation character, analogous to m/[[:punct:]]/. Note that the definition of what is punctuation isn't as straightforward as one might desire. See POSIX Character Classes in perlrecharclass for details. See the top of this section for an explanation of variants isPUNCT_A, isPUNCT_L1, isPUNCT_uni, isPUNCT_utf8, isPUNCT_LC, isPUNCT_LC_uvchr, and isPUNCT_LC_utf8.
- bool isPUNCT(char ch)
Returns a boolean indicating whether the specified character is a whitespace character. This is analogous to what m/\s/ matches in a regular expression. Starting in Perl 5.18 (experimentally), this also matches what m/[[:space:]]/ does. ("Experimentally" means that this change may be backed out in 5.20 or 5.22 if field experience indicates that it was unwise.) Prior to 5.18, only the locale forms of this macro (the ones with LC in their names) matched precisely what m/[[:space:]]/ does. In those releases, the only difference, in the non-locale variants, was that isSPACE() did not match a vertical tab. (See isPSXSPC for a macro that matches a vertical tab in all releases.) See the top of this section for an explanation of variants isSPACE_A, isSPACE_L1, isSPACE_uni, isSPACE_utf8, isSPACE_LC, isSPACE_LC_uvchr, and isSPACE_LC_utf8.
- bool isSPACE(char ch)
Returns a boolean indicating whether the specified character is an uppercase character, analogous to m/[[:upper:]]/. See the top of this section for an explanation of variants isUPPER_A, isUPPER_L1, isUPPER_uni, isUPPER_utf8, isUPPER_LC, isUPPER_LC_uvchr, and isUPPER_LC_utf8.
- bool isUPPER(char ch)
Returns a boolean indicating whether the specified character is a word character, analogous to what m/\w/ and m/[[:word:]]/ match in a regular expression. A word character is an alphabetic character, a decimal digit, a connecting punctuation character (such as an underscore), or a "mark" character that attaches to one of those (like some sort of accent). isALNUM() is a synonym provided for backward compatibility, even though a word character includes more than the standard C language meaning of alphanumeric. See the top of this section for an explanation of variants isWORDCHAR_A, isWORDCHAR_L1, isWORDCHAR_uni, isWORDCHAR_utf8, isWORDCHAR_LC, isWORDCHAR_LC_uvchr, and isWORDCHAR_LC_utf8.
- bool isWORDCHAR(char ch)
Returns a boolean indicating whether the specified character is a hexadecimal digit. In the ASCII range these are [0-9A-Fa-f]. Variants isXDIGIT_A() and isXDIGIT_L1() are identical to isXDIGIT(). See the top of this section for an explanation of variants isXDIGIT_uni, isXDIGIT_utf8, isXDIGIT_LC, isXDIGIT_LC_uvchr, and isXDIGIT_LC_utf8.
- bool isXDIGIT(char ch)
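The [0-9A-Fa-f] test above is often paired with converting the digit to its numeric value. The following plain-C sketch (using <ctype.h>, not the Perl macro itself) shows both steps; the helper name xdigit_value() is hypothetical, and the arithmetic assumes an ASCII-range platform as the text above does:

```c
#include <ctype.h>

/* Test a byte against [0-9A-Fa-f] and, if it matches, return its
   numeric value 0..15; return -1 for non-hex-digits. */
int xdigit_value(unsigned char ch)
{
    if (!isxdigit(ch))
        return -1;                      /* not a hexadecimal digit  */
    if (isdigit(ch))
        return ch - '0';                /* '0'..'9' -> 0..9         */
    return toupper(ch) - 'A' + 10;      /* 'A'..'F', 'a'..'f' -> 10..15
                                           (assumes ASCII ordering) */
}
```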
Create and return a new interpreter by cloning the current one.
perl_clone takes these flags as parameters:
CLONEf_COPY_STACKS - is used to, well, copy the stacks also. Without it we only clone the data and zero the stacks; with it we copy the stacks, and the new perl interpreter is ready to run at the exact same point as the previous one. The pseudo-fork code uses COPY_STACKS, while threads->create doesn't.
CLONEf_KEEP_PTR_TABLE - perl_clone keeps a ptr_table with the pointer of the old variable as a key and the new variable as a value; this allows it to check whether something has already been cloned and, rather than cloning it again, just use the value and increase the reference count. If KEEP_PTR_TABLE is not set then perl_clone will kill the ptr_table using the function ptr_table_free(PL_ptr_table); PL_ptr_table = NULL;. The reason to keep it around is if you want to dup some of your own variables which are outside the graph that perl scans; an example of this code is in threads.xs create.
CLONEf_CLONE_HOST - This is a Win32 thing; it is ignored on Unix. It tells Perl's win32host code (which is C++) to clone itself. This is needed on Win32 if you want to run two threads at the same time; if you just want to do some stuff in a separate perl interpreter and then throw it away and return to the original one, you don't need to do anything.
- PerlInterpreter* perl_clone(
- PerlInterpreter *proto_perl,
- UV flags
- )
Temporarily disable an entry in this BHK structure, by clearing the appropriate flag. which is a preprocessor token indicating which entry to disable.
NOTE: this function is experimental and may change or be removed without notice.
- void BhkDISABLE(BHK *hk, which)
Re-enable an entry in this BHK structure, by setting the appropriate flag. which is a preprocessor token indicating which entry to enable. This will assert (under -DDEBUGGING) if the entry doesn't contain a valid pointer.
NOTE: this function is experimental and may change or be removed without notice.
- void BhkENABLE(BHK *hk, which)
Set an entry in the BHK structure, and set the flags to indicate it is valid. which is a preprocessing token indicating which entry to set. The type of ptr depends on the entry.
NOTE: this function is experimental and may change or be removed without notice.
- void BhkENTRY_set(BHK *hk, which, void *ptr)
Register a set of hooks to be called when the Perl lexical scope changes at compile time. See Compile-time scope hooks in perlguts.
NOTE: this function is experimental and may change or be removed without notice.
NOTE: this function must be explicitly called as Perl_blockhook_register with an aTHX_ parameter.
- void Perl_blockhook_register(pTHX_ BHK *hk)
Generates and returns a standard Perl hash representing the full set of key/value pairs in the cop hints hash cophh. flags is currently unused and must be zero.
NOTE: this function is experimental and may change or be removed without notice.
- HV * cophh_2hv(const COPHH *cophh, U32 flags)
Make and return a complete copy of the cop hints hash cophh.
NOTE: this function is experimental and may change or be removed without notice.
- COPHH * cophh_copy(COPHH *cophh)
Like cophh_delete_pvn, but takes a nul-terminated string instead of a string/length pair.
NOTE: this function is experimental and may change or be removed without notice.
- COPHH * cophh_delete_pv(const COPHH *cophh,
- const char *key, U32 hash,
- U32 flags)
Delete a key and its associated value from the cop hints hash cophh, and returns the modified hash. The returned hash pointer is in general not the same as the hash pointer that was passed in. The input hash is consumed by the function, and the pointer to it must not be subsequently used. Use cophh_copy if you need both hashes.
The key is specified by keypv and keylen. If flags has the COPHH_KEY_UTF8 bit set, the key octets are interpreted as UTF-8, otherwise they are interpreted as Latin-1. hash is a precomputed hash of the key string, or zero if it has not been precomputed.
NOTE: this function is experimental and may change or be removed without notice.
- COPHH * cophh_delete_pvn(COPHH *cophh,
- const char *keypv,
- STRLEN keylen, U32 hash,
- U32 flags)
Like cophh_delete_pvn, but takes a literal string instead of a string/length pair, and no precomputed hash.
NOTE: this function is experimental and may change or be removed without notice.
- COPHH * cophh_delete_pvs(const COPHH *cophh,
- const char *key, U32 flags)
Like cophh_delete_pvn, but takes a Perl scalar instead of a string/length pair.
NOTE: this function is experimental and may change or be removed without notice.
- COPHH * cophh_delete_sv(const COPHH *cophh, SV *key,
- U32 hash, U32 flags)
Like cophh_fetch_pvn, but takes a nul-terminated string instead of a string/length pair.
NOTE: this function is experimental and may change or be removed without notice.
- SV * cophh_fetch_pv(const COPHH *cophh,
- const char *key, U32 hash,
- U32 flags)
Look up the entry in the cop hints hash cophh with the key specified by keypv and keylen. If flags has the COPHH_KEY_UTF8 bit set, the key octets are interpreted as UTF-8, otherwise they are interpreted as Latin-1. hash is a precomputed hash of the key string, or zero if it has not been precomputed. Returns a mortal scalar copy of the value associated with the key, or &PL_sv_placeholder if there is no value associated with the key.
NOTE: this function is experimental and may change or be removed without notice.
- SV * cophh_fetch_pvn(const COPHH *cophh,
- const char *keypv,
- STRLEN keylen, U32 hash,
- U32 flags)
Like cophh_fetch_pvn, but takes a literal string instead of a string/length pair, and no precomputed hash.
NOTE: this function is experimental and may change or be removed without notice.
- SV * cophh_fetch_pvs(const COPHH *cophh,
- const char *key, U32 flags)
Like cophh_fetch_pvn, but takes a Perl scalar instead of a string/length pair.
NOTE: this function is experimental and may change or be removed without notice.
- SV * cophh_fetch_sv(const COPHH *cophh, SV *key,
- U32 hash, U32 flags)
Discard the cop hints hash cophh, freeing all resources associated with it.
NOTE: this function is experimental and may change or be removed without notice.
- void cophh_free(COPHH *cophh)
Generate and return a fresh cop hints hash containing no entries.
NOTE: this function is experimental and may change or be removed without notice.
- COPHH * cophh_new_empty()
Like cophh_store_pvn, but takes a nul-terminated string instead of a string/length pair.
NOTE: this function is experimental and may change or be removed without notice.
- COPHH * cophh_store_pv(const COPHH *cophh,
- const char *key, U32 hash,
- SV *value, U32 flags)
Stores a value, associated with a key, in the cop hints hash cophh, and returns the modified hash. The returned hash pointer is in general not the same as the hash pointer that was passed in. The input hash is consumed by the function, and the pointer to it must not be subsequently used. Use cophh_copy if you need both hashes.
The key is specified by keypv and keylen. If flags has the COPHH_KEY_UTF8 bit set, the key octets are interpreted as UTF-8, otherwise they are interpreted as Latin-1. hash is a precomputed hash of the key string, or zero if it has not been precomputed.
value is the scalar value to store for this key. value is copied by this function, which thus does not take ownership of any reference to it, and later changes to the scalar will not be reflected in the value visible in the cop hints hash. Complex types of scalar will not be stored with referential integrity, but will be coerced to strings.
NOTE: this function is experimental and may change or be removed without notice.
- COPHH * cophh_store_pvn(COPHH *cophh, const char *keypv,
- STRLEN keylen, U32 hash,
- SV *value, U32 flags)
Like cophh_store_pvn, but takes a literal string instead of a string/length pair, and no precomputed hash.
NOTE: this function is experimental and may change or be removed without notice.
- COPHH * cophh_store_pvs(const COPHH *cophh,
- const char *key, SV *value,
- U32 flags)
Like cophh_store_pvn, but takes a Perl scalar instead of a string/length pair.
NOTE: this function is experimental and may change or be removed without notice.
- COPHH * cophh_store_sv(const COPHH *cophh, SV *key,
- U32 hash, SV *value, U32 flags)
Generates and returns a standard Perl hash representing the full set of hint entries in the cop cop. flags is currently unused and must be zero.
- HV * cop_hints_2hv(const COP *cop, U32 flags)
Like cop_hints_fetch_pvn, but takes a nul-terminated string instead of a string/length pair.
- SV * cop_hints_fetch_pv(const COP *cop,
- const char *key, U32 hash,
- U32 flags)
Look up the hint entry in the cop cop with the key specified by keypv and keylen. If flags has the COPHH_KEY_UTF8 bit set, the key octets are interpreted as UTF-8, otherwise they are interpreted as Latin-1. hash is a precomputed hash of the key string, or zero if it has not been precomputed. Returns a mortal scalar copy of the value associated with the key, or &PL_sv_placeholder if there is no value associated with the key.
- SV * cop_hints_fetch_pvn(const COP *cop,
- const char *keypv,
- STRLEN keylen, U32 hash,
- U32 flags)
Like cop_hints_fetch_pvn, but takes a literal string instead of a string/length pair, and no precomputed hash.
- SV * cop_hints_fetch_pvs(const COP *cop,
- const char *key, U32 flags)
Like cop_hints_fetch_pvn, but takes a Perl scalar instead of a string/length pair.
- SV * cop_hints_fetch_sv(const COP *cop, SV *key,
- U32 hash, U32 flags)
Register a custom op. See Custom Operators in perlguts.
NOTE: this function must be explicitly called as Perl_custom_op_register with an aTHX_ parameter.
- void Perl_custom_op_register(pTHX_
- Perl_ppaddr_t ppaddr,
- const XOP *xop)
Return the XOP structure for a given custom op. This function should be considered internal to OP_NAME and the other access macros: use them instead.
NOTE: this function must be explicitly called as Perl_custom_op_xop with an aTHX_ parameter.
- const XOP * Perl_custom_op_xop(pTHX_ const OP *o)
Temporarily disable a member of the XOP, by clearing the appropriate flag.
- void XopDISABLE(XOP *xop, which)
Reenable a member of the XOP which has been disabled.
- void XopENABLE(XOP *xop, which)
Return a member of the XOP structure. which is a cpp token indicating which entry to return. If the member is not set this will return a default value. The return type depends on which.
- XopENTRY(XOP *xop, which)
Set a member of the XOP structure. which is a cpp token indicating which entry to set. See Custom Operators in perlguts for details about the available members and how they are used.
- void XopENTRY_set(XOP *xop, which, value)
Return the XOP's flags.
- U32 XopFLAGS(XOP *xop)
Returns the stash of the CV. A stash is the symbol table hash, containing the package-scoped variables in the package where the subroutine was defined. For more information, see perlguts.
This also has a special use with XS AUTOLOAD subs. See Autoloading with XSUBs in perlguts.
- HV* CvSTASH(CV* cv)
Uses strlen
to get the length of name
, then calls get_cvn_flags
.
NOTE: the perl_ form of this function is deprecated.
- CV* get_cv(const char* name, I32 flags)
Returns the CV of the specified Perl subroutine. flags
are passed to
gv_fetchpvn_flags
. If GV_ADD
is set and the Perl subroutine does not
exist then it will be declared (which has the same effect as saying
sub name;
). If GV_ADD
is not set and the subroutine does not exist
then NULL is returned.
NOTE: the perl_ form of this function is deprecated.
- CV* get_cvn_flags(const char* name, STRLEN len,
- I32 flags)
Clone a CV, making a lexical closure. proto supplies the prototype of the function: its code, pad structure, and other attributes. The prototype is combined with a capture of outer lexicals to which the code refers, which are taken from the currently-executing instance of the immediately surrounding code.
- CV * cv_clone(CV *proto)
Clear out all the active components of a CV. This can happen either
by an explicit undef &foo
, or by the reference count going to zero.
In the former case, we keep the CvOUTSIDE pointer, so that any anonymous
children can still follow the full lexical scope chain.
- void cv_undef(CV* cv)
Find and return the variable that is named $_
in the lexical scope
of the currently-executing function. This may be a lexical $_
,
or will otherwise be the global one.
- SV * find_rundefsv()
Find the position of the lexical $_
in the pad of the
currently-executing function. Returns the offset in the current pad,
or NOT_IN_PAD
if there is no lexical $_
in scope (in which case
the global one should be used instead).
find_rundefsv is likely to be more convenient.
NOTE: the perl_ form of this function is deprecated.
- PADOFFSET find_rundefsvoffset()
Loads the module whose name is pointed to by the string part of name.
Note that the actual module name, not its filename, should be given.
Eg, "Foo::Bar" instead of "Foo/Bar.pm". flags can be any of
PERL_LOADMOD_DENY, PERL_LOADMOD_NOIMPORT, or PERL_LOADMOD_IMPORT_OPS
(or 0 for no flags). ver, if specified and not NULL, provides version semantics
similar to use Foo::Bar VERSION
. The optional trailing SV*
arguments can be used to specify arguments to the module's import()
method, similar to use Foo::Bar VERSION LIST
. They must be
terminated with a final NULL pointer. Note that this list can only
be omitted when the PERL_LOADMOD_NOIMPORT flag has been used.
Otherwise at least a single NULL pointer to designate the default
import list is required.
The reference count for each specified SV*
parameter is decremented.
- void load_module(U32 flags, SV* name, SV* ver, ...)
Stub that provides thread hook for perl_destruct when there are no threads.
- int nothreadhook()
Allocates a place in the currently-compiling pad (via pad_alloc)
for an anonymous function that is lexically scoped inside the
currently-compiling function.
The function func is linked into the pad, and its CvOUTSIDE
link
to the outer scope is weakened to avoid a reference loop.
One reference count is stolen, so you may need to do SvREFCNT_inc(func)
.
optype should be an opcode indicating the type of operation that the pad entry is to support. This doesn't affect operational semantics, but is used for debugging.
- PADOFFSET pad_add_anon(CV *func, I32 optype)
Exactly like pad_add_name_pvn, but takes a nul-terminated string instead of a string/length pair.
- PADOFFSET pad_add_name_pv(const char *name, U32 flags,
- HV *typestash, HV *ourstash)
Allocates a place in the currently-compiling pad for a named lexical variable. Stores the name and other metadata in the name part of the pad, and makes preparations to manage the variable's lexical scoping. Returns the offset of the allocated pad slot.
namepv/namelen specify the variable's name, including leading sigil. If typestash is non-null, the name is for a typed lexical, and this identifies the type. If ourstash is non-null, it's a lexical reference to a package variable, and this identifies the package. The following flags can be OR'ed together:
- padadd_OUR redundantly specifies if it's a package var
- padadd_STATE variable will retain value persistently
- padadd_NO_DUP_CHECK skip check for lexical shadowing
- PADOFFSET pad_add_name_pvn(const char *namepv,
- STRLEN namelen, U32 flags,
- HV *typestash, HV *ourstash)
Exactly like pad_add_name_pvn, but takes the name string in the form of an SV instead of a string/length pair.
- PADOFFSET pad_add_name_sv(SV *name, U32 flags,
- HV *typestash, HV *ourstash)
Allocates a place in the currently-compiling pad, returning the offset of the allocated pad slot. No name is initially attached to the pad slot. tmptype is a set of flags indicating the kind of pad entry required, which will be set in the value SV for the allocated pad entry:
- SVs_PADMY named lexical variable ("my", "our", "state")
- SVs_PADTMP unnamed temporary store
optype should be an opcode indicating the type of operation that the pad entry is to support. This doesn't affect operational semantics, but is used for debugging.
NOTE: this function is experimental and may change or be removed without notice.
- PADOFFSET pad_alloc(I32 optype, U32 tmptype)
Looks up the type of the lexical variable at position po in the
currently-compiling pad. If the variable is typed, the stash of the
class to which it is typed is returned. If not, NULL
is returned.
- HV * pad_compname_type(PADOFFSET po)
Exactly like pad_findmy_pvn, but takes a nul-terminated string instead of a string/length pair.
- PADOFFSET pad_findmy_pv(const char *name, U32 flags)
Given the name of a lexical variable, find its position in the
currently-compiling pad.
namepv/namelen specify the variable's name, including leading sigil.
flags is reserved and must be zero.
If it is not in the current pad but appears in the pad of any lexically
enclosing scope, then a pseudo-entry for it is added in the current pad.
Returns the offset in the current pad,
or NOT_IN_PAD
if no such lexical is in scope.
- PADOFFSET pad_findmy_pvn(const char *namepv,
- STRLEN namelen, U32 flags)
Exactly like pad_findmy_pvn, but takes the name string in the form of an SV instead of a string/length pair.
- PADOFFSET pad_findmy_sv(SV *name, U32 flags)
Set the value at offset po in the current (compiling or executing) pad. Use the macro PAD_SETSV() rather than calling this function directly.
- void pad_setsv(PADOFFSET po, SV *sv)
Get the value at offset po in the current (compiling or executing) pad. Use macro PAD_SV instead of calling this function directly.
- SV * pad_sv(PADOFFSET po)
Tidy up a pad at the end of compilation of the code to which it belongs. Jobs performed here are: remove most stuff from the pads of anonsub prototypes; give it a @_; mark temporaries as such. type indicates the kind of subroutine:
- padtidy_SUB ordinary subroutine
- padtidy_SUBCLONE prototype for lexical closure
- padtidy_FORMAT format
NOTE: this function is experimental and may change or be removed without notice.
- void pad_tidy(padtidy_type type)
Allocates a new Perl interpreter. See perlembed.
- PerlInterpreter* perl_alloc()
Initializes a new Perl interpreter. See perlembed.
- void perl_construct(PerlInterpreter *my_perl)
Shuts down a Perl interpreter. See perlembed.
- int perl_destruct(PerlInterpreter *my_perl)
Releases a Perl interpreter. See perlembed.
- void perl_free(PerlInterpreter *my_perl)
Tells a Perl interpreter to parse a Perl script. See perlembed.
- int perl_parse(PerlInterpreter *my_perl,
- XSINIT_t xsinit, int argc,
- char** argv, char** env)
Tells a Perl interpreter to run. See perlembed.
- int perl_run(PerlInterpreter *my_perl)
Tells Perl to require the file named by the string argument. It is
analogous to the Perl code eval "require '$file'"
. It's even
implemented that way; consider using load_module instead.
NOTE: the perl_ form of this function is deprecated.
- void require_pv(const char* pv)
Similar to
- pv_escape(dsv,pv,cur,pvlim,PERL_PV_ESCAPE_QUOTE);
except that an additional "\0" will be appended to the string when len > cur and pv[cur] is "\0".
Note that the final string may be up to 7 chars longer than pvlim.
- char* pv_display(SV *dsv, const char *pv, STRLEN cur,
- STRLEN len, STRLEN pvlim)
Escapes at most the first "count" chars of pv and puts the results into dsv such that the size of the escaped string will not exceed "max" chars and will not contain any incomplete escape sequences.
If flags contains PERL_PV_ESCAPE_QUOTE then any double quotes in the string will also be escaped.
Normally the SV will be cleared before the escaped string is prepared, but when PERL_PV_ESCAPE_NOCLEAR is set this will not occur.
If PERL_PV_ESCAPE_UNI is set then the input string is treated as Unicode,
if PERL_PV_ESCAPE_UNI_DETECT is set then the input string is scanned
using is_utf8_string()
to determine if it is Unicode.
If PERL_PV_ESCAPE_ALL is set then all input chars will be output
using \x01F1
style escapes, otherwise if PERL_PV_ESCAPE_NONASCII is set, only
chars above 127 will be escaped using this style; otherwise, only chars above
255 will be so escaped; other non printable chars will use octal or
common escaped patterns like \n
. Otherwise, if PERL_PV_ESCAPE_NOBACKSLASH
then all chars below 255 will be treated as printable and
will be output as literals.
If PERL_PV_ESCAPE_FIRSTCHAR is set then only the first char of the
string will be escaped, regardless of max. If the output is to be in hex,
then it will be returned as a plain hex
sequence. Thus the output will either be a single char,
an octal escape sequence, a special escape like \n
or a hex value.
If PERL_PV_ESCAPE_RE is set then the escape char used will be a '%' and not a '\\'. This is because regexes very often contain backslashed sequences, whereas '%' is not a particularly common character in patterns.
Returns a pointer to the escaped text as held by dsv.
- char* pv_escape(SV *dsv, char const * const str,
- const STRLEN count, const STRLEN max,
- STRLEN * const escaped,
- const U32 flags)
Converts a string into something presentable, handling escaping via pv_escape() and supporting quoting and ellipses.
If the PERL_PV_PRETTY_QUOTE flag is set then the result will be double quoted with any double quotes in the string escaped. Otherwise, if the PERL_PV_PRETTY_LTGT flag is set then the result will be wrapped in angle brackets.
If the PERL_PV_PRETTY_ELLIPSES flag is set and not all characters in
string were output then an ellipsis ...
will be appended to the
string. Note that this happens AFTER it has been quoted.
If start_color is non-null then it will be inserted after the opening quote (if there is one) but before the escaped text. If end_color is non-null then it will be inserted after the escaped text but before any quotes or ellipses.
Returns a pointer to the prettified text as held by dsv.
- char* pv_pretty(SV *dsv, char const * const str,
- const STRLEN count, const STRLEN max,
- char const * const start_color,
- char const * const end_color,
- const U32 flags)
Return the description of a given custom op. This was once used by the OP_DESC macro, but is no longer: it has only been kept for compatibility, and should not be used.
- const char * custom_op_desc(const OP *o)
Return the name for a given custom op. This was once used by the OP_NAME macro, but is no longer: it has only been kept for compatibility, and should not be used.
- const char * custom_op_name(const OP *o)
- GV* gv_fetchmethod(HV* stash, const char* name)
The engine implementing the pack() Perl function. Note: the parameters next_in_list and flags are not used. This call should not be used; use packlist instead.
- void pack_cat(SV *cat, const char *pat,
- const char *patend, SV **beglist,
- SV **endlist, SV ***next_in_list,
- U32 flags)
Return a pointer to the byte-encoded representation of the SV. May cause the SV to be downgraded from UTF-8 as a side-effect.
Usually accessed via the SvPVbyte_nolen
macro.
- char* sv_2pvbyte_nolen(SV* sv)
Return a pointer to the UTF-8-encoded representation of the SV. May cause the SV to be upgraded to UTF-8 as a side-effect.
Usually accessed via the SvPVutf8_nolen
macro.
- char* sv_2pvutf8_nolen(SV* sv)
Like sv_2pv()
, but doesn't return the length too. You should usually
use the macro wrapper SvPV_nolen(sv)
instead.
- char* sv_2pv_nolen(SV* sv)
Like sv_catpvn
, but also handles 'set' magic.
- void sv_catpvn_mg(SV *sv, const char *ptr,
- STRLEN len)
Like sv_catsv
, but also handles 'set' magic.
- void sv_catsv_mg(SV *dsv, SV *ssv)
Undo various types of fakery on an SV: if the PV is a shared string, make
a private copy; if we're a ref, stop refing; if we're a glob, downgrade to
an xpvmg. See also sv_force_normal_flags
.
- void sv_force_normal(SV *sv)
A private implementation of the SvIVx
macro for compilers which can't
cope with complex macro expressions. Always use the macro instead.
- IV sv_iv(SV* sv)
Dummy routine which "locks" an SV when there is no locking module present. Exists to avoid a test for a NULL function pointer and because it could potentially warn under some level of strict-ness.
"Superseded" by sv_nosharing().
- void sv_nolocking(SV *sv)
Dummy routine which "unlocks" an SV when there is no locking module present. Exists to avoid a test for a NULL function pointer and because it could potentially warn under some level of strict-ness.
"Superseded" by sv_nosharing().
- void sv_nounlocking(SV *sv)
A private implementation of the SvNVx
macro for compilers which can't
cope with complex macro expressions. Always use the macro instead.
- NV sv_nv(SV* sv)
Use the SvPV_nolen
macro instead.
- char* sv_pv(SV *sv)
Use SvPVbyte_nolen
instead.
- char* sv_pvbyte(SV *sv)
A private implementation of the SvPVbyte
macro for compilers
which can't cope with complex macro expressions. Always use the macro
instead.
- char* sv_pvbyten(SV *sv, STRLEN *lp)
A private implementation of the SvPV
macro for compilers which can't
cope with complex macro expressions. Always use the macro instead.
- char* sv_pvn(SV *sv, STRLEN *lp)
Use the SvPVutf8_nolen
macro instead.
- char* sv_pvutf8(SV *sv)
A private implementation of the SvPVutf8
macro for compilers
which can't cope with complex macro expressions. Always use the macro
instead.
- char* sv_pvutf8n(SV *sv, STRLEN *lp)
Taint an SV. Use SvTAINTED_on
instead.
- void sv_taint(SV* sv)
Unsets the RV status of the SV, and decrements the reference count of
whatever was being referenced by the RV. This can almost be thought of
as a reversal of newSVrv
. This is sv_unref_flags
with the flag
being zero. See SvROK_off
.
- void sv_unref(SV* sv)
Tells an SV to use ptr
to find its string value. Implemented by
calling sv_usepvn_flags
with flags
of 0, hence does not handle 'set'
magic. See sv_usepvn_flags
.
- void sv_usepvn(SV* sv, char* ptr, STRLEN len)
Like sv_usepvn
, but also handles 'set' magic.
- void sv_usepvn_mg(SV *sv, char *ptr, STRLEN len)
A private implementation of the SvUVx
macro for compilers which can't
cope with complex macro expressions. Always use the macro instead.
- UV sv_uv(SV* sv)
The engine implementing the unpack() Perl function. Note: the parameters strbeg, new_s, and ocnt are not used. This call should not be used; use unpackstring instead.
- I32 unpack_str(const char *pat, const char *patend,
- const char *s, const char *strbeg,
- const char *strend, char **new_s,
- I32 ocnt, U32 flags)
Available only under threaded builds, this function allocates an entry in
PL_stashpad
for the stash passed to it.
NOTE: this function is experimental and may change or be removed without notice.
- PADOFFSET alloccopstash(HV *hv)
Applies a syntactic context to an op tree representing an expression.
o is the op tree, and context must be G_SCALAR
, G_ARRAY
,
or G_VOID
to specify the context to apply. The modified op tree
is returned.
- OP * op_contextualize(OP *o, I32 context)
Provides system-specific tune up of the C runtime environment necessary to run Perl interpreters. This should be called only once, before creating any Perl interpreters.
- void PERL_SYS_INIT(int *argc, char*** argv)
Provides system-specific tune up of the C runtime environment necessary to run Perl interpreters. This should be called only once, before creating any Perl interpreters.
- void PERL_SYS_INIT3(int *argc, char*** argv,
- char*** env)
Provides system-specific clean up of the C runtime environment after running Perl interpreters. This should be called only once, after freeing any remaining Perl interpreters.
- void PERL_SYS_TERM()
The XSUB-writer's equivalent of caller. The
returned PERL_CONTEXT
structure can be interrogated to find all the
information returned to Perl by caller. Note that XSUBs don't get a
stack frame, so caller_cx(0, NULL)
will return information for the
immediately-surrounding Perl code.
This function skips over the automatic calls to &DB::sub
made on the
behalf of the debugger. If the stack frame requested was a sub called by
DB::sub
, the return value will be the frame for the call to
DB::sub
, since that has the correct line number/etc. for the call
site. If dbcxp is non-NULL
, it will be set to a pointer to the
frame for the sub call itself.
- const PERL_CONTEXT * caller_cx(
- I32 level,
- const PERL_CONTEXT **dbcxp
- )
Locate the CV corresponding to the currently executing sub or eval. If db_seqp is non-null, skip CVs that are in the DB package and populate *db_seqp with the cop sequence number at the point that the DB:: code was entered. (This allows debuggers to eval in the scope of the breakpoint rather than in the scope of the debugger itself.)
- CV* find_runcv(U32 *db_seqp)
The engine implementing the pack() Perl function.
- void packlist(SV *cat, const char *pat,
- const char *patend, SV **beglist,
- SV **endlist)
The engine implementing the unpack() Perl function.
Using the template pat..patend, this function unpacks the string
s..strend into a number of mortal SVs, which it pushes onto the perl
argument (@_) stack (so you will need to issue a PUTBACK
before and
SPAGAIN
after the call to this function). It returns the number of
pushed elements.
The strend and patend pointers should point to the byte following the last character of each string.
Although this function returns its values on the perl argument stack, it doesn't take any parameters from that stack (and thus in particular there's no need to do a PUSHMARK before calling it, unlike call_pv for example).
- I32 unpackstring(const char *pat,
- const char *patend, const char *s,
- const char *strend, U32 flags)
Sets PL_defoutgv, the default file handle for output, to the passed in typeglob. As PL_defoutgv "owns" a reference on its typeglob, the reference count of the passed in typeglob is increased by one, and the reference count of the typeglob that PL_defoutgv points to is decreased by one.
- void setdefout(GV* gv)
This is a synonym for (! foldEQ_utf8())
- I32 ibcmp_utf8(const char *s1, char **pe1, UV l1,
- bool u1, const char *s2, char **pe2,
- UV l2, bool u2)
This is a synonym for (! foldEQ())
- I32 ibcmp(const char* a, const char* b, I32 len)
This is a synonym for (! foldEQ_locale())
- I32 ibcmp_locale(const char* a, const char* b,
- I32 len)
Array, indexed by opcode, of functions that will be called for the "check" phase of optree building during compilation of Perl code. For most (but not all) types of op, once the op has been initially built and populated with child ops it will be filtered through the check function referenced by the appropriate element of this array. The new op is passed in as the sole argument to the check function, and the check function returns the completed op. The check function may (as the name suggests) check the op for validity and signal errors. It may also initialise or modify parts of the ops, or perform more radical surgery such as adding or removing child ops, or even throw the op away and return a different op in its place.
This array of function pointers is a convenient place to hook into the compilation process. An XS module can put its own custom check function in place of any of the standard ones, to influence the compilation of a particular type of op. However, a custom check function must never fully replace a standard check function (or even a custom check function from another module). A module modifying checking must instead wrap the preexisting check function. A custom check function must be selective about when to apply its custom behaviour. In the usual case where it decides not to do anything special with an op, it must chain the preexisting check function. Check functions are thus linked in a chain, with the core's base checker at the end.
For thread safety, modules should not write directly to this array. Instead, use the function wrap_op_checker.
Function pointer, pointing at a function used to handle extended keywords. The function should be declared as
- int keyword_plugin_function(pTHX_
- char *keyword_ptr, STRLEN keyword_len,
- OP **op_ptr)
The function is called from the tokeniser, whenever a possible keyword
is seen. keyword_ptr
points at the word in the parser's input
buffer, and keyword_len
gives its length; it is not null-terminated.
The function is expected to examine the word, and possibly other state
such as %^H, to decide whether it wants to handle it
as an extended keyword. If it does not, the function should return
KEYWORD_PLUGIN_DECLINE
, and the normal parser process will continue.
If the function wants to handle the keyword, it first must parse anything following the keyword that is part of the syntax introduced by the keyword. See Lexer interface for details.
When a keyword is being handled, the plugin function must build
a tree of OP
structures, representing the code that was parsed.
The root of the tree must be stored in *op_ptr
. The function then
returns a constant indicating the syntactic role of the construct that
it has parsed: KEYWORD_PLUGIN_STMT
if it is a complete statement, or
KEYWORD_PLUGIN_EXPR
if it is an expression. Note that a statement
construct cannot be used inside an expression (except via do BLOCK
and similar), and an expression is not a complete statement (it requires
at least a terminating semicolon).
When a keyword is handled, the plugin function may also have
(compile-time) side effects. It may modify %^H
, define functions, and
so on. Typically, if side effects are the main purpose of a handler,
it does not wish to generate any ops to be included in the normal
compilation. In this case it is still required to supply an op tree,
but it suffices to generate a single null op.
That's how the *PL_keyword_plugin
function needs to behave overall.
Conventionally, however, one does not completely replace the existing
handler function. Instead, take a copy of PL_keyword_plugin
before
assigning your own function pointer to it. Your handler function should
look for keywords that it is interested in and handle those. Where it
is not interested, it should call the saved plugin function, passing on
the arguments it received. Thus PL_keyword_plugin
actually points
at a chain of handler functions, all of which have an opportunity to
handle keywords, and only the last function in the chain (built into
the Perl core) will normally return KEYWORD_PLUGIN_DECLINE
.
NOTE: this function is experimental and may change or be removed without notice.
Return the AV from the GV.
- AV* GvAV(GV* gv)
Return the CV from the GV.
- CV* GvCV(GV* gv)
Return the HV from the GV.
- HV* GvHV(GV* gv)
Return the SV from the GV.
- SV* GvSV(GV* gv)
If gv
is a typeglob whose subroutine entry is a constant sub eligible for
inlining, or gv
is a placeholder reference that would be promoted to such
a typeglob, then returns the value returned by the sub. Otherwise, returns
NULL.
- SV* gv_const_sv(GV* gv)
Like gv_fetchmeth_pvn, but lacks a flags parameter.
- GV* gv_fetchmeth(HV* stash, const char* name,
- STRLEN len, I32 level)
Returns the glob which contains the subroutine to call to invoke the method
on the stash
. In fact in the presence of autoloading this may be the
glob for "AUTOLOAD". In this case the corresponding variable $AUTOLOAD is
already setup.
The third parameter of gv_fetchmethod_autoload
determines whether
AUTOLOAD lookup is performed if the given method is not present: non-zero
means yes, look for AUTOLOAD; zero means no, don't look for AUTOLOAD.
Calling gv_fetchmethod
is equivalent to calling gv_fetchmethod_autoload
with a non-zero autoload
parameter.
These functions accept the "SUPER" token as a prefix of the method name. Note that if you want to keep the returned glob for a long time, you need to check for it being "AUTOLOAD", since a later call may load a different subroutine due to $AUTOLOAD changing its value. Use the glob created as a side effect to do this.
These functions have the same side-effects as gv_fetchmeth with level==0. name should be writable if it contains ':' or "''". The warning against passing the GV returned by gv_fetchmeth to call_sv applies equally to these functions.
- GV* gv_fetchmethod_autoload(HV* stash,
- const char* name,
- I32 autoload)
This is the old form of gv_fetchmeth_pvn_autoload, which has no flags parameter.
- GV* gv_fetchmeth_autoload(HV* stash,
- const char* name,
- STRLEN len, I32 level)
Exactly like gv_fetchmeth_pvn, but takes a nul-terminated string instead of a string/length pair.
- GV* gv_fetchmeth_pv(HV* stash, const char* name,
- I32 level, U32 flags)
Returns the glob with the given name
and a defined subroutine or
NULL
. The glob lives in the given stash
, or in the stashes
accessible via @ISA and UNIVERSAL::.
The argument level
should be either 0 or -1. If level==0
, as a
side-effect creates a glob with the given name
in the given stash
which in the case of success contains an alias for the subroutine, and sets
up caching info for this glob.
The only significant values for flags
are GV_SUPER and SVf_UTF8.
GV_SUPER indicates that we want to look up the method in the superclasses
of the stash
.
The
GV returned from gv_fetchmeth
may be a method cache entry, which is not
visible to Perl code. So when calling call_sv
, you should not use
the GV directly; instead, you should use the method's CV, which can be
obtained from the GV with the GvCV
macro.
- GV* gv_fetchmeth_pvn(HV* stash, const char* name,
- STRLEN len, I32 level,
- U32 flags)
Same as gv_fetchmeth_pvn(), but looks for autoloaded subroutines too. Returns a glob for the subroutine.
For an autoloaded subroutine without a GV, will create a GV even
if level < 0
. For an autoloaded subroutine without a stub, GvCV()
of the result may be zero.
Currently, the only significant value for flags
is SVf_UTF8.
- GV* gv_fetchmeth_pvn_autoload(HV* stash,
- const char* name,
- STRLEN len, I32 level,
- U32 flags)
Exactly like gv_fetchmeth_pvn_autoload, but takes a nul-terminated string instead of a string/length pair.
- GV* gv_fetchmeth_pv_autoload(HV* stash,
- const char* name,
- I32 level, U32 flags)
Exactly like gv_fetchmeth_pvn, but takes the name string in the form of an SV instead of a string/length pair.
- GV* gv_fetchmeth_sv(HV* stash, SV* namesv,
- I32 level, U32 flags)
Exactly like gv_fetchmeth_pvn_autoload, but takes the name string in the form of an SV instead of a string/length pair.
- GV* gv_fetchmeth_sv_autoload(HV* stash, SV* namesv,
- I32 level, U32 flags)
The old form of gv_init_pvn(). It does not work with UTF8 strings, as it
has no flags parameter. If the multi
parameter is set, the
GV_ADDMULTI flag will be passed to gv_init_pvn().
- void gv_init(GV* gv, HV* stash, const char* name,
- STRLEN len, int multi)
Same as gv_init_pvn(), but takes a nul-terminated string for the name instead of separate char * and length parameters.
- void gv_init_pv(GV* gv, HV* stash, const char* name,
- U32 flags)
Converts a scalar into a typeglob. This is an incoercible typeglob; assigning a reference to it will assign to one of its slots, instead of overwriting it as happens with typeglobs created by SvSetSV. Converting any scalar that is SvOK() may produce unpredictable results and is reserved for perl's internal use.
gv
is the scalar to be converted.
stash
is the parent stash/package, if any.
name
and len
give the name. The name must be unqualified;
that is, it must not include the package name. If gv
is a
stash element, it is the caller's responsibility to ensure that the name
passed to this function matches the name of the element. If it does not
match, perl's internal bookkeeping will get out of sync.
flags
can be set to SVf_UTF8 if name
is a UTF8 string, or
the return value of SvUTF8(sv). It can also take the
GV_ADDMULTI flag, which means to pretend that the GV has been
seen before (i.e., suppress "Used once" warnings).
- void gv_init_pvn(GV* gv, HV* stash, const char* name,
- STRLEN len, U32 flags)
Same as gv_init_pvn(), but takes an SV * for the name instead of separate
char * and length parameters. flags
is currently unused.
- void gv_init_sv(GV* gv, HV* stash, SV* namesv,
- U32 flags)
Returns a pointer to the stash for a specified package. Uses strlen
to
determine the length of name
, then calls gv_stashpvn()
.
- HV* gv_stashpv(const char* name, I32 flags)
Returns a pointer to the stash for a specified package. The namelen
parameter indicates the length of the name
, in bytes. flags
is passed
to gv_fetchpvn_flags()
, so if set to GV_ADD
then the package will be
created if it does not already exist. If the package does not exist and
flags
is 0 (or any other setting that does not create packages) then NULL
is returned.
Flags may be one of:
- GV_ADD
- SVf_UTF8
- GV_NOADD_NOINIT
- GV_NOINIT
- GV_NOEXPAND
- GV_ADDMG
The most important of which are probably GV_ADD and SVf_UTF8.
- HV* gv_stashpvn(const char* name, U32 namelen,
- I32 flags)
Like gv_stashpvn
, but takes a literal string instead of a string/length pair.
- HV* gv_stashpvs(const char* name, I32 create)
Returns a pointer to the stash for a specified package. See gv_stashpvn
.
- HV* gv_stashsv(SV* sv, I32 flags)
Null AV pointer.
(deprecated - use (AV *)NULL instead)
- Nullav
Null character pointer. (No longer available when PERL_CORE is defined.)
- Nullch
Null CV pointer.
(deprecated - use (CV *)NULL instead)
- Nullcv
Null HV pointer.
(deprecated - use (HV *)NULL instead)
- Nullhv
Null SV pointer. (No longer available when PERL_CORE is defined.)
- Nullsv
Returns the label attached to a cop.
The flags pointer may be set to SVf_UTF8
or 0.
NOTE: this function is experimental and may change or be removed without notice.
- const char * cop_fetch_label(COP *const cop,
- STRLEN *len, U32 *flags)
Save a label into a cop_hints_hash
. You need to set flags to SVf_UTF8
for a utf-8 label.
NOTE: this function is experimental and may change or be removed without notice.
- void cop_store_label(COP *const cop,
- const char *label, STRLEN len,
- U32 flags)
Returns the HV of the specified Perl hash. flags
are passed to
gv_fetchpv
. If GV_ADD
is set and the
Perl variable does not exist then it will be created. If flags
is zero
and the variable does not exist then NULL is returned.
NOTE: the perl_ form of this function is deprecated.
- HV* get_hv(const char *name, I32 flags)
This flag, used in the length slot of hash entries and magic structures,
specifies the structure contains an SV*
pointer where a char*
pointer
is to be expected. (For information only--not to be used).
Returns the computed hash stored in the hash entry.
- U32 HeHASH(HE* he)
Returns the actual pointer stored in the key slot of the hash entry. The
pointer may be either char*
or SV*
, depending on the value of
HeKLEN()
. Can be assigned to. The HePV()
or HeSVKEY()
macros are
usually preferable for finding the value of a key.
- void* HeKEY(HE* he)
If this is negative, and amounts to HEf_SVKEY
, it indicates the entry
holds an SV*
key. Otherwise, holds the actual length of the key. Can
be assigned to. The HePV()
macro is usually preferable for finding key
lengths.
- STRLEN HeKLEN(HE* he)
Returns the key slot of the hash entry as a char* value, doing any necessary dereferencing of possibly SV* keys. The length of the string is placed in len (this is a macro, so do not use &len). If you do not care about what the length of the key is, you may use the global variable PL_na, though this is rather less efficient than using a local variable. Remember though, that hash keys in perl are free to contain embedded nulls, so using strlen() or similar is not a good way to find the length of hash keys. This is very similar to the SvPV() macro described elsewhere in this document. See also HeUTF8.
If you are using HePV to get values to pass to newSVpvn() to create a new SV, you should consider using newSVhek(HeKEY_hek(he)) as it is more efficient.
- char* HePV(HE* he, STRLEN len)
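As a sketch only (not from the original documentation), this is the usual idiom for HePV inside an iteration loop, using a local STRLEN as recommended above; the `%.*s` format is used because keys may contain embedded nulls:

```c
/* Sketch: print every key of a hash via HePV.
   Assumes an XS/embedding context with perl.h available. */
void dump_keys(pTHX_ HV *hv)
{
    HE *entry;
    hv_iterinit(hv);
    while ((entry = hv_iternext(hv))) {
        STRLEN len;                    /* local variable, more efficient than PL_na */
        char *key = HePV(entry, len);  /* macro: pass len itself, never &len */
        PerlIO_printf(Perl_debug_log, "key: %.*s\n", (int)len, key);
    }
}
```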
Returns the key as an SV*, or NULL if the hash entry does not contain an SV* key.
- SV* HeSVKEY(HE* he)
Returns the key as an SV*. Will create and return a temporary mortal SV* if the hash entry contains only a char* key.
- SV* HeSVKEY_force(HE* he)
Sets the key to a given SV*, taking care to set the appropriate flags to indicate the presence of an SV* key, and returns the same SV*.
- SV* HeSVKEY_set(HE* he, SV* sv)
Returns whether the char * value returned by HePV is encoded in UTF-8, doing any necessary dereferencing of possibly SV* keys. The value returned will be 0 or non-0, not necessarily 1 (or even a value with any low bits set), so do not blindly assign this to a bool variable, as bool may be a typedef for char.
- char* HeUTF8(HE* he)
Returns the value slot (type SV*) stored in the hash entry. Can be assigned to.
- SV *foo= HeVAL(hv);
- HeVAL(hv)= sv;
- SV* HeVAL(HE* he)
Returns the effective name of a stash, or NULL if there is none. The effective name represents a location in the symbol table where this stash resides. It is updated automatically when packages are aliased or deleted. A stash that is no longer in the symbol table has no effective name. This name is preferable to HvNAME for use in MRO linearisations and isa caches.
- char* HvENAME(HV* stash)
Returns the length of the stash's effective name.
- STRLEN HvENAMELEN(HV *stash)
Returns true if the effective name is in UTF8 encoding.
- unsigned char HvENAMEUTF8(HV *stash)
Returns the package name of a stash, or NULL if stash isn't a stash. See SvSTASH, CvSTASH.
- char* HvNAME(HV* stash)
Returns the length of the stash's name.
- STRLEN HvNAMELEN(HV *stash)
Returns true if the name is in UTF8 encoding.
- unsigned char HvNAMEUTF8(HV *stash)
Check that a hash is in an internally consistent state.
- void hv_assert(HV *hv)
Frees all the elements of a hash, leaving it empty. The XS equivalent of %hash = (). See also hv_undef.
If any destructors are triggered as a result, the hv itself may be freed.
- void hv_clear(HV *hv)
Clears any placeholders from a hash. If a restricted hash has any of its keys marked as readonly and the key is subsequently deleted, the key is not actually deleted but is marked by assigning it a value of &PL_sv_placeholder. This tags it so it will be ignored by future operations such as iterating over the hash, but will still allow the hash to have a value reassigned to the key at some future point. This function clears any such placeholder keys from the hash. See Hash::Util::lock_keys() for an example of its use.
- void hv_clear_placeholders(HV *hv)
A specialised version of newHVhv for copying %^H. ohv must be a pointer to a hash (which may have %^H magic, but should be generally non-magical), or NULL (interpreted as an empty hash). The content of ohv is copied to a new hash, which has the %^H-specific magic added to it. A pointer to the new hash is returned.
- HV * hv_copy_hints_hv(HV *ohv)
Deletes a key/value pair in the hash. The value's SV is removed from the hash, made mortal, and returned to the caller. The absolute value of klen is the length of the key. If klen is negative the key is assumed to be in UTF-8-encoded Unicode. The flags value will normally be zero; if set to G_DISCARD then NULL will be returned. NULL will also be returned if the key is not found.
- SV* hv_delete(HV *hv, const char *key, I32 klen,
- I32 flags)
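A small sketch of the two calling styles (not from the original documentation; assumes an XS context with an existing HV *hv):

```c
/* Sketch: delete a key and inspect the removed value. The returned SV
   is mortal, so it stays valid until the enclosing scope's FREETMPS. */
SV *old = hv_delete(hv, "colour", 6, 0);
if (old)
    PerlIO_printf(Perl_debug_log, "was: %s\n", SvPV_nolen(old));

/* If the old value is not wanted, pass G_DISCARD and ignore the NULL result: */
hv_delete(hv, "colour", 6, G_DISCARD);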
Deletes a key/value pair in the hash. The value SV is removed from the hash, made mortal, and returned to the caller. The flags value will normally be zero; if set to G_DISCARD then NULL will be returned. NULL will also be returned if the key is not found. hash can be a valid precomputed hash value, or 0 to ask for it to be computed.
- SV* hv_delete_ent(HV *hv, SV *keysv, I32 flags,
- U32 hash)
Returns a boolean indicating whether the specified hash key exists. The absolute value of klen is the length of the key. If klen is negative the key is assumed to be in UTF-8-encoded Unicode.
- bool hv_exists(HV *hv, const char *key, I32 klen)
Returns a boolean indicating whether the specified hash key exists. hash can be a valid precomputed hash value, or 0 to ask for it to be computed.
- bool hv_exists_ent(HV *hv, SV *keysv, U32 hash)
Returns the SV which corresponds to the specified key in the hash. The absolute value of klen is the length of the key. If klen is negative the key is assumed to be in UTF-8-encoded Unicode. If lval is set then the fetch will be part of a store. This means that if there is no value in the hash associated with the given key, then one is created and a pointer to it is returned. The SV* it points to can be assigned to. But always check that the return value is non-null before dereferencing it to an SV*.
See Understanding the Magic of Tied Hashes and Arrays in perlguts for more information on how to use this function on tied hashes.
- SV** hv_fetch(HV *hv, const char *key, I32 klen,
- I32 lval)
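The contrast between a plain lookup and an lvalue fetch can be sketched as follows (not from the original documentation; assumes an XS context with an existing HV *hv):

```c
/* Sketch: plain fetch vs. lvalue fetch. The NULL check before
   dereferencing the returned SV** is mandatory in both cases. */
SV **svp = hv_fetch(hv, "answer", 6, 0);   /* plain lookup */
if (svp)
    PerlIO_printf(Perl_debug_log, "%" IVdf "\n", SvIV(*svp));

svp = hv_fetch(hv, "answer", 6, 1);        /* lval: creates the slot if absent */
if (svp)
    sv_setiv(*svp, 42);                    /* the slot can be assigned through */
```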
Like hv_fetch, but takes a literal string instead of a string/length pair.
- SV** hv_fetchs(HV* tb, const char* key, I32 lval)
Returns the hash entry which corresponds to the specified key in the hash. hash must be a valid precomputed hash number for the given key, or 0 if you want the function to compute it. If lval is set then the fetch will be part of a store. Make sure the return value is non-null before accessing it. The return value when hv is a tied hash is a pointer to a static location, so be sure to make a copy of the structure if you need to store it somewhere.
See Understanding the Magic of Tied Hashes and Arrays in perlguts for more information on how to use this function on tied hashes.
- HE* hv_fetch_ent(HV *hv, SV *keysv, I32 lval,
- U32 hash)
Returns the number of hash buckets that happen to be in use. This function is wrapped by the macro HvFILL.
Previously this value was stored in the HV structure, rather than being calculated on demand.
- STRLEN hv_fill(HV const *const hv)
Prepares a starting point to traverse a hash table. Returns the number of keys in the hash (i.e. the same as HvUSEDKEYS(hv)). The return value is currently only meaningful for hashes without tie magic.
NOTE: Before version 5.004_65, hv_iterinit used to return the number of hash buckets that happen to be in use. If you still need that esoteric value, you can get it through the macro HvFILL(hv).
- I32 hv_iterinit(HV *hv)
Returns the key from the current position of the hash iterator. See hv_iterinit.
- char* hv_iterkey(HE* entry, I32* retlen)
Returns the key as an SV* from the current position of the hash iterator. The return value will always be a mortal copy of the key. Also see hv_iterinit.
- SV* hv_iterkeysv(HE* entry)
Returns entries from a hash iterator. See hv_iterinit.
You may call hv_delete or hv_delete_ent on the hash entry that the iterator currently points to, without losing your place or invalidating your iterator. Note that in this case the current entry is deleted from the hash with your iterator holding the last reference to it. Your iterator is flagged to free the entry on the next call to hv_iternext, so you must not discard your iterator immediately else the entry will leak - call hv_iternext to trigger the resource deallocation.
- HE* hv_iternext(HV *hv)
Performs an hv_iternext, hv_iterkey, and hv_iterval in one operation.
- SV* hv_iternextsv(HV *hv, char **key, I32 *retlen)
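The combined call makes the common key/value loop compact; as a sketch (not from the original documentation; assumes an XS context with an existing HV *hv):

```c
/* Sketch: iterate key => value pairs with hv_iternextsv.
   %.*s is used because keys may contain embedded nulls. */
char *key;
I32 klen;
SV *val;
hv_iterinit(hv);
while ((val = hv_iternextsv(hv, &key, &klen)))
    PerlIO_printf(Perl_debug_log, "%.*s => %s\n",
                  (int)klen, key, SvPV_nolen(val));
```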
Returns entries from a hash iterator. See hv_iterinit and hv_iternext. The flags value will normally be zero; if HV_ITERNEXT_WANTPLACEHOLDERS is set the placeholders keys (for restricted hashes) will be returned in addition to normal keys. By default placeholders are automatically skipped over. Currently a placeholder is implemented with a value that is &PL_sv_placeholder. Note that the implementation of placeholders and restricted hashes may change, and the implementation currently is insufficiently abstracted for any change to be tidy.
NOTE: this function is experimental and may change or be removed without notice.
- HE* hv_iternext_flags(HV *hv, I32 flags)
Returns the value from the current position of the hash iterator. See hv_iterkey.
- SV* hv_iterval(HV *hv, HE *entry)
Adds magic to a hash. See sv_magic.
- void hv_magic(HV *hv, GV *gv, int how)
Evaluates the hash in scalar context and returns the result. Handles magic when the hash is tied.
- SV* hv_scalar(HV *hv)
Stores an SV in a hash. The hash key is specified as key and the absolute value of klen is the length of the key. If klen is negative the key is assumed to be in UTF-8-encoded Unicode. The hash parameter is the precomputed hash value; if it is zero then Perl will compute it.
The return value will be NULL if the operation failed or if the value did not need to be actually stored within the hash (as in the case of tied hashes). Otherwise it can be dereferenced to get the original SV*. Note that the caller is responsible for suitably incrementing the reference count of val before the call, and decrementing it if the function returned NULL. Effectively a successful hv_store takes ownership of one reference to val. This is usually what you want; a newly created SV has a reference count of one, so if all your code does is create SVs then store them in a hash, hv_store will own the only reference to the new SV, and your code doesn't need to do anything further to tidy up. hv_store is not implemented as a call to hv_store_ent, and does not create a temporary SV for the key, so if your key data is not already in SV form then use hv_store in preference to hv_store_ent.
See Understanding the Magic of Tied Hashes and Arrays in perlguts for more information on how to use this function on tied hashes.
- SV** hv_store(HV *hv, const char *key, I32 klen,
- SV *val, U32 hash)
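The reference-count rule above can be sketched in a few lines (not from the original documentation; assumes an XS context with an existing HV *hv):

```c
/* Sketch: a fresh SV has refcount 1. A successful hv_store takes over
   that reference; on failure (e.g. a tied hash) we must drop it ourselves. */
SV *val = newSViv(42);
if (!hv_store(hv, "answer", 6, val, 0))
    SvREFCNT_dec(val);   /* store failed: release our only reference */
```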
Like hv_store, but takes a literal string instead of a string/length pair and omits the hash parameter.
- SV** hv_stores(HV* tb, const char* key,
- NULLOK SV* val)
Stores val in a hash. The hash key is specified as key. The hash parameter is the precomputed hash value; if it is zero then Perl will compute it. The return value is the new hash entry so created. It will be NULL if the operation failed or if the value did not need to be actually stored within the hash (as in the case of tied hashes). Otherwise the contents of the return value can be accessed using the He? macros described here. Note that the caller is responsible for suitably incrementing the reference count of val before the call, and decrementing it if the function returned NULL. Effectively a successful hv_store_ent takes ownership of one reference to val. This is usually what you want; a newly created SV has a reference count of one, so if all your code does is create SVs then store them in a hash, hv_store will own the only reference to the new SV, and your code doesn't need to do anything further to tidy up. Note that hv_store_ent only reads the key; unlike val it does not take ownership of it, so maintaining the correct reference count on key is entirely the caller's responsibility. hv_store is not implemented as a call to hv_store_ent, and does not create a temporary SV for the key, so if your key data is not already in SV form then use hv_store in preference to hv_store_ent.
See Understanding the Magic of Tied Hashes and Arrays in perlguts for more information on how to use this function on tied hashes.
- HE* hv_store_ent(HV *hv, SV *key, SV *val, U32 hash)
Undefines the hash. The XS equivalent of undef(%hash).
As well as freeing all the elements of the hash (like hv_clear()), this also frees any auxiliary data and storage associated with the hash.
If any destructors are triggered as a result, the hv itself may be freed.
See also hv_clear.
- void hv_undef(HV *hv)
Creates a new HV. The reference count is set to 1.
- HV* newHV()
Puts a C function into the chain of check functions for a specified op type. This is the preferred way to manipulate the PL_check array. opcode specifies which type of op is to be affected. new_checker is a pointer to the C function that is to be added to that opcode's check chain, and old_checker_p points to the storage location where a pointer to the next function in the chain will be stored. The value of new_checker is written into the PL_check array, while the value previously stored there is written to *old_checker_p.
PL_check is global to an entire process, and a module wishing to hook op checking may find itself invoked more than once per process, typically in different threads. To handle that situation, this function is idempotent. The location *old_checker_p must initially (once per process) contain a null pointer. A C variable of static duration (declared at file scope, typically also marked static to give it internal linkage) will be implicitly initialised appropriately, if it does not have an explicit initialiser. This function will only actually modify the check chain if it finds *old_checker_p to be null.
This function is also thread safe on the small scale. It uses appropriate locking to avoid race conditions in accessing PL_check.
When this function is called, the function referenced by new_checker must be ready to be called, except for *old_checker_p being unfilled. In a threading situation, new_checker may be called immediately, even before this function has returned. *old_checker_p will always be appropriately set before new_checker is called. If new_checker decides not to do anything special with an op that it is given (which is the usual case for most uses of op check hooking), it must chain the check function referenced by *old_checker_p.
If you want to influence compilation of calls to a specific subroutine, then use cv_set_call_checker rather than hooking checking of all entersub ops.
- void wrap_op_checker(Optype opcode,
- Perl_check_t new_checker,
- Perl_check_t *old_checker_p)
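Following the idempotence rules above, a hook for entersub ops might be sketched like this (not from the original documentation; the function names here are hypothetical):

```c
/* Sketch: hook the check chain for entersub. The static slot starts
   out implicitly NULL, once per process, as the rules require. */
static Perl_check_t prev_entersub_check;

static OP *
my_entersub_check(pTHX_ OP *o)
{
    /* ... inspect or rewrite o here ... */
    return prev_entersub_check(aTHX_ o);  /* always chain to the old checker */
}

/* In the module's bootstrap code: */
wrap_op_checker(OP_ENTERSUB, my_entersub_check, &prev_entersub_check);
```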
Indicates whether the octets in the lexer buffer (PL_parser->linestr) should be interpreted as the UTF-8 encoding of Unicode characters. If not, they should be interpreted as Latin-1 characters. This is analogous to the SvUTF8 flag for scalars.
In UTF-8 mode, it is not guaranteed that the lexer buffer actually contains valid UTF-8. Lexing code must be robust in the face of invalid encoding.
The actual SvUTF8 flag of the PL_parser->linestr scalar is significant, but not the whole story regarding the input character encoding. Normally, when a file is being read, the scalar contains octets and its SvUTF8 flag is off, but the octets should be interpreted as UTF-8 if the use utf8 pragma is in effect. During a string eval, however, the scalar may have the SvUTF8 flag on, and in this case its octets should be interpreted as UTF-8 unless the use bytes pragma is in effect. This logic may change in the future; use this function instead of implementing the logic yourself.
NOTE: this function is experimental and may change or be removed without notice.
- bool lex_bufutf8()
Discards the first part of the PL_parser->linestr buffer, up to ptr. The remaining content of the buffer will be moved, and all pointers into the buffer updated appropriately. ptr must not be later in the buffer than the position of PL_parser->bufptr: it is not permitted to discard text that has yet to be lexed.
Normally it is not necessary to do this directly, because it suffices to use the implicit discarding behaviour of lex_next_chunk and things based on it. However, if a token stretches across multiple lines, and the lexing code has kept multiple lines of text in the buffer for that purpose, then after completion of the token it would be wise to explicitly discard the now-unneeded earlier lines, to avoid future multi-line tokens growing the buffer without bound.
NOTE: this function is experimental and may change or be removed without notice.
- void lex_discard_to(char *ptr)
Reallocates the lexer buffer (PL_parser->linestr) to accommodate at least len octets (including terminating NUL). Returns a pointer to the reallocated buffer. This is necessary before making any direct modification of the buffer that would increase its length. lex_stuff_pvn provides a more convenient way to insert text into the buffer.
Do not use SvGROW or sv_grow directly on PL_parser->linestr; this function updates all of the lexer's variables that point directly into the buffer.
NOTE: this function is experimental and may change or be removed without notice.
- char * lex_grow_linestr(STRLEN len)
Reads in the next chunk of text to be lexed, appending it to PL_parser->linestr. This should be called when lexing code has looked to the end of the current chunk and wants to know more. It is usual, but not necessary, for lexing to have consumed the entirety of the current chunk at this time.
If PL_parser->bufptr is pointing to the very end of the current chunk (i.e., the current chunk has been entirely consumed), normally the current chunk will be discarded at the same time that the new chunk is read in. If flags includes LEX_KEEP_PREVIOUS, the current chunk will not be discarded. If the current chunk has not been entirely consumed, then it will not be discarded regardless of the flag.
Returns true if some new text was added to the buffer, or false if the buffer has reached the end of the input text.
NOTE: this function is experimental and may change or be removed without notice.
- bool lex_next_chunk(U32 flags)
Looks ahead one (Unicode) character in the text currently being lexed. Returns the codepoint (unsigned integer value) of the next character, or -1 if lexing has reached the end of the input text. To consume the peeked character, use lex_read_unichar.
If the next character is in (or extends into) the next chunk of input text, the next chunk will be read in. Normally the current chunk will be discarded at the same time, but if flags includes LEX_KEEP_PREVIOUS then the current chunk will not be discarded.
If the input is being interpreted as UTF-8 and a UTF-8 encoding error is encountered, an exception is generated.
NOTE: this function is experimental and may change or be removed without notice.
- I32 lex_peek_unichar(U32 flags)
Reads optional spaces, in Perl style, in the text currently being lexed. The spaces may include ordinary whitespace characters and Perl-style comments. #line directives are processed if encountered. PL_parser->bufptr is moved past the spaces, so that it points at a non-space character (or the end of the input text).
If spaces extend into the next chunk of input text, the next chunk will be read in. Normally the current chunk will be discarded at the same time, but if flags includes LEX_KEEP_PREVIOUS then the current chunk will not be discarded.
NOTE: this function is experimental and may change or be removed without notice.
- void lex_read_space(U32 flags)
Consume text in the lexer buffer, from PL_parser->bufptr up to ptr. This advances PL_parser->bufptr to match ptr, performing the correct bookkeeping whenever a newline character is passed. This is the normal way to consume lexed text.
Interpretation of the buffer's octets can be abstracted out by using the slightly higher-level functions lex_peek_unichar and lex_read_unichar.
NOTE: this function is experimental and may change or be removed without notice.
- void lex_read_to(char *ptr)
Reads the next (Unicode) character in the text currently being lexed. Returns the codepoint (unsigned integer value) of the character read, and moves PL_parser->bufptr past the character, or returns -1 if lexing has reached the end of the input text. To non-destructively examine the next character, use lex_peek_unichar instead.
If the next character is in (or extends into) the next chunk of input text, the next chunk will be read in. Normally the current chunk will be discarded at the same time, but if flags includes LEX_KEEP_PREVIOUS then the current chunk will not be discarded.
If the input is being interpreted as UTF-8 and a UTF-8 encoding error is encountered, an exception is generated.
NOTE: this function is experimental and may change or be removed without notice.
- I32 lex_read_unichar(U32 flags)
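The peek/read pair is typically used together; a minimal sketch (not from the original documentation, and these functions are experimental):

```c
/* Sketch: consume a run of ASCII word characters from the lexer buffer.
   lex_peek_unichar looks without consuming; lex_read_unichar consumes. */
I32 c;
while ((c = lex_peek_unichar(0)) != -1 && isWORDCHAR((char)c))
    lex_read_unichar(0);   /* consume the character we just peeked */
```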
Creates and initialises a new lexer/parser state object, supplying a context in which to lex and parse from a new source of Perl code. A pointer to the new state object is placed in PL_parser. An entry is made on the save stack so that upon unwinding the new state object will be destroyed and the former value of PL_parser will be restored. Nothing else need be done to clean up the parsing context.
The code to be parsed comes from line and rsfp. line, if non-null, provides a string (in SV form) containing code to be parsed. A copy of the string is made, so subsequent modification of line does not affect parsing. rsfp, if non-null, provides an input stream from which code will be read to be parsed. If both are non-null, the code in line comes first and must consist of complete lines of input, and rsfp supplies the remainder of the source.
The flags parameter is reserved for future use. Currently it is only used by perl internally, so extensions should always pass zero.
NOTE: this function is experimental and may change or be removed without notice.
- void lex_start(SV *line, PerlIO *rsfp, U32 flags)
Insert characters into the lexer buffer (PL_parser->linestr), immediately after the current lexing point (PL_parser->bufptr), reallocating the buffer if necessary. This means that lexing code that runs later will see the characters as if they had appeared in the input. It is not recommended to do this as part of normal parsing, and most uses of this facility run the risk of the inserted characters being interpreted in an unintended manner.
The string to be inserted is represented by octets starting at pv and continuing to the first nul. These octets are interpreted as either UTF-8 or Latin-1, according to whether the LEX_STUFF_UTF8 flag is set in flags. The characters are recoded for the lexer buffer, according to how the buffer is currently being interpreted (lex_bufutf8). If it is not convenient to nul-terminate a string to be inserted, the lex_stuff_pvn function is more appropriate.
NOTE: this function is experimental and may change or be removed without notice.
- void lex_stuff_pv(const char *pv, U32 flags)
Insert characters into the lexer buffer (PL_parser->linestr), immediately after the current lexing point (PL_parser->bufptr), reallocating the buffer if necessary. This means that lexing code that runs later will see the characters as if they had appeared in the input. It is not recommended to do this as part of normal parsing, and most uses of this facility run the risk of the inserted characters being interpreted in an unintended manner.
The string to be inserted is represented by len octets starting at pv. These octets are interpreted as either UTF-8 or Latin-1, according to whether the LEX_STUFF_UTF8 flag is set in flags. The characters are recoded for the lexer buffer, according to how the buffer is currently being interpreted (lex_bufutf8). If a string to be inserted is available as a Perl scalar, the lex_stuff_sv function is more convenient.
NOTE: this function is experimental and may change or be removed without notice.
- void lex_stuff_pvn(const char *pv, STRLEN len,
- U32 flags)
Like lex_stuff_pvn, but takes a literal string instead of a string/length pair.
NOTE: this function is experimental and may change or be removed without notice.
- void lex_stuff_pvs(const char *pv, U32 flags)
Insert characters into the lexer buffer (PL_parser->linestr), immediately after the current lexing point (PL_parser->bufptr), reallocating the buffer if necessary. This means that lexing code that runs later will see the characters as if they had appeared in the input. It is not recommended to do this as part of normal parsing, and most uses of this facility run the risk of the inserted characters being interpreted in an unintended manner.
The string to be inserted is the string value of sv. The characters are recoded for the lexer buffer, according to how the buffer is currently being interpreted (lex_bufutf8). If a string to be inserted is not already a Perl scalar, the lex_stuff_pvn function avoids the need to construct a scalar.
NOTE: this function is experimental and may change or be removed without notice.
- void lex_stuff_sv(SV *sv, U32 flags)
Discards text about to be lexed, from PL_parser->bufptr up to ptr. Text following ptr will be moved, and the buffer shortened. This hides the discarded text from any lexing code that runs later, as if the text had never appeared.
This is not the normal way to consume lexed text. For that, use lex_read_to.
NOTE: this function is experimental and may change or be removed without notice.
- void lex_unstuff(char *ptr)
Parse a Perl arithmetic expression. This may contain operators of precedence down to the bit shift operators. The expression must be followed (and thus terminated) either by a comparison or lower-precedence operator or by something that would normally terminate an expression such as semicolon. If flags includes PARSE_OPTIONAL then the expression is optional, otherwise it is mandatory. It is up to the caller to ensure that the dynamic parser state (PL_parser et al) is correctly set to reflect the source of the code to be parsed and the lexical context for the expression.
The op tree representing the expression is returned. If an optional expression is absent, a null pointer is returned, otherwise the pointer will be non-null.
If an error occurs in parsing or compilation, in most cases a valid op tree is returned anyway. The error is reflected in the parser state, normally resulting in a single exception at the top level of parsing which covers all the compilation errors that occurred. Some compilation errors, however, will throw an exception immediately.
NOTE: this function is experimental and may change or be removed without notice.
- OP * parse_arithexpr(U32 flags)
Parse a single unadorned Perl statement. This may be a normal imperative statement or a declaration that has compile-time effect. It does not include any label or other affixture. It is up to the caller to ensure that the dynamic parser state (PL_parser et al) is correctly set to reflect the source of the code to be parsed and the lexical context for the statement.
The op tree representing the statement is returned. This may be a null pointer if the statement is null, for example if it was actually a subroutine definition (which has compile-time side effects). If not null, it will be ops directly implementing the statement, suitable to pass to newSTATEOP. It will not normally include a nextstate or equivalent op (except for those embedded in a scope contained entirely within the statement).
If an error occurs in parsing or compilation, in most cases a valid op tree (most likely null) is returned anyway. The error is reflected in the parser state, normally resulting in a single exception at the top level of parsing which covers all the compilation errors that occurred. Some compilation errors, however, will throw an exception immediately.
The flags parameter is reserved for future use, and must always be zero.
NOTE: this function is experimental and may change or be removed without notice.
- OP * parse_barestmt(U32 flags)
Parse a single complete Perl code block. This consists of an opening brace, a sequence of statements, and a closing brace. The block constitutes a lexical scope, so my variables and various compile-time effects can be contained within it. It is up to the caller to ensure that the dynamic parser state (PL_parser et al) is correctly set to reflect the source of the code to be parsed and the lexical context for the statement.
The op tree representing the code block is returned. This is always a real op, never a null pointer. It will normally be a lineseq list, including nextstate or equivalent ops. No ops to construct any kind of runtime scope are included by virtue of it being a block.
If an error occurs in parsing or compilation, in most cases a valid op tree (most likely null) is returned anyway. The error is reflected in the parser state, normally resulting in a single exception at the top level of parsing which covers all the compilation errors that occurred. Some compilation errors, however, will throw an exception immediately.
The flags parameter is reserved for future use, and must always be zero.
NOTE: this function is experimental and may change or be removed without notice.
- OP * parse_block(U32 flags)
Parse a single complete Perl expression. This allows the full expression grammar, including the lowest-precedence operators such as or. The expression must be followed (and thus terminated) by a token that an expression would normally be terminated by: end-of-file, closing bracketing punctuation, semicolon, or one of the keywords that signals a postfix expression-statement modifier. If flags includes PARSE_OPTIONAL then the expression is optional, otherwise it is mandatory. It is up to the caller to ensure that the dynamic parser state (PL_parser et al) is correctly set to reflect the source of the code to be parsed and the lexical context for the expression.
The op tree representing the expression is returned. If an optional expression is absent, a null pointer is returned, otherwise the pointer will be non-null.
If an error occurs in parsing or compilation, in most cases a valid op tree is returned anyway. The error is reflected in the parser state, normally resulting in a single exception at the top level of parsing which covers all the compilation errors that occurred. Some compilation errors, however, will throw an exception immediately.
NOTE: this function is experimental and may change or be removed without notice.
- OP * parse_fullexpr(U32 flags)
Parse a single complete Perl statement. This may be a normal imperative statement or a declaration that has compile-time effect, and may include optional labels. It is up to the caller to ensure that the dynamic parser state (PL_parser et al) is correctly set to reflect the source of the code to be parsed and the lexical context for the statement.
The op tree representing the statement is returned. This may be a null pointer if the statement is null, for example if it was actually a subroutine definition (which has compile-time side effects). If not null, it will be the result of a newSTATEOP call, normally including a nextstate or equivalent op.
If an error occurs in parsing or compilation, in most cases a valid op tree (most likely null) is returned anyway. The error is reflected in the parser state, normally resulting in a single exception at the top level of parsing which covers all the compilation errors that occurred. Some compilation errors, however, will throw an exception immediately.
The flags parameter is reserved for future use, and must always be zero.
NOTE: this function is experimental and may change or be removed without notice.
- OP * parse_fullstmt(U32 flags)
Parse a single label, possibly optional, of the type that may prefix a Perl statement. It is up to the caller to ensure that the dynamic parser state (PL_parser et al) is correctly set to reflect the source of the code to be parsed. If flags includes PARSE_OPTIONAL then the label is optional, otherwise it is mandatory.
The name of the label is returned in the form of a fresh scalar. If an optional label is absent, a null pointer is returned.
If an error occurs in parsing, which can only occur if the label is mandatory, a valid label is returned anyway. The error is reflected in the parser state, normally resulting in a single exception at the top level of parsing which covers all the compilation errors that occurred.
NOTE: this function is experimental and may change or be removed without notice.
- SV * parse_label(U32 flags)
Parse a Perl list expression. This may contain operators of precedence down to the comma operator. The expression must be followed (and thus terminated) either by a low-precedence logic operator such as or, or by something that would normally terminate an expression such as semicolon. If flags includes PARSE_OPTIONAL then the expression is optional, otherwise it is mandatory. It is up to the caller to ensure that the dynamic parser state (PL_parser et al) is correctly set to reflect the source of the code to be parsed and the lexical context for the expression.
The op tree representing the expression is returned. If an optional expression is absent, a null pointer is returned, otherwise the pointer will be non-null.
If an error occurs in parsing or compilation, in most cases a valid op tree is returned anyway. The error is reflected in the parser state, normally resulting in a single exception at the top level of parsing which covers all the compilation errors that occurred. Some compilation errors, however, will throw an exception immediately.
NOTE: this function is experimental and may change or be removed without notice.
- OP * parse_listexpr(U32 flags)
Parse a sequence of zero or more Perl statements. These may be normal imperative statements, including optional labels, or declarations that have compile-time effect, or any mixture thereof. The statement sequence ends when a closing brace or end-of-file is encountered in a place where a new statement could have validly started. It is up to the caller to ensure that the dynamic parser state (PL_parser et al) is correctly set to reflect the source of the code to be parsed and the lexical context for the statements.
The op tree representing the statement sequence is returned. This may be a null pointer if the statements were all null, for example if there were no statements or if there were only subroutine definitions (which have compile-time side effects). If not null, it will be a lineseq list, normally including nextstate or equivalent ops.
If an error occurs in parsing or compilation, in most cases a valid op tree is returned anyway. The error is reflected in the parser state, normally resulting in a single exception at the top level of parsing which covers all the compilation errors that occurred. Some compilation errors, however, will throw an exception immediately.
The flags parameter is reserved for future use, and must always be zero.
NOTE: this function is experimental and may change or be removed without notice.
- OP * parse_stmtseq(U32 flags)
Parse a Perl term expression. This may contain operators of precedence down to the assignment operators. The expression must be followed (and thus terminated) either by a comma or lower-precedence operator or by something that would normally terminate an expression such as semicolon. If flags includes PARSE_OPTIONAL then the expression is optional, otherwise it is mandatory. It is up to the caller to ensure that the dynamic parser state (PL_parser et al) is correctly set to reflect the source of the code to be parsed and the lexical context for the expression.
The op tree representing the expression is returned. If an optional expression is absent, a null pointer is returned, otherwise the pointer will be non-null.
If an error occurs in parsing or compilation, in most cases a valid op tree is returned anyway. The error is reflected in the parser state, normally resulting in a single exception at the top level of parsing which covers all the compilation errors that occurred. Some compilation errors, however, will throw an exception immediately.
NOTE: this function is experimental and may change or be removed without notice.
- OP * parse_termexpr(U32 flags)
Pointer to a structure encapsulating the state of the parsing operation currently in progress. The pointer can be locally changed to perform a nested parse without interfering with the state of an outer parse. Individual members of PL_parser have their own documentation.
Direct pointer to the end of the chunk of text currently being lexed, the end of the lexer buffer. This is equal to SvPVX(PL_parser->linestr) + SvCUR(PL_parser->linestr). A NUL character (zero octet) is always located at the end of the buffer, and does not count as part of the buffer's contents.
NOTE: this function is experimental and may change or be removed without notice.
Points to the current position of lexing inside the lexer buffer.
Characters around this point may be freely examined, within
the range delimited by SvPVX(PL_parser->linestr) and
PL_parser->bufend. The octets of the buffer may be intended to be
interpreted as either UTF-8 or Latin-1, as indicated by lex_bufutf8.
Lexing code (whether in the Perl core or not) moves this pointer past the characters that it consumes. It is also expected to perform some bookkeeping whenever a newline character is consumed. This movement can be more conveniently performed by the function lex_read_to, which handles newlines appropriately.
Interpretation of the buffer's octets can be abstracted out by using the slightly higher-level functions lex_peek_unichar and lex_read_unichar.
NOTE: this function is experimental and may change or be removed without notice.
Points to the start of the current line inside the lexer buffer. This is useful for indicating at which column an error occurred, and not much else. This must be updated by any lexing code that consumes a newline; the function lex_read_to handles this detail.
NOTE: this function is experimental and may change or be removed without notice.
Buffer scalar containing the chunk currently under consideration of the text currently being lexed. This is always a plain string scalar (for which SvPOK is true). It is not intended to be used as a scalar by normal scalar means; instead refer to the buffer directly by the pointer variables described below.
The lexer maintains various char* pointers to things in the PL_parser->linestr buffer. If PL_parser->linestr is ever reallocated, all of these pointers must be updated. Don't attempt to do this manually, but rather use lex_grow_linestr if you need to reallocate the buffer.
The content of the text chunk in the buffer is commonly exactly one
complete line of input, up to and including a newline terminator,
but there are situations where it is otherwise. The octets of the
buffer may be intended to be interpreted as either UTF-8 or Latin-1.
The function lex_bufutf8 tells you which. Do not use the SvUTF8
flag on this scalar, which may disagree with it.
For direct examination of the buffer, the variable PL_parser->bufend points to the end of the buffer. The current lexing position is pointed to by PL_parser->bufptr. Direct use of these pointers is usually preferable to examination of the scalar through normal scalar means.
NOTE: this function is experimental and may change or be removed without notice.
Clear something magical that the SV represents. See sv_magic.
- int mg_clear(SV* sv)
Copies the magic from one SV to another. See sv_magic.
- int mg_copy(SV *sv, SV *nsv, const char *key,
- I32 klen)
Finds the magic pointer for type matching the SV. See sv_magic.
- MAGIC* mg_find(const SV* sv, int type)
Finds the magic pointer of type with the given vtbl for the SV. See sv_magicext.
- MAGIC* mg_findext(const SV* sv, int type,
- const MGVTBL *vtbl)
Free any magic storage used by the SV. See sv_magic.
- int mg_free(SV* sv)
Remove any magic of type how from the SV sv. See sv_magic.
- void mg_free_type(SV *sv, int how)
Do magic before a value is retrieved from the SV. The type of SV must be >= SVt_PVMG. See sv_magic.
- int mg_get(SV* sv)
This function is deprecated.
It reports on the SV's length in bytes, calling length magic if available, but does not set the UTF8 flag on the sv. It will fall back to 'get' magic if there is no 'length' magic, but with no indication as to whether it called 'get' magic. It assumes the sv is a PVMG or higher. Use sv_len() instead.
- U32 mg_length(SV* sv)
Turns on the magical status of an SV. See sv_magic.
- void mg_magical(SV* sv)
Do magic after a value is assigned to the SV. See sv_magic.
- int mg_set(SV* sv)
Invokes mg_get on an SV if it has 'get' magic. For example, this will call FETCH on a tied variable. This macro evaluates its argument more than once.
- void SvGETMAGIC(SV* sv)
Arranges for a mutual exclusion lock to be obtained on sv if a suitable module has been loaded.
- void SvLOCK(SV* sv)
Invokes mg_set on an SV if it has 'set' magic. This is necessary after modifying a scalar, in case it is a magical variable like $| or a tied variable (it calls STORE). This macro evaluates its argument more than once.
- void SvSETMAGIC(SV* sv)
Like SvSetSV, but does any set magic required afterwards.
- void SvSetMagicSV(SV* dsb, SV* ssv)
Like SvSetSV_nosteal, but does any set magic required afterwards.
- void SvSetMagicSV_nosteal(SV* dsv, SV* ssv)
Calls sv_setsv if dsv is not the same as ssv. May evaluate arguments more than once.
- void SvSetSV(SV* dsb, SV* ssv)
Calls a non-destructive version of sv_setsv if dsv is not the same as ssv. May evaluate arguments more than once.
- void SvSetSV_nosteal(SV* dsv, SV* ssv)
Arranges for sv to be shared between threads if a suitable module has been loaded.
- void SvSHARE(SV* sv)
Releases a mutual exclusion lock on sv if a suitable module has been loaded.
- void SvUNLOCK(SV* sv)
The XSUB-writer's interface to the C memcpy function. The src is the source, dest is the destination, nitems is the number of items, and type is the type. May fail on overlapping copies. See also Move.
- void Copy(void* src, void* dest, int nitems, type)
Like Copy but returns dest. Useful for encouraging compilers to tail-call optimise.
- void * CopyD(void* src, void* dest, int nitems, type)
The XSUB-writer's interface to the C memmove function. The src is the source, dest is the destination, nitems is the number of items, and type is the type. Can do overlapping moves. See also Copy.
- void Move(void* src, void* dest, int nitems, type)
Like Move but returns dest. Useful for encouraging compilers to tail-call optimise.
- void * MoveD(void* src, void* dest, int nitems, type)
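The overlap distinction between Copy and Move can be sketched in plain C. This is an illustrative analogue (the names here are not the perl.h macro definitions): Copy corresponds to memcpy, for which overlapping regions are undefined behaviour, while Move corresponds to memmove, which handles overlap correctly.

```c
#include <string.h>

/* Hedged plain-C sketch of the documented Copy/Move semantics
 * (illustrative names; NOT the perl.h definitions). */
#define Copy_sketch(src, dest, nitems, type) \
    ((void)memcpy((dest), (src), (nitems) * sizeof(type)))
#define Move_sketch(src, dest, nitems, type) \
    ((void)memmove((dest), (src), (nitems) * sizeof(type)))

/* Overlapping shift: move {1,2,3} one slot right inside the same
 * array, which only Move (memmove) is guaranteed to do correctly. */
static int overlapping_move_demo(void) {
    int a[4] = {1, 2, 3, 4};
    Move_sketch(a, a + 1, 3, int);
    return a[3];            /* last element is now the old a[2] */
}
```

Using Copy_sketch for the overlapping case above would be undefined behaviour, which is exactly why the table of macros distinguishes the two.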
The XSUB-writer's interface to the C malloc function.
In 5.9.3, Newx() and friends replace the older New() API, and drop the first parameter, x, a debug aid which allowed callers to identify themselves. This aid has been superseded by a new build option, PERL_MEM_LOG (see PERL_MEM_LOG in perlhacktips). The older API is still there for use in XS modules supporting older perls.
- void Newx(void* ptr, int nitems, type)
The XSUB-writer's interface to the C malloc function, with cast. See also Newx.
- void Newxc(void* ptr, int nitems, type, cast)
The XSUB-writer's interface to the C malloc function. The allocated memory is zeroed with memzero. See also Newx.
- void Newxz(void* ptr, int nitems, type)
PoisonWith(0xEF) for catching access to freed memory.
- void Poison(void* dest, int nitems, type)
PoisonWith(0xEF) for catching access to freed memory.
- void PoisonFree(void* dest, int nitems, type)
PoisonWith(0xAB) for catching access to allocated but uninitialized memory.
- void PoisonNew(void* dest, int nitems, type)
Fill up memory with a byte pattern (a byte repeated over and over again) that hopefully catches attempts to access uninitialized memory.
- void PoisonWith(void* dest, int nitems, type,
- U8 byte)
The XSUB-writer's interface to the C realloc function.
- void Renew(void* ptr, int nitems, type)
The XSUB-writer's interface to the C realloc function, with cast.
- void Renewc(void* ptr, int nitems, type, cast)
The XSUB-writer's interface to the C free function.
- void Safefree(void* ptr)
Perl's version of strdup(). Returns a pointer to a newly allocated string which is a duplicate of pv. The size of the string is determined by strlen(). The memory allocated for the new string can be freed with the Safefree() function.
- char* savepv(const char* pv)
Perl's version of what strndup() would be if it existed. Returns a pointer to a newly allocated string which is a duplicate of the first len bytes from pv, plus a trailing NUL byte. The memory allocated for the new string can be freed with the Safefree() function.
- char* savepvn(const char* pv, I32 len)
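The savepvn contract (copy exactly len bytes and NUL-terminate) can be sketched in standard C. This is an illustrative analogue, not the perl core implementation; it uses malloc/free where real XS code would pair the result with Safefree().

```c
#include <stdlib.h>
#include <string.h>

/* Hedged sketch of savepvn()'s documented behaviour: duplicate the
 * first len bytes of pv and append a trailing NUL. Caller frees
 * the result (with free() here; Safefree() in real XS code). */
static char *savepvn_sketch(const char *pv, size_t len) {
    char *dup = (char *)malloc(len + 1);
    if (dup != NULL) {
        memcpy(dup, pv, len);   /* copy exactly len bytes... */
        dup[len] = '\0';        /* ...then NUL-terminate */
    }
    return dup;
}
```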
Like savepvn, but takes a literal string instead of a string/length pair.
- char* savepvs(const char* s)
A version of savepv() which allocates the duplicate string in memory which is shared between threads.
- char* savesharedpv(const char* pv)
A version of savepvn() which allocates the duplicate string in memory which is shared between threads. (With the specific difference that a NULL pointer is not acceptable.)
- char* savesharedpvn(const char *const pv,
- const STRLEN len)
A version of savepvs() which allocates the duplicate string in memory which is shared between threads.
- char* savesharedpvs(const char* s)
A version of savesharedpv() which allocates the duplicate string in memory which is shared between threads.
- char* savesharedsvpv(SV *sv)
A version of savepv()/savepvn() which gets the string to duplicate from the passed in SV using SvPV().
- char* savesvpv(SV* sv)
This is an architecture-independent macro to copy one structure to another.
- void StructCopy(type *src, type *dest, type)
The XSUB-writer's interface to the C memzero function. The dest is the destination, nitems is the number of items, and type is the type.
- void Zero(void* dest, int nitems, type)
Like Zero but returns dest. Useful for encouraging compilers to tail-call optimise.
- void * ZeroD(void* dest, int nitems, type)
Analyses the string in order to make fast searches on it using fbm_instr() -- the Boyer-Moore algorithm.
- void fbm_compile(SV* sv, U32 flags)
Returns the location of the SV in the string delimited by big and bigend. It returns NULL if the string can't be found. The sv does not have to be fbm_compiled, but the search will not be as fast then.
- char* fbm_instr(unsigned char* big,
- unsigned char* bigend, SV* littlestr,
- U32 flags)
Returns true if the leading len bytes of the strings s1 and s2 are the same case-insensitively; false otherwise. Uppercase and lowercase ASCII range bytes match themselves and their opposite case counterparts. Non-cased and non-ASCII range bytes match only themselves.
- I32 foldEQ(const char* a, const char* b, I32 len)
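The ASCII folding rule that foldEQ documents can be sketched in standard C. This is an illustration of the stated semantics, not the core implementation: cased ASCII bytes match either case, and all other bytes must match exactly (tolower() in the default C locale folds only ASCII letters, which matches that rule).

```c
#include <ctype.h>

/* Hedged sketch of foldEQ's documented ASCII semantics. */
static int foldEQ_sketch(const char *s1, const char *s2, int len) {
    int i;
    for (i = 0; i < len; i++) {
        unsigned char a = (unsigned char)s1[i];
        unsigned char b = (unsigned char)s2[i];
        if (a != b && tolower(a) != tolower(b))
            return 0;   /* mismatch even after ASCII case folding */
    }
    return 1;
}
```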
Returns true if the leading len bytes of the strings s1 and s2 are the same case-insensitively in the current locale; false otherwise.
- I32 foldEQ_locale(const char* a, const char* b,
- I32 len)
Takes a sprintf-style format pattern and conventional (non-SV) arguments and returns the formatted string.
- (char *) Perl_form(pTHX_ const char* pat, ...)
can be used any place a string (char *) is required:
- char * s = Perl_form("%d.%d",major,minor);
Uses a single private buffer so if you want to format several strings you must explicitly copy the earlier strings away (and free the copies when you are done).
- char* form(const char* pat, ...)
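The single-private-buffer pitfall described above can be sketched in standard C. This is illustrative only (the real form() lives in the perl core and manages its buffer itself): because one static buffer is reused, a second call clobbers the result of the first, which is why earlier strings must be copied away.

```c
#include <stdarg.h>
#include <stdio.h>

/* Hedged sketch of form()'s single-buffer behaviour. */
static char *form_sketch(const char *pat, ...) {
    static char buf[256];       /* one shared buffer, as documented */
    va_list ap;
    va_start(ap, pat);
    vsnprintf(buf, sizeof buf, pat, ap);
    va_end(ap);
    return buf;                 /* valid only until the next call */
}
```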
Fill the sv with current working directory
- int getcwd_sv(SV* sv)
Take a sprintf-style format pattern and argument list. These are used to generate a string message. If the message does not end with a newline, then it will be extended with some indication of the current location in the code, as described for mess_sv.
Normally, the resulting message is returned in a new mortal SV. During global destruction a single SV may be shared between uses of this function.
- SV * mess(const char *pat, ...)
Expands a message, intended for the user, to include an indication of the current location in the code, if the message does not already appear to be complete.
basemsg is the initial message or object. If it is a reference, it will be used as-is and will be the result of this function. Otherwise it is used as a string, and if it already ends with a newline, it is taken to be complete, and the result of this function will be the same string. If the message does not end with a newline, then a segment such as at foo.pl line 37 will be appended, and possibly other clauses indicating the current state of execution. The resulting message will end with a dot and a newline.
Normally, the resulting message is returned in a new mortal SV. During global destruction a single SV may be shared between uses of this function. If consume is true, then the function is permitted (but not required) to modify and return basemsg instead of allocating a new SV.
- SV * mess_sv(SV *basemsg, bool consume)
The C library snprintf functionality, if available and standards-compliant (uses vsnprintf, actually). However, if vsnprintf is not available, will unfortunately use the unsafe vsprintf which can overrun the buffer (there is an overrun check, but that may be too late). Consider using sv_vcatpvf instead, or getting vsnprintf.
- int my_snprintf(char *buffer, const Size_t len,
- const char *format, ...)
The C library sprintf, wrapped if necessary, to ensure that it will return
the length of the string written to the buffer. Only rare pre-ANSI systems
need the wrapper function - usually this is a direct call to sprintf.
- int my_sprintf(char *buffer, const char *pat, ...)
The C library vsnprintf if available and standards-compliant. However, if vsnprintf is not available, will unfortunately use the unsafe vsprintf which can overrun the buffer (there is an overrun check, but that may be too late). Consider using sv_vcatpvf instead, or getting vsnprintf.
- int my_vsnprintf(char *buffer, const Size_t len,
- const char *format, va_list ap)
Returns a new version object based on the passed in SV:
- SV *sv = new_version(SV *ver);
Does not alter the passed in ver SV. See "upg_version" if you want to upgrade the SV.
- SV* new_version(SV *ver)
Validate that a given string can be parsed as a version object, but doesn't actually perform the parsing. Can use either strict or lax validation rules. Can optionally set a number of hint variables to save the parsing code some time when tokenizing.
Returns the value of an ASCII-range hex digit and advances the string pointer. Behaviour is only well defined when isXDIGIT(*str) is true.
- U8 READ_XDIGIT(char *str)
Returns a pointer to the next character after the parsed version string, as well as upgrading the passed in SV to an RV.
Function must be called with an already existing SV like
- sv = newSV(0);
- s = scan_version(s, SV *sv, bool qv);
Performs some preprocessing to the string to ensure that it has the correct characteristics of a version. Flags the object if it contains an underscore (which denotes this is an alpha version). The boolean qv denotes that the version should be interpreted as if it had multiple decimals, even if it doesn't.
- const char* scan_version(const char *s, SV *rv, bool qv)
Test two strings to see if they are equal. Returns true or false.
- bool strEQ(char* s1, char* s2)
Test two strings to see if the first, s1, is greater than or equal to the second, s2. Returns true or false.
- bool strGE(char* s1, char* s2)
Test two strings to see if the first, s1, is greater than the second, s2. Returns true or false.
- bool strGT(char* s1, char* s2)
Test two strings to see if the first, s1, is less than or equal to the second, s2. Returns true or false.
- bool strLE(char* s1, char* s2)
Test two strings to see if the first, s1, is less than the second, s2. Returns true or false.
- bool strLT(char* s1, char* s2)
Test two strings to see if they are different. Returns true or false.
- bool strNE(char* s1, char* s2)
Test two strings to see if they are equal. The len parameter indicates the number of bytes to compare. Returns true or false. (A wrapper for strncmp.)
- bool strnEQ(char* s1, char* s2, STRLEN len)
Test two strings to see if they are different. The len parameter indicates the number of bytes to compare. Returns true or false. (A wrapper for strncmp.)
- bool strnNE(char* s1, char* s2, STRLEN len)
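As the descriptions above note, this family is in effect a set of readability wrappers over strcmp/strncmp. These plain-C equivalents sketch the semantics (illustrative names; not the perl.h macro definitions):

```c
#include <string.h>

/* Hedged sketch of the strEQ/strNE/strLT... family as documented. */
#define strEQ_sketch(s1, s2)      (strcmp((s1), (s2)) == 0)
#define strNE_sketch(s1, s2)      (strcmp((s1), (s2)) != 0)
#define strLT_sketch(s1, s2)      (strcmp((s1), (s2)) < 0)
#define strLE_sketch(s1, s2)      (strcmp((s1), (s2)) <= 0)
#define strGT_sketch(s1, s2)      (strcmp((s1), (s2)) > 0)
#define strGE_sketch(s1, s2)      (strcmp((s1), (s2)) >= 0)
#define strnEQ_sketch(s1, s2, n)  (strncmp((s1), (s2), (n)) == 0)
#define strnNE_sketch(s1, s2, n)  (strncmp((s1), (s2), (n)) != 0)
```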
Dummy routine which reports that object can be destroyed when there is no sharing module present. It ignores its single SV argument, and returns 'true'. Exists to avoid test for a NULL function pointer and because it could potentially warn under some level of strict-ness.
- bool sv_destroyable(SV *sv)
Dummy routine which "shares" an SV when there is no sharing module present. Or "locks" it. Or "unlocks" it. In other words, ignores its single SV argument. Exists to avoid test for a NULL function pointer and because it could potentially warn under some level of strict-ness.
- void sv_nosharing(SV *sv)
In-place upgrade of the supplied SV to a version object.
- SV *sv = upg_version(SV *sv, bool qv);
Returns a pointer to the upgraded SV. Set the boolean qv if you want to force this SV to be interpreted as an "extended" version.
- SV* upg_version(SV *ver, bool qv)
Version object aware cmp. Both operands must already have been converted into version objects.
- int vcmp(SV *lhv, SV *rhv)
pat and args are a sprintf-style format pattern and encapsulated argument list. These are used to generate a string message. If the message does not end with a newline, then it will be extended with some indication of the current location in the code, as described for mess_sv.
Normally, the resulting message is returned in a new mortal SV. During global destruction a single SV may be shared between uses of this function.
- SV * vmess(const char *pat, va_list *args)
Accepts a version object and returns the normalized string representation. Call like:
- sv = vnormal(rv);
NOTE: you can pass either the object directly or the SV contained within the RV.
The SV returned has a refcount of 1.
- SV* vnormal(SV *vs)
Accepts a version object and returns the normalized floating point representation. Call like:
- sv = vnumify(rv);
NOTE: you can pass either the object directly or the SV contained within the RV.
The SV returned has a refcount of 1.
- SV* vnumify(SV *vs)
In order to maintain maximum compatibility with earlier versions of Perl, this function will return either the floating point notation or the multiple dotted notation, depending on whether the original version contained 1 or more dots, respectively.
The SV returned has a refcount of 1.
- SV* vstringify(SV *vs)
Validates that the SV contains valid internal structure for a version object. It may be passed either the version object (RV) or the hash itself (HV). If the structure is valid, it returns the HV. If the structure is invalid, it returns NULL.
- SV *hv = vverify(sv);
Note that it only confirms the bare minimum structure (so as not to get confused by derived classes which may contain additional hash entries):
- SV* vverify(SV *vs)
Returns the mro linearisation for the given stash. By default, this will be whatever mro_get_linear_isa_dfs returns unless some other MRO is in effect for the stash. The return value is a read-only AV*.
You are responsible for SvREFCNT_inc() on the return value if you plan to store it anywhere semi-permanently (otherwise it might be deleted out from under you the next time the cache is invalidated).
- AV* mro_get_linear_isa(HV* stash)
Invalidates method caching on any child classes of the given stash, so that they might notice the changes in this one.
Ideally, all instances of PL_sub_generation++ in perl source outside of mro.c should be replaced by calls to this.
Perl automatically handles most of the common ways a method might be redefined. However, there are a few ways you could change a method in a stash without the cache code noticing, in which case you need to call this method afterwards:
1) Directly manipulating the stash HV entries from XS code.
2) Assigning a reference to a readonly scalar constant into a stash entry in order to create a constant subroutine (like constant.pm does).
This same method is available from pure perl via mro::method_changed_in(classname).
- void mro_method_changed_in(HV* stash)
Registers a custom mro plugin. See perlmroapi for details.
- void mro_register(const struct mro_alg *mro)
Declare local variables for a multicall. See LIGHTWEIGHT CALLBACKS in perlcall.
- dMULTICALL;
Make a lightweight callback. See LIGHTWEIGHT CALLBACKS in perlcall.
- MULTICALL;
Closing bracket for a lightweight callback. See LIGHTWEIGHT CALLBACKS in perlcall.
- POP_MULTICALL;
Opening bracket for a lightweight callback. See LIGHTWEIGHT CALLBACKS in perlcall.
- PUSH_MULTICALL;
Converts a string representing a binary number to numeric form.
On entry start and *len give the string to scan, *flags gives conversion flags, and result should be NULL or a pointer to an NV. The scan stops at the end of the string, or the first invalid character. Unless PERL_SCAN_SILENT_ILLDIGIT is set in *flags, encountering an invalid character will also trigger a warning.
On return *len is set to the length of the scanned string, and *flags gives output flags.
If the value is <= UV_MAX it is returned as a UV, the output flags are clear, and nothing is written to *result. If the value is > UV_MAX grok_bin returns UV_MAX, sets PERL_SCAN_GREATER_THAN_UV_MAX in the output flags, and writes the value to *result (or the value is discarded if result is NULL).
The binary number may optionally be prefixed with "0b" or "b" unless PERL_SCAN_DISALLOW_PREFIX is set in *flags on entry. If PERL_SCAN_ALLOW_UNDERSCORES is set in *flags then the binary number may use '_' characters to separate digits.
- UV grok_bin(const char* start, STRLEN* len_p,
- I32* flags, NV *result)
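The scanning behaviour described above (optional "0b"/"b" prefix, binary digits, '_' separators between digits, stop at the first invalid character) can be sketched in plain C. This is an illustration of the documented scan, not the real grok_bin: the flags protocol and UV_MAX overflow handling are omitted.

```c
#include <stddef.h>

/* Hedged sketch of grok_bin's documented scanning rules. Returns the
 * value and stores the number of bytes consumed in *len_p. */
static unsigned long grok_bin_sketch(const char *start, size_t *len_p) {
    const char *s = start;
    unsigned long value = 0;
    int seen_digit = 0;
    if (s[0] == '0' && s[1] == 'b')
        s += 2;                         /* "0b" prefix */
    else if (s[0] == 'b')
        s += 1;                         /* bare "b" prefix */
    for (;;) {
        if (*s == '0' || *s == '1') {
            value = (value << 1) | (unsigned long)(*s - '0');
            seen_digit = 1;
            s++;
        } else if (*s == '_' && seen_digit && (s[1] == '0' || s[1] == '1')) {
            s++;                        /* '_' allowed only between digits */
        } else {
            break;                      /* end of string or invalid char */
        }
    }
    *len_p = (size_t)(s - start);       /* scanned length, like *len */
    return value;
}
```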
Converts a string representing a hex number to numeric form.
On entry start and *len give the string to scan, *flags gives conversion flags, and result should be NULL or a pointer to an NV. The scan stops at the end of the string, or the first invalid character. Unless PERL_SCAN_SILENT_ILLDIGIT is set in *flags, encountering an invalid character will also trigger a warning.
On return *len is set to the length of the scanned string, and *flags gives output flags.
If the value is <= UV_MAX it is returned as a UV, the output flags are clear, and nothing is written to *result. If the value is > UV_MAX grok_hex returns UV_MAX, sets PERL_SCAN_GREATER_THAN_UV_MAX in the output flags, and writes the value to *result (or the value is discarded if result is NULL).
The hex number may optionally be prefixed with "0x" or "x" unless PERL_SCAN_DISALLOW_PREFIX is set in *flags on entry. If PERL_SCAN_ALLOW_UNDERSCORES is set in *flags then the hex number may use '_' characters to separate digits.
- UV grok_hex(const char* start, STRLEN* len_p,
- I32* flags, NV *result)
Recognise (or not) a number. The type of the number is returned (0 if unrecognised), otherwise it is a bit-ORed combination of IS_NUMBER_IN_UV, IS_NUMBER_GREATER_THAN_UV_MAX, IS_NUMBER_NOT_INT, IS_NUMBER_NEG, IS_NUMBER_INFINITY, IS_NUMBER_NAN (defined in perl.h).
If the value of the number can fit in a UV, it is returned in *valuep and IS_NUMBER_IN_UV will be set to indicate that *valuep is valid. IS_NUMBER_IN_UV will never be set unless *valuep is valid, but *valuep may have been assigned to during processing even though IS_NUMBER_IN_UV is not set on return. If valuep is NULL, IS_NUMBER_IN_UV will be set for the same cases as when valuep is non-NULL, but no actual assignment (or SEGV) will occur.
IS_NUMBER_NOT_INT will be set with IS_NUMBER_IN_UV if trailing decimals were seen (in which case *valuep gives the true value truncated to an integer), and IS_NUMBER_NEG if the number is negative (in which case *valuep holds the absolute value). IS_NUMBER_IN_UV is not set if e notation was used or the number is larger than a UV.
- int grok_number(const char *pv, STRLEN len,
- UV *valuep)
Scan and skip for a numeric decimal separator (radix).
- bool grok_numeric_radix(const char **sp,
- const char *send)
Converts a string representing an octal number to numeric form.
On entry start and *len give the string to scan, *flags gives conversion flags, and result should be NULL or a pointer to an NV. The scan stops at the end of the string, or the first invalid character. Unless PERL_SCAN_SILENT_ILLDIGIT is set in *flags, encountering an 8 or 9 will also trigger a warning.
On return *len is set to the length of the scanned string, and *flags gives output flags.
If the value is <= UV_MAX it is returned as a UV, the output flags are clear, and nothing is written to *result. If the value is > UV_MAX grok_oct returns UV_MAX, sets PERL_SCAN_GREATER_THAN_UV_MAX in the output flags, and writes the value to *result (or the value is discarded if result is NULL).
If PERL_SCAN_ALLOW_UNDERSCORES is set in *flags then the octal number may use '_' characters to separate digits.
- UV grok_oct(const char* start, STRLEN* len_p,
- I32* flags, NV *result)
Return a non-zero integer if the sign bit on an NV is set, and 0 if it is not.
If Configure detects this system has a signbit() that will work with our NVs, then we just use it via the #define in perl.h. Otherwise, fall back on this implementation. As a first pass, this gets everything right except -0.0. Alas, catching -0.0 is the main use for this function, so this is not too helpful yet. Still, at least we have the scaffolding in place to support other systems, should that prove useful.
Configure notes: This function is called 'Perl_signbit' instead of a plain 'signbit' because it is easy to imagine a system having a signbit() function or macro that doesn't happen to work with our particular choice of NVs. We shouldn't just re-#define signbit as Perl_signbit and expect the standard system headers to be happy. Also, this is a no-context function (no pTHX_) because Perl_signbit() is usually re-#defined in perl.h as a simple macro call to the system's signbit(). Users should just always call Perl_signbit().
NOTE: this function is experimental and may change or be removed without notice.
- int Perl_signbit(NV f)
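The -0.0 case the text calls out can be sketched with C99's signbit() (which, as described, Perl_signbit defers to when Configure finds a working system version). The point: a naive "f < 0.0" comparison misses -0.0, because -0.0 == 0.0 under IEEE 754 comparison rules, yet the sign bit is set.

```c
#include <math.h>

/* Hedged sketch of why Perl_signbit exists: comparison-based negativity
 * tests cannot see the sign of a negative zero. */
static int naive_is_negative(double f) { return f < 0.0; }
static int sign_bit_set(double f)      { return signbit(f) != 0; }
```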
For backwards compatibility. Use grok_bin instead.
- NV scan_bin(const char* start, STRLEN len,
- STRLEN* retlen)
For backwards compatibility. Use grok_hex instead.
- NV scan_hex(const char* start, STRLEN len,
- STRLEN* retlen)
For backwards compatibility. Use grok_oct instead.
- NV scan_oct(const char* start, STRLEN len,
- STRLEN* retlen)
Constructs, checks, and returns an assignment op. left and right supply the parameters of the assignment; they are consumed by this function and become part of the constructed op tree.
If optype is OP_ANDASSIGN, OP_ORASSIGN, or OP_DORASSIGN, then a suitable conditional optree is constructed. If optype is the opcode of a binary operator, such as OP_BIT_OR, then an op is constructed that performs the binary operation and assigns the result to the left argument. Either way, if optype is non-zero then flags has no effect.
If optype is zero, then a plain scalar or list assignment is constructed. Which type of assignment it is is automatically determined. flags gives the eight bits of op_flags, except that OPf_KIDS will be set automatically, and, shifted up eight bits, the eight bits of op_private, except that the bit with value 1 or 2 is automatically set as required.
- OP * newASSIGNOP(I32 flags, OP *left, I32 optype,
- OP *right)
Constructs, checks, and returns an op of any binary type. type is the opcode. flags gives the eight bits of op_flags, except that OPf_KIDS will be set automatically, and, shifted up eight bits, the eight bits of op_private, except that the bit with value 1 or 2 is automatically set as required. first and last supply up to two ops to be the direct children of the binary op; they are consumed by this function and become part of the constructed op tree.
- OP * newBINOP(I32 type, I32 flags, OP *first,
- OP *last)
Constructs, checks, and returns a conditional-expression (cond_expr) op. flags gives the eight bits of op_flags, except that OPf_KIDS will be set automatically, and, shifted up eight bits, the eight bits of op_private, except that the bit with value 1 is automatically set. first supplies the expression selecting between the two branches, and trueop and falseop supply the branches; they are consumed by this function and become part of the constructed op tree.
- OP * newCONDOP(I32 flags, OP *first, OP *trueop,
- OP *falseop)
Constructs, checks, and returns an op tree expressing a foreach loop (iteration through a list of values). This is a heavyweight loop, with structure that allows exiting the loop by last and suchlike. sv optionally supplies the variable that will be aliased to each item in turn; if null, it defaults to $_ (either lexical or global). expr supplies the list of values to iterate over. block supplies the main body of the loop, and cont optionally supplies a continue block that operates as a second half of the body. All of these optree inputs are consumed by this function and become part of the constructed op tree.
flags gives the eight bits of op_flags for the leaveloop op and, shifted up eight bits, the eight bits of op_private for the leaveloop op, except that (in both cases) some bits will be set automatically.
- OP * newFOROP(I32 flags, OP *sv, OP *expr, OP *block,
- OP *cont)
Constructs, checks, and returns an op tree expressing a given block. cond supplies the expression that will be locally assigned to a lexical variable, and block supplies the body of the given construct; they are consumed by this function and become part of the constructed op tree. defsv_off is the pad offset of the scalar lexical variable that will be affected. If it is 0, the global $_ will be used.
- OP * newGIVENOP(OP *cond, OP *block,
- PADOFFSET defsv_off)
Constructs, checks, and returns an op of any type that involves an embedded reference to a GV. type is the opcode. flags gives the eight bits of op_flags. gv identifies the GV that the op should reference; calling this function does not transfer ownership of any reference to it.
- OP * newGVOP(I32 type, I32 flags, GV *gv)
Constructs, checks, and returns an op of any list type. type is the opcode. flags gives the eight bits of op_flags, except that OPf_KIDS will be set automatically if required. first and last supply up to two ops to be direct children of the list op; they are consumed by this function and become part of the constructed op tree.
- OP * newLISTOP(I32 type, I32 flags, OP *first,
- OP *last)
Constructs, checks, and returns a logical (flow control) op. type is the opcode. flags gives the eight bits of op_flags, except that OPf_KIDS will be set automatically, and, shifted up eight bits, the eight bits of op_private, except that the bit with value 1 is automatically set. first supplies the expression controlling the flow, and other supplies the side (alternate) chain of ops; they are consumed by this function and become part of the constructed op tree.
- OP * newLOGOP(I32 type, I32 flags, OP *first,
- OP *other)
Constructs, checks, and returns a loop-exiting op (such as goto
or last). type is the opcode. label supplies the parameter
determining the target of the op; it is consumed by this function and
becomes part of the constructed op tree.
- OP * newLOOPEX(I32 type, OP *label)
Constructs, checks, and returns an op tree expressing a loop. This is only a loop in the control flow through the op tree; it does not have the heavyweight loop structure that allows exiting the loop by last and suchlike. flags gives the eight bits of op_flags for the top-level op, except that some bits will be set automatically as required. expr supplies the expression controlling loop iteration, and block supplies the body of the loop; they are consumed by this function and become part of the constructed op tree. debuggable is currently unused and should always be 1.
- OP * newLOOPOP(I32 flags, I32 debuggable, OP *expr,
- OP *block)
Constructs, checks, and returns a new stub op, which represents an empty list expression.
- OP * newNULLLIST()
Constructs, checks, and returns an op of any base type (any type that has no extra fields). type is the opcode. flags gives the eight bits of op_flags, and, shifted up eight bits, the eight bits of op_private.
- OP * newOP(I32 type, I32 flags)
Constructs, checks, and returns an op of any type that involves a reference to a pad element. type is the opcode. flags gives the eight bits of op_flags. A pad slot is automatically allocated, and is populated with sv; this function takes ownership of one reference to it.
This function only exists if Perl has been compiled to use ithreads.
- OP * newPADOP(I32 type, I32 flags, SV *sv)
Constructs, checks, and returns an op of any pattern matching type. type is the opcode. flags gives the eight bits of op_flags and, shifted up eight bits, the eight bits of op_private.
- OP * newPMOP(I32 type, I32 flags)
Constructs, checks, and returns an op of any type that involves an embedded C-level pointer (PV). type is the opcode. flags gives the eight bits of op_flags. pv supplies the C-level pointer, which must have been allocated using PerlMemShared_malloc; the memory will be freed when the op is destroyed.
- OP * newPVOP(I32 type, I32 flags, char *pv)
Constructs and returns a range op, with subordinate flip and flop ops. flags gives the eight bits of op_flags for the flip op and, shifted up eight bits, the eight bits of op_private for both the flip and range ops, except that the bit with value 1 is automatically set. left and right supply the expressions controlling the endpoints of the range; they are consumed by this function and become part of the constructed op tree.
- OP * newRANGE(I32 flags, OP *left, OP *right)
Constructs, checks, and returns an lslice (list slice) op. flags gives the eight bits of op_flags, except that OPf_KIDS will be set automatically, and, shifted up eight bits, the eight bits of op_private, except that the bit with value 1 or 2 is automatically set as required. listval and subscript supply the parameters of the slice; they are consumed by this function and become part of the constructed op tree.
- OP * newSLICEOP(I32 flags, OP *subscript,
- OP *listval)
Constructs a state op (COP). The state op is normally a nextstate op, but will be a dbstate op if debugging is enabled for currently-compiled code. The state op is populated from PL_curcop (or PL_compiling). If label is non-null, it supplies the name of a label to attach to the state op; this function takes ownership of the memory pointed at by label, and will free it. flags gives the eight bits of op_flags for the state op.
If o is null, the state op is returned. Otherwise the state op is combined with o into a lineseq list op, which is returned. o is consumed by this function and becomes part of the returned op tree.
- OP * newSTATEOP(I32 flags, char *label, OP *o)
Constructs, checks, and returns an op of any type that involves an embedded SV. type is the opcode. flags gives the eight bits of op_flags. sv gives the SV to embed in the op; this function takes ownership of one reference to it.
- OP * newSVOP(I32 type, I32 flags, SV *sv)
Constructs, checks, and returns an op of any unary type. type is the opcode. flags gives the eight bits of op_flags, except that OPf_KIDS will be set automatically if required, and, shifted up eight bits, the eight bits of op_private, except that the bit with value 1 is automatically set. first supplies an optional op to be the direct child of the unary op; it is consumed by this function and becomes part of the constructed op tree.
- OP * newUNOP(I32 type, I32 flags, OP *first)
Constructs, checks, and returns an op tree expressing a when block. cond supplies the test expression, and block supplies the block that will be executed if the test evaluates to true; they are consumed by this function and become part of the constructed op tree. cond will be interpreted DWIMically, often as a comparison against $_, and may be null to generate a default block.
- OP * newWHENOP(OP *cond, OP *block)
Constructs, checks, and returns an op tree expressing a while loop. This is a heavyweight loop, with structure that allows exiting the loop by last and suchlike.
loop is an optional preconstructed enterloop op to use in the loop; if it is null then a suitable op will be constructed automatically. expr supplies the loop's controlling expression. block supplies the main body of the loop, and cont optionally supplies a continue block that operates as a second half of the body. All of these optree inputs are consumed by this function and become part of the constructed op tree.
flags gives the eight bits of op_flags for the leaveloop op and, shifted up eight bits, the eight bits of op_private for the leaveloop op, except that (in both cases) some bits will be set automatically. debuggable is currently unused and should always be 1. has_my can be supplied as true to force the loop body to be enclosed in its own scope.
- OP * newWHILEOP(I32 flags, I32 debuggable,
- LOOP *loop, OP *expr, OP *block,
- OP *cont, I32 has_my)
Performs the default fixup of the arguments part of an entersub op tree. This consists of applying list context to each of the argument ops. This is the standard treatment used on a call marked with &, or a method call, or a call through a subroutine reference, or any other call where the callee can't be identified at compile time, or a call where the callee has no prototype.
- OP * ck_entersub_args_list(OP *entersubop)
Performs the fixup of the arguments part of an entersub op tree based on a subroutine prototype. This makes various modifications to the argument ops, from applying context up to inserting refgen ops, and checking the number and syntactic types of arguments, as directed by the prototype. This is the standard treatment used on a subroutine call, not marked with &, where the callee can be identified at compile time and has a prototype.
protosv supplies the subroutine prototype to be applied to the call. It may be a normal defined scalar, of which the string value will be used. Alternatively, for convenience, it may be a subroutine object (a CV* that has been cast to SV*) which has a prototype. The prototype supplied, in whichever form, does not need to match the actual callee referenced by the op tree.
If the argument ops disagree with the prototype, for example by having an unacceptable number of arguments, a valid op tree is returned anyway. The error is reflected in the parser state, normally resulting in a single exception at the top level of parsing which covers all the compilation errors that occurred. In the error message, the callee is referred to by the name defined by the namegv parameter.
- OP * ck_entersub_args_proto(OP *entersubop,
- GV *namegv, SV *protosv)
Performs the fixup of the arguments part of an entersub op tree either based on a subroutine prototype or using default list-context processing. This is the standard treatment used on a subroutine call, not marked with &, where the callee can be identified at compile time.
protosv supplies the subroutine prototype to be applied to the call, or indicates that there is no prototype. It may be a normal scalar, in which case if it is defined then the string value will be used as a prototype, and if it is undefined then there is no prototype. Alternatively, for convenience, it may be a subroutine object (a CV* that has been cast to SV*), of which the prototype will be used if it has one. The prototype (or lack thereof) supplied, in whichever form, does not need to match the actual callee referenced by the op tree.
If the argument ops disagree with the prototype, for example by having an unacceptable number of arguments, a valid op tree is returned anyway. The error is reflected in the parser state, normally resulting in a single exception at the top level of parsing which covers all the compilation errors that occurred. In the error message, the callee is referred to by the name defined by the namegv parameter.
- OP * ck_entersub_args_proto_or_list(OP *entersubop,
- GV *namegv,
- SV *protosv)
If cv is a constant sub eligible for inlining, returns the constant value returned by the sub. Otherwise, returns NULL.
Constant subs can be created with newCONSTSUB or as described in Constant Functions in perlsub.
- SV* cv_const_sv(const CV *const cv)
Retrieves the function that will be used to fix up a call to cv. Specifically, the function is applied to an entersub op tree for a subroutine call, not marked with &, where the callee can be identified at compile time as cv.
The C-level function pointer is returned in *ckfun_p, and an SV argument for it is returned in *ckobj_p. The function is intended to be called in this manner:
- entersubop = (*ckfun_p)(aTHX_ entersubop, namegv, (*ckobj_p));
In this call, entersubop is a pointer to the entersub op, which may be replaced by the check function, and namegv is a GV supplying the name that should be used by the check function to refer to the callee of the entersub op if it needs to emit any diagnostics. It is permitted to apply the check function in non-standard situations, such as to a call to a different subroutine or to a method call.
By default, the function is Perl_ck_entersub_args_proto_or_list, and the SV parameter is cv itself. This implements standard prototype processing. It can be changed, for a particular subroutine, by cv_set_call_checker.
- void cv_get_call_checker(CV *cv,
- Perl_call_checker *ckfun_p,
- SV **ckobj_p)
Sets the function that will be used to fix up a call to cv. Specifically, the function is applied to an entersub op tree for a subroutine call, not marked with &, where the callee can be identified at compile time as cv.
The C-level function pointer is supplied in ckfun, and an SV argument for it is supplied in ckobj. The function is intended to be called in this manner:
- entersubop = ckfun(aTHX_ entersubop, namegv, ckobj);
In this call, entersubop is a pointer to the entersub op, which may be replaced by the check function, and namegv is a GV supplying the name that should be used by the check function to refer to the callee of the entersub op if it needs to emit any diagnostics. It is permitted to apply the check function in non-standard situations, such as to a call to a different subroutine or to a method call.
The current setting for a particular CV can be retrieved by cv_get_call_checker.
- void cv_set_call_checker(CV *cv,
- Perl_call_checker ckfun,
- SV *ckobj)
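A hedged sketch of the pattern described above: installing a custom call checker that chains to whatever checker was previously installed. The function and variable names are illustrative, and the body assumes a perl.h / XS context; only the chaining pattern itself is taken from the text:

```c
#include "EXTERN.h"
#include "perl.h"

/* Storage for the checker that was installed before ours. */
static Perl_call_checker prev_ckfun;
static SV *prev_ckobj;

/* Our checker: inspect or rewrite entersubop, then delegate. */
static OP *
my_call_checker(pTHX_ OP *entersubop, GV *namegv, SV *ckobj)
{
    PERL_UNUSED_ARG(ckobj);
    /* ... examine or transform entersubop here ... */
    return prev_ckfun(aTHX_ entersubop, namegv, prev_ckobj);
}

static void
install_my_checker(pTHX_ CV *cv)
{
    /* Save the current checker so we can call it from ours. */
    cv_get_call_checker(cv, &prev_ckfun, &prev_ckobj);
    cv_set_call_checker(cv, my_call_checker, &PL_sv_undef);
}
```

By default the saved checker will be Perl_ck_entersub_args_proto_or_list with cv itself as the SV argument, so delegating preserves standard prototype processing.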
Given the root of an optree, link the tree in execution order using the op_next pointers and return the first op executed. If this has already been done, it will not be redone, and o->op_next will be returned. If o->op_next is not already set, o should be at least an UNOP.
- OP* LINKLIST(OP *o)
See newCONSTSUB_flags.
- CV* newCONSTSUB(HV* stash, const char* name, SV* sv)
Creates a constant sub equivalent to Perl sub FOO () { 123 } which is eligible for inlining at compile-time.
Currently, the only useful value for flags is SVf_UTF8.
The newly created subroutine takes ownership of a reference to the passed in SV.
Passing NULL for SV creates a constant sub equivalent to sub BAR () {}, which won't be called if used as a destructor, but will suppress the overhead of a call to AUTOLOAD. (This form, however, isn't eligible for inlining at compile time.)
- CV* newCONSTSUB_flags(HV* stash, const char* name,
- STRLEN len, U32 flags, SV* sv)
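For illustration (not from the original text), here is how the call might look when creating an inlinable constant sub; the package and sub names are hypothetical, and a perl.h / XS context is assumed:

```c
#include "EXTERN.h"
#include "perl.h"

/* Create Math::Demo::PI, equivalent to "sub PI () { 3.14159... }". */
static void
make_pi_constant(pTHX)
{
    HV *stash = gv_stashpvs("Math::Demo", GV_ADD);
    SV *val   = newSVnv(3.14159265358979);

    /* "PI" has length 2; the function takes ownership of val. */
    newCONSTSUB_flags(stash, "PI", 2, 0, val);
}
```

Because val is a plain defined SV, the resulting sub is eligible for compile-time inlining as described above.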
Used by xsubpp to hook up XSUBs as Perl subs. filename needs to be static storage, as it is used directly as CvFILE(), without a copy being made.
Append an item to the list of ops contained directly within a list-type op, returning the lengthened list. first is the list-type op, and last is the op to append to the list. optype specifies the intended opcode for the list. If first is not already a list of the right type, it will be upgraded into one. If either first or last is null, the other is returned unchanged.
- OP * op_append_elem(I32 optype, OP *first, OP *last)
Concatenate the lists of ops contained directly within two list-type ops, returning the combined list. first and last are the list-type ops to concatenate. optype specifies the intended opcode for the list. If either first or last is not already a list of the right type, it will be upgraded into one. If either first or last is null, the other is returned unchanged.
- OP * op_append_list(I32 optype, OP *first, OP *last)
Return the class of the provided OP: that is, which of the *OP structures it uses. For core ops this currently gets the information out of PL_opargs, which does not always accurately reflect the type used. For custom ops the type is returned from the registration, and it is up to the registree to ensure it is accurate. The value returned will be one of the OA_* constants from op.h.
- U32 OP_CLASS(OP *o)
Return a short description of the provided OP.
- const char * OP_DESC(OP *o)
This function is the implementation of the LINKLIST macro. It should not be called directly.
- OP* op_linklist(OP *o)
Propagate lvalue ("modifiable") context to an op and its children.
type represents the context type, roughly based on the type of op that
would do the modifying, although local() is represented by OP_NULL,
because it has no op type of its own (it is signalled by a flag on
the lvalue op).
This function detects things that can't be modified, such as $x+1, and generates errors for them. For example, $x+1 = 2 would cause it to be called with an op of type OP_ADD and a type argument of OP_SASSIGN.
It also flags things that need to behave specially in an lvalue context, such as $$x = 5 which might have to vivify a reference in $x.
NOTE: this function is experimental and may change or be removed without notice.
- OP * op_lvalue(OP *o, I32 type)
Return the name of the provided OP. For core ops this looks up the name from the op_type; for custom ops from the op_ppaddr.
- const char * OP_NAME(OP *o)
Prepend an item to the list of ops contained directly within a list-type op, returning the lengthened list. first is the op to prepend to the list, and last is the list-type op. optype specifies the intended opcode for the list. If last is not already a list of the right type, it will be upgraded into one. If either first or last is null, the other is returned unchanged.
- OP * op_prepend_elem(I32 optype, OP *first, OP *last)
Wraps up an op tree with some additional ops so that at runtime a dynamic scope will be created. The original ops run in the new dynamic scope, and then, provided that they exit normally, the scope will be unwound. The additional ops used to create and unwind the dynamic scope will normally be an enter/leave pair, but a scope op may be used instead if the ops are simple enough to not need the full dynamic scope structure.
NOTE: this function is experimental and may change or be removed without notice.
- OP * op_scope(OP *o)
Examines an op, which is expected to identify a subroutine at runtime, and attempts to determine at compile time which subroutine it identifies. This is normally used during Perl compilation to determine whether a prototype can be applied to a function call. cvop is the op being considered, normally an rv2cv op. A pointer to the identified subroutine is returned, if it could be determined statically, and a null pointer is returned if it was not possible to determine statically.
Currently, the subroutine can be identified statically if the RV that the rv2cv is to operate on is provided by a suitable gv or const op. A gv op is suitable if the GV's CV slot is populated. A const op is suitable if the constant value must be an RV pointing to a CV. Details of this process may change in future versions of Perl. If the rv2cv op has the OPpENTERSUB_AMPER flag set then no attempt is made to identify the subroutine statically: this flag is used to suppress compile-time magic on a subroutine call, forcing it to use default runtime behaviour.
If flags has the bit RV2CVOPCV_MARK_EARLY set, then the handling of a GV reference is modified. If a GV was examined and its CV slot was found to be empty, then the gv op has the OPpEARLY_CV flag set. If the op is not optimised away, and the CV slot is later populated with a subroutine having a prototype, that flag eventually triggers the warning "called too early to check prototype".
If flags has the bit RV2CVOPCV_RETURN_NAME_GV set, then instead of returning a pointer to the subroutine it returns a pointer to the GV giving the most appropriate name for the subroutine in this context. Normally this is just the CvGV of the subroutine, but for an anonymous (CvANON) subroutine that is referenced through a GV it will be the referencing GV. The resulting GV* is cast to CV* to be returned. A null pointer is returned as usual if there is no statically-determinable subroutine.
- CV * rv2cv_op_cv(OP *cvop, U32 flags)
CV's can have CvPADLIST(cv) set to point to a PADLIST. This is the CV's scratchpad, which stores lexical variables and opcode temporary and per-thread values.
For these purposes "formats" are a kind-of CV; eval""s are too (except they're not callable at will and are always thrown away after the eval"" is done executing). Require'd files are simply evals without any outer lexical scope.
XSUBs don't have CvPADLIST set - dXSTARG fetches values from PL_curpad, but that is really the callers pad (a slot of which is allocated by every entersub).
The PADLIST has a C array where pads are stored.
The 0th entry of the PADLIST is a PADNAMELIST (which is actually just an AV, but that may change) which represents the "names" or rather the "static type information" for lexicals. The individual elements of a PADNAMELIST are PADNAMEs (just SVs; but, again, that may change). Future refactorings might stop the PADNAMELIST from being stored in the PADLIST's array, so don't rely on it. See PadlistNAMES.
The CvDEPTH'th entry of a PADLIST is a PAD (an AV) which is the stack frame at that depth of recursion into the CV. The 0th slot of a frame AV is an AV which is @_. Other entries are storage for variables and op targets.
Iterating over the PADNAMELIST iterates over all possible pad items. Pad slots that are SVs_PADTMP (targets/GVs/constants) end up having &PL_sv_undef "names" (see pad_alloc()).
Only my/our variable (SvPADMY/PADNAME_isOUR) slots get valid names. The rest are op targets/GVs/constants which are statically allocated or resolved at compile time. These don't have names by which they can be looked up from Perl code at run time through eval"" the way my/our variables can be. Since they can't be looked up by "name" but only by their index allocated at compile time (which is usually in PL_op->op_targ), wasting a name SV for them doesn't make sense.
The SVs in the names AV have their PV being the name of the variable. xlow+1..xhigh inclusive in the NV union is a range of cop_seq numbers for which the name is valid (accessed through the macros COP_SEQ_RANGE_LOW and _HIGH). During compilation, these fields may hold the special value PERL_PADSEQ_INTRO to indicate various stages:
- COP_SEQ_RANGE_LOW _HIGH
- ----------------- -----
- PERL_PADSEQ_INTRO 0 variable not yet introduced: { my ($x
- valid-seq# PERL_PADSEQ_INTRO variable in scope: { my ($x)
- valid-seq# valid-seq# compilation of scope complete: { my ($x) }
For typed lexicals, the name SV is SVt_PVMG and SvSTASH points at the type. For our lexicals, the type is also SVt_PVMG, with the SvOURSTASH slot pointing at the stash of the associated global (so that duplicate our declarations in the same package can be detected). SvUVX is sometimes hijacked to store the generation number during compilation.
If PADNAME_OUTER (SvFAKE) is set on the name SV, then that slot in the frame AV is a REFCNT'ed reference to a lexical from "outside". In this case, the name SV does not use xlow and xhigh to store a cop_seq range, since it is in scope throughout. Instead xhigh stores some flags containing info about the real lexical (is it declared in an anon, and is it capable of being instantiated multiple times?), and for fake ANONs, xlow contains the index within the parent's pad where the lexical's value is stored, to make cloning quicker.
If the 'name' is '&' the corresponding entry in the PAD is a CV representing a possible closure. (PADNAME_OUTER and a name of '&' is not a meaningful combination currently, but could become so if my sub foo {} is implemented.)
Note that formats are treated as anon subs, and are cloned each time write is called (if necessary).
The flag SVs_PADSTALE is cleared on lexicals each time the my() is executed, and set on scope exit. This allows the 'Variable $x is not available' warning to be generated in evals, such as
- { my $x = 1; sub f { eval '$x' } } f();
For state vars, SVs_PADSTALE is overloaded to mean 'not yet initialised'.
NOTE: this function is experimental and may change or be removed without notice.
- PADLIST * CvPADLIST(CV *cv)
The C array of pad entries.
NOTE: this function is experimental and may change or be removed without notice.
- SV ** PadARRAY(PAD pad)
The C array of a padlist, containing the pads. Only subscript it with numbers >= 1, as the 0th entry is not guaranteed to remain usable.
NOTE: this function is experimental and may change or be removed without notice.
- PAD ** PadlistARRAY(PADLIST padlist)
The index of the last pad in the padlist.
NOTE: this function is experimental and may change or be removed without notice.
- SSize_t PadlistMAX(PADLIST padlist)
The names associated with pad entries.
NOTE: this function is experimental and may change or be removed without notice.
- PADNAMELIST * PadlistNAMES(PADLIST padlist)
The C array of pad names.
NOTE: this function is experimental and may change or be removed without notice.
- PADNAME ** PadlistNAMESARRAY(PADLIST padlist)
The index of the last pad name.
NOTE: this function is experimental and may change or be removed without notice.
- SSize_t PadlistNAMESMAX(PADLIST padlist)
The reference count of the padlist. Currently this is always 1.
NOTE: this function is experimental and may change or be removed without notice.
- U32 PadlistREFCNT(PADLIST padlist)
The index of the last pad entry.
NOTE: this function is experimental and may change or be removed without notice.
- SSize_t PadMAX(PAD pad)
The length of the name.
NOTE: this function is experimental and may change or be removed without notice.
- STRLEN PadnameLEN(PADNAME pn)
The C array of pad names.
NOTE: this function is experimental and may change or be removed without notice.
- PADNAME ** PadnamelistARRAY(PADNAMELIST pnl)
The index of the last pad name.
NOTE: this function is experimental and may change or be removed without notice.
- SSize_t PadnamelistMAX(PADNAMELIST pnl)
The name stored in the pad name struct. This returns NULL for a target or GV slot.
NOTE: this function is experimental and may change or be removed without notice.
- char * PadnamePV(PADNAME pn)
Returns the pad name as an SV. This is currently just pn. It will begin returning a new mortal SV if pad names ever stop being SVs.
NOTE: this function is experimental and may change or be removed without notice.
- SV * PadnameSV(PADNAME pn)
Whether PadnamePV is in UTF8.
NOTE: this function is experimental and may change or be removed without notice.
- bool PadnameUTF8(PADNAME pn)
Exactly like pad_add_name_pvn, but takes a literal string instead of a string/length pair.
- PADOFFSET pad_add_name_pvs(const char *name, U32 flags,
- HV *typestash, HV *ourstash)
Exactly like pad_findmy_pvn, but takes a literal string instead of a string/length pair.
- PADOFFSET pad_findmy_pvs(const char *name, U32 flags)
Create a new padlist, updating the global variables for the currently-compiling padlist to point to the new padlist. The following flags can be OR'ed together:
During compilation, this points to the array containing the values part of the pad for the currently-compiling code. (At runtime a CV may have many such value arrays; at compile time just one is constructed.) At runtime, this points to the array containing the currently-relevant values for the pad for the currently-executing code.
NOTE: this function is experimental and may change or be removed without notice.
During compilation, this points to the array containing the names part of the pad for the currently-compiling code.
NOTE: this function is experimental and may change or be removed without notice.
Points directly to the body of the PL_comppad array. (I.e., this is PAD_ARRAY(PL_comppad).)
NOTE: this function is experimental and may change or be removed without notice.
PL_modglobal is a general purpose, interpreter global HV for use by extensions that need to keep information on a per-interpreter basis. In a pinch, it can also be used as a symbol table for extensions to share data among each other. It is a good idea to use keys prefixed by the package name of the extension that owns the data.
- HV* PL_modglobal
A convenience variable which is typically used with SvPV when one doesn't care about the length of the string. It is usually more efficient to either declare a local variable and use that instead or to use the SvPV_nolen macro.
- STRLEN PL_na
When non-NULL, the function pointed to by this variable will be called each time an OP is freed, with the corresponding OP as the argument. This allows extensions to free any extra attribute they have locally attached to an OP. It is guaranteed to fire first for the parent OP and then for its kids.
When you replace this variable, it is considered good practice to store any previously installed hook and call it from inside your own.
- Perl_ophook_t PL_opfreehook
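The hook-chaining practice described above can be sketched as follows; the names are illustrative, and the code assumes a perl.h / XS context (typically the assignment would happen at extension boot time):

```c
#include "EXTERN.h"
#include "perl.h"

/* Keep whatever hook was installed before ours. */
static Perl_ophook_t prev_opfreehook;

/* Our hook: release per-OP data, then delegate to the saved hook. */
static void
my_opfree_hook(pTHX_ OP *o)
{
    /* ... free any attribute we attached to o ... */
    if (prev_opfreehook)
        prev_opfreehook(aTHX_ o);
}

static void
install_opfree_hook(pTHX)
{
    prev_opfreehook = PL_opfreehook;  /* save, so we can chain */
    PL_opfreehook   = my_opfree_hook;
}
```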
Pointer to the per-subroutine peephole optimiser. This is a function that gets called at the end of compilation of a Perl subroutine (or equivalently independent piece of Perl code) to perform fixups of some ops and to perform small-scale optimisations. The function is called once for each subroutine that is compiled, and is passed, as sole parameter, a pointer to the op that is the entry point to the subroutine. It modifies the op tree in place.
The peephole optimiser should never be completely replaced. Rather, add code to it by wrapping the existing optimiser. The basic way to do this can be seen in Compile pass 3: peephole optimization in perlguts. If the new code wishes to operate on ops throughout the subroutine's structure, rather than just at the top level, it is likely to be more convenient to wrap the PL_rpeepp hook.
- peep_t PL_peepp
Pointer to the recursive peephole optimiser. This is a function
that gets called at the end of compilation of a Perl subroutine (or
equivalently independent piece of Perl code) to perform fixups of some
ops and to perform small-scale optimisations. The function is called
once for each chain of ops linked through their op_next
fields;
it is recursively called to handle each side chain. It is passed, as
sole parameter, a pointer to the op that is at the head of the chain.
It modifies the op tree in place.
The peephole optimiser should never be completely replaced. Rather, add code to it by wrapping the existing optimiser. The basic way to do this can be seen in Compile pass 3: peephole optimization in perlguts. If the new code wishes to operate only on ops at a subroutine's top level, rather than throughout the structure, it is likely to be more convenient to wrap the PL_peepp hook.
- peep_t PL_rpeepp
This is the false SV. See PL_sv_yes. Always refer to this as &PL_sv_no.
- SV PL_sv_no
This is the undef SV. Always refer to this as &PL_sv_undef.
- SV PL_sv_undef
This is the true SV. See PL_sv_no. Always refer to this as &PL_sv_yes.
- SV PL_sv_yes
Convenience macro to get the REGEXP from a SV. This is approximately equivalent to the following snippet:
- if (SvMAGICAL(sv))
- mg_get(sv);
- if (SvROK(sv))
- sv = MUTABLE_SV(SvRV(sv));
- if (SvTYPE(sv) == SVt_REGEXP)
- return (REGEXP*) sv;
NULL will be returned if a REGEXP* is not found.
- REGEXP * SvRX(SV *sv)
Returns a boolean indicating whether the SV (or the one it references) is a REGEXP.
If you want to do something with the REGEXP* later use SvRX instead and check for NULL.
- bool SvRXOK(SV* sv)
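As a small hedged sketch (not from the original text), the recommended "use SvRX and check for NULL" pattern might look like this; the helper name is hypothetical and a perl.h / XS context is assumed:

```c
#include "EXTERN.h"
#include "perl.h"

/* Return the stringified pattern of an SV holding a qr// object,
 * or NULL if the SV is not (a reference to) a REGEXP. */
static const char *
pattern_of(pTHX_ SV *sv)
{
    REGEXP *rx = SvRX(sv);   /* NULL unless a REGEXP is found */
    if (!rx)
        return NULL;
    return RX_WRAPPED(rx);   /* wrapped pattern text, e.g. "(?^:foo)" */
}
```

Using SvRX once and testing for NULL avoids the double lookup that an SvRXOK-then-SvRX sequence would perform.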
Set up necessary local variables for exception handling. See Exception Handling in perlguts.
- dXCPT;
Introduces a catch block. See Exception Handling in perlguts.
Rethrows a previously caught exception. See Exception Handling in perlguts.
- XCPT_RETHROW;
Ends a try block. See Exception Handling in perlguts.
Starts a try block. See Exception Handling in perlguts.
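Taken together, these macros form the try/catch pattern described in Exception Handling in perlguts; a minimal sketch (code_that_may_croak is an illustrative function name):

```c
dXCPT;                       /* set up local variables for the handler */

XCPT_TRY_START {
    code_that_may_croak();   /* hypothetical code that may throw */
} XCPT_TRY_END

XCPT_CATCH
{
    /* release any local resources here, then propagate the error */
    XCPT_RETHROW;
}
```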
Declare a stack marker variable, mark, for the XSUB. See MARK and dORIGMARK.
- dMARK;
Saves the original stack mark for the XSUB. See ORIGMARK.
- dORIGMARK;
Declares a local copy of perl's stack pointer for the XSUB, available via the SP macro. See SP.
- dSP;
Used to extend the argument stack for an XSUB's return values. Once used, guarantees that there is room for at least nitems to be pushed onto the stack.
- void EXTEND(SP, int nitems)
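As a sketch of how these declarations fit together in an XSUB that returns several values (the function name is illustrative, and the PPCODE form assumes the usual xsubpp-generated boilerplate):

```c
void
three_ints()
    PPCODE:
        EXTEND(SP, 3);   /* reserve room for three return values */
        mPUSHi(1);       /* mortal SVs; TARG is not involved */
        mPUSHi(2);
        mPUSHi(3);
        XSRETURN(3);
```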
Stack marker variable for the XSUB. See dMARK.
Push an integer onto the stack. The stack must have room for this element. Does not use TARG. See also PUSHi, mXPUSHi and XPUSHi.
- void mPUSHi(IV iv)
Push a double onto the stack. The stack must have room for this element. Does not use TARG. See also PUSHn, mXPUSHn and XPUSHn.
- void mPUSHn(NV nv)
Push a string onto the stack. The stack must have room for this element. The len indicates the length of the string. Does not use TARG. See also PUSHp, mXPUSHp and XPUSHp.
- void mPUSHp(char* str, STRLEN len)
Push an SV onto the stack and mortalizes the SV. The stack must have room for this element. Does not use TARG. See also PUSHs and mXPUSHs.
- void mPUSHs(SV* sv)
Push an unsigned integer onto the stack. The stack must have room for this element. Does not use TARG. See also PUSHu, mXPUSHu and XPUSHu.
- void mPUSHu(UV uv)
Push an integer onto the stack, extending the stack if necessary. Does not use TARG. See also XPUSHi, mPUSHi and PUSHi.
- void mXPUSHi(IV iv)
Push a double onto the stack, extending the stack if necessary. Does not use TARG. See also XPUSHn, mPUSHn and PUSHn.
- void mXPUSHn(NV nv)
Push a string onto the stack, extending the stack if necessary. The len indicates the length of the string. Does not use TARG. See also XPUSHp, mPUSHp and PUSHp.
- void mXPUSHp(char* str, STRLEN len)
Push an SV onto the stack, extending the stack if necessary, and mortalizes the SV. Does not use TARG. See also XPUSHs and mPUSHs.
- void mXPUSHs(SV* sv)
Push an unsigned integer onto the stack, extending the stack if necessary. Does not use TARG. See also XPUSHu, mPUSHu and PUSHu.
- void mXPUSHu(UV uv)
The original stack mark for the XSUB. See dORIGMARK.
Pops an integer off the stack.
- IV POPi
Pops a long off the stack.
- long POPl
Pops a double off the stack.
- NV POPn
Pops a string off the stack.
- char* POPp
Pops a string off the stack which must consist of bytes, i.e. characters < 256.
- char* POPpbytex
Pops a string off the stack. Identical to POPp. There are two names for historical reasons.
- char* POPpx
Pops an SV off the stack.
- SV* POPs
Push an integer onto the stack. The stack must have room for this element. Handles 'set' magic. Uses TARG, so dTARGET or dXSTARG should be called to declare it. Do not call multiple TARG-oriented macros to return lists from XSUB's - see mPUSHi instead. See also XPUSHi and mXPUSHi.
- void PUSHi(IV iv)
Opening bracket for arguments on a callback. See PUTBACK and perlcall.
- void PUSHMARK(SP)
Push a new mortal SV onto the stack. The stack must have room for this element. Does not use TARG. See also PUSHs, XPUSHmortal and XPUSHs.
- void PUSHmortal()
Push a double onto the stack. The stack must have room for this element. Handles 'set' magic. Uses TARG, so dTARGET or dXSTARG should be called to declare it. Do not call multiple TARG-oriented macros to return lists from XSUB's - see mPUSHn instead. See also XPUSHn and mXPUSHn.
- void PUSHn(NV nv)
Push a string onto the stack. The stack must have room for this element. The len indicates the length of the string. Handles 'set' magic. Uses TARG, so dTARGET or dXSTARG should be called to declare it. Do not call multiple TARG-oriented macros to return lists from XSUB's - see mPUSHp instead. See also XPUSHp and mXPUSHp.
- void PUSHp(char* str, STRLEN len)
Push an SV onto the stack. The stack must have room for this element. Does not handle 'set' magic. Does not use TARG. See also PUSHmortal, XPUSHs and XPUSHmortal.
- void PUSHs(SV* sv)
Push an unsigned integer onto the stack. The stack must have room for this element. Handles 'set' magic. Uses TARG, so dTARGET or dXSTARG should be called to declare it. Do not call multiple TARG-oriented macros to return lists from XSUB's - see mPUSHu instead. See also XPUSHu and mXPUSHu.
- void PUSHu(UV uv)
Closing bracket for XSUB arguments. This is usually handled by xsubpp. See PUSHMARK and perlcall for other uses.
- PUTBACK;
Stack pointer. This is usually handled by xsubpp. See dSP and SPAGAIN.
Refetch the stack pointer. Used after a callback. See perlcall.
- SPAGAIN;
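These stack-bookkeeping macros come together when calling back into Perl from C, following the pattern in perlcall. A hedged sketch; "My::handler" is an illustrative subroutine name:

```c
/* Sketch: call a Perl subroutine from C and fetch its scalar result. */
dSP;
IV result;

PUSHMARK(SP);                    /* open the argument list */
mXPUSHi(42);                     /* push one mortal integer argument */
PUTBACK;                         /* make the pushed args visible to perl */

call_pv("My::handler", G_SCALAR);

SPAGAIN;                         /* refetch SP after the callback */
result = POPi;                   /* pop the scalar return value */
PUTBACK;                         /* restore the global stack pointer */
```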
Push an integer onto the stack, extending the stack if necessary. Handles
'set' magic. Uses TARG
, so dTARGET
or dXSTARG
should be called to
declare it. Do not call multiple TARG
-oriented macros to return lists
from XSUB's - see mXPUSHi
instead. See also PUSHi
and mPUSHi
.
- void XPUSHi(IV iv)
Push a new mortal SV onto the stack, extending the stack if necessary.
Does not use TARG
. See also XPUSHs
, PUSHmortal
and PUSHs
.
- void XPUSHmortal()
Push a double onto the stack, extending the stack if necessary. Handles
'set' magic. Uses TARG
, so dTARGET
or dXSTARG
should be called to
declare it. Do not call multiple TARG
-oriented macros to return lists
from XSUB's - see mXPUSHn
instead. See also PUSHn
and mPUSHn
.
- void XPUSHn(NV nv)
Push a string onto the stack, extending the stack if necessary. The len
indicates the length of the string. Handles 'set' magic. Uses TARG
, so
dTARGET
or dXSTARG
should be called to declare it. Do not call
multiple TARG
-oriented macros to return lists from XSUB's - see
mXPUSHp
instead. See also PUSHp
and mPUSHp
.
- void XPUSHp(char* str, STRLEN len)
Push an SV onto the stack, extending the stack if necessary. Does not
handle 'set' magic. Does not use TARG
. See also XPUSHmortal
,
PUSHs
and PUSHmortal
.
- void XPUSHs(SV* sv)
Push an unsigned integer onto the stack, extending the stack if necessary.
Handles 'set' magic. Uses TARG
, so dTARGET
or dXSTARG
should be
called to declare it. Do not call multiple TARG
-oriented macros to
return lists from XSUB's - see mXPUSHu
instead. See also PUSHu
and
mPUSHu
.
- void XPUSHu(UV uv)
Return from XSUB, indicating number of items on the stack. This is usually
handled by xsubpp
.
- void XSRETURN(int nitems)
Return an empty list from an XSUB immediately.
- XSRETURN_EMPTY;
Return an integer from an XSUB immediately. Uses XST_mIV.
- void XSRETURN_IV(IV iv)
Return &PL_sv_no from an XSUB immediately. Uses XST_mNO.
- XSRETURN_NO;
Return a double from an XSUB immediately. Uses XST_mNV.
- void XSRETURN_NV(NV nv)
Return a copy of a string from an XSUB immediately. Uses XST_mPV.
- void XSRETURN_PV(char* str)
Return &PL_sv_undef from an XSUB immediately. Uses XST_mUNDEF.
- XSRETURN_UNDEF;
Return an unsigned integer from an XSUB immediately. Uses XST_mUV.
- void XSRETURN_UV(UV uv)
Return &PL_sv_yes from an XSUB immediately. Uses XST_mYES.
- XSRETURN_YES;
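A short sketch of how these early-return macros read inside an XSUB (the function is illustrative):

```c
/* Sketch: return &PL_sv_yes or &PL_sv_no immediately from an XSUB. */
void
is_even(int n)
    PPCODE:
        if (n % 2 == 0)
            XSRETURN_YES;    /* pushes &PL_sv_yes and returns */
        else
            XSRETURN_NO;     /* pushes &PL_sv_no and returns */
```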
Place an integer into the specified position pos on the stack. The value is stored in a new mortal SV.
Place &PL_sv_no into the specified position pos on the stack.
Place a double into the specified position pos on the stack. The value is stored in a new mortal SV.
Place a copy of a string into the specified position pos on the stack. The value is stored in a new mortal SV.
Place &PL_sv_undef into the specified position pos on the stack.
Place &PL_sv_yes into the specified position pos on the stack.
An enum of flags for Perl types. These are found in the file sv.h in the svtype enum. Test these flags with the SvTYPE macro.
The types are:
- SVt_NULL
- SVt_BIND (unused)
- SVt_IV
- SVt_NV
- SVt_RV
- SVt_PV
- SVt_PVIV
- SVt_PVNV
- SVt_PVMG
- SVt_REGEXP
- SVt_PVGV
- SVt_PVLV
- SVt_PVAV
- SVt_PVHV
- SVt_PVCV
- SVt_PVFM
- SVt_PVIO
These are most easily explained from the bottom up.
SVt_PVIO is for I/O objects, SVt_PVFM for formats, SVt_PVCV for subroutines, SVt_PVHV for hashes and SVt_PVAV for arrays.
All the others are scalar types, that is, things that can be bound to a $ variable. For these, the internal types are mostly orthogonal to types in the Perl language. Hence, checking SvTYPE(sv) < SVt_PVAV is the best way to see whether something is a scalar.
SVt_PVGV represents a typeglob. If !SvFAKE(sv), then it is a real,
incoercible typeglob. If SvFAKE(sv), then it is a scalar to which a
typeglob has been assigned. Assigning to it again will stop it from being
a typeglob. SVt_PVLV represents a scalar that delegates to another scalar
behind the scenes. It is used, e.g., for the return value of substr and
for tied hash and array elements. It can hold any scalar value, including
a typeglob. SVt_REGEXP is for regular expressions.
SVt_PVMG represents a "normal" scalar (not a typeglob, regular expression, or delegate). Since most scalars do not need all the internal fields of a PVMG, we save memory by allocating smaller structs when possible. All the other types are just simpler forms of SVt_PVMG, with fewer internal fields. SVt_NULL can only hold undef. SVt_IV can hold undef, an integer, or a reference. (SVt_RV is an alias for SVt_IV, which exists for backward compatibility.) SVt_NV can hold any of those or a double. SVt_PV can only hold undef or a string. SVt_PVIV is a superset of SVt_PV and SVt_IV. SVt_PVNV is similar. SVt_PVMG can hold anything SVt_PVNV can hold, but it can, but does not have to, be blessed or magical.
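The ordering described above makes type dispatch straightforward; a sketch, assuming an SV* in hand:

```c
/* Sketch: classify an SV by its type flag. */
switch (SvTYPE(sv)) {
    case SVt_PVAV:  /* array */        break;
    case SVt_PVHV:  /* hash */         break;
    case SVt_PVCV:  /* subroutine */   break;
    default:
        if (SvTYPE(sv) < SVt_PVAV) {
            /* any scalar type, per the bottom-up ordering above */
        }
        break;
}
```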
Type flag for scalars. See svtype.
Type flag for scalars. See svtype.
Type flag for scalars. See svtype.
Type flag for scalars. See svtype.
Type flag for arrays. See svtype.
Type flag for subroutines. See svtype.
Type flag for formats. See svtype.
Type flag for typeglobs. See svtype.
Type flag for hashes. See svtype.
Type flag for I/O objects. See svtype.
Type flag for scalars. See svtype.
Type flag for scalars. See svtype.
Type flag for scalars. See svtype.
Type flag for scalars. See svtype.
Type flag for regular expressions. See svtype.
Returns a true SV if b is a true value, or a false SV if b is 0. See also PL_sv_yes and PL_sv_no.
- SV * boolSV(bool b)
A specialised variant of croak() for emitting the usage message for xsubs:
- croak_xs_usage(cv, "eee_yow");
works out the package name and subroutine name from cv, and then calls croak(). Hence if cv is &ouch::awk, it would call croak as:
- Perl_croak(aTHX_ "Usage: %"SVf"::%"SVf"(%s)", "ouch", "awk", "eee_yow");
- void croak_xs_usage(const CV *const cv,
- const char *const params)
Returns the SV of the specified Perl scalar. flags are passed to gv_fetchpv. If GV_ADD is set and the Perl variable does not exist then it will be created. If flags is zero and the variable does not exist then NULL is returned.
NOTE: the perl_ form of this function is deprecated.
- SV* get_sv(const char *name, I32 flags)
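A sketch of both flag modes; the variable names are illustrative:

```c
/* Sketch: fetch (creating if absent) the scalar $My::Config::debug. */
SV *sv = get_sv("My::Config::debug", GV_ADD);   /* non-NULL with GV_ADD */
sv_setiv(sv, 1);

/* Without GV_ADD, a missing variable yields NULL, so check first: */
SV *maybe = get_sv("My::Config::verbose", 0);
if (maybe && SvTRUE(maybe)) {
    /* the variable exists and holds a true value */
}
```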
Creates an RV wrapper for an SV. The reference count for the original SV is incremented.
- SV* newRV_inc(SV* sv)
Creates a new SV containing the pad name. This is currently identical to newSVsv, but pad names may cease being SVs at some point, so newSVpadname is preferable.
NOTE: this function is experimental and may change or be removed without notice.
- SV* newSVpadname(PADNAME *pn)
Creates a new SV and copies a string into it. If utf8 is true, calls SvUTF8_on on the new SV. Implemented as a wrapper around newSVpvn_flags.
- SV* newSVpvn_utf8(NULLOK const char* s, STRLEN len,
- U32 utf8)
Returns the length of the string which is in the SV. See SvLEN.
- STRLEN SvCUR(SV* sv)
Set the current length of the string which is in the SV. See SvCUR and SvIV_set.
- void SvCUR_set(SV* sv, STRLEN len)
Returns a pointer to the spot just after the last character in the string which is in the SV, where there is usually a trailing null (even though Perl scalars do not strictly require it). See SvCUR. Access the character as *(SvEND(sv)).
Warning: If SvCUR is equal to SvLEN, then SvEND points to unallocated memory.
- char* SvEND(SV* sv)
Returns true if the SV has get magic or overloading. If either is true then the scalar is active data, and has the potential to return a new value every time it is accessed. Hence you must be careful to only read it once per user logical operation and work with that returned value. If neither is true then the scalar's value cannot change unless written to.
- U32 SvGAMAGIC(SV* sv)
Expands the character buffer in the SV so that it has room for the indicated number of bytes (remember to reserve space for an extra trailing NUL character). Calls sv_grow to perform the expansion if necessary. Returns a pointer to the character buffer. SV must be of type >= SVt_PV. One alternative is to call sv_grow if you are not sure of the type of SV.
- char * SvGROW(SV* sv, STRLEN len)
Returns a U32 value indicating whether the SV contains an integer.
- U32 SvIOK(SV* sv)
Returns a U32 value indicating whether the SV contains an integer. Checks the private setting. Use SvIOK instead.
- U32 SvIOKp(SV* sv)
Returns a boolean indicating whether the SV contains a signed integer.
- bool SvIOK_notUV(SV* sv)
Unsets the IV status of an SV.
- void SvIOK_off(SV* sv)
Tells an SV that it is an integer.
- void SvIOK_on(SV* sv)
Tells an SV that it is an integer and disables all other OK bits.
- void SvIOK_only(SV* sv)
Tells an SV that it is an unsigned integer and disables all other OK bits.
- void SvIOK_only_UV(SV* sv)
Returns a boolean indicating whether the SV contains an integer that must be interpreted as unsigned. A non-negative integer whose value is within the range of both an IV and a UV may be flagged as either SvUOK or SvIOK.
- bool SvIOK_UV(SV* sv)
Returns a boolean indicating whether the SV is Copy-On-Write (either shared hash key scalars, or full Copy On Write scalars if 5.9.0 is configured for COW).
- bool SvIsCOW(SV* sv)
Returns a boolean indicating whether the SV is Copy-On-Write shared hash key scalar.
- bool SvIsCOW_shared_hash(SV* sv)
Coerces the given SV to an integer and returns it. See SvIVx for a version which guarantees to evaluate sv only once.
- IV SvIV(SV* sv)
Returns the raw value in the SV's IV slot, without checks or conversions. Only use when you are sure SvIOK is true. See also SvIV().
- IV SvIVX(SV* sv)
Coerces the given SV to an integer and returns it. Guarantees to evaluate sv only once. Only use this if sv is an expression with side effects, otherwise use the more efficient SvIV.
- IV SvIVx(SV* sv)
Like SvIV but doesn't process magic.
- IV SvIV_nomg(SV* sv)
Set the value of the IV pointer in sv to val. It is possible to perform the same function of this macro with an lvalue assignment to SvIVX. With future Perls, however, it will be more efficient to use SvIV_set instead of the lvalue assignment to SvIVX.
- void SvIV_set(SV* sv, IV val)
Returns the size of the string buffer in the SV, not including any part attributable to SvOOK. See SvCUR.
- STRLEN SvLEN(SV* sv)
Set the actual length of the string which is in the SV. See SvIV_set.
- void SvLEN_set(SV* sv, STRLEN len)
Set the value of the MAGIC pointer in sv to val. See SvIV_set.
- void SvMAGIC_set(SV* sv, MAGIC* val)
Returns a U32 value indicating whether the SV contains a number, integer or double.
- U32 SvNIOK(SV* sv)
Returns a U32 value indicating whether the SV contains a number, integer or double. Checks the private setting. Use SvNIOK instead.
- U32 SvNIOKp(SV* sv)
Unsets the NV/IV status of an SV.
- void SvNIOK_off(SV* sv)
Returns a U32 value indicating whether the SV contains a double.
- U32 SvNOK(SV* sv)
Returns a U32 value indicating whether the SV contains a double. Checks the private setting. Use SvNOK instead.
- U32 SvNOKp(SV* sv)
Unsets the NV status of an SV.
- void SvNOK_off(SV* sv)
Tells an SV that it is a double.
- void SvNOK_on(SV* sv)
Tells an SV that it is a double and disables all other OK bits.
- void SvNOK_only(SV* sv)
Coerce the given SV to a double and return it. See SvNVx for a version which guarantees to evaluate sv only once.
- NV SvNV(SV* sv)
Returns the raw value in the SV's NV slot, without checks or conversions. Only use when you are sure SvNOK is true. See also SvNV().
- NV SvNVX(SV* sv)
Coerces the given SV to a double and returns it. Guarantees to evaluate sv only once. Only use this if sv is an expression with side effects, otherwise use the more efficient SvNV.
- NV SvNVx(SV* sv)
Like SvNV but doesn't process magic.
- NV SvNV_nomg(SV* sv)
Set the value of the NV pointer in sv to val. See SvIV_set.
- void SvNV_set(SV* sv, NV val)
Returns a U32 value indicating whether the value is defined. This is only meaningful for scalars.
- U32 SvOK(SV* sv)
Returns a U32 indicating whether the pointer to the string buffer is offset. This hack is used internally to speed up removal of characters from the beginning of a SvPV. When SvOOK is true, then the start of the allocated string buffer is actually SvOOK_offset() bytes before SvPVX. This offset used to be stored in SvIVX, but is now stored within the spare part of the buffer.
- U32 SvOOK(SV* sv)
Reads into len the offset from SvPVX back to the true start of the allocated buffer, which will be non-zero if sv_chop has been used to efficiently remove characters from start of the buffer. Implemented as a macro, which takes the address of len, which must be of type STRLEN. Evaluates sv more than once. Sets len to 0 if SvOOK(sv) is false.
- void SvOOK_offset(NN SV*sv, STRLEN len)
Returns a U32 value indicating whether the SV contains a character string.
- U32 SvPOK(SV* sv)
Returns a U32 value indicating whether the SV contains a character string. Checks the private setting. Use SvPOK instead.
- U32 SvPOKp(SV* sv)
Unsets the PV status of an SV.
- void SvPOK_off(SV* sv)
Tells an SV that it is a string.
- void SvPOK_on(SV* sv)
Tells an SV that it is a string and disables all other OK bits. Will also turn off the UTF-8 status.
- void SvPOK_only(SV* sv)
Tells an SV that it is a string and disables all other OK bits, and leaves the UTF-8 status as it was.
- void SvPOK_only_UTF8(SV* sv)
Returns a pointer to the string in the SV, or a stringified form of the SV if the SV does not contain a string. The SV may cache the stringified version becoming SvPOK. Handles 'get' magic. See also SvPVx for a version which guarantees to evaluate sv only once.
Note that there is no guarantee that the return value of SvPV() is equal to SvPVX(sv), or that SvPVX(sv) contains valid data, or that successive calls to SvPV(sv) will return the same pointer value each time. This is due to the way that things like overloading and Copy-On-Write are handled. In these cases, the return value may point to a temporary buffer or similar. If you absolutely need the SvPVX field to be valid (for example, if you intend to write to it), then see SvPV_force.
- char* SvPV(SV* sv, STRLEN len)
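The canonical reading pattern is worth spelling out, since the len argument is filled in by the macro rather than passed by the caller; a sketch:

```c
/* Sketch: read an SV's string value and its byte length. */
STRLEN len;
const char *p = SvPV(sv, len);   /* len receives the length in bytes */

/* Use p and len immediately; per the note above, do not cache p
 * across further operations on sv. */
```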
Like SvPV, but converts sv to byte representation first if necessary.
- char* SvPVbyte(SV* sv, STRLEN len)
Like SvPV, but converts sv to byte representation first if necessary. Guarantees to evaluate sv only once; use the more efficient SvPVbyte otherwise.
- char* SvPVbytex(SV* sv, STRLEN len)
Like SvPV_force, but converts sv to byte representation first if necessary. Guarantees to evaluate sv only once; use the more efficient SvPVbyte_force otherwise.
- char* SvPVbytex_force(SV* sv, STRLEN len)
Like SvPV_force, but converts sv to byte representation first if necessary.
- char* SvPVbyte_force(SV* sv, STRLEN len)
Like SvPV_nolen, but converts sv to byte representation first if necessary.
- char* SvPVbyte_nolen(SV* sv)
Like SvPV, but converts sv to utf8 first if necessary.
- char* SvPVutf8(SV* sv, STRLEN len)
Like SvPV, but converts sv to utf8 first if necessary. Guarantees to evaluate sv only once; use the more efficient SvPVutf8 otherwise.
- char* SvPVutf8x(SV* sv, STRLEN len)
Like SvPV_force, but converts sv to utf8 first if necessary. Guarantees to evaluate sv only once; use the more efficient SvPVutf8_force otherwise.
- char* SvPVutf8x_force(SV* sv, STRLEN len)
Like SvPV_force, but converts sv to utf8 first if necessary.
- char* SvPVutf8_force(SV* sv, STRLEN len)
Like SvPV_nolen, but converts sv to utf8 first if necessary.
- char* SvPVutf8_nolen(SV* sv)
- char* SvPVutf8_nolen(SV* sv)
Returns a pointer to the physical string in the SV. The SV must contain a string. Prior to 5.9.3 it is not safe to execute this macro unless the SV's type >= SVt_PV.
This is also used to store the name of an autoloaded subroutine in an XS AUTOLOAD routine. See Autoloading with XSUBs in perlguts.
- char* SvPVX(SV* sv)
A version of SvPV which guarantees to evaluate sv only once. Only use this if sv is an expression with side effects, otherwise use the more efficient SvPV.
- char* SvPVx(SV* sv, STRLEN len)
Like SvPV but will force the SV into containing a string (SvPOK), and only a string (SvPOK_only), by hook or by crook. You need force if you are going to update the SvPVX directly. Processes get magic.
Note that coercing an arbitrary scalar into a plain PV will potentially strip useful data from it. For example if the SV was SvROK, then the referent will have its reference count decremented, and the SV itself may be converted to an SvPOK scalar with a string buffer containing a value such as "ARRAY(0x1234)".
- char* SvPV_force(SV* sv, STRLEN len)
Like SvPV_force, but doesn't process get magic.
- char* SvPV_force_nomg(SV* sv, STRLEN len)
Like SvPV but doesn't set a length variable.
- char* SvPV_nolen(SV* sv)
Like SvPV but doesn't process magic.
- char* SvPV_nomg(SV* sv, STRLEN len)
Like SvPV_nolen but doesn't process magic.
- char* SvPV_nomg_nolen(SV* sv)
Set the value of the PV pointer in sv to val. See also SvIV_set.
Beware that the existing pointer may be involved in copy-on-write or other mischief, so do SvOOK_off(sv) and use sv_force_normal or SvPV_force (or check the SvIsCOW flag) first to make sure this modification is safe.
- void SvPV_set(SV* sv, char* val)
Returns the value of the object's reference count.
- U32 SvREFCNT(SV* sv)
Decrements the reference count of the given SV. sv may be NULL.
- void SvREFCNT_dec(SV* sv)
Same as SvREFCNT_dec, but can only be used if you know sv is not NULL. Since we don't have to check the NULLness, it's faster and smaller.
- void SvREFCNT_dec_NN(SV* sv)
Increments the reference count of the given SV, returning the SV.
All of the following SvREFCNT_inc* macros are optimized versions of SvREFCNT_inc, and can be replaced with SvREFCNT_inc.
- SV* SvREFCNT_inc(SV* sv)
Same as SvREFCNT_inc, but can only be used if you know sv is not NULL. Since we don't have to check the NULLness, it's faster and smaller.
- SV* SvREFCNT_inc_NN(SV* sv)
Same as SvREFCNT_inc, but can only be used with expressions without side effects. Since we don't have to store a temporary value, it's faster.
- SV* SvREFCNT_inc_simple(SV* sv)
Same as SvREFCNT_inc_simple, but can only be used if you know sv is not NULL. Since we don't have to check the NULLness, it's faster and smaller.
- SV* SvREFCNT_inc_simple_NN(SV* sv)
Same as SvREFCNT_inc_simple, but can only be used if you don't need the return value. The macro doesn't need to return a meaningful value.
- void SvREFCNT_inc_simple_void(SV* sv)
Same as SvREFCNT_inc, but can only be used if you don't need the return value, and you know that sv is not NULL. The macro doesn't need to return a meaningful value, or check for NULLness, so it's smaller and faster.
- void SvREFCNT_inc_simple_void_NN(SV* sv)
Same as SvREFCNT_inc, but can only be used if you don't need the return value. The macro doesn't need to return a meaningful value.
- void SvREFCNT_inc_void(SV* sv)
Same as SvREFCNT_inc, but can only be used if you don't need the return value, and you know that sv is not NULL. The macro doesn't need to return a meaningful value, or check for NULLness, so it's smaller and faster.
- void SvREFCNT_inc_void_NN(SV* sv)
Tests if the SV is an RV.
- U32 SvROK(SV* sv)
Unsets the RV status of an SV.
- void SvROK_off(SV* sv)
Tells an SV that it is an RV.
- void SvROK_on(SV* sv)
Dereferences an RV to return the SV.
- SV* SvRV(SV* sv)
Set the value of the RV pointer in sv to val. See SvIV_set.
- void SvRV_set(SV* sv, SV* val)
Returns the stash of the SV.
- HV* SvSTASH(SV* sv)
Set the value of the STASH pointer in sv to val. See SvIV_set.
- void SvSTASH_set(SV* sv, HV* val)
Taints an SV if tainting is enabled, and if some input to the current expression is tainted--usually a variable, but possibly also implicit inputs such as locale settings. SvTAINT propagates that taintedness to the outputs of an expression in a pessimistic fashion; i.e., without paying attention to precisely which outputs are influenced by which inputs.
- void SvTAINT(SV* sv)
Checks to see if an SV is tainted. Returns TRUE if it is, FALSE if not.
- bool SvTAINTED(SV* sv)
Untaints an SV. Be very careful with this routine, as it short-circuits some of Perl's fundamental security features. XS module authors should not use this function unless they fully understand all the implications of unconditionally untainting the value. Untainting should be done in the standard perl fashion, via a carefully crafted regexp, rather than directly untainting variables.
- void SvTAINTED_off(SV* sv)
Marks an SV as tainted if tainting is enabled.
- void SvTAINTED_on(SV* sv)
Returns a boolean indicating whether Perl would evaluate the SV as true or false. See SvOK() for a defined/undefined test. Handles 'get' magic unless the scalar is already SvPOK, SvIOK or SvNOK (the public, not the private flags).
- bool SvTRUE(SV* sv)
Returns a boolean indicating whether Perl would evaluate the SV as true or false. See SvOK() for a defined/undefined test. Does not handle 'get' magic.
- bool SvTRUE_nomg(SV* sv)
Returns the type of the SV. See svtype.
- svtype SvTYPE(SV* sv)
Returns a boolean indicating whether the SV contains an integer that must be interpreted as unsigned. A non-negative integer whose value is within the range of both an IV and a UV may be flagged as either SvUOK or SvIOK.
- bool SvUOK(SV* sv)
Used to upgrade an SV to a more complex form. Uses sv_upgrade to perform the upgrade if necessary. See svtype.
- void SvUPGRADE(SV* sv, svtype type)
Returns a U32 value indicating the UTF-8 status of an SV. If things are set-up properly, this indicates whether or not the SV contains UTF-8 encoded data. You should use this after a call to SvPV() or one of its variants, in case any call to string overloading updates the internal flag.
- U32 SvUTF8(SV* sv)
Unsets the UTF-8 status of an SV (the data is not changed, just the flag). Do not use frivolously.
- void SvUTF8_off(SV *sv)
Turn on the UTF-8 status of an SV (the data is not changed, just the flag). Do not use frivolously.
- void SvUTF8_on(SV *sv)
Coerces the given SV to an unsigned integer and returns it. See SvUVx for a version which guarantees to evaluate sv only once.
- UV SvUV(SV* sv)
Returns the raw value in the SV's UV slot, without checks or conversions. Only use when you are sure SvIOK is true. See also SvUV().
- UV SvUVX(SV* sv)
Coerces the given SV to an unsigned integer and returns it. Guarantees to evaluate sv only once. Only use this if sv is an expression with side effects, otherwise use the more efficient SvUV.
- UV SvUVx(SV* sv)
Like SvUV but doesn't process magic.
- UV SvUV_nomg(SV* sv)
Set the value of the UV pointer in sv to val. See SvIV_set.
- void SvUV_set(SV* sv, UV val)
Returns a boolean indicating whether the SV contains a v-string.
- bool SvVOK(SV* sv)
Like sv_catpvn but doesn't process magic.
- void sv_catpvn_nomg(SV* sv, const char* ptr,
- STRLEN len)
Like sv_catpv but doesn't process magic.
- void sv_catpv_nomg(SV* sv, const char* ptr)
Like sv_catsv but doesn't process magic.
- void sv_catsv_nomg(SV* dsv, SV* ssv)
Exactly like sv_derived_from_pv, but doesn't take a flags parameter.
- bool sv_derived_from(SV* sv, const char *const name)
Exactly like sv_derived_from_pvn, but takes a nul-terminated string instead of a string/length pair.
- bool sv_derived_from_pv(SV* sv,
- const char *const name,
- U32 flags)
Returns a boolean indicating whether the SV is derived from the specified class
at the C level. To check derivation at the Perl level, call isa()
as a
normal Perl method.
Currently, the only significant value for flags is SVf_UTF8.
- bool sv_derived_from_pvn(SV* sv,
- const char *const name,
- const STRLEN len, U32 flags)
Exactly like sv_derived_from_pvn, but takes the name string in the form of an SV instead of a string/length pair.
- bool sv_derived_from_sv(SV* sv, SV *namesv,
- U32 flags)
Like sv_does_pv, but doesn't take a flags parameter.
- bool sv_does(SV* sv, const char *const name)
Like sv_does_sv, but takes a nul-terminated string instead of an SV.
- bool sv_does_pv(SV* sv, const char *const name,
- U32 flags)
Like sv_does_sv, but takes a string/length pair instead of an SV.
- bool sv_does_pvn(SV* sv, const char *const name,
- const STRLEN len, U32 flags)
Returns a boolean indicating whether the SV performs a specific, named role. The SV can be a Perl object or the name of a Perl class.
- bool sv_does_sv(SV* sv, SV* namesv, U32 flags)
Dump the contents of all SVs not yet freed (debugging aid).
- void sv_report_used()
Like sv_setsv
but doesn't process magic.
- void sv_setsv_nomg(SV* dsv, SV* ssv)
Like sv_utf8_upgrade, but doesn't do magic on sv.
- STRLEN sv_utf8_upgrade_nomg(NN SV *sv)
Test if the content of an SV looks like a number (or is a number). Inf and Infinity are treated as numbers (so will not issue a non-numeric warning), even if your atof() doesn't grok them. Get-magic is ignored.
- I32 looks_like_number(SV *const sv)
Creates an RV wrapper for an SV. The reference count for the original SV is not incremented.
- SV* newRV_noinc(SV *const sv)
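The difference between the two RV constructors is purely one of reference-count ownership; a sketch:

```c
/* Sketch: the usual ownership patterns for reference creation. */

/* Wrapping an SV that something else still owns: bump its count
 * so the reference and the original owner each hold one. */
SV *ref1 = newRV_inc(sv);

/* Transferring a freshly created SV (refcount already 1) into the
 * reference: newRV_noinc avoids a needless inc/dec pair. */
SV *ref2 = newRV_noinc(newSViv(42));
```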
Creates a new SV. A non-zero len parameter indicates the number of bytes of preallocated string space the SV should have. An extra byte for a trailing NUL is also reserved. (SvPOK is not set for the SV even if string space is allocated.) The reference count for the new SV is set to 1.
In 5.9.3, newSV() replaces the older NEWSV() API, and drops the first parameter, x, a debug aid which allowed callers to identify themselves. This aid has been superseded by a new build option, PERL_MEM_LOG (see PERL_MEM_LOG in perlhacktips). The older API is still there for use in XS modules supporting older perls.
- SV* newSV(const STRLEN len)
Creates a new SV from the hash key structure. It will generate scalars that point to the shared string table where possible. Returns a new (undefined) SV if the hek is NULL.
- SV* newSVhek(const HEK *const hek)
Creates a new SV and copies an integer into it. The reference count for the SV is set to 1.
- SV* newSViv(const IV i)
Creates a new SV and copies a floating point value into it. The reference count for the SV is set to 1.
- SV* newSVnv(const NV n)
Creates a new SV and copies a string into it. The reference count for the SV is set to 1. If len is zero, Perl will compute the length using strlen(). For efficiency, consider using newSVpvn instead.
- SV* newSVpv(const char *const s, const STRLEN len)
Creates a new SV and initializes it with the string formatted like sprintf.
- SV* newSVpvf(const char *const pat, ...)
Creates a new SV and copies a buffer into it, which may contain NUL characters (\0) and other binary data. The reference count for the SV is set to 1. Note that if len is zero, Perl will create a zero length (Perl) string. You are responsible for ensuring that the source buffer is at least len bytes long. If the buffer argument is NULL the new SV will be undefined.
- SV* newSVpvn(const char *const s, const STRLEN len)
Creates a new SV and copies a string into it. The reference count for the SV is set to 1. Note that if len is zero, Perl will create a zero length string. You are responsible for ensuring that the source string is at least len bytes long. If the s argument is NULL the new SV will be undefined.
Currently the only flag bits accepted are SVf_UTF8 and SVs_TEMP. If SVs_TEMP is set, then sv_2mortal() is called on the result before returning. If SVf_UTF8 is set, s is considered to be in UTF-8 and the SVf_UTF8 flag will be set on the new SV.
newSVpvn_utf8() is a convenience wrapper for this function, defined as
- #define newSVpvn_utf8(s, len, u) \
- newSVpvn_flags((s), (len), (u) ? SVf_UTF8 : 0)
- SV* newSVpvn_flags(const char *const s,
- const STRLEN len,
- const U32 flags)
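For instance, a mortal UTF-8 scalar can be built in one call, which is handy when pushing an XS return value (a sketch; the byte string and stack macro usage are illustrative):

- /* "caf\xc3\xa9" is "café" encoded as 5 bytes of UTF-8 */
- SV *ret = newSVpvn_flags("caf\xc3\xa9", 5, SVf_UTF8 | SVs_TEMP);
- /* SVs_TEMP already called sv_2mortal(), so no SvREFCNT_dec here */
- XPUSHs(ret);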
Creates a new SV with its SvPVX_const pointing to a shared string in the string table. If the string does not already exist in the table, it is created first. Turns on the SvIsCOW flag (or READONLY and FAKE in 5.16 and earlier). If the hash parameter is non-zero, that value is used; otherwise the hash is computed. The string's hash can later be retrieved from the SV with the SvSHARED_HASH() macro. The idea here is that as the string table is used for shared hash keys these strings will have SvPVX_const == HeKEY and hash lookup will avoid string compare.
- SV* newSVpvn_share(const char* s, I32 len, U32 hash)
Like newSVpvn, but takes a literal string instead of a string/length pair.
- SV* newSVpvs(const char* s)
Like newSVpvn_flags, but takes a literal string instead of a string/length pair.
- SV* newSVpvs_flags(const char* s, U32 flags)
Like newSVpvn_share, but takes a literal string instead of a string/length pair and omits the hash parameter.
- SV* newSVpvs_share(const char* s)
Like newSVpvn_share, but takes a nul-terminated string instead of a string/length pair.
- SV* newSVpv_share(const char* s, U32 hash)
Creates a new SV for the existing RV, rv, to point to. If rv is not an RV then it will be upgraded to one. If classname is non-null then the new SV will be blessed in the specified package. The new SV is returned and its reference count is 1. The reference count 1 is owned by rv.
- SV* newSVrv(SV *const rv,
- const char *const classname)
Creates a new SV which is an exact duplicate of the original SV. (Uses sv_setsv.)
- SV* newSVsv(SV *const old)
Creates a new SV and copies an unsigned integer into it. The reference count for the SV is set to 1.
- SV* newSVuv(const UV u)
Creates a new SV, of the type specified. The reference count for the new SV is set to 1.
- SV* newSV_type(const svtype type)
This macro is only used by sv_true() or its macro equivalent, and only if the latter's argument is neither SvPOK, SvIOK nor SvNOK. It calls sv_2bool_flags with the SV_GMAGIC flag.
- bool sv_2bool(SV *const sv)
This function is only used by sv_true() and friends, and only if the latter's argument is neither SvPOK, SvIOK nor SvNOK. If the flags contain SV_GMAGIC, then it does an mg_get() first.
- bool sv_2bool_flags(SV *const sv, const I32 flags)
Using various gambits, try to get a CV from an SV; in addition, try if possible to set *st and *gvp to the stash and GV associated with it. The flags in lref are passed to gv_fetchsv.
- CV* sv_2cv(SV* sv, HV **const st, GV **const gvp,
- const I32 lref)
Using various gambits, try to get an IO from an SV: the IO slot if it's a GV; or the recursive result if it's an RV; or the IO slot of the symbol named after the PV if it's a string. 'Get' magic is ignored on the sv passed in, but will be called on SvRV(sv) if sv is an RV.
- IO* sv_2io(SV *const sv)
Return the integer value of an SV, doing any necessary string conversion. If flags includes SV_GMAGIC, does an mg_get() first. Normally used via the SvIV(sv) and SvIVx(sv) macros.
- IV sv_2iv_flags(SV *const sv, const I32 flags)
Marks an existing SV as mortal. The SV will be destroyed "soon", either by an explicit call to FREETMPS, or by an implicit call at places such as statement boundaries. SvTEMP() is turned on which means that the SV's string buffer can be "stolen" if this SV is copied. See also sv_newmortal and sv_mortalcopy.
- SV* sv_2mortal(SV *const sv)
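A typical pattern, sketched for an XS temporary (assumes an interpreter context):

- SV *sv = newSViv(42);   /* reference count 1, owned by us */
- sv_2mortal(sv);         /* ownership passes to the temp stack */
- /* sv is freed automatically at the next FREETMPS */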
Return the num value of an SV, doing any necessary string or integer conversion. If flags includes SV_GMAGIC, does an mg_get() first. Normally used via the SvNV(sv) and SvNVx(sv) macros.
- NV sv_2nv_flags(SV *const sv, const I32 flags)
Return a pointer to the byte-encoded representation of the SV, and set *lp to its length. May cause the SV to be downgraded from UTF-8 as a side-effect.
Usually accessed via the SvPVbyte macro.
- char* sv_2pvbyte(SV *sv, STRLEN *const lp)
Return a pointer to the UTF-8-encoded representation of the SV, and set *lp to its length. May cause the SV to be upgraded to UTF-8 as a side-effect.
Usually accessed via the SvPVutf8 macro.
- char* sv_2pvutf8(SV *sv, STRLEN *const lp)
Returns a pointer to the string value of an SV, and sets *lp to its length. If flags includes SV_GMAGIC, does an mg_get() first. Coerces sv to a string if necessary. Normally invoked via the SvPV_flags macro. sv_2pv() and sv_2pv_nomg usually end up here too.
- char* sv_2pv_flags(SV *const sv, STRLEN *const lp,
- const I32 flags)
Return the unsigned integer value of an SV, doing any necessary string conversion. If flags includes SV_GMAGIC, does an mg_get() first. Normally used via the SvUV(sv) and SvUVx(sv) macros.
- UV sv_2uv_flags(SV *const sv, const I32 flags)
Remove any string offset. You should normally use the SvOOK_off macro wrapper instead.
- int sv_backoff(SV *const sv)
Blesses an SV into a specified package. The SV must be an RV. The package must be designated by its stash (see gv_stashpv()). The reference count of the SV is unaffected.
- SV* sv_bless(SV *const sv, HV *const stash)
Concatenates the string onto the end of the string which is in the SV. If the SV has the UTF-8 status set, then the bytes appended should be valid UTF-8. Handles 'get' magic, but not 'set' magic. See sv_catpv_mg.
- void sv_catpv(SV *const sv, const char* ptr)
Processes its arguments like sprintf and appends the formatted
output to an SV. If the appended data contains "wide" characters
(including, but not limited to, SVs with a UTF-8 PV formatted with %s,
and characters >255 formatted with %c), the original SV might get
upgraded to UTF-8. Handles 'get' magic, but not 'set' magic. See sv_catpvf_mg. If the original SV was UTF-8, the pattern should be valid UTF-8; if the original SV was bytes, the pattern should be too.
- void sv_catpvf(SV *const sv, const char *const pat,
- ...)
Like sv_catpvf, but also handles 'set' magic.
- void sv_catpvf_mg(SV *const sv,
- const char *const pat, ...)
Concatenates the string onto the end of the string which is in the SV. The len indicates the number of bytes to copy. If the SV has the UTF-8 status set, then the bytes appended should be valid UTF-8. Handles 'get' magic, but not 'set' magic. See sv_catpvn_mg.
- void sv_catpvn(SV *dsv, const char *sstr, STRLEN len)
Concatenates the string onto the end of the string which is in the SV. The len indicates the number of bytes to copy. If the SV has the UTF-8 status set, then the bytes appended should be valid UTF-8. If flags has the SV_SMAGIC bit set, will mg_set on dsv afterwards if appropriate. sv_catpvn and sv_catpvn_nomg are implemented in terms of this function.
- void sv_catpvn_flags(SV *const dstr,
- const char *sstr,
- const STRLEN len,
- const I32 flags)
Like sv_catpvn, but takes a literal string instead of a string/length pair.
- void sv_catpvs(SV* sv, const char* s)
Like sv_catpvn_flags, but takes a literal string instead of a string/length pair.
- void sv_catpvs_flags(SV* sv, const char* s,
- I32 flags)
Like sv_catpvn_mg, but takes a literal string instead of a string/length pair.
- void sv_catpvs_mg(SV* sv, const char* s)
Like sv_catpvn_nomg, but takes a literal string instead of a string/length pair.
- void sv_catpvs_nomg(SV* sv, const char* s)
Concatenates the string onto the end of the string which is in the SV. If the SV has the UTF-8 status set, then the bytes appended should be valid UTF-8. If flags has the SV_SMAGIC bit set, will mg_set on the modified SV if appropriate.
- void sv_catpv_flags(SV *dstr, const char *sstr,
- const I32 flags)
Like sv_catpv, but also handles 'set' magic.
- void sv_catpv_mg(SV *const sv, const char *const ptr)
Concatenates the string from SV ssv onto the end of the string in SV dsv. If ssv is null, does nothing; otherwise modifies only dsv. Handles 'get' magic on both SVs, but no 'set' magic. See sv_catsv_mg and sv_catsv_nomg.
- void sv_catsv(SV *dstr, SV *sstr)
Concatenates the string from SV ssv onto the end of the string in SV dsv. If ssv is null, does nothing; otherwise modifies only dsv. If flags has the SV_GMAGIC bit set, will call mg_get on both SVs if appropriate. If flags has the SV_SMAGIC bit set, mg_set will be called on the modified SV afterward, if appropriate. sv_catsv, sv_catsv_nomg, and sv_catsv_mg are implemented in terms of this function.
- void sv_catsv_flags(SV *const dsv, SV *const ssv,
- const I32 flags)
Efficient removal of characters from the beginning of the string buffer.
SvPOK(sv), or at least SvPOKp(sv), must be true and the ptr must be a pointer to somewhere inside the string buffer. The ptr becomes the first character of the adjusted string. Uses the "OOK hack". On return, only SvPOK(sv) and SvPOKp(sv) among the OK flags will be true. Beware: after this function returns, ptr and SvPVX_const(sv) may no longer refer to the same chunk of data.
The unfortunate similarity of this function's name to that of Perl's chop
operator is strictly coincidental. This function works from the left;
chop works from the right.
- void sv_chop(SV *const sv, const char *const ptr)
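For example, a known prefix can be stripped without copying the tail (a sketch; the literal is illustrative):

- SV *sv = newSVpvn("status: ok", 10);
- sv_chop(sv, SvPVX(sv) + 8);   /* string value is now "ok" */
- /* the old buffer start is retained via the OOK offset, not freed */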
Clear an SV: call any destructors, free up any memory used by the body,
and free the body itself. The SV's head is not freed, although
its type is set to all 1's so that it won't inadvertently be assumed
to be live during global destruction etc.
This function should only be called when REFCNT is zero. Most of the time you'll want to call sv_free() (or its macro wrapper SvREFCNT_dec) instead.
- void sv_clear(SV *const orig_sv)
Compares the strings in two SVs. Returns -1, 0, or 1 indicating whether the string in sv1 is less than, equal to, or greater than the string in sv2. Is UTF-8 and 'use bytes' aware, handles get magic, and will coerce its args to strings if necessary. See also sv_cmp_locale.
- I32 sv_cmp(SV *const sv1, SV *const sv2)
Compares the strings in two SVs. Returns -1, 0, or 1 indicating whether the string in sv1 is less than, equal to, or greater than the string in sv2. Is UTF-8 and 'use bytes' aware and will coerce its args to strings if necessary. If the flags include SV_GMAGIC, it handles get magic. See also sv_cmp_locale_flags.
- I32 sv_cmp_flags(SV *const sv1, SV *const sv2,
- const U32 flags)
Compares the strings in two SVs in a locale-aware manner. Is UTF-8 and 'use bytes' aware, handles get magic, and will coerce its args to strings if necessary. See also sv_cmp.
- I32 sv_cmp_locale(SV *const sv1, SV *const sv2)
Compares the strings in two SVs in a locale-aware manner. Is UTF-8 and 'use bytes' aware and will coerce its args to strings if necessary. If the flags contain SV_GMAGIC, it handles get magic. See also sv_cmp_flags.
- I32 sv_cmp_locale_flags(SV *const sv1,
- SV *const sv2,
- const U32 flags)
This calls sv_collxfrm_flags with the SV_GMAGIC flag. See sv_collxfrm_flags.
- char* sv_collxfrm(SV *const sv, STRLEN *const nxp)
Add Collate Transform magic to an SV if it doesn't already have it. If the flags contain SV_GMAGIC, it handles get-magic.
Any scalar variable may carry PERL_MAGIC_collxfrm magic that contains the scalar data of the variable, but transformed to such a format that a normal memory comparison can be used to compare the data according to the locale settings.
- char* sv_collxfrm_flags(SV *const sv,
- STRLEN *const nxp,
- I32 const flags)
Implementation of sv_copypv and sv_copypv_nomg. Calls get magic iff flags include SV_GMAGIC.
- void sv_copypv_flags(SV *const dsv, SV *const ssv,
- const I32 flags)
Like sv_copypv, but doesn't invoke get magic first.
- void sv_copypv_nomg(SV *const dsv, SV *const ssv)
Auto-decrement of the value in the SV, doing string to numeric conversion if necessary. Handles 'get' magic and operator overloading.
- void sv_dec(SV *const sv)
Auto-decrement of the value in the SV, doing string to numeric conversion if necessary. Handles operator overloading. Skips handling 'get' magic.
- void sv_dec_nomg(SV *const sv)
Returns a boolean indicating whether the strings in the two SVs are identical. Is UTF-8 and 'use bytes' aware, handles get magic, and will coerce its args to strings if necessary.
- I32 sv_eq(SV* sv1, SV* sv2)
Returns a boolean indicating whether the strings in the two SVs are identical. Is UTF-8 and 'use bytes' aware and coerces its args to strings if necessary. If the flags include SV_GMAGIC, it handles get-magic, too.
- I32 sv_eq_flags(SV* sv1, SV* sv2, const U32 flags)
Undo various types of fakery on an SV, where fakery means "more than" a string: if the PV is a shared string, make a private copy; if we're a ref, stop refing; if we're a glob, downgrade to an xpvmg; if we're a copy-on-write scalar, this is the on-write time when we do the copy, and is also used locally; if this is a vstring, drop the vstring magic. If SV_COW_DROP_PV is set then a copy-on-write scalar drops its PV buffer (if any) and becomes SvPOK_off rather than making a copy. (Used where this scalar is about to be set to some other value.) In addition, the flags parameter gets passed to sv_unref_flags() when unreffing. sv_force_normal calls this function with flags set to 0.
- void sv_force_normal_flags(SV *const sv,
- const U32 flags)
Decrement an SV's reference count, and if it drops to zero, call sv_clear to invoke destructors and free up any memory used by the body; finally, deallocate the SV's head itself. Normally called via a wrapper macro SvREFCNT_dec.
- void sv_free(SV *const sv)
Get a line from the filehandle and store it into the SV, optionally appending to the currently-stored string. If append is not 0, the line is appended to the SV instead of overwriting it. append should be set to the byte offset that the appended string should start at in the SV (typically, SvCUR(sv) is a suitable choice).
- char* sv_gets(SV *const sv, PerlIO *const fp,
- I32 append)
Expands the character buffer in the SV. If necessary, uses sv_unref and upgrades the SV to SVt_PV. Returns a pointer to the character buffer. Use the SvGROW wrapper instead.
- char* sv_grow(SV *const sv, STRLEN newlen)
Auto-increment of the value in the SV, doing string to numeric conversion if necessary. Handles 'get' magic and operator overloading.
- void sv_inc(SV *const sv)
Auto-increment of the value in the SV, doing string to numeric conversion if necessary. Handles operator overloading. Skips handling 'get' magic.
- void sv_inc_nomg(SV *const sv)
Inserts a string at the specified offset/length within the SV. Similar to the Perl substr() function. Handles get magic.
- void sv_insert(SV *const bigstr, const STRLEN offset,
- const STRLEN len,
- const char *const little,
- const STRLEN littlelen)
Same as sv_insert, but the extra flags are passed to the SvPV_force_flags that applies to bigstr.
- void sv_insert_flags(SV *const bigstr,
- const STRLEN offset,
- const STRLEN len,
- const char *const little,
- const STRLEN littlelen,
- const U32 flags)
Returns a boolean indicating whether the SV is blessed into the specified class. This does not check for subtypes; use sv_derived_from to verify an inheritance relationship.
- int sv_isa(SV* sv, const char *const name)
Returns a boolean indicating whether the SV is an RV pointing to a blessed object. If the SV is not an RV, or if the object is not blessed, then this will return false.
- int sv_isobject(SV* sv)
Returns the length of the string in the SV. Handles magic and type coercion and sets the UTF8 flag appropriately. See also SvCUR, which gives raw access to the xpv_cur slot.
- STRLEN sv_len(SV *const sv)
Returns the number of characters in the string in an SV, counting wide UTF-8 bytes as a single character. Handles magic and type coercion.
- STRLEN sv_len_utf8(SV *const sv)
Adds magic to an SV. First upgrades sv to type SVt_PVMG if necessary, then adds a new magic item of type how to the head of the magic list. See sv_magicext (which sv_magic now calls) for a description of the handling of the name and namlen arguments. You need to use sv_magicext to add magic to SvREADONLY SVs and also to add more than one instance of the same 'how'.
- void sv_magic(SV *const sv, SV *const obj,
- const int how, const char *const name,
- const I32 namlen)
Adds magic to an SV, upgrading it if necessary. Applies the supplied vtable and returns a pointer to the magic added.
Note that sv_magicext will allow things that sv_magic will not. In particular, you can add magic to SvREADONLY SVs, and add more than one instance of the same 'how'. If namlen is greater than zero then a savepvn copy of name is stored; if namlen is zero then name is stored as-is and, as another special case, if (name && namlen == HEf_SVKEY) then name is assumed to contain an SV* and is stored as-is with its REFCNT incremented. (This is now used as a subroutine by sv_magic.)
- MAGIC * sv_magicext(SV *const sv, SV *const obj,
- const int how,
- const MGVTBL *const vtbl,
- const char *const name,
- const I32 namlen)
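A minimal sketch of attaching custom magic with a private vtable (my_mg_set and my_vtbl are hypothetical names; unused vtable slots are left NULL):

- static int my_mg_set(pTHX_ SV *sv, MAGIC *mg) { /* runs on write */ return 0; }
- static MGVTBL my_vtbl = { NULL, my_mg_set, NULL, NULL, NULL, NULL, NULL, NULL };
- MAGIC *mg = sv_magicext(sv, NULL, PERL_MAGIC_ext, &my_vtbl, NULL, 0);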
Creates a new SV which is a copy of the original SV (using sv_setsv). The new SV is marked as mortal. It will be destroyed "soon", either by an explicit call to FREETMPS, or by an implicit call at places such as statement boundaries. See also sv_newmortal and sv_2mortal.
- SV* sv_mortalcopy(SV *const oldsv)
Creates a new null SV which is mortal. The reference count of the SV is set to 1. It will be destroyed "soon", either by an explicit call to FREETMPS, or by an implicit call at places such as statement boundaries. See also sv_mortalcopy and sv_2mortal.
- SV* sv_newmortal()
Increment an SV's reference count. Use the SvREFCNT_inc() wrapper instead.
- SV* sv_newref(SV *const sv)
Converts the value pointed to by offsetp from a count of bytes from the start of the string, to a count of the equivalent number of UTF-8 chars. Handles magic and type coercion.
- void sv_pos_b2u(SV *const sv, I32 *const offsetp)
Converts the value pointed to by offsetp from a count of UTF-8 chars from the start of the string, to a count of the equivalent number of bytes; if lenp is non-zero, it does the same to lenp, but this time starting from the offset, rather than from the start of the string. Handles magic and type coercion.
Use sv_pos_u2b_flags in preference, which correctly handles strings longer than 2Gb.
- void sv_pos_u2b(SV *const sv, I32 *const offsetp,
- I32 *const lenp)
Converts the value pointed to by offsetp from a count of UTF-8 chars from
the start of the string, to a count of the equivalent number of bytes; if
lenp is non-zero, it does the same to lenp, but this time starting from
the offset, rather than from the start
of the string. Handles type coercion.
flags is passed to SvPV_flags, and usually should be SV_GMAGIC|SV_CONST_RETURN to handle magic.
- STRLEN sv_pos_u2b_flags(SV *const sv, STRLEN uoffset,
- STRLEN *const lenp, U32 flags)
The backend for the SvPVbytex_force macro. Always use the macro instead.
- char* sv_pvbyten_force(SV *const sv, STRLEN *const lp)
Get a sensible string out of the SV somehow.
A private implementation of the SvPV_force macro for compilers which can't cope with complex macro expressions. Always use the macro instead.
- char* sv_pvn_force(SV* sv, STRLEN* lp)
Get a sensible string out of the SV somehow.
If flags has the SV_GMAGIC bit set, will mg_get on sv if appropriate, else not. sv_pvn_force and sv_pvn_force_nomg are implemented in terms of this function. You normally want to use the various wrapper macros instead: see SvPV_force and SvPV_force_nomg.
- char* sv_pvn_force_flags(SV *const sv,
- STRLEN *const lp,
- const I32 flags)
The backend for the SvPVutf8x_force macro. Always use the macro instead.
- char* sv_pvutf8n_force(SV *const sv, STRLEN *const lp)
Returns a string describing what the SV is a reference to.
- const char* sv_reftype(const SV *const sv, const int ob)
Make the first argument a copy of the second, then delete the original.
The target SV physically takes over ownership of the body of the source SV
and inherits its flags; however, the target keeps any magic it owns,
and any magic in the source is discarded.
Note that this is a rather specialist SV copying operation; most of the time you'll want to use sv_setsv or one of its many macro front-ends.
- void sv_replace(SV *const sv, SV *const nsv)
Underlying implementation for the reset Perl function.
Note that the perl-level function is vaguely deprecated.
- void sv_reset(const char* s, HV *const stash)
Weaken a reference: set the SvWEAKREF flag on this RV; give the referred-to SV PERL_MAGIC_backref magic if it hasn't already; and push a back-reference to this RV onto the array of backreferences associated with that magic. If the RV is magical, set magic will be called after the RV is cleared.
- SV* sv_rvweaken(SV *const sv)
Copies an integer into the given SV, upgrading first if necessary. Does not handle 'set' magic. See also sv_setiv_mg.
- void sv_setiv(SV *const sv, const IV num)
Like sv_setiv, but also handles 'set' magic.
- void sv_setiv_mg(SV *const sv, const IV i)
Copies a double into the given SV, upgrading first if necessary. Does not handle 'set' magic. See also sv_setnv_mg.
- void sv_setnv(SV *const sv, const NV num)
Like sv_setnv, but also handles 'set' magic.
- void sv_setnv_mg(SV *const sv, const NV num)
Copies a string into an SV. The string must be null-terminated. Does not handle 'set' magic. See sv_setpv_mg.
- void sv_setpv(SV *const sv, const char *const ptr)
Works like sv_catpvf but copies the text into the SV instead of appending it. Does not handle 'set' magic. See sv_setpvf_mg.
- void sv_setpvf(SV *const sv, const char *const pat,
- ...)
Like sv_setpvf, but also handles 'set' magic.
- void sv_setpvf_mg(SV *const sv,
- const char *const pat, ...)
Copies an integer into the given SV, also updating its string value. Does not handle 'set' magic. See sv_setpviv_mg.
- void sv_setpviv(SV *const sv, const IV num)
Like sv_setpviv, but also handles 'set' magic.
- void sv_setpviv_mg(SV *const sv, const IV iv)
Copies a string into an SV. The len parameter indicates the number of bytes to be copied. If the ptr argument is NULL the SV will become undefined. Does not handle 'set' magic. See sv_setpvn_mg.
- void sv_setpvn(SV *const sv, const char *const ptr,
- const STRLEN len)
Like sv_setpvn, but also handles 'set' magic.
- void sv_setpvn_mg(SV *const sv,
- const char *const ptr,
- const STRLEN len)
Like sv_setpvn, but takes a literal string instead of a string/length pair.
- void sv_setpvs(SV* sv, const char* s)
Like sv_setpvn_mg, but takes a literal string instead of a string/length pair.
- void sv_setpvs_mg(SV* sv, const char* s)
Like sv_setpv, but also handles 'set' magic.
- void sv_setpv_mg(SV *const sv, const char *const ptr)
Copies an integer into a new SV, optionally blessing the SV. The rv argument will be upgraded to an RV. That RV will be modified to point to the new SV. The classname argument indicates the package for the blessing. Set classname to NULL to avoid the blessing. The new SV will have a reference count of 1, and the RV will be returned.
- SV* sv_setref_iv(SV *const rv,
- const char *const classname,
- const IV iv)
Copies a double into a new SV, optionally blessing the SV. The rv argument will be upgraded to an RV. That RV will be modified to point to the new SV. The classname argument indicates the package for the blessing. Set classname to NULL to avoid the blessing. The new SV will have a reference count of 1, and the RV will be returned.
- SV* sv_setref_nv(SV *const rv,
- const char *const classname,
- const NV nv)
Copies a pointer into a new SV, optionally blessing the SV. The rv argument will be upgraded to an RV. That RV will be modified to point to the new SV. If the pv argument is NULL then PL_sv_undef will be placed into the SV. The classname argument indicates the package for the blessing. Set classname to NULL to avoid the blessing. The new SV will have a reference count of 1, and the RV will be returned.
Do not use with other Perl types such as HV, AV, SV, CV, because those objects will become corrupted by the pointer copy process.
Note that sv_setref_pvn copies the string while this copies the pointer.
- SV* sv_setref_pv(SV *const rv,
- const char *const classname,
- void *const pv)
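This is the usual way to hand an opaque C pointer to Perl as a blessed reference (a sketch; the Widget type and the "My::Widget" package are hypothetical):

- Widget *w = malloc(sizeof(Widget));
- SV *rv = newSV(0);
- sv_setref_pv(rv, "My::Widget", (void *)w);
- /* later, recover the pointer from the reference: */
- Widget *back = (Widget *)SvIV(SvRV(rv));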
Copies a string into a new SV, optionally blessing the SV. The length of the string must be specified with n. The rv argument will be upgraded to an RV. That RV will be modified to point to the new SV. The classname argument indicates the package for the blessing. Set classname to NULL to avoid the blessing. The new SV will have a reference count of 1, and the RV will be returned. Note that sv_setref_pv copies the pointer while this copies the string.
- SV* sv_setref_pvn(SV *const rv,
- const char *const classname,
- const char *const pv,
- const STRLEN n)
Like sv_setref_pvn, but takes a literal string instead of a string/length pair.
- SV * sv_setref_pvs(const char* s)
Copies an unsigned integer into a new SV, optionally blessing the SV. The rv argument will be upgraded to an RV. That RV will be modified to point to the new SV. The classname argument indicates the package for the blessing. Set classname to NULL to avoid the blessing. The new SV will have a reference count of 1, and the RV will be returned.
- SV* sv_setref_uv(SV *const rv,
- const char *const classname,
- const UV uv)
Copies the contents of the source SV ssv into the destination SV dsv. The source SV may be destroyed if it is mortal, so don't use this function if the source SV needs to be reused. Does not handle 'set' magic. Loosely speaking, it performs a copy-by-value, obliterating any previous content of the destination. You probably want to use one of the assortment of wrappers, such as SvSetSV, SvSetSV_nosteal, SvSetMagicSV, and SvSetMagicSV_nosteal.
- void sv_setsv(SV *dstr, SV *sstr)
Copies the contents of the source SV ssv into the destination SV dsv. The source SV may be destroyed if it is mortal, so don't use this function if the source SV needs to be reused. Does not handle 'set' magic. Loosely speaking, it performs a copy-by-value, obliterating any previous content of the destination. If the flags parameter has the SV_GMAGIC bit set, will mg_get on ssv if appropriate, else not. If the flags parameter has the NOSTEAL bit set then the buffers of temps will not be stolen. sv_setsv and sv_setsv_nomg are implemented in terms of this function. You probably want to use one of the assortment of wrappers, such as SvSetSV, SvSetSV_nosteal, SvSetMagicSV, and SvSetMagicSV_nosteal. This is the primary function for copying scalars, and most other copy-ish functions and macros use this underneath.
- void sv_setsv_flags(SV *dstr, SV *sstr,
- const I32 flags)
Like sv_setsv, but also handles 'set' magic.
- void sv_setsv_mg(SV *const dstr, SV *const sstr)
Copies an unsigned integer into the given SV, upgrading first if necessary. Does not handle 'set' magic. See also sv_setuv_mg.
- void sv_setuv(SV *const sv, const UV num)
Like sv_setuv, but also handles 'set' magic.
- void sv_setuv_mg(SV *const sv, const UV u)
Test an SV for taintedness. Use SvTAINTED instead.
- bool sv_tainted(SV *const sv)
Returns true if the SV has a true value by Perl's rules. Use the SvTRUE macro instead, which may call sv_true() or may instead use an in-line version.
- I32 sv_true(SV *const sv)
Removes all magic of type type from an SV.
- int sv_unmagic(SV *const sv, const int type)
Removes all magic of type type with the specified vtbl from an SV.
- int sv_unmagicext(SV *const sv, const int type, MGVTBL *vtbl)
Unsets the RV status of the SV, and decrements the reference count of whatever was being referenced by the RV. This can almost be thought of as a reversal of newSVrv. The cflags argument can contain SV_IMMEDIATE_UNREF to force the reference count to be decremented (otherwise the decrementing is conditional on the reference count being different from one or the reference being a readonly SV). See SvROK_off.
- void sv_unref_flags(SV *const ref, const U32 flags)
Untaint an SV. Use SvTAINTED_off instead.
- void sv_untaint(SV *const sv)
Upgrade an SV to a more complex form. Generally adds a new body type to the SV, then copies across as much information as possible from the old body. It croaks if the SV is already in a more complex form than requested. You generally want to use the SvUPGRADE macro wrapper, which checks the type before calling sv_upgrade, and hence does not croak. See also svtype.
- void sv_upgrade(SV *const sv, svtype new_type)
Tells an SV to use ptr to find its string value. Normally the string is stored inside the SV, but sv_usepvn allows the SV to use an outside string. The ptr should point to memory that was allocated by malloc. It must be the start of a mallocked block of memory, and not a pointer to the middle of it. The string length, len, must be supplied. By default this function will realloc (i.e. move) the memory pointed to by ptr, so that pointer should not be freed or used by the programmer after giving it to sv_usepvn, and neither should any pointers from "behind" that pointer (e.g. ptr + 1) be used. If flags & SV_SMAGIC is true, will call SvSETMAGIC. If flags & SV_HAS_TRAILING_NUL is true, then ptr[len] must be NUL, and the realloc will be skipped (i.e. the buffer is actually at least 1 byte longer than len, and already meets the requirements for storing in SvPVX).
- void sv_usepvn_flags(SV *const sv, char* ptr,
- const STRLEN len,
- const U32 flags)
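A sketch of donating a malloc()ed buffer to an SV instead of copying it (buf, src, and len are illustrative; buf must be the start of the malloc block):

- char *buf = malloc(len + 1);
- memcpy(buf, src, len);
- buf[len] = '\0';
- sv_usepvn_flags(sv, buf, len, SV_HAS_TRAILING_NUL); /* skips the realloc */
- /* buf now belongs to sv: do not free() it or keep pointers into it */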
If the PV of the SV is an octet sequence in UTF-8 and contains a multiple-byte character, the SvUTF8 flag is turned on so that it looks like a character. If the PV contains only single-byte characters, the SvUTF8 flag stays off. Scans PV for validity and returns false if the PV is invalid UTF-8.
NOTE: this function is experimental and may change or be removed without notice.
- bool sv_utf8_decode(SV *const sv)
Attempts to convert the PV of an SV from characters to bytes. If the PV contains a character that cannot fit in a byte, this conversion will fail; in this case, either returns false or, if fail_ok is not true, croaks.
This is not a general purpose Unicode to byte encoding interface: use the Encode extension for that.
NOTE: this function is experimental and may change or be removed without notice.
- bool sv_utf8_downgrade(SV *const sv,
- const bool fail_ok)
Converts the PV of an SV to UTF-8, but then turns the SvUTF8 flag off so that it looks like octets again.
- void sv_utf8_encode(SV *const sv)
Converts the PV of an SV to its UTF-8-encoded form. Forces the SV to string form if it is not already. Will mg_get on sv if appropriate. Always sets the SvUTF8 flag to avoid future validity checks even if the whole string is the same in UTF-8 as not. Returns the number of bytes in the converted string.
This is not a general purpose byte encoding to Unicode interface: use the Encode extension for that.
- STRLEN sv_utf8_upgrade(SV *sv)
Converts the PV of an SV to its UTF-8-encoded form. Forces the SV to string form if it is not already. Always sets the SvUTF8 flag to avoid future validity checks even if all the bytes are invariant in UTF-8. If flags has the SV_GMAGIC bit set, will mg_get on sv if appropriate, else not. Returns the number of bytes in the converted string. sv_utf8_upgrade and sv_utf8_upgrade_nomg are implemented in terms of this function.
This is not a general purpose byte encoding to Unicode interface: use the Encode extension for that.
- STRLEN sv_utf8_upgrade_flags(SV *const sv,
- const I32 flags)
Like sv_utf8_upgrade, but doesn't do magic on sv.
- STRLEN sv_utf8_upgrade_nomg(SV *sv)
Processes its arguments like vsprintf and appends the formatted output to an SV. Does not handle 'set' magic. See sv_vcatpvf_mg. Usually used via its frontend sv_catpvf.
- void sv_vcatpvf(SV *const sv, const char *const pat,
- va_list *const args)
- void sv_vcatpvfn(SV *const sv, const char *const pat,
- const STRLEN patlen,
- va_list *const args,
- SV **const svargs, const I32 svmax,
- bool *const maybe_tainted)
Processes its arguments like vsprintf
and appends the formatted output
to an SV. Uses an array of SVs if the C style variable argument list is
missing (NULL). When running with taint checks enabled, indicates via maybe_tainted if results are untrustworthy (often due to the use of locales).
If called as sv_vcatpvfn or flags include SV_GMAGIC, calls get magic.
Usually used via one of its frontends sv_vcatpvf and sv_vcatpvf_mg.
- void sv_vcatpvfn_flags(SV *const sv,
- const char *const pat,
- const STRLEN patlen,
- va_list *const args,
- SV **const svargs,
- const I32 svmax,
- bool *const maybe_tainted,
- const U32 flags)
Like sv_vcatpvf, but also handles 'set' magic.
Usually used via its frontend sv_catpvf_mg.
- void sv_vcatpvf_mg(SV *const sv,
- const char *const pat,
- va_list *const args)
Works like sv_vcatpvf but copies the text into the SV instead of appending it. Does not handle 'set' magic. See sv_vsetpvf_mg.
Usually used via its frontend sv_setpvf.
- void sv_vsetpvf(SV *const sv, const char *const pat,
- va_list *const args)
Works like sv_vcatpvfn but copies the text into the SV instead of appending it.
Usually used via one of its frontends sv_vsetpvf and sv_vsetpvf_mg.
- void sv_vsetpvfn(SV *const sv, const char *const pat,
- const STRLEN patlen,
- va_list *const args,
- SV **const svargs, const I32 svmax,
- bool *const maybe_tainted)
Like sv_vsetpvf, but also handles 'set' magic.
Usually used via its frontend sv_setpvf_mg.
- void sv_vsetpvf_mg(SV *const sv,
- const char *const pat,
- va_list *const args)
Compares the sequence of characters (stored as octets) in b, blen with the sequence of characters (stored as UTF-8) in u, ulen. Returns 0 if they are equal, -1 or -2 if the first string is less than the second string, +1 or +2 if the first string is greater than the second string.
-1 or +1 is returned if the shorter string was identical to the start of the longer string. -2 or +2 is returned if there was a difference between characters within the strings.
- int bytes_cmp_utf8(const U8 *b, STRLEN blen,
- const U8 *u, STRLEN ulen)
Converts a string s of length len from UTF-8 into native byte encoding. Unlike utf8_to_bytes but like bytes_to_utf8, returns a pointer to the newly-created string, and updates len to contain the new length. Returns the original string if no conversion occurs, and len is unchanged. Does nothing if is_utf8 points to 0. Sets is_utf8 to 0 if s is converted or consisted entirely of characters that are invariant in UTF-8 (i.e., US-ASCII on non-EBCDIC machines).
NOTE: this function is experimental and may change or be removed without notice.
- U8* bytes_from_utf8(const U8 *s, STRLEN *len,
- bool *is_utf8)
Converts a string s of length len bytes from the native encoding into UTF-8. Returns a pointer to the newly-created string, and sets len to reflect the new length in bytes.
A NUL character will be written after the end of the string.
If you want to convert to UTF-8 from encodings other than the native (Latin1 or EBCDIC), see sv_recode_to_utf8().
NOTE: this function is experimental and may change or be removed without notice.
- U8* bytes_to_utf8(const U8 *s, STRLEN *len)
Returns true if the leading portions of the strings s1 and s2 (either or both of which may be in UTF-8) are the same case-insensitively; false otherwise. How far into the strings to compare is determined by other input parameters.
If u1 is true, the string s1 is assumed to be in UTF-8-encoded Unicode; otherwise it is assumed to be in native 8-bit encoding. Correspondingly for u2 with respect to s2.
If the byte length l1 is non-zero, it says how far into s1 to check for fold equality. In other words, s1+l1 will be used as a goal to reach. The scan will not be considered to be a match unless the goal is reached, and scanning won't continue past that goal. Correspondingly for l2 with respect to s2.
If pe1 is non-NULL and the pointer it points to is not NULL, that pointer is considered an end pointer to the position 1 byte past the maximum point in s1 beyond which scanning will not continue under any circumstances. (This routine assumes that UTF-8 encoded input strings are not malformed; malformed input can cause it to read past pe1.) This means that if both l1 and pe1 are specified, and pe1 is less than s1+l1, the match will never be successful because it can never get as far as its goal (and in fact is asserted against). Correspondingly for pe2 with respect to s2.
At least one of s1 and s2 must have a goal (at least one of l1 and l2 must be non-zero), and if both do, both have to be reached for a successful match. Also, if the fold of a character is multiple characters, all of them must be matched (see tr21 reference below for 'folding').
Upon a successful match, if pe1 is non-NULL, it will be set to point to the beginning of the next character of s1 beyond what was matched. Correspondingly for pe2 and s2.
For case-insensitiveness, the "casefolding" of Unicode is used instead of upper/lowercasing both the characters, see http://www.unicode.org/unicode/reports/tr21/ (Case Mappings).
- I32 foldEQ_utf8(const char *s1, char **pe1, UV l1,
- bool u1, const char *s2, char **pe2,
- UV l2, bool u2)
Returns true if the first len bytes of the string s are the same whether or not the string is encoded in UTF-8 (or UTF-EBCDIC on EBCDIC machines). That is, if they are invariant. On ASCII-ish machines, only ASCII characters fit this definition, hence the function's name.
If len is 0, it will be calculated using strlen(s).
See also is_utf8_string(), is_utf8_string_loclen(), and is_utf8_string_loc().
- bool is_ascii_string(const U8 *s, STRLEN len)
DEPRECATED!
Tests if some arbitrary number of bytes begins in a valid UTF-8 character. Note that an INVARIANT (i.e. ASCII on non-EBCDIC machines) character is a valid UTF-8 character. The actual number of bytes in the UTF-8 character will be returned if it is valid, otherwise 0.
This function is deprecated due to the possibility that malformed input could cause reading beyond the end of the input buffer. Use is_utf8_char_buf instead.
- STRLEN is_utf8_char(const U8 *s)
Returns the number of bytes that comprise the first UTF-8 encoded character in buffer buf. buf_end should point to one position beyond the end of the buffer. 0 is returned if buf does not point to a complete, valid UTF-8 encoded character.
Note that an INVARIANT character (i.e. ASCII on non-EBCDIC machines) is a valid UTF-8 character.
- STRLEN is_utf8_char_buf(const U8 *buf,
- const U8 *buf_end)
Returns true if the first len bytes of string s form a valid UTF-8 string, false otherwise. If len is 0, it will be calculated using strlen(s) (which means if you use this option, that s has to have a terminating NUL byte). Note that all characters being ASCII constitute 'a valid UTF-8 string'.
See also is_ascii_string(), is_utf8_string_loclen(), and is_utf8_string_loc().
- bool is_utf8_string(const U8 *s, STRLEN len)
Like is_utf8_string but stores the location of the failure (in the case of "utf8ness failure") or the location s+len (in the case of "utf8ness success") in the ep.
See also is_utf8_string_loclen() and is_utf8_string().
- bool is_utf8_string_loc(const U8 *s, STRLEN len,
- const U8 **ep)
Like is_utf8_string() but stores the location of the failure (in the case of "utf8ness failure") or the location s+len (in the case of "utf8ness success") in the ep, and the number of UTF-8 encoded characters in the el.
See also is_utf8_string_loc() and is_utf8_string().
- bool is_utf8_string_loclen(const U8 *s, STRLEN len,
- const U8 **ep, STRLEN *el)
Build to the scalar dsv a displayable version of the string spv, length len, the displayable version being at most pvlim bytes long (if longer, the rest is truncated and "..." will be appended).
The flags argument can have UNI_DISPLAY_ISPRINT set to display isPRINT()able characters as themselves, UNI_DISPLAY_BACKSLASH to display the \\[nrfta\\] as the backslashed versions (like '\n') (UNI_DISPLAY_BACKSLASH is preferred over UNI_DISPLAY_ISPRINT for \\). UNI_DISPLAY_QQ (and its alias UNI_DISPLAY_REGEX) have both UNI_DISPLAY_BACKSLASH and UNI_DISPLAY_ISPRINT turned on.
The pointer to the PV of the dsv is returned.
- char* pv_uni_display(SV *dsv, const U8 *spv,
- STRLEN len, STRLEN pvlim,
- UV flags)
The encoding is assumed to be an Encode object; the PV of the ssv is assumed to be octets in that encoding, and decoding the input starts from the position which (PV + *offset) points to. The decoded UTF-8 string from ssv is appended to the dsv. Decoding will terminate when the string tstr appears in the decoding output or the input ends on the PV of the ssv. The value which offset points to will be modified to the last input position on the ssv.
Returns TRUE if the terminator was found, else returns FALSE.
- bool sv_cat_decode(SV *dsv, SV *encoding, SV *ssv, int *offset, char *tstr, int tlen)
The encoding is assumed to be an Encode object, on entry the PV of the sv is assumed to be octets in that encoding, and the sv will be converted into Unicode (and UTF-8).
If the sv already is UTF-8 (or if it is not POK), or if the encoding is not a reference, nothing is done to the sv. If the encoding is not an Encode::XS Encoding object, bad things will happen. (See lib/encoding.pm and Encode.)
The PV of the sv is returned.
- char* sv_recode_to_utf8(SV* sv, SV *encoding)
Build to the scalar dsv a displayable version of the scalar sv, the displayable version being at most pvlim bytes long (if longer, the rest is truncated and "..." will be appended).
The flags argument is as in pv_uni_display().
The pointer to the PV of the dsv is returned.
- char* sv_uni_display(SV *dsv, SV *ssv, STRLEN pvlim,
- UV flags)
The p contains the pointer to the UTF-8 string encoding the character that is being converted. This routine assumes that the character at p is well-formed.
The ustrp is a pointer to the character buffer to put the conversion result to. The lenp is a pointer to the length of the result.
The swashp is a pointer to the swash to use.
Both the special and normal mappings are stored in lib/unicore/To/Foo.pl, and loaded by SWASHNEW, using lib/utf8_heavy.pl. The special (usually, but not always, a multicharacter mapping), is tried first.
The special is a string like "utf8::ToSpecLower", which means the hash %utf8::ToSpecLower. The access to the hash is through Perl_to_utf8_case().
The normal is a string like "ToLower" which means the swash %utf8::ToLower.
- UV to_utf8_case(const U8 *p, U8* ustrp,
- STRLEN *lenp, SV **swashp,
- const char *normal,
- const char *special)
Convert the UTF-8 encoded character at p to its foldcase version and store that in UTF-8 in ustrp and its length in bytes in lenp. Note that the ustrp needs to be at least UTF8_MAXBYTES_CASE+1 bytes since the foldcase version may be longer than the original character (up to three characters).
The first character of the foldcased version is returned (but note, as explained above, that there may be more.)
The character at p is assumed by this routine to be well-formed.
- UV to_utf8_fold(const U8 *p, U8* ustrp,
- STRLEN *lenp)
Convert the UTF-8 encoded character at p to its lowercase version and store that in UTF-8 in ustrp and its length in bytes in lenp. Note that the ustrp needs to be at least UTF8_MAXBYTES_CASE+1 bytes since the lowercase version may be longer than the original character.
The first character of the lowercased version is returned (but note, as explained above, that there may be more.)
The character at p is assumed by this routine to be well-formed.
- UV to_utf8_lower(const U8 *p, U8* ustrp,
- STRLEN *lenp)
Convert the UTF-8 encoded character at p to its titlecase version and store that in UTF-8 in ustrp and its length in bytes in lenp. Note that the ustrp needs to be at least UTF8_MAXBYTES_CASE+1 bytes since the titlecase version may be longer than the original character.
The first character of the titlecased version is returned (but note, as explained above, that there may be more.)
The character at p is assumed by this routine to be well-formed.
- UV to_utf8_title(const U8 *p, U8* ustrp,
- STRLEN *lenp)
Convert the UTF-8 encoded character at p to its uppercase version and store that in UTF-8 in ustrp and its length in bytes in lenp. Note that the ustrp needs to be at least UTF8_MAXBYTES_CASE+1 bytes since the uppercase version may be longer than the original character.
The first character of the uppercased version is returned (but note, as explained above, that there may be more.)
The character at p is assumed by this routine to be well-formed.
- UV to_utf8_upper(const U8 *p, U8* ustrp,
- STRLEN *lenp)
Returns the native character value of the first character in the string s which is assumed to be in UTF-8 encoding; retlen will be set to the length, in bytes, of that character.
length and flags are the same as utf8n_to_uvuni().
- UV utf8n_to_uvchr(const U8 *s, STRLEN curlen,
- STRLEN *retlen, U32 flags)
Bottom level UTF-8 decode routine.
Returns the code point value of the first character in the string s, which is assumed to be in UTF-8 (or UTF-EBCDIC) encoding, and no longer than curlen bytes; *retlen (if retlen isn't NULL) will be set to the length, in bytes, of that character.
The value of flags determines the behavior when s does not point to a well-formed UTF-8 character. If flags is 0, when a malformation is found, zero is returned and *retlen is set so that (s + *retlen) is the next possible position in s that could begin a non-malformed character. Also, if UTF-8 warnings haven't been lexically disabled, a warning is raised.
Various ALLOW flags can be set in flags to allow (and not warn on) individual types of malformations, such as the sequence being overlong (that is, when there is a shorter sequence that can express the same code point; overlong sequences are expressly forbidden in the UTF-8 standard due to potential security issues). Another malformation example is the first byte of a character not being a legal first byte. See utf8.h for the list of such flags. For allowed 0 length strings, this function returns 0; for allowed overlong sequences, the computed code point is returned; for all other allowed malformations, the Unicode REPLACEMENT CHARACTER is returned, as these have no determinable reasonable value.
The UTF8_CHECK_ONLY flag overrides the behavior when a non-allowed (by other flags) malformation is found. If this flag is set, the routine assumes that the caller will raise a warning, and this function will silently just set retlen to -1 (cast to STRLEN) and return zero.
Note that this API requires disambiguation between successful decoding a NUL
character, and an error return (unless the UTF8_CHECK_ONLY flag is set), as
in both cases, 0 is returned. To disambiguate, upon a zero return, see if the
first byte of s is 0 as well. If so, the input was a NUL; if not, the input
had an error.
Certain code points are considered problematic. These are Unicode surrogates,
Unicode non-characters, and code points above the Unicode maximum of 0x10FFFF.
By default these are considered regular code points, but certain situations warrant special handling for them. If flags contains UTF8_DISALLOW_ILLEGAL_INTERCHANGE, all three classes are treated as malformations and handled as such. The flags UTF8_DISALLOW_SURROGATE, UTF8_DISALLOW_NONCHAR, and UTF8_DISALLOW_SUPER (meaning above the legal Unicode maximum) can be set to disallow these categories individually.
The flags UTF8_WARN_ILLEGAL_INTERCHANGE, UTF8_WARN_SURROGATE, UTF8_WARN_NONCHAR, and UTF8_WARN_SUPER will cause warning messages to be raised for their respective categories, but otherwise the code points are considered valid (not malformations). To get a category to both be treated as a malformation and raise a warning, specify both the WARN and DISALLOW flags. (But note that warnings are not raised if lexically disabled nor if UTF8_CHECK_ONLY is also specified.)
Very large code points (above 0x7FFF_FFFF) are considered more problematic than the others that are above the Unicode legal maximum. There are several reasons: they require at least 32 bits to represent them on ASCII platforms, are not representable at all on EBCDIC platforms, and the original UTF-8 specification never went above this number (the current 0x10FFFF limit was imposed later). (The smaller ones, those that fit into 32 bits, are representable by a UV on ASCII platforms, but not by an IV, which means that the number of operations that can be performed on them is quite restricted.) The UTF-8 encoding on ASCII platforms for these large code points begins with a byte containing 0xFE or 0xFF. The UTF8_DISALLOW_FE_FF flag will cause them to be treated as malformations, while allowing smaller above-Unicode code points. (Of course UTF8_DISALLOW_SUPER will treat all above-Unicode code points, including these, as malformations.) Similarly, UTF8_WARN_FE_FF acts just like the other WARN flags, but applies just to these code points.
All other code points corresponding to Unicode characters, including private use and those yet to be assigned, are never considered malformed and never warn.
Most code should use utf8_to_uvchr_buf() rather than call this directly.
- UV utf8n_to_uvuni(const U8 *s, STRLEN curlen,
- STRLEN *retlen, U32 flags)
Returns the number of UTF-8 characters between the UTF-8 pointers a and b.
WARNING: use only if you *know* that the pointers point inside the same UTF-8 buffer.
- IV utf8_distance(const U8 *a, const U8 *b)
Return the UTF-8 pointer s displaced by off characters, either forward or backward.
WARNING: do not use the following unless you *know* off is within the UTF-8 data pointed to by s *and* that on entry s is aligned on the first byte of a character or just after the last byte of a character.
- U8* utf8_hop(const U8 *s, I32 off)
Return the length of the UTF-8 char encoded string s in characters. Stops at e (inclusive). If e < s or if the scan would end up past e, croaks.
- STRLEN utf8_length(const U8* s, const U8 *e)
Converts a string s of length len from UTF-8 into native byte encoding. Unlike bytes_to_utf8, this over-writes the original string, and updates len to contain the new length. Returns zero on failure, setting len to -1.
If you need a copy of the string, see bytes_from_utf8.
NOTE: this function is experimental and may change or be removed without notice.
- U8* utf8_to_bytes(U8 *s, STRLEN *len)
DEPRECATED!
Returns the native code point of the first character in the string s which is assumed to be in UTF-8 encoding; retlen will be set to the length, in bytes, of that character.
Some, but not all, UTF-8 malformations are detected, and in fact, some malformed input could cause reading beyond the end of the input buffer, which is why this function is deprecated. Use utf8_to_uvchr_buf instead.
If s points to one of the detected malformations, and UTF8 warnings are enabled, zero is returned and *retlen is set (if retlen isn't NULL) to -1. If those warnings are off, the computed value, if well-defined (or the Unicode REPLACEMENT CHARACTER, if not), is silently returned, and *retlen is set (if retlen isn't NULL) so that (s + *retlen) is the next possible position in s that could begin a non-malformed character.
See utf8n_to_uvuni for details on when the REPLACEMENT CHARACTER is returned.
- UV utf8_to_uvchr(const U8 *s, STRLEN *retlen)
Returns the native code point of the first character in the string s which is assumed to be in UTF-8 encoding; send points to 1 beyond the end of s. *retlen will be set to the length, in bytes, of that character.
If s does not point to a well-formed UTF-8 character and UTF8 warnings are enabled, zero is returned and *retlen is set (if retlen isn't NULL) to -1. If those warnings are off, the computed value, if well-defined (or the Unicode REPLACEMENT CHARACTER if not), is silently returned, and *retlen is set (if retlen isn't NULL) so that (s + *retlen) is the next possible position in s that could begin a non-malformed character.
See utf8n_to_uvuni for details on when the REPLACEMENT CHARACTER is returned.
- UV utf8_to_uvchr_buf(const U8 *s, const U8 *send,
- STRLEN *retlen)
DEPRECATED!
Returns the Unicode code point of the first character in the string s which is assumed to be in UTF-8 encoding; retlen will be set to the length, in bytes, of that character.
This function should only be used when the returned UV is considered an index into the Unicode semantic tables (e.g. swashes).
Some, but not all, UTF-8 malformations are detected, and in fact, some malformed input could cause reading beyond the end of the input buffer, which is why this function is deprecated. Use utf8_to_uvuni_buf instead.
If s points to one of the detected malformations, and UTF8 warnings are enabled, zero is returned and *retlen is set (if retlen isn't NULL) to -1. If those warnings are off, the computed value, if well-defined (or the Unicode REPLACEMENT CHARACTER, if not), is silently returned, and *retlen is set (if retlen isn't NULL) so that (s + *retlen) is the next possible position in s that could begin a non-malformed character.
See utf8n_to_uvuni for details on when the REPLACEMENT CHARACTER is returned.
- UV utf8_to_uvuni(const U8 *s, STRLEN *retlen)
Returns the Unicode code point of the first character in the string s which is assumed to be in UTF-8 encoding; send points to 1 beyond the end of s. retlen will be set to the length, in bytes, of that character.
This function should only be used when the returned UV is considered an index into the Unicode semantic tables (e.g. swashes).
If s does not point to a well-formed UTF-8 character and UTF8 warnings are enabled, zero is returned and *retlen is set (if retlen isn't NULL) to -1. If those warnings are off, the computed value, if well-defined (or the Unicode REPLACEMENT CHARACTER, if not), is silently returned, and *retlen is set (if retlen isn't NULL) so that (s + *retlen) is the next possible position in s that could begin a non-malformed character.
See utf8n_to_uvuni for details on when the REPLACEMENT CHARACTER is returned.
- UV utf8_to_uvuni_buf(const U8 *s, const U8 *send,
- STRLEN *retlen)
Adds the UTF-8 representation of the native code point uv to the end of the string d; d should have at least UTF8_MAXBYTES+1 free bytes available. The return value is the pointer to the byte after the end of the new character. In other words,
- d = uvchr_to_utf8(d, uv);
is the recommended wide native character-aware way of saying
- *(d++) = uv;
- U8* uvchr_to_utf8(U8 *d, UV uv)
Adds the UTF-8 representation of the Unicode code point uv to the end of the string d; d should have at least UTF8_MAXBYTES+1 free bytes available. The return value is the pointer to the byte after the end of the new character. In other words,
- d = uvuni_to_utf8_flags(d, uv, flags);
or, in most cases,
- d = uvuni_to_utf8(d, uv);
(which is equivalent to)
- d = uvuni_to_utf8_flags(d, uv, 0);
This is the recommended Unicode-aware way of saying
- *(d++) = uv;
where uv is a code point expressed in Latin-1 or above, not the platform's native character set. Almost all code should instead use uvchr_to_utf8 or uvchr_to_utf8_flags.
This function will convert to UTF-8 (and not warn) even code points that aren't legal Unicode or are problematic, unless flags contains one or more of the following flags:
If uv is a Unicode surrogate code point and UNICODE_WARN_SURROGATE is set, the function will raise a warning, provided UTF8 warnings are enabled. If instead UNICODE_DISALLOW_SURROGATE is set, the function will fail and return NULL. If both flags are set, the function will both warn and return NULL.
The UNICODE_WARN_NONCHAR and UNICODE_DISALLOW_NONCHAR flags correspondingly affect how the function handles a Unicode non-character. And likewise, the UNICODE_WARN_SUPER and UNICODE_DISALLOW_SUPER flags, affect the handling of code points that are above the Unicode maximum of 0x10FFFF. Code points above 0x7FFF_FFFF (which are even less portable) can be warned and/or disallowed even if other above-Unicode code points are accepted by the UNICODE_WARN_FE_FF and UNICODE_DISALLOW_FE_FF flags.
And finally, the flag UNICODE_WARN_ILLEGAL_INTERCHANGE selects all four of the above WARN flags; and UNICODE_DISALLOW_ILLEGAL_INTERCHANGE selects all four DISALLOW flags.
- U8* uvuni_to_utf8_flags(U8 *d, UV uv, UV flags)
Variables created by xsubpp and xsubpp internal functions
Variable which is setup by xsubpp to indicate the stack base offset, used by the ST, XSprePUSH and XSRETURN macros. The dMARK macro must be called prior to setup the MARK variable.
- I32 ax
Variable which is setup by xsubpp to indicate the class name for a C++ XS constructor. This is always a char*. See THIS.
- char* CLASS
Sets up the ax variable. This is usually handled automatically by xsubpp by calling dXSARGS.
- dAX;
Sets up the ax variable and stack marker variable mark. This is usually handled automatically by xsubpp by calling dXSARGS.
- dAXMARK;
Sets up the items variable. This is usually handled automatically by xsubpp by calling dXSARGS.
- dITEMS;
Sets up any variable needed by the UNDERBAR macro. It used to define padoff_du, but it is currently a noop. However, it is strongly advised to still use it for ensuring past and future compatibility.
- dUNDERBAR;
Sets up stack and mark pointers for an XSUB, calling dSP and dMARK.
Sets up the ax and items variables by calling dAX and dITEMS. This is usually handled automatically by xsubpp.
- dXSARGS;
Sets up the ix variable for an XSUB which has aliases. This is usually handled automatically by xsubpp.
- dXSI32;
Variable which is setup by xsubpp to indicate the number of items on the stack. See Variable-length Parameter Lists in perlxs.
- I32 items
Variable which is setup by xsubpp to indicate which of an XSUB's aliases was used to invoke it. See The ALIAS: Keyword in perlxs.
- I32 ix
Used by xsubpp to hook up XSUBs as Perl subs. Adds Perl prototypes to the subs.
Variable which is setup by xsubpp to hold the return value for an XSUB. This is always the proper type for the XSUB. See The RETVAL Variable in perlxs.
- (whatever) RETVAL
Used to access elements on the XSUB's stack.
- SV* ST(int ix)
Variable which is setup by xsubpp to designate the object in a C++ XSUB. This is always the proper type for the C++ object. See CLASS and Using XS With C++ in perlxs.
- (whatever) THIS
The SV* corresponding to the $_ variable. Works even if there is a lexical $_ in scope.
Macro to declare an XSUB and its C parameter list. This is handled by xsubpp. It is the same as using the more explicit XS_EXTERNAL macro.
Macro to verify that the perl api version an XS module has been compiled against matches the api version of the perl interpreter it's being loaded into.
- XS_APIVERSION_BOOTCHECK;
Macro to declare an XSUB and its C parameter list explicitly exporting the symbols.
Macro to declare an XSUB and its C parameter list without exporting the symbols.
This is handled by xsubpp and generally preferable over exporting the XSUB symbols unnecessarily.
The version identifier for an XS module. This is usually handled automatically by ExtUtils::MakeMaker. See XS_VERSION_BOOTCHECK.
Macro to verify that a PM module's $VERSION variable matches the XS module's XS_VERSION variable. This is usually handled automatically by xsubpp. See The VERSIONCHECK: Keyword in perlxs.
- XS_VERSION_BOOTCHECK;
This is an XS interface to Perl's die function.
Take a sprintf-style format pattern and argument list. These are used to generate a string message. If the message does not end with a newline, then it will be extended with some indication of the current location in the code, as described for mess_sv.
The error message will be used as an exception, by default returning control to the nearest enclosing eval, but subject to modification by a $SIG{__DIE__} handler. In any case, the croak function never returns normally.
For historical reasons, if pat is null then the contents of ERRSV ($@) will be used as an error message or object instead of building an error message from arguments. If you want to throw a non-string object, or build an error message in an SV yourself, it is preferable to use the croak_sv function, which does not involve clobbering ERRSV.
- void croak(const char *pat, ...)
Exactly equivalent to Perl_croak(aTHX_ "%s", PL_no_modify), but generates terser object code than using Perl_croak. Less code used on exception code paths reduces CPU cache pressure.
- void croak_no_modify()
This is an XS interface to Perl's die function.
baseex is the error message or object. If it is a reference, it will be used as-is. Otherwise it is used as a string, and if it does not end with a newline then it will be extended with some indication of the current location in the code, as described for mess_sv.
The error message or object will be used as an exception, by default returning control to the nearest enclosing eval, but subject to modification by a $SIG{__DIE__} handler. In any case, the croak_sv function never returns normally.
To die with a simple string message, the croak function may be more convenient.
- void croak_sv(SV *baseex)
Behaves the same as croak, except for the return type. It should be used only where the OP * return type is required. The function never actually returns.
- OP * die(const char *pat, ...)
Behaves the same as croak_sv, except for the return type. It should be used only where the OP * return type is required. The function never actually returns.
- OP * die_sv(SV *baseex)
This is an XS interface to Perl's die function.
pat and args are a sprintf-style format pattern and encapsulated argument list. These are used to generate a string message. If the message does not end with a newline, then it will be extended with some indication of the current location in the code, as described for mess_sv.
The error message will be used as an exception, by default returning control to the nearest enclosing eval, but subject to modification by a $SIG{__DIE__} handler. In any case, the croak function never returns normally.
For historical reasons, if pat is null then the contents of ERRSV ($@) will be used as an error message or object instead of building an error message from arguments. If you want to throw a non-string object, or build an error message in an SV yourself, it is preferable to use the croak_sv function, which does not involve clobbering ERRSV.
- void vcroak(const char *pat, va_list *args)
This is an XS interface to Perl's warn function.
pat and args are a sprintf-style format pattern and encapsulated argument list. These are used to generate a string message. If the message does not end with a newline, then it will be extended with some indication of the current location in the code, as described for mess_sv.
The error message or object will by default be written to standard error, but this is subject to modification by a $SIG{__WARN__} handler.
Unlike with vcroak, pat is not permitted to be null.
- void vwarn(const char *pat, va_list *args)
This is an XS interface to Perl's warn function.
Take a sprintf-style format pattern and argument list. These are used to generate a string message. If the message does not end with a newline, then it will be extended with some indication of the current location in the code, as described for mess_sv.
The error message or object will by default be written to standard error, but this is subject to modification by a $SIG{__WARN__} handler.
Unlike with croak, pat is not permitted to be null.
- void warn(const char *pat, ...)
This is an XS interface to Perl's warn function.
baseex is the error message or object. If it is a reference, it will be used as-is. Otherwise it is used as a string, and if it does not end with a newline then it will be extended with some indication of the current location in the code, as described for mess_sv.
The error message or object will by default be written to standard error, but this is subject to modification by a $SIG{__WARN__} handler.
To warn with a simple string message, the warn function may be more convenient.
- void warn_sv(SV *baseex)
The following functions have been flagged as part of the public API, but are currently undocumented. Use them at your own risk, as the interfaces are subject to change. Functions that are not listed in this document are not intended for public use, and should NOT be used under any circumstances.
If you use one of the undocumented functions below, you may wish to consider creating and submitting documentation for it. If your patch is accepted, this will indicate that the interface is stable (unless it is explicitly marked otherwise).
Until May 1997, this document was maintained by Jeff Okamoto <okamoto@corp.hp.com>. It is now maintained as part of Perl itself.
With lots of help and suggestions from Dean Roehrich, Malcolm Beattie, Andreas Koenig, Paul Hudson, Ilya Zakharevich, Paul Marquess, Neil Bowers, Matthew Green, Tim Bunce, Spider Boardman, Ulrich Pfeifer, Stephen McCamant, and Gurusamy Sarathy.
API Listing originally by Dean Roehrich <roehrich@cray.com>.
Updated to be autogenerated from comments in the source by Benjamin Stuhl.
perlguts, perlxs, perlxstut, perlintern
perlapio - perl's IO abstraction interface.
- #define PERLIO_NOT_STDIO 0 /* For co-existence with stdio only */
- #include <perlio.h> /* Usually via #include <perl.h> */
- PerlIO *PerlIO_stdin(void);
- PerlIO *PerlIO_stdout(void);
- PerlIO *PerlIO_stderr(void);
- PerlIO *PerlIO_open(const char *path,const char *mode);
- PerlIO *PerlIO_fdopen(int fd, const char *mode);
- PerlIO *PerlIO_reopen(const char *path, const char *mode, PerlIO *old); /* deprecated */
- int PerlIO_close(PerlIO *f);
- int PerlIO_stdoutf(const char *fmt,...)
- int PerlIO_puts(PerlIO *f,const char *string);
- int PerlIO_putc(PerlIO *f,int ch);
- int PerlIO_write(PerlIO *f,const void *buf,size_t numbytes);
- int PerlIO_printf(PerlIO *f, const char *fmt,...);
- int PerlIO_vprintf(PerlIO *f, const char *fmt, va_list args);
- int PerlIO_flush(PerlIO *f);
- int PerlIO_eof(PerlIO *f);
- int PerlIO_error(PerlIO *f);
- void PerlIO_clearerr(PerlIO *f);
- int PerlIO_getc(PerlIO *d);
- int PerlIO_ungetc(PerlIO *f,int ch);
- int PerlIO_read(PerlIO *f, void *buf, size_t numbytes);
- int PerlIO_fileno(PerlIO *f);
- void PerlIO_setlinebuf(PerlIO *f);
- Off_t PerlIO_tell(PerlIO *f);
- int PerlIO_seek(PerlIO *f, Off_t offset, int whence);
- void PerlIO_rewind(PerlIO *f);
- int PerlIO_getpos(PerlIO *f, SV *save); /* prototype changed */
- int PerlIO_setpos(PerlIO *f, SV *saved); /* prototype changed */
- int PerlIO_fast_gets(PerlIO *f);
- int PerlIO_has_cntptr(PerlIO *f);
- int PerlIO_get_cnt(PerlIO *f);
- char *PerlIO_get_ptr(PerlIO *f);
- void PerlIO_set_ptrcnt(PerlIO *f, char *ptr, int count);
- int PerlIO_canset_cnt(PerlIO *f); /* deprecated */
- void PerlIO_set_cnt(PerlIO *f, int count); /* deprecated */
- int PerlIO_has_base(PerlIO *f);
- char *PerlIO_get_base(PerlIO *f);
- int PerlIO_get_bufsiz(PerlIO *f);
- PerlIO *PerlIO_importFILE(FILE *stdio, const char *mode);
- FILE *PerlIO_exportFILE(PerlIO *f, int flags);
- FILE *PerlIO_findFILE(PerlIO *f);
- void PerlIO_releaseFILE(PerlIO *f,FILE *stdio);
- int PerlIO_apply_layers(PerlIO *f, const char *mode, const char *layers);
- int PerlIO_binmode(PerlIO *f, int ptype, int imode, const char *layers);
- void PerlIO_debug(const char *fmt,...)
Perl's source code, and extensions that want maximum portability, should use the above functions instead of those defined in ANSI C's stdio.h. The perl headers (in particular "perlio.h") will #define them to the I/O mechanism selected at Configure time.
The functions are modeled on those in stdio.h, but parameter order has been "tidied up a little".
PerlIO * takes the place of FILE *. Like FILE *, it should be treated as opaque (it is probably safe to assume it is a pointer to something).
There are currently three implementations:
All of the above are #define'd to stdio functions or are trivial wrapper functions which call stdio. In this case only PerlIO * is a FILE *. This has been the default implementation since the abstraction was introduced in perl5.003_02.
A "legacy" implementation in terms of the "sfio" library. Used for some specialist applications on Unix machines ("sfio" is not widely ported away from Unix). Most of above are #define'd to the sfio functions. PerlIO * is in this case Sfio_t *.
Introduced just after perl5.7.0, this is a re-implementation of the above abstraction which allows perl more control over how IO is done as it decouples IO from the way the operating system and C library choose to do things. For USE_PERLIO PerlIO * has an extra layer of indirection - it is a pointer-to-a-pointer. This allows the PerlIO * to remain with a known value while swapping the implementation around underneath at run time. In this case all the above are true (but very simple) functions which call the underlying implementation.
This is the only implementation for which PerlIO_apply_layers()
does anything "interesting".
The USE_PERLIO implementation is described in perliol.
Because "perlio.h" is a thin layer (for efficiency) the semantics of these functions are somewhat dependent on the underlying implementation. Where these variations are understood they are noted below.
Unless otherwise noted, functions return 0 on success, or a negative value (usually EOF, which is usually -1) and set errno on error.
Use these rather than stdin, stdout, stderr. They are written to look like "function calls" rather than variables because this makes it easier to make them function calls if the platform cannot export data to loaded modules, or if (say) different "threads" might have different values.
These correspond to fopen()/fdopen() and the arguments are the same.
Return NULL and set errno if there is an error. There may be an implementation limit on the number of open handles, which may be lower than the limit on the number of open files; errno may not be set when NULL is returned if this limit is exceeded.
While this currently exists in all three implementations perl itself does not use it. As perl does not use it, it is not well tested.
Perl prefers to dup the new low-level descriptor to the descriptor used by the existing PerlIO. This may become the behaviour of this function in the future.
These are fprintf()/vfprintf() equivalents.
This is a printf() equivalent. printf is #defined to this function, so it is (currently) legal to use printf(fmt,...) in perl sources.
These correspond functionally to fread() and fwrite(), but the arguments and return values are different. The PerlIO_read() and PerlIO_write() signatures have been modeled on the more sane low-level read() and write() functions instead: the "file" argument is passed first, there is only one "count", and the return value can distinguish between error and EOF. Returns a byte count if successful (which may be zero or positive); returns a negative value and sets errno on error.
Depending on the implementation, errno may be EINTR if the operation was interrupted by a signal.
Depending on the implementation, errno may be EINTR if the operation was interrupted by a signal.
These correspond to fputs() and fputc(). Note that arguments have been revised to have "file" first.
This corresponds to ungetc(). Note that arguments have been revised to have "file" first. Arranges that the next read operation will return the byte c. Despite the implied "character" in the name, only values in the range 0..0xFF are defined. Returns the byte c on success or -1 (EOF) on error. The number of bytes that can be "pushed back" may vary; only 1 character is certain, and then only if it is the last character that was read from the handle.
This corresponds to getc(). Despite the c in the name, only the byte range 0..0xFF is supported. Returns the character read or -1 (EOF) on error.
This corresponds to feof(). Returns a true/false indication of whether the handle is at end of file. For terminal devices this may or may not be "sticky" depending on the implementation. The flag is cleared by PerlIO_seek(), or PerlIO_rewind().
This corresponds to ferror(). Returns a true/false indication of whether there has been an IO error on the handle.
This corresponds to fileno(); note that on some platforms the meaning of "fileno" may not match Unix. Returns -1 if the handle has no open descriptor associated with it.
This corresponds to clearerr(), i.e., clears 'error' and (usually) 'eof' flags for the "stream". Does not return a value.
This corresponds to fflush(). Sends any buffered write data to the underlying file. If called with NULL this may flush all open streams (or core dump with some USE_STDIO implementations). Calling it on a handle open for read only, or on which the last operation was a read of some kind, may lead to undefined behaviour on some USE_STDIO implementations. The USE_PERLIO (layers) implementation tries to behave better: it flushes all open streams when passed NULL, and attempts to retain data on read streams either in the buffer or by seeking the handle to the current logical position.
This corresponds to fseek(). Sends buffered write data to the underlying file, or discards any buffered read data, then positions the file descriptor as specified by offset and whence (sic). This is the correct thing to do when switching between read and write on the same handle (see issues with PerlIO_flush() above). Offset is of type Off_t, which is a perl Configure value which may not be the same as stdio's off_t.
This corresponds to ftell(). Returns the current file position, or (Off_t) -1 on error. May just return a value the system "knows" without making a system call or checking the underlying file descriptor (so use on shared file descriptors is not safe without a PerlIO_seek()). The return value is of type Off_t, which is a perl Configure value which may not be the same as stdio's off_t.
These correspond (loosely) to fgetpos() and fsetpos(). Rather than stdio's Fpos_t they expect a "Perl Scalar Value" to be passed. What is stored there should be considered opaque. The layout of the data may vary from handle to handle. When not using stdio or if platform does not have the stdio calls then they are implemented in terms of PerlIO_tell() and PerlIO_seek().
This corresponds to rewind(). It is usually defined as being
- PerlIO_seek(f,(Off_t)0L, SEEK_SET);
- PerlIO_clearerr(f);
This corresponds to tmpfile(), i.e., returns an anonymous PerlIO or NULL on error. The system will attempt to delete the file automatically when it is closed. On Unix the file is usually unlink-ed just after it is created, so it does not matter how it gets closed. On other systems the file may only be deleted if closed via PerlIO_close() and/or the program exits via exit. Depending on the implementation there may be "race conditions" which allow other processes access to the file, though in general it will be safer in this regard than ad hoc schemes.
This corresponds to setlinebuf(). Does not return a value. What constitutes a "line" is implementation dependent but usually means that writing "\n" flushes the buffer. What happens with things like "this\nthat" is uncertain. (Perl core uses it only when "dumping"; it has nothing to do with $| auto-flush.)
There is outline support for co-existence of PerlIO with stdio. Obviously if PerlIO is implemented in terms of stdio there is no problem. However, in other cases mechanisms must exist to create a FILE * which can be passed to library code that is going to use stdio calls.
The first step is to add this line:
- #define PERLIO_NOT_STDIO 0
before including any perl header files. (This will probably become the default at some point). That prevents "perlio.h" from attempting to #define stdio functions onto PerlIO functions.
XS code is probably better using "typemap" if it expects FILE * arguments. The standard typemap will be adjusted to comprehend any changes in this area.
Used to get a PerlIO * from a FILE *.
The mode argument should be a string as would be passed to fopen/PerlIO_open. If it is NULL then - for legacy support - the code will (depending upon the platform and the implementation) either attempt to empirically determine the mode in which f is open, or use "r+" to indicate a read/write stream.
Once called, the FILE * should ONLY be closed by calling PerlIO_close() on the returned PerlIO *.
The PerlIO is set to textmode. Use PerlIO_binmode if this is not the desired mode.
This is not the reverse of PerlIO_exportFILE().
Given a PerlIO * create a 'native' FILE * suitable for passing to code expecting to be compiled and linked with ANSI C stdio.h. The mode argument should be a string as would be passed to fopen/PerlIO_open. If it is NULL then - for legacy support - the FILE * is opened in same mode as the PerlIO *.
The fact that such a FILE * has been 'exported' is recorded (normally by pushing a new :stdio "layer" onto the PerlIO *), which may affect future PerlIO operations on the original PerlIO *. You should not call fclose() on the file unless you call PerlIO_releaseFILE() to disassociate it from the PerlIO *. (Do not use PerlIO_importFILE() for doing the disassociation.)
Calling this function repeatedly will create a FILE * on each call (and will push an :stdio layer each time as well).
Calling PerlIO_releaseFILE informs PerlIO that all use of FILE * is complete. It is removed from the list of 'exported' FILE *s, and the associated PerlIO * should revert to its original behaviour.
Use this to disassociate a file from a PerlIO * that was associated using PerlIO_exportFILE().
Returns a native FILE * used by a stdio layer. If there is none, it will create one with PerlIO_exportFILE. In either case the FILE * should be considered as belonging to the PerlIO subsystem and should only be closed by calling PerlIO_close().
In addition to standard-like API defined so far above there is an "implementation" interface which allows perl to get at internals of PerlIO. The following calls correspond to the various FILE_xxx macros determined by Configure - or their equivalent in other implementations. This section is really of interest to only those concerned with detailed perl-core behaviour, implementing a PerlIO mapping or writing code which can make use of the "read ahead" that has been done by the IO system in the same way perl does. Note that any code that uses these interfaces must be prepared to do things the traditional way if a handle does not support them.
Returns true if the implementation has all the interfaces required to allow perl's sv_gets to "bypass" the normal IO mechanism. This can vary from handle to handle.
- PerlIO_fast_gets(f) = PerlIO_has_cntptr(f) && \
- PerlIO_canset_cnt(f) && \
- 'Can set pointer into buffer'
Implementation can return pointer to current position in the "buffer" and a count of bytes available in the buffer. Do not use this - use PerlIO_fast_gets.
Return count of readable bytes in the buffer. Zero or negative return means no more bytes available.
Return pointer to next readable byte in buffer, accessing via the pointer (dereferencing) is only safe if PerlIO_get_cnt() has returned a positive value. Only positive offsets up to value returned by PerlIO_get_cnt() are allowed.
Set the pointer into the buffer, and a count of bytes still in the buffer. Should be used only to set the pointer to within the range implied by previous calls to PerlIO_get_ptr and PerlIO_get_cnt. The two values must be consistent with each other (an implementation may use only one or the other, or may require both).
Implementation can adjust its idea of number of bytes in the buffer. Do not use this - use PerlIO_fast_gets.
Obscure - set count of bytes in the buffer. Deprecated. Only usable if PerlIO_canset_cnt() returns true. Currently used in only doio.c to force count less than -1 to -1. Perhaps should be PerlIO_set_empty or similar. This call may actually do nothing if "count" is deduced from pointer and a "limit". Do not use this - use PerlIO_set_ptrcnt().
Returns true if implementation has a buffer, and can return pointer to whole buffer and its size. Used by perl for -T / -B tests. Other uses would be very obscure...
Return start of buffer. Access only positive offsets in the buffer up to the value returned by PerlIO_get_bufsiz().
Return the total number of bytes in the buffer. This is neither the number that can be read nor the amount of memory allocated to the buffer; rather it is what the operating system and/or implementation happened to read() (or whatever) last time IO was requested.
The new interface to the USE_PERLIO implementation. The layers ":crlf" and ":raw" are the only ones allowed for other implementations, and those are silently ignored. (As of perl5.8 ":raw" is deprecated.) Use PerlIO_binmode() below for the portable case.
The hook used by perl's binmode operator.
ptype is perl's character for the kind of IO:
imode is O_BINARY or O_TEXT.
layers is a string of layers to apply, only ":crlf" makes sense in the non USE_PERLIO case. (As of perl5.8 ":raw" is deprecated in favour of passing NULL.)
Portable cases are:
- PerlIO_binmode(f,ptype,O_BINARY,NULL);
- and
- PerlIO_binmode(f,ptype,O_TEXT,":crlf");
On Unix these calls probably have no effect whatsoever. Elsewhere they alter "\n" to CR,LF translation and possibly cause a special text "end of file" indicator to be written or honoured on read. The effect of making the call after doing any IO to the handle depends on the implementation. (It may be ignored, affect any data which is already buffered as well, or only apply to subsequent data.)
PerlIO_debug is a printf()-like function which can be used for debugging. No return value. Its main use is inside PerlIO where using real printf, warn() etc. would recursively call PerlIO and be a problem.
PerlIO_debug writes to the file named by $ENV{'PERLIO_DEBUG'}; typical use might be:
- Bourne shells (sh, ksh, bash, zsh, ash, ...):
- PERLIO_DEBUG=/dev/tty ./perl somescript some args
- Csh/Tcsh:
- setenv PERLIO_DEBUG /dev/tty
- ./perl somescript some args
- If you have the "env" utility:
- env PERLIO_DEBUG=/dev/tty ./perl somescript some args
- Win32:
- set PERLIO_DEBUG=CON
- perl somescript some args
If $ENV{'PERLIO_DEBUG'} is not set PerlIO_debug() is a no-op.
- You can refer to this document in Pod via "L<perlartistic>"
- Or you can see this document by entering "perldoc perlartistic"
Perl is free software; you can redistribute it and/or modify it under the terms of either:
- a) the GNU General Public License as published by the Free
- Software Foundation; either version 1, or (at your option) any
- later version, or
- b) the "Artistic License" which comes with this Kit.
This is "The Artistic License". It's here so that modules, programs, etc., that want to declare this as their distribution license can link to it.
For the GNU General Public License, see perlgpl.
The intent of this document is to state the conditions under which a Package may be copied, such that the Copyright Holder maintains some semblance of artistic control over the development of the package, while giving the users of the package the right to use and distribute the Package in a more-or-less customary fashion, plus the right to make reasonable modifications.
refers to the collection of files distributed by the Copyright Holder, and derivatives of that collection of files created through textual modification.
refers to such a Package if it has not been modified, or has been modified in accordance with the wishes of the Copyright Holder as specified below.
is whoever is named in the copyright or copyrights for the package.
is you, if you're thinking about copying or distributing this Package.
is whatever you can justify on the basis of media cost, duplication charges, time of people involved, and so on. (You will not be required to justify it to the Copyright Holder, but only to the computing community at large as a market that must bear the fee.)
means that no fee is charged for the item itself, though there may be fees involved in handling the item. It also means that recipients of the item may redistribute it under the same conditions they received it.
You may make and give away verbatim copies of the source form of the Standard Version of this Package without restriction, provided that you duplicate all of the original copyright notices and associated disclaimers.
You may apply bug fixes, portability fixes and other modifications derived from the Public Domain or from the Copyright Holder. A Package modified in such a way shall still be considered the Standard Version.
You may otherwise modify your copy of this Package in any way, provided that you insert a prominent notice in each changed file stating how and when you changed that file, and provided that you do at least ONE of the following:
place your modifications in the Public Domain or otherwise make them Freely Available, such as by posting said modifications to Usenet or an equivalent medium, or placing the modifications on a major archive site such as uunet.uu.net, or by allowing the Copyright Holder to include your modifications in the Standard Version of the Package.
use the modified Package only within your corporation or organization.
rename any non-standard executables so the names do not conflict with standard executables, which must also be provided, and provide a separate manual page for each non-standard executable that clearly documents how it differs from the Standard Version.
make other distribution arrangements with the Copyright Holder.
You may distribute the programs of this Package in object code or executable form, provided that you do at least ONE of the following:
distribute a Standard Version of the executables and library files, together with instructions (in the manual page or equivalent) on where to get the Standard Version.
accompany the distribution with the machine-readable source of the Package with your modifications.
give non-standard executables non-standard names, and clearly document the differences in manual pages (or equivalent), together with instructions on where to get the Standard Version.
make other distribution arrangements with the Copyright Holder.
You may charge a reasonable copying fee for any distribution of this Package. You may charge any fee you choose for support of this Package. You may not charge a fee for this Package itself. However, you may distribute this Package in aggregate with other (possibly commercial) programs as part of a larger (possibly commercial) software distribution provided that you do not advertise this Package as a product of your own. You may embed this Package's interpreter within an executable of yours (by linking); this shall be construed as a mere form of aggregation, provided that the complete Standard Version of the interpreter is so embedded.
The scripts and library files supplied as input to or produced as output from the programs of this Package do not automatically fall under the copyright of this Package, but belong to whoever generated them, and may be sold commercially, and may be aggregated with this Package. If such scripts or library files are aggregated with this Package via the so-called "undump" or "unexec" methods of producing a binary executable image, then distribution of such an image shall neither be construed as a distribution of this Package nor shall it fall under the restrictions of Paragraphs 3 and 4, provided that you do not represent such an executable image as a Standard Version of this Package.
C subroutines (or comparably compiled subroutines in other languages) supplied by you and linked into this Package in order to emulate subroutines and variables of the language defined by this Package shall not be considered part of this Package, but are the equivalent of input as in Paragraph 6, provided these subroutines do not change the language in any way that would cause it to fail the regression tests for the language.
Aggregation of this Package with a commercial distribution is always permitted provided that the use of this Package is embedded; that is, when no overt attempt is made to make this Package's interfaces visible to the end user of the commercial distribution. Such use shall not be construed as a distribution of this Package.
The name of the Copyright Holder may not be used to endorse or promote products derived from this software without specific prior written permission.
THIS PACKAGE IS PROVIDED "AS IS" AND WITHOUT ANY EXPRESS OR IMPLIED WARRANTIES, INCLUDING, WITHOUT LIMITATION, THE IMPLIED WARRANTIES OF MERCHANTIBILITY AND FITNESS FOR A PARTICULAR PURPOSE.
The End
perlbook - Books about and related to Perl
There are many books on Perl and Perl-related topics. A few of these are good, some are OK, but many aren't worth your money. There is a list of these books, some with extensive reviews, at http://books.perl.org/ . We list some of the books here, and while listing a book implies our endorsement, don't think that not including a book means anything.
Most of these books are available online through Safari Books Online ( http://safaribooksonline.com/ ).
The major reference book on Perl, written by the creator of Perl, is Programming Perl:
- by Tom Christiansen, brian d foy, Larry Wall with Jon Orwant
- ISBN 978-0-596-00492-7 [4th edition February 2012]
- ISBN 978-1-4493-9890-3 [ebook]
- http://oreilly.com/catalog/9780596004927
The Ram is a cookbook with hundreds of examples of using Perl to accomplish specific tasks:
- by Tom Christiansen and Nathan Torkington,
- with Foreword by Larry Wall
- ISBN 978-0-596-00313-5 [2nd Edition August 2003]
- http://oreilly.com/catalog/9780596003135/
If you want to learn the basics of Perl, you might start with the Llama book, which assumes that you already know a little about programming:
- by Randal L. Schwartz, Tom Phoenix, and brian d foy
- ISBN 978-1-4493-0358-7 [6th edition June 2011]
- http://oreilly.com/catalog/0636920018452
The tutorial started in the Llama continues in the Alpaca, which introduces the intermediate features of references, data structures, object-oriented programming, and modules:
- by Randal L. Schwartz and brian d foy, with Tom Phoenix
- foreword by Damian Conway
- ISBN 978-1-4493-9309-0 [2nd edition August 2012]
- http://oreilly.com/catalog/0636920012689/
You might want to keep these desktop references close by your keyboard:
- by Johan Vromans
- ISBN 978-1-4493-0370-9 [5th edition July 2011]
- ISBN 978-1-4493-0813-1 [ebook]
- http://oreilly.com/catalog/0636920018476/
- by Richard Foley
- ISBN 978-0-596-00503-0 [1st edition January 2004]
- http://oreilly.com/catalog/9780596005030/
- by Tony Stubblebine
- ISBN 978-0-596-51427-3 [July 2007]
- http://oreilly.com/catalog/9780596514273/
- by James Lee
- ISBN 1-59059-391-X [3rd edition April 2010]
- http://www.apress.com/9781430227939
- by Randal L. Schwartz, Tom Phoenix, and brian d foy
- ISBN 978-0-596-52010-6 [5th edition June 2008]
- http://oreilly.com/catalog/9780596520106
- by Randal L. Schwartz and brian d foy, with Tom Phoenix
- foreword by Damian Conway
- ISBN 0-596-10206-2 [1st edition March 2006]
- http://oreilly.com/catalog/9780596102067
- by brian d foy
- ISBN 978-0-596-10206-7 [1st edition July 2007]
- http://www.oreilly.com/catalog/9780596527242
- by Joseph N. Hall, Joshua A. McAdams, brian d foy
- ISBN 0-321-49694-9 [2nd edition 2010]
- http://www.effectiveperlprogramming.com/
- by Sam Tregar
- ISBN 1-59059-018-X [1st edition August 2002]
- http://www.apress.com/9781590590188
- by Tom Christiansen and Nathan Torkington
- with foreword by Larry Wall
- ISBN 1-56592-243-3 [2nd edition August 2003]
- http://oreilly.com/catalog/9780596003135
- by David N. Blank-Edelman
- ISBN 978-0-596-00639-6 [2nd edition May 2009]
- http://oreilly.com/catalog/9780596006396
- by Linchi Shea
- ISBN 1-59059-097-X [1st edition July 2003]
- http://www.apress.com/9781590590973
- by Jan Goyvaerts and Steven Levithan
- ISBN 978-0-596-52069-4 [May 2009]
- http://oreilly.com/catalog/9780596520694
- by Tim Bunce and Alligator Descartes
- ISBN 978-1-56592-699-8 [February 2000]
- http://oreilly.com/catalog/9781565926998
- by Damian Conway
- ISBN: 978-0-596-00173-5 [1st edition July 2005]
- http://oreilly.com/catalog/9780596001735
- by Mark-Jason Dominus
- ISBN: 1-55860-701-3 [1st edition March 2005]
- http://hop.perl.plover.com/
- by Jeffrey E. F. Friedl
- ISBN 978-0-596-52812-6 [3rd edition August 2006]
- http://oreilly.com/catalog/9780596528126
- by Lincoln Stein
- ISBN 0-201-61571-1 [1st edition 2001]
- http://www.pearsonhighered.com/educator/product/Network-Programming-with-Perl/9780201615715.page
- by Darren Chamberlain, Dave Cross, and Andy Wardley
- ISBN 978-0-596-00476-7 [December 2003]
- http://oreilly.com/catalog/9780596004767
- by Damian Conway
- with foreword by Randal L. Schwartz
- ISBN 1-884777-79-1 [1st edition August 1999]
- http://www.manning.com/conway/
- by Dave Cross
- ISBN 1-930110-00-6 [1st edition 2001]
- http://www.manning.com/cross
- by Steve Lidie and Nancy Walsh
- ISBN 978-1-56592-716-2 [1st edition January 2002]
- http://oreilly.com/catalog/9781565927162
- by Tim Jenness and Simon Cozens
- ISBN 1-930110-82-0 [1st edition August 2002]
- http://www.manning.com/jenness
- by Richard Foley with Andy Lester
- ISBN 1-59059-454-1 [1st edition July 2005]
- http://www.apress.com/9781590594544
Some of these books are available as free downloads.
Higher-Order Perl: http://hop.perl.plover.com/
You might notice several familiar Perl concepts in this collection of ACM columns from Jon Bentley. The similarity to the title of the major Perl book (which came later) is not completely accidental:
- by Jon Bentley
- ISBN 978-0-201-65788-3 [2nd edition, October 1999]
- by Jon Bentley
- ISBN 0-201-11889-0 [January 1988]
Each version of Perl comes with the documentation that was current at the time of release. This poses a problem for content such as book lists. There are probably very nice books published after this list was included in your Perl release, and you can check the latest released version at http://perldoc.perl.org/perlbook.html .
Some of the books we've listed appear almost ancient in internet scale, but we've included those books because they still describe the current way of doing things. Not everything in Perl changes every day. Many of the beginner-level books, too, go over basic features and techniques that are still valid today. In general though, we try to limit this list to books published in the past five years.
If your Perl book isn't listed and you think it should be, let us know.
perlbs2000 - building and installing Perl for BS2000.
This document will help you Configure, build, test and install Perl on BS2000 in the POSIX subsystem.
This is a ported perl for the POSIX subsystem in BS2000 VERSION OSD V3.1A or later. It may work on other versions, but we started porting and testing it with 3.1A and are currently using Version V4.0A.
You may need the following GNU programs in order to install perl:
We used version 1.2.4, which could be installed out of the box with one failure during 'make check'.
The yacc coming with BS2000 POSIX didn't work for us. So we had to use bison. We had to make a few changes to perl in order to use the pure (reentrant) parser of bison. We used version 1.25, but we had to add a few changes due to EBCDIC. See below for more details concerning yacc.
To extract an ASCII tar archive on BS2000 POSIX you need an ASCII filesystem (we used the mountpoint /usr/local/ascii for this). Now you extract the archive in the ASCII filesystem without I/O-conversion:
- cd /usr/local/ascii
- export IO_CONVERSION=NO
- gunzip < /usr/local/src/perl.tar.gz | pax -r
You may ignore the error message for the first element of the archive (this doesn't look like a tar archive / skipping to next file...), it's only the directory which will be created automatically anyway.
After extracting the archive you copy the whole directory tree to your EBCDIC filesystem. This time you use I/O-conversion:
- cd /usr/local/src
- IO_CONVERSION=YES cp -r /usr/local/ascii/perl5.005_02 ./
There is a "hints" file for BS2000 called hints.posix-bc (because posix-bc is the OS name given by `uname`) that specifies the correct values for most things. The major problem is (of course) the EBCDIC character set. We have the German EBCDIC version.
Because of our problems with the native yacc we used GNU bison to generate a pure (=reentrant) parser for perly.y. So our yacc is really the following script:
- -----8<-----/usr/local/bin/yacc-----8<-----
- #! /usr/bin/sh
- # Bison as a reentrant yacc:
- # save parameters:
- params=""
- while [[ $# -gt 1 ]]; do
-     params="$params $1"
-     shift
- done
- # add flag %pure_parser:
- tmpfile=/tmp/bison.$$.y
- echo %pure_parser > $tmpfile
- cat $1 >> $tmpfile
- # call bison:
- echo "/usr/local/bin/bison --yacc $params $1\t\t\t(Pure Parser)"
- /usr/local/bin/bison --yacc $params $tmpfile
- # cleanup:
- rm -f $tmpfile
- -----8<----------8<-----
We still use the normal yacc for a2p.y though!!! We made a softlink called byacc to distinguish between the two versions:
ln -s /usr/bin/yacc /usr/local/bin/byacc
We build perl using GNU make. We tried the native make once and it worked too.
We still got a few errors during make test. Some of them are the result of using bison. Bison prints parser error instead of syntax error, so we may ignore them. The following list shows our errors; your results may differ:
- op/numconvert.......FAILED tests 1409-1440
- op/regexp...........FAILED tests 483, 496
- op/regexp_noamp.....FAILED tests 483, 496
- pragma/overload.....FAILED tests 152-153, 170-171
- pragma/warnings.....FAILED tests 14, 82, 129, 155, 192, 205, 207
- lib/bigfloat........FAILED tests 351-352, 355
- lib/bigfltpm........FAILED tests 354-355, 358
- lib/complex.........FAILED tests 267, 487
- lib/dumper..........FAILED tests 43, 45
- Failed 11/231 test scripts, 95.24% okay. 57/10595 subtests failed, 99.46% okay.
We have no nroff on BS2000 POSIX (yet), so we ignored any errors while installing the documentation.
BS2000 POSIX doesn't support the shebang notation (#!/usr/local/bin/perl), so you have to use the following lines instead:
- : # use perl
- eval 'exec /usr/local/bin/perl -S $0 ${1+"$@"}'
-     if $running_under_some_shell;
We don't have much experience with this yet, but try the following:
Copy your Perl executable to a BS2000 LLM using bs2cp:
bs2cp /usr/local/bin/perl 'bs2:perl(perl,l)'
Now you can start it with the following (SDF) command:
/START-PROG FROM-FILE=*MODULE(PERL,PERL),PROG-MODE=*ANY,RUN-MODE=*ADV
First you get the BS2000 commandline prompt ('*'). Here you may enter your parameters, e.g. -e 'print "Hello World!\\n";' (note the double backslash!) or -w and the name of your Perl script.
Filenames starting with / are searched in the POSIX filesystem, others are searched in the BS2000 filesystem. You may even use wildcards if you put a % in front of your filename (e.g. -w checkfiles.pl %*.c). Read your C/C++ manual for additional possibilities of the commandline prompt (look for PARAMETER-PROMPTING).
There appears to be a bug in the floating point implementation on BS2000 POSIX systems such that calling int() on the product of a number and a small magnitude number is not the same as calling int() on the quotient of that number and a large magnitude number. For example, in the following Perl code:
- $x = 100000.0;
- $y = int($x * .01) * 100;
- $z = int($x / 100.) * 100;
Although one would expect the quantities $y and $z to be the same and equal to 100000, they will differ and instead will be 0 and 100000 respectively.
Since version 5.8, Perl uses the new PerlIO on BS2000. This enables you to use different encodings per IO channel. For example you may use
- use Encode;
- open($f, ">:encoding(ascii)", "test.ascii");
- print $f "Hello World!\n";
- open($f, ">:encoding(posix-bc)", "test.ebcdic");
- print $f "Hello World!\n";
- open($f, ">:encoding(latin1)", "test.latin1");
- print $f "Hello World!\n";
- open($f, ">:encoding(utf8)", "test.utf8");
- print $f "Hello World!\n";
to get four files containing "Hello World!\n" in ASCII, EBCDIC, ISO Latin-1 (in this example identical to ASCII) and UTF-EBCDIC (in this example identical to normal EBCDIC) respectively. See the documentation of Encode::PerlIO for details.
As the PerlIO layer uses raw IO internally, all this totally ignores the type of your filesystem (ASCII or EBCDIC) and the IO_CONVERSION environment variable. If you want the old behavior, where the BS2000 IO functions determine the conversion depending on the filesystem, PerlIO is still your friend: use IO_CONVERSION as usual and tell Perl to use the native IO layer:
- export IO_CONVERSION=YES
- export PERLIO=stdio
Now your IO would be ASCII on ASCII partitions and EBCDIC on EBCDIC partitions. See the documentation of PerlIO (without Encode::!) for further possibilities.
Thomas Dorner
If you are interested in the z/OS (formerly known as OS/390) and POSIX-BC (BS2000) ports of Perl then see the perl-mvs mailing list. To subscribe, send an empty message to perl-mvs-subscribe@perl.org.
See also:
- http://lists.perl.org/list/perl-mvs.html
There are web archives of the mailing list at:
- http://www.xray.mpe.mpg.de/mailing-lists/perl-mvs/
- http://archive.develooper.com/perl-mvs@perl.org/
This document was originally written by Thomas Dorner for the 5.005 release of Perl.
This document was podified for the 5.6 release of perl 11 July 2000.
perlbug - how to submit bug reports on Perl
perlbug
perlbug [ -v ] [ -a address ] [ -s subject ] [ -b body | -f inputfile ] [ -F outputfile ] [ -r returnaddress ] [ -e editor ] [ -c adminaddress | -C ] [ -S ] [ -t ] [ -d ] [ -A ] [ -h ] [ -T ]
perlbug [ -v ] [ -r returnaddress ] [ -A ] [ -ok | -okay | -nok | -nokay ]
perlthanks
This program is designed to help you generate and send bug reports (and thank-you notes) about perl5 and the modules which ship with it.
In most cases, you can just run it interactively from a command line without any special arguments and follow the prompts.
If you have found a bug with a non-standard port (one that was not part of the standard distribution), a binary distribution, or a non-core module (such as Tk, DBI, etc), then please see the documentation that came with that distribution to determine the correct place to report bugs.
If you are unable to send your report using perlbug (most likely because your system doesn't have a way to send mail that perlbug recognizes), you may be able to use this tool to compose your report and save it to a file which you can then send to perlbug@perl.org using your regular mail client.
In extreme cases, perlbug may not work well enough on your system to guide you through composing a bug report. In those cases, you may be able to use perlbug -d to get system configuration information to include in a manually composed bug report to perlbug@perl.org.
When reporting a bug, please run through this checklist:
Type perl -v at the command line to find out.
Look at http://www.perl.org/ to find out. If you are not using the latest released version, please try to replicate your bug on the latest stable release.
Note that reports about bugs in old versions of Perl, especially those which indicate you haven't also tested the current stable release of Perl, are likely to receive less attention from the volunteers who build and maintain Perl than reports about bugs in the current release.
This tool isn't appropriate for reporting bugs in any version prior to Perl 5.0.
A significant number of the bug reports we get turn out to be documented features in Perl. Make sure the issue you've run into isn't intentional by glancing through the documentation that comes with the Perl distribution.
Given the sheer volume of Perl documentation, this isn't a trivial undertaking, but if you can point to documentation that suggests the behaviour you're seeing is wrong, your issue is likely to receive more attention. You may want to start with perldoc perltrap for pointers to common traps that new (and experienced) Perl programmers run into.
If you're unsure of the meaning of an error message you've run across, see perldoc perldiag for an explanation. If the message isn't in perldiag, it probably isn't generated by Perl. You may have luck consulting your operating system documentation instead.
If you are on a non-UNIX platform, see perldoc perlport, as some features may be unimplemented or work differently.
You may be able to figure out what's going wrong using the Perl debugger. For information about how to use the debugger, see perldoc perldebug.
The easier it is to reproduce your bug, the more likely it will be fixed -- if nobody can duplicate your problem, it probably won't be addressed.
A good test case has most of these attributes: short, simple code; few dependencies on external commands, modules, or libraries; no platform-dependent code (unless it's a platform-specific bug); clear, simple documentation.
A good test case is almost always a good candidate to be included in Perl's test suite. If you have the time, consider writing your test case so that it can be easily included into the standard test suite.
Be sure to include the exact error messages, if any. "Perl gave an error" is not an exact error message.
If you get a core dump (or equivalent), you may use a debugger (dbx, gdb, etc) to produce a stack trace to include in the bug report.
NOTE: unless your Perl has been compiled with debug info (often -g), the stack trace is likely to be somewhat hard to use because it will most probably contain only the function names and not their arguments. If possible, recompile your Perl with debug info and reproduce the crash and the stack trace.
The easier it is to understand a reproducible bug, the more likely it will be fixed. Any insight you can provide into the problem will help a great deal. In other words, try to analyze the problem (to the extent you can) and report your discoveries.
A bug report which includes a patch to fix it will almost definitely be fixed. When sending a patch, please use the diff program with the -u option to generate "unified" diff files.
Bug reports with patches are likely to receive significantly more attention and interest than those without patches.
Your patch may be returned with requests for changes, or requests for more detailed explanations about your fix.
Here are a few hints for creating high-quality patches:
Make sure the patch is not reversed (the first argument to diff is typically the original file, the second argument your changed file).
Make sure you test your patch by applying it with the patch program before you send it on its way. Try to follow the same style as the code you are trying to patch. Make sure your patch really does work (make test, if the thing you're patching is covered by Perl's test suite).
Can you use perlbug to submit the report?
perlbug will, amongst other things, ensure your report includes crucial information about your version of perl. If perlbug is unable to mail your report after you have typed it in, you may have to compose the message yourself, add the output produced by perlbug -d and email it to perlbug@perl.org. If, for some reason, you cannot run perlbug at all on your system, be sure to include the entire output produced by running perl -V (note the uppercase V).
Whether you use perlbug or send the email manually, please make your Subject line informative. "a bug" is not informative. Neither is "perl crashes" nor is "HELP!!!". A compact description of what's wrong is fine.
Can you use perlbug to submit a thank-you note?
Yes, you can do this by either using the -T option, or by invoking the program as perlthanks. Thank-you notes are good. They make people smile.
Having done your bit, please be prepared to wait, to be told the bug is in your code, or possibly to get no reply at all. The volunteers who maintain Perl are busy folks, so if your problem is an obvious bug in your own code, is difficult to understand or is a duplicate of an existing report, you may not receive a personal reply.
If it is important to you that your bug be fixed, do monitor the perl5-porters@perl.org mailing list and the commit logs to development versions of Perl, and encourage the maintainers with kind words or offers of frosty beverages. (Please do be kind to the maintainers. Harassing or flaming them is likely to have the opposite effect of the one you want.)
Feel free to update the ticket about your bug on http://rt.perl.org if a new version of Perl is released and your bug is still present.
Address to send the report to. Defaults to perlbug@perl.org.
Don't send a bug received acknowledgement to the reply address. Generally it is only sensible to use this option if you are a perl maintainer actively watching perl porters for your message to arrive.
Body of the report. If not included on the command line, or in a file with -f, you will get a chance to edit the message.
Don't send copy to administrator.
Address to send copy of report to. Defaults to the address of the local perl administrator (recorded when perl was built).
Data mode (the default if you redirect or pipe output). This prints out your configuration data, without mailing anything. You can use this with -v to get more complete data.
Editor to use.
File containing the body of the report. Use this to quickly send a prepared message.
File to output the results to instead of sending as an email. Useful particularly when running perlbug on a machine with no direct internet connection.
Prints a brief summary of the options.
Report successful build on this system to perl porters. Forces -S and -C. Forces and supplies values for -s and -b. Only prompts for a return address if it cannot guess it (for use with make). Honors return address specified with -r. You can use this with -v to get more complete data. Only makes a report if this system is less than 60 days old.
As -ok except it will report on older systems.
Report unsuccessful build on this system. Forces -C. Forces and supplies a value for -s, then requires you to edit the report and say what went wrong. Alternatively, a prepared report may be supplied using -f. Only prompts for a return address if it cannot guess it (for use with make). Honors return address specified with -r. You can use this with -v to get more complete data. Only makes a report if this system is less than 60 days old.
As -nok except it will report on older systems.
Your return address. The program will ask you to confirm its default if you don't use this option.
Send without asking for confirmation.
Subject to include with the message. You will be prompted if you don't supply one on the command line.
Test mode. The target address defaults to perlbug-test@perl.org.
Send a thank-you note instead of a bug report.
Include verbose configuration data in the report.
Kenneth Albanowski (<kjahds@kjahds.com>), subsequently doctored by Gurusamy Sarathy (<gsar@activestate.com>), Tom Christiansen (<tchrist@perl.com>), Nathan Torkington (<gnat@frii.com>), Charles F. Randall (<cfr@pobox.com>), Mike Guy (<mjtg@cam.ac.uk>), Dominic Dunlop (<domo@computer.org>), Hugo van der Sanden (<hv@crypt.org>), Jarkko Hietaniemi (<jhi@iki.fi>), Chris Nandor (<pudge@pobox.com>), Jon Orwant (<orwant@media.mit.edu>), Richard Foley (<richard.foley@rfi.net>), and Jesse Vincent (<jesse@bestpractical.com>).
perl(1), perldebug(1), perldiag(1), perlport(1), perltrap(1), diff(1), patch(1), dbx(1), gdb(1)
None known (guess what must have been used to report them?)
perlcall - Perl calling conventions from C
The purpose of this document is to show you how to call Perl subroutines directly from C, i.e., how to write callbacks.
Apart from discussing the C interface provided by Perl for writing callbacks the document uses a series of examples to show how the interface actually works in practice. In addition some techniques for coding callbacks are covered.
Examples where callbacks are necessary include
You have created an XSUB interface to an application's C API.
A fairly common feature in applications is to allow you to define a C function that will be called whenever something nasty occurs. What we would like is to be able to specify a Perl subroutine that will be called instead.
The classic example of where callbacks are used is when writing an event driven program, such as for an X11 application. In this case you register functions to be called whenever specific events occur, e.g., a mouse button is pressed, the cursor moves into a window or a menu item is selected.
Although the techniques described here are applicable when embedding Perl in a C program, this is not the primary goal of this document. There are other details that must be considered and are specific to embedding Perl. For details on embedding Perl in C refer to perlembed.
Before you launch yourself head first into the rest of this document, it would be a good idea to have read the following two documents--perlxs and perlguts.
Although this stuff is easier to explain using examples, you first need to be aware of a few important definitions.
Perl has a number of C functions that allow you to call Perl subroutines. They are
- I32 call_sv(SV* sv, I32 flags);
- I32 call_pv(char *subname, I32 flags);
- I32 call_method(char *methname, I32 flags);
- I32 call_argv(char *subname, I32 flags, char **argv);
The key function is call_sv. All the other functions are fairly simple wrappers which make it easier to call Perl subroutines in special cases. At the end of the day they will all call call_sv to invoke the Perl subroutine.
All the call_* functions have a flags parameter which is used to pass a bit mask of options to Perl. This bit mask operates identically for each of the functions. The settings available in the bit mask are discussed in FLAG VALUES.
Each of the functions will now be discussed in turn.
call_sv takes two parameters. The first, sv, is an SV*. This allows you to specify the Perl subroutine to be called either as a C string (which has first been converted to an SV) or a reference to a subroutine. The section, Using call_sv, shows how you can make use of call_sv.
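As a sketch of both forms (a fragment only, assuming the usual XS environment of perl.h/XSUB.h and an active interpreter; the names fred and callback_sv are placeholders, not from this document):
- /* by name, wrapping the C string in a temporary SV */
- PUSHMARK(SP);
- call_sv(sv_2mortal(newSVpv("fred", 0)), G_DISCARD|G_NOARGS);
- /* by code reference, e.g. an SV* stashed earlier by an XSUB */
- PUSHMARK(SP);
- call_sv(callback_sv, G_DISCARD|G_NOARGS);
The second form is the usual way to invoke a callback that Perl code registered by passing a code reference.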
The function, call_pv, is similar to call_sv except it expects its first parameter to be a C char* which identifies the Perl subroutine you want to call, e.g., call_pv("fred", 0). If the subroutine you want to call is in another package, just include the package name in the string, e.g., "pkg::fred".
The function call_method is used to call a method from a Perl class. The parameter methname corresponds to the name of the method to be called. Note that the class that the method belongs to is passed on the Perl stack rather than in the parameter list. This class can be either the name of the class (for a static method) or a reference to an object (for a virtual method). See perlobj for more information on static and virtual methods and Using call_method for an example of using call_method.
call_argv calls the Perl subroutine specified by the C string stored in the subname parameter. It also takes the usual flags parameter. The final parameter, argv, consists of a NULL-terminated list of C strings to be passed as parameters to the Perl subroutine. See Using call_argv.
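For instance, a fragment (assuming an XS environment; the subroutine name fred is a placeholder) might look like:
- static char *words[] = { "alpha", "beta", "gamma", NULL };
- call_argv("fred", G_DISCARD, words);
Each string in the NULL-terminated list becomes one element of the @_ array seen by fred.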
All the functions return an integer. This is a count of the number of items returned by the Perl subroutine. The actual items returned by the subroutine are stored on the Perl stack.
As a general rule you should always check the return value from these functions. Even if you are expecting only a particular number of values to be returned from the Perl subroutine, there is nothing to stop someone from doing something unexpected--don't say you haven't been warned.
The flags parameter in all the call_* functions is one of G_VOID, G_SCALAR, or G_ARRAY, which indicate the call context, OR'ed together with a bit mask of any combination of the other G_* symbols defined below.
Calls the Perl subroutine in a void context.
This flag has 2 effects:
It indicates to the subroutine being called that it is executing in a void context (if it executes wantarray the result will be the undefined value).
It ensures that nothing is actually returned from the subroutine.
The value returned by the call_* function indicates how many items have been returned by the Perl subroutine--in this case it will be 0.
Calls the Perl subroutine in a scalar context. This is the default context flag setting for all the call_* functions.
This flag has 2 effects:
It indicates to the subroutine being called that it is executing in a scalar context (if it executes wantarray the result will be false).
It ensures that only a scalar is actually returned from the subroutine. The subroutine can, of course, ignore the wantarray and return a list anyway. If so, then only the last element of the list will be returned.
The value returned by the call_* function indicates how many items have been returned by the Perl subroutine - in this case it will be either 0 or 1.
If 0, then you have specified the G_DISCARD flag.
If 1, then the item actually returned by the Perl subroutine will be stored on the Perl stack - the section Returning a Scalar shows how to access this value on the stack. Remember that regardless of how many items the Perl subroutine returns, only the last one will be accessible from the stack - think of the case where only one value is returned as being a list with only one element. Any other items that were returned will not exist by the time control returns from the call_* function. The section Returning a list in a scalar context shows an example of this behavior.
Calls the Perl subroutine in a list context.
As with G_SCALAR, this flag has 2 effects:
It indicates to the subroutine being called that it is executing in a list context (if it executes wantarray the result will be true).
It ensures that all items returned from the subroutine will be accessible when control returns from the call_* function.
The value returned by the call_* function indicates how many items have been returned by the Perl subroutine.
If 0, then you have specified the G_DISCARD flag.
If not 0, then it will be a count of the number of items returned by the subroutine. These items will be stored on the Perl stack. The section Returning a list of values gives an example of using the G_ARRAY flag and the mechanics of accessing the returned items from the Perl stack.
By default, the call_* functions place the items returned from by the Perl subroutine on the stack. If you are not interested in these items, then setting this flag will make Perl get rid of them automatically for you. Note that it is still possible to indicate a context to the Perl subroutine by using either G_SCALAR or G_ARRAY.
If you do not set this flag then it is very important that you make sure that any temporaries (i.e., parameters passed to the Perl subroutine and values returned from the subroutine) are disposed of yourself. The section Returning a Scalar gives details of how to dispose of these temporaries explicitly and the section Using Perl to dispose of temporaries discusses the specific circumstances where you can ignore the problem and let Perl deal with it for you.
Whenever a Perl subroutine is called using one of the call_* functions, it is assumed by default that parameters are to be passed to the subroutine. If you are not passing any parameters to the Perl subroutine, you can save a bit of time by setting this flag. It has the effect of not creating the @_ array for the Perl subroutine.
Although the functionality provided by this flag may seem straightforward, it should be used only if there is a good reason to do so. The reason for being cautious is that, even if you have specified the G_NOARGS flag, it is still possible for the Perl subroutine that has been called to think that you have passed it parameters.
In fact, what can happen is that the Perl subroutine you have called can access the @_ array from a previous Perl subroutine. This will occur when the code that is executing the call_* function has itself been called from another Perl subroutine. The code below illustrates this
- sub fred
- { print "@_\n" }
- sub joe
- { &fred }
- &joe(1,2,3);
This will print
- 1 2 3
What has happened is that fred accesses the @_ array which belongs to joe.
It is possible for the Perl subroutine you are calling to terminate abnormally, e.g., by calling die explicitly or by not actually existing. By default, when either of these events occurs, the process will terminate immediately. If you want to trap this type of event, specify the G_EVAL flag. It will put an eval { } around the subroutine call.
Whenever control returns from the call_* function you need to check the $@ variable as you would in a normal Perl script.
The value returned from the call_* function is dependent on what other flags have been specified and whether an error has occurred. Here are all the different cases that can occur:
If the call_* function returns normally, then the value returned is as specified in the previous sections.
If G_DISCARD is specified, the return value will always be 0.
If G_ARRAY is specified and an error has occurred, the return value will always be 0.
If G_SCALAR is specified and an error has occurred, the return value will be 1 and the value on the top of the stack will be undef. This means that if you have already detected the error by checking $@ and you want the program to continue, you must remember to pop the undef from the stack.
See Using G_EVAL for details on using G_EVAL.
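As a sketch, the error check after a trapped call might look like this (an XS-environment fragment; risky_sub is a placeholder name):
- count = call_pv("risky_sub", G_SCALAR|G_EVAL);
- SPAGAIN;
- if (SvTRUE(ERRSV)) {        /* ERRSV is the C-side view of $@ */
-     POPs;                   /* discard the undef left on the stack */
-     PUTBACK;
-     /* handle or report the message in SvPV_nolen(ERRSV) here */
- }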
Using the G_EVAL flag described above will always set $@: clearing it if there was no error, and setting it to describe the error if there was an error in the called code. This is what you want if your intention is to handle possible errors, but sometimes you just want to trap errors and stop them interfering with the rest of the program.

This scenario will mostly be applicable to code that is meant to be called from within destructors, asynchronous callbacks, and signal handlers. In such situations, where the code being called has little relation to the surrounding dynamic context, the main program needs to be insulated from errors in the called code, even if they can't be handled intelligently.

It may also be useful to do this with code for __DIE__ or __WARN__ hooks, and tie functions.
The G_KEEPERR flag is meant to be used in conjunction with G_EVAL in call_* functions that are used to implement such code, or with eval_sv. This flag has no effect on the call_* functions when G_EVAL is not used.
When G_KEEPERR is used, any error in the called code will terminate the call as usual, and the error will not propagate beyond the call (as usual for G_EVAL), but it will not go into $@. Instead the error will be converted into a warning, prefixed with the string "\t(in cleanup)". This can be disabled using no warnings 'misc'. If there is no error, $@ will not be cleared.
Note that the G_KEEPERR flag does not propagate into inner evals; these may still set $@.
The G_KEEPERR flag was introduced in Perl version 5.002.
See Using G_KEEPERR for an example of a situation that warrants the use of this flag.
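A typical combination of flags for such a call might look like this (an XS-environment fragment; callback_sv is a placeholder for a stored code reference):
- /* e.g. inside a destructor or signal handler: trap any error,
-    let it become a warning, and leave the caller's $@ untouched */
- PUSHMARK(SP);
- call_sv(callback_sv, G_VOID|G_DISCARD|G_NOARGS|G_EVAL|G_KEEPERR);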
As mentioned above, you can determine the context of the currently executing subroutine in Perl with wantarray. The equivalent test can be made in C by using the GIMME_V macro, which returns G_ARRAY if you have been called in a list context, G_SCALAR if in a scalar context, or G_VOID if in a void context (i.e., the return value will not be used). An older version of this macro is called GIMME; in a void context it returns G_SCALAR instead of G_VOID. An example of using the GIMME_V macro is shown in section Using GIMME_V.
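As a minimal sketch (an XS-environment fragment), C code can branch on the caller's context like this:
- switch (GIMME_V) {
- case G_VOID:    /* return value will be ignored      */
-     break;
- case G_SCALAR:  /* caller expects a single value     */
-     break;
- case G_ARRAY:   /* caller expects a list of values   */
-     break;
- }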
Enough of the definition talk! Let's have a few examples.
Perl provides many macros to assist in accessing the Perl stack. These macros should be used whenever possible when interfacing to Perl internals. We hope this makes the code less vulnerable to any changes made to Perl in the future.
Another point worth noting is that in the first series of examples I have made use of only the call_pv function. This has been done to keep the code simpler and ease you into the topic. Wherever possible, if the choice is between using call_pv and call_sv, you should always try to use call_sv. See Using call_sv for details.
This first trivial example will call a Perl subroutine, PrintUID, to print out the UID of the process.
- sub PrintUID
- {
- print "UID is $<\n";
- }
and here is a C function to call it
- static void
- call_PrintUID()
- {
- dSP;
- PUSHMARK(SP);
- call_pv("PrintUID", G_DISCARD|G_NOARGS);
- }
Simple, eh?
A few points to note about this example:
Ignore dSP and PUSHMARK(SP) for now. They will be discussed in the next example.
We aren't passing any parameters to PrintUID so G_NOARGS can be specified.
We aren't interested in anything returned from PrintUID, so G_DISCARD is specified. Even if PrintUID was changed to return some value(s), having specified G_DISCARD will mean that they will be wiped by the time control returns from call_pv.
As call_pv is being used, the Perl subroutine is specified as a C string. In this case the subroutine name has been 'hard-wired' into the code.
Because we specified G_DISCARD, it is not necessary to check the value returned from call_pv. It will always be 0.
Now let's make a slightly more complex example. This time we want to call a Perl subroutine, LeftString, which will take 2 parameters--a string ($s) and an integer ($n). The subroutine will simply print the first $n characters of the string.
So the Perl subroutine would look like this:
- sub LeftString
- {
- my($s, $n) = @_;
- print substr($s, 0, $n), "\n";
- }
The C function required to call LeftString would look like this:
- static void
- call_LeftString(a, b)
- char * a;
- int b;
- {
- dSP;
- ENTER;
- SAVETMPS;
- PUSHMARK(SP);
- XPUSHs(sv_2mortal(newSVpv(a, 0)));
- XPUSHs(sv_2mortal(newSViv(b)));
- PUTBACK;
- call_pv("LeftString", G_DISCARD);
- FREETMPS;
- LEAVE;
- }
Here are a few notes on the C function call_LeftString.
Parameters are passed to the Perl subroutine using the Perl stack.
This is the purpose of the code beginning with the line dSP and ending with the line PUTBACK. The dSP declares a local copy of the stack pointer. This local copy should always be accessed as SP.
If you are going to put something onto the Perl stack, you need to know where to put it. This is the purpose of the macro dSP--it declares and initializes a local copy of the Perl stack pointer.
All the other macros which will be used in this example require you to have used this macro.
The exception to this rule is if you are calling a Perl subroutine directly from an XSUB function. In this case it is not necessary to use the dSP macro explicitly--it will be declared for you automatically.
Any parameters to be pushed onto the stack should be bracketed by the PUSHMARK and PUTBACK macros. The purpose of these two macros, in this context, is to count the number of parameters you are pushing automatically. Then whenever Perl is creating the @_ array for the subroutine, it knows how big to make it.
The PUSHMARK macro tells Perl to make a mental note of the current stack pointer. Even if you aren't passing any parameters (like the example shown in the section No Parameters, Nothing Returned) you must still call the PUSHMARK macro before you can call any of the call_* functions--Perl still needs to know that there are no parameters.
The PUTBACK macro sets the global copy of the stack pointer to be the same as our local copy. If we didn't do this, call_pv wouldn't know where the two parameters we pushed were--remember that up to now all the stack pointer manipulation we have done is with our local copy, not the global copy.
Next, we come to XPUSHs. This is where the parameters actually get pushed onto the stack. In this case we are pushing a string and an integer.
See XSUBs and the Argument Stack in perlguts for details on how the XPUSH macros work.
Because we created temporary values (by means of sv_2mortal() calls) we will have to tidy up the Perl stack and dispose of mortal SVs.
This is the purpose of
- ENTER;
- SAVETMPS;
at the start of the function, and
- FREETMPS;
- LEAVE;
at the end. The ENTER/SAVETMPS pair creates a boundary for any temporaries we create. This means that the temporaries we get rid of will be limited to those which were created after these calls.
The FREETMPS/LEAVE pair will get rid of any values returned by the Perl subroutine (see next example), plus it will also dump the mortal SVs we have created. Having ENTER/SAVETMPS at the beginning of the code makes sure that no other mortals are destroyed.
Think of these macros as working a bit like { and } in Perl to limit the scope of local variables.
See the section Using Perl to Dispose of Temporaries for details of an alternative to using these macros.
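The braces analogy can be made concrete in pure Perl. The following sketch uses local, which saves a value on scope entry and restores it at scope exit, much as ENTER/SAVETMPS and FREETMPS/LEAVE bracket the lifetime of temporaries:

```perl
our $x = "outer";
{
    local $x = "inner";   # like ENTER/SAVETMPS: the old value is saved
    print "$x\n";         # prints "inner"
}                         # like FREETMPS/LEAVE: the old value is restored
print "$x\n";             # prints "outer"
```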
Finally, LeftString can now be called via the call_pv function. The only flag specified this time is G_DISCARD. Because we are passing 2 parameters to the Perl subroutine this time, we have not specified G_NOARGS.
Now for an example of dealing with the items returned from a Perl subroutine.
Here is a Perl subroutine, Adder, that takes 2 integer parameters and simply returns their sum.
- sub Adder
- {
- my($a, $b) = @_;
- $a + $b;
- }
Because we are now concerned with the return value from Adder, the C function required to call it is now a bit more complex.
- static void
- call_Adder(a, b)
- int a;
- int b;
- {
- dSP;
- int count;
- ENTER;
- SAVETMPS;
- PUSHMARK(SP);
- XPUSHs(sv_2mortal(newSViv(a)));
- XPUSHs(sv_2mortal(newSViv(b)));
- PUTBACK;
- count = call_pv("Adder", G_SCALAR);
- SPAGAIN;
- if (count != 1)
- croak("Big trouble\n");
- printf ("The sum of %d and %d is %d\n", a, b, POPi);
- PUTBACK;
- FREETMPS;
- LEAVE;
- }
Points to note this time are
The only flag specified this time was G_SCALAR. That means that the @_ array will be created and that the value returned by Adder will still exist after the call to call_pv.
The purpose of the SPAGAIN macro is to refresh the local copy of the stack pointer. This is necessary because it is possible that the memory allocated to the Perl stack has been reallocated during the call_pv call.
If you are making use of the Perl stack pointer in your code you must always refresh the local copy using SPAGAIN whenever you make use of the call_* functions or any other Perl internal function.
Although only a single value was expected to be returned from Adder, it is still good practice to check the return code from call_pv anyway.
Expecting a single value is not quite the same as knowing that there will be one. If someone modified Adder to return a list and we didn't check for that possibility and take appropriate action, the Perl stack would end up in an inconsistent state. That is something you really don't want to happen ever.
The POPi macro is used here to pop the return value from the stack. In this case we wanted an integer, so POPi was used.
Here is the complete list of POP macros available, along with the types they return.
- POPs SV
- POPp pointer
- POPn double
- POPi integer
- POPl long
The final PUTBACK is used to leave the Perl stack in a consistent state before exiting the function. This is necessary because when we popped the return value from the stack with POPi it updated only our local copy of the stack pointer. Remember, PUTBACK sets the global stack pointer to be the same as our local copy.
Now, let's extend the previous example to return both the sum of the parameters and the difference.
Here is the Perl subroutine
- sub AddSubtract
- {
- my($a, $b) = @_;
- ($a+$b, $a-$b);
- }
and this is the C function
- static void
- call_AddSubtract(a, b)
- int a;
- int b;
- {
- dSP;
- int count;
- ENTER;
- SAVETMPS;
- PUSHMARK(SP);
- XPUSHs(sv_2mortal(newSViv(a)));
- XPUSHs(sv_2mortal(newSViv(b)));
- PUTBACK;
- count = call_pv("AddSubtract", G_ARRAY);
- SPAGAIN;
- if (count != 2)
- croak("Big trouble\n");
- printf ("%d - %d = %d\n", a, b, POPi);
- printf ("%d + %d = %d\n", a, b, POPi);
- PUTBACK;
- FREETMPS;
- LEAVE;
- }
If call_AddSubtract is called like this
- call_AddSubtract(7, 4);
then here is the output
- 7 - 4 = 3
- 7 + 4 = 11
Notes
We wanted list context, so G_ARRAY was used.
Not surprisingly POPi is used twice this time because we were retrieving 2 values from the stack. The important thing to note is that when using the POP* macros they come off the stack in reverse order.
Say the Perl subroutine in the previous section was called in a scalar context, like this
- static void
- call_AddSubScalar(a, b)
- int a;
- int b;
- {
- dSP;
- int count;
- int i;
- ENTER;
- SAVETMPS;
- PUSHMARK(SP);
- XPUSHs(sv_2mortal(newSViv(a)));
- XPUSHs(sv_2mortal(newSViv(b)));
- PUTBACK;
- count = call_pv("AddSubtract", G_SCALAR);
- SPAGAIN;
- printf ("Items Returned = %d\n", count);
- for (i = 1; i <= count; ++i)
- printf ("Value %d = %d\n", i, POPi);
- PUTBACK;
- FREETMPS;
- LEAVE;
- }
The other modification made is that call_AddSubScalar will print the number of items returned from the Perl subroutine and their values (for simplicity it assumes that they are integers). So if call_AddSubScalar is called
- call_AddSubScalar(7, 4);
then the output will be
- Items Returned = 1
- Value 1 = 3
In this case the main point to note is that only the last item in the list returned by the subroutine, AddSubtract, actually made it back to call_AddSubScalar.
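The same context rule can be demonstrated in pure Perl: in scalar context the comma operator in AddSubtract's final list expression yields only its last element. A runnable sketch:

```perl
sub AddSubtract {
    my ($x, $y) = @_;
    ($x + $y, $x - $y);    # a two-element list expression
}

my @both = AddSubtract(7, 4);   # list context: (11, 3)
my $one  = AddSubtract(7, 4);   # scalar context: the comma operator
                                # yields only its last value, 3
print "@both\n";                # prints "11 3"
print "$one\n";                 # prints "3"
```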
It is also possible to return values directly via the parameter list--whether it is actually desirable to do it is another matter entirely.
The Perl subroutine, Inc, below takes 2 parameters and increments each directly.
- sub Inc
- {
- ++ $_[0];
- ++ $_[1];
- }
and here is a C function to call it.
- static void
- call_Inc(a, b)
- int a;
- int b;
- {
- dSP;
- int count;
- SV * sva;
- SV * svb;
- ENTER;
- SAVETMPS;
- sva = sv_2mortal(newSViv(a));
- svb = sv_2mortal(newSViv(b));
- PUSHMARK(SP);
- XPUSHs(sva);
- XPUSHs(svb);
- PUTBACK;
- count = call_pv("Inc", G_DISCARD);
- if (count != 0)
- croak ("call_Inc: expected 0 values from 'Inc', got %d\n",
- count);
- printf ("%d + 1 = %d\n", a, SvIV(sva));
- printf ("%d + 1 = %d\n", b, SvIV(svb));
- FREETMPS;
- LEAVE;
- }
To be able to access the two parameters that were pushed onto the stack after they return from call_pv it is necessary to make a note of their addresses--thus the two variables sva and svb.
The reason this is necessary is that the area of the Perl stack which held them will very likely have been overwritten by something else by the time control returns from call_pv.
Now an example using G_EVAL. Below is a Perl subroutine which computes the difference of its 2 parameters. If this would result in a negative result, the subroutine calls die.
- sub Subtract
- {
- my ($a, $b) = @_;
- die "death can be fatal\n" if ($a < $b);
- $a - $b;
- }
and some C to call it
- static void
- call_Subtract(a, b)
- int a;
- int b;
- {
- dSP;
- int count;
- ENTER;
- SAVETMPS;
- PUSHMARK(SP);
- XPUSHs(sv_2mortal(newSViv(a)));
- XPUSHs(sv_2mortal(newSViv(b)));
- PUTBACK;
- count = call_pv("Subtract", G_EVAL|G_SCALAR);
- SPAGAIN;
- /* Check the eval first */
- if (SvTRUE(ERRSV))
- {
- printf ("Uh oh - %s\n", SvPV_nolen(ERRSV));
- POPs;
- }
- else
- {
- if (count != 1)
- croak("call_Subtract: wanted 1 value from 'Subtract', got %d\n",
- count);
- printf ("%d - %d = %d\n", a, b, POPi);
- }
- PUTBACK;
- FREETMPS;
- LEAVE;
- }
If call_Subtract is called thus
- call_Subtract(4, 5)
the following will be printed
- Uh oh - death can be fatal
Notes
We want to be able to catch the die so we have used the G_EVAL flag. Not specifying this flag would mean that the program would terminate immediately at the die statement in the subroutine Subtract.
The code
- if (SvTRUE(ERRSV))
- {
- printf ("Uh oh - %s\n", SvPV_nolen(ERRSV));
- POPs;
- }
is the direct equivalent of this bit of Perl
- print "Uh oh - $@\n" if $@;
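A complete, runnable pure-Perl version of the same check; the subroutine body here is a hypothetical stand-in that mirrors the Subtract behaviour described above:

```perl
# Hypothetical stand-in for the Subtract subroutine in the text.
sub Subtract_demo {
    my ($x, $y) = @_;
    die "death can be fatal\n" if $x < $y;
    $x - $y;
}

my $result = eval { Subtract_demo(4, 5) };  # eval plays the role of G_EVAL
if ($@) {
    print "Uh oh - $@";        # $@ holds the message passed to die
}
else {
    print "4 - 5 = $result\n";
}
```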
PL_errgv is a perl global of type GV * that points to the symbol table entry containing the error. ERRSV therefore refers to the C equivalent of $@.
Note that the stack is popped using POPs
in the block where
SvTRUE(ERRSV)
is true. This is necessary because whenever a
call_* function invoked with G_EVAL|G_SCALAR returns an error,
the top of the stack holds the value undef. Because we want the
program to continue after detecting this error, it is essential that
the stack be tidied up by removing the undef.
Consider this rather facetious example, where we have used an XS version of the call_Subtract example above inside a destructor:
- package Foo;
- sub new { bless {}, $_[0] }
- sub Subtract {
- my($x, $y) = @_;
- die "death can be fatal\n" if $x < $y;
- $x - $y;
- }
- sub DESTROY { call_Subtract(5, 4); }
- sub foo { die "foo dies!\n"; }
- package main;
- {
- my $foo = Foo->new;
- eval { $foo->foo };
- }
- print "Saw: $@" if $@; # should be, but isn't
This example will fail to recognize that an error occurred inside the
eval {}
. Here's why: the call_Subtract code got executed while perl
was cleaning up temporaries when exiting the outer braced block, and because
call_Subtract is implemented with call_pv using the G_EVAL
flag, it promptly reset $@
. This results in the failure of the
outermost test for $@
, and thereby the failure of the error trap.
Appending the G_KEEPERR flag, so that the call_pv call in call_Subtract reads:
- count = call_pv("Subtract", G_EVAL|G_SCALAR|G_KEEPERR);
will preserve the error and restore reliable error handling.
In all the previous examples I have 'hard-wired' the name of the Perl subroutine to be called from C. Most of the time though, it is more convenient to be able to specify the name of the Perl subroutine from within the Perl script.
Consider the Perl code below
- sub fred
- {
- print "Hello there\n";
- }
- CallSubPV("fred");
Here is a snippet of XSUB which defines CallSubPV.
- void
- CallSubPV(name)
- char * name
- CODE:
- PUSHMARK(SP);
- call_pv(name, G_DISCARD|G_NOARGS);
That is fine as far as it goes. The thing is, the Perl subroutine can be specified only as a string; however, Perl allows references to subroutines and anonymous subroutines. This is where call_sv is useful.
The code below for CallSubSV is identical to CallSubPV except that the name parameter is now defined as an SV* and we use call_sv instead of call_pv.
- void
- CallSubSV(name)
- SV * name
- CODE:
- PUSHMARK(SP);
- call_sv(name, G_DISCARD|G_NOARGS);
Because we are using an SV to call fred the following can all be used:
- CallSubSV("fred");
- CallSubSV(\&fred);
- $ref = \&fred;
- CallSubSV($ref);
- CallSubSV( sub { print "Hello there\n" } );
As you can see, call_sv gives you much greater flexibility in how you can specify the Perl subroutine.
You should note that, if it is necessary to store the SV (name in the example above) which corresponds to the Perl subroutine so that it can be used later in the program, it is not enough just to store a copy of the pointer to the SV. Say the code above had been like this:
- static SV * rememberSub;
- void
- SaveSub1(name)
- SV * name
- CODE:
- rememberSub = name;
- void
- CallSavedSub1()
- CODE:
- PUSHMARK(SP);
- call_sv(rememberSub, G_DISCARD|G_NOARGS);
The reason this is wrong is that, by the time you come to use the pointer rememberSub in CallSavedSub1, it may or may not still refer to the Perl subroutine that was recorded in SaveSub1. This is particularly true for these cases:
- SaveSub1(\&fred);
- CallSavedSub1();
- SaveSub1( sub { print "Hello there\n" } );
- CallSavedSub1();
By the time each of the SaveSub1
statements above has been executed,
the SV*s which corresponded to the parameters will no longer exist.
Expect an error message from Perl of the form
- Can't use an undefined value as a subroutine reference at ...
for each of the CallSavedSub1
lines.
Similarly, with this code
- $ref = \&fred;
- SaveSub1($ref);
- $ref = 47;
- CallSavedSub1();
you can expect one of these messages (which you actually get is dependent on the version of Perl you are using)
- Not a CODE reference at ...
- Undefined subroutine &main::47 called ...
The variable $ref may have referred to the subroutine fred whenever the call to SaveSub1 was made but by the time CallSavedSub1 gets called it now holds the number 47. Because we saved only a pointer to the original SV in SaveSub1, any changes to $ref will be tracked by the pointer rememberSub. This means that whenever CallSavedSub1 gets called, it will attempt to execute the code which is referenced by the SV* rememberSub. In this case though, it now refers to the integer 47, so expect Perl to complain loudly.
A similar but more subtle problem is illustrated with this code:
- $ref = \&fred;
- SaveSub1($ref);
- $ref = \&joe;
- CallSavedSub1();
This time whenever CallSavedSub1 gets called it will execute the Perl subroutine joe (assuming it exists) rather than fred as was originally requested in the call to SaveSub1.
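The pointer-sharing problem has a direct pure-Perl analogue: saving a reference to the caller's variable (rather than a copy of its value) tracks later assignments to it. A sketch, with subroutine names chosen to mirror the text:

```perl
sub fred { "fred" }
sub joe  { "joe"  }

my $remembered;                        # plays the role of rememberSub
sub SaveRef   { $remembered = \$_[0] } # $_[0] aliases the caller's variable
sub CallSaved { ${$remembered}->() }   # dereference, then call

my $ref = \&fred;
SaveRef($ref);
$ref = \&joe;                 # reusing the variable changes what we saved
print CallSaved(), "\n";      # prints "joe", not "fred"
```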
To get around these problems it is necessary to take a full copy of the SV. The code below shows SaveSub2 modified to do that.
- static SV * keepSub = (SV*)NULL;
- void
- SaveSub2(name)
- SV * name
- CODE:
- /* Take a copy of the callback */
- if (keepSub == (SV*)NULL)
- /* First time, so create a new SV */
- keepSub = newSVsv(name);
- else
- /* Been here before, so overwrite */
- SvSetSV(keepSub, name);
- void
- CallSavedSub2()
- CODE:
- PUSHMARK(SP);
- call_sv(keepSub, G_DISCARD|G_NOARGS);
To avoid creating a new SV every time SaveSub2 is called, the function first checks to see if it has been called before. If not, then space for a new SV is allocated and the reference to the Perl subroutine name is copied to the variable keepSub in one operation using newSVsv. Thereafter, whenever SaveSub2 is called, the existing SV, keepSub, is overwritten with the new value using SvSetSV.
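The effect of copying with newSVsv rather than sharing the pointer can be mimicked in pure Perl by storing the value itself:

```perl
sub fred { "fred" }
sub joe  { "joe"  }

my $kept;                        # plays the role of keepSub
sub SaveCopy { $kept = $_[0] }   # copy the value, like newSVsv/SvSetSV
sub CallKept { $kept->() }

my $ref = \&fred;
SaveCopy($ref);
$ref = \&joe;                 # harmless: we kept our own copy
print CallKept(), "\n";       # still prints "fred"
```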
Here is a Perl subroutine which prints whatever parameters are passed to it.
- sub PrintList
- {
- my(@list) = @_;
- foreach (@list) { print "$_\n" }
- }
And here is an example of call_argv which will call PrintList.
- static char * words[] = {"alpha", "beta", "gamma", "delta", NULL};
- static void
- call_PrintList()
- {
- dSP;
- call_argv("PrintList", G_DISCARD, words);
- }
Note that it is not necessary to call PUSHMARK
in this instance.
This is because call_argv will do it for you.
Consider the following Perl code:
- {
- package Mine;
- sub new
- {
- my($type) = shift;
- my(@C) = @_;
- return bless \@C, $type;
- }
- sub Display
- {
- my ($self, $index) = @_;
- print "$index: $$self[$index]\n";
- }
- sub PrintID
- {
- my($class) = @_;
- print "This is Class $class version 1.0\n";
- }
- }
It implements just a very simple class to manage an array. Apart from the constructor, new, it declares two methods, one static and one virtual. The static method, PrintID, simply prints out the class name and a version number. The virtual method, Display, prints out a single element of the array. Here is an all-Perl example of using it.
- $a = Mine->new('red', 'green', 'blue');
- $a->Display(1);
- Mine->PrintID;
will print
- 1: green
- This is Class Mine version 1.0
Calling a Perl method from C is fairly straightforward. The following things are required:
A reference to the object for a virtual method or the name of the class for a static method
The name of the method
Any other parameters specific to the method
Here is a simple XSUB which illustrates the mechanics of calling both the PrintID and Display methods from C.
- void
- call_Method(ref, method, index)
- SV * ref
- char * method
- int index
- CODE:
- PUSHMARK(SP);
- XPUSHs(ref);
- XPUSHs(sv_2mortal(newSViv(index)));
- PUTBACK;
- call_method(method, G_DISCARD);
- void
- call_PrintID(class, method)
- char * class
- char * method
- CODE:
- PUSHMARK(SP);
- XPUSHs(sv_2mortal(newSVpv(class, 0)));
- PUTBACK;
- call_method(method, G_DISCARD);
So the methods PrintID and Display can be invoked like this:
- $a = Mine->new('red', 'green', 'blue');
- call_Method($a, 'Display', 1);
- call_PrintID('Mine', 'PrintID');
The only thing to note is that, in both the static and virtual methods, the method name is not passed via the stack--it is used as the first parameter to call_method.
Here is a trivial XSUB which prints the context in which it is currently executing.
- void
- PrintContext()
- CODE:
- I32 gimme = GIMME_V;
- if (gimme == G_VOID)
- printf ("Context is Void\n");
- else if (gimme == G_SCALAR)
- printf ("Context is Scalar\n");
- else
- printf ("Context is Array\n");
And here is some Perl to test it.
- PrintContext;
- $a = PrintContext;
- @a = PrintContext;
The output from that will be
- Context is Void
- Context is Scalar
- Context is Array
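GIMME_V is the C-level counterpart of Perl's wantarray builtin; the same three-way test written in pure Perl:

```perl
sub WhichContext {
    return "Void"   unless defined wantarray;  # GIMME_V == G_VOID
    return wantarray ? "Array" : "Scalar";     # G_ARRAY vs G_SCALAR
}

WhichContext();                   # void context
my $s = WhichContext();           # scalar context
my @l = WhichContext();           # list context
print "$s, $l[0]\n";              # prints "Scalar, Array"
```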
In the examples given to date, any temporaries created in the callback (i.e., parameters passed on the stack to the call_* function or values returned via the stack) have been freed by one of these methods:
Specifying the G_DISCARD flag with call_*
Explicitly using the ENTER/SAVETMPS--FREETMPS/LEAVE pairing
There is another method which can be used, namely letting Perl do it for you automatically whenever it regains control after the callback has terminated. This is done by simply not using the
- ENTER;
- SAVETMPS;
- ...
- FREETMPS;
- LEAVE;
sequence in the callback (and not, of course, specifying the G_DISCARD flag).
If you are going to use this method you have to be aware of a possible memory leak which can arise under very specific circumstances. To explain these circumstances you need to know a bit about the flow of control between Perl and the callback routine.
The examples given at the start of the document (an error handler and an event driven program) are typical of the two main sorts of flow control that you are likely to encounter with callbacks. There is a very important distinction between them, so pay attention.
In the first example, an error handler, the flow of control could be as follows. You have created an interface to an external library. Control can reach the external library like this
- perl --> XSUB --> external library
Whilst control is in the library, an error condition occurs. You have previously set up a Perl callback to handle this situation, so it will get executed. Once the callback has finished, control will drop back to Perl again. Here is what the flow of control will be like in that situation
- perl --> XSUB --> external library
- ...
- error occurs
- ...
- external library --> call_* --> perl
- |
- perl <-- XSUB <-- external library <-- call_* <----+
After processing of the error using call_* is completed, control reverts back to Perl more or less immediately.
In the diagram, the further right you go the more deeply nested the scope is. It is only when control is back with perl on the extreme left of the diagram that you will have dropped back to the enclosing scope and any temporaries you have left hanging around will be freed.
In the second example, an event driven program, the flow of control will be more like this
- perl --> XSUB --> event handler
- ...
- event handler --> call_* --> perl
- |
- event handler <-- call_* <----+
- ...
- event handler --> call_* --> perl
- |
- event handler <-- call_* <----+
- ...
- event handler --> call_* --> perl
- |
- event handler <-- call_* <----+
In this case the flow of control can consist of only the repeated sequence
- event handler --> call_* --> perl
for practically the complete duration of the program. This means that control may never drop back to the surrounding scope in Perl at the extreme left.
So what is the big problem? Well, if you are expecting Perl to tidy up those temporaries for you, you might be in for a long wait. For Perl to dispose of your temporaries, control must drop back to the enclosing scope at some stage. In the event driven scenario that may never happen. This means that, as time goes on, your program will create more and more temporaries, none of which will ever be freed. As each of these temporaries consumes some memory your program will eventually consume all the available memory in your system--kapow!
So here is the bottom line--if you are sure that control will revert back to the enclosing Perl scope fairly quickly after the end of your callback, then it isn't absolutely necessary to dispose explicitly of any temporaries you may have created. Mind you, if you are at all uncertain about what to do, it doesn't do any harm to tidy up anyway.
Potentially one of the trickiest problems to overcome when designing a callback interface can be figuring out how to store the mapping between the C callback function and the Perl equivalent.
To help understand why this can be a real problem first consider how a
callback is set up in an all C environment. Typically a C API will
provide a function to register a callback. This will expect a pointer
to a function as one of its parameters. Below is a call to a
hypothetical function register_fatal
which registers the C function
to get called when a fatal error occurs.
- register_fatal(cb1);
The single parameter cb1
is a pointer to a function, so you must
have defined cb1
in your code, say something like this
- static void
- cb1()
- {
- printf ("Fatal Error\n");
- exit(1);
- }
Now change that to call a Perl subroutine instead
- static SV * callback = (SV*)NULL;
- static void
- cb1()
- {
- dSP;
- PUSHMARK(SP);
- /* Call the Perl sub to process the callback */
- call_sv(callback, G_DISCARD);
- }
- void
- register_fatal(fn)
- SV * fn
- CODE:
- /* Remember the Perl sub */
- if (callback == (SV*)NULL)
- callback = newSVsv(fn);
- else
- SvSetSV(callback, fn);
- /* register the callback with the external library */
- register_fatal(cb1);
where the Perl equivalent of register_fatal
and the callback it
registers, pcb1
, might look like this
- # Register the sub pcb1
- register_fatal(\&pcb1);
- sub pcb1
- {
- die "I'm dying...\n";
- }
The mapping between the C callback and the Perl equivalent is stored in
the global variable callback
.
This will be adequate if you ever need to have only one callback
registered at any time. An example could be an error handler like the
code sketched out above. Remember though, repeated calls to
register_fatal
will replace the previously registered callback
function with the new one.
Say for example you want to interface to a library which allows asynchronous file i/o. In this case you may be able to register a callback whenever a read operation has completed. To be of any use we want to be able to call separate Perl subroutines for each file that is opened. As it stands, the error handler example above would not be adequate as it allows only a single callback to be defined at any time. What we require is a means of storing the mapping between the opened file and the Perl subroutine we want to be called for that file.
Say the i/o library has a function asynch_read
which associates a C
function ProcessRead
with a file handle fh
--this assumes that it
has also provided some routine to open the file and so obtain the file
handle.
- asynch_read(fh, ProcessRead)
This may expect the C ProcessRead function to be of this form
- void
- ProcessRead(fh, buffer)
- int fh;
- char * buffer;
- {
- ...
- }
To provide a Perl interface to this library we need to be able to map
between the fh
parameter and the Perl subroutine we want called. A
hash is a convenient mechanism for storing this mapping. The code
below shows a possible implementation
- static HV * Mapping = (HV*)NULL;
- void
- asynch_read(fh, callback)
- int fh
- SV * callback
- CODE:
- /* If the hash doesn't already exist, create it */
- if (Mapping == (HV*)NULL)
- Mapping = newHV();
- /* Save the fh -> callback mapping */
- hv_store(Mapping, (char*)&fh, sizeof(fh), newSVsv(callback), 0);
- /* Register with the C Library */
- asynch_read(fh, asynch_read_if);
and asynch_read_if
could look like this
- static void
- asynch_read_if(fh, buffer)
- int fh;
- char * buffer;
- {
- dSP;
- SV ** sv;
- /* Get the callback associated with fh */
- sv = hv_fetch(Mapping, (char*)&fh , sizeof(fh), FALSE);
- if (sv == (SV**)NULL)
- croak("Internal error...\n");
- PUSHMARK(SP);
- XPUSHs(sv_2mortal(newSViv(fh)));
- XPUSHs(sv_2mortal(newSVpv(buffer, 0)));
- PUTBACK;
- /* Call the Perl sub */
- call_sv(*sv, G_DISCARD);
- }
For completeness, here is asynch_close
. This shows how to remove
the entry from the hash Mapping
.
- void
- asynch_close(fh)
- int fh
- CODE:
- /* Remove the entry from the hash */
- (void) hv_delete(Mapping, (char*)&fh, sizeof(fh), G_DISCARD);
- /* Now call the real asynch_close */
- asynch_close(fh);
So the Perl interface would look like this
- sub callback1
- {
- my($handle, $buffer) = @_;
- }
- # Register the Perl callback
- asynch_read($fh, \&callback1);
- asynch_close($fh);
The mapping between the C callback and Perl is stored in the global
hash Mapping
this time. Using a hash has the distinct advantage that
it allows an unlimited number of callbacks to be registered.
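The hash-based registry can be sketched in pure Perl; the function names here are hypothetical illustrations of the same fh-to-callback mapping:

```perl
my %Mapping;    # fh -> callback, like the C hash 'Mapping'

sub register_read {                # analogue of asynch_read's bookkeeping
    my ($fh, $callback) = @_;
    $Mapping{$fh} = $callback;     # save the fh -> callback mapping
}

sub dispatch_read {                # analogue of asynch_read_if
    my ($fh, $buffer) = @_;
    my $cb = $Mapping{$fh} or die "Internal error...\n";
    $cb->($fh, $buffer);           # call the registered Perl sub
}

register_read(3, sub { my ($h, $buf) = @_; print "fh $h: $buf\n" });
dispatch_read(3, "some data");     # prints "fh 3: some data"
```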
What if the interface provided by the C callback doesn't contain a
parameter which allows the file handle to Perl subroutine mapping? Say
in the asynchronous i/o package, the callback function gets passed only
the buffer
parameter like this
- void
- ProcessRead(buffer)
- char * buffer;
- {
- ...
- }
Without the file handle there is no straightforward way to map from the C callback to the Perl subroutine.
In this case a possible way around this problem is to predefine a series of C functions to act as the interface to Perl, thus
- #define MAX_CB 3
- #define NULL_HANDLE -1
- typedef void (*FnMap)();
- struct MapStruct {
- FnMap Function;
- SV * PerlSub;
- int Handle;
- };
- static void fn1();
- static void fn2();
- static void fn3();
- static struct MapStruct Map [MAX_CB] =
- {
- { fn1, NULL, NULL_HANDLE },
- { fn2, NULL, NULL_HANDLE },
- { fn3, NULL, NULL_HANDLE }
- };
- static void
- Pcb(index, buffer)
- int index;
- char * buffer;
- {
- dSP;
- PUSHMARK(SP);
- XPUSHs(sv_2mortal(newSVpv(buffer, 0)));
- PUTBACK;
- /* Call the Perl sub */
- call_sv(Map[index].PerlSub, G_DISCARD);
- }
- static void
- fn1(buffer)
- char * buffer;
- {
- Pcb(0, buffer);
- }
- static void
- fn2(buffer)
- char * buffer;
- {
- Pcb(1, buffer);
- }
- static void
- fn3(buffer)
- char * buffer;
- {
- Pcb(2, buffer);
- }
- void
- array_asynch_read(fh, callback)
- int fh
- SV * callback
- CODE:
- int index;
- int null_index = MAX_CB;
- /* Find the same handle or an empty entry */
- for (index = 0; index < MAX_CB; ++index)
- {
- if (Map[index].Handle == fh)
- break;
- if (Map[index].Handle == NULL_HANDLE)
- null_index = index;
- }
- if (index == MAX_CB && null_index == MAX_CB)
- croak ("Too many callback functions registered\n");
- if (index == MAX_CB)
- index = null_index;
- /* Save the file handle */
- Map[index].Handle = fh;
- /* Remember the Perl sub */
- if (Map[index].PerlSub == (SV*)NULL)
- Map[index].PerlSub = newSVsv(callback);
- else
- SvSetSV(Map[index].PerlSub, callback);
- asynch_read(fh, Map[index].Function);
- void
- array_asynch_close(fh)
- int fh
- CODE:
- int index;
- /* Find the file handle */
- for (index = 0; index < MAX_CB; ++ index)
- if (Map[index].Handle == fh)
- break;
- if (index == MAX_CB)
- croak ("could not close fh %d\n", fh);
- Map[index].Handle = NULL_HANDLE;
- SvREFCNT_dec(Map[index].PerlSub);
- Map[index].PerlSub = (SV*)NULL;
- asynch_close(fh);
In this case the functions fn1
, fn2
, and fn3
are used to
remember the Perl subroutine to be called. Each of the functions holds
a separate hard-wired index which is used in the function Pcb
to
access the Map
array and actually call the Perl subroutine.
There are some obvious disadvantages with this technique.
Firstly, the code is considerably more complex than with the previous example.
Secondly, there is a hard-wired limit (in this case 3) to the number of callbacks that can exist simultaneously. The only way to increase the limit is by modifying the code to add more functions and then recompiling. None the less, as long as the number of functions is chosen with some care, it is still a workable solution and in some cases is the only one available.
To summarize, here are a number of possible methods for you to consider for storing the mapping between C and the Perl callback
For a lot of situations, like interfacing to an error handler, this may be a perfectly adequate solution.
If it is impossible to tell from the parameters passed back from the C callback what the context is, then you may need to create a sequence of C callback interface functions, and store pointers to each in an array.
A hash is an ideal mechanism to store the mapping between C and Perl.
Although I have made use of only the POP*
macros to access values
returned from Perl subroutines, it is also possible to bypass these
macros and read the stack using the ST
macro (See perlxs for a
full description of the ST
macro).
Most of the time the POP*
macros should be adequate; the main
problem with them is that they force you to process the returned values
in sequence. This may not be the most suitable way to process the
values in some cases. What we want is to be able to access the stack in
a random order. The ST
macro as used when coding an XSUB is ideal
for this purpose.
The code below is the example given in the section Returning a List
of Values recoded to use ST
instead of POP*
.
- static void
- call_AddSubtract2(a, b)
- int a;
- int b;
- {
- dSP;
- I32 ax;
- int count;
- ENTER;
- SAVETMPS;
- PUSHMARK(SP);
- XPUSHs(sv_2mortal(newSViv(a)));
- XPUSHs(sv_2mortal(newSViv(b)));
- PUTBACK;
- count = call_pv("AddSubtract", G_ARRAY);
- SPAGAIN;
- SP -= count;
- ax = (SP - PL_stack_base) + 1;
- if (count != 2)
- croak("Big trouble\n");
- printf ("%d + %d = %d\n", a, b, SvIV(ST(0)));
- printf ("%d - %d = %d\n", a, b, SvIV(ST(1)));
- PUTBACK;
- FREETMPS;
- LEAVE;
- }
Notes
Notice that it was necessary to define the variable ax
. This is
because the ST
macro expects it to exist. If we were in an XSUB it
would not be necessary to define ax
as it is already defined for
us.
The code
- SPAGAIN;
- SP -= count;
- ax = (SP - PL_stack_base) + 1;
sets the stack up so that we can use the ST
macro.
Unlike the original coding of this example, the returned
values are not accessed in reverse order. So ST(0)
refers to the
first value returned by the Perl subroutine and ST(count-1)
refers to the last.
As we've already shown, call_sv
can be used to invoke an
anonymous subroutine. However, our example showed a Perl script
invoking an XSUB to perform this operation. Let's see how it can be
done inside our C code:
- ...
- SV *cvrv = eval_pv("sub { print 'You will not find me cluttering any namespace!' }", TRUE);
- ...
- call_sv(cvrv, G_VOID|G_NOARGS);
eval_pv
is used to compile the anonymous subroutine, which
will be the return value as well (read more about eval_pv
in
eval_pv in perlapi). Once this code reference is in hand, it
can be mixed in with all the previous examples we've shown.
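In pure Perl the counterpart of eval_pv is string eval, which likewise returns the value of the compiled code: here, the anonymous sub itself.

```perl
# String eval compiles the source and returns its value,
# just as eval_pv hands the code reference back to C.
my $cvrv = eval "sub { 'You will not find me cluttering any namespace!' }";
die $@ if $@;                 # check that compilation succeeded
print $cvrv->(), "\n";
```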
Sometimes you need to invoke the same subroutine repeatedly. This usually happens with a function that acts on a list of values, such as Perl's built-in sort(). You can pass a comparison function to sort(), which will then be invoked for every pair of values that needs to be compared. The first() and reduce() functions from List::Util follow a similar pattern.
In this case it is possible to speed up the routine (often quite substantially) by using the lightweight callback API. The idea is that the calling context only needs to be created and destroyed once, and the sub can be called arbitrarily many times in between.
It is usual to pass parameters using global variables (typically $_ for one parameter, or $a and $b for two parameters) rather than via @_. (It is possible to use the @_ mechanism if you know what you're doing, though there is as yet no supported API for it. It's also inherently slower.)
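This convention is visible from pure Perl too: sort passes each pair of operands via the package globals $a and $b, and List::Util's first passes its single parameter via $_.

```perl
use List::Util qw(first);

# sort's comparison block receives its operands in $a and $b, not @_.
my @sorted = sort { $a <=> $b } (3, 1, 2);
print "@sorted\n";            # prints "1 2 3"

# first's block receives each candidate value in $_.
my $hit = first { $_ > 1 } (3, 1, 2);
print "$hit\n";               # prints "3"
```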
The pattern of macro calls is like this:
- dMULTICALL; /* Declare local variables */
- I32 gimme = G_SCALAR; /* context of the call: G_SCALAR,
- * G_ARRAY, or G_VOID */
- PUSH_MULTICALL(cv); /* Set up the context for calling cv,
- and set local vars appropriately */
- /* loop */ {
- /* set the value(s) of your parameter variables */
- MULTICALL; /* Make the actual call */
- } /* end of loop */
- POP_MULTICALL; /* Tear down the calling context */
For some concrete examples, see the implementation of the first() and reduce() functions of List::Util 1.18. There you will also find a header file that emulates the multicall API on older versions of perl.
Paul Marquess
Special thanks to the following people who assisted in the creation of the document.
Jeff Okamoto, Tim Bunce, Nick Gianniotis, Steve Kelem, Gurusamy Sarathy and Larry Wall.
Version 1.3, 14th Apr 1997
perlce - Perl for WinCE
This file gives the instructions for building Perl5.8 and above for WinCE. Please read and understand the terms under which this software is distributed.
miniperl is built. This is a single executable (without DLL), intended to run on Win32, and it will facilitate remaining build process; all binaries built after it are foreign and should not run locally.
miniperl is built using ./win32/Makefile; this is part of normal build process invoked as dependency from wince/Makefile.ce
After miniperl is built, configpm is invoked to create right Config.pm in right place and its corresponding Cross.pm.
Unlike Win32 build, miniperl will not have Config.pm of host within reach; it rather will use Config.pm from within cross-compilation directories.
The file Cross.pm is dead simple: for a given cross-architecture it places in @INC a path where the perl modules are, with the right Config.pm in that place.
That said, miniperl -Ilib -MConfig -we 1 should report an error, because it cannot find Config.pm. If it does not give an error, the wrong Config.pm has been substituted, and the resulting binaries will be a mess. By contrast, miniperl -MCross -MConfig -we 1 should run okay, and it will provide the right Config.pm for further compilations.
During the extensions build phase, a script ./win32/buildext.pl is invoked, which in turn steps into the ./ext subdirectories and performs a build of each extension in turn.
All invocations of Makefile.PL are provided with -MCross so as to enable cross-compilation.
This section describes the steps to be performed to build PerlCE. You may find additional information about building perl for WinCE, as well as some pre-built binaries, at http://perlce.sourceforge.net.
For compiling, you need the following:
The needed source files can be downloaded from http://perlce.sourceforge.net.
Normally you only need to edit ./win32/ce-helpers/compile.bat to reflect your system and run it.
The file ./win32/ce-helpers/compile.bat is actually a wrapper that calls nmake -f makefile.ce with appropriate parameters; it accepts extra parameters and forwards them to the nmake command as additional arguments. You should pass the target this way.
To prepare a distribution you need to do the following:
Makefile.ce has a CROSS_NAME macro, which is used further on to refer to your cross-compilation scheme. You can assign a name to it, but this is not necessary: by default it is named after your machine configuration, such as "wince-sh3-hpc-wce211", and this is enough to distinguish different builds at the same time. This option can be handy for performing several different builds on the same platform, say, a threaded build. In the following example we assume that all required environment variables are set properly for the C cross-compiler (a special *.bat file fits this purpose perfectly) and that your compile.bat has the proper "MACHINE" parameter set to, say, wince-mips-pocket-wce300.
- compile.bat
- compile.bat dist
- compile.bat CROSS_NAME=mips-wce300-thr "USE_ITHREADS=define" "USE_IMP_SYS=define" "USE_MULTI=define"
- compile.bat CROSS_NAME=mips-wce300-thr "USE_ITHREADS=define" "USE_IMP_SYS=define" "USE_MULTI=define" dist
If all goes okay and there are no errors during the build, you'll get two independent distributions: wince-mips-pocket-wce300 and mips-wce300-thr.
The dist target prepares the distribution file set. The zipdist target does the same as dist, but additionally compresses the distribution files into a zip archive.
NOTE: during a build, one or more Config.pm files for cross-compilation ("foreign" Config.pm) may be created; these are hidden inside ../xlib/$(CROSS_NAME) together with other auxiliary files. But, and this is important to note, there should be no Config.pm for the host miniperl. If you get an error that perl could not find Config.pm somewhere in the build process, something went wrong. Most probably you forgot to specify a cross-compilation when invoking miniperl.exe on Makefile.PL. When building an extension for cross-compilation, your command line should look like
- ..\miniperl.exe -I..\lib -MCross=mips-wce300-thr Makefile.PL
or just
- ..\miniperl.exe -I..\lib -MCross Makefile.PL
to refer to the cross-compilation that was created last time.
All questions related to building for WinCE devices can be asked on the perlce-user@lists.sourceforge.net mailing list.
PerlCE is currently linked with a simple console window, so it also works on non-hpc devices.
The simple stdio implementation creates the files stdin.txt, stdout.txt and stderr.txt, so you can examine them if your console has only a limited number of columns.
When exitcode is non-zero, a message box appears, otherwise the console closes, so you might have to catch an exit with status 0 in your program to see any output.
stdout/stderr now go into the files /perl-stdout.txt and /perl-stderr.txt.
PerlIDE is handy to deal with perlce.
No fork(), pipe(), popen() etc.
All environment vars must be stored in HKLM\Environment as strings. They are read at process startup.
Usual perl lib path (semi-list).
Semi-list for executables.
- Tempdir.
- Root for accessing some special files, i.e. /dev/null, /etc/services.
- Rows/cols for console.
- Home directory.
- Size for console font.
You can set these with cereg.exe, a (remote) registry editor or via the PerlIDE.
To start perl by clicking on a perl source file, you have to make the appropriate entries in HKCR (see ce-helpers/wince-reg.bat). cereg.exe (which must be executed on a desktop PC with ActiveSync) is reported not to work on some devices; in that case you have to create the registry entries by hand using a registry editor.
The following Win32-Methods are built-in:
- newXS("Win32::GetCwd", w32_GetCwd, file);
- newXS("Win32::SetCwd", w32_SetCwd, file);
- newXS("Win32::GetTickCount", w32_GetTickCount, file);
- newXS("Win32::GetOSVersion", w32_GetOSVersion, file);
- newXS("Win32::IsWinNT", w32_IsWinNT, file);
- newXS("Win32::IsWin95", w32_IsWin95, file);
- newXS("Win32::IsWinCE", w32_IsWinCE, file);
- newXS("Win32::CopyFile", w32_CopyFile, file);
- newXS("Win32::Sleep", w32_Sleep, file);
- newXS("Win32::MessageBox", w32_MessageBox, file);
- newXS("Win32::GetPowerStatus", w32_GetPowerStatus, file);
- newXS("Win32::GetOemInfo", w32_GetOemInfo, file);
- newXS("Win32::ShellEx", w32_ShellEx, file);
Opening files for read-write is currently not supported if they use stdio (normal perl file handles).
If you find bugs, or if it does not work at all on your device, send mail to the address below. Please report the details of your device (processor, CE version, device type (hpc/palm/pocket)) and the date of the downloaded files.
Currently installation instructions are at http://perlce.sourceforge.net/.
Once the installation and testing processes stabilize, this information will become more precise.
The port for Win32 was used as a reference.
Initial port of perl to WinCE, performed in a separate directory named ./wince and based on the contents of the ./win32 directory. miniperl was not built; the user had to have a HOST perl and edit makefile.ce accordingly.
The WinCE port was kept in the same ./wince directory, and wince/Makefile.ce was used to invoke the native compiler to create a HOST miniperl, which then facilitated the cross-compiling process. Extension building support was added.
The two directories ./win32 and ./wince were merged, so the PerlCE build process now happens in the ./win32 directory.
perlcheat - Perl 5 Cheat Sheet
This 'cheat sheet' is a handy reference, meant for beginning Perl programmers. Not everything is mentioned, but 195 features may already be overwhelming.
- CONTEXTS SIGILS ref ARRAYS HASHES
- void $scalar SCALAR @array %hash
- scalar @array ARRAY @array[0, 2] @hash{'a', 'b'}
- list %hash HASH $array[0] $hash{'a'}
- &sub CODE
- *glob GLOB SCALAR VALUES
- FORMAT number, string, ref, glob, undef
- REFERENCES
- \ reference $$foo[1] aka $foo->[1]
- $@%&* dereference $$foo{bar} aka $foo->{bar}
- [] anon. arrayref ${$$foo[1]}[2] aka $foo->[1]->[2]
- {} anon. hashref ${$$foo[1]}[2] aka $foo->[1][2]
- \() list of refs
- SYNTAX
- OPERATOR PRECEDENCE foreach (LIST) { } for (a;b;c) { }
- -> while (e) { } until (e) { }
- ++ -- if (e) { } elsif (e) { } else { }
- ** unless (e) { } elsif (e) { } else { }
- ! ~ \ u+ u- given (e) { when (e) {} default {} }
- =~ !~
- * / % x NUMBERS vs STRINGS FALSE vs TRUE
- + - . = = undef, "", 0, "0"
- << >> + . anything else
- named uops == != eq ne
- < > <= >= lt gt le ge < > <= >= lt gt le ge
- == != <=> eq ne cmp ~~ <=> cmp
- &
- | ^ REGEX MODIFIERS REGEX METACHARS
- && /i case insensitive ^ string begin
- || // /m line based ^$ $ str end (bfr \n)
- .. ... /s . includes \n + one or more
- ?: /x ignore wh.space * zero or more
- = += last goto /p preserve ? zero or one
- , => /a ASCII /aa safe {3,7} repeat in range
- list ops /l locale /d dual | alternation
- not /u Unicode [] character class
- and /e evaluate /ee rpts \b word boundary
- or xor /g global \z string end
- /o compile pat once () capture
- DEBUG (?:p) no capture
- -MO=Deparse REGEX CHARCLASSES (?#t) comment
- -MO=Terse . [^\n] (?=p) ZW pos ahead
- -D## \s whitespace (?!p) ZW neg ahead
- -d:Trace \w word chars (?<=p) ZW pos behind \K
- \d digits (?<!p) ZW neg behind
- CONFIGURATION \pP named property (?>p) no backtrack
- perl -V:ivsize \h horiz.wh.space (?|p|p)branch reset
- \R linebreak (?<n>p)named capture
- \S \W \D \H negate \g{n} ref to named cap
- \K keep left part
- FUNCTION RETURN LISTS
- stat localtime caller SPECIAL VARIABLES
- 0 dev 0 second 0 package $_ default variable
- 1 ino 1 minute 1 filename $0 program name
- 2 mode 2 hour 2 line $/ input separator
- 3 nlink 3 day 3 subroutine $\ output separator
- 4 uid 4 month-1 4 hasargs $| autoflush
- 5 gid 5 year-1900 5 wantarray $! sys/libcall error
- 6 rdev 6 weekday 6 evaltext $@ eval error
- 7 size 7 yearday 7 is_require $$ process ID
- 8 atime 8 is_dst 8 hints $. line number
- 9 mtime 9 bitmask @ARGV command line args
- 10 ctime 10 hinthash @INC include paths
- 11 blksz 3..10 only @_ subroutine args
- 12 blcks with EXPR %ENV environment
The first version of this document appeared on Perl Monks, where several people had useful suggestions. Thank you, Perl Monks.
A special thanks to Damian Conway, who didn't only suggest important changes, but also took the time to count the number of listed features and make a Perl 6 version to show that Perl will stay Perl.
Juerd Waalboer <#####@juerd.nl>, with the help of many Perl Monks.
http://perlmonks.org/?node_id=216602 - the original PM post
http://perlmonks.org/?node_id=238031 - Damian Conway's Perl 6 version
http://juerd.nl/site.plp/perlcheat - home of the Perl Cheat Sheet
perlclib - Internal replacements for standard C library functions
One thing Perl porters should note is that perl doesn't tend to use that much of the C standard library internally; you'll see very little use of, for example, the ctype.h functions in there. This is because Perl tends to reimplement or abstract standard library functions, so that we know exactly how they're going to operate.
This is a reference card for people who are familiar with the C library and who want to do things the Perl way; to tell them which functions they ought to use instead of the more normal C functions.
In the following tables:
t is a type, p is a pointer, n is a number, s is a string, and sv, av, hv, etc. represent variables of their respective types.
Instead of the stdio.h functions, you should use the Perl abstraction layer. Instead of FILE* types, you need to be handling PerlIO* types. Don't forget that with the new PerlIO layered I/O abstraction, FILE* types may not even be available. See also the perlapio documentation for more information about the following functions:
- Instead Of: Use:
- stdin PerlIO_stdin()
- stdout PerlIO_stdout()
- stderr PerlIO_stderr()
- fopen(fn, mode) PerlIO_open(fn, mode)
- freopen(fn, mode, stream) PerlIO_reopen(fn, mode, perlio) (Deprecated)
- fflush(stream) PerlIO_flush(perlio)
- fclose(stream) PerlIO_close(perlio)
- Instead Of: Use:
- fprintf(stream, fmt, ...) PerlIO_printf(perlio, fmt, ...)
- [f]getc(stream) PerlIO_getc(perlio)
- [f]putc(stream, n) PerlIO_putc(perlio, n)
- ungetc(n, stream) PerlIO_ungetc(perlio, n)
Note that the PerlIO equivalents of fread and fwrite are slightly different from their C library counterparts:
- fread(p, size, n, stream) PerlIO_read(perlio, buf, numbytes)
- fwrite(p, size, n, stream) PerlIO_write(perlio, buf, numbytes)
- fputs(s, stream) PerlIO_puts(perlio, s)
There is no equivalent to fgets; one should use sv_gets instead:
- fgets(s, n, stream) sv_gets(sv, perlio, append)
- Instead Of: Use:
- feof(stream) PerlIO_eof(perlio)
- fseek(stream, n, whence) PerlIO_seek(perlio, n, whence)
- rewind(stream) PerlIO_rewind(perlio)
- fgetpos(stream, p) PerlIO_getpos(perlio, sv)
- fsetpos(stream, p) PerlIO_setpos(perlio, sv)
- ferror(stream) PerlIO_error(perlio)
- clearerr(stream) PerlIO_clearerr(perlio)
- Instead Of: Use:
- t* p = malloc(n) Newx(p, n, t)
- t* p = calloc(n, s) Newxz(p, n, t)
- p = realloc(p, n) Renew(p, n, t)
- memcpy(dst, src, n) Copy(src, dst, n, t)
- memmove(dst, src, n) Move(src, dst, n, t)
- memcpy(dst, src, sizeof(t)) StructCopy(src, dst, t)
- memset(dst, 0, n * sizeof(t)) Zero(dst, n, t)
- memzero(dst, 0) Zero(dst, n, char)
- free(p) Safefree(p)
- strdup(p) savepv(p)
- strndup(p, n) savepvn(p, n) (Hey, strndup doesn't exist!)
- strstr(big, little) instr(big, little)
- strcmp(s1, s2) strLE(s1, s2) / strEQ(s1, s2) / strGT(s1,s2)
- strncmp(s1, s2, n) strnNE(s1, s2, n) / strnEQ(s1, s2, n)
Notice the different order of arguments to Copy and Move compared with memcpy and memmove.
Most of the time, though, you'll want to be dealing with SVs internally instead of raw char * strings:
- strlen(s) sv_len(sv)
- strcpy(dt, src) sv_setpv(sv, s)
- strncpy(dt, src, n) sv_setpvn(sv, s, n)
- strcat(dt, src) sv_catpv(sv, s)
- strncat(dt, src) sv_catpvn(sv, s)
- sprintf(s, fmt, ...) sv_setpvf(sv, fmt, ...)
Note also the existence of sv_catpvf and sv_vcatpvfn, which combine concatenation with formatting.
Sometimes instead of zeroing the allocated heap by using Newxz() you should consider "poisoning" the data. This means writing a bit pattern into it that should be illegal as pointers (and floating point numbers), and also hopefully surprising enough as integers, so that any code attempting to use the data without forethought will break sooner rather than later. Poisoning can be done using the Poison() macros, which have similar arguments to Zero():
- PoisonWith(dst, n, t, b) scribble memory with byte b
- PoisonNew(dst, n, t) equal to PoisonWith(dst, n, t, 0xAB)
- PoisonFree(dst, n, t) equal to PoisonWith(dst, n, t, 0xEF)
- Poison(dst, n, t) equal to PoisonFree(dst, n, t)
There are two types of character class tests that Perl implements: one type deals in chars and is thus not Unicode aware (and hence deprecated unless you know you should use them), and the other type deals in UVs and knows about Unicode properties. In the following table, c is a char, and u is a Unicode codepoint.
- Instead Of: Use: But better use:
- isalnum(c) isALNUM(c) isALNUM_uni(u)
- isalpha(c) isALPHA(c) isALPHA_uni(u)
- iscntrl(c) isCNTRL(c) isCNTRL_uni(u)
- isdigit(c) isDIGIT(c) isDIGIT_uni(u)
- isgraph(c) isGRAPH(c) isGRAPH_uni(u)
- islower(c) isLOWER(c) isLOWER_uni(u)
- isprint(c) isPRINT(c) isPRINT_uni(u)
- ispunct(c) isPUNCT(c) isPUNCT_uni(u)
- isspace(c) isSPACE(c) isSPACE_uni(u)
- isupper(c) isUPPER(c) isUPPER_uni(u)
- isxdigit(c) isXDIGIT(c) isXDIGIT_uni(u)
- tolower(c) toLOWER(c) toLOWER_uni(u)
- toupper(c) toUPPER(c) toUPPER_uni(u)
- Instead Of: Use:
- atof(s) Atof(s)
- atol(s) Atol(s)
- strtod(s, &p) Nothing. Just don't use it.
- strtol(s, &p, n) Strtol(s, &p, n)
- strtoul(s, &p, n) Strtoul(s, &p, n)
Notice also the grok_bin, grok_hex, and grok_oct functions in numeric.c for converting strings representing numbers in the respective bases into NVs.
In theory Strtol and Strtoul may not be defined if the machine perl is built on doesn't actually have strtol and strtoul. But as those two functions are part of the 1989 ANSI C spec, we suspect you'll find them everywhere by now.
- int rand() double Drand01()
- srand(n) { seedDrand01((Rand_seed_t)n);
- PL_srand_called = TRUE; }
- exit(n) my_exit(n)
- system(s) Don't. Look at pp_system or use my_popen
- getenv(s) PerlEnv_getenv(s)
- setenv(s, val) my_putenv(s, val)
You should not even want to use the setjmp.h functions, but if you think you do, use the JMPENV stack in scope.h instead.
For signal/sigaction, use rsignal(signo, handler).
perlcommunity - a brief overview of the Perl community
This document aims to provide an overview of the vast perl community, which is far too large and diverse to provide a detailed listing. If any specific niche has been forgotten, it is not meant as an insult but an omission for the sake of brevity.
The Perl community is as diverse as Perl, and there is a large amount of evidence that the Perl users apply TMTOWTDI to all endeavors, not just programming. From websites, to IRC, to mailing lists, there is more than one way to get involved in the community.
There is a central directory for the Perl community: http://perl.org maintained by the Perl Foundation (http://www.perlfoundation.org/), which tracks and provides services for a variety of other community sites.
Perl runs on e-mail; there is no doubt about it. The Camel book was originally written mostly over e-mail and today Perl's development is co-ordinated through mailing lists. The largest repository of Perl mailing lists is located at http://lists.perl.org.
Most Perl-related projects set up mailing lists for both users and contributors. If you don't see a certain project listed at http://lists.perl.org, check the particular website for that project. Most mailing lists are archived at http://nntp.perl.org/.
There are also plenty of Perl related newsgroups located under
comp.lang.perl.*
.
The Perl community has a rather large IRC presence. For starters, it has its own IRC network, irc://irc.perl.org. General (not help-oriented) chat can be found at irc://irc.perl.org/#perl. Many other more specific chats are also hosted on the network. Information about irc.perl.org is located on the network's website: http://www.irc.perl.org. For a more help-oriented #perl, check out irc://irc.freenode.net/#perl. Perl 6 development also has a presence in irc://irc.freenode.net/#perl6. Most Perl-related channels will be kind enough to point you in the right direction if you ask nicely.
Any large IRC network (Dalnet, EFnet) is also likely to have a #perl channel, with varying activity levels.
Perl websites come in a variety of forms, but they fit into two large categories: forums and news websites. There are many Perl-related websites, so only a few of the community's largest are mentioned here.
Run by O'Reilly Media (the publisher of the Camel Book, among other Perl-related literature), perl.com provides current Perl news, articles, and resources for Perl developers as well as a directory of other useful websites.
Many members of the community have a Perl-related blog on this site. If you'd like to join them, you can sign up for free.
use Perl; used to provide a slashdot-style news/blog website covering all things Perl, from minutes of the meetings of the Perl 6 Design team to conference announcements with (ir)relevant discussion. It no longer accepts updates, but you can still use the site to read old entries and comments.
PerlMonks is one of the largest Perl forums, and describes itself as "A place for individuals to polish, improve, and showcase their Perl skills." and "A community which allows everyone to grow and learn from each other."
Stack Overflow is a free question-and-answer site for programmers. It's not focussed solely on Perl, but it does have an active group of users who do their best to help people with their Perl programming questions.
Many cities around the world have local Perl Mongers chapters. A Perl Mongers chapter is a local user group which typically holds regular in-person meetings, both social and technical; helps organize local conferences, workshops, and hackathons; and provides a mailing list or other continual contact method for its members to keep in touch.
To find your local Perl Mongers (or PM as they're commonly abbreviated) group check the international Perl Mongers directory at http://www.pm.org/.
Perl workshops are, as the name might suggest, workshops where Perl is taught in a variety of ways. At the workshops, subjects range from a beginner's introduction (such as the Pittsburgh Perl Workshop's "Zero To Perl") to much more advanced subjects.
There are several great resources for locating workshops: the websites mentioned above, the calendar mentioned below, and the YAPC Europe website, http://www.yapceurope.org/, which is probably the best resource for European Perl events.
Hackathons are a very different kind of gathering where Perl hackers gather to do just that, hack nonstop for an extended (several day) period on a specific project or projects. Information about hackathons can be located in the same place as information about workshops as well as in irc://irc.perl.org/#perl.
If you have never been to a hackathon, here are a few basic things you need to know before attending: have a working laptop and know how to use it; check out the involved projects beforehand; have the necessary version control client; and bring backup equipment (an extra LAN cable, additional power strips, etc.) because someone will forget.
Perl has two major annual conventions: The Perl Conference (now part of OSCON), put on by O'Reilly, and Yet Another Perl Conference or YAPC (pronounced yap-see), which is localized into several regional YAPCs (North America, Europe, Asia) in a stunning grassroots display by the Perl community. For more information about either conference, check out their respective web pages: OSCON http://conferences.oreillynet.com/; YAPC http://www.yapc.org.
A relatively new conference franchise with a large Perl portion is the Open Source Developers Conference or OSDC. First held in Australia it has recently also spread to Israel and France. More information can be found at: http://www.osdc.com.au/ for Australia, http://www.osdc.org.il for Israel, and http://www.osdc.fr/ for France.
The Perl Review, http://www.theperlreview.com maintains a website and Google calendar (http://www.theperlreview.com/community_calendar) for tracking workshops, hackathons, Perl Mongers meetings, and other events. Views of this calendar are at http://www.perl.org/events.html and http://www.yapc.org.
Not every event or Perl Mongers group is on that calendar, so don't lose heart if you don't see yours posted. To have your event or group listed, contact brian d foy (brian@theperlreview.com).
Edgar "Trizor" Bering <trizor@gmail.com>
perlcygwin - Perl for Cygwin
This document will help you configure, make, test and install Perl on Cygwin. This document also describes features of Cygwin that will affect how Perl behaves at runtime.
NOTE: There are pre-built Perl packages available for Cygwin and a version of Perl is provided in the normal Cygwin install. If you do not need to customize the configuration, consider using one of those packages.
The Cygwin tools are ports of the popular GNU development tools for Win32 platforms. They run thanks to the Cygwin library which provides the UNIX system calls and environment these programs expect. More information about this project can be found at:
A recent net or commercial release of Cygwin is required.
At the time this document was last updated, Cygwin 1.7.16 was current.
While building Perl some changes may be necessary to your Cygwin setup so that Perl builds cleanly. These changes are not required for normal Perl usage.
NOTE: The binaries that are built will run on all Win32 versions. They do not depend on your host system (WinXP/Win2K/Win7) or your Cygwin configuration (binary/text mounts, cygserver). The only dependencies come from hard-coded pathnames like /usr/local. However, your host system and Cygwin configuration will affect Perl's runtime behavior (see TEST).
PATH
Set the PATH environment variable so that Configure finds the Cygwin versions of programs. Any unneeded Windows directories should be removed or moved to the end of your PATH.
If you do not have nroff (which is part of the groff package), Configure will not prompt you to install man pages.
The default options gathered by Configure with the assistance of hints/cygwin.sh will build a Perl that supports dynamic loading (which requires a shared cygperl5_16.dll).
This will run Configure and keep a record:
- ./Configure 2>&1 | tee log.configure
If you are willing to accept all the defaults run Configure with -de. However, several useful customizations are available.
It is possible to strip the EXEs and DLLs created by the build process. The resulting binaries will be significantly smaller. If you want the binaries to be stripped, you can either add a -s option when Configure prompts you,
- Any additional ld flags (NOT including libraries)? [none] -s
- Any special flags to pass to g++ to create a dynamically loaded library?
- [none] -s
- Any special flags to pass to gcc to use dynamic linking? [none] -s
or you can edit hints/cygwin.sh and uncomment the relevant variables near the end of the file.
Several Perl functions and modules depend on the existence of some optional libraries. Configure will find them if they are installed in one of the directories listed as being used for library searches. Pre-built packages for most of these are available from the Cygwin installer.
-lcrypt
The crypt package distributed with Cygwin is a Linux compatible 56-bit DES crypt port by Corinna Vinschen.
Alternatively, the crypt libraries in GNU libc have been ported to Cygwin.
-lgdbm_compat (use GDBM_File)
GDBM is available for Cygwin.
NOTE: The GDBM library only works on NTFS partitions.
-ldb (use DB_File)
BerkeleyDB is available for Cygwin.
NOTE: The BerkeleyDB library only completely works on NTFS partitions.
cygserver (use IPC::SysV)
A port of SysV IPC is available for Cygwin.
NOTE: This has not been extensively tested. In particular, d_semctl_semun is undefined because it fails a Configure test, and on Win9x the shm*() functions seem to hang. It also creates a compile-time dependency, because perl.h includes <sys/ipc.h> and <sys/sem.h> (which will be required in the future when compiling CPAN modules). CURRENTLY NOT SUPPORTED!
-lutil
Included with the standard Cygwin netrelease is the inetutils package which includes libutil.a.
The INSTALL document describes several Configure-time options. Some of these will work with Cygwin, others are not yet possible. Also, some of these are experimental. You can either select an option when Configure prompts you or you can define (undefine) symbols on the command line.
-Uusedl
Undefining this symbol forces Perl to be compiled statically.
-Dusemymalloc
By default Perl does not use the malloc() included with the Perl source, because it was slower and not entirely thread-safe. If you want to force Perl to build with its own malloc, define this symbol.
-Uuseperlio
Undefining this symbol disables the PerlIO abstraction. PerlIO is now the default; it is not recommended to disable PerlIO.
-Dusemultiplicity
Multiplicity is required when embedding Perl in a C program and using
more than one interpreter instance. This is only required when you build
a not-threaded perl with -Uuseithreads
.
-Uuse64bitint
By default Perl uses 64 bit integers. If you want to use smaller 32 bit integers, define this symbol.
-Duselongdouble
gcc supports long doubles (12 bytes). However, several additional long double math functions are necessary to use them within Perl ({atan2, cos, exp, floor, fmod, frexp, isnan, log, modf, pow, sin, sqrt}l, strtold). These are not yet available with newlib, the Cygwin libc.
-Uuseithreads
Define this symbol if you want a non-threaded, faster perl.
-Duselargefiles
Cygwin uses 64-bit integers for internal size and position calculations; this will be correctly detected and defined by Configure.
-Dmksymlinks
Use this to build perl outside of the source tree. Details can be found in the INSTALL document. This is the recommended way to build perl from sources.
You may see some messages during Configure that seem suspicious.
d_eofnblk
Win9x does not correctly report EOF with a non-blocking read on a closed pipe. You will see the following messages:
- But it also returns -1 to signal EOF, so be careful!
- WARNING: you can't distinguish between EOF and no data!
- *** WHOA THERE!!! ***
- The recommended value for $d_eofnblk on this machine was "define"!
- Keep the recommended value? [y]
At least for consistency with WinNT, you should keep the recommended value.
The following error occurs because of the Cygwin #define of _LONG_DOUBLE:
- Guessing which symbols your C compiler and preprocessor define...
- try.c:<line#>: missing binary operator
This failure does not seem to cause any problems. With older gcc versions, "parse error" is reported instead of "missing binary operator".
Simply run make and wait:
- make 2>&1 | tee log.make
There are two steps to running the test suite:
- make test 2>&1 | tee log.make-test
- cd t; ./perl harness 2>&1 | tee ../log.harness
The same tests are run both times, but more information is provided when running as ./perl harness.
Test results vary depending on your host system and your Cygwin configuration. If a test can pass in some Cygwin setup, it is always attempted and explainable test failures are documented. It is possible for Perl to pass all the tests, but it is more likely that some tests will fail for one of the reasons listed below.
UNIX file permissions are based on sets of mode bits for {read,write,execute} for each of {user,group,other}. By default Cygwin only tracks the Win32 read-only attribute, represented as the UNIX file user write bit (files are always readable; files are executable if they have a .{com,bat,exe} extension or begin with #!; directories are always readable and executable). On WinNT with the ntea CYGWIN setting, the additional mode bits are stored as extended file attributes. On WinNT with the default ntsec CYGWIN setting, permissions use the standard WinNT security descriptors and access control lists. Without one of these options, these tests will fail (listing not updated yet):
- Failed Test List of failed
- ------------------------------------
- io/fs.t 5, 7, 9-10
- lib/anydbm.t 2
- lib/db-btree.t 20
- lib/db-hash.t 16
- lib/db-recno.t 18
- lib/gdbm.t 2
- lib/ndbm.t 2
- lib/odbm.t 2
- lib/sdbm.t 2
- op/stat.t 9, 20 (.tmp not an executable extension)
Do not use NDBM_File or ODBM_File on a FAT filesystem. They can be built on a FAT filesystem, but many tests will fail:
- ../ext/NDBM_File/ndbm.t 13 3328 71 59 83.10% 1-2 4 16-71
- ../ext/ODBM_File/odbm.t 255 65280 ?? ?? % ??
- ../lib/AnyDBM_File.t 2 512 12 2 16.67% 1 4
- ../lib/Memoize/t/errors.t 0 139 11 5 45.45% 7-11
- ../lib/Memoize/t/tie_ndbm.t 13 3328 4 4 100.00% 1-4
- run/fresh_perl.t 97 1 1.03% 91
If you intend to run only on FAT (or if using AnyDBM_File on FAT), run Configure with the -Ui_ndbm and -Ui_dbm options to prevent NDBM_File and ODBM_File being built.
With NTFS (and no CYGWIN=nontsec), there should be no problems even if perl was built on FAT.
fork() failures in io_* tests
A fork() failure may result in the following tests failing:
- ext/IO/lib/IO/t/io_multihomed.t
- ext/IO/lib/IO/t/io_sock.t
- ext/IO/lib/IO/t/io_unix.t
See comment on fork in Miscellaneous below.
Cygwin does an outstanding job of providing UNIX-like semantics on top of Win32 systems. However, in addition to the items noted above, there are some differences that you should know about. This is a very brief guide to portability, more information can be found in the Cygwin documentation.
Cygwin pathnames are separated by forward (/) slashes; Universal Naming Convention paths (//UNC) are also supported. Since cygwin-1.7, non-POSIX pathnames are discouraged. Names may contain all printable characters.
File names are case insensitive, but case preserving. A pathname that contains a backslash or drive letter is a Win32 pathname, not subject to the translations applied to POSIX-style pathnames; Cygwin will warn you about such names, so it is better to convert them to POSIX style.
For conversion we have Cygwin::win_to_posix_path() and Cygwin::posix_to_win_path().
Since cygwin-1.7 pathnames are UTF-8 encoded.
Since cygwin-1.7 textmounts are deprecated and strongly discouraged.
When a file is opened, it is in either text or binary mode. In text mode a file is subject to CR/LF/Ctrl-Z translations. With Cygwin, the default mode for an open() is determined by the mode of the mount that underlies the file. See Cygwin::is_binmount(). Perl provides a binmode() function to set binary mode on files that otherwise would be treated as text. sysopen() with the O_TEXT flag sets text mode on files that otherwise would be treated as binary:
- sysopen(FOO, "bar", O_WRONLY|O_CREAT|O_TEXT)
lseek(), tell() and sysseek() only work with files opened in binary mode.
The text/binary issue is covered at length in the Cygwin documentation.
PerlIO overrides the default Cygwin Text/Binary behaviour. A file will
always be treated as binary, regardless of the mode of the mount it lives
on, just like it is in UNIX. So CR/LF translation needs to be requested in
either the open() call like this:
- open(FH, ">:crlf", "out.txt");
which will do conversion from LF to CR/LF on the output, or in the environment settings (add this to your .bashrc):
- export PERLIO=crlf
which will pull in the crlf PerlIO layer which does LF -> CRLF conversion on every output generated by perl.
The Cygwin stat(), lstat() and readlink() functions make the .exe
extension transparent by looking for foo.exe when you ask for foo
(unless a foo also exists). Cygwin does not require a .exe
extension, but gcc adds it automatically when building a program.
However, when accessing an executable as a normal file (e.g., cp
in a makefile) the .exe is not transparent. The install program
included with Cygwin automatically appends a .exe when necessary.
Cygwin processes have their own pid, which is different from the
underlying windows pid. Most posix compliant Proc functions expect
the cygwin pid, but several Win32::Process functions expect the
winpid. E.g. $$
is the cygwin pid of /usr/bin/perl, which is not
the winpid. Use Cygwin::pid_to_winpid() and Cygwin::winpid_to_pid()
to translate between them.
Under Cygwin, $^E is the same as $!. When using Win32 API Functions,
use Win32::GetLastError()
to get the last Windows error.
Using fork() or system() out to another perl after loading multiple dlls
may result in a DLL base address conflict. The internal cygwin error
looks like the following:
- 0 [main] perl 8916 child_info_fork::abort: data segment start: parent
- (0xC1A000) != child(0xA6A000)
See http://cygwin.com/faq/faq-nochunks.html#faq.using.fixing-fork-failures. It helps if fewer DLLs are loaded in memory, so that the available address space is larger; for example, stopping MS Internet Explorer might help.
Use the perlrebase or rebase utilities to resolve the conflicting dll addresses. The rebase package is included in the Cygwin setup. Use setup.exe from http://www.cygwin.com/setup.exe to install it.
1. kill all perl processes and run perlrebase, or
2. kill all cygwin processes and services, start dash from cmd.exe and run rebaseall.
chown()
On WinNT chown() can change a file's user and group IDs. On Win9x chown()
is a no-op, although this is appropriate since there is no security model.
File locking using the F_GETLK command to fcntl() is a stub that returns ENOSYS.
Win9x cannot rename() an open file (although WinNT can).
The Cygwin chroot() implementation has holes (it cannot restrict file access by native Win32 programs).
In-place editing of files (perl -i) doesn't work without making a backup of the file being edited (perl -i.bak) because of Windows restrictions; therefore Perl adds the suffix .bak automatically if you use perl -i without specifying a backup extension.
Cwd::cwd
Returns the current working directory.
Cygwin::pid_to_winpid
Translates a cygwin pid to the corresponding Windows pid (which may or may not be the same).
Cygwin::winpid_to_pid
Translates a Windows pid to the corresponding cygwin pid (if any).
Cygwin::win_to_posix_path
Translates a Windows path to the corresponding cygwin path respecting the current mount points. With a second non-null argument returns an absolute path. Double-byte characters will not be translated.
Cygwin::posix_to_win_path
Translates a cygwin path to the corresponding Windows path respecting the current mount points. With a second non-null argument returns an absolute path. Double-byte characters will not be translated.
Cygwin::mount_table()
Returns an array of [mnt_dir, mnt_fsname, mnt_type, mnt_opts].
- perl -e 'for $i (Cygwin::mount_table) {print join(" ",@$i),"\n";}'
- /bin c:\cygwin\bin system binmode,cygexec
- /usr/bin c:\cygwin\bin system binmode
- /usr/lib c:\cygwin\lib system binmode
- / c:\cygwin system binmode
- /cygdrive/c c: system binmode,noumount
- /cygdrive/d d: system binmode,noumount
- /cygdrive/e e: system binmode,noumount
Cygwin::mount_flags
Returns the mount type and flags for a specified mount point as a comma-separated string: the mntent->mnt_type (always "system" or "user"), followed by the mntent->mnt_opts, of which the first is always "binmode" or "textmode".
If the argument is "/cygdrive", then just the volume mount settings and the cygdrive mount prefix are returned.
User mounts override system mounts.
- $ perl -e 'print Cygwin::mount_flags "/usr/bin"'
- system,binmode,cygexec
- $ perl -e 'print Cygwin::mount_flags "/cygdrive"'
- binmode,cygdrive,/cygdrive
Cygwin::is_binmount
Returns true if the given cygwin path is binary mounted, false if the path is mounted in textmode.
Cygwin::sync_winenv
Cygwin does not initialize all original Win32 environment variables. See the bottom of this page http://cygwin.com/cygwin-ug-net/setup-env.html for "Restricted Win32 environment".
Certain Win32 programs called from Cygwin programs might need some environment variables; for example, ADODB needs %COMMONPROGRAMFILES%. Call Cygwin::sync_winenv() to copy all Win32 environment variables to your process, and note that Cygwin will warn on every encounter of a non-POSIX path.
This will install Perl, including man pages.
- make install 2>&1 | tee log.make-install
NOTE: If STDERR is redirected, make install will not prompt you to install perl into /usr/bin.
You may need to be Administrator to run make install. If you are not, you must have write access to the directories in question.
Information on installing the Perl documentation in HTML format can be found in the INSTALL document.
These are the files in the Perl release that contain references to Cygwin. These very brief notes attempt to explain the reason for all conditional code. Hopefully, keeping this up to date will allow the Cygwin port to be kept as clean as possible.
- INSTALL README.cygwin README.win32 MANIFEST
- pod/perl.pod pod/perlport.pod pod/perlfaq3.pod
- pod/perldelta.pod pod/perl5004delta.pod pod/perl56delta.pod
- pod/perl561delta.pod pod/perl570delta.pod pod/perl572delta.pod
- pod/perl573delta.pod pod/perl58delta.pod pod/perl581delta.pod
- pod/perl590delta.pod pod/perlhist.pod pod/perlmodlib.pod
- pod/perltoc.pod Porting/Glossary pod/perlgit.pod
- Porting/checkAUTHORS.pl
- dist/Cwd/Changes ext/Compress-Raw-Zlib/Changes
- ext/Compress-Raw-Zlib/README ext/Compress-Zlib/Changes
- ext/DB_File/Changes ext/Encode/Changes ext/Sys-Syslog/Changes
- ext/Time-HiRes/Changes ext/Win32API-File/Changes lib/CGI/Changes
- lib/ExtUtils/CBuilder/Changes lib/ExtUtils/Changes lib/ExtUtils/NOTES
- lib/ExtUtils/PATCHING lib/ExtUtils/README lib/Module/Build/Changes
- lib/Net/Ping/Changes lib/Test/Harness/Changes
- lib/Term/ANSIColor/ChangeLog lib/Term/ANSIColor/README README.symbian
- symbian/TODO
- cygwin/Makefile.SHs
- ext/IPC/SysV/hints/cygwin.pl
- ext/NDBM_File/hints/cygwin.pl
- ext/ODBM_File/hints/cygwin.pl
- hints/cygwin.sh
- Configure - help finding hints from uname,
- shared libperl required for dynamic loading
- Makefile.SH Cross/Makefile-cross-SH
- - linklibperl
- Porting/patchls - cygwin in port list
- installman - man pages with :: translated to .
- installperl - install dll, install to 'pods'
- makedepend.SH - uwinfix
- regen_lib.pl - file permissions
- NetWare/Makefile
- plan9/mkfile
- symbian/sanity.pl symbian/sisify.pl
- hints/uwin.sh
- vms/descrip_mms.template
- win32/Makefile win32/makefile.mk
- t/io/fs.t - no file mode checks if not ntsec
- skip rename() check when not check_case:relaxed
- t/io/tell.t - binmode
- t/lib/cygwin.t - builtin cygwin function tests
- t/op/groups.t - basegroup has ID = 0
- t/op/magic.t - $^X/symlink WORKAROUND, s/.exe//
- t/op/stat.t - no /dev, skip Win32 ftCreationTime quirk
- (cache manager sometimes preserves ctime of file
- previously created and deleted), no -u (setuid)
- t/op/taint.t - can't use empty path under Cygwin Perl
- t/op/time.t - no tzset()
- EXTERN.h - __declspec(dllimport)
- XSUB.h - __declspec(dllexport)
- cygwin/cygwin.c - os_extras (getcwd, spawn, and several Cygwin:: functions)
- perl.c - os_extras, -i.bak
- perl.h - binmode
- doio.c - win9x can not rename a file when it is open
- pp_sys.c - do not define h_errno, init _pwent_struct.pw_comment
- util.c - use setenv
- util.h - PERL_FILE_IS_ABSOLUTE macro
- pp.c - Comment about Posix vs IEEE math under Cygwin
- perlio.c - CR/LF mode
- perliol.c - Comment about EXTCONST under Cygwin
- ext/Compress-Raw-Zlib/Makefile.PL
- - Can't install via CPAN shell under Cygwin
- ext/Compress-Raw-Zlib/zlib-src/zutil.h
- - Cygwin is Unix-like and has vsnprintf
- ext/Errno/Errno_pm.PL - Special handling for Win32 Perl under Cygwin
- ext/POSIX/POSIX.xs - tzname defined externally
- ext/SDBM_File/sdbm/pair.c
- - EXTCONST needs to be redefined from EXTERN.h
- ext/SDBM_File/sdbm/sdbm.c
- - binary open
- ext/Sys/Syslog/Syslog.xs
- - Cygwin has syslog.h
- ext/Sys/Syslog/win32/compile.pl
- - Convert paths to Windows paths
- ext/Time-HiRes/HiRes.xs
- - Various timers not available
- ext/Time-HiRes/Makefile.PL
- - Find w32api/windows.h
- ext/Win32/Makefile.PL - Use various libraries under Cygwin
- ext/Win32/Win32.xs - Child dir and child env under Cygwin
- ext/Win32API-File/File.xs
- - _open_osfhandle not implemented under Cygwin
- ext/Win32CORE/Win32CORE.c
- - __declspec(dllexport)
- ext/B/t/OptreeCheck.pm - Comment about stderr/stdout order under Cygwin
- ext/Digest-SHA/bin/shasum
- - Use binary mode under Cygwin
- ext/Sys/Syslog/win32/Win32.pm
- - Convert paths to Windows paths
- ext/Time-HiRes/HiRes.pm
- - Comment about various timers not available
- ext/Win32API-File/File.pm
- - _open_osfhandle not implemented under Cygwin
- ext/Win32CORE/Win32CORE.pm
- - History of Win32CORE under Cygwin
- lib/CGI.pm - binmode and path separator
- lib/CPANPLUS/Dist/MM.pm - Commented out code that fails under Win32/Cygwin
- lib/CPANPLUS/Internals/Constants/Report.pm
- - OS classifications
- lib/CPANPLUS/Internals/Constants.pm
- - Constants for Cygwin
- lib/CPANPLUS/Internals/Report.pm
- - Example of Cygwin report
- lib/CPANPLUS/Module.pm
- - Abort if running on old Cygwin version
- lib/Cwd.pm - hook to internal Cwd::cwd
- lib/ExtUtils/CBuilder/Platform/cygwin.pm
- - use gcc for ld, and link to libperl.dll.a
- lib/ExtUtils/CBuilder.pm
- - Cygwin is Unix-like
- lib/ExtUtils/Install.pm - Install and rename issues under Cygwin
- lib/ExtUtils/MM.pm - OS classifications
- lib/ExtUtils/MM_Any.pm - Example for Cygwin
- lib/ExtUtils/MakeMaker.pm
- - require MM_Cygwin.pm
- lib/ExtUtils/MM_Cygwin.pm
- - canonpath, cflags, manifypods, perl_archive
- lib/File/Fetch.pm - Comment about quotes using a Cygwin example
- lib/File/Find.pm - on remote drives stat() always sets st_nlink to 1
- lib/File/Spec/Cygwin.pm - case_tolerant
- lib/File/Spec/Unix.pm - preserve //unc
- lib/File/Spec/Win32.pm - References a message on cygwin.com
- lib/File/Spec.pm - Pulls in lib/File/Spec/Cygwin.pm
- lib/File/Temp.pm - no directory sticky bit
- lib/Module/Build/Compat.pm - Comment references 'make' under Cygwin
- lib/Module/Build/Platform/cygwin.pm
- - Use '.' for man page separator
- lib/Module/Build.pm - Cygwin is Unix-like
- lib/Module/CoreList.pm - List of all module files and versions
- lib/Net/Domain.pm - No domainname command under Cygwin
- lib/Net/Netrc.pm - Bypass using stat() under Cygwin
- lib/Net/Ping.pm - ECONREFUSED is EAGAIN under Cygwin
- lib/Pod/Find.pm - Set 'pods' dir
- lib/Pod/Perldoc/ToMan.pm - '-c' switch for pod2man
- lib/Pod/Perldoc.pm - Use 'less' pager, and use .exe extension
- lib/Term/ANSIColor.pm - Cygwin terminal info
- lib/perl5db.pl - use stdin not /dev/tty
- utils/perlbug.PL - Add CYGWIN environment variable to report
- dist/Cwd/t/cwd.t
- ext/Compress-Zlib/t/14gzopen.t
- ext/DB_File/t/db-btree.t
- ext/DB_File/t/db-hash.t
- ext/DB_File/t/db-recno.t
- ext/DynaLoader/t/DynaLoader.t
- ext/File-Glob/t/basic.t
- ext/GDBM_File/t/gdbm.t
- ext/POSIX/t/sysconf.t
- ext/POSIX/t/time.t
- ext/SDBM_File/t/sdbm.t
- ext/Sys/Syslog/t/syslog.t
- ext/Time-HiRes/t/HiRes.t
- ext/Win32/t/Unicode.t
- ext/Win32API-File/t/file.t
- ext/Win32CORE/t/win32core.t
- lib/AnyDBM_File.t
- lib/Archive/Extract/t/01_Archive-Extract.t
- lib/Archive/Tar/t/02_methods.t
- lib/CPANPLUS/t/05_CPANPLUS-Internals-Fetch.t
- lib/CPANPLUS/t/20_CPANPLUS-Dist-MM.t
- lib/ExtUtils/t/Embed.t
- lib/ExtUtils/t/eu_command.t
- lib/ExtUtils/t/MM_Cygwin.t
- lib/ExtUtils/t/MM_Unix.t
- lib/File/Compare.t
- lib/File/Copy.t
- lib/File/Find/t/find.t
- lib/File/Path.t
- lib/File/Spec/t/crossplatform.t
- lib/File/Spec/t/Spec.t
- lib/Module/Build/t/destinations.t
- lib/Net/hostent.t
- lib/Net/Ping/t/110_icmp_inst.t
- lib/Net/Ping/t/500_ping_icmp.t
- lib/Net/t/netrc.t
- lib/Pod/Simple/t/perlcyg.pod
- lib/Pod/Simple/t/perlcygo.txt
- lib/Pod/Simple/t/perlfaq.pod
- lib/Pod/Simple/t/perlfaqo.txt
- lib/User/grent.t
- lib/User/pwent.t
Support for swapping real and effective user and group IDs is incomplete.
On WinNT Cygwin provides setuid(), seteuid(), setgid() and setegid().
However, additional Cygwin calls for manipulating WinNT access tokens
and security contexts are required.
Charles Wilson <cwilson@ece.gatech.edu>, Eric Fifer <egf7@columbia.edu>, alexander smishlajev <als@turnhere.com>, Steven Morlock <newspost@morlock.net>, Sebastien Barre <Sebastien.Barre@utc.fr>, Teun Burgers <burgers@ecn.nl>, Gerrit P. Haase <gp@familiehaase.de>, Reini Urban <rurban@cpan.org>, Jan Dubois <jand@activestate.com>, Jerry D. Hedden <jdhedden@cpan.org>.
Last updated: 2012-02-08
perldata - Perl data types
Perl has three built-in data types: scalars, arrays of scalars, and associative arrays of scalars, known as "hashes". A scalar is a single string (of any size, limited only by the available memory), number, or a reference to something (which will be discussed in perlref). Normal arrays are ordered lists of scalars indexed by number, starting with 0. Hashes are unordered collections of scalar values indexed by their associated string key.
Values are usually referred to by name, or through a named reference.
The first character of the name tells you to what sort of data
structure it refers. The rest of the name tells you the particular
value to which it refers. Usually this name is a single identifier,
that is, a string beginning with a letter or underscore, and
containing letters, underscores, and digits. In some cases, it may
be a chain of identifiers, separated by ::
(or by the slightly
archaic '); all but the last are interpreted as names of packages,
to locate the namespace in which to look up the final identifier
(see Packages in perlmod for details). For a more in-depth discussion
on identifiers, see Identifier parsing. It's possible to
substitute for a simple identifier, an expression that produces a reference
to the value at runtime. This is described in more detail below
and in perlref.
Perl also has its own built-in variables whose names don't follow
these rules. They have strange names so they don't accidentally
collide with one of your normal variables. Strings that match
parenthesized parts of a regular expression are saved under names
containing only digits after the $
(see perlop and perlre).
In addition, several special variables that provide windows into
the inner working of Perl have names containing punctuation characters
and control characters. These are documented in perlvar.
Scalar values are always named with '$', even when referring to a scalar that is part of an array or a hash. The '$' symbol works semantically like the English word "the" in that it indicates a single value is expected.
- $days # the simple scalar value "days"
- $days[28] # the 29th element of array @days
- $days{'Feb'} # the 'Feb' value from hash %days
- $#days # the last index of array @days
Entire arrays (and slices of arrays and hashes) are denoted by '@', which works much as the word "these" or "those" does in English, in that it indicates multiple values are expected.
- @days # ($days[0], $days[1],... $days[n])
- @days[3,4,5] # same as ($days[3],$days[4],$days[5])
- @days{'a','c'} # same as ($days{'a'},$days{'c'})
Entire hashes are denoted by '%':
- %days # (key1, val1, key2, val2 ...)
In addition, subroutines are named with an initial '&', though this is optional when unambiguous, just as the word "do" is often redundant in English. Symbol table entries can be named with an initial '*', but you don't really care about that yet (if ever :-).
Every variable type has its own namespace, as do several
non-variable identifiers. This means that you can, without fear
of conflict, use the same name for a scalar variable, an array, or
a hash--or, for that matter, for a filehandle, a directory handle, a
subroutine name, a format name, or a label. This means that $foo
and @foo are two different variables. It also means that $foo[1]
is a part of @foo, not a part of $foo. This may seem a bit weird,
but that's okay, because it is weird.
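A minimal sketch of these separate namespaces (the names here are illustrative):

```perl
use strict;
use warnings;

# $foo and @foo are unrelated variables living in separate namespaces.
our $foo = "a scalar";
our @foo = ("first", "second");

print "$foo\n";      # the scalar: "a scalar"
print "$foo[1]\n";   # element 1 of @foo: "second"
```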
Because variable references always start with '$', '@', or '%', the
"reserved" words aren't in fact reserved with respect to variable
names. They are reserved with respect to labels and filehandles,
however, which don't have an initial special character. You can't
have a filehandle named "log", for instance. Hint: you could say
open(LOG,'logfile') rather than open(log,'logfile'). Using
uppercase filehandles also improves readability and protects you
from conflict with future reserved words. Case is significant--"FOO",
"Foo", and "foo" are all different names. Names that start with a
letter or underscore may also contain digits and underscores.
It is possible to replace such an alphanumeric name with an expression that returns a reference to the appropriate type. For a description of this, see perlref.
Names that start with a digit may contain only more digits. Names that do not start with a letter, underscore, digit or a caret (i.e. a control character) are limited to one character, e.g., $% or $$. (Most of these one-character names have a predefined significance to Perl. For instance, $$ is the current process id.)
Up until Perl 5.18, the actual rules of what a valid identifier
was were a bit fuzzy. However, in general, anything defined here should
work on previous versions of Perl, while the opposite -- edge cases
that work in previous versions, but aren't defined here -- probably
won't work on newer versions.
As an important side note, please note that the following only applies
to bareword identifiers as found in Perl source code, not identifiers
introduced through symbolic references, which have much fewer
restrictions.
If working under the effect of the use utf8;
pragma, the following
rules apply:
- / (?[ ( \p{Word} & \p{XID_Start} ) + [_] ]) \p{XID_Continue}* /x
If not under use utf8
, the source is treated as ASCII + 128 extra
controls, and identifiers should match
- / (?aa) (?!\d) \w+ /x
That is, any word character in the ASCII range, as long as the first character is not a digit.
There are two package separators in Perl: A double colon (::
) and a single
quote ('). Normal identifiers can start or end with a double colon, and
can contain several parts delimited by double colons.
Single quotes have similar rules, but with the exception that they are not legal at the end of an identifier: that is, $'foo and $foo'bar are legal, but $foo'bar' is not.
Additionally, normal identifiers can start or end with any number of double colons (::). Finally, if the identifier is preceded by a sigil -- that is, if the identifier is part of a variable name -- it may optionally be enclosed in braces.
While you can mix double colons with single quotes, the quotes must come after the colons: $::::'foo and $foo::'bar are legal, but $::'::foo and $foo'::bar are not.
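For example, a fully qualified name uses :: to locate a variable in another package (the package and variable names below are illustrative; the archaic ' separator is omitted):

```perl
use strict;
use warnings;

package Who;            # a hypothetical package for illustration
our $name = "Larry";    # this variable's full name is $Who::name

package main;
print $Who::name, "\n";     # qualified access from package main
print ${Who::name}, "\n";   # the same variable, with optional braces
```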
Put together, a grammar to match a basic identifier becomes
- /
- (?(DEFINE)
- (?<variable>
- (?&sigil)
- (?:
- (?&normal_identifier)
- | \{ \s* (?&normal_identifier) \s* \}
- )
- )
- (?<normal_identifier>
- (?: :: )* '?
- (?&basic_identifier)
- (?: (?= (?: :: )+ '? | (?: :: )* ' ) (?&normal_identifier) )?
- (?: :: )*
- )
- (?<basic_identifier>
- # is use utf8 on?
- (?(?{ (caller(0))[8] & $utf8::hint_bits })
- (?&Perl_XIDS) \p{XID_Continue}*
- | (?aa) (?!\d) \w+
- )
- )
- (?<sigil> [&*\$\@\%])
- (?<Perl_XIDS> (?[ ( \p{Word} & \p{XID_Start} ) + [_] ]) )
- )
- /x
Meanwhile, special identifiers don't follow the above rules; for the most part, all of the identifiers in this category have a special meaning given by Perl. Because they have special parsing rules, these generally can't be fully qualified. They come in four forms:
- A sigil followed solely by digits, like $0, $1, or $10000.
- A sigil followed by a caret and any one of the characters [][A-Z^_?\], like $^V or $^W, or a sigil followed by a literal control character matching the \p{POSIX_Cntrl} property. Due to a historical oddity, if not running under use utf8, the 128 extra controls in the [0x80-0xff] range may also be used in length-one variables.
- A sigil followed by braces containing bareword text whose first character is a caret, like ${^GLOBAL_PHASE}, or a literal control character, like ${\7LOBAL_PHASE}.
- A sigil followed by a single character matching the \p{POSIX_Punct} property, like $! or %+.
The interpretation of operations and values in Perl sometimes depends on the requirements of the context around the operation or value. There are two major contexts: list and scalar. Certain operations return list values in contexts wanting a list, and scalar values otherwise. If this is true of an operation it will be mentioned in the documentation for that operation. In other words, Perl overloads certain operations based on whether the expected return value is singular or plural. Some words in English work this way, like "fish" and "sheep".
In a reciprocal fashion, an operation provides either a scalar or a list context to each of its arguments. For example, if you say
- int( <STDIN> )
the integer operation provides scalar context for the <> operator, which responds by reading one line from STDIN and passing it back to the integer operation, which will then find the integer value of that line and return that. If, on the other hand, you say
- sort( <STDIN> )
then the sort operation provides list context for <>, which will proceed to read every line available up to the end of file, and pass that list of lines back to the sort routine, which will then sort those lines and return them as a list to whatever the context of the sort was.
Assignment is a little bit special in that it uses its left argument to determine the context for the right argument. Assignment to a scalar evaluates the right-hand side in scalar context, while assignment to an array or hash evaluates the righthand side in list context. Assignment to a list (or slice, which is just a list anyway) also evaluates the right-hand side in list context.
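A short sketch of how the left side of an assignment sets the context of the right side:

```perl
use strict;
use warnings;

my @lines = ("alpha", "beta", "gamma");

# Scalar target: the array is evaluated in scalar context (its length).
my $count = @lines;               # 3

# List target: the right-hand side is evaluated in list context.
my ($first, $second) = @lines;    # "alpha", "beta"

print "$count $first $second\n";  # prints "3 alpha beta"
```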
When you use the use warnings
pragma or Perl's -w command-line
option, you may see warnings
about useless uses of constants or functions in "void context".
Void context just means the value has been discarded, such as a
statement containing only "fred";
or getpwuid(0);. It still
counts as scalar context for functions that care whether or not
they're being called in list context.
User-defined subroutines may choose to care whether they are being called in a void, scalar, or list context. Most subroutines do not need to bother, though. That's because both scalars and lists are automatically interpolated into lists. See wantarray for how you would dynamically discern your function's calling context.
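A sketch of a context-sensitive subroutine using wantarray:

```perl
use strict;
use warnings;

# wantarray is true in list context, false but defined in scalar
# context, and undef in void context.
sub context {
    return "list"   if wantarray;
    return "scalar" if defined wantarray;
    return;   # called in void context; the result is discarded
}

my @r = context();   # list context
my $r = context();   # scalar context
context();           # void context
print "@r $r\n";     # prints "list scalar"
```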
All data in Perl is a scalar, an array of scalars, or a hash of scalars. A scalar may contain one single value in any of three different flavors: a number, a string, or a reference. In general, conversion from one form to another is transparent. Although a scalar may not directly hold multiple values, it may contain a reference to an array or hash which in turn contains multiple values.
Scalars aren't necessarily one thing or another. There's no place to declare a scalar variable to be of type "string", type "number", type "reference", or anything else. Because of the automatic conversion of scalars, operations that return scalars don't need to care (and in fact, cannot care) whether their caller is looking for a string, a number, or a reference. Perl is a contextually polymorphic language whose scalars can be strings, numbers, or references (which includes objects). Although strings and numbers are considered pretty much the same thing for nearly all purposes, references are strongly-typed, uncastable pointers with builtin reference-counting and destructor invocation.
A scalar value is interpreted as FALSE in the Boolean sense if it is undefined, the null string or the number 0 (or its string equivalent, "0"), and TRUE if it is anything else. The Boolean context is just a special kind of scalar context where no conversion to a string or a number is ever performed.
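A quick illustration of these rules; note that "0.0" and "00" are true, because they are non-empty strings other than "0":

```perl
use strict;
use warnings;

# Print the truth value of a range of scalars.
for my $v (undef, "", 0, "0", "0.0", "00", 1, "false") {
    my $label = defined $v ? "'$v'" : "undef";
    printf "%-7s => %s\n", $label, $v ? "true" : "false";
}
```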
There are actually two varieties of null strings (sometimes referred
to as "empty" strings), a defined one and an undefined one. The
defined version is just a string of length zero, such as ""
.
The undefined version is the value that indicates that there is
no real value for something, such as when there was an error, or
at end of file, or when you refer to an uninitialized variable or
element of an array or hash. Although in early versions of Perl,
an undefined scalar could become defined when first used in a
place expecting a defined value, this no longer happens except for
rare cases of autovivification as explained in perlref. You can
use the defined() operator to determine whether a scalar value is
defined (this has no meaning on arrays or hashes), and the undef()
operator to produce an undefined value.
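The two kinds of null string can be distinguished with defined():

```perl
use strict;
use warnings;

my $empty = "";   # defined, just zero length
my $nothing;      # undefined

print defined($empty)   ? "defined\n" : "undefined\n";  # defined
print defined($nothing) ? "defined\n" : "undefined\n";  # undefined

$empty = undef;   # undef() produces an undefined value
print defined($empty)   ? "defined\n" : "undefined\n";  # undefined
```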
To find out whether a given string is a valid non-zero number, it's sometimes enough to test it against both numeric 0 and also lexical "0" (although this will cause noises if warnings are on). That's because strings that aren't numbers count as 0, just as they do in awk:
- if ($str == 0 && $str ne "0") {
- warn "That doesn't look like a number";
- }
That method may be best because otherwise you won't treat IEEE
notations like NaN
or Infinity
properly. At other times, you
might prefer to determine whether string data can be used numerically
by calling the POSIX::strtod() function or by inspecting your string
with a regular expression (as documented in perlre).
- warn "has nondigits" if /\D/;
- warn "not a natural number" unless /^\d+$/; # rejects -3
- warn "not an integer" unless /^-?\d+$/; # rejects +3
- warn "not an integer" unless /^[+-]?\d+$/;
- warn "not a decimal number" unless /^-?\d+\.?\d*$/; # rejects .2
- warn "not a decimal number" unless /^-?(?:\d+(?:\.\d*)?|\.\d+)$/;
- warn "not a C float"
- unless /^([+-]?)(?=\d|\.\d)\d*(\.\d*)?([Ee]([+-]?\d+))?$/;
The length of an array is a scalar value. You may find the length
of array @days by evaluating $#days
, as in csh. However, this
isn't the length of the array; it's the subscript of the last element,
which is a different value since there is ordinarily a 0th element.
Assigning to $#days
actually changes the length of the array.
Shortening an array this way destroys intervening values. Lengthening
an array that was previously shortened does not recover values
that were in those elements.
You can also gain some minuscule measure of efficiency by pre-extending an array that is going to get big. You can also extend an array by assigning to an element that is off the end of the array. You can truncate an array down to nothing by assigning the null list () to it. The following are equivalent:
- @whatever = ();
- $#whatever = -1;
If you evaluate an array in scalar context, it returns the length of the array. (Note that this is not true of lists, which return the last value, like the C comma operator, nor of built-in functions, which return whatever they feel like returning.) The following is always true:
- scalar(@whatever) == $#whatever + 1;
Some programmers choose to use an explicit conversion so as to leave nothing to doubt:
- $element_count = scalar(@whatever);
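The relationships above can be seen directly:

```perl
use strict;
use warnings;

my @days = ("Mon", "Tue", "Wed", "Thu", "Fri");

print scalar(@days), "\n";   # 5, the length
print $#days, "\n";          # 4, the last index

$#days = 2;                  # shortening discards the trailing values
print scalar(@days), "\n";   # 3: ("Mon", "Tue", "Wed")

@days = ();                  # truncate to nothing
print $#days, "\n";          # -1
```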
If you evaluate a hash in scalar context, it returns false if the
hash is empty. If there are any key/value pairs, it returns true;
more precisely, the value returned is a string consisting of the
number of used buckets and the number of allocated buckets, separated
by a slash. This is pretty much useful only to find out whether
Perl's internal hashing algorithm is performing poorly on your data
set. For example, you stick 10,000 things in a hash, but evaluating
%HASH in scalar context reveals "1/16"
, which means only one out
of sixteen buckets has been touched, and presumably contains all
10,000 of your items. This isn't supposed to happen. If a tied hash is evaluated in scalar context, the SCALAR method is called (with a fallback to FIRSTKEY).
You can preallocate space for a hash by assigning to the keys() function. This rounds up the allocated buckets to the next power of two:
- keys(%users) = 1000; # allocate 1024 buckets
Numeric literals are specified in any of the following floating point or integer formats:
- 12345
- 12345.67
- .23E-10 # a very small number
- 3.14_15_92 # a very important number
- 4_294_967_296 # underscore for legibility
- 0xff # hex
- 0xdead_beef # more hex
- 0377 # octal (only numbers, begins with 0)
- 0b011011 # binary
You are allowed to use underscores (underbars) in numeric literals
between digits for legibility (but not multiple underscores in a row:
23__500
is not legal; 23_500
is).
You could, for example, group binary
digits by threes (as for a Unix-style mode argument such as 0b110_100_100)
or by fours (to represent nibbles, as in 0b1010_0110) or in other groups.
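Underscores are stripped before the number is parsed, as this sketch shows:

```perl
use strict;
use warnings;

my $big  = 4_294_967_296;     # same value as 4294967296
my $mode = 0b110_100_100;     # binary grouped by threes; 0644 octal

print $big == 4294967296 ? "equal\n" : "different\n";  # equal
printf "%o\n", $mode;                                  # prints 644
```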
String literals are usually delimited by either single or double
quotes. They work much like quotes in the standard Unix shells:
double-quoted string literals are subject to backslash and variable
substitution; single-quoted strings are not (except for \' and
\\
). The usual C-style backslash rules apply for making
characters such as newline, tab, etc., as well as some more exotic
forms. See Quote and Quote-like Operators in perlop for a list.
Hexadecimal, octal, or binary representations in string literals (e.g. '0xff') are not automatically converted to their integer representation. The hex() and oct() functions make these conversions for you. See hex and oct for more details.
You can also embed newlines directly in your strings, i.e., they can end on a different line than they begin. This is nice, but if you forget your trailing quote, the error will not be reported until Perl finds another line containing the quote character, which may be much further on in the script. Variable substitution inside strings is limited to scalar variables, arrays, and array or hash slices. (In other words, names beginning with $ or @, followed by an optional bracketed expression as a subscript.) The following code segment prints out "The price is $100."
- $Price = '$100'; # not interpolated
- print "The price is $Price.\n"; # interpolated
There is no double interpolation in Perl, so the $100
is left as is.
By default floating point numbers substituted inside strings use the
dot (".") as the decimal separator. If use locale
is in effect,
and POSIX::setlocale() has been called, the character used for the
decimal separator is affected by the LC_NUMERIC locale.
See perllocale and POSIX.
As in some shells, you can enclose the variable name in braces to disambiguate it from following alphanumerics (and underscores). You must also do this when interpolating a variable into a string to separate the variable name from a following double-colon or an apostrophe, since these would be otherwise treated as a package separator:
- $who = "Larry";
- print PASSWD "${who}::0:0:Superuser:/:/bin/perl\n";
- print "We use ${who}speak when ${who}'s here.\n";
Without the braces, Perl would have looked for a $whospeak, a $who::0, and a $who's variable. The last two would be the $0 and the $s variables in the (presumably) non-existent package who.
In fact, a simple identifier within such curlies is forced to be a string, and likewise within a hash subscript. Neither needs quoting. Our earlier example, $days{'Feb'} can be written as $days{Feb} and the quotes will be assumed automatically. But anything more complicated in the subscript will be interpreted as an expression. This means for example that $version{2.0}++ is equivalent to $version{2}++, not to $version{'2.0'}++.
A literal of the form v1.20.300.4000 is parsed as a string composed of characters with the specified ordinals. This form, known as v-strings, provides an alternative, more readable way to construct strings, rather than use the somewhat less readable interpolation form "\x{1}\x{14}\x{12c}\x{fa0}". This is useful for representing Unicode strings, and for comparing version "numbers" using the string comparison operators, cmp, gt, lt etc. If there are two or more dots in the literal, the leading v may be omitted.
Such literals are accepted by both require and use for
doing a version check. Note that using the v-strings for IPv4
addresses is not portable unless you also use the
inet_aton()/inet_ntoa() routines of the Socket package.
Note that since Perl 5.8.1 the single-number v-strings (like v65) are not v-strings before the => operator (which is usually used to separate a hash key from a hash value); instead they are interpreted as literal strings ('v65'). They were v-strings from Perl 5.6.0 to Perl 5.8.0, but that caused more confusion and breakage than good. Multi-number v-strings like v65.66 and 65.66.67 continue to be v-strings always.
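A small sketch of v-string behavior:

```perl
# A v-string is just a string of the given character ordinals
my $v = v1.20.300.4000;
print $v eq "\x{1}\x{14}\x{12c}\x{fa0}" ? "same" : "different", "\n";  # same

# Version "numbers" compare correctly with the string operators
print v1.2.10 gt v1.2.9 ? "newer" : "older", "\n";       # newer

# With two or more dots the leading v may be omitted
print 1.22.333 eq v1.22.333 ? "equal" : "unequal", "\n"; # equal
```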
The special literals __FILE__, __LINE__, and __PACKAGE__ represent the current filename, line number, and package name at that point in your program. __SUB__ gives a reference to the current subroutine. They may be used only as separate tokens; they will not be interpolated into strings. If there is no current package (due to an empty package; directive), __PACKAGE__ is the undefined value. (But the empty package; is no longer supported, as of version 5.10.) Outside of a subroutine, __SUB__ is the undefined value. __SUB__ is only available in Perl 5.16 or higher, and only with a use v5.16 or use feature "current_sub" declaration.
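A sketch of these tokens in use (requires Perl 5.16 or later for __SUB__):

```perl
use v5.16;   # enables the "current_sub" feature (and say)

say "line ", __LINE__, " of ", __FILE__, " in ", __PACKAGE__;

# __SUB__ lets an anonymous subroutine recurse without needing a name
my $fact = sub {
    my $n = shift;
    return $n <= 1 ? 1 : $n * __SUB__->($n - 1);
};
say $fact->(5);   # 120
```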
The two control characters ^D and ^Z, and the tokens __END__ and __DATA__ may be used to indicate the logical end of the script before the actual end of file. Any following text is ignored.
Text after __DATA__ may be read via the filehandle PACKNAME::DATA, where PACKNAME is the package that was current when the __DATA__ token was encountered. The filehandle is left open pointing to the line after __DATA__. The program should close DATA when it is done reading from it. (Leaving it open leaks filehandles if the module is reloaded for any reason, so it's a safer practice to close it.) For compatibility with older scripts written before __DATA__ was introduced, __END__ behaves like __DATA__ in the top-level script (but not in files loaded with require or do) and leaves the remaining contents of the file accessible via main::DATA.
See SelfLoader for more description of __DATA__, and an example of its use. Note that you cannot read from the DATA filehandle in a BEGIN block: the BEGIN block is executed as soon as it is seen (during compilation), at which point the corresponding __DATA__ (or __END__) token has not yet been seen.
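A minimal self-contained sketch (run as the top-level script, so the current package is main):

```perl
# Everything after the __DATA__ token is readable from the DATA filehandle
while (my $line = <DATA>) {
    chomp $line;
    print "got: $line\n";
}
close DATA;   # avoid leaking the handle if the file is ever reloaded

__DATA__
alpha
beta
```

Running this prints "got: alpha" and "got: beta".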
A word that has no other interpretation in the grammar will be treated as if it were a quoted string. These are known as "barewords". As with filehandles and labels, a bareword that consists entirely of lowercase letters risks conflict with future reserved words, and if you use the use warnings pragma or the -w switch, Perl will warn you about any such words. Perl limits barewords (like identifiers) to about 250 characters. Future versions of Perl are likely to eliminate these arbitrary limitations.
Some people may wish to outlaw barewords entirely. If you say
- use strict 'subs';
then any bareword that would NOT be interpreted as a subroutine call produces a compile-time error instead. The restriction lasts to the end of the enclosing block. An inner block may countermand this by saying no strict 'subs'.
Arrays and slices are interpolated into double-quoted strings by joining the elements with the delimiter specified in the $" variable ($LIST_SEPARATOR if "use English;" is specified), space by default. The following are equivalent:
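A sketch of the equivalence (the array @words is illustrative):

```perl
my @words = ('a', 'b', 'c');
print "@words\n";               # a b c  -- elements joined with $"
print join($", @words), "\n";   # a b c  -- the same thing, spelled out

{
    local $" = '-';             # change the separator in this block only
    print "@words\n";           # a-b-c
}
```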
Within search patterns (which also undergo double-quotish substitution) there is an unfortunate ambiguity: Is /$foo[bar]/ to be interpreted as /${foo}[bar]/ (where [bar] is a character class for the regular expression) or as /${foo[bar]}/ (where [bar] is the subscript to array @foo)? If @foo doesn't otherwise exist, then it's obviously a character class. If @foo exists, Perl takes a good guess about [bar], and is almost always right. If it does guess wrong, or if you're just plain paranoid, you can force the correct interpretation with curly braces as above.
If you're looking for the information on how to use here-documents, which used to be here, that's been moved to Quote and Quote-like Operators in perlop.
List values are denoted by separating individual values by commas (and enclosing the list in parentheses where precedence requires it):
- (LIST)
In a context not requiring a list value, the value of what appears to be a list literal is simply the value of the final element, as with the C comma operator. For example,
- @foo = ('cc', '-E', $bar);
assigns the entire list value to array @foo, but
- $foo = ('cc', '-E', $bar);
assigns the value of variable $bar to the scalar variable $foo. Note that the value of an actual array in scalar context is the length of the array; the following assigns the value 3 to $foo:
- @foo = ('cc', '-E', $bar);
- $foo = @foo; # $foo gets 3
You may have an optional comma before the closing parenthesis of a list literal, so that you can say:
- @foo = (
- 1,
- 2,
- 3,
- );
To use a here-document to assign an array, one line per element, you might use an approach like this:
- @sauces = <<End_Lines =~ m/(\S.*\S)/g;
- normal tomato
- spicy tomato
- green chile
- pesto
- white wine
- End_Lines
LISTs do automatic interpolation of sublists. That is, when a LIST is evaluated, each element of the list is evaluated in list context, and the resulting list value is interpolated into LIST just as if each individual element were a member of LIST. Thus arrays and hashes lose their identity in a LIST--the list
- (@foo,@bar,&SomeSub,%glarch)
contains all the elements of @foo followed by all the elements of @bar, followed by all the elements returned by the subroutine named SomeSub called in list context, followed by the key/value pairs of %glarch. To make a list reference that does NOT interpolate, see perlref.
The null list is represented by (). Interpolating it in a list has no effect. Thus ((),(),()) is equivalent to (). Similarly, interpolating an array with no elements is the same as if no array had been interpolated at that point.
This interpolation combines with the facts that the opening and closing parentheses are optional (except when necessary for precedence) and lists may end with an optional comma to mean that multiple commas within lists are legal syntax. The list 1,,3 is a concatenation of two lists, 1, and 3, the first of which ends with that optional comma. 1,,3 is (1,),(3) is 1,3 (and similarly, 1,,,3 is (1,),(,),3 is 1,3, and so on). Not that we'd advise you to use this obfuscation.
A list value may also be subscripted like a normal array. You must put the list in parentheses to avoid ambiguity. For example:
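A sketch (the stat() field is one common use):

```perl
# Subscripting a literal list requires the parentheses
my $third = ('a', 'b', 'c', 'd')[2];   # 'c'
print "$third\n";

# A common idiom: pull one field out of a list-returning function
my $mtime = (stat($0))[9];             # modification time of this script
print "$mtime\n";
```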
Lists may be assigned to only when each element of the list is itself legal to assign to:
- ($a, $b, $c) = (1, 2, 3);
- ($map{'red'}, $map{'blue'}, $map{'green'}) = (0x00f, 0x0f0, 0xf00);
An exception to this is that you may assign to undef in a list.
This is useful for throwing away some of the return values of a
function:
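For example (a sketch using stat(), whose 13-element return list is mostly unwanted here):

```perl
# Keep only the fields of interest; undef discards the rest positionally
my ($dev, $ino, undef, undef, $uid) = stat($0);
print "device $dev, inode $ino, owner $uid\n";

my (undef, $second) = ('first', 'second');
print "$second\n";   # second
</antml_code_hidden>
```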
List assignment in scalar context returns the number of elements produced by the expression on the right side of the assignment:
- $x = (($foo,$bar) = (3,2,1)); # set $x to 3, not 2
- $x = (($foo,$bar) = f()); # set $x to f()'s return count
This is handy when you want to do a list assignment in a Boolean context, because most list functions return a null list when finished, which when assigned produces a 0, which is interpreted as FALSE.
It's also the source of a useful idiom for executing a function or performing an operation in list context and then counting the number of return values, by assigning to an empty list and then using that assignment in scalar context. For example, this code:
- $count = () = $string =~ /\d+/g;
will place into $count the number of digit groups found in $string. This happens because the pattern match is in list context (since it is being assigned to the empty list), and will therefore return a list of all matching parts of the string. The list assignment in scalar context will translate that into the number of elements (here, the number of times the pattern matched) and assign that to $count. Note that simply using
- $count = $string =~ /\d+/g;
would not have worked, since a pattern match in scalar context will only return true or false, rather than a count of matches.
The final element of a list assignment may be an array or a hash:
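For example (a sketch):

```perl
# A trailing array soaks up all the remaining values...
my ($x, $y, @rest) = (1, 2, 3, 4, 5);
print "@rest\n";            # 3 4 5

# ...and a trailing hash takes them as key/value pairs
my ($cmd, %opts) = ('paint', color => 'red', width => 2);
print "$opts{color}\n";     # red
```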
You can actually put an array or hash anywhere in the list, but the first one in the list will soak up all the values, and anything after it will become undefined. This may be useful in a my() or local().
A hash can be initialized using a literal list holding pairs of items to be interpreted as a key and a value:
- # same as map assignment above
- %map = ('red',0x00f,'blue',0x0f0,'green',0xf00);
While literal lists and named arrays are often interchangeable, that's not the case for hashes. Just because you can subscript a list value like a normal array does not mean that you can subscript a list value as a hash. Likewise, hashes included as parts of other lists (including parameters lists and return lists from functions) always flatten out into key/value pairs. That's why it's good to use references sometimes.
It is often more readable to use the => operator between key/value pairs. The => operator is mostly just a more visually distinctive synonym for a comma, but it also arranges for its left-hand operand to be interpreted as a string if it's a bareword that would be a legal simple identifier. => doesn't quote compound identifiers that contain double colons. This makes it nice for initializing hashes:
- %map = (
- red => 0x00f,
- blue => 0x0f0,
- green => 0xf00,
- );
or for initializing hash references to be used as records:
- $rec = {
- witch => 'Mable the Merciless',
- cat => 'Fluffy the Ferocious',
- date => '10/31/1776',
- };
or for using call-by-named-parameter to complicated functions:
- $field = $query->radio_group(
- name => 'group_name',
- values => ['eenie','meenie','minie'],
- default => 'meenie',
- linebreak => 'true',
- labels => \%labels
- );
Note that just because a hash is initialized in that order doesn't mean that it comes out in that order. See sort for examples of how to arrange for an output ordering.
If a key appears more than once in the initializer list of a hash, the last occurrence wins:
- %circle = (
- center => [5, 10],
- center => [27, 9],
- radius => 100,
- color => [0xDF, 0xFF, 0x00],
- radius => 54,
- );
- # same as
- %circle = (
- center => [27, 9],
- color => [0xDF, 0xFF, 0x00],
- radius => 54,
- );
This can be used to provide overridable configuration defaults:
- # values in %args take priority over %config_defaults
- %config = (%config_defaults, %args);
An array can be accessed one scalar at a time by specifying a dollar sign ($), then the name of the array (without the leading @), then the subscript inside square brackets. For example:
- @myarray = (5, 50, 500, 5000);
- print "The Third Element is ", $myarray[2], "\n";
The array indices start with 0. A negative subscript retrieves its value from the end. In our example, $myarray[-1] would have been 5000, and $myarray[-2] would have been 500.
Hash subscripts are similar, only instead of square brackets curly brackets are used. For example:
- %scientists =
- (
- "Newton" => "Isaac",
- "Einstein" => "Albert",
- "Darwin" => "Charles",
- "Feynman" => "Richard",
- );
- print "Darwin's First Name is ", $scientists{"Darwin"}, "\n";
You can also subscript a list to get a single element from it:
- $dir = (getpwnam("daemon"))[7];
Multidimensional arrays may be emulated by subscripting a hash with a list. The elements of the list are joined with the subscript separator (see $; in perlvar).
- $foo{$a,$b,$c}
is equivalent to
- $foo{join($;, $a, $b, $c)}
The default subscript separator is "\034", the same as SUBSEP in awk.
A slice accesses several elements of a list, an array, or a hash simultaneously using a list of subscripts. It's more convenient than writing out the individual elements as a list of separate scalar values.
- ($him, $her) = @folks[0,-1]; # array slice
- @them = @folks[0 .. 3]; # array slice
- ($who, $home) = @ENV{"USER", "HOME"}; # hash slice
- ($uid, $dir) = (getpwnam("daemon"))[2,7]; # list slice
Since you can assign to a list of variables, you can also assign to an array or hash slice.
- @days[3..5] = qw/Wed Thu Fri/;
- @colors{'red','blue','green'}
- = (0xff0000, 0x0000ff, 0x00ff00);
- @folks[0, -1] = @folks[-1, 0];
The previous assignments are exactly equivalent to
- ($days[3], $days[4], $days[5]) = qw/Wed Thu Fri/;
- ($colors{'red'}, $colors{'blue'}, $colors{'green'})
- = (0xff0000, 0x0000ff, 0x00ff00);
- ($folks[0], $folks[-1]) = ($folks[-1], $folks[0]);
Since changing a slice changes the original array or hash that it's slicing, a foreach construct will alter some--or even all--of the values of the array or hash.
- foreach (@array[ 4 .. 10 ]) { s/peter/paul/ }
- foreach (@hash{qw[key1 key2]}) {
- s/^\s+//; # trim leading whitespace
- s/\s+$//; # trim trailing whitespace
- s/(\w+)/\u\L$1/g; # "titlecase" words
- }
A slice of an empty list is still an empty list. Thus:
- @a = ()[1,0]; # @a has no elements
- @b = (@a)[0,1]; # @b has no elements
But:
- @a = (1)[1,0]; # @a has two elements
- @b = (1,undef)[1,0,2]; # @b has three elements
More generally, a slice yields the empty list if it indexes only beyond the end of a list:
- @a = (1)[ 1,2]; # @a has no elements
- @b = (1)[0,1,2]; # @b has three elements
This makes it easy to write loops that terminate when a null list is returned:
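A sketch in that spirit, using getpwent() (Unix-specific); each iteration assigns a two-element slice, and the empty list at the end of the password database makes the assignment yield 0, terminating the loop:

```perl
# Iterate over the password database until getpwent() returns ()
while (my ($user, $home) = (getpwent())[0, 7]) {
    print "$user => $home\n";
}
endpwent();
```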
As noted earlier in this document, the scalar sense of list assignment is the number of elements on the right-hand side of the assignment. The null list contains no elements, so when the password file is exhausted, the result is 0, not 2.
Slices in scalar context return the last item of the slice.
- @a = qw/first second third/;
- %h = (first => 'A', second => 'B');
- $t = @a[0, 1]; # $t is now 'second'
- $u = @h{'first', 'second'}; # $u is now 'B'
If you're confused about why you use an '@' there on a hash slice instead of a '%', think of it like this. The type of bracket (square or curly) governs whether it's an array or a hash being looked at. On the other hand, the leading symbol ('$' or '@') on the array or hash indicates whether you are getting back a singular value (a scalar) or a plural one (a list).
Perl uses an internal type called a typeglob to hold an entire symbol table entry. The type prefix of a typeglob is a *, because it represents all types. This used to be the preferred way to pass arrays and hashes by reference into a function, but now that we have real references, this is seldom needed.
The main use of typeglobs in modern Perl is to create symbol table aliases. This assignment:
- *this = *that;
makes $this an alias for $that, @this an alias for @that, %this an alias for %that, &this an alias for &that, etc. Much safer is to use a reference. This:
- local *Here::blue = \$There::green;
temporarily makes $Here::blue an alias for $There::green, but doesn't make @Here::blue an alias for @There::green, or %Here::blue an alias for %There::green, etc. See Symbol Tables in perlmod for more examples of this. Strange though this may seem, this is the basis for the whole module import/export system.
Another use for typeglobs is to pass filehandles into a function or to create new filehandles. If you need to use a typeglob to save away a filehandle, do it this way:
- $fh = *STDOUT;
or perhaps as a real reference, like this:
- $fh = \*STDOUT;
See perlsub for examples of using these as indirect filehandles in functions.
Typeglobs are also a way to create a local filehandle using the local() operator. These last until their block is exited, but may be passed back. For example:
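For example (a sketch of such a function; the name newopen and the path are illustrative):

```perl
# Open a file on a localized bareword handle and return the glob;
# the glob value outlives the local() because it is returned.
sub newopen {
    my $path = shift;
    local *FH;   # not my() -- typeglobs can't be lexicals
    open(FH, '<', $path) or return undef;
    return *FH;
}

my $fh = newopen($0);    # open this very script, just as a demo
print scalar <$fh> if $fh;
```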
Now that we have the *foo{THING} notation, typeglobs aren't used as much for filehandle manipulations, although they're still needed to pass brand new file and directory handles into or out of functions. That's because *HANDLE{IO} only works if HANDLE has already been used as a handle. In other words, *FH must be used to create new symbol table entries; *foo{THING} cannot. When in doubt, use *FH.
All functions that are capable of creating filehandles (open(), opendir(), pipe(), socketpair(), sysopen(), socket(), and accept()) automatically create an anonymous filehandle if the handle passed to them is an uninitialized scalar variable. This allows constructs such as open(my $fh, ...) and open(local $fh, ...) to be used to create filehandles that will conveniently be closed automatically when the scope ends, provided there are no other references to them. This largely eliminates the need for typeglobs when opening filehandles that must be passed around, as in the following example:
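A sketch of such a function (the name myopen is illustrative):

```perl
# The handle is created in a lexical and closed automatically when
# the last reference to it goes out of scope.
sub myopen {
    my $path = shift;
    open(my $fh, '<', $path) or die "Can't open '$path': $!";
    return $fh;
}

{
    my $f = myopen($0);    # open this very script, just as a demo
    print scalar <$f>;     # first line
}   # $f is closed implicitly here
```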
Note that if an initialized scalar variable is used instead the result is different: my $fh='zzz'; open($fh, ...) is equivalent to open(*{'zzz'}, ...). use strict 'refs' forbids such practice.
Another way to create anonymous filehandles is with the Symbol module or with the IO::Handle module and its ilk. These modules have the advantage of not hiding different types of the same name during the local(). See the bottom of open for an example.
See perlvar for a description of Perl's built-in variables and
a discussion of legal variable names. See perlref, perlsub,
and Symbol Tables in perlmod for more discussion on typeglobs and
the *foo{THING}
syntax.
perldbmfilter - Perl DBM Filters
The four filter_* methods shown above are available in all the DBM modules that ship with Perl, namely DB_File, GDBM_File, NDBM_File, ODBM_File and SDBM_File.
Each of the methods works identically, and is used to install (or uninstall) a single DBM Filter. The only difference between them is the place that the filter is installed.
To summarise:
If a filter has been installed with this method, it will be invoked every time you write a key to a DBM database.
If a filter has been installed with this method, it will be invoked every time you write a value to a DBM database.
If a filter has been installed with this method, it will be invoked every time you read a key from a DBM database.
If a filter has been installed with this method, it will be invoked every time you read a value from a DBM database.
You can use any combination of the methods from none to all four.
All filter methods return the existing filter, if present, or undef if not. To delete a filter, pass undef to it.
When each filter is called by Perl, a local copy of $_ will contain the key or value to be filtered. Filtering is achieved by modifying the contents of $_. The return code from the filter is ignored.
DBM Filters are useful for a class of problems where you always want to make the same transformation to all keys, all values or both.
For example, consider the following scenario. You have a DBM database that you need to share with a third-party C application. The C application assumes that all keys and values are NULL terminated. Unfortunately when Perl writes to DBM databases it doesn't use NULL termination, so your Perl application will have to manage NULL termination itself. When you write to the database you will have to use something like this:
- $hash{"$key\0"} = "$value\0";
Similarly the NULL needs to be taken into account when you are considering the length of existing keys/values.
It would be much better if you could ignore the NULL terminations issue in the main application code and have a mechanism that automatically added the terminating NULL to all keys and values whenever you write to the database and have them removed when you read from the database. As I'm sure you have already guessed, this is a problem that DBM Filters can fix very easily.
- use strict;
- use warnings;
- use SDBM_File;
- use Fcntl;
- my %hash;
- my $filename = "filt";
- unlink $filename;
- my $db = tie(%hash, 'SDBM_File', $filename, O_RDWR|O_CREAT, 0640)
- or die "Cannot open $filename: $!\n";
- # Install DBM Filters
- $db->filter_fetch_key ( sub { s/\0$// } );
- $db->filter_store_key ( sub { $_ .= "\0" } );
- $db->filter_fetch_value(
- sub { no warnings 'uninitialized'; s/\0$// } );
- $db->filter_store_value( sub { $_ .= "\0" } );
- $hash{"abc"} = "def";
- my $a = $hash{"ABC"};
- # ...
- undef $db;
- untie %hash;
The code above uses SDBM_File, but it will work with any of the DBM modules.
Hopefully the contents of each of the filters should be self-explanatory. Both "fetch" filters remove the terminating NULL, and both "store" filters add a terminating NULL.
Here is another real-life example. By default, whenever Perl writes to a DBM database it always writes the key and value as strings. So when you use this:
- $hash{12345} = "something";
the key 12345 will get stored in the DBM database as the 5-byte string "12345". If you actually want the key to be stored in the DBM database as a C int, you will have to use pack when writing, and unpack when reading.
Here is a DBM Filter that does it:
- use strict;
- use warnings;
- use DB_File;
- my %hash;
- my $filename = "filt";
- unlink $filename;
- my $db = tie %hash, 'DB_File', $filename, O_CREAT|O_RDWR, 0666, $DB_HASH
- or die "Cannot open $filename: $!\n";
- $db->filter_fetch_key ( sub { $_ = unpack("i", $_) } );
- $db->filter_store_key ( sub { $_ = pack ("i", $_) } );
- $hash{123} = "def";
- # ...
- undef $db;
- untie %hash;
The code above uses DB_File, but again it will work with any of the DBM modules.
This time only two filters have been used; we only need to manipulate the contents of the key, so it wasn't necessary to install any value filters.
DB_File, GDBM_File, NDBM_File, ODBM_File and SDBM_File.
Paul Marquess
perldebguts - Guts of Perl debugging
This is not perldebug, which tells you how to use the debugger. This manpage describes low-level details concerning the debugger's internals, which range from difficult to impossible to understand for anyone who isn't incredibly intimate with Perl's guts. Caveat lector.
Perl has special debugging hooks at compile-time and run-time used to create debugging environments. These hooks are not to be confused with the perl -Dxxx command described in perlrun, which is usable only if a special Perl is built per the instructions in the INSTALL podpage in the Perl source tree.
For example, whenever you call Perl's built-in caller function from the package DB, the arguments that the corresponding stack frame was called with are copied to the @DB::args array. These mechanisms are enabled by calling Perl with the -d switch. Specifically, the following additional features are enabled (cf. $^P in perlvar):
Perl inserts the contents of $ENV{PERL5DB} (or BEGIN {require 'perl5db.pl'} if not present) before the first line of your program.
Each array @{"_<$filename"} holds the lines of $filename for a file compiled by Perl. The same is also true for evaled strings that contain subroutines, or which are currently being executed. The $filename for evaled strings looks like (eval 34).
Values in this array are magical in numeric context: they compare equal to zero only if the line is not breakable.
Each hash %{"_<$filename"} contains breakpoints and actions keyed by line number. Individual entries (as opposed to the whole hash) are settable. Perl only cares about Boolean true here, although the values used by perl5db.pl have the form "$break_condition\0$action". The same holds for evaluated strings that contain subroutines, or which are currently being executed. The $filename for evaled strings looks like (eval 34).
Each scalar ${"_<$filename"} contains "_<$filename". This is also the case for evaluated strings that contain subroutines, or which are currently being executed. The $filename for evaled strings looks like (eval 34).
After each required file is compiled, but before it is executed, DB::postponed(*{"_<$filename"}) is called if the subroutine DB::postponed exists. Here, the $filename is the expanded name of the required file, as found in the values of %INC.
After each subroutine subname is compiled, the existence of $DB::postponed{subname} is checked. If this key exists, DB::postponed(subname) is called if the DB::postponed subroutine also exists.
A hash %DB::sub is maintained, whose keys are subroutine names and whose values have the form filename:startline-endline. filename has the form (eval 34) for subroutines defined inside evals.
When the execution of your program reaches a point that can hold a breakpoint, the DB::DB() subroutine is called if any of the variables $DB::trace, $DB::single, or $DB::signal is true. These variables are not localizable. This feature is disabled when executing inside DB::DB(), including functions called from it, unless $^D & (1<<30) is true.
When execution of the program reaches a subroutine call, a call to &DB::sub(args) is made instead, with $DB::sub holding the name of the called subroutine. (This doesn't happen if the subroutine was compiled in the DB package.)
Note that if &DB::sub needs external data for it to work, no subroutine call is possible without it. As an example, the standard debugger's &DB::sub depends on the $DB::deep variable (it defines how many levels of recursion deep into the debugger you can go before a mandatory break). If $DB::deep is not defined, subroutine calls are not possible, even though &DB::sub exists.
The PERL5DB environment variable can be used to define a debugger. For example, the minimal "working" debugger (it actually doesn't do anything) consists of one line:
- sub DB::DB {}
It can easily be defined like this:
- $ PERL5DB="sub DB::DB {}" perl -d your-script
Another brief debugger, slightly more useful, can be created with only the line:
- sub DB::DB {print ++$i; scalar <STDIN>}
This debugger prints a number which increments for each statement encountered and waits for you to hit a newline before continuing to the next statement.
The following debugger is actually useful:
- {
- package DB;
- sub DB {}
- sub sub {print ++$i, " $sub\n"; &$sub}
- }
It prints the sequence number of each subroutine call and the name of the called subroutine. Note that &DB::sub is being compiled into the package DB through the use of the package directive.
When it starts, the debugger reads your rc file (./.perldb or ~/.perldb under Unix), which can set important options. (A subroutine (&afterinit) can be defined here as well; it is executed after the debugger completes its own initialization.)
After the rc file is read, the debugger reads the PERLDB_OPTS environment variable and uses it to set debugger options. The contents of this variable are treated as if they were the argument of an o ... debugger command (q.v. in Configurable Options in perldebug).
In addition to the file and subroutine-related variables mentioned above, the debugger also maintains various magical internal variables.
@DB::dbline is an alias for @{"::_<current_file"}, which holds the lines of the currently-selected file (compiled by Perl), either explicitly chosen with the debugger's f command, or implicitly by flow of execution.
Values in this array are magical in numeric context: they compare equal to zero only if the line is not breakable.
%DB::dbline is an alias for %{"::_<current_file"}, which contains breakpoints and actions keyed by line number in the currently-selected file, either explicitly chosen with the debugger's f command, or implicitly by flow of execution. As previously noted, individual entries (as opposed to the whole hash) are settable. Perl only cares about Boolean true here, although the values used by perl5db.pl have the form "$break_condition\0$action".
Some functions are provided to simplify customization.
See Configurable Options in perldebug for a description of options parsed by DB::parse_options(string).
DB::dump_trace(skip[, count]) skips the specified number of frames and returns a list containing information about the calling frames (all of them, if count is missing). Each entry is a reference to a hash with keys context (either ., $, or @), sub (subroutine name, or info about eval), args (undef or a reference to an array), file, and line.
DB::print_trace(FH, skip[, count[, short]]) prints formatted info about caller frames. The last two functions may be convenient as arguments to the debugger's < and << commands.
Note that any variables and functions that are not documented in this manpage (or in perldebug) are considered for internal use only, and as such are subject to change without notice.
The frame option can be used to control the output of frame information. For example, contrast this expression trace:
- $ perl -de 42
- Stack dump during die enabled outside of evals.
- Loading DB routines from perl5db.pl patch level 0.94
- Emacs support available.
- Enter h or 'h h' for help.
- main::(-e:1): 0
- DB<1> sub foo { 14 }
- DB<2> sub bar { 3 }
- DB<3> t print foo() * bar()
- main::((eval 172):3): print foo() * bar();
- main::foo((eval 168):2):
- main::bar((eval 170):2):
- 42
with this one, once the option frame=2 has been set:
- DB<4> o f=2
- frame = '2'
- DB<5> t print foo() * bar()
- 3: foo() * bar()
- entering main::foo
- 2: sub foo { 14 };
- exited main::foo
- entering main::bar
- 2: sub bar { 3 };
- exited main::bar
- 42
By way of demonstration, we present below a laborious listing resulting from setting your PERLDB_OPTS environment variable to the value f=n N, and running perl -d -V from the command line. Examples using various values of n are shown to give you a feel for the difference between settings. Long though it may be, this is not a complete listing, but only excerpts.
- entering main::BEGIN
- entering Config::BEGIN
- Package lib/Exporter.pm.
- Package lib/Carp.pm.
- Package lib/Config.pm.
- entering Config::TIEHASH
- entering Exporter::import
- entering Exporter::export
- entering Config::myconfig
- entering Config::FETCH
- entering Config::FETCH
- entering Config::FETCH
- entering Config::FETCH
- entering main::BEGIN
- entering Config::BEGIN
- Package lib/Exporter.pm.
- Package lib/Carp.pm.
- exited Config::BEGIN
- Package lib/Config.pm.
- entering Config::TIEHASH
- exited Config::TIEHASH
- entering Exporter::import
- entering Exporter::export
- exited Exporter::export
- exited Exporter::import
- exited main::BEGIN
- entering Config::myconfig
- entering Config::FETCH
- exited Config::FETCH
- entering Config::FETCH
- exited Config::FETCH
- entering Config::FETCH
- in $=main::BEGIN() from /dev/null:0
- in $=Config::BEGIN() from lib/Config.pm:2
- Package lib/Exporter.pm.
- Package lib/Carp.pm.
- Package lib/Config.pm.
- in $=Config::TIEHASH('Config') from lib/Config.pm:644
- in $=Exporter::import('Config', 'myconfig', 'config_vars') from /dev/null:0
- in $=Exporter::export('Config', 'main', 'myconfig', 'config_vars') from li
- in @=Config::myconfig() from /dev/null:0
- in $=Config::FETCH(ref(Config), 'package') from lib/Config.pm:574
- in $=Config::FETCH(ref(Config), 'baserev') from lib/Config.pm:574
- in $=Config::FETCH(ref(Config), 'PERL_VERSION') from lib/Config.pm:574
- in $=Config::FETCH(ref(Config), 'PERL_SUBVERSION') from lib/Config.pm:574
- in $=Config::FETCH(ref(Config), 'osname') from lib/Config.pm:574
- in $=Config::FETCH(ref(Config), 'osvers') from lib/Config.pm:574
- in $=main::BEGIN() from /dev/null:0
- in $=Config::BEGIN() from lib/Config.pm:2
- Package lib/Exporter.pm.
- Package lib/Carp.pm.
- out $=Config::BEGIN() from lib/Config.pm:0
- Package lib/Config.pm.
- in $=Config::TIEHASH('Config') from lib/Config.pm:644
- out $=Config::TIEHASH('Config') from lib/Config.pm:644
- in $=Exporter::import('Config', 'myconfig', 'config_vars') from /dev/null:0
- in $=Exporter::export('Config', 'main', 'myconfig', 'config_vars') from lib/
- out $=Exporter::export('Config', 'main', 'myconfig', 'config_vars') from lib/
- out $=Exporter::import('Config', 'myconfig', 'config_vars') from /dev/null:0
- out $=main::BEGIN() from /dev/null:0
- in @=Config::myconfig() from /dev/null:0
- in $=Config::FETCH(ref(Config), 'package') from lib/Config.pm:574
- out $=Config::FETCH(ref(Config), 'package') from lib/Config.pm:574
- in $=Config::FETCH(ref(Config), 'baserev') from lib/Config.pm:574
- out $=Config::FETCH(ref(Config), 'baserev') from lib/Config.pm:574
- in $=Config::FETCH(ref(Config), 'PERL_VERSION') from lib/Config.pm:574
- out $=Config::FETCH(ref(Config), 'PERL_VERSION') from lib/Config.pm:574
- in $=Config::FETCH(ref(Config), 'PERL_SUBVERSION') from lib/Config.pm:574
- in $=main::BEGIN() from /dev/null:0
- in $=Config::BEGIN() from lib/Config.pm:2
- Package lib/Exporter.pm.
- Package lib/Carp.pm.
- out $=Config::BEGIN() from lib/Config.pm:0
- Package lib/Config.pm.
- in $=Config::TIEHASH('Config') from lib/Config.pm:644
- out $=Config::TIEHASH('Config') from lib/Config.pm:644
- in $=Exporter::import('Config', 'myconfig', 'config_vars') from /dev/null:0
- in $=Exporter::export('Config', 'main', 'myconfig', 'config_vars') from lib/E
- out $=Exporter::export('Config', 'main', 'myconfig', 'config_vars') from lib/E
- out $=Exporter::import('Config', 'myconfig', 'config_vars') from /dev/null:0
- out $=main::BEGIN() from /dev/null:0
- in @=Config::myconfig() from /dev/null:0
- in $=Config::FETCH('Config=HASH(0x1aa444)', 'package') from lib/Config.pm:574
- out $=Config::FETCH('Config=HASH(0x1aa444)', 'package') from lib/Config.pm:574
- in $=Config::FETCH('Config=HASH(0x1aa444)', 'baserev') from lib/Config.pm:574
- out $=Config::FETCH('Config=HASH(0x1aa444)', 'baserev') from lib/Config.pm:574
- in $=CODE(0x15eca4)() from /dev/null:0
- in $=CODE(0x182528)() from lib/Config.pm:2
- Package lib/Exporter.pm.
- out $=CODE(0x182528)() from lib/Config.pm:0
- scalar context return from CODE(0x182528): undef
- Package lib/Config.pm.
- in $=Config::TIEHASH('Config') from lib/Config.pm:628
- out $=Config::TIEHASH('Config') from lib/Config.pm:628
- scalar context return from Config::TIEHASH: empty hash
- in $=Exporter::import('Config', 'myconfig', 'config_vars') from /dev/null:0
- in $=Exporter::export('Config', 'main', 'myconfig', 'config_vars') from lib/Exporter.pm:171
- out $=Exporter::export('Config', 'main', 'myconfig', 'config_vars') from lib/Exporter.pm:171
- scalar context return from Exporter::export: ''
- out $=Exporter::import('Config', 'myconfig', 'config_vars') from /dev/null:0
- scalar context return from Exporter::import: ''
In all cases shown above, the line indentation shows the call tree. If bit 2 of frame is set, a line is printed on exit from a subroutine as well. If bit 4 is set, the arguments are printed along with the caller info. If bit 8 is set, the arguments are printed even if they are tied or references. If bit 16 is set, the return value is printed, too.
When a package is compiled, a line like this
- Package lib/Carp.pm.
is printed with proper indentation.
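Since frame is a bitmask, you can compose the value yourself before handing it to the debugger's o command. A small sketch (the %bit names below are just labels for this example; only the numeric values come from the text above):

```perl
# Compose a 'frame' value from the documented bits.
my %bit = (
    exit_lines => 2,    # print a line on subroutine exit
    args       => 4,    # print arguments with caller info
    raw_args   => 8,    # print args even if tied or references
    retvals    => 16,   # print return values
);
my $frame = $bit{exit_lines} | $bit{args} | $bit{retvals};
print "o frame=$frame\n";   # enter this at the debugger prompt
```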
There are two ways to enable debugging output for regular expressions. If your perl is compiled with -DDEBUGGING, you may use the -Dr flag on the command line. Otherwise, one can use re 'debug', which has effects at compile time and run time. Since Perl 5.9.5, this pragma is lexically scoped.
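Because the pragma is lexically scoped, you can confine the (voluminous) trace to a single block, so only the pattern you care about is debugged; output goes to STDERR. A minimal sketch, using a string that actually matches the example pattern:

```perl
# Trace compilation and execution of just one pattern.
my $matched;
{
    use re 'debug';     # lexically scoped since Perl 5.9.5
    $matched = ("cdefghik" =~ /[bc]d(ef*g)+h[ij]k$/);
}
# Patterns compiled or run outside the block are not traced.
print $matched ? "matched\n" : "no match\n";
```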
The debugging output at compile time looks like this:
- Compiling REx '[bc]d(ef*g)+h[ij]k$'
- size 45 Got 364 bytes for offset annotations.
- first at 1
- rarest char g at 0
- rarest char d at 0
- 1: ANYOF[bc](12)
- 12: EXACT <d>(14)
- 14: CURLYX[0] {1,32767}(28)
- 16: OPEN1(18)
- 18: EXACT <e>(20)
- 20: STAR(23)
- 21: EXACT <f>(0)
- 23: EXACT <g>(25)
- 25: CLOSE1(27)
- 27: WHILEM[1/1](0)
- 28: NOTHING(29)
- 29: EXACT <h>(31)
- 31: ANYOF[ij](42)
- 42: EXACT <k>(44)
- 44: EOL(45)
- 45: END(0)
- anchored 'de' at 1 floating 'gh' at 3..2147483647 (checking floating)
- stclass 'ANYOF[bc]' minlen 7
- Offsets: [45]
- 1[4] 0[0] 0[0] 0[0] 0[0] 0[0] 0[0] 0[0] 0[0] 0[0] 0[0] 5[1]
- 0[0] 12[1] 0[0] 6[1] 0[0] 7[1] 0[0] 9[1] 8[1] 0[0] 10[1] 0[0]
- 11[1] 0[0] 12[0] 12[0] 13[1] 0[0] 14[4] 0[0] 0[0] 0[0] 0[0]
- 0[0] 0[0] 0[0] 0[0] 0[0] 0[0] 18[1] 0[0] 19[1] 20[0]
- Omitting $` $& $' support.
The first line shows the pre-compiled form of the regex. The second shows the size of the compiled form (in arbitrary units, usually 4-byte words) and the total number of bytes allocated for the offset/length table, usually 4+size*8. The next line shows the label id of the first node that does a match.
The
- anchored 'de' at 1 floating 'gh' at 3..2147483647 (checking floating)
- stclass 'ANYOF[bc]' minlen 7
line (split into two lines above) contains optimizer information. In the example shown, the optimizer found that the match should contain a substring de at offset 1, plus substring gh at some offset between 3 and infinity. Moreover, when checking for these substrings (to abandon impossible matches quickly), Perl will check for the substring gh before checking for the substring de. The optimizer may also use the knowledge that the match starts (at the first id) with a character class, and that no string shorter than 7 characters can possibly match.
The fields of interest which may appear in this line are
anchored STRING at POS
floating STRING at POS1..POS2
See above.
matching floating/anchored
Which substring to check first.
minlen
The minimal length of the match.
stclass TYPE
Type of first matching node.
noscan
Don't scan for the found substrings.
isall
Means that the optimizer information is all that the regular expression contains, and thus one does not need to enter the regex engine at all.
GPOS
Set if the pattern contains \G.
plus
Set if the pattern starts with a repeated char (as in x+y).
implicit
Set if the pattern starts with .*.
with eval
Set if the pattern contains eval-groups, such as (?{ code }) and (??{ code }).
anchored(TYPE)
If the pattern may match only at a handful of places, with TYPE being BOL, MBOL, or GPOS. See the table below.
If a substring is known to match at end-of-line only, it may be followed by $, as in floating 'k'$.
The optimizer-specific information is used to avoid entering the (slow) regex engine on strings that definitely will not match. If the isall flag is set, a call to the regex engine may be avoided even when the optimizer found an appropriate place for the match.
Above the optimizer section is the list of nodes of the compiled form of the regex. Each line has the format
id: TYPE OPTIONAL-INFO (next-id)
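As a rough illustration, a node line of that shape can be picked apart with a pattern like the following (a sketch only; the exact spacing and the OPTIONAL-INFO syntax vary between node types and Perl versions):

```perl
# Split a node-dump line such as '  12: EXACT <d>(14)' into its fields.
my $line = '  12: EXACT <d>(14)';
my ($id, $type, $info, $next) =
    $line =~ /^\s*(\d+):\s+(\w+)\s*(.*?)\((\d+)\)\s*$/;
# Here $id is the node label, $type the regop name,
# $info the optional payload, and $next the next-id.
print "$id $type $info -> $next\n";
```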
Here are the possible types, with short descriptions:
- # TYPE arg-description [num-args] [longjump-len] DESCRIPTION
- # Exit points
- END no End of program.
- SUCCEED no Return from a subroutine, basically.
- # Anchors:
- BOL no Match "" at beginning of line.
- MBOL no Same, assuming multiline.
- SBOL no Same, assuming singleline.
- EOS no Match "" at end of string.
- EOL no Match "" at end of line.
- MEOL no Same, assuming multiline.
- SEOL no Same, assuming singleline.
- BOUND no Match "" at any word boundary using
- native charset semantics for non-utf8
- BOUNDL no Match "" at any locale word boundary
- BOUNDU no Match "" at any word boundary using
- Unicode semantics
- BOUNDA no Match "" at any word boundary using ASCII
- semantics
- NBOUND no Match "" at any word non-boundary using
- native charset semantics for non-utf8
- NBOUNDL no Match "" at any locale word non-boundary
- NBOUNDU no Match "" at any word non-boundary using
- Unicode semantics
- NBOUNDA no Match "" at any word non-boundary using
- ASCII semantics
- GPOS no Matches where last m//g left off.
- # [Special] alternatives:
- REG_ANY no Match any one character (except newline).
- SANY no Match any one character.
- CANY no Match any one byte.
- ANYOF sv Match character in (or not in) this
- class, single char match only
- ANYOF_WARN_SUPER sv Match character in (or not in) this
- class, warn (if enabled) upon matching a
- char above Unicode max;
- ANYOF_SYNTHETIC sv Synthetic start class
- POSIXD none Some [[:class:]] under /d; the FLAGS
- field gives which one
- POSIXL none Some [[:class:]] under /l; the FLAGS
- field gives which one
- POSIXU none Some [[:class:]] under /u; the FLAGS
- field gives which one
- POSIXA none Some [[:class:]] under /a; the FLAGS
- field gives which one
- NPOSIXD none complement of POSIXD, [[:^class:]]
- NPOSIXL none complement of POSIXL, [[:^class:]]
- NPOSIXU none complement of POSIXU, [[:^class:]]
- NPOSIXA none complement of POSIXA, [[:^class:]]
- CLUMP no Match any extended grapheme cluster
- sequence
- # Alternation
- # BRANCH The set of branches constituting a single choice are
- # hooked together with their "next" pointers, since
- # precedence prevents anything being concatenated to
- # any individual branch. The "next" pointer of the last
- # BRANCH in a choice points to the thing following the
- # whole choice. This is also where the final "next"
- # pointer of each individual branch points; each branch
- # starts with the operand node of a BRANCH node.
- #
- BRANCH node Match this alternative, or the next...
- # Back pointer
- # BACK Normal "next" pointers all implicitly point forward;
- # BACK exists to make loop structures possible.
- # not used
- BACK no Match "", "next" ptr points backward.
- # Literals
- EXACT str Match this string (preceded by length).
- EXACTF str Match this non-UTF-8 string (not
- guaranteed to be folded) using /id rules
- (w/len).
- EXACTFL str Match this string (not guaranteed to be
- folded) using /il rules (w/len).
- EXACTFU str Match this string (folded iff in UTF-8,
- length in folding doesn't change if not
- in UTF-8) using /iu rules (w/len).
- EXACTFA str Match this string (not guaranteed to be
- folded) using /iaa rules (w/len).
- EXACTFU_SS str Match this string (folded iff in UTF-8,
- length in folding may change even if not
- in UTF-8) using /iu rules (w/len).
- EXACTFU_TRICKYFOLD str Match this folded UTF-8 string using /iu
- rules
- # Do nothing types
- NOTHING no Match empty string.
- # A variant of above which delimits a group, thus stops optimizations
- TAIL no Match empty string. Can jump here from
- outside.
- # Loops
- # STAR,PLUS '?', and complex '*' and '+', are implemented as
- # circular BRANCH structures using BACK. Simple cases
- # (one character per match) are implemented with STAR
- # and PLUS for speed and to minimize recursive plunges.
- #
- STAR node Match this (simple) thing 0 or more
- times.
- PLUS node Match this (simple) thing 1 or more
- times.
- CURLY sv 2 Match this simple thing {n,m} times.
- CURLYN no 2 Capture next-after-this simple thing
- CURLYM no 2 Capture this medium-complex thing {n,m}
- times.
- CURLYX sv 2 Match this complex thing {n,m} times.
- # This terminator creates a loop structure for CURLYX
- WHILEM no Do curly processing and see if rest
- matches.
- # Buffer related
- # OPEN,CLOSE,GROUPP ...are numbered at compile time.
- OPEN num 1 Mark this point in input as start of #n.
- CLOSE num 1 Analogous to OPEN.
- REF num 1 Match some already matched string
- REFF num 1 Match already matched string, folded
- using native charset semantics for non-
- utf8
- REFFL num 1 Match already matched string, folded in
- loc.
- REFFU num 1 Match already matched string, folded
- using unicode semantics for non-utf8
- REFFA num 1 Match already matched string, folded
- using unicode semantics for non-utf8, no
- mixing ASCII, non-ASCII
- # Named references. Code in regcomp.c assumes that these all are after
- # the numbered references
- NREF no-sv 1 Match some already matched string
- NREFF no-sv 1 Match already matched string, folded
- using native charset semantics for non-
- utf8
- NREFFL no-sv 1 Match already matched string, folded in
- loc.
- NREFFU num 1 Match already matched string, folded
- using unicode semantics for non-utf8
- NREFFA num 1 Match already matched string, folded
- using unicode semantics for non-utf8, no
- mixing ASCII, non-ASCII
- IFMATCH off 1 2 Succeeds if the following matches.
- UNLESSM off 1 2 Fails if the following matches.
- SUSPEND off 1 1 "Independent" sub-RE.
- IFTHEN off 1 1 Switch, should be preceded by switcher.
- GROUPP num 1 Whether the group matched.
- # Support for long RE
- LONGJMP off 1 1 Jump far away.
- BRANCHJ off 1 1 BRANCH with long offset.
- # The heavy worker
- EVAL evl 1 Execute some Perl code.
- # Modifiers
- MINMOD no Next operator is not greedy.
- LOGICAL no Next opcode should set the flag only.
- # This is not used yet
- RENUM off 1 1 Group with independently numbered parens.
- # Trie Related
- # Behave the same as A|LIST|OF|WORDS would. The '..C' variants
- # have inline charclass data (ascii only), the 'C' store it in the
- # structure.
- TRIE trie 1 Match many EXACT(F[ALU]?)? at once.
- flags==type
- TRIEC trie Same as TRIE, but with embedded charclass
- charclass data
- AHOCORASICK trie 1 Aho Corasick stclass. flags==type
- AHOCORASICKC trie Same as AHOCORASICK, but with embedded
- charclass charclass data
- # Regex Subroutines
- GOSUB num/ofs 2L recurse to paren arg1 at (signed) ofs
- arg2
- GOSTART no recurse to start of pattern
- # Special conditionals
- NGROUPP no-sv 1 Whether the group matched.
- INSUBP num 1 Whether we are in a specific recurse.
- DEFINEP none 1 Never execute directly.
- # Backtracking Verbs
- ENDLIKE none Used only for the type field of verbs
- OPFAIL none Same as (?!)
- ACCEPT parno 1 Accepts the current matched string.
- # Verbs With Arguments
- VERB no-sv 1 Used only for the type field of verbs
- PRUNE no-sv 1 Pattern fails at this startpoint if no-
- backtracking through this
- MARKPOINT no-sv 1 Push the current location for rollback by
- cut.
- SKIP no-sv 1 On failure skip forward (to the mark)
- before retrying
- COMMIT no-sv 1 Pattern fails outright if backtracking
- through this
- CUTGROUP no-sv 1 On failure go to the next alternation in
- the group
- # Control what to keep in $&.
- KEEPS no $& begins here.
- # New charclass like patterns
- LNBREAK none generic newline pattern
- # SPECIAL REGOPS
- # This is not really a node, but an optimized away piece of a "long"
- # node. To simplify debugging output, we mark it as if it were a node
- OPTIMIZED off Placeholder for dump.
- # Special opcode with the property that no opcode in a compiled program
- # will ever be of this type. Thus it can be used as a flag value that
- # no other opcode has been seen. END is used similarly, in that an END
- # node cant be optimized. So END implies "unoptimizable" and PSEUDO
- # mean "not seen anything to optimize yet".
- PSEUDO off Pseudo opcode for internal use.
Following the optimizer information is a dump of the offset/length table, here split across several lines:
- Offsets: [45]
- 1[4] 0[0] 0[0] 0[0] 0[0] 0[0] 0[0] 0[0] 0[0] 0[0] 0[0] 5[1]
- 0[0] 12[1] 0[0] 6[1] 0[0] 7[1] 0[0] 9[1] 8[1] 0[0] 10[1] 0[0]
- 11[1] 0[0] 12[0] 12[0] 13[1] 0[0] 14[4] 0[0] 0[0] 0[0] 0[0]
- 0[0] 0[0] 0[0] 0[0] 0[0] 0[0] 18[1] 0[0] 19[1] 20[0]
The first line here indicates that the offset/length table contains 45 entries. Each entry is a pair of integers, denoted by offset[length]. Entries are numbered starting with 1, so entry #1 here is 1[4] and entry #12 is 5[1]. 1[4] indicates that the node labeled 1: (the 1: ANYOF[bc]) begins at character position 1 in the pre-compiled form of the regex, and has a length of 4 characters. 5[1] in position 12 indicates that the node labeled 12: (the 12: EXACT <d>) begins at character position 5 in the pre-compiled form of the regex, and has a length of 1 character. 12[1] in position 14 indicates that the node labeled 14: (the 14: CURLYX[0] {1,32767}) begins at character position 12 in the pre-compiled form of the regex, and has a length of 1 character; that is, it corresponds to the + symbol in the precompiled regex. 0[0] items indicate that there is no corresponding node.
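Programmatically, such a dump can be unpacked into numbered (offset, length) pairs; a small sketch using the first row of the table above:

```perl
# Turn an offsets dump row into a 1-based list of [offset, length] pairs.
my $dump = '1[4] 0[0] 0[0] 0[0] 0[0] 0[0] 0[0] 0[0] 0[0] 0[0] 0[0] 5[1]';
my @flat = $dump =~ /(\d+)\[(\d+)\]/g;     # flat list: off, len, off, len, ...
my @entries;
push @entries, [ splice @flat, 0, 2 ] while @flat;
# $entries[0] is entry #1 => [1, 4]; $entries[11] is entry #12 => [5, 1];
# a [0, 0] entry means "no corresponding node".
```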
First of all, when doing a match, one may get no run-time output even if debugging is enabled. This means that the regex engine was never entered and that all of the job was therefore done by the optimizer.
If the regex engine was entered, the output may look like this:
- Matching '[bc]d(ef*g)+h[ij]k$' against 'abcdefg__gh__'
- Setting an EVAL scope, savestack=3
- 2 <ab> <cdefg__gh_> | 1: ANYOF
- 3 <abc> <defg__gh_> | 11: EXACT <d>
- 4 <abcd> <efg__gh_> | 13: CURLYX {1,32767}
- 4 <abcd> <efg__gh_> | 26: WHILEM
- 0 out of 1..32767 cc=effff31c
- 4 <abcd> <efg__gh_> | 15: OPEN1
- 4 <abcd> <efg__gh_> | 17: EXACT <e>
- 5 <abcde> <fg__gh_> | 19: STAR
- EXACT <f> can match 1 times out of 32767...
- Setting an EVAL scope, savestack=3
- 6 <bcdef> <g__gh__> | 22: EXACT <g>
- 7 <bcdefg> <__gh__> | 24: CLOSE1
- 7 <bcdefg> <__gh__> | 26: WHILEM
- 1 out of 1..32767 cc=effff31c
- Setting an EVAL scope, savestack=12
- 7 <bcdefg> <__gh__> | 15: OPEN1
- 7 <bcdefg> <__gh__> | 17: EXACT <e>
- restoring \1 to 4(4)..7
- failed, try continuation...
- 7 <bcdefg> <__gh__> | 27: NOTHING
- 7 <bcdefg> <__gh__> | 28: EXACT <h>
- failed...
- failed...
The most significant information in the output is about the particular node of the compiled regex that is currently being tested against the target string. The format of these lines is
STRING-OFFSET <PRE-STRING> <POST-STRING> |ID: TYPE
The TYPE info is indented with respect to the backtracking level. Other incidental information appears interspersed within.
Perl is a profligate wastrel when it comes to memory use. There is a saying that to estimate memory usage of Perl, assume a reasonable algorithm for memory allocation, multiply that estimate by 10, and while you still may miss the mark, at least you won't be quite so astonished. This is not absolutely true, but may provide a good grasp of what happens.
Assume that an integer cannot take less than 20 bytes of memory, a float cannot take less than 24 bytes, a string cannot take less than 32 bytes (all these examples assume 32-bit architectures; the results are quite a bit worse on 64-bit architectures). If a variable is accessed in two of the three different ways (which require an integer, a float, or a string), the memory footprint may increase by yet another 20 bytes. A sloppy malloc(3) implementation can inflate these numbers dramatically.
On the opposite end of the scale, a declaration like
- sub foo;
may take up to 500 bytes of memory, depending on which release of Perl you're running.
Anecdotal estimates of source-to-compiled code bloat suggest an eightfold increase. This means that the compiled form of reasonable (normally commented, properly indented etc.) code will take about eight times more space in memory than the code took on disk.
The -DL command-line switch is obsolete since circa Perl 5.6.0 (it was available only if Perl was built with -DDEBUGGING). The switch was used to track Perl's memory allocations and possible memory leaks. These days the use of malloc debugging tools like Purify or valgrind is suggested instead. See also PERL_MEM_LOG in perlhacktips.
One way to find out how much memory is being used by Perl data structures is to install the Devel::Size module from CPAN: it gives you the minimum number of bytes required to store a particular data structure. Please be mindful of the difference between size() and total_size().
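For instance (a sketch; Devel::Size is a CPAN module, not in core, so the code below skips quietly when it is absent):

```perl
# size() measures the container itself; total_size() also follows references.
my ($shallow, $deep);
if (eval { require Devel::Size; 1 }) {
    my %h = (list => [1 .. 10]);
    $shallow = Devel::Size::size(\%h);        # the hash structure only
    $deep    = Devel::Size::total_size(\%h);  # hash plus the referenced array
    print "size=$shallow total_size=$deep\n"; # total_size is never smaller
}
```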
If Perl has been compiled using Perl's malloc you can analyze Perl memory usage by setting $ENV{PERL_DEBUG_MSTATS}.
$ENV{PERL_DEBUG_MSTATS}
If your perl is using Perl's malloc() and was compiled with the necessary switches (this is the default), then it will print memory usage statistics after compiling your code when $ENV{PERL_DEBUG_MSTATS} > 1, and before termination of the program when $ENV{PERL_DEBUG_MSTATS} >= 1. The report format is similar to the following example:
- $ PERL_DEBUG_MSTATS=2 perl -e "require Carp"
- Memory allocation statistics after compilation: (buckets 4(4)..8188(8192)
- 14216 free: 130 117 28 7 9 0 2 2 1 0 0
- 437 61 36 0 5
- 60924 used: 125 137 161 55 7 8 6 16 2 0 1
- 74 109 304 84 20
- Total sbrk(): 77824/21:119. Odd ends: pad+heads+chain+tail: 0+636+0+2048.
- Memory allocation statistics after execution: (buckets 4(4)..8188(8192)
- 30888 free: 245 78 85 13 6 2 1 3 2 0 1
- 315 162 39 42 11
- 175816 used: 265 176 1112 111 26 22 11 27 2 1 1
- 196 178 1066 798 39
- Total sbrk(): 215040/47:145. Odd ends: pad+heads+chain+tail: 0+2192+0+6144.
It is possible to ask for such a statistic at arbitrary points in your execution using the mstat() function out of the standard Devel::Peek module.
Here is some explanation of that format:
buckets SMALLEST(APPROX)..GREATEST(APPROX)
Perl's malloc() uses bucketed allocations. Every request is rounded up to the closest bucket size available, and a bucket is taken from the pool of buckets of that size.
The line above describes the limits of buckets currently in use. Each bucket has two sizes: memory footprint and the maximal size of user data that can fit into this bucket. Suppose in the above example that the smallest bucket were size 4. The biggest bucket would have usable size 8188, and the memory footprint would be 8192.
In a Perl built for debugging, some buckets may have negative usable size. This means that these buckets cannot (and will not) be used. For larger buckets, the memory footprint may be one page greater than a power of 2. If so, the corresponding power of two is printed in the APPROX field above.
The 1 or 2 rows of numbers following that correspond to the number of buckets of each size between SMALLEST and GREATEST. In the first row, the sizes (memory footprints) of buckets are powers of two--or possibly one page greater. In the second row, if present, the memory footprints of the buckets are between the memory footprints of two buckets "above".
For example, suppose under the previous example, the memory footprints were
- free: 8 16 32 64 128 256 512 1024 2048 4096 8192
- 4 12 24 48 80
With a non-DEBUGGING perl, the buckets starting from 128 have a 4-byte overhead, and thus an 8192-long bucket may take up to 8188-byte allocations.
Total sbrk(): SBRKed/SBRKs:CONTINUOUS
The first two fields give the total amount of memory perl sbrk(2)ed (ess-broken? :-) and number of sbrk(2)s used. The third number is what perl thinks about continuity of returned chunks. So long as this number is positive, malloc() will assume that it is probable that sbrk(2) will provide continuous memory.
Memory allocated by external libraries is not counted.
pad: 0
The amount of sbrk(2)ed memory needed to keep buckets aligned.
heads: 2192
Although memory overhead of bigger buckets is kept inside the bucket, for smaller buckets, it is kept in separate areas. This field gives the total size of these areas.
chain: 0
malloc() may want to subdivide a bigger bucket into smaller buckets. If only a part of the deceased bucket is left unsubdivided, the rest is kept as an element of a linked list. This field gives the total size of these chunks.
tail: 6144
To minimize the number of sbrk(2)s, malloc() asks for more memory. This field gives the size of the yet unused part, which is sbrk(2)ed, but never touched.
perldebug, perlguts, perlrun, re, and Devel::DProf.
perldebtut - Perl debugging tutorial
A (very) lightweight introduction to the use of the perl debugger, and a pointer to existing, deeper sources of information on the subject of debugging perl programs.
There's an extraordinary number of people out there who don't appear to know anything about using the perl debugger, though they use the language every day. This is for them.
First of all, there's a few things you can do to make your life a lot more straightforward when it comes to debugging perl programs, without using the debugger at all. To demonstrate, here's a simple script, named "hello", with a problem:
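The example script itself did not survive the conversion; judging from the variable listing just below, it was essentially the following (a reconstruction, with the trailing exit omitted):

```perl
#!/usr/bin/perl

$var1 = 'Hello World';  # the greeting we mean to print
$var2 = "$varl\n";      # oops: that's the letter 'l', not the digit '1'

print $var2;            # prints only a newline
```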
While this compiles and runs happily, it probably won't do what's expected, namely it doesn't print "Hello World\n" at all; it will on the other hand do exactly what it was told to do, computers being a bit that way inclined. That is, it will print out a newline character, and you'll get what looks like a blank line. It looks like there are 2 variables when (because of the typo) there are really 3:
- $var1 = 'Hello World';
- $varl = undef;
- $var2 = "\n";
To catch this kind of problem, we can force each variable to be declared before use by pulling in the strict module, by putting 'use strict;' after the first line of the script.
Now when you run it, perl complains about the 3 undeclared variables and we get four error messages because one variable is referenced twice:
- Global symbol "$var1" requires explicit package name at ./t1 line 4.
- Global symbol "$var2" requires explicit package name at ./t1 line 5.
- Global symbol "$varl" requires explicit package name at ./t1 line 5.
- Global symbol "$var2" requires explicit package name at ./t1 line 7.
- Execution of ./hello aborted due to compilation errors.
Luvverly! And to fix this we declare all variables explicitly, and now our script looks like this:
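A reconstruction of the declared version (again with the trailing exit omitted):

```perl
#!/usr/bin/perl
use strict;

my $var1 = 'Hello World';  # declared, though still unused below
my $varl = undef;          # the typo variable, now out in the open
my $var2 = "$varl\n";

print $var2;               # still just a newline, but now we know why
```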
We then do (always a good idea) a syntax check before we try to run it again:
- > perl -c hello
- hello syntax OK
And now when we run it, we get "\n" still, but at least we know why. Just getting this script to compile has exposed the '$varl' (with the letter 'l') variable, and simply changing $varl to $var1 solves the problem.
Ok, but how about when you want to really see your data, what's in that dynamic variable, just before using it?
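The script under discussion was lost in conversion; this reconstruction follows the source listing the debugger prints later in this walk-through (exit omitted so the snippet runs through):

```perl
#!/usr/bin/perl
use strict;

my $key = 'welcome';
my %data = (
    'this' => qw(that),
    'tom' => qw(and jerry),        # the (as yet unnoticed) problem is here
    'welcome' => q(Hello World),
    'zip' => q(welcome),
);
my @data = keys %data;

print "$data{$key}\n";             # prints a blank line, not Hello World
```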
Looks OK, after it's been through the syntax check (perl -c scriptname), we run it and all we get is a blank line again! Hmmmm.
One common debugging approach here, would be to liberally sprinkle a few print statements, to add a check just before we print out our data, and another just after:
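Reconstructed with minimal context from the script above, the sprinkled checks look like this:

```perl
my $key  = 'welcome';
my %data = ('this' => qw(that), 'tom' => qw(and jerry),
            'welcome' => q(Hello World), 'zip' => q(welcome));

print "All OK\n" if grep($key, keys %data);   # a check just before...
print "$data{$key}\n";
print "done: '$data{$key}'\n";                # ...and another just after
```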
And try again:
- > perl data
- All OK
- done: ''
After much staring at the same piece of code and not seeing the wood for the trees for some time, we get a cup of coffee and try another approach. That is, we bring in the cavalry by giving perl the '-d' switch on the command line:
- > perl -d data
- Default die handler restored.
- Loading DB routines from perl5db.pl version 1.07
- Editor support available.
- Enter h or `h h' for help, or `man perldebug' for more help.
- main::(./data:4): my $key = 'welcome';
Now, what we've done here is to launch the built-in perl debugger on our script. It's stopped at the first line of executable code and is waiting for input.
Before we go any further, you'll want to know how to quit the debugger: use just the letter 'q', not the words 'quit' or 'exit':
- DB<1> q
- >
That's it, you're back on home turf again.
Fire the debugger up again on your script and we'll look at the help menu. There's a couple of ways of calling help: a simple 'h' will get the summary help list, '|h' (pipe-h) will pipe the help through your pager (which is probably 'more' or 'less'), and finally, 'h h' (h-space-h) will give you the entire help screen. Here is the summary page:
- DB<1> h
- List/search source lines: Control script execution:
- l [ln|sub] List source code T Stack trace
- - or . List previous/current line s [expr] Single step [in expr]
- v [line] View around line n [expr] Next, steps over subs
- f filename View source in file <CR/Enter> Repeat last n or s
- /pattern/ ?patt? Search forw/backw r Return from subroutine
- M Show module versions c [ln|sub] Continue until position
- Debugger controls: L List break/watch/actions
- o [...] Set debugger options t [expr] Toggle trace [trace expr]
- <[<]|{[{]|>[>] [cmd] Do pre/post-prompt b [ln|event|sub] [cnd] Set breakpoint
- ! [N|pat] Redo a previous command B ln|* Delete a/all breakpoints
- H [-num] Display last num commands a [ln] cmd Do cmd before line
- = [a val] Define/list an alias A ln|* Delete a/all actions
- h [db_cmd] Get help on command w expr Add a watch expression
- h h Complete help page W expr|* Delete a/all watch exprs
- |[|]db_cmd Send output to pager ![!] syscmd Run cmd in a subprocess
- q or ^D Quit R Attempt a restart
- Data Examination: expr Execute perl code, also see: s,n,t expr
- x|m expr Evals expr in list context, dumps the result or lists methods.
- p expr Print expression (uses script's current package).
- S [[!]pat] List subroutine names [not] matching pattern
- V [Pk [Vars]] List Variables in Package. Vars can be ~pattern or !pattern.
- X [Vars] Same as "V current_package [Vars]".
- y [n [Vars]] List lexicals in higher scope <n>. Vars same as V.
- For more help, type h cmd_letter, or run man perldebug for all docs.
More confusing options than you can shake a big stick at! It's not as bad as it looks and it's very useful to know more about all of it, and fun too!
There's a couple of useful ones to know about straight away. You wouldn't think we're using any libraries at all at the moment, but 'M' will show which modules are currently loaded, and their version number, while 'm' will show the methods, and 'S' shows all subroutines (by pattern) as shown below. 'V' and 'X' show variables in the program by package scope and can be constrained by pattern.
- DB<2> S str
- dumpvar::stringify
- strict::bits
- strict::import
- strict::unimport
Using 'X' and cousins requires you not to use the type identifiers ($@%), just the 'name':
- DB<3> X ~err
- FileHandle(stderr) => fileno(2)
Remember we're in our tiny program with a problem, we should have a look at where we are, and what our data looks like. First of all let's view some code at our present position (the first line of code in this case), via 'v':
- DB<4> v
- 1 #!/usr/bin/perl
- 2: use strict;
- 3
- 4==> my $key = 'welcome';
- 5: my %data = (
- 6 'this' => qw(that),
- 7 'tom' => qw(and jerry),
- 8 'welcome' => q(Hello World),
- 9 'zip' => q(welcome),
- 10 );
At line number 4 there is a helpful pointer that tells you where you are now. To see more code, type 'v' again:
- DB<4> v
- 8 'welcome' => q(Hello World),
- 9 'zip' => q(welcome),
- 10 );
- 11: my @data = keys %data;
- 12: print "All OK\n" if grep($key, keys %data);
- 13: print "$data{$key}\n";
- 14: print "done: '$data{$key}'\n";
- 15: exit;
And if you wanted to list line 5 again, type 'l 5', (note the space):
- DB<4> l 5
- 5: my %data = (
In this case, there's not much to see, but of course normally there's pages of stuff to wade through, and 'l' can be very useful. To reset your view to the line we're about to execute, type a lone period '.':
- DB<5> .
- main::(./data_a:4): my $key = 'welcome';
The line shown is the one that is about to be executed next; it hasn't happened yet. So while we can print a variable with the letter 'p', at this point all we'd get is an empty (undefined) value back. What we need to do is to step through the next executable statement with an 's':
- DB<6> s
- main::(./data_a:5): my %data = (
- main::(./data_a:6): 'this' => qw(that),
- main::(./data_a:7): 'tom' => qw(and jerry),
- main::(./data_a:8): 'welcome' => q(Hello World),
- main::(./data_a:9): 'zip' => q(welcome),
- main::(./data_a:10): );
Now we can have a look at that first ($key) variable:
- DB<7> p $key
- welcome
Line 13 is where the action is, so let's continue down to there via the letter 'c', which, by the way, inserts a 'one-time-only' breakpoint at the given line or subroutine:
- DB<8> c 13
- All OK
- main::(./data_a:13): print "$data{$key}\n";
We've gone past our check (where 'All OK' was printed) and have stopped just before the meat of our task. We could try to print out a couple of variables to see what is happening:
- DB<9> p $data{$key}
Not much in there; let's have a look at our hash:
- DB<10> p %data
- Hello Worldziptomandwelcomejerrywelcomethisthat
- DB<11> p keys %data
- Hello Worldtomwelcomejerrythis
Well, this isn't very easy to read, and using the helpful manual (h h), the 'x' command looks promising:
- DB<12> x %data
- 0 'Hello World'
- 1 'zip'
- 2 'tom'
- 3 'and'
- 4 'welcome'
- 5 undef
- 6 'jerry'
- 7 'welcome'
- 8 'this'
- 9 'that'
That's not much help: there are a couple of welcomes in there, but no indication of which are keys and which are values. It's just a listed array dump and, in this case, not particularly helpful. The trick here is to use a reference to the data structure:
- DB<13> x \%data
- 0 HASH(0x8194bc4)
- 'Hello World' => 'zip'
- 'jerry' => 'welcome'
- 'this' => 'that'
- 'tom' => 'and'
- 'welcome' => undef
The reference is truly dumped and we can finally see what we're dealing with. Our quoting was perfectly valid but wrong for our purposes: with 'and jerry' being treated as two separate words rather than a phrase, the evenly paired hash structure was thrown out of alignment.
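The misbehaviour is easy to reproduce outside the debugger; here's a minimal sketch showing how qw() flattens into the surrounding list:

```perl
# qw(and jerry) expands to the two-element list ('and', 'jerry'),
# so this "pair" actually contains three elements, not two.
my @pair = ('tom' => qw(and jerry));
print scalar(@pair), "\n";   # prints 3
```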
The '-w' switch would have told us about this, had we used it at the start, and saved us a lot of trouble:
- > perl -w data
- Odd number of elements in hash assignment at ./data line 5.
We fix our quoting: 'tom' => q(and jerry), and run it again; this time we get our expected output:
- > perl -w data
- Hello World
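For reference, the corrected assignment keeps each value as a single quoted string:

```perl
my %data = (
    'this'    => q(that),
    'tom'     => q(and jerry),    # one phrase now, not two list elements
    'welcome' => q(Hello World),
    'zip'     => q(welcome),
);
```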
While we're here, take a closer look at the 'x' command; it's really useful and will merrily dump out nested references, complete objects, partial objects - just about whatever you throw at it:
Let's make a quick object and x-plode it. First we'll start the debugger; it wants some form of input from STDIN, so we give it something non-committal: a zero:
- > perl -de 0
- Default die handler restored.
- Loading DB routines from perl5db.pl version 1.07
- Editor support available.
- Enter h or `h h' for help, or `man perldebug' for more help.
- main::(-e:1): 0
Now build an on-the-fly object over a couple of lines (note the backslash):
- DB<1> $obj = bless({'unique_id'=>'123', 'attr'=> \
- cont: {'col' => 'black', 'things' => [qw(this that etc)]}}, 'MY_class')
And let's have a look at it:
- DB<2> x $obj
- 0 MY_class=HASH(0x828ad98)
- 'attr' => HASH(0x828ad68)
- 'col' => 'black'
- 'things' => ARRAY(0x828abb8)
- 0 'this'
- 1 'that'
- 2 'etc'
- 'unique_id' => 123
- DB<3>
Useful, huh? You can eval nearly anything in there, and experiment with bits of code or regexes until the cows come home:
- DB<3> @data = qw(this that the other atheism leather theory scythe)
- DB<4> p 'saw -> '.($cnt += map { print "\t:\t$_\n" } grep(/the/, sort @data))
- atheism
- leather
- other
- scythe
- the
- theory
- saw -> 6
If you want to see the command History, type an 'H':
- DB<5> H
- 4: p 'saw -> '.($cnt += map { print "\t:\t$_\n" } grep(/the/, sort @data))
- 3: @data = qw(this that the other atheism leather theory scythe)
- 2: x $obj
- 1: $obj = bless({'unique_id'=>'123', 'attr'=>
- {'col' => 'black', 'things' => [qw(this that etc)]}}, 'MY_class')
- DB<5>
And if you want to repeat any previous command, use the exclamation: '!':
- DB<5> !4
- p 'saw -> '.($cnt += map { print "$_\n" } grep(/the/, sort @data))
- atheism
- leather
- other
- scythe
- the
- theory
- saw -> 12
For more on references, see perlref and perlreftut.
Here's a simple program that converts between Celsius and Fahrenheit; it too has a problem:
- #!/usr/bin/perl -w
- use strict;
- my $arg = $ARGV[0] || '-c20';
- if ($arg =~ /^\-(c|f)((\-|\+)*\d+(\.\d+)*)$/) {
- my ($deg, $num) = ($1, $2);
- my ($in, $out) = ($num, $num);
- if ($deg eq 'c') {
- $deg = 'f';
- $out = &c2f($num);
- } else {
- $deg = 'c';
- $out = &f2c($num);
- }
- $out = sprintf('%0.2f', $out);
- $out =~ s/^((\-|\+)*\d+)\.0+$/$1/;
- print "$out $deg\n";
- } else {
- print "Usage: $0 -[c|f] num\n";
- }
- exit;
- sub f2c {
- my $f = shift;
- my $c = 5 * $f - 32 / 9;
- return $c;
- }
- sub c2f {
- my $c = shift;
- my $f = 9 * $c / 5 + 32;
- return $f;
- }
For some reason, the Fahrenheit to Celsius conversion fails to return the expected output. This is what it does:
- > temp -c0.72
- 33.30 f
- > temp -f33.3
- 162.94 c
Not very consistent! We'll set a breakpoint in the code manually and run it under the debugger to see what's going on. A breakpoint is a flag: the debugger runs without interruption until it reaches the breakpoint, where it stops execution and offers a prompt for further interaction. In normal use, these debugger commands are completely ignored, and they are safe - if a little messy - to leave in production code.
- my ($in, $out) = ($num, $num);
- $DB::single=2; # insert at line 9!
- if ($deg eq 'c')
- ...
- > perl -d temp -f33.3
- Default die handler restored.
- Loading DB routines from perl5db.pl version 1.07
- Editor support available.
- Enter h or `h h' for help, or `man perldebug' for more help.
- main::(temp:4): my $arg = $ARGV[0] || '-c20';
We'll simply continue down to our pre-set breakpoint with a 'c':
- DB<1> c
- main::(temp:10): if ($deg eq 'c') {
Followed by a view command to see where we are:
- DB<1> v
- 7: my ($deg, $num) = ($1, $2);
- 8: my ($in, $out) = ($num, $num);
- 9: $DB::single=2;
- 10==> if ($deg eq 'c') {
- 11: $deg = 'f';
- 12: $out = &c2f($num);
- 13 } else {
- 14: $deg = 'c';
- 15: $out = &f2c($num);
- 16 }
And a print to show what values we're currently using:
- DB<1> p $deg, $num
- f33.3
We can put another breakpoint on any line that begins an executable statement (shown with a colon in the listing); we'll use line 17, as that's just as we come out of the subroutine, and we'd like to pause there later on:
- DB<2> b 17
There's no feedback from this, but you can see what breakpoints are set by using the list 'L' command:
- DB<3> L
- temp:
- 17: print "$out $deg\n";
- break if (1)
Note that to delete a breakpoint you use 'B'.
Now we'll continue down into our subroutine; this time, rather than using a line number, we'll use the subroutine name, followed by the now familiar 'v':
- DB<3> c f2c
- main::f2c(temp:27): my $f = shift;
- DB<4> v
- 24: exit;
- 25
- 26 sub f2c {
- 27==> my $f = shift;
- 28: my $c = 5 * $f - 32 / 9;
- 29: return $c;
- 30 }
- 31
- 32 sub c2f {
- 33: my $c = shift;
Note that if there were a subroutine call between us and line 29, and we wanted to single-step through it, we could use the 's' command; to step over it we would use 'n', which would execute the sub but not descend into it for inspection. In this case, though, we simply continue down to line 29:
- DB<4> c 29
- main::f2c(temp:29): return $c;
And have a look at the return value:
- DB<5> p $c
- 162.944444444444
This is not the right answer at all, but the sum looks correct. I wonder if it's anything to do with operator precedence? We'll try a couple of other possibilities with our sum:
- DB<6> p (5 * $f - 32 / 9)
- 162.944444444444
- DB<7> p 5 * $f - (32 / 9)
- 162.944444444444
- DB<8> p (5 * $f) - 32 / 9
- 162.944444444444
- DB<9> p 5 * ($f - 32) / 9
- 0.722222222222221
:-) that's more like it! Ok, now we can set our return variable and we'll return out of the sub with an 'r':
- DB<10> $c = 5 * ($f - 32) / 9
- DB<11> r
- scalar context return from main::f2c: 0.722222222222221
Looks good, let's just continue off the end of the script:
- DB<12> c
- 0.72 c
- Debugged program terminated. Use q to quit or R to restart,
- use O inhibit_exit to avoid stopping after program termination,
- h q, h R or h O to get additional info.
A quick fix to the offending line (insert the missing parentheses) in the actual program and we're finished.
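With the parentheses in place, the subroutine reads:

```perl
sub f2c {
    my $f = shift;
    my $c = 5 * ($f - 32) / 9;   # subtract first, then scale
    return $c;
}
```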
Actions, watch variables, stack traces etc.: on the TODO list.
- a
- w
- t
- T
Ever wanted to know what a regex looked like? You'll need perl compiled with the DEBUGGING flag for this one:
- > perl -Dr -e '/^pe(a)*rl$/i'
- Compiling REx `^pe(a)*rl$'
- size 17 first at 2
- rarest char
- at 0
- 1: BOL(2)
- 2: EXACTF <pe>(4)
- 4: CURLYN[1] {0,32767}(14)
- 6: NOTHING(8)
- 8: EXACTF <a>(0)
- 12: WHILEM(0)
- 13: NOTHING(14)
- 14: EXACTF <rl>(16)
- 16: EOL(17)
- 17: END(0)
- floating `'$ at 4..2147483647 (checking floating) stclass `EXACTF <pe>'
- anchored(BOL) minlen 4
- Omitting $` $& $' support.
- EXECUTING...
- Freeing REx: `^pe(a)*rl$'
Did you really want to know? :-) For more gory details on getting regular expressions to work, have a look at perlre, perlretut, and to decode the mysterious labels (BOL and CURLYN, etc. above), see perldebguts.
To get all the output from your error log, and not miss any messages due to helpful operating-system buffering, insert a line like this at the start of your script:
- $|=1;
To watch the tail of a dynamically growing logfile (from the command line):
- tail -f $error_log
Wrapping all die calls in a handler routine can be useful to see how, and from where, they're being called; perlvar has more information.
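Here's a minimal sketch of such a handler, using the standard Carp module (the choice of Carp::confess is just one option; any reporting routine would do):

```perl
use Carp;

# Promote every die into a confession carrying a full stack
# trace, so you can see how and from where it was called.
$SIG{__DIE__} = sub { Carp::confess(@_) };
```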
Various useful techniques for the redirection of STDOUT and STDERR filehandles are explained in perlopentut and perlfaq8.
Just a quick hint here for all those CGI programmers who can't figure out how on earth to get past that 'waiting for input' prompt when running their CGI script from the command line: try something like this:
- > perl -d my_cgi.pl -nodebug
Of course CGI and perlfaq9 will tell you more.
The command line interface is tightly integrated with an emacs extension and there's a vi interface too.
You don't have to do this all on the command line, though; there are a few GUI options out there. The nice thing about these is that you can wave a mouse over a variable and a dump of its data will appear in an appropriate window, or in a popup balloon - no more tiresome typing of 'x $varname' :-)
In particular have a hunt around for the following:
- ptkdb - perlTK-based wrapper for the built-in debugger
- ddd - data display debugger
- PerlDevKit and PerlBuilder - NT-specific
NB. (more info on these and others would be appreciated).
We've seen how to encourage good coding practices with use strict and -w. You can run the Perl debugger with perl -d scriptname and inspect your data with the p and x commands. You can walk through your code, set breakpoints with b, step through that code with s or n, continue with c, and return from a sub with r. Fairly intuitive stuff when you get down to it.
There is, of course, lots more to find out about; this has just scratched the surface. The best way to learn more is to use perldoc to find out more about the language, to read the on-line help (perldebug is probably the next place to go), and, of course, to experiment.
perldebug, perldebguts, perldiag, perlrun
Richard Foley <richard.foley@rfi.net> Copyright (c) 2000
Various people have made helpful suggestions and contributions, in particular:
Ronald J Kimball <rjk@linguist.dartmouth.edu>
Hugo van der Sanden <hv@crypt0.demon.co.uk>
Peter Scott <Peter@PSDT.com>
perldebug - Perl debugging
First of all, have you tried using the -w switch?
If you're new to the Perl debugger, you may prefer to read perldebtut, which is a tutorial introduction to the debugger.
If you invoke Perl with the -d switch, your script runs under the Perl source debugger. This works like an interactive Perl environment, prompting for debugger commands that let you examine source code, set breakpoints, get stack backtraces, change the values of variables, etc. This is so convenient that you often fire up the debugger all by itself just to test out Perl constructs interactively to see what they do. For example:
- $ perl -d -e 42
In Perl, the debugger is not a separate program the way it usually is in the typical compiled environment. Instead, the -d flag tells the compiler to insert source information into the parse trees it's about to hand off to the interpreter. That means your code must first compile correctly for the debugger to work on it. Then when the interpreter starts up, it preloads a special Perl library file containing the debugger.
The program will halt right before the first run-time executable statement (but see below regarding compile-time statements) and ask you to enter a debugger command. Contrary to popular expectations, whenever the debugger halts and shows you a line of code, it always displays the line it's about to execute, rather than the one it has just executed.
Any command not recognized by the debugger is directly executed (eval'd) as Perl code in the current package. (The debugger uses the DB package for keeping its own state information.) Note that the said eval is bound by an implicit scope. As a result, any newly introduced lexical variable or any modified capture buffer content is lost after the eval. The debugger is a nice environment to learn Perl, but if you interactively experiment using material which should be in the same scope, stuff it in one line.
For any text entered at the debugger prompt, leading and trailing whitespace is first stripped before further processing. If a debugger command coincides with some function in your own program, merely precede the function with something that doesn't look like a debugger command, such as a leading ; or perhaps a +, or by wrapping it with parentheses or braces.
There are several ways to call the debugger:
On the given program identified by program_name.
Interactively supply an arbitrary expression using -e.
Debug a given program via the Devel::Ptkdb GUI.
Debug a given program using threads (experimental).
The interactive debugger understands the following commands:
Prints out a summary help message
Prints out a help message for the given debugger command.
The special argument of h h produces the entire help page, which is quite long.
If the output of the h h command (or any command, for that matter) scrolls past your screen, precede the command with a leading pipe symbol so that it's run through your pager, as in
- DB> |h h
You may change the pager which is used via the o pager=... command.
Same as print {$DB::OUT} expr in the current package. In particular, because this is just Perl's own print function, this means that nested data structures and objects are not dumped, unlike with the x command.
The DB::OUT filehandle is opened to /dev/tty, regardless of where STDOUT may be redirected to.
Evaluates its expression in list context and dumps out the result in a
pretty-printed fashion. Nested data structures are printed out
recursively, unlike the real print function in Perl. When dumping
hashes, you'll probably prefer 'x \%h' rather than 'x %h'.
See Dumpvalue if you'd like to do this yourself.
The output format is governed by multiple options described under Configurable Options.
If the maxdepth is included, it must be a numeral N; the value is dumped only N levels deep, as if the dumpDepth option had been temporarily set to N.
Display all (or some) variables in package (defaulting to main) using a data pretty-printer (hashes show their keys and values so you see what's what, control characters are made printable, etc.). Make sure you don't put the type specifier (like $) there, just the symbol names, like this:
- V DB filename line
Use ~pattern and !pattern for positive and negative regexes.
This is similar to calling the x command on each applicable var.
Same as V currentpackage [vars].
Display all (or some) lexical variables (mnemonic: mY variables) in the current scope or level scopes higher. You can limit the variables that you see with vars, which works exactly as it does for the V and X commands. Requires the PadWalker module, version 0.08 or higher; will warn if this isn't installed. Output is pretty-printed in the same style as for V, and the format is controlled by the same options.
Produce a stack backtrace. See below for details on its output.
Single step. Executes until the beginning of another statement, descending into subroutine calls. If an expression is supplied that includes function calls, it too will be single-stepped.
Next. Executes over subroutine calls, until the beginning of the next statement. If an expression is supplied that includes function calls, those functions will be executed with stops before each statement.
Continue until the return from the current subroutine.
Dump the return value if the PrintRet option is set (default).
Repeat last n or s command.
Continue, optionally inserting a one-time-only breakpoint at the specified line or subroutine.
List next window of lines.
List incr+1 lines starting at min.
List lines min through max. l - is synonymous to -.
List a single line.
List first window of lines from subroutine. subname may be a variable that contains a code reference.
List previous window of lines.
View a few lines of code around the current line.
Return the internal debugger pointer to the line last executed, and print out that line.
Switch to viewing a different file or eval statement. If filename is not a full pathname found in the values of %INC, it is considered a regex.
Evaled strings (when accessible) are considered to be filenames: f (eval 7) and f eval 7\b access the body of the 7th evaled string (in the order of execution). The bodies of the currently executed eval and of evaled strings that define subroutines are saved and thus accessible.
Search forwards for pattern (a Perl regex); final / is optional. The search is case-insensitive by default.
Search backwards for pattern; final ? is optional. The search is case-insensitive by default.
List (default all) actions, breakpoints and watch expressions
List subroutine names [not] matching the regex.
Toggle trace mode (see also the AutoTrace option). Optional argument is the maximum number of levels to trace below the current one; anything deeper than that will be silent.
Trace through execution of expr. Optional first argument is the maximum number of levels to trace below the current one; anything deeper than that will be silent.
See Frame Listing Output Examples in perldebguts for examples.
Sets a breakpoint on the current line.
Set a breakpoint before the given line. If a condition is specified, it's evaluated each time the statement is reached: a breakpoint is taken only if the condition is true. Breakpoints may only be set on lines that begin an executable statement. Conditions don't use if:
- b 237 $x > 30
- b 237 ++$count237 < 11
- b 33 /pattern/i
If the line number is ., sets a breakpoint on the current line:
- b . $n > 100
Set a breakpoint before the given line in a (possibly different) file. If a condition is specified, it's evaluated each time the statement is reached: a breakpoint is taken only if the condition is true. Breakpoints may only be set on lines that begin an executable statement. Conditions don't use if:
- b lib/MyModule.pm:237 $x > 30
- b /usr/lib/perl5/site_perl/CGI.pm:100 ++$count100 < 11
Set a breakpoint before the first line of the named subroutine. subname may be a variable containing a code reference (in this case condition is not supported).
Set a breakpoint at first line of subroutine after it is compiled.
Set a breakpoint before the first executed line of the filename, which should be a full pathname found amongst the %INC values.
Sets a breakpoint before the first statement executed after the specified subroutine is compiled.
Delete a breakpoint from the specified line.
Delete all installed breakpoints.
Disable the breakpoint so it won't stop the execution of the program. Breakpoints are enabled by default and can be re-enabled using the enable command.
Disable the breakpoint so it won't stop the execution of the program. Breakpoints are enabled by default and can be re-enabled using the enable command. This is done for a breakpoint in the current file.
Enable the breakpoint so it will stop the execution of the program.
Enable the breakpoint so it will stop the execution of the program. This is done for a breakpoint in the current file.
Set an action to be done before the line is executed. If line is omitted, set an action on the line about to be executed. The sequence of steps taken by the debugger is
- 1. check for a breakpoint at this line
- 2. print the line if necessary (tracing)
- 3. do any actions associated with that line
- 4. prompt user if at a breakpoint or in single-step
- 5. evaluate line
For example, this will print out $foo every time line 53 is passed:
- a 53 print "DB FOUND $foo\n"
Delete an action from the specified line.
Delete all installed actions.
Add a global watch-expression. Whenever a watched global changes the debugger will stop and display the old and new values.
Delete watch-expression
Delete all watch-expressions.
Display all options.
Set each listed Boolean option to the value 1.
Print out the value of one or more options.
Set the value of one or more options. If the value has internal whitespace, it should be quoted. For example, you could set o pager="less -MQeicsNfr" to call less with those specific options. You may use either single or double quotes, but if you do, you must escape any embedded instances of the same sort of quote you began with, as well as any escapes that immediately precede that quote but which are not meant to escape the quote itself. In other words, you follow single-quoting rules irrespective of the quote; eg: o option='this isn\'t bad' or o option="She said, \"Isn't it?\"".
For historical reasons, the =value is optional, but defaults to 1 only where it is safe to do so--that is, mostly for Boolean options. It is always better to assign a specific value using =.
The option can be abbreviated, but for clarity probably should not be. Several options can be set together. See Configurable Options for a list of these.
List out all pre-prompt Perl command actions.
Set an action (Perl command) to happen before every debugger prompt. A multi-line command may be entered by backslashing the newlines.
Delete all pre-prompt Perl command actions.
Add an action (Perl command) to happen before every debugger prompt. A multi-line command may be entered by backwhacking the newlines.
List out post-prompt Perl command actions.
Set an action (Perl command) to happen after the prompt when you've just given a command to return to executing the script. A multi-line command may be entered by backslashing the newlines (we bet you couldn't have guessed this by now).
Delete all post-prompt Perl command actions.
Adds an action (Perl command) to happen after the prompt when you've just given a command to return to executing the script. A multi-line command may be entered by backslashing the newlines.
List out pre-prompt debugger commands.
Set an action (debugger command) to happen before every debugger prompt. A multi-line command may be entered in the customary fashion.
Because this command is in some senses new, a warning is issued if you appear to have accidentally entered a block instead. If that's what you mean to do, write it as ;{ ... } or even do { ... }.
Delete all pre-prompt debugger commands.
Add an action (debugger command) to happen before every debugger prompt. A multi-line command may be entered, if you can guess how: see above.
Redo a previous command (defaults to the previous command).
Redo number'th previous command.
Redo last command that started with pattern.
See o recallCommand, too.
Run cmd in a subprocess (reads from DB::IN, writes to DB::OUT). See o shellBang, also. Note that the user's current shell (well, their $ENV{SHELL} variable) will be used, which can interfere with proper interpretation of exit status or signal and coredump information.
Read and execute debugger commands from file. file may itself contain source commands.
Display last n commands. Only commands longer than one character are listed. If number is omitted, list them all.
Quit. ("quit" doesn't work for this, unless you've made an alias)
This is the only supported way to exit the debugger, though typing
exit twice might work.
Set the inhibit_exit option to 0 if you want to be able to step off the end of the script. You may also need to set $finished to 0 if you want to step through global destruction.
Restart the debugger by exec()ing a new session. We try to maintain your history across this, but internal settings and command-line options may be lost.
The following settings are currently preserved: history, breakpoints, actions, debugger options, and the Perl command-line options -w, -I, and -e.
Run the debugger command, piping DB::OUT into your current pager.
Same as |dbcmd but DB::OUT is temporarily selected as well.
Define a command alias, like
- = quit q
or list current aliases.
Execute command as a Perl statement. A trailing semicolon will be supplied. If the Perl statement would otherwise be confused for a Perl debugger command, use a leading semicolon, too.
List which methods may be called on the result of the evaluated expression. The expression may evaluate to a reference to a blessed object, or to a package name.
Display all loaded modules and their versions.
Despite its name, this calls your system's default documentation viewer on the given page, or on the viewer itself if manpage is omitted. If that viewer is man, the current Config information is used to invoke man using the proper MANPATH or -M manpath option. Failed lookups of the form XXX that match known manpages of the form perlXXX will be retried. This lets you type man debug or man op from the debugger.
On systems traditionally bereft of a usable man command, the debugger invokes perldoc. Occasionally this determination is incorrect due to recalcitrant vendors or rather more felicitously, to enterprising users. If you fall into either category, just manually set the $DB::doccmd variable to whatever viewer to view the Perl documentation on your system. This may be set in an rc file, or through direct assignment. We're still waiting for a working example of something along the lines of:
- $DB::doccmd = 'netscape -remote http://something.here/';
The debugger has numerous options settable using the o command, either interactively or from the environment or an rc file. (./.perldb or ~/.perldb under Unix.)
recallCommand, ShellBang
The characters used to recall a command or spawn a shell. By default, both are set to !, which is unfortunate.
pager
Program to use for output of pager-piped commands (those beginning with a | character). By default, $ENV{PAGER} will be used.
Because the debugger uses your current terminal characteristics for bold and underlining, if the chosen pager does not pass escape sequences through unchanged, the output of some debugger commands will not be readable when sent through the pager.
tkRunning
Run Tk while prompting (with ReadLine).
signalLevel, warnLevel, dieLevel
Level of verbosity. By default, the debugger leaves your exceptions and warnings alone, because altering them can break correctly running programs. It will attempt to print a message when uncaught INT, BUS, or SEGV signals arrive. (But see the mention of signals in BUGS below.)
To disable this default safe mode, set these values to something higher than 0. At a level of 1, you get backtraces upon receiving any kind of warning (this is often annoying) or exception (this is often valuable). Unfortunately, the debugger cannot discern fatal exceptions from non-fatal ones. If dieLevel is even 1, then your non-fatal exceptions are also traced and unceremoniously altered if they came from eval'ed strings or from any kind of eval within modules you're attempting to load. If dieLevel is 2, the debugger doesn't care where they came from: it usurps your exception handler and prints out a trace, then modifies all exceptions with its own embellishments. This may perhaps be useful for some tracing purposes, but tends to hopelessly destroy any program that takes its exception handling seriously.
AutoTrace
Trace mode (similar to the t command, but can be put into PERLDB_OPTS).
LineInfo
File or pipe to print line number info to. If it is a pipe (say, |visual_perl_db), then a short message is used. This is the mechanism used to interact with a slave editor or visual debugger, such as the special vi or emacs hooks, or the ddd graphical debugger.
inhibit_exit
If 0, allows stepping off the end of the script.
PrintRet
Print return value after r command if set (default).
ornaments
Affects screen appearance of the command line (see Term::ReadLine). There is currently no way to disable these, which can render some output illegible on some displays, or with some pagers. This is considered a bug.
frame
Affects the printing of messages upon entry and exit from subroutines. If frame & 2 is false, messages are printed on entry only. (Printing on exit might be useful if interspersed with other messages.) If frame & 4, arguments to functions are printed, plus context and caller info. If frame & 8, overloaded stringify and tied FETCH are enabled on the printed arguments. If frame & 16, the return value from the subroutine is printed.
The length at which the argument list is truncated is governed by the next option:
maxTraceLen
Length to truncate the argument list when the frame option's bit 4 is set.
windowSize
Change the size of code list window (default is 10 lines).
The following options affect what happens with the V, X, and x commands:
arrayDepth, hashDepth
Print only first N elements ('' for all).
dumpDepth
Limit recursion depth to N levels when dumping structures. Negative values are interpreted as infinity. Default: infinity.
compactDump, veryCompact
Change the style of array and hash output. If compactDump is set, short arrays may be printed on one line.
globPrint
Whether to print contents of globs.
DumpDBFiles
Dump arrays holding debugged files.
DumpPackages
Dump symbol tables of packages.
DumpReused
Dump contents of "reused" addresses.
quote, HighBit, undefPrint
Change the style of string dump. The default value for quote is auto; one can enable double-quotish or single-quotish format by setting it to " or ', respectively. By default, characters with their high bit set are printed verbatim.
UsageOnly
Rudimentary per-package memory usage dump. Calculates total size of strings found in variables in the package. This does not include lexicals in a module's file scope, or lost in closures.
HistFile
The path of the file from which the history (assuming a usable Term::ReadLine backend) will be read on the debugger's startup, and to which it will be saved on shutdown (for persistence across sessions). Similar in concept to Bash's .bash_history file.
HistSize
The count of the saved lines in the history (assuming HistFile above).
After the rc file is read, the debugger reads the $ENV{PERLDB_OPTS} environment variable and parses this as the remainder of a "O ..." line as one might enter at the debugger prompt. You may place the initialization options TTY, noTTY, ReadLine, and NonStop there.
If your rc file contains:
- parse_options("NonStop=1 LineInfo=db.out AutoTrace");
then your script will run without human intervention, putting trace information into the file db.out. (If you interrupt it, you'd better reset LineInfo to /dev/tty if you expect to see anything.)
TTY
The TTY to use for debugging I/O.
noTTY
If set, the debugger goes into NonStop mode and will not connect to a TTY. If interrupted (or if control goes to the debugger via explicit setting of $DB::signal or $DB::single from the Perl script), it connects to a TTY specified in the TTY option at startup, or to a tty found at runtime using the Term::Rendezvous module of your choice.
This module should implement a method named new that returns an object with two methods: IN and OUT. These should return filehandles to use for debugging input and output, correspondingly. The new method should inspect an argument containing the value of $ENV{PERLDB_NOTTY} at startup, or "$ENV{HOME}/.perldbtty$$" otherwise. This file is not inspected for proper ownership, so security hazards are theoretically possible.
ReadLine
If false, readline support in the debugger is disabled in order to debug applications that themselves use ReadLine.
NonStop
If set, the debugger goes into non-interactive mode until interrupted, or programmatically by setting $DB::signal or $DB::single.
Here's an example of using the $ENV{PERLDB_OPTS} variable:
- $ PERLDB_OPTS="NonStop frame=2" perl -d myprogram
That will run the script myprogram without human intervention,
printing out the call tree with entry and exit points. Note that
NonStop=1 frame=2 is equivalent to N f=2, and that originally,
options could be uniquely abbreviated by the first letter (modulo
the Dump* options). It is nevertheless recommended that you
always spell them out in full for legibility and future compatibility.
Other examples include
- $ PERLDB_OPTS="NonStop LineInfo=listing frame=2" perl -d myprogram
which runs the script non-interactively, printing info on each entry
into a subroutine and each executed line into the file named listing.
(If you interrupt it, you'd better reset LineInfo to something
"interactive"!)
Other examples include (using standard shell syntax to show environment variable settings):
- $ ( PERLDB_OPTS="NonStop frame=1 AutoTrace LineInfo=tperl.out"
- perl -d myprogram )
which may be useful for debugging a program that uses Term::ReadLine
itself. Do not forget to detach your shell from the TTY in the window that
corresponds to /dev/ttyXX, say, by issuing a command like
- $ sleep 1000000
See Debugger Internals in perldebguts for details.
The debugger prompt is something like
- DB<8>
or even
- DB<<17>>
where that number is the command number, which you'd use to recall
that command with the built-in csh-like history mechanism. For example,
!17 would repeat command number 17. The depth of the angle
brackets indicates the nesting depth of the debugger. You could
get more than one set of brackets, for example, if you'd already
stopped at a breakpoint and then printed the result of a function call
that itself has a breakpoint, or if you stepped into an expression via
the s/n/t expression command.
If you want to enter a multi-line command, such as a subroutine definition with several statements or a format, escape the newline that would normally end the debugger command with a backslash. Here's an example:
- DB<1> for (1..4) { \
- cont: print "ok\n"; \
- cont: }
- ok
- ok
- ok
- ok
Note that this business of escaping a newline is specific to interactive commands typed into the debugger.
Here's an example of what a stack backtrace via the T command might
look like:
- $ = main::infested called from file 'Ambulation.pm' line 10
- @ = Ambulation::legs(1, 2, 3, 4) called from file 'camel_flea' line 7
- $ = main::pests('bactrian', 4) called from file 'camel_flea' line 4
The left-hand character up there indicates the context in which the
function was called, with $ and @ meaning scalar or list
contexts respectively, and . meaning void context (which is
actually a sort of scalar context). The display above says
that you were in the function main::infested when you ran the
stack dump, and that it was called in scalar context from line
10 of the file Ambulation.pm, but without any arguments at all,
meaning it was called as &infested. The next stack frame shows
that the function Ambulation::legs was called in list context
from the camel_flea file with four arguments. The last stack
frame shows that main::pests was called in scalar context,
also from camel_flea, but from line 4.
If you execute the T command from inside an active use
statement, the backtrace will contain both a require frame and
an eval frame.
This shows the sorts of output the l command can produce:
- DB<<13>> l
- 101: @i{@i} = ();
- 102:b @isa{@i,$pack} = ()
- 103 if(exists $i{$prevpack} || exists $isa{$pack});
- 104 }
- 105
- 106 next
- 107==> if(exists $isa{$pack});
- 108
- 109:a if ($extra-- > 0) {
- 110: %isa = ($pack,1);
Breakable lines are marked with :. Lines with breakpoints are
marked by b and those with actions by a. The line that's
about to be executed is marked by ==>.
Please be aware that code in debugger listings may not look the same as your original source code. Line directives and external source filters can alter the code before Perl sees it, causing code to move from its original positions or take on entirely different forms.
When the frame option is set, the debugger will print entered (and
optionally exited) subroutines in different styles. See perldebguts
for incredibly long examples of these.
If you have compile-time executable statements (such as code within
BEGIN, UNITCHECK and CHECK blocks or use statements), these will
not be stopped by the debugger, although requires and INIT blocks
will, and compile-time statements can be traced with the AutoTrace
option set in PERLDB_OPTS. From your own Perl code, however, you
can transfer control back to the debugger using the following
statement, which is harmless if the debugger is not running:
- $DB::single = 1;
If you set $DB::single to 2, it's equivalent to having
just typed the n command, whereas a value of 1 means the s
command. The $DB::trace variable should be set to 1 to simulate
having typed the t command.
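For instance, a breakpoint can be compiled right into a program (the subroutine risky below is just a made-up example); the assignment is a no-op unless the program is running under perl -d:

```perl
use strict;
use warnings;

sub risky {
    my ($x) = @_;
    # Under the debugger this stops here as if 's' had been typed;
    # without -d it merely sets a package variable and is harmless.
    no warnings 'once';
    $DB::single = 1;
    return $x * 2;
}

print risky(21), "\n";   # prints 42
```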
Another way to debug compile-time code is to start the debugger, set a breakpoint on the load of some module:
- DB<7> b load f:/perllib/lib/Carp.pm
- Will stop on load of 'f:/perllib/lib/Carp.pm'.
and then restart the debugger using the R command (if possible). One
can use b compile subname for the same purpose.
The debugger probably contains enough configuration hooks that you
won't ever have to modify it yourself. You may change the behaviour
of the debugger from within the debugger using its o command, from
the command line via the PERLDB_OPTS environment variable, and
from customization files.
You can do some customization by setting up a .perldb file, which contains initialization code. For instance, you could make aliases like these (the last one is one people expect to be there):
- $DB::alias{'len'} = 's/^len(.*)/p length($1)/';
- $DB::alias{'stop'} = 's/^stop (at|in)/b/';
- $DB::alias{'ps'} = 's/^ps\b/p scalar /';
- $DB::alias{'quit'} = 's/^quit(\s*)/exit/';
You can change options from .perldb by using calls like this one:
- parse_options("NonStop=1 LineInfo=db.out AutoTrace=1 frame=2");
The code is executed in the package DB. Note that .perldb is
processed before PERLDB_OPTS is. If .perldb defines the
subroutine afterinit, that function is called after debugger
initialization ends. .perldb may be contained in the current
directory, or in the home directory. Because this file is sourced
in by Perl and may contain arbitrary commands, for security reasons,
it must be owned by the superuser or the current user, and writable
by no one but its owner.
You can mock TTY input to the debugger by adding arbitrary commands to @DB::typeahead. For example, your .perldb file might contain:
- sub afterinit { push @DB::typeahead, "b 4", "b 6"; }
This would attempt to set breakpoints on lines 4 and 6 immediately after debugger initialization. Note that @DB::typeahead is not a supported interface and is subject to change in future releases.
If you want to modify the debugger, copy perl5db.pl from the
Perl library to another name and hack it to your heart's content.
You'll then want to set your PERL5DB environment variable to say
something like this:
- BEGIN { require "myperl5db.pl" }
As a last resort, you could also use PERL5DB to customize the debugger
by directly setting internal variables or calling debugger functions.
Note that any variables and functions that are not documented in this document (or in perldebguts) are considered for internal use only, and as such are subject to change without notice.
As shipped, the only command-line history supplied is a simplistic one that checks for leading exclamation points. However, if you install the Term::ReadKey and Term::ReadLine modules from CPAN (such as Term::ReadLine::Gnu, Term::ReadLine::Perl, ...) you will have full editing capabilities much like those GNU readline(3) provides. Look for these in the modules/by-module/Term directory on CPAN. These do not support normal vi command-line editing, however.
A rudimentary command-line completion is also available, including
lexical variables in the current scope if the PadWalker module
is installed.
Without Readline support you may see the symbols "^[[A", "^[[C", "^[[B", "^[[D", "^H", ... when using the arrow keys and/or the backspace key.
If you have the FSF's version of emacs installed on your system, it can interact with the Perl debugger to provide an integrated software development environment reminiscent of its interactions with C debuggers.
Recent versions of Emacs come with a start file for making emacs act like a syntax-directed editor that understands (some of) Perl's syntax. See perlfaq3.
A similar setup by Tom Christiansen for interacting with any vendor-shipped vi and the X11 window system is also available. This works similarly to the integrated multiwindow support that emacs provides, where the debugger drives the editor. At the time of this writing, however, that tool's eventual location in the Perl distribution was uncertain.
Users of vi should also look into vim and gvim, the mousey and windy version, for coloring of Perl keywords.
Note that only perl can truly parse Perl, so all such CASE tools fall somewhat short of the mark, especially if you don't program your Perl as a C programmer might.
If you wish to supply an alternative debugger for Perl to run, invoke your script with a colon and a package argument given to the -d flag. Perl's alternative debuggers include a Perl profiler, Devel::NYTProf, which is available separately as a CPAN distribution. To profile your Perl program in the file mycode.pl, just type:
- $ perl -d:NYTProf mycode.pl
When the script terminates the profiler will create a database of the profile information that you can turn into reports using the profiler's tools. See perlperf for details.
use re 'debug' enables you to see the gory details of how the Perl
regular expression engine works. In order to understand this typically
voluminous output, one must not only have some idea about how regular
expression matching works in general, but also know how Perl's regular
expressions are internally compiled into an automaton. These matters
are explored in some detail in
Debugging Regular Expressions in perldebguts.
Perl contains internal support for reporting its own memory usage, but this is a fairly advanced concept that requires some understanding of how memory allocation works. See Debugging Perl Memory Usage in perldebguts for the details.
You did try the -w switch, didn't you?
perldebtut, perldebguts, re, DB, Devel::NYTProf, Dumpvalue, and perlrun.
When debugging a script that uses #! and is thus normally found in
$PATH, the -S option causes perl to search $PATH for it, so you don't
have to type the path or which $scriptname.
- $ perl -Sd foo.pl
You cannot get stack frame information or in any fashion debug functions that were not compiled by Perl, such as those from C or C++ extensions.
If you alter your @_ arguments in a subroutine (such as with shift
or pop), the stack backtrace will not show the original values.
The debugger does not currently work in conjunction with the -W command-line switch, because it itself is not free of warnings.
If you're in a slow syscall (like waiting, accepting, or reading
from your keyboard or a socket) and haven't set up your own $SIG{INT}
handler, then you won't be able to CTRL-C your way back to the debugger,
because the debugger's own $SIG{INT} handler doesn't understand that
it needs to raise an exception to longjmp(3) out of slow syscalls.
perldelta - what is new for perl v5.18.2
This document describes differences between the 5.18.1 release and the 5.18.2 release.
If you are upgrading from an earlier release such as 5.18.0, first read perl5181delta, which describes differences between 5.18.0 and 5.18.1.
B has been upgraded from version 1.42_01 to 1.42_02.
The fix for [perl #118525] introduced a regression in the behaviour of
B::CV::GV, changing the return value from a B::SPECIAL object on
a NULL CvGV to undef. B::CV::GV again returns a
B::SPECIAL object in this case. [perl #119413]
B::Concise has been upgraded from version 0.95 to 0.95_01.
This fixes a bug in dumping unexpected SPECIALs.
English has been upgraded from version 1.06 to 1.06_01. This fixes an
error about the performance of $`, $&, and $'.
File::Glob has been upgraded from version 1.20 to 1.20_01.
perlrepository has been restored with a pointer to more useful pages.
perlhack has been updated with the latest changes from blead.
Perl 5.18.1 introduced a regression along with a bugfix for lexical subs. Some B::SPECIAL results from B::CV::GV became undefs instead. This broke Devel::Cover among other libraries. This has been fixed. [perl #119351]
Perl 5.18.0 introduced a regression whereby [:^ascii:], if used in the same
character class as other qualifiers, would fail to match characters in the
Latin-1 block. This has been fixed. [perl #120799]
Perl 5.18.0 introduced a regression when using ->SUPER::method with AUTOLOAD by looking up AUTOLOAD from the current package, rather than the current package’s superclass. This has been fixed. [perl #120694]
Perl 5.18.0 introduced a regression whereby -bareword was no longer
permitted under the strict and integer pragmata when used together. This
has been fixed. [perl #120288]
Previously PerlIOBase_dup didn't check whether pushing the new layer succeeded before (optionally) setting the utf8 flag. This could cause segfaults from null-pointer dereferences. This has been fixed.
A buffer overflow with very long identifiers has been fixed.
A regression from 5.16 in the handling of padranges led to assertion failures if a keyword plugin declined to handle the second "my", but only after creating a padop.
This affected, at least, Devel::CallParser under threaded builds.
This has been fixed.
The construct $r=qr/.../; /$r/p is now handled properly, an issue which
had been worsened by changes in 5.18.0. [perl #118213]
Perl 5.18.2 represents approximately 3 months of development since Perl 5.18.1 and contains approximately 980 lines of changes across 39 files from 4 authors.
Perl continues to flourish into its third decade thanks to a vibrant community of users and developers. The following people are known to have contributed the improvements that became Perl 5.18.2:
Craig A. Berry, David Mitchell, Ricardo Signes, Tony Cook.
The list above is almost certainly incomplete as it is automatically generated from version control history. In particular, it does not include the names of the (very much appreciated) contributors who reported issues to the Perl bug tracker.
Many of the changes included in this version originated in the CPAN modules included in Perl's core. We're grateful to the entire CPAN community for helping Perl to flourish.
For a more complete list of all of Perl's historical contributors, please see the AUTHORS file in the Perl source distribution.
If you find what you think is a bug, you might check the articles recently posted to the comp.lang.perl.misc newsgroup and the perl bug database at http://rt.perl.org/perlbug/ . There may also be information at http://www.perl.org/ , the Perl Home Page.
If you believe you have an unreported bug, please run the perlbug program
included with your release. Be sure to trim your bug down to a tiny but
sufficient test case. Your bug report, along with the output of perl -V,
will be sent off to perlbug@perl.org to be analysed by the Perl porting team.
If the bug you are reporting has security implications, which make it inappropriate to send to a publicly archived mailing list, then please send it to perl5-security-report@perl.org. This points to a closed subscription unarchived mailing list, which includes all the core committers, who will be able to help assess the impact of issues, figure out a resolution, and help co-ordinate the release of patches to mitigate or fix the problem across all platforms on which Perl is supported. Please only use this address for security issues in the Perl core, not for modules independently distributed on CPAN.
The Changes file for an explanation of how to view exhaustive details on what changed.
The INSTALL file for how to build Perl.
The README file for general stuff.
The Artistic and Copying files for copyright information.
perldgux - Perl under DG/UX.
One can read this document in the following formats:
- man perldgux
- view perl perldgux
- explorer perldgux.html
- info perldgux
to list some (not all may be available simultaneously), or it may be read as is: as README.dgux.
Perl 5.7/8.x for DG/UX ix86 R4.20MU0x
Just run ./Configure script from the top directory. Then give "make" to compile.
If you are using GCC-2.95.x rev(DG/UX) as your compiler, an easy solution for configuring perl on your DG/UX machine is to run the command:
./Configure -Dusethreads -Duseithreads -Dusedevel -des
This will automatically accept all the defaults, and in particular /usr/local/ as the installation directory. Note that GCC-2.95.x rev(DG/UX) knows the -pthread switch, which allows it to link correctly against DG/UX's -lthread library.
If you want to change the installation directory, or have a standard DG/UX with the C compiler GCC-2.7.2.x, then you have no choice but to do an interactive build by issuing the command:
./Configure -Dusethreads -Duseithreads
In particular, with GCC-2.7.2.x accept all the defaults and *watch out* for the message:
- Any additional ld flags (NOT including libraries)? [ -pthread]
Instead of -pthread, put -lthread here. The GCC-2.7.2.x that comes with the DG/UX OS does NOT know the -pthread switch, so your build will fail if you accept the defaults. After configuration is done correctly, give "make" to compile.
Issuing a "make test" will run all the tests. If the test lib/ftmp-security gives you a result something like
- lib/ftmp-security....File::Temp::_gettemp:
- Parent directory (/tmp/) is not safe (sticky bit not set
- when world writable?) at lib/ftmp-security.t line 100
don't panic and just set the sticky bit in your /tmp directory by doing the following as root:
- cd /
- chmod +t /tmp
(This sets the sticky bit on /tmp.)
Then rerun the tests. This time everything should pass.
Run the command "make install".
Takis Psarogiannakopoulos, University of Cambridge, Centre for Mathematical Sciences, Department of Pure Mathematics, Wilberforce Road, Cambridge CB3 0WB, UK. Email: <takis@XFree86.Org>
perl(1).
perldiag - various Perl diagnostics
These messages are classified as follows (listed in increasing order of desperation):
- (W) A warning (optional).
- (D) A deprecation (enabled by default).
- (S) A severe warning (enabled by default).
- (F) A fatal error (trappable).
- (P) An internal error you should never see (trappable).
- (X) A very fatal error (nontrappable).
- (A) An alien error message (not generated by Perl).
The majority of messages from the first three classifications above
(W, D & S) can be controlled using the warnings pragma.
If a message can be controlled by the warnings pragma, its warning
category is included with the classification letter in the description
below. E.g. (W closed) means a warning in the closed category.
Optional warnings are enabled by using the warnings pragma or the -w
and -W switches. Warnings may be captured by setting $SIG{__WARN__}
to a reference to a routine that will be called on each warning instead
of printing it. See perlvar.
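A short sketch of that capture technique, collecting warnings into an array instead of letting them reach STDERR (the uninitialized-value addition is just a convenient way to provoke a warning):

```perl
use strict;
use warnings;

# Collect every warning text instead of printing it.
my @warnings;
local $SIG{__WARN__} = sub { push @warnings, $_[0] };

my $x;
my $y = $x + 1;    # provokes an "uninitialized value" warning

print scalar(@warnings), " warning(s) captured\n";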
Severe warnings are always enabled, unless they are explicitly disabled
with the warnings pragma or the -X switch.
Trappable errors may be trapped using the eval operator. See
eval. In almost all cases, warnings may be selectively
disabled or promoted to fatal errors using the warnings pragma.
See warnings.
The messages are in alphabetical order, without regard to upper or lower-case. Some of these messages are generic. Spots that vary are denoted with a %s or other printf-style escape. These escapes are ignored by the alphabetical order, as are all characters other than letters. To look up your message, just ignore anything that is not a letter.
(W closed) You tried to do an accept on a closed socket. Did you forget to check the return value of your socket() call? See accept.
(X) You can't allocate more than 64K on an MS-DOS machine.
(F) The modifiers '!', '<' and '>' are allowed in pack() or unpack() only after certain types. See pack.
(W ambiguous) A subroutine you have declared has the same name as a Perl keyword, and you have used the name without qualification for calling one or the other. Perl decided to call the builtin because the subroutine is not imported.
To force interpretation as a subroutine call, either put an ampersand
before the subroutine name, or qualify the name with its package.
Alternatively, you can import the subroutine (or pretend that it's
imported with the use subs pragma).
To silently interpret it as the Perl operator, use the CORE:: prefix
on the operator (e.g. CORE::log($x)) or declare the subroutine
to be an object method (see Subroutine Attributes in perlsub or
attributes).
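A short illustration of both resolutions (the log subroutine here is invented for the example): with use subs the bareword call goes to our subroutine, while the CORE:: prefix reaches the builtin natural logarithm:

```perl
use strict;
use warnings;
use subs 'log';          # predeclare our own log() so the bareword is ours

sub log { return "logged: $_[0]" }

print log("event"), "\n";            # calls our subroutine
printf "%.0f\n", CORE::log(exp 1);   # builtin log: ln(e) = 1
```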
(F) You wrote something like tr/a-z-0// which doesn't mean anything at
all. To include a - character in a transliteration, put it either
first or last. (In the past, tr/a-z-0// was synonymous with
tr/a-y//, which was probably not what you would have expected.)
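For example, placing the dash last makes it a literal character in the search list (the strings here are just an illustration):

```perl
use strict;
use warnings;

my $s = "a-b-c";
# Trailing '-' in the search list is literal, so dashes are
# transliterated to underscores along with a-z to A-Z.
(my $t = $s) =~ tr/a-z-/A-Z_/;
print "$t\n";    # prints A_B_C
```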
(S ambiguous) You said something that may not be interpreted the way you thought. Normally it's pretty easy to disambiguate it by supplying a missing quote, operator, parenthesis pair or declaration.
(S ambiguous) %, &, and * are all infix operators (modulus,
bitwise and, and multiplication) as well as initial special characters
(denoting hashes, subroutines and typeglobs), and you said something
like *foo * foo that might be interpreted as either of them. We
assumed you meant the infix operator, but please try to make it more
clear -- in the example given, you might write *foo * foo() if you
really meant to multiply a glob by the result of calling a function.
(W ambiguous) You wrote something like @{foo}, which might be
asking for the variable @foo, or it might be calling a function
named foo, and dereferencing it as an array reference. If you wanted
the variable, you can just write @foo. If you wanted to call the
function, write @{foo()} ... or you could just not have a variable
and a function with the same name, and save yourself a lot of trouble.
(W ambiguous) You wrote something like ${foo[2]} (where foo represents
the name of a Perl keyword), which might be looking for element number
2 of the array named @foo, in which case please write $foo[2], or you
might have meant to pass an anonymous arrayref to the function named
foo, and then do a scalar deref on the value it returns. If you meant
that, write ${foo([2])}.
In regular expressions, the ${foo[2]} syntax is sometimes necessary
to disambiguate between array subscripts and character classes.
/$length[2345]/, for instance, will be interpreted as $length followed
by the character class [2345]. If an array subscript is what you
want, you can avoid the warning by changing /${length[2345]}/ to the
unsightly /${\$length[2345]}/, by renaming your array to something
that does not coincide with a built-in keyword, or by simply turning
off warnings with no warnings 'ambiguous';.
(S ambiguous) You wrote something like -foo, which might be the
string "-foo", or a call to the function foo, negated. If you meant
the string, just write "-foo". If you meant the function call,
write -foo().
(F) An error peculiar to VMS. Perl does its own command line redirection, and found that STDIN was a pipe, and that you also tried to redirect STDIN using '<'. Only one STDIN stream to a customer, please.
(F) An error peculiar to VMS. Perl does its own command line redirection, and thinks you tried to redirect stdout both to a file and into a pipe to another command. You need to choose one or the other, though nothing's stopping you from piping into a program or Perl script which 'splits' output into two streams, such as
- open(SPLITTER, "| tee file1 file2 file3");
(W misc) The pattern match (//), substitution (s///), and
transliteration (tr///) operators work on scalar values. If you apply
one of them to an array or a hash, it will convert the array or hash to
a scalar value (the length of an array, or the population info of a
hash) and then work on that scalar value. This is probably not what
you meant to do. See grep and map for alternatives.
(F) msgsnd() requires a string at least as long as sizeof(long).
(F) The argument to exists() must be a hash or array element or a subroutine with an ampersand, such as:
- $foo{$bar}
- $ref->{"susie"}[12]
- &do_something
(F) The argument to delete() must be either a hash or array element, such as:
- $foo{$bar}
- $ref->{"susie"}[12]
or a hash or array slice, such as:
- @foo[$bar, $baz, $xyzzy]
- @{$ref->[12]}{"susie", "queue"}
(F) The argument to exists() for exists &sub must be a subroutine
name, and not a subroutine call. exists &sub() will generate this
error.
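To illustrate the distinction (greet is a made-up subroutine for the example):

```perl
use strict;
use warnings;

sub greet { "hi" }

# A subroutine *name* after the ampersand is allowed:
print exists &greet ? "greet exists\n" : "no greet\n";

# A subroutine *call* is the fatal case described above:
# print exists &greet();   # would die at compile time
```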
(W numeric) The indicated string was fed as an argument to an operator that expected a numeric value instead. If you're fortunate the message will identify which operator was so unfortunate.
(W layer) When pushing a layer with arguments onto the Perl I/O system you forgot the ) that closes the argument list. (Layers take care of transforming data between external and internal representations.) Perl stopped parsing the layer list at this point and did not attempt to push this layer. If your program didn't explicitly request the failing operation, it may be the result of the value of the environment variable PERLIO.
(D deprecated) Really old Perl let you omit the @ on array names in some spots. This is now heavily deprecated.
(D) You defined a character name which had multiple space characters in
a row. Change them to single spaces. Usually these names are defined
in the :alias import argument to use charnames, but they could be
defined by a translator installed into $^H{charnames}. See
CUSTOM ALIASES in charnames.
(X) The malloc package that comes with Perl had an internal failure.
(X) A general assertion failed. The file in question must be examined.
(F) When the "array_base" feature is disabled (e.g., under use v5.16;)
the special variable $[, which is deprecated, is now a fixed zero value.
(F) If you assign to a conditional operator, the 2nd and 3rd arguments must either both be scalars or both be lists. Otherwise Perl won't know which context to supply to the right side.
(W threads)(S) When using threaded Perl, a thread (not necessarily the main thread) exited while there were still other threads running. Usually it's a good idea first to collect the return values of the created threads by joining them, and only then to exit from the main thread. See threads.
(F) The failing code has attempted to get or set a key which is not in the current set of allowed keys of a restricted hash.
(F) The CLASSNAME argument to the bless() operator is expected to be the name of the package to bless the resulting object into. You've supplied instead a reference to something: perhaps you wrote
- bless $self, $proto;
when you intended
- bless $self, ref($proto);
If you actually want to bless into the stringified version of the reference supplied, you need to stringify it yourself, for example by:
- bless $self, "$proto";
(S debugging) An array was assigned to when it was being freed.
Freed values are not supposed to be visible to Perl code. This
can also happen if XS code calls av_clear
from a custom magic
callback on the array.
(F) The failing code attempted to delete from a restricted hash a key which is not in its key set.
(F) The failing code attempted to delete a key whose value has been declared readonly from a restricted hash.
(S internal) All SV objects are supposed to be allocated from arenas that will be garbage collected on exit. An SV was discovered to be outside any of those arenas.
(S internal) Perl maintains a reference-counted internal table of strings to optimize the storage and access of hash keys and other strings. This indicates someone tried to decrement the reference count of a string that can no longer be found in the table.
(S debugging) Mortalized values are supposed to be freed by the free_tmps() routine. This indicates that something else is freeing the SV before the free_tmps() routine gets a chance, which means that the free_tmps() routine will be freeing an unreferenced scalar when it does try to free it.
(S internal) The reference counts got screwed up on symbol aliases.
(S internal) Perl went to decrement the reference count of a scalar to see if it would go to 0, and discovered that it had already gone to 0 earlier, and should have been freed, and in fact, probably was freed. This could indicate that SvREFCNT_dec() was called too many times, or that SvREFCNT_inc() was called too few times, or that the SV was mortalized when it shouldn't have been, or that memory has been corrupted.
(F) You tried to join a thread from within itself, which is an impossible task. You may be joining the wrong thread, or you may need to move the join() to some other thread.
(W pack) You tried to pass a temporary value (like the result of a function, or a computed expression) to the "p" pack() template. This means the result contains a pointer to a location that could become invalid anytime, even before the end of the current statement. Use literals or global values as arguments to the "p" pack() template to avoid this warning.
(F) You tried to load a file with use or require that failed to
compile once already. Perl will not try to compile this file again
unless you delete its entry from %INC. See require and
%INC in perlvar.
(W misc) You tried to set the length of an array which has been freed. You can do this by storing a reference to the scalar representing the last index of an array and later assigning through that reference. For example
- $r = do {my @a; \$#a};
- $$r = 503;
(W substr) You supplied a reference as the first argument to substr() used as an lvalue, which is pretty strange. Perhaps you forgot to dereference it first. See substr.
(D deprecated) You have used the attributes pragma to modify the "locked" attribute on a code reference. The :locked attribute is obsolete, has had no effect since 5005 threads were removed, and will be removed in a future release of Perl 5.
(D deprecated) You have used the attributes pragma to modify the "unique" attribute on an array, hash or scalar reference. The :unique attribute has had no effect since Perl 5.8.8, and will be removed in a future release of Perl 5.
(S debugging) This indicates that something went wrong and Perl got very
confused about @_ or @DB::args being tied.
(F) You passed a buffer of the wrong size to one of msgctl(), semctl() or shmctl(). In C parlance, the correct sizes are, respectively, sizeof(struct msqid_ds *), sizeof(struct semid_ds *), and sizeof(struct shmid_ds *).
(F) You've used the /e switch to evaluate the replacement for a
substitution, but perl found a syntax error in the code to evaluate,
most likely an unexpected right brace '}'.
(F) A symbol was passed to something wanting a filehandle, but the symbol has no filehandle associated with it. Perhaps you didn't do an open(), or did it in another package.
(S malloc) An internal routine called free() on something that had never
been malloc()ed in the first place. Mandatory, but can be disabled by
setting environment variable PERL_BADFREE to 0.
This message can be seen quite often with DB_File on systems with "hard"
dynamic linking, like AIX and OS/2. It is a bug of Berkeley DB
which is left unnoticed if DB uses forgiving system malloc().
(P) One of the internal hash routines was passed a null HV pointer.
(A) You've accidentally run your script through csh instead of Perl. Check the #! line, or manually feed your script into Perl yourself.
(F) You started to name a symbol by using a package prefix, and then didn't finish the symbol. In particular, you can't interpolate outside of quotes, so
- $var = 'myvar';
- $sym = mypack::$var;
is not the same as
- $var = 'myvar';
- $sym = "mypack::$var";
(F) An extension using the keyword plugin mechanism violated the plugin API.
(S malloc) An internal routine called realloc() on something that
had never been malloc()ed in the first place. Mandatory, but can
be disabled by setting the environment variable PERL_BADFREE to 1.
(P) An internal request asked to add an array entry to something that wasn't a symbol table entry.
(P) An internal request asked to add a dirhandle entry to something that wasn't a symbol table entry.
(P) An internal request asked to add a filehandle entry to something that wasn't a symbol table entry.
(P) An internal request asked to add a hash entry to something that wasn't a symbol table entry.
(W bareword) The compiler found a bareword where it expected a conditional, which often indicates that an || or && was parsed as part of the last argument of the previous construct, for example:
- open FOO || die;
It may also indicate a misspelled constant that has been interpreted as a bareword:
The strict pragma is useful in avoiding such errors.
(F) With "strict subs" in use, a bareword is only allowed as a subroutine identifier, in curly brackets or to the left of the "=>" symbol. Perhaps you need to predeclare a subroutine?
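A minimal sketch of the places where barewords remain legal under "strict subs" (the hash and key names here are illustrative):

```perl
use strict;
use warnings;

# Barewords are still allowed to the left of "=>" and inside curly
# brackets when used as hash keys:
my %color_of = ( apple => "red" );
print $color_of{apple}, "\n";    # prints "red"

# Anywhere else, an undeclared bareword is a compile-time error:
# my $fruit = apple;   # Bareword "apple" not allowed while "strict subs" in use
```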
(W bareword) You used a qualified bareword of the form Foo::, but the compiler saw no other uses of that namespace before that point. Perhaps you need to predeclare a package?
(F) An untrapped exception was raised while executing a BEGIN subroutine. Compilation stops immediately and the interpreter is exited.
(F) Perl found a BEGIN {} subroutine (or a use directive, which implies a BEGIN {}) after one or more compilation errors had already occurred. Since the intended environment for the BEGIN {} could not be guaranteed (due to the errors), and since subsequent code likely depends on its correct operation, Perl just gave up.
(W syntax) Outside of patterns, backreferences live on as variables. The use of backslashes is grandfathered on the right-hand side of a substitution, but stylistically it's better to use the variable form because other Perl programmers will expect it, and it works better if there are more than 9 backreferences.
(W portable) The binary number you specified is larger than 2**32-1 (4294967295) and therefore non-portable between systems. See perlport for more on portability concerns.
(W closed) You tried to do a bind on a closed socket. Did you forget to check the return value of your socket() call? See bind.
(W unopened) You tried binmode() on a filehandle that was never opened. Check your control flow and number of arguments.
(W deprecated) Use of an unescaped "{" immediately following a \b or \B is now deprecated so as to reserve its use for Perl itself in a future release. You can either precede the brace with a backslash, or enclose it in square brackets; the latter is the way to go if the pattern delimiters are {}.
(W portable) Using bit vector sizes larger than 32 is non-portable.
(P) Perl detected an attempt to copy an internal value that is not copiable.
(W internal) A warning peculiar to VMS. While Perl was preparing to iterate over %ENV, it encountered a logical name or symbol definition which was too long, so it was truncated to the string shown.
(P) When starting a new thread or returning values from a thread, Perl encountered an invalid data type.
(F) A subroutine invoked from an external package via call_sv() exited by calling exit.
(W prototype) You've called a function that has a prototype before the parser saw a definition or declaration for it, and Perl could not check that the call conforms to the prototype. You need to either add an early prototype declaration for the subroutine in question, or move the subroutine definition ahead of the call to get proper prototype checking. Alternatively, if you are certain that you're calling the function correctly, you may put an ampersand before the name to avoid the warning. See perlsub.
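As a sketch (the subroutine name is illustrative), an early prototype declaration lets the parser check calls that appear before the definition:

```perl
use strict;
use warnings;

sub max_of ($$);    # early declaration: the parser now knows the prototype

# This call is checked against ($$), so no "called too early" warning:
my $biggest = max_of(2, 5);
print $biggest, "\n";    # 5

sub max_of ($$) { $_[0] > $_[1] ? $_[0] : $_[1] }
```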
(F) An argument to pack("w",...) was too large to compress. The BER compressed integer format can only be used with positive integers, and you attempted to compress Infinity or a very large number (> 1e308). See pack.
(F) An argument to pack("w",...) was negative. The BER compressed integer format can only be used with positive integers. See pack.
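A round-trip sketch of the BER compressed integer format with a positive value; a negative argument raises this fatal error:

```perl
# BER compressed integers round-trip only for non-negative values.
my $ber = pack("w", 300);
my ($n) = unpack("w", $ber);
print $n, "\n";                     # 300

# A negative argument dies, so trap it with eval:
my $err = !eval { pack("w", -1); 1 };
print "negative dies\n" if $err;
```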
(F) You manipulated Perl's symbol table directly, stored a reference in it, then tried to access that symbol via conventional Perl syntax. The access triggers Perl to autovivify that typeglob, but there is no legal conversion from that type of reference to a typeglob.
(P) Perl detected an attempt to copy a value to an internal type that cannot be directly assigned to.
(S io) You tried to apply an encoding that did not exist to a filehandle, either with open() or binmode().
(F) caller tried to set @DB::args, but found it tied. Tying @DB::args is not supported. (Before this error was added, it used to crash.)
(P) You somehow managed to call tie on an array that does not keep a reference count on its arguments and cannot be made to do so. Such arrays are not even supposed to be accessible to Perl code, but are only used internally.
(F) An argument to pack("w",...) was not an integer. The BER compressed integer format can only be used with positive integers, and you attempted to compress something else. See pack.
(F) Only hard references may be blessed. This is how Perl "enforces" encapsulation of objects. See perlobj.
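A minimal sketch (the package name is illustrative): bless takes a hard reference, never a plain value:

```perl
package Point;
sub new {
    my ($class, %args) = @_;
    # Only a hard reference can be blessed:
    return bless { x => $args{x}, y => $args{y} }, $class;
}

package main;
my $p = Point->new(x => 1, y => 2);
print ref($p), "\n";    # Point

# bless "Point";    # would die: can't bless a non-reference value
```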
(F) You called break, but you're in a foreach block rather than a given block. You probably meant to use next or last.
(F) You called break, but you're not inside a given block.
(F) You used the syntax of a method call, but the slot filled by the object reference or package name contains an undefined value. Something like this will reproduce the error:
- $BADREF = undef;
- process $BADREF 1,2,3;
- $BADREF->process(1,2,3);
(F) A method call must know in what package it's supposed to run. It ordinarily finds this out from the object reference you supply, but you didn't supply an object reference in this case. A reference isn't an object reference until it has been blessed. See perlobj.
(F) You used the syntax of a method call, but the slot filled by the object reference or package name contains an expression that returns a defined value which is neither an object reference nor a package name. Something like this will reproduce the error:
- $BADREF = 42;
- process $BADREF 1,2,3;
- $BADREF->process(1,2,3);
(F) You called perl -x/foo/bar, but /foo/bar is not a directory that you can chdir to, possibly because it doesn't exist.
(P) For some reason you can't check the filesystem of the script for nosuid.
(F) Certain types of SVs, in particular real symbol table entries (typeglobs), can't be forced to stop being what they are. So you can't say things like:
- *foo += 1;
You CAN say
- $foo = *foo;
- $foo += 1;
but then $foo no longer contains a glob.
(F) You called continue, but you're not inside a when or default block.
(P) An error peculiar to VMS. The process is suffering from exhausted quotas or other plumbing problems.
(F) Only scalar, array, and hash variables may be declared as "my", "our" or "state" variables. They must have ordinary identifiers as names.
(F) You have used a default block that is neither inside a foreach loop nor a given block. (Note that this error is issued on exit from the default block, so you won't get the error if you use an explicit continue.)
(S inplace) You tried to use the -i switch on a special file, such as a file in /dev, a FIFO or an uneditable directory. The file was ignored.
(S inplace) The creation of the new file failed for the indicated reason.
(F) You're on a system such as MS-DOS that gets confused if you try reading from a deleted (but still opened) file. You have to say -i.bak, or some such.
(S inplace) Your filesystem does not support filenames longer than 14 characters and Perl was unable to create a unique filename during inplace editing with the -i switch. The file was ignored.
(F) This machine doesn't have either waitpid() or wait4(), so only waitpid() without flags is emulated.
(F) The #! line specifies a switch that doesn't make sense at this point. For example, it'd be kind of silly to put a -x on the #! line.
(F) Your platform's byte-order is neither big-endian nor little-endian, or it has a very strange pointer size. Packing and unpacking big- or little-endian floating point values and pointers may not be possible. See pack.
(W exec) A system(), exec(), or piped open call could not execute the named program for the indicated reason. Typical reasons include: the permissions were wrong on the file, the file wasn't found in $ENV{PATH}, the executable in question was compiled for another architecture, or the #! line in a script points to an interpreter that can't be run for similar reasons. (Or maybe your system doesn't support #! at all.)
(F) Perl was trying to execute the indicated program for you because that's what the #! line said. If that's not what you wanted, you may need to mention "perl" on the #! line somewhere.
(F) You used the -S switch, but the copies of the script to execute found in the PATH did not have correct permissions.
(F) A string of a form CORE::word was given to prototype(), but there is no builtin with the name word.
(F) You used \p{} or \P{} but the character property by that name could not be found. Maybe you misspelled the name of the property? See Properties accessible through \p{} and \P{} in perluniprops for a complete list of available official properties.
(F) You said to goto a label that isn't mentioned anywhere that it's possible for us to go to. See goto.
(F) You used the -S switch, but the script to execute could not be found in the PATH.
(F) You used the -S switch, but the script to execute could not be found in the PATH, or at least not with the correct permissions. The script exists in the current directory, but PATH prohibits running it.
(F) Perl strings can stretch over multiple lines. This message means that the closing delimiter was omitted. Because bracketed quotes count nesting levels, the following is missing its final parenthesis:
- print q(The character '(' starts a side comment.);
If you're getting this error from a here-document, you may have included unseen whitespace before or after your closing tag or there may not be a linebreak after it. A good programmer's editor will have a way to help you find these characters (or lack of characters). See perlop for the full details on here-documents.
(F) You may have tried to use \p which means a Unicode property (for example \p{Lu} matches all uppercase letters). If you did mean to use a Unicode property, see Properties accessible through \p{} and \P{} in perluniprops for a complete list of available properties. If you didn't mean to use a Unicode property, escape the \p, either by \\p (just the \p) or by \Q\p (the rest of the string, or until \E).
(F) A fatal error occurred while trying to fork while opening a pipeline.
(W pipe) A fork in a piped open failed with EAGAIN and will be retried after five seconds.
(S) A warning peculiar to VMS. This arises because of the difference between access checks under VMS and under the Unix model Perl assumes. Under VMS, access checks are done by filename, rather than by bits in the stat buffer, so that ACLs and other protections can be taken into account. Unfortunately, Perl assumes that the stat buffer contains all the necessary information, and passes it, instead of the filespec, to the access-checking routine. It will try to retrieve the filespec using the device name and FID present in the stat buffer, but this works only if you haven't made a subsequent call to the CRTL stat() routine, because the device name is overwritten with each call. If this warning appears, the name lookup failed, and the access-checking routine gave up and returned FALSE, just to be conservative. (Note: The access-checking routine knows about the Perl stat operator and file tests, so you shouldn't ever see this warning in response to a Perl command; it arises only if some internal code takes stat buffers lightly.)
(P) An error peculiar to VMS. After creating a mailbox to act as a pipe, Perl can't retrieve its name for later use.
(P) An error peculiar to VMS. Perl asked $GETSYI how big you want your mailbox buffers to be, and didn't get an answer.
(F) A "goto" statement was executed to jump into the middle of a foreach loop. You can't get there from here. See goto.
(F) A "goto" statement was executed to jump out of what might look like a block, except that it isn't a proper block. This usually occurs if you tried to jump out of a sort() block or subroutine, which is a no-no. See goto.
(F) The "goto subroutine" call can't be used to jump out of the comparison sub for a sort(), or from a similar callback (such as the reduce() function in List::Util).
(F) The "goto subroutine" call can't be used to jump out of an eval "string" or block.
(F) The deeply magical "goto subroutine" call can only replace one subroutine call for another. It can't manufacture one out of whole cloth. In general you should be calling it out of only an AUTOLOAD routine anyway. See goto.
(W signal) Perl has detected that it is being run with the SIGCHLD signal (sometimes known as SIGCLD) disabled. Since disabling this signal will interfere with proper determination of exit status of child processes, Perl has reset the signal to its default value. This situation typically indicates that the parent program under which Perl may be running (e.g. cron) is being very careless.
(F) Process identifiers must be (signed) integers. It is a fatal error to attempt to kill() an undefined, empty-string or otherwise non-numeric process identifier.
(F) A "last" statement was executed to break out of the current block, except that there's this itty bitty problem called there isn't a current block. Note that an "if" or "else" block doesn't count as a "loopish" block, nor does a block given to sort(), map() or grep(). You can usually double the curlies to get the same effect though, because the inner curlies will be considered a block that loops once. See last.
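A sketch of the doubled-curlies idiom: the inner bare block counts as a loop that runs once, so "last" has somewhere legal to go:

```perl
my $found = 0;
if (1) {{              # doubled curlies: inner bare block is a one-pass loop
    $found = 1;
    last;              # exits the inner block; illegal in a plain if block
    $found = 2;        # never reached
}}
print $found, "\n";    # 1
```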
(F) Perl tried to calculate the method resolution order (MRO) of a package, but failed because the package stash has no name.
(F) The module you tried to load failed to load a dynamic extension. This may either mean that you upgraded your version of perl to one that is incompatible with your old dynamic extensions (which is known to happen between major versions of perl), or (more likely) that your dynamic extension was built against an older version of the library that is installed on your system. You may need to rebuild your old dynamic extensions.
(F) You used local on a variable name that was previously declared as a lexical variable using "my" or "state". This is not allowed. If you want to localize a package variable of the same name, qualify it with the package name.
(F) You said something like local $$ref, which Perl can't currently handle, because when it goes to restore the old value of whatever $ref pointed to after the scope of the local() is finished, it can't be sure that $ref will still be a reference.
(F) You said to do (or require, or use) a file that couldn't be found. Perl looks for the file in all the locations mentioned in @INC, unless the file name included the full path to the file. Perhaps you need to set the PERL5LIB or PERL5OPT environment variable to say where the extra library is, or maybe the script needs to add the library name to @INC. Or maybe you just misspelled the name of the file. See require and lib.
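A sketch of adding a search directory to @INC (the directory name here is illustrative) so that require or use can find the file:

```perl
# The lib pragma prepends its arguments to @INC at compile time:
use lib './local-lib';    # hypothetical extra library directory

# The runtime equivalent, which must happen before the require:
# BEGIN { unshift @INC, './local-lib'; }

print "added\n" if grep { $_ eq './local-lib' } @INC;
```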
(F) A function (or method) was called in a package which allows autoload, but there is no function to autoload. Most probable causes are a misprint in a function/method name or a failure to AutoSplit the file, say, by doing make install.
(F) The module you loaded is trying to load an external library, like for example, foo.so or bar.dll, but the DynaLoader module was unable to locate this library. See DynaLoader.
(F) You called a method correctly, and it correctly indicated a package functioning as a class, but that package doesn't define that particular method, nor does any of its base classes. See perlobj.
(W syntax) The @ISA array contained the name of another package that doesn't seem to exist.
(F) You tried to use in open() a PerlIO layer that does not exist, e.g. open(FH, ">:nosuchlayer", "somefile").
(F) List assignment to %ENV is not supported on some systems, notably VMS.
(W) A module passed the flag 0x01 to DynaLoader::dl_load_file() to request that symbols from the stated file are made available globally within the process, but that functionality is not available on this platform. Whilst the module likely will still work, this may prevent the perl interpreter from loading other XS-based extensions which need to link directly to functions defined in the C or XS code in the stated file.
(F) You aren't allowed to assign to the item indicated, or otherwise try to change it, such as with an auto-increment.
(P) The internal routine that does assignment to a substr() was handed a NULL.
(F) Subroutines meant to be used in lvalue context should be declared as such. See Lvalue subroutines in perlsub.
(F) The target of a msgrcv must be modifiable to be used as a receive buffer.
(F) A "next" statement was executed to reiterate the current block, but there isn't a current block. Note that an "if" or "else" block doesn't count as a "loopish" block, nor does a block given to sort(), map() or grep(). You can usually double the curlies to get the same effect though, because the inner curlies will be considered a block that loops once. See next.
(F) You tried to run a perl built with MAD support with the PERL_XMLDUMP environment variable set, but the file named by that variable could not be opened.
(S inplace) The implicit opening of a file through use of the <> filehandle, either implicitly under the -n or -p command-line switches, or explicitly, failed for the indicated reason. Usually this is because you don't have read permission for a file which you named on the command line.
(F) You tried to call perl with the -e switch, but /dev/null (or your operating system's equivalent) could not be opened.
(W io) You tried to open a scalar reference for reading or writing, using the 3-arg open() syntax:
- open FH, '>', $ref;
but your version of perl is compiled without perlio, and this form of open is not supported.
(W pipe) You tried to say open(CMD, "|cmd|"), which is not supported. You can try any of several modules in the Perl library to do this, such as IPC::Open2. Alternately, direct the pipe's output to a file using ">", and then read it in under a different file handle.
(F) An error peculiar to VMS. Perl does its own command line redirection, and couldn't open the file specified after '2>' or '2>>' on the command line for writing.
(F) An error peculiar to VMS. Perl does its own command line redirection, and couldn't open the file specified after '<' on the command line for reading.
(F) An error peculiar to VMS. Perl does its own command line redirection, and couldn't open the file specified after '>' or '>>' on the command line for writing.
(P) An error peculiar to VMS. Perl does its own command line redirection, and couldn't open the pipe into which to send data destined for stdout.
(F) The script you specified can't be opened for the indicated reason. If you're debugging a script that uses #!, and normally relies on the shell's $PATH search, the -S option causes perl to do that search, so you don't have to type the path or `which $scriptname`.
(S) A warning peculiar to VMS. Perl tried to read an element of %ENV from the CRTL's internal environment array and discovered the array was missing. You need to figure out where your CRTL misplaced its environ or define PERL_ENV_TABLES (see perlvms) so that environ is not searched.
(F) A "redo" statement was executed to restart the current block, but there isn't a current block. Note that an "if" or "else" block doesn't count as a "loopish" block, nor does a block given to sort(), map() or grep(). You can usually double the curlies to get the same effect though, because the inner curlies will be considered a block that loops once. See redo.
(S inplace) You requested an inplace edit without creating a backup file. Perl was unable to remove the original file to replace it with the modified file. The file was left unmodified.
(S inplace) The rename done by the -i switch failed for some reason, probably because you don't have write permission to the directory.
(P) An error peculiar to VMS. Perl thought stdin was a pipe, and tried to reopen it to accept binary data. Alas, it failed.
(F) You called reset('E') or similar, which tried to reset all variables in the current package beginning with "E". In the main package, that includes %ENV. Resetting %ENV is not supported on some systems, notably VMS.
(F)(P) Error resolving overloading specified by a method name (as opposed to a subroutine reference): no such method callable via the package. If the method name is ???, this is an internal error.
(F) Perl detected an attempt to return illegal lvalues (such as temporary or readonly values) from a subroutine used as an lvalue. This is not allowed.
(F) The return statement was executed in mainline code, that is, where there was no subroutine call to return out of. See perlsub.
(F) You tried to return a complete array or hash from an lvalue subroutine, but you called the subroutine in a way that made Perl think you meant to return only one value. You probably meant to write parentheses around the call to the subroutine, which tell Perl that the call should be in list context.
(P) For some reason you can't fstat() the script even though you have it open already. Bizarre.
(F) For ordinary real numbers, you can't take the logarithm of a negative number or zero. There's a Math::Complex package that comes standard with Perl, though, if you really want to do that for the negative numbers.
(F) For ordinary real numbers, you can't take the square root of a negative number. There's a Math::Complex package that comes standard with Perl, though, if you really want to do that.
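A sketch of the Math::Complex escape hatch the entry mentions: loading the module overrides sqrt so negative arguments yield complex results instead of this fatal error:

```perl
use Math::Complex;    # core module; its sqrt handles negative reals

my $root = sqrt(-4);
print "$root\n";              # 2i
print $root * $root, "\n";    # -4
```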
(F) You can't undefine a routine that's currently running. You can, however, redefine it while it's running, and you can even undef the redefined subroutine while the old routine is running. Go figure.
(P) The internal sv_upgrade routine adds "members" to an SV, making it into a more specialized kind of SV. The top several SV types are so specialized, however, that they cannot be interconverted. This message indicates that such a conversion was attempted.
(F) You tried to call perl with the -m switch, but you put something other than "=" after the module name.
(F) The internal routine that does method lookup was handed a symbol table that doesn't have a name. Symbol tables can become anonymous for example by undefining stashes: undef %Some::Package::.
(F) A value used as either a hard reference or a symbolic reference must be a defined value. This helps to delurk some insidious errors.
(F) Only hard references are allowed by "strict refs". Symbolic references are disallowed. See perlref.
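A minimal sketch of the distinction "strict refs" enforces (variable names are illustrative):

```perl
use strict 'refs';

my $value = 42;
my $ref   = \$value;     # hard reference: always allowed
print $$ref, "\n";       # 42

my $name = "value";
# print ${$name};        # would die: symbolic references are disallowed
```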
(F) The first time the %! hash is used, perl automatically loads the Errno.pm module. The Errno module is expected to tie the %! hash to provide symbolic names for $! errno values.
(F) A type cannot be forced to have both big-endian and little-endian byte-order at the same time, so this combination of modifiers is not allowed. See pack.
(F) Only a simple scalar variable may be used as a loop variable on a foreach.
(F) You tried to declare a magical variable as a lexical variable. This is not allowed, because the magic can be tied to only one location (namely the global variable) and it would be incredibly confusing to have variables in your program that looked like magical variables but weren't.
(F) You attempted to force a different byte-order on a type that is already inside a group with a byte-order modifier. For example you cannot force little-endianness on a type that is inside a big-endian group.
(F) The global variables $a and $b are reserved for sort comparisons. You mentioned $a or $b in the same line as the <=> or cmp operator, and the variable had earlier been declared as a lexical variable. Either qualify the sort variable with the package name, or rename the lexical variable.
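A sketch of the working form: $a and $b must remain the package globals, so declare them with "our" (or not at all), never with "my":

```perl
use strict;
use warnings;
our ($a, $b);    # package globals; "my ($a, $b)" here would trigger the error

my @sorted = sort { $a <=> $b } (3, 1, 2);
print "@sorted\n";    # 1 2 3
```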
(F) You've mixed up your reference types. You have to dereference a reference of the type needed. You can use the ref() function to test the type of the reference, if need be.
(F) You've told Perl to dereference a string, something which use strict blocks to prevent it happening accidentally. See Symbolic references in perlref. This can be triggered by an @ or $ in a double-quoted string immediately before interpolating a variable, for example in "user @$twitter_id", which says to treat the contents of $twitter_id as an array reference; use a \ to have a literal @ symbol followed by the contents of $twitter_id: "user \@$twitter_id".
(F) The compiler tried to interpret a bracketed expression as a subscript. But to the left of the brackets was an expression that didn't look like a hash or array reference, or anything else subscriptable.
(W syntax) In an ordinary expression, backslash is a unary operator that creates a reference to its argument. The use of backslash to indicate a backreference to a matched substring is valid only as part of a regular expression pattern. Trying to do this in ordinary Perl code produces a value that prints out looking like SCALAR(0xdecaf). Use the $1 form instead.
(F) You attempted to weaken something that was not a reference. Only references can be weakened.
(F) You have used a when() block that is neither inside a foreach loop nor a given block. (Note that this error is issued on exit from the when block, so you won't get the error if the match fails, or if you use an explicit continue.)
(F) You tried to repeat a constant value (often the undefined value) with an assignment operator, which implies modifying the value itself. Perhaps you need to copy the value to a temporary, and repeat that.
(F)(W deprecated, syntax) In \cX, X must be an ASCII character. It is planned to make this fatal in all instances in Perl v5.20. In the cases where it isn't fatal, the character this evaluates to is derived by exclusive or'ing the code point of this character with 0x40. Note that non-alphabetic ASCII characters are discouraged here as well, and using non-printable ones will be deprecated starting in v5.18.
(W pack) You said
- pack("C", $x)
where $x is either less than 0 or more than 255; the "C" format is only for encoding native operating system characters (ASCII, EBCDIC, and so on) and not for Unicode characters, so Perl behaved as if you meant
- pack("C", $x & 255)
If you actually want to pack Unicode codepoints, use the "U" format instead.
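A sketch of the wrapping the entry describes, and of "U" keeping the full code point:

```perl
# 300 doesn't fit in a byte; "C" wraps it to 300 & 255 == 44
# (with a warning under "use warnings 'pack'"):
my ($wrapped) = unpack("C", pack("C", 300));
print $wrapped, "\n";    # 44

# "U" stores the full Unicode code point:
my ($kept) = unpack("U", pack("U", 300));
print $kept, "\n";       # 300
```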
(W pack) You said
- pack("U0W", $x)
where $x is either less than 0 or more than 255. However, U0-mode expects all values to fall in the interval [0, 255], so Perl behaved as if you meant:
- pack("U0W", $x & 255)
(W pack) You said
- pack("c", $x)
where $x is either less than -128 or more than 127; the "c" format is only for encoding native operating system characters (ASCII, EBCDIC, and so on) and not for Unicode characters, so Perl behaved as if you meant
- pack("c", $x & 255);
If you actually want to pack Unicode codepoints, use the "U" format instead.
(W unpack) You tried something like
- unpack("H", "\x{2a1}")
where the format expects to process a byte (a character with a value below 256), but a higher value was provided instead. Perl uses the value modulus 256 instead, as if you had provided:
- unpack("H", "\x{a1}")
(W pack) You tried something like
- pack("u", "\x{1f3}b")
where the format expects to process a sequence of bytes (character with a value below 256), but some of the characters had a higher value. Perl uses the character values modulus 256 instead, as if you had provided:
- pack("u", "\x{f3}b")
(W unpack) You tried something like
- unpack("s", "\x{1f3}b")
where the format expects to process a sequence of bytes (character with a value below 256), but some of the characters had a higher value. Perl uses the character values modulus 256 instead, as if you had provided:
- unpack("s", "\x{f3}b")
(D deprecated, syntax) The \cX construct is intended to be a way to specify non-printable characters. You used it with a "{" which evaluates to ";", which is printable. It is planned to remove the ability to specify a semi-colon this way in Perl 5.20. Just use a semi-colon or a backslash-semi-colon without the "\c".
(W syntax) The \cX construct is intended to be a way to specify non-printable characters. You used it for a printable one, which is better written as simply itself, perhaps preceded by a backslash for non-word characters.
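A sketch of what \cX computes, namely the code point of X xor'ed with 0x40, which is why it is meant for control characters:

```perl
# "\cI" is control-I: ord("I") is 0x49, and 0x49 ^ 0x40 is 0x09, a tab.
print ord("\cI"), "\n";                          # 9
print "\cI" eq "\t" ? "tab\n" : "not tab\n";     # tab

# Using \c on an already-printable target (say "\c+", which yields "k")
# just produces another printable character, which is what this
# warning flags under "use warnings 'syntax'".
```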
(F) Creating a new thread inside the s/// operator is not supported.
(W unopened) You tried to close a filehandle that was never opened.
(W io) The dirhandle you tried to close is either closed or not really a dirhandle. Check your control flow.
(F) If a closure has attributes, the subroutine passed to an attribute handler is the prototype that is cloned when a new closure is created. This subroutine cannot be called.
(F) You had a (sub-)template that ends with a '/'. There must be another template code following the slash. See pack.
(S utf8, non_unicode) You had a code point above the Unicode maximum of U+10FFFF.
Perl allows strings to contain a superset of Unicode code points, up to the limit of what is storable in an unsigned integer on your system, but these may not be accepted by other languages/systems. At one time, it was legal in some standards to have code points up to 0x7FFF_FFFF, but not higher. Code points above 0xFFFF_FFFF require a word larger than 32 bits.
None of the Unicode or Perl-defined properties will match a non-Unicode code point. For example,
- chr(0x7FF_FFFF) =~ /\p{Any}/
will not match, because the code point is not in Unicode. But
- chr(0x7FF_FFFF) =~ /\P{Any}/
will match.
This may be counterintuitive at times, as both these fail:
- chr(0x110000) =~ /\p{ASCII_Hex_Digit=True}/ # Fails.
- chr(0x110000) =~ /\p{ASCII_Hex_Digit=False}/ # Fails!
and both these succeed:
- chr(0x110000) =~ /\P{ASCII_Hex_Digit=True}/ # Succeeds.
- chr(0x110000) =~ /\P{ASCII_Hex_Digit=False}/ # Succeeds!
(A) You've accidentally run your script through csh or another shell instead of Perl. Check the #! line, or manually feed your script into Perl yourself. The #! line at the top of your file could look like
- #!/usr/bin/perl -w
(F) Perl could not compile a file specified in a require statement. Perl uses this generic message when none of the errors that it encountered were severe enough to halt compilation immediately.
(W regexp) The regular expression engine uses recursion in complex situations where back-tracking is required. Recursion depth is limited to 32766, or perhaps less in architectures where the stack cannot grow arbitrarily. ("Simple" and "medium" situations are handled without recursion and are not subject to a limit.) Try shortening the string under examination; looping in Perl code (e.g. with while) rather than in the regular expression engine; or rewriting the regular expression so that it is simpler or backtracks less. (See perlfaq2 for information on Mastering Regular Expressions.)
(W threads) Within a thread-enabled program, you tried to call cond_broadcast() on a variable which wasn't locked. The cond_broadcast() function is used to wake up another thread that is waiting in a cond_wait(). To ensure that the signal isn't sent before the other thread has a chance to enter the wait, it is usual for the signaling thread first to wait for a lock on variable. This lock attempt will only succeed after the other thread has entered cond_wait() and thus relinquished the lock.
(W threads) Within a thread-enabled program, you tried to call cond_signal() on a variable which wasn't locked. The cond_signal() function is used to wake up another thread that is waiting in a cond_wait(). To ensure that the signal isn't sent before the other thread has a chance to enter the wait, it is usual for the signaling thread first to wait for a lock on variable. This lock attempt will only succeed after the other thread has entered cond_wait() and thus relinquished the lock.
(W closed) You tried to do a connect on a closed socket. Did you forget to check the return value of your socket() call? See connect.
(F) The subroutine registered to handle constant overloading (see overload) or a custom charnames handler (see CUSTOM TRANSLATORS in charnames) returned an undefined value.
(F) The parser found inconsistencies while attempting to define an overloaded constant. Perhaps you forgot to load the corresponding overload pragma?
(F) The parser found inconsistencies either while attempting to define an overloaded constant, or when trying to find the character name specified in the \N{...} escape. Perhaps you forgot to load the corresponding overload pragma?
(F) A constant value (perhaps declared using the use constant pragma) is being dereferenced, but it amounts to the wrong type of reference. The message indicates the type of reference that was expected. This usually indicates a syntax error in dereferencing the constant value. See Constant Functions in perlsub and constant.
(W redefine) You redefined a subroutine which had previously been eligible for inlining. See Constant Functions in perlsub for commentary and workarounds.
(W misc) You undefined a subroutine which had previously been eligible for inlining. See Constant Functions in perlsub for commentary and workarounds.
(F) The method which overloads "=" is buggy. See Copy Constructor in overload.
(F) You tried to call a subroutine in the CORE:: namespace with &foo syntax or through a reference. Some subroutines in this package cannot yet be called that way, but must be called as barewords. Something like this will work:
- BEGIN { *shove = \&CORE::push; }
- shove @array, 1,2,3; # pushes on to @array
(F) The CORE:: namespace is reserved for Perl keywords.
(P) The regular expression engine got confused by what the regular expression compiler gave it.
(P) The regular expression engine got passed a regexp program without a valid magic number.
(P) The malloc package that comes with Perl had an internal failure.
(F) This is either an error in Perl, or, if you're using one, your custom regular expression engine. If not the latter, report the problem through the perlbug utility.
(F) You had an unpack template indicating a counted-length string, but you have also specified an explicit size for the string. See pack.
(W recursion) This subroutine has called itself (directly or indirectly) 100 times more than it has returned. This probably indicates an infinite recursion, unless you're writing strange benchmark programs, in which case it indicates something else.
This threshold can be changed from 100, by recompiling the perl binary,
setting the C pre-processor macro PERL_SUB_DEPTH_WARN
to the desired value.
(D deprecated) defined() is not usually useful on arrays because it
checks for an undefined scalar value. If you want to see if the
array is empty, just use if (@array) { # not empty } for example.
(D deprecated) defined() is not usually right on hashes and has been discouraged since 5.004. Although defined %hash is false on a plain not-yet-used hash, it becomes true in several non-obvious circumstances, including iterators, weak references, stash names, even remaining true after undef %hash. These things make defined %hash fairly useless in practice.
If a check for non-empty is what you wanted then just put it in boolean context (see Scalar values in perldata):
- if (%hash) {
- # not empty
- }
If you had defined %Foo::Bar::QUUX to check whether such a package variable exists then that's never really been reliable, and isn't a good way to enquire about the features of a package, or whether it's loaded, etc.
(F) You used something like (?(DEFINE)...|..) which is illegal. The
most likely cause of this error is that you left out a parenthesis inside
of the .... part.
The <-- HERE shows whereabouts in the regular expression the problem was discovered.
(F) You said something like "use Module 42" but in the Module file there are neither package declarations nor a $VERSION.
(F) In a here document construct like <<FOO, the label FOO is too long for Perl to handle. You have to be seriously twisted to write code that triggers this error.
(D deprecated) You used a declaration similar to my $x if 0. There has been a long-standing bug in Perl that causes a lexical variable not to be cleared at scope exit when its declaration includes a false conditional. Some people have exploited this bug to achieve a kind of static variable. Since we intend to fix this bug, we don't want people relying on this behavior. You can achieve a similar static effect by declaring the variable in a separate block outside the function, eg
- sub f { my $x if 0; return $x++ }
becomes
- { my $x; sub f { return $x++ } }
Beginning with perl 5.9.4, you can also use state variables to have
lexicals that are initialized only once (see feature):
- sub f { state $x; return $x++ }
(F) A DESTROY() method created a new reference to the object which is just being DESTROYed. Perl is confused, and prefers to abort rather than to create a dangling reference.
See Server error.
(F) A required (or used) file must return a true value to indicate that it compiled correctly and ran its initialization code correctly. It's traditional to end such a file with a "1;", though any true value would do. See require.
(W misc) You probably referred to an imported subroutine &FOO as $FOO or some such.
(W misc) Remember that "our" does not localize the declared global variable. You have declared it again in the same lexical scope, which seems superfluous.
(W) You probably said %hash{$key} when you meant $hash{$key} or @hash{@keys}. On the other hand, maybe you just meant %hash and got carried away.
(F) You passed die() an empty string (the equivalent of die "") or you called it with no args and $@ was empty.
See Server error.
(F) You said something like "use Module 42" but the Module did not define a $VERSION.
(F) You cannot put a repeat count of any kind right after the '/' code. See pack.
(P) The internal handling of magical variables has been cursed.
(P) This should have been caught by safemalloc() instead.
(S syntax) This is an educated guess made in conjunction with the message "%s found where operator expected". It often means a subroutine or module name is being referenced that hasn't been declared yet. This may be because of ordering problems in your file, or because of a missing "sub", "package", "require", or "use" statement. If you're referencing something that isn't defined yet, you don't actually have to define the subroutine or package before the current location. You can use an empty "sub foo;" or "package FOO;" to enter a "forward" declaration.
(W misc) You used the obsolescent dump() built-in function, without fully qualifying it as CORE::dump(). Maybe it's a typo. See dump.
(F) Your machine doesn't support dump/undump.
(S malloc) An internal routine called free() on something that had already been freed.
(W unpack) You have applied the same modifier more than once after a type in a pack template. See pack.
(S syntax) There is no keyword "elseif" in Perl because Larry thinks it's ugly. Your code will be interpreted as an attempt to call a method named "elseif" for the class returned by the following block. This is unlikely to be what you want.
(F) \p and \P are used to introduce a named Unicode property, as described in perlunicode and perlre. You used \p or \P in a regular expression without specifying the property name.
(F) While under the use filetest pragma, switching the real and effective uids or gids failed.
(F) You're running under taint mode, and the %ENV variable has been aliased to another hash, so it no longer reflects the state of the program's environment. This is potentially insecure.
(F) An error peculiar to VMS. Because Perl may have to deal with file specifications in either VMS or Unix syntax, it converts them to a single form when it must operate on them directly. Either you've passed an invalid file specification to Perl, or you've found a case the conversion routines don't handle. Drat.
(D deprecated) You compiled a regular expression pattern with /x to
ignore white space, and you used, as a literal, one of the characters
that Perl plans to eventually treat as white space. The character must
be escaped somehow, or it will work differently on a future Perl that
does treat it as white space. The easiest way is to insert a backslash
immediately before it, or to enclose it with square brackets. This
change is to bring Perl into conformance with Unicode recommendations.
Here are the five characters that generate this warning: U+0085 NEXT LINE, U+200E LEFT-TO-RIGHT MARK, U+200F RIGHT-TO-LEFT MARK, U+2028 LINE SEPARATOR, and U+2029 PARAGRAPH SEPARATOR.
(F) Perl detected tainted data when trying to compile a regular
expression that contains the (?{ ... }) zero-width assertion, which
is unsafe. See (?{ code }) in perlre, and perlsec.
(F) Perl tried to compile a regular expression containing the
(?{ ... }) zero-width assertion at run time, as it would when the
pattern contains interpolated values. Since that is a security risk,
it is not allowed. If you insist, you may still do this by using the re 'eval' pragma or by explicitly building the pattern from an
interpolated string at run time and using that in an eval(). See
(?{ code }) in perlre.
(F) A regular expression contained the (?{ ... }) zero-width
assertion, but that construct is only allowed when the use re 'eval' pragma is in effect. See (?{ code }) in perlre.
(F) You used a pattern that nested too many EVAL calls without consuming any text. Restructure the pattern so that text is consumed.
The <-- HERE shows whereabouts in the regular expression the problem was discovered.
(F) The contents of a <> operator may not exceed the maximum size of a Perl identifier. If you're just trying to glob a long list of filenames, try using the glob() operator, or put the filenames into a variable and glob that.
(F) The exec function is not implemented on some systems, e.g., Symbian
OS. See perlport.
(F) The final summary message when a Perl compilation fails.
(W exiting) You are exiting an eval by unconventional means, such as a goto, or a loop control statement.
(W exiting) You are exiting a format by unconventional means, such as a goto, or a loop control statement.
(W exiting) You are exiting a rather special block construct (like a sort block or subroutine) by unconventional means, such as a goto, or a loop control statement. See sort.
(W exiting) You are exiting a subroutine by unconventional means, such as a goto, or a loop control statement.
(W exiting) You are exiting a substitution by unconventional means, such as a return, a goto, or a loop control statement.
(F) You wrote something like
- (?13
to denote a capturing group of the form (?PARNO), but omitted the ")".
(F) To use lexical subs, you must first enable them:
- no warnings 'experimental::lexical_subs';
- use feature 'lexical_subs';
(W misc) You are blessing a reference to a zero length string. This has the effect of blessing the reference into the package main. This is usually not what you want. Consider providing a default target package, e.g. bless($ref, $p || 'MyPackage');
(A) You've accidentally run your script through csh instead of Perl. Check the #! line, or manually feed your script into Perl yourself.
(F) An untrapped exception was raised while executing a UNITCHECK, CHECK, INIT, or END subroutine. Processing of the remainder of the queue of such routines has been prematurely ended.
(W regexp) A character class range must start and end at a literal character, not another character class like \d or [:alpha:]. The "-" in your false range is interpreted as a literal "-". Consider quoting the "-", "\-". The <-- HERE shows whereabouts in the regular expression the problem was discovered. See perlre.
(P) An error peculiar to VMS. Something untoward happened in a VMS system service or RTL routine; Perl's exit status should provide more details. The filename in "at %s" and the line number in "line %d" tell you which section of the Perl source code is distressed.
(F) Your machine apparently doesn't implement fcntl(). What is this, a PDP-11 or something?
(F) A tied array claimed to have a negative number of elements, which is not possible.
(W pack) Each line in a uuencoded string starts with a length indicator which can't encode values above 63. So there is no point in asking for a line length bigger than that. Perl will behave as if you specified u63 as the format.
(W io) You tried to write on a read-only filehandle. If you intended it to be a read-write filehandle, you needed to open it with "+<" or "+>" or "+>>" instead of with "<" or nothing. If you intended only to write the file, use ">" or ">>". See open.
(W io) You tried to read from a filehandle opened only for writing. If you intended it to be a read/write filehandle, you needed to open it with "+<" or "+>" or "+>>" instead of with ">". If you intended only to read from the file, use "<". See open. Another possibility is that you attempted to open filedescriptor 0 (also known as STDIN) for output (maybe you closed STDIN earlier?).
(W io) You opened for reading a filehandle that got the same filehandle id as STDOUT or STDERR. This occurred because you closed STDOUT or STDERR previously.
(W io) You opened for writing a filehandle that got the same filehandle id as STDIN. This occurred because you closed STDIN previously.
(F) You must now decide whether the final $ in a string was meant to be a literal dollar sign, or was meant to introduce a variable name that happens to be missing. So you have to put either the backslash or the name.
(W closed) The filehandle you're attempting to flock() got itself closed some time before now. Check your control flow. flock() operates on filehandles. Are you attempting to call flock() on a dirhandle by the same name?
(F) A format must be terminated by a line with a solitary dot. Perl got to the end of your file without finding such a line.
(W redefine) You redefined a format. To suppress this warning, say
- { no warnings 'redefine'; eval "format NAME =..."; }
(W syntax) You said
- if ($foo = 123)
when you meant
- if ($foo == 123)
(or something like that).
(S syntax) The Perl lexer knows whether to expect a term or an operator. If it sees what it knows to be a term when it was expecting to see an operator, it gives you this warning. Usually it indicates that an operator or delimiter was omitted, such as a semicolon.
(S) A warning from the GDBM_File extension that a store failed.
(F) Your C library apparently doesn't implement gethostent(), probably because if it did, it'd feel morally obligated to return every hostname on the Internet.
(W closed) You tried to get a socket or peer socket name on a closed socket. Did you forget to check the return value of your socket() call?
(S) A warning peculiar to VMS. The call to sys$getuai underlying the getpwnam operator returned an invalid UIC.
(W closed) You tried to get a socket option on a closed socket. Did you forget to check the return value of your socket() call? See getsockopt.
(S experimental::smartmatch) given depends on both a lexical $_ and smartmatch, both of which are experimental, so its behavior may change or even be removed in any future release of perl.
See the explanation under Experimental Details on given and when in perlsyn.
(F) You've said "use strict" or "use strict vars", which indicates that all variables must either be lexically scoped (using "my" or "state"), declared beforehand using "our", or explicitly qualified to say which package the global variable is in (using "::").
(S glob) Something went wrong with the external program(s) used for glob and <*.c>. Usually, this means that you supplied a glob pattern that caused the external program to fail and exit with a nonzero status. If the message indicates that the abnormal exit resulted in a coredump, this may also mean that your csh (C shell) is broken. If so, you should change all of the csh-related variables in config.sh: If you have tcsh, make the variables refer to it as if it were csh (e.g. full_csh='/usr/bin/tcsh'); otherwise, make them all empty (except that d_csh should be 'undef') so that Perl will think csh is missing. In either case, after editing config.sh, run ./Configure -S and rebuild Perl.
(F) The lexer saw a left angle bracket in a place where it was expecting a term, so it's looking for the corresponding right angle bracket, and not finding it. Chances are you left some needed parentheses out earlier in the line, and you really meant a "less than".
(W overflow) You called gmtime with a number that was larger than
it can reliably handle and gmtime probably returned the wrong
date. This warning is also triggered with NaN (the special
not-a-number value).
(W overflow) You called gmtime with a number that was smaller than
it can reliably handle and gmtime probably returned the wrong date.
(P) An error peculiar to OS/2. Most probably you're using an obsolete version of Perl, and this should not happen anyway.
(F) Unlike with "next" or "last", you're not allowed to goto an unspecified destination. See goto.
(F) You tried to call a subroutine with goto &sub syntax, but the indicated subroutine hasn't been defined, or if it was, it has since been undefined.
(F) A ()-group started with a count. A count is supposed to follow something: a template character or a ()-group. See pack.
(F) Group names must follow the rules for perl identifiers, meaning they must start with a non-digit word character. A common cause of this error is using (?&0) instead of (?0). See perlre.
(F) The final summary message when a perl -c fails.
(S internal) A routine asked for a symbol from a symbol table that ought to have existed already, but for some reason it didn't, and had to be created on an emergency basis to prevent a core dump.
(D deprecated) Really old Perl let you omit the % on hash names in some spots. This is now heavily deprecated.
(F) The parser has given up trying to parse the program after 10 errors. Further error messages would likely be uninformative.
(W portable) The hexadecimal number you specified is larger than 2**32-1 (4294967295) and therefore non-portable between systems. See perlport for more on portability concerns.
(S inplace) The -i option was passed on the command line, indicating that the script is intended to edit files inplace, but no files were given. This is usually a mistake, since editing STDIN inplace doesn't make sense, and can be confusing because it can make perl look like it is hanging when it is really just trying to read from STDIN. You should either pass a filename to edit, or remove -i from the command line. See perlrun for more details.
(F) Perl limits identifiers (names for variables, functions, etc.) to about 250 characters for simple names, and somewhat more for compound names (like $A::B). You've exceeded Perl's limits. Future versions of Perl are likely to eliminate these arbitrary limitations.
(W regexp) Named Unicode character escapes (\N{...}) may return a zero-length sequence. When such an escape is used in a character class its behaviour is not well defined. Check that the correct escape has been used, and the correct charname handler is in scope.
(F) You used a digit other than 0 or 1 in a binary number.
(W digit) You may have tried to use a digit other than 0 or 1 in a binary number. Interpretation of the binary number stopped before the offending digit.
(W illegalproto) An illegal character was found in a prototype declaration. Legal characters in prototypes are $, @, %, *, ;, [, ], &, \, and +.
(F) Perl normally treats carriage returns in the program text as it would any other whitespace, which means you should never see this error when Perl was built using standard options. For some reason, your version of Perl appears to have been built without this support. Talk to your Perl administrator.
(W illegalproto) An illegal character was found in a prototype declaration. Legal characters in prototypes are $, @, %, *, ;, [, ], &, \, and +.
(F) When using the sub keyword to construct an anonymous subroutine,
you must always specify a block of code. See perlsub.
(F) A subroutine was not declared correctly. See perlsub.
(F) You tried to divide a number by 0. Either something was wrong in your logic, or you need to put a conditional in to guard against meaningless input.
(W digit) You may have tried to use a character other than 0 - 9 or A - F, a - f in a hexadecimal number. Interpretation of the hexadecimal number stopped before the illegal character.
(F) You tried to divide a number by 0 to get the remainder. Most numbers don't take to this kindly.
(F) The number of bits in vec() (the third argument) must be a power of two from 1 to 32 (or 64, if your platform supports that).
(F) You used an 8 or 9 in an octal number.
(W digit) You may have tried to use an 8 or 9 in an octal number. Interpretation of the octal number stopped before the 8 or 9.
(F) You wrote something like
- (?+foo)
The "+" is valid only when followed by digits, indicating a capturing group. See (?PARNO).
(X) The PERL5OPT environment variable may only be used to set the following switches: -[CDIMUdmtw].
(W internal) A warning peculiar to VMS. Perl tried to read the CRTL's internal environ array, and encountered an element without the = delimiter used to separate keys from values. The element is ignored.
(W internal) A warning peculiar to VMS. Perl tried to read a logical name or CLI symbol definition when preparing to iterate over %ENV, and didn't see the expected delimiter between key and value, so the line was ignored.
(W misc) This prefix usually indicates that a DESTROY() method raised the indicated exception. Since destructors are usually called by the system at arbitrary points during execution, and often a vast number of times, the warning is issued only once for any number of failures that would otherwise result in the same message being repeated.
Failure of user callbacks dispatched using the G_KEEPERR flag could also result in this warning. See G_KEEPERR in perlcall.
(D regexp, deprecated) The two-character sequence "(*" in this context in a regular expression pattern should be an indivisible token, with nothing intervening between the "(" and the "*", but you separated them. Due to an accident of implementation, this prohibition was not enforced, but we do plan to forbid it in a future Perl version. This message serves as fair warning of that pending change.
(D regexp, deprecated) The two-character sequence "(?" in this context in a regular expression pattern should be an indivisible token, with nothing intervening between the "(" and the "?", but you separated them. Due to an accident of implementation, this prohibition was not enforced, but we do plan to forbid it in a future Perl version. This message serves as fair warning of that pending change.
(F) There was a syntax error within the (?[ ]). This can happen if the expression inside the construct was completely empty, or if there are too many or too few operands for the number of operators. Perl is not smart enough to give you a more precise indication as to what is wrong.
(F) The method resolution order (MRO) of the given class is not C3-consistent, and you have enabled the C3 MRO for this class. See the C3 documentation in mro for more information.
(F) An error peculiar to EBCDIC. Internally, v-strings are stored as Unicode code points, and encoded in EBCDIC as UTF-EBCDIC. The UTF-EBCDIC encoding is limited to code points no larger than 2147483647 (0x7FFFFFFF).
(F) You used a pattern that references itself without consuming any input text. You should check the pattern to ensure that recursive patterns either consume text or fail.
The <-- HERE shows whereabouts in the regular expression the problem was discovered.
(F) Currently the implementation of "state" only permits the initialization of scalar variables in scalar context. Re-write state ($a) = 42 as state $a = 42 to change from list to scalar context. Constructions such as state (@a) = foo() will be supported in a future perl release.
(F) You tried to do something that the tainting mechanism didn't like. The tainting mechanism is turned on when you're running setuid or setgid, or when you specify -T to turn it on explicitly. The tainting mechanism labels all data that's derived directly or indirectly from the user, who is considered to be unworthy of your trust. If any such data is used in a "dangerous" operation, you get this error. See perlsec for more information.
(F) You can't use system(), exec(), or a piped open in a setuid or setgid script if $ENV{PATH} contains a directory that is writable by the world. Also, the PATH must not contain any relative directory. See perlsec.
(F) You can't use system(), exec(), or a piped open in a setuid or setgid script if any of $ENV{PATH}, $ENV{IFS}, $ENV{CDPATH}, $ENV{ENV}, $ENV{BASH_ENV} or $ENV{TERM} are derived from data supplied (or potentially supplied) by the user. The script must set the path to a known value, using trustworthy data. See perlsec.
(F) Perl detected tainted data when trying to compile a regular expression that contains a call to a user-defined character property function, i.e. \p{IsFoo} or \p{InFoo}. See User-Defined Character Properties in perlunicode and perlsec.
(F) The indexes and widths specified in the format string of printf()
or sprintf() are too large. The numbers must not overflow the size of
integers for your architecture.
(S overflow) The hexadecimal, octal or binary number you have specified either as a literal or as an argument to hex() or oct() is too big for your architecture, and has been converted to a floating point number. On a 32-bit architecture the largest hexadecimal, octal or binary number representable without overflow is 0xFFFFFFFF, 037777777777, or 0b11111111111111111111111111111111 respectively. Note that Perl transparently promotes all numbers to a floating point representation internally--subject to loss of precision errors in subsequent operations.
(S overflow) The number you have passed to srand is too big to fit in your architecture's integer representation. The number has been replaced with the largest integer supported (0xFFFFFFFF on 32-bit architectures). This means you may be getting less randomness than you expect, because different random seeds above the maximum will return the same sequence of random numbers.
(W overflow) Some portion of a version initialization is too large for the size of integers for your architecture. This is not a warning because there is no rational reason for a version to try and use an element larger than typically 2**32. This is usually caused by trying to use some odd mathematical operation as a version, like 100/9.
(P) Something went badly wrong in the regular expression parser. The <-- HERE shows whereabouts in the regular expression the problem was discovered.
(S) A warning peculiar to VMS. Perl keeps track of the number of times
you've called fork and exec, to determine whether the current call
to exec should affect the current script or a subprocess (see
exec LIST in perlvms). Somehow, this count has become scrambled, so
Perl is making a guess and treating this exec as a request to
terminate the Perl script and execute the specified command.
(P) Something went badly awry in the regular expression parser. The <-- HERE shows whereabouts in the regular expression the problem was discovered.
(W syntax) You've run afoul of the rule that says that any list operator followed by parentheses turns into a function, with all the list operator's arguments found inside the parentheses. See Terms and List Operators (Leftward) in perlop.
(F) The indicated attribute for a subroutine or variable was not recognized by Perl or by a user-supplied handler. See attributes.
(F) The indicated attributes for a subroutine or variable were not recognized by Perl or by a user-supplied handler. See attributes.
(F) You wrote something like
- [z-a]
in a regular expression pattern. Ranges must be specified with the lowest code point first. Instead write
- [a-z]
(F) Only certain characters are valid for character names. The indicated one isn't. See CUSTOM ALIASES in charnames.
(F) You tried to create a custom alias for a character name, with the :alias option to use charnames, and the specified character in the indicated name isn't valid. See CUSTOM ALIASES in charnames.
(W printf) Perl does not understand the given format conversion. See sprintf.
(W regexp) The numeric escape (for example \xHH) of value < 256 didn't correspond to a single character through the conversion from the encoding specified by the encoding pragma. The escape was replaced with REPLACEMENT CHARACTER (U+FFFD) instead. The <-- HERE shows whereabouts in the regular expression the escape was discovered.
(F) The character constant represented by ... is not a valid hexadecimal number. Either it is empty, or you tried to use a character other than 0 - 9 or A - F, a - f in a hexadecimal number.
(F) The module argument to perl's -m and -M command-line options cannot contain single colons in the module name, but only in the arguments after "=". In other words, -MFoo::Bar=:baz is ok, but -MFoo:Bar=baz is not.
(F) You tried to mro::set_mro("classname", "foo") or use mro 'foo', where foo is not a valid method resolution order (MRO). Currently, the only valid ones supported are dfs and c3, unless you have loaded a module that is an MRO plugin. See mro and perlmroapi.
(W utf8) You passed a negative number to chr. Negative numbers are not valid character numbers, so chr returns the Unicode replacement character (U+FFFD).
(S debugging) Perl was called with invalid debugger flags. Call perl with the -D option with no flags to see the list of acceptable values. See also -Dletters in perlrun.
(F) The range specified in a character class had a minimum character greater than the maximum character. One possibility is that you forgot the {} from your ending \x{} - \x without the curly braces can go only up to ff. The <-- HERE shows whereabouts in the regular expression the problem was discovered. See perlre.
(F) The range specified in the tr/// or y/// operator had a minimum character greater than the maximum character. See perlop.
(F) Something other than a colon or whitespace was seen between the elements of an attribute list. If the previous attribute had a parenthesised parameter list, perhaps that list was terminated too soon. See attributes.
(W layer) When pushing layers onto the Perl I/O system, something other than a colon or whitespace was seen between the elements of a layer list. If the previous attribute had a parenthesised parameter list, perhaps that list was terminated too soon.
(F) A version number did not meet the "strict" criteria for versions. A "strict" version number is a positive decimal number (integer or decimal-fraction) without exponentiation or else a dotted-decimal v-string with a leading 'v' character and at least three components. The parenthesized text indicates which criteria were not met. See the version module for more details on allowed version formats.
(F) The given character is not a valid pack or unpack type. See pack.
(W) The given character is not a valid pack or unpack type but used to be silently ignored.
(F) A version number did not meet the "lax" criteria for versions. A "lax" version number is a positive decimal number (integer or decimal-fraction) without exponentiation or else a dotted-decimal v-string. If the v-string has fewer than three components, it must have a leading 'v' character. Otherwise, the leading 'v' is optional. Both decimal and dotted-decimal versions may have a trailing "alpha" component separated by an underscore character after a fractional or dotted-decimal component. The parenthesized text indicates which criteria were not met. See the version module for more details on allowed version formats.
(F) The internal structure of the version object was invalid. Perhaps the internals were modified directly in some way or an arbitrary reference was blessed into the "version" class.
(F) Your machine apparently doesn't implement ioctl(), which is pretty strange for a machine that supports C.
(W unopened) You tried ioctl() on a filehandle that was never opened. Check your control flow and number of arguments.
(F) Your Perl has not been configured to have PerlIO, and therefore you cannot use IO layers. To have PerlIO, Perl must be configured with 'useperlio'.
(F) Your machine doesn't implement the sockatmark() functionality, neither as a system call nor an ioctl call (SIOCATMARK).
(D deprecated, syntax) The special variable $*, deprecated in older perls, has been removed as of 5.9.0 and is no longer supported. In previous versions of perl the use of $* enabled or disabled multi-line matching within a string.
Instead of using $* you should use the /m (and maybe /s) regexp modifiers. You can enable /m for a lexical scope (even a whole file) with use re '/m'. (In older versions: when $* was set to a true value then all regular expressions behaved as if they were written using /m.)
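As a sketch (the string and pattern here are illustrative), the /m replacement for the old $* behaviour looks like this:

```perl
# /m makes ^ and $ also match at internal newlines, which is what
# setting $* to a true value used to do for every pattern globally.
my $text = "first line\nsecond line";

my @default = $text =~ /^(\w+)/g;    # ^ anchors only at the start: ("first")
my @multi   = $text =~ /^(\w+)/mg;   # ^ anchors after each newline too
```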
(D deprecated, syntax) The special variable $#, deprecated in older perls, has been removed as of 5.9.3 and is no longer supported. You should use the printf/sprintf functions instead.
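A minimal sketch of the sprintf replacement (the format and value are illustrative): $# once held a default output format for numbers, and sprintf expresses the same thing explicitly per call.

```perl
# sprintf gives explicit, per-call control over numeric formatting,
# replacing the removed global $# output-format variable.
my $n = 3.14159265;
my $three_sig = sprintf "%.3g", $n;   # three significant digits
```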
(W overload) The second (fourth, sixth, ...) argument of overload::constant needs to be a code reference. Either an anonymous subroutine, or a reference to a subroutine.
(W overload) You tried to overload a constant type the overload package is unaware of.
(P) The regular expression parser is confused.
(F) You named a loop to break out of, but you're not currently in a loop of that name, not even if you count where you were called from. See last.
(F) You named a loop to continue, but you're not currently in a loop of that name, not even if you count where you were called from. See last.
(F) You named a loop to restart, but you're not currently in a loop of that name, not even if you count where you were called from. See last.
(F) While under the use filetest pragma, switching the real and effective uids or gids failed.
(F) While unpacking, the string buffer was already used up when an unpack length/code combination tried to obtain more data. This results in an undefined value for the length. See pack.
(W syntax) You used length() on either an array or a hash when you probably wanted a count of the items.
Array size can be obtained by doing:
- scalar(@array);
The number of items in a hash can be obtained by doing:
- scalar(keys %hash);
(F) An extension is attempting to insert text into the current parse (using lex_stuff_pvn or similar), but tried to insert a character that couldn't be part of the current input. This is an inherent pitfall of the stuffing mechanism, and one of the reasons to avoid it. Where it is necessary to stuff, stuffing only plain ASCII is recommended.
(F) Lexing code supplied by an extension violated the lexer's API in a detectable way.
(W closed) You tried to do a listen on a closed socket. Did you forget to check the return value of your socket() call? See listen.
(F) On some platforms, notably Windows, the three-or-more-arguments form of open does not support pipes, such as open($pipe, '|-', @args). Use the two-argument open($pipe, '|prog arg1 arg2...') form instead.
(W overflow) You called localtime with a number that was larger
than it can reliably handle and localtime probably returned the
wrong date. This warning is also triggered with NaN (the special
not-a-number value).
(W overflow) You called localtime with a number that was smaller
than it can reliably handle and localtime probably returned the
wrong date.
(F) There is currently a limit on the length of string which lookbehind can handle. This restriction may be eased in a future release.
(W imprecision) The value you attempted to increment or decrement by one is too large for the underlying floating point representation to store accurately, hence the target of ++ or -- is unchanged. Perl issues this warning because it has already switched from integers to floating point when values are too large for integers, and now even floating point is insufficient. You may wish to switch to using Math::BigInt explicitly.
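A sketch of the situation and the Math::BigInt escape hatch (the value 1e20 is just an illustrative number beyond 53-bit double precision):

```perl
use Math::BigInt;

my $x = 1e20;          # beyond both 64-bit integers and exact doubles
my $before = $x;
{
    no warnings;       # the ++ below would otherwise emit this warning
    $x++;              # no-op: 1e20 + 1 is not representable as a double
}

# Math::BigInt stores the value exactly, so the increment takes effect.
my $big = Math::BigInt->new('100000000000000000000');
$big->binc;            # now 100000000000000000001
```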
(W io) You tried to do an lstat on a filehandle. What did you mean by that? lstat() makes sense only on filenames. (Perl did a fstat() instead on the filehandle.)
(W misc) Although attributes.pm allows this, turning the lvalue attribute on or off on a Perl subroutine that is already defined does not always work properly. It may or may not do what you want, depending on what code is inside the subroutine, with exact details subject to change between Perl versions. Only do this if you really know what you are doing.
(W misc) Using the :lvalue declarative syntax to make a Perl subroutine an lvalue subroutine after it has been defined is not permitted. To make the subroutine an lvalue subroutine, add the lvalue attribute to the definition, or put the sub foo :lvalue; declaration before the definition. See also attributes.pm.
(F) Between the brackets enclosing a numeric repeat count only digits are permitted. See pack.
(F) Between the brackets enclosing a numeric repeat count only digits are permitted. See pack.
(F) An error peculiar to OS/2. PERLLIB_PREFIX should be of the form
- prefix1;prefix2
or prefix1 prefix2 with nonempty prefix1 and prefix2. If prefix1 is indeed a prefix of a builtin library search path, prefix2 is substituted. The error may appear if components are not found, or are too long. See "PERLLIB_PREFIX" in perlos2.
(F) You tried to use a function with a malformed prototype. The syntax of function prototypes is given a brief compile-time check for obvious errors like invalid characters. A more rigorous check is run when the function is called.
(S utf8)(F) Perl detected a string that didn't comply with UTF-8 encoding rules, even though it had the UTF8 flag on.
One possible cause is that you set the UTF8 flag yourself for data that you thought to be in UTF-8 but it wasn't (it was for example legacy 8-bit data). To guard against this, you can use Encode::decode_utf8.
If you use the :encoding(UTF-8) PerlIO layer for input, invalid byte sequences are handled gracefully, but if you use :utf8, the flag is set without validating the data, possibly resulting in this error message.
See also Handling Malformed Data in Encode.
(F) You said use utf8, but the program file doesn't comply with UTF-8 encoding rules. The message prints out the properly encoded characters just before the first bad one. If utf8 warnings are enabled, a warning is generated that gives more details about the type of malformation.
(F) The charnames handler returned malformed UTF-8.
(F) You tried to unpack something that didn't comply with UTF-8 encoding rules and perl was unable to guess how to make more progress.
(F) You tried to pack something that didn't comply with UTF-8 encoding rules and perl was unable to guess how to make more progress.
(F) You tried to unpack something that didn't comply with UTF-8 encoding rules and perl was unable to guess how to make more progress.
(F) Perl thought it was reading UTF-16 encoded character data but while doing it Perl met a malformed Unicode surrogate.
(W regexp) The pattern you've specified would be an infinite loop if the regular expression engine didn't specifically check for that. The <-- HERE shows whereabouts in the regular expression the problem was discovered. See perlre.
(F) Perl aborted due to too high a number of signals pending. This usually indicates that your operating system tried to deliver signals too fast (with a very high priority), starving the perl process from resources it would need to reach a point where it can process signals safely. (See Deferred Signals (Safe Signals) in perlipc.)
(W) This warning may be due to running a perl5 script through a perl4 interpreter, especially if the word that is being warned about is "use" or "my".
(F) You can't pack a string by supplying a checksum, because the checksumming process loses information, and you can't go the other way. See unpack.
(F) An attempt was made to specify an entry in an overloading table that doesn't resolve to a valid subroutine. See overload.
See Server error.
(S) An advisory indicating that the previous error may have been caused by a missing delimiter on a string or pattern, because it eventually ended earlier on the current line.
(W syntax) An underscore (underbar) in a numeric constant did not separate two digits.
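For instance (illustrative literals):

```perl
my $ok = 1_000_000;      # each underscore sits between two digits: no warning
# my $odd = 1000_;       # a trailing underscore would trigger this warning
```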
(W uninitialized) A printf-type format required more arguments than were supplied.
(F) The argument to the indicated command line switch must follow immediately after the switch, without intervening spaces.
(F) Wrong syntax of character name literal \N{charname} within double-quotish context. This can also happen when there is a space (or comment) between the \N and the { in a regex with the /x modifier. This modifier does not change the requirement that the brace immediately follow the \N.
(F) A \o must be followed immediately by a { in double-quotish context.
(F) While certain functions allow you to specify a filehandle or an "indirect object" before the argument list, this ain't one of them.
(W pipe) You used the open(FH, "| command") or open(FH, "command |") construction, but the command was missing or blank.
(F) A double-quoted string ended with "\c", without the required control character name.
(F) The reserved syntax for lexically scoped subroutines requires that they have a name with which they can be found.
(F) Apparently you've been programming in csh too much. Variables are always mentioned with the $ in Perl, unlike in the shells, where it can vary from one line to the next.
(S syntax) This is an educated guess made in conjunction with the message "%s found where operator expected". Often the missing operator is a comma.
(F) Missing right brace in \x{...}, \p{...}, \P{...}, or \N{...}.
(F) \N has two meanings.
The traditional one has it followed by a name enclosed in braces, meaning the character (or sequence of characters) given by that name. Thus \N{ASTERISK} is another way of writing *, valid in both double-quoted strings and regular expression patterns. In patterns, it doesn't have the meaning an unescaped * does.
Starting in Perl 5.12.0, \N also can have an additional meaning (only) in patterns, namely to match a non-newline character. (This is short for [^\n], and like . but is not affected by the /s regex modifier.)
This can lead to some ambiguities. When \N is not followed immediately by a left brace, Perl assumes the [^\n] meaning. Also, if the braces form a valid quantifier such as \N{3} or \N{5,}, Perl assumes that this means to match the given quantity of non-newlines (in these examples, 3; and 5 or more, respectively). In all other cases, where there is a \N{ and a matching }, Perl assumes that a character name is desired.
However, if there is no matching }, Perl doesn't know if it was mistakenly omitted, or if [^\n]{ was desired, and raises this error. If you meant the former, add the right brace; if you meant the latter, escape the brace with a backslash, like so: \N\{
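The three readings can be seen side by side (illustrative patterns; the charnames line is only needed on perls before 5.16, which load it automatically):

```perl
use charnames ':full';   # enables \N{NAME} lookups on older perls

"*"   =~ /\N{ASTERISK}/ or die;   # braces hold a name: the character '*'
"abc" =~ /\A\N{3}\z/    or die;   # braces hold a quantifier: three non-newlines
"x"   =~ /\N/           or die;   # bare \N: any one non-newline character
```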
(F) The lexer counted more opening curly or square brackets than closing ones. As a general rule, you'll find it's missing near the place you were last editing.
(S syntax) This is an educated guess made in conjunction with the message "%s found where operator expected". Don't automatically put a semicolon on the previous line just because you saw this message.
(F) You tried, directly or indirectly, to change the value of a constant. You didn't, of course, try "2 = 1", because the compiler catches that. But an easy way to do the same thing is:
Another way is to assign to a substr() that's off the end of the string.
Yet another way is to assign to a foreach loop VAR when VAR is aliased to a constant in the loop LIST:
(F) You tried to make an array value spring into existence, and the subscript was probably negative, even counting from end of the array backwards.
(P) You tried to make a hash value spring into existence, and it couldn't be created for some peculiar reason.
(F) Only a bare module name is allowed as the first argument to a "use".
(F) The -M or -m options say that Perl should load some module, but you omitted the name of the module. Consult perlrun for full details about -M and -m.
(F) The open function has been asked to open multiple files. This
can happen if you are trying to open a pipe to a command that takes a
list of arguments, but have forgotten to specify a piped open mode.
See open for details.
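On platforms whose perl supports list-form pipes, the fix is to spell out the piped open mode explicitly ('-|' reads from the command, '|-' writes to it); the echo command here is only illustrative:

```perl
# With a non-pipe mode, the extra arguments trigger this error:
#   open(my $fh, '<', 'echo', 'hello');   # More than one argument to open
# Naming the pipe mode tells open the list is a command plus arguments:
open(my $fh, '-|', 'echo', 'hello') or die "cannot fork: $!";
my $line = <$fh>;
close $fh;
chomp $line;             # $line now holds the command's output
```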
(F) You don't have System V message IPC on your system.
(W syntax) Multidimensional arrays aren't written like $foo[1,2,3]. They're written like $foo[1][2][3], as in C.
(F) You had an unpack template that contained a '/', but this did not follow some unpack specification producing a numeric value. See pack.
(F) Lexically scoped subroutines are not yet implemented. Don't try that yet.
(F) Lexically scoped variables aren't in a package, so it doesn't make sense to try to declare one with a package qualifier on the front. Use local() if you want to localize a package variable.
(W once) Typographical errors often show up as unique variable names.
If you had a good reason for having a unique name, then just mention it
again somehow to suppress the message. The our declaration is
provided for this purpose.
NOTE: This warning detects symbols that have been used only once so $c, @c, %c, *c, &c, sub c{}, c(), and c (the filehandle or format) are considered the same; if a program uses $c only once but also uses any of the others it will not trigger this warning.
(F) The new (5.12) meaning of \N as [^\n] is not valid in a bracketed character class, for the same reason that . in a character class loses its specialness: it matches almost everything, which is probably not what you want.
(F) When compiling a regex pattern, an unresolved named character or sequence was encountered. This can happen in any of several ways that bypass the lexer, such as using single-quotish context, or an extra backslash in double-quotish:
- $re = '\N{SPACE}'; # Wrong!
- $re = "\\N{SPACE}"; # Wrong!
- /$re/;
Instead, use double-quotes with a single backslash:
- $re = "\N{SPACE}"; # ok
- /$re/;
The lexer can be bypassed as well by creating the pattern from smaller components:
- $re = '\N';
- /${re}{SPACE}/; # Wrong!
It's not a good idea to split a construct in the middle like this, and it doesn't work here. Instead use the solution above.
Finally, the message also can happen under the /x regex modifier when the \N is separated by spaces from the {, in which case, remove the spaces.
- /\N {SPACE}/x; # Wrong!
- /\N{SPACE}/x; # ok
(F) Within (?[ ]), all constants interpreted as octal need to be exactly 3 digits long. This helps catch some ambiguities. If your constant is too short, add leading zeros, like
- (?[ [ \078 ] ]) # Syntax error!
- (?[ [ \0078 ] ]) # Works
- (?[ [ \007 8 ] ]) # Clearer
The maximum number this construct can express is \777. If you need a larger one, you need to use \o{} instead. If you meant two separate things, you need to separate them:
- (?[ [ \7776 ] ]) # Syntax error!
- (?[ [ \o{7776} ] ]) # One meaning
- (?[ [ \777 6 ] ]) # Another meaning
- (?[ [ \777 \006 ] ]) # Still another
(F) The length count obtained from a length/code unpack operation was negative. See pack.
(F) You tried to do a read/write/send/recv operation with a buffer length that is less than 0. This is difficult to imagine.
(F) When vec is called in an lvalue context, the second argument must be
greater than or equal to zero.
(F) You can't quantify a quantifier without intervening parentheses. So things like ** or +* or ?* are illegal. The <-- HERE shows whereabouts in the regular expression the problem was discovered.
Note that the minimal matching quantifiers, *?, +?, and ?? appear to be nested quantifiers, but aren't. See perlre.
(S internal) The symbol in question was declared but somehow went out of scope before it could possibly have been used.
(F) next::method needs to be called within the context of a
real method in a real package, and it could not find such a context.
See mro.
(F) Certain operations are deemed to be too insecure for a setuid or setgid script to even be allowed to attempt. Generally speaking there will be another way to do what you want that is, if not secure, at least securable. See perlsec.
(F) Perl's -e and -E command-line options require an argument. If you want to run an empty program, pass the empty string as a separate argument or run a program consisting of a single 0 or 1:
- perl -e ""
- perl -e0
- perl -e1
(F) A list operator that has a filehandle or "indirect object" is not allowed to have a comma between that and the following arguments. Otherwise it'd be just another one of the arguments.
One possible cause for this is that you expected to have imported a constant into your namespace with use or import while no such importing took place; it may be, for example, that your operating system does not support that particular constant. Hopefully you used an explicit import list for the constants you expect to see; please see use and import. While an explicit import list would probably have caught this error earlier, it naturally does not remedy the fact that your operating system still does not support that constant. Maybe you have a typo in the constants of the symbol import list of use or import, or in the constant name at the line where this error was triggered?
(F) An error peculiar to VMS. Perl handles its own command line redirection, and found a '|' at the end of the command line, so it doesn't know where you want to pipe the output from this command.
(F) The currently executing code was compiled with the -d switch, but
for some reason the current debugger (e.g. perl5db.pl or a Devel::
module) didn't define a routine to be called at the beginning of each
statement.
(P) This is counted as an internal error, because every machine should supply dbm nowadays, because Perl comes with SDBM. See SDBM_File.
(F) The currently executing code was compiled with the -d switch, but
for some reason the current debugger (e.g. perl5db.pl or a Devel::
module) didn't define a DB::sub
routine to be called at the beginning
of each ordinary subroutine call.
(F) The -I command-line switch requires a directory name as part of the same argument. Use -Ilib, for instance. -I lib won't work.
(F) An error peculiar to VMS. Perl handles its own command line redirection, and found a '2>' or a '2>>' on the command line, but can't find the name of the file to which to write data destined for stderr.
(F) A pack or unpack template has an opening '(' or '[' without its matching counterpart. See pack.
(F) An error peculiar to VMS. Perl handles its own command line redirection, and found a '<' on the command line, but can't find the name of the file from which to read data for stdin.
(F) next::method found no further instances of this method name in the remaining packages of the MRO of this class. If you don't want it throwing an exception, use maybe::next::method or next::can. See mro.
(F) The "no" keyword is recognized and executed at compile time, and returns no useful value. See perlmod.
(F) An error peculiar to VMS. Perl handles its own command line redirection, and found a lone '>' at the end of the command line, so it doesn't know where you wanted to redirect stdout.
(F) An error peculiar to VMS. Perl handles its own command line redirection, and found a '>' or a '>>' on the command line, but can't find the name of the file to which to write data destined for stdout.
(F) Fully qualified variable names are not allowed in "our" declarations, because that doesn't make much sense under existing semantics. Such syntax is reserved for future extensions.
(F) You called perl -x, but no line was found in the file beginning with #! and containing the word "perl".
(F) Configure didn't find anything resembling the setregid() call for your system.
(F) Configure didn't find anything resembling the setreuid() call for your system.
(F) You tried to access a key from a hash through the indicated typed variable but that key is not allowed by the package of the same type. The indicated package has restricted the set of allowed keys using the fields pragma.
(F) You provided a class qualifier in a "my", "our" or "state" declaration, but this class doesn't exist at this point in your program.
(F) You specified a signal hook that was not recognized by Perl.
Currently, Perl accepts __DIE__ and __WARN__ as valid signal hooks.
(P) An error peculiar to VMS. The internal routine my_pclose() tried to close a pipe which hadn't been opened. This should have been caught earlier as an attempt to close an unopened filehandle.
(W signal) You specified a signal name as a subscript to %SIG that was not recognized. Say kill -l in your shell to see the valid signal names on your system.
(F) Perl was trying to evaluate a reference to a code value (that is, a subroutine), but found a reference to something else instead. You can use the ref() function to find out what kind of ref it really was. See also perlref.
(F) Perl was trying to evaluate a reference to a "typeglob" (that is, a symbol table entry that looks like *foo), but found a reference to something else instead. You can use the ref() function to find out what kind of ref it really was. See perlref.
(F) Perl was trying to evaluate a reference to a hash value, but found a reference to something else instead. You can use the ref() function to find out what kind of ref it really was. See perlref.
(F) Perl was trying to evaluate a reference to an array value, but found a reference to something else instead. You can use the ref() function to find out what kind of ref it really was. See perlref.
(F) You passed a reference to a blessed array to push, shift or another array function. These only accept unblessed array references or arrays beginning explicitly with @.
(F) Perl was trying to evaluate a reference to a scalar value, but found a reference to something else instead. You can use the ref() function to find out what kind of ref it really was. See perlref.
(F) Perl was trying to evaluate a reference to a code value (that is, a subroutine), but found a reference to something else instead. You can use the ref() function to find out what kind of ref it really was. See also perlref.
(F) An attempt was made to specify an entry in an overloading table that doesn't somehow point to a valid subroutine. See overload.
(F) The function requires more arguments than you specified.
(W syntax) A format specified more picture fields than the next line supplied. See perlform.
(A) You've accidentally run your script through the Bourne shell instead of Perl. Check the #! line, or manually feed your script into Perl yourself.
(S) A warning peculiar to VMS. Perl was unable to find the local timezone offset, so it's assuming that local system time is equivalent to UTC. If it's not, define the logical name SYS$TIMEZONE_DIFFERENTIAL to translate to the number of seconds which need to be added to UTC to get local time.
(F) In a regular expression, there was a non-hexadecimal character where a hex one was expected, like
- (?[ [ \xDG ] ])
- (?[ [ \x{DEKA} ] ])
(W digit) In parsing an octal numeric constant, a character was unexpectedly encountered that isn't octal. The resulting value is as indicated.
(F) In a regular expression, there was a non-octal character where an octal one was expected, like
- (?[ [ \o{1278} ] ])
(W misc) A number has been passed as a bitmask argument to select(). Use the vec() function to construct the file descriptor bitmasks for select. See select.
(F) (?[...]) cannot be used within the scope of a use locale or with an /l regular expression modifier, as that would require deferring to run-time the calculation of what it should evaluate to, and it is regex compile-time only.
(F) You can't require the null filename, especially because on many machines that means the current directory! See require.
(S debugging) Some internal routine called run() with a null opcode pointer.
(F) The first argument to formline must be a valid format picture specification. It was found to be empty, which probably means you supplied it an uninitialized value. See perlform.
(P) An attempt was made to realloc NULL.
(P) The internal pattern matching routines blew it big time.
(P) The internal pattern matching routines are out of their gourd.
(F) Perl limits the representation of decimal numbers in programs to about 250 characters. You've exceeded that length. Future versions of Perl are likely to eliminate this arbitrary limitation. In the meantime, try using scientific notation (e.g. "1e6" instead of "1_000_000").
(F) Perl was looking for a number but found nothing that looked like a number. This happens, for example, with \o{} with no number between the braces.
(W syntax) The package variables $a and $b are used for sort comparisons. You used $a or $b as an operand to the <=> or cmp operator inside a sort comparison block, and the variable had earlier been declared as a lexical variable. Either qualify the sort variable with the package name, or rename the lexical variable.
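A sketch of the clash and the fix (names and values are illustrative):

```perl
# Declaring "my $a" earlier in the scope would make the $a inside the
# sort block refer to that lexical, not sort's package variable, and warn.
# The simplest fix is to avoid lexicals named $a or $b altogether:
my @sorted = sort { $a <=> $b } (3, 1, 2);   # package $a/$b, no shadowing
```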
(W portable) The octal number you specified is larger than 2**32-1 (4294967295) and therefore non-portable between systems. See perlport for more on portability concerns.
(W overload) The call to overload::constant contained an odd number of arguments. The arguments should come in pairs.
(W misc) You specified an odd number of elements to initialize a hash, which is odd, because hashes come in key/value pairs.
(W misc) You specified an odd number of elements to initialize a hash, which is odd, because hashes come in key/value pairs.
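For example (illustrative keys and values):

```perl
use warnings;
# my %bad = ('a', 1, 'b');        # three elements: 'b' gets no value, warns
my %ok = (a => 1, b => 2);        # key/value pairs line up: no warning
```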
(F)(W layer) You tried to do a read/write/send/recv/seek operation
with an offset pointing outside the buffer. This is difficult to
imagine. The sole exceptions to this are that zero padding will
take place when going past the end of the string when either
sysread()ing a file, or when seeking past the end of a scalar opened
for I/O (in anticipation of future reads and to imitate the behaviour
with real files).
(W unopened) An I/O operation was attempted on a filehandle that was never initialized. You need to do an open(), a sysopen(), or a socket() call, or call a constructor from the FileHandle package.
(W unopened) You tried to invoke a file test operator on a filehandle that isn't open. Check your control flow. See also -X.
(W utf8) You tried to open a reference to a scalar for read or append where the scalar contained code points over 0xFF. In-memory files model on-disk files and can only contain bytes.
(S internal) An internal warning that the grammar is screwed up.
(S internal) An internal warning that the grammar is screwed up.
(D io, deprecated) You used open() to associate a filehandle to a symbol (glob or scalar) that already holds a dirhandle. Although legal, this idiom might render your code confusing and is deprecated.
(D io, deprecated) You used opendir() to associate a dirhandle to a symbol (glob or scalar) that already holds a filehandle. Although legal, this idiom might render your code confusing and is deprecated.
(F) You wrote something like
- (?[ \p{Digit} \p{Thai} ])
There are two operands, but no operator giving how you want to combine them.
(F) An attempt was made to perform an overloaded operation for which no handler was defined. While some handlers can be autogenerated in terms of other handlers, there is no default handler for any operation, unless the fallback overloading key is specified to be true. See overload.
(S utf8, non_unicode) You performed an operation requiring Unicode semantics on a code point that is not in Unicode, so what it should do is not defined. Perl has chosen to have it do nothing, and warn you.
If the operation shown is "ToFold", it means that case-insensitive matching in a regular expression was done on the code point.
If you know what you are doing you can turn off this warning by no warnings 'non_unicode';.
(S utf8, surrogate) You performed an operation requiring Unicode semantics on a Unicode surrogate. Unicode frowns upon the use of surrogates for anything but storing strings in UTF-16, but semantics are (reluctantly) defined for the surrogates, and they are to do nothing for this operation. Because the use of surrogates can be dangerous, Perl warns.
If the operation shown is "ToFold", it means that case-insensitive matching in a regular expression was done on the code point.
If you know what you are doing you can turn off this warning by no warnings 'surrogate';.
(S ambiguous) You used a variable or subroutine call where the parser was expecting an operator. The parser has assumed you really meant to use an operator, but this is highly likely to be incorrect. For example, if you say "*foo *foo" it will be interpreted as if you said "*foo * 'foo'".
(W misc) You seem to have already declared the same global once before in the current lexical scope.
(X) The malloc() function returned 0, indicating there was insufficient remaining memory (or virtual memory) to satisfy the request. Perl has no option but to exit immediately.
At least in Unix you may be able to get past this by increasing your process datasize limits: in csh/tcsh use limit and limit datasize n (where n is the number of kilobytes) to check the current limits and change them, and in ksh/bash/zsh use ulimit -a and ulimit -d n, respectively.
(X) An attempt was made to extend an array, a list, or a string beyond the largest possible memory allocation.
(F) The malloc() function returned 0, indicating there was insufficient remaining memory (or virtual memory) to satisfy the request. However, the request was judged large enough (compile-time default is 64K), so a possibility to shut down by trapping this error is granted.
(X)(F) The malloc() function returned 0, indicating there was insufficient remaining memory (or virtual memory) to satisfy the request.
The request was judged to be small, so the possibility to trap it depends on the way perl was compiled. By default it is not trappable. However, if compiled for this, Perl may use the contents of $^M as an emergency pool after die()ing with this message. In this case the error is trappable once, and the error message will include the line and file where the failed request happened.
(F) You can't allocate more than 2^31+"small amount" bytes. This error is most likely to be caused by a typo in the Perl program, e.g. $arr[time] instead of $arr[$time].
(F) The yacc parser wanted to grow its stack so it could continue parsing, but realloc() wouldn't give it more memory, virtual or otherwise.
(F) The argument to a '.' in your template tried to move the working position to before the start of the packed string being built.
(F) You had a template that specified an absolute position outside the string being unpacked. See pack.
(F) You had a template that specified an absolute position outside the string being unpacked. The string being unpacked was also invalid UTF-8. See pack.
(W overload) The overload pragma was passed an argument it did not recognize. Did you mistype an operator?
(F) An object with an overloaded dereference operator was dereferenced, but the overloaded operation did not return a reference. See overload.
(F) An object with a qr overload was used as part of a match, but the
overloaded operation didn't return a compiled regexp. See overload.
(W reserved) A lowercase attribute name was used that had a package-specific handler. That name might have a meaning to Perl itself some day, even though it doesn't yet. Perhaps you should use a mixed-case attribute name, instead. See attributes.
(F) You can't specify a repeat count so large that it overflows your signed integers. See pack.
(W io) A single call to write() produced more lines than can fit on a page. See perlform.
(P) An internal error.
(P) One of the file test operators entered a code branch that calls an ACL related-function, but that function is not available on this platform. Earlier checks mean that it should not be possible to enter this branch on this platform.
(P) A child pseudo-process in the ithreads implementation on Windows was not scheduled within the time period allowed and therefore was not able to initialize properly.
(P) Failed an internal consistency check trying to compile a grep.
(P) Failed an internal consistency check trying to compile a split.
(P) The savestack was requested to restore more localized values than there are in the savestack.
(P) Failed an internal consistency check while trying to reset a weak reference.
(P) We popped the context stack to an eval context, and then discovered it wasn't an eval context.
(P) The internal pp_subst() routine was called with invalid operational data.
(P) The internal do_trans routines were called with invalid operational data.
(P) While attempting folding constants an exception other than an eval
failure was caught.
(P) The library function frexp() failed, making printf("%f") impossible.
(P) We popped the context stack to a context with the specified label, and then discovered it wasn't a context we know how to do a goto in.
(P) The internal routine used to clear a typeglob's entries tried repeatedly, but each time something re-created entries in the glob. Most likely the glob contains an object with a reference back to the glob and a destructor that adds a new object to the glob.
(P) The lexer got into a bad state at a case modifier.
(P) The lexer got into a bad state parsing a string with brackets.
(F) forked child returned an incomprehensible message about its errno.
(P) We popped the context stack to a block context, and then discovered it wasn't a block context.
(P) A writable lexical variable became read-only somehow within the scope.
(P) The savestack probably got out of sync. At least, there was an invalid enum on the top of it.
(P) Failed an internal consistency check while trying to reset all weak references to an object.
(P) Something requested a negative number of bytes of malloc.
(P) Something tried to allocate more memory than possible.
(P) The compiler got confused about which scratch pad it was allocating and freeing temporaries and lexicals from.
(P) The compiler got confused about which scratch pad it was allocating and freeing temporaries and lexicals from.
(P) An invalid scratch pad offset was detected internally.
(P) The compiler got confused about which scratch pad it was allocating and freeing temporaries and lexicals from.
(P) An invalid scratch pad offset was detected internally.
(P) The compiler got confused about which scratch pad it was allocating and freeing temporaries and lexicals from.
(P) An invalid scratch pad offset was detected internally.
(P) The foreach iterator got called in a non-loop context frame.
(P) The internal pp_match() routine was called with invalid operational data.
(P) Something terrible went wrong in setting up for the split.
(P) Something requested a negative number of bytes of realloc.
(P) The internal sv_replace() function was handed a new SV with a reference count other than 1.
(P) Some internal routine requested a goto (or something like it), and didn't supply the destination.
(P) We popped the context stack to a subroutine or eval context, and then discovered it wasn't a subroutine or eval context.
(P) scan_num() got called on something that wasn't a number.
(P) while compiling a pattern that has embedded (?{}) or (??{}) code blocks, perl couldn't locate the code block that should have already been seen and compiled by perl before control passed to the regex compiler.
(P) The sv_chop() routine was passed a position that is not within the scalar's string buffer.
(P) The sv_insert() routine was told to remove more string than there was string.
(P) The interpreter's sanity check of the C function strxfrm() failed. In your current locale the returned transformation of the string "ab" is shorter than that of the string "a", which makes no sense.
(P) The compiler attempted to do a goto, or something weird like that.
(P) The compiler is screwed up and attempted to use an op that isn't permitted at run time.
(P) Something tried to call utf16_to_utf8 with an odd (as opposed to even) byte length.
(P) Something tried to call utf16_to_utf8_reversed with an odd (as opposed to even) byte length.
(P) The lexer got into a bad state while processing a case modifier.
(W parenthesis) You said something like
- my $foo, $bar = @_;
when you meant
- my ($foo, $bar) = @_;
Remember that "my", "our", "local" and "state" bind tighter than comma.
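A minimal sketch of the difference (the sub and variable names are illustrative):

```perl
use warnings;
no strict 'vars';   # the broken form needs an undeclared package variable

sub broken {
    my $foo, $bar = @_;       # parsed as: (my $foo), ($bar = @_)
    # $foo is never assigned; $bar (a package variable) gets the
    # COUNT of @_, because the assignment puts @_ in scalar context.
    return defined $foo ? "defined" : "undef";
}

sub fixed {
    my ($foo, $bar) = @_;     # list assignment: both get values
    return "$foo/$bar";
}

print broken("a", "b"), "\n";   # undef
print fixed("a", "b"), "\n";    # a/b
```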
(F) Parsing code supplied by an extension violated the parser's API in a detectable way.
(D deprecated, utf8) This message indicates a bug either in the Perl core or in XS code. Such code was trying to find out if a character, allegedly stored internally encoded as UTF-8, was of a given type, such as being punctuation or a digit. But the character was not encoded in legal UTF-8. The %s is replaced by a string that can be used by knowledgeable people to determine what the type being checked against was. If utf8 warnings are enabled, a further message is raised, giving details of the malformation.
(F) You used a pattern that uses too many nested subpattern calls without consuming any text. Restructure the pattern so text is consumed before the nesting limit is exceeded.
The <-- HERE shows whereabouts in the regular expression the problem was discovered.
-p destination: %s
(F) An error occurred during the implicit output invoked by the -p command-line switch. (This output goes to STDOUT unless you've redirected it with select().)
(F) This is an educated guess made in conjunction with the message "Can't locate object method \"%s\" via package \"%s\"". It often means that a method requires a package that has not been loaded.
(D regexp, deprecated) You used a regular expression with case-insensitive matching, and there is a bug in Perl in which the built-in regular expression folding rules are not accurate. This may lead to incorrect results. Please report this as a bug using the perlbug utility. (This message is marked deprecated, so that by default it will be turned on.)
(F) Your platform has very uncommon byte-order and integer size, so it was not possible to set up some or all fixed-width byte-order conversion functions. This is only a problem when you're using the '<' or '>' modifiers in (un)pack templates. See pack.
(F) The code you are trying to run has asked for a newer version of Perl than you are running. Perhaps use 5.10 was written instead of use 5.010 or use v5.10. Without the leading v, the number is interpreted as a decimal, with every three digits after the decimal point representing a part of the version number. So 5.10 is equivalent to v5.100.
(F) The module in question uses features of a version of Perl more recent than the currently running version. How long has it been since you upgraded, anyway? See require.
(F) An error peculiar to OS/2. PERL_SH_DIR is the directory to find the sh-shell in. See "PERL_SH_DIR" in perlos2.
(X) See PERL_SIGNALS in perlrun for legal values.
(F) The code you are trying to run claims it will not run on the version of Perl you are using because it is too new. Maybe the code needs to be updated, or maybe it is simply wrong and the version check should just be removed.
(S) The whole warning message will look something like:
- perl: warning: Setting locale failed.
- perl: warning: Please check that your locale settings:
- LC_ALL = "En_US",
- LANG = (unset)
- are supported and installed on your system.
- perl: warning: Falling back to the standard locale ("C").
Exactly what were the failed locale settings varies. In the above the settings were that the LC_ALL was "En_US" and the LANG had no value. This error means that Perl detected that you and/or your operating system supplier and/or system administrator have set up the so-called locale system but Perl could not use those settings. This was not dead serious, fortunately: there is a "default locale" called "C" that Perl can and will use, and the script will be run. Before you really fix the problem, however, you will get the same error message each time you run Perl. How to really fix the problem can be found in perllocale section LOCALE PROBLEMS.
(W) PERL_HASH_SEED should match /^\s*(?:0x)?[0-9a-fA-F]+\s*\z/ but it contained a non hex character. This could mean you are not using the hash seed you think you are.
(W) Perl was run with the environment variable PERL_PERTURB_KEYS defined but containing an unexpected value. The legal values of this setting are as follows.
- Numeric | String | Result
- --------+---------------+-----------------------------------------
- 0 | NO | Disables key traversal randomization
- 1 | RANDOM | Enables full key traversal randomization
- 2 | DETERMINISTIC | Enables repeatable key traversal randomization
Both numeric and string values are accepted, but note that string values are case sensitive. The default for this setting is "RANDOM" or 1.
(W exec) A warning peculiar to VMS. Waitpid() was asked to wait for a process which isn't a subprocess of the current process. While this is fine from VMS' perspective, it's probably not what you intended.
(F) The unpack format P must have an explicit size, not "*".
(F) The class in the character class [: :] syntax is unknown. The <-- HERE shows whereabouts in the regular expression the problem was discovered. Note that the POSIX character classes do not have the is prefix the corresponding C interfaces have: in other words, it's [[:print:]], not isprint. See perlre.
(F) Your system has POSIX getpgrp(), which takes no argument, unlike the BSD version, which takes a pid.
(W regexp) The character class constructs [: :], [= =], and [. .] go inside character classes, the [] are part of the construct, for example: /[012[:alpha:]345]/. Note that [= =] and [. .] are not currently implemented; they are simply placeholders for future extensions and will cause fatal errors. The <-- HERE shows whereabouts in the regular expression the problem was discovered. See perlre.
(F) Within regular expression character classes ([]) the syntax beginning with "[." and ending with ".]" is reserved for future extensions. If you need to represent those character sequences inside a regular expression character class, just quote the square brackets with the backslash: "\[." and ".\]". The <-- HERE shows whereabouts in the regular expression the problem was discovered. See perlre.
(F) Within regular expression character classes ([]) the syntax beginning with "[=" and ending with "=]" is reserved for future extensions. If you need to represent those character sequences inside a regular expression character class, just quote the square brackets with the backslash: "\[=" and "=\]". The <-- HERE shows whereabouts in the regular expression the problem was discovered. See perlre.
(W qw) qw() lists contain items separated by whitespace; as with literal strings, comment characters are not ignored, but are instead treated as literal data. (You may have used different delimiters than the parentheses shown here; braces are also frequently used.)
You probably wrote something like this:
- @list = qw(
- a # a comment
- b # another comment
- );
when you should have written this:
- @list = qw(
- a
- b
- );
If you really want comments, build your list the old-fashioned way, with quotes and commas:
- @list = (
- 'a', # a comment
- 'b', # another comment
- );
(W qw) qw() lists contain items separated by whitespace; therefore commas aren't needed to separate the items. (You may have used different delimiters than the parentheses shown here; braces are also frequently used.)
You probably wrote something like this:
- qw! a, b, c !;
which puts literal commas into some of the list items. Write it without commas if you don't want them to appear in your data:
- qw! a b c !;
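A short sketch showing what the commas do to the data (delimiters are illustrative):

```perl
use warnings;
no warnings 'qw';   # silence the warning described above

my @with_commas = qw! a, b, c !;   # commas become part of the items
my @clean       = qw! a b c !;

print "$with_commas[0]\n";   # a,
print "$clean[0]\n";         # a
```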
(F) An ioctl() or fcntl() returned more than Perl was bargaining for. Perl guesses a reasonable buffer size, but puts a sentinel byte at the end of the buffer just in case. This sentinel byte got clobbered, and Perl assumes that memory is now corrupted. See ioctl.
(W precedence) Your program uses a bitwise logical operator in conjunction with a numeric comparison operator, like this:
- if ($x & $y == 0) { ... }
This expression is actually equivalent to $x & ($y == 0), due to the higher precedence of ==. This is probably not what you want. (If you really meant to write this, disable the warning, or, better, put the parentheses explicitly and write $x & ($y == 0).)
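A sketch comparing the two parses with concrete values ($x and $y are illustrative):

```perl
use strict;
use warnings;

my ($x, $y) = (6, 0);

my $as_parsed = $x & ($y == 0);    # 6 & 1, i.e. 0: false
my $intended  = ($x & $y) == 0;    # (6 & 0) == 0: true

print $as_parsed ? "true" : "false", "\n";   # false
print $intended  ? "true" : "false", "\n";   # true
```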
(W ambiguous) You said something like m/$\/ in a regex.
The regex m/foo$\s+bar/m translates to: match the word 'foo', the output
record separator (see $\ in perlvar) and the letter 's' (one time or more)
followed by the word 'bar'.
If this is what you intended then you can silence the warning by using
m/${\}/ (for example: m/foo${\}s+bar/).
If instead you intended to match the word 'foo' at the end of the line
followed by whitespace and the word 'bar' on the next line then you can use
m/$(?)\/ (for example: m/foo$(?)\s+bar/).
(W ambiguous) You said something like '@foo' in a double-quoted string
but there was no array @foo
in scope at the time. If you wanted a
literal @foo, then write it as \@foo; otherwise find out what happened
to the array you apparently lost track of.
(S precedence) The old irregular construct open FOO || die; is now misinterpreted as open(FOO || die); because of the strict regularization of Perl 5's grammar into unary and list operators. (The old open was a little of both.) You must put parentheses around the filehandle, or use the new "or" operator instead of "||".
See Server error.
(W closed) The filehandle you're writing to got itself closed sometime before now. Check your control flow.
(W closed) The filehandle you're printing on got itself closed sometime before now. Check your control flow.
(W) This is a standard message issued by OS/2 applications, while *nix applications die in silence. It is considered a feature of the OS/2 port. One can easily disable this by appropriate sighandlers, see Signals in perlipc. See also "Process terminated by SIGTERM/SIGINT" in perlos2.
(F) The named property which you specified via \p or \P is not one known to Perl. Perhaps you misspelled the name? See "Properties accessible through \p{} and \P{}" in perluniprops for a complete list of available official properties. If it is a user-defined property, it must have been defined by the time the regular expression is compiled.
(W illegalproto) A character follows % or @ in a prototype. This is useless, since % and @ gobble the rest of the subroutine arguments.
(S prototype) The subroutine being declared or defined had previously been declared or defined with a different function prototype.
(F) You've omitted the closing parenthesis in a function prototype definition.
(W) You compiled a regular expression that contained a Unicode property match (\p or \P), but the regular expression is also being told to use the run-time locale, not Unicode. Instead, use a POSIX character class, which should know about the locale's rules. (See "POSIX Character Classes" in perlrecharclass.)
Even if the run-time locale is ISO 8859-1 (Latin1), which is a subset of Unicode, some properties will give results that are not valid for that subset.
Here are a couple of examples to help you see what's going on. If the locale is ISO 8859-7, the character at code point 0xD7 is the "GREEK CAPITAL LETTER CHI". But in Unicode that code point means the "MULTIPLICATION SIGN" instead, and \p always uses the Unicode meaning. That means that \p{Alpha} won't match, but [[:alpha:]] should. Only in the Latin1 locale are all the characters in the same positions as they are in Unicode. But, even here, some properties give incorrect results. An example is \p{Changes_When_Uppercased}, which is true for "LATIN SMALL LETTER Y WITH DIAERESIS", but since the upper case of that character is not in Latin1, in that locale it doesn't change when upper cased.
(W regexp) Minima should be less than or equal to maxima. If you really want your regexp to match something 0 times, just put {0}.
(F) You started a regular expression with a quantifier. Backslash it if you meant it literally. The <-- HERE shows whereabouts in the regular expression the problem was discovered. See perlre.
(F) There is currently a limit to the size of the min and max values of the {min,max} construct. The <-- HERE shows whereabouts in the regular expression the problem was discovered. See perlre.
(W regexp) You applied a regular expression quantifier in a place where it makes no sense, such as on a zero-width assertion. Try putting the quantifier inside the assertion instead. For example, the way to match "abc" provided that it is followed by three repetitions of "xyz" is /abc(?=(?:xyz){3})/, not /abc(?=xyz){3}/.
The <-- HERE shows whereabouts in the regular expression the problem was discovered.
(W regexp) Minima should be less than or equal to maxima. If you really want your regexp to match something 0 times, just put {0}.
(F) One (or both) of the numeric arguments to the range operator ".." are outside the range which can be represented by integers internally. One possible workaround is to force Perl to use magical string increment by prepending "0" to your numbers.
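A sketch of the string-increment workaround (the values here are small and illustrative; the real use case is endpoints too large for your platform's integers):

```perl
use strict;
use warnings;

# Plain numeric range:
my @nums = (1 .. 5);          # 1, 2, 3, 4, 5

# Prepending "0" makes both endpoints strings, so Perl uses
# magical string increment instead of integer arithmetic:
my @strs = ("01" .. "05");    # "01", "02", "03", "04", "05"

print scalar(@strs), "\n";    # 5
print "$strs[-1]\n";          # 05
```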
(W io) The dirhandle you're reading from is either closed or not really a dirhandle. Check your control flow.
(W closed) The filehandle you're reading from got itself closed sometime before now. Check your control flow.
(W closed) You tried to read from a closed filehandle.
(W unopened) You tried to read from a filehandle that was never opened.
(F) You can't allocate more than 64K on an MS-DOS machine.
(S malloc) An internal routine called realloc() on something that had already been freed.
(S debugging) You can't use the -D option unless the code to produce the desired output is compiled into Perl, which entails some overhead, which is why it's currently left out of your copy.
(P) It is currently not permitted to load modules when creating a filehandle inside an %INC hook. This can happen with open my $fh, '<', \$scalar, which implicitly loads PerlIO::scalar. Try loading PerlIO::scalar explicitly first.
(F) While calculating the method resolution order (MRO) of a package, Perl believes it found an infinite loop in the @ISA hierarchy. This is a crude check that bails out after 100 levels of @ISA depth.
(P) Perl's I/O implementation failed an internal consistency check. If you see this message, something is very wrong.
(W misc) You gave a single reference where Perl was expecting a list with an even number of elements (for assignment to a hash). This usually means that you used the anon hash constructor when you meant to use parens. In any case, a hash requires key/value pairs.
- %hash = { one => 1, two => 2, }; # WRONG
- %hash = [ qw/ an anon array / ]; # WRONG
- %hash = ( one => 1, two => 2, ); # right
- %hash = qw( one 1 two 2 ); # also fine
(W misc) You have attempted to weaken a reference that is already weak. Doing so has no effect.
(F) You used \g0 or similar in a regular expression. You may refer to capturing parentheses only with strictly positive integers (normal backreferences) or with strictly negative integers (relative backreferences). Using 0 does not make sense.
(F) You used something like \7 in your regular expression, but there are not at least seven sets of capturing parentheses in the expression. If you wanted to have the character with ordinal 7 inserted into the regular expression, prepend zeroes to make it three digits long: \007. The <-- HERE shows whereabouts in the regular expression the problem was discovered.
(F) You used something like \k'NAME' or \k<NAME> in your regular expression, but there is no corresponding named capturing parentheses such as (?'NAME'...) or (?<NAME>...). Check if the name has been spelled correctly both in the backreference and the declaration. The <-- HERE shows whereabouts in the regular expression the problem was discovered.
(F) You used something like \g{-7} in your regular expression, but there are not at least seven sets of closed capturing parentheses in the expression before where the \g{-7} was located. The <-- HERE shows whereabouts in the regular expression the problem was discovered.
(P) The regular expression engine got confused by what the regular expression compiler gave it.
(F syntax, regexp) The regular expression pattern had too many occurrences of the specified modifier. Remove the extraneous ones.
(F) Turning off the given modifier has the side effect of turning on another one. Perl currently doesn't allow this. Reword the regular expression to use the modifier you want to turn on (and place it before the minus), instead of the one you want to turn off.
(F syntax, regexp) The regular expression pattern had more than one of these mutually exclusive modifiers. Retain only the modifier that is supposed to be there.
(P) A "can't happen" error, because safemalloc() should have caught it earlier.
(F) Your format contains the ~~ repeat-until-blank sequence and a numeric field that will never go blank so that the repetition never terminates. You might use ^# instead. See perlform.
(W misc) You have used a replacement list that is longer than the search list. So the additional elements in the replacement list are meaningless.
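A minimal sketch of an over-long replacement list (the character sets are illustrative):

```perl
use warnings;
no warnings 'misc';   # silence the warning described above

my $s = "ab";
$s =~ tr/ab/xyz/;   # 'z' has no search-list partner: it is ignored
print "$s\n";       # xy
```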
(W misc, regexp) You wrote something like \08, or \179 in a double-quotish string. All but the last digit is treated as a single character, specified in octal. The last digit is the next character in the string. To tell Perl that this is indeed what you want, you can use the \o{ } syntax, or use exactly three digits to specify the octal for the character.
(W syntax) You wrote your assignment operator backwards. The = must always come last, to avoid ambiguity with subsequent unary operators.
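A minimal sketch of the ambiguity (variable name illustrative):

```perl
use strict;
use warnings;

my $n = 10;
$n -= 1;          # correct: subtract one
print "$n\n";     # 9

# Reversed, "$n =- 1;" parses as "$n = -1;" (assign negative one),
# which is why Perl warns about the reversed operator.
```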
(W io) The dirhandle you tried to do a rewinddir() on is either closed or not really a dirhandle. Check your control flow.
(S internal) Something went wrong in Perl's internal bookkeeping of scalars: not all scalar variables were deallocated by the time Perl exited. What this usually indicates is a memory leak, which is of course bad, especially if the Perl program is intended to be long-running.
(W syntax) You've used an array slice (indicated by @) to select a single element of an array. Generally it's better to ask for a scalar value (indicated by $). The difference is that $foo[&bar] always behaves like a scalar, both when assigning to it and when evaluating its argument, while @foo[&bar] behaves like a list when you assign to it, and provides a list context to its subscript, which can do weird things if you're expecting only one subscript.
On the other hand, if you were actually hoping to treat the array element as a list, you need to look into how references work, because Perl will not magically convert between scalars and lists for you. See perlref.
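A sketch of the scalar-versus-list behavior on the right-hand side of an assignment (the array name is illustrative):

```perl
use warnings;
no warnings qw(syntax void);   # silence the slice and void-context warnings

my @foo = (10, 20, 30);

@foo[1] = (7, 8);   # slice: list assignment; takes 7, discards 8
$foo[2] = (7, 8);   # element: scalar assignment; comma operator yields 8

print "$foo[1] $foo[2]\n";   # 7 8
```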
(W syntax) You've used a hash slice (indicated by @) to select a single element of a hash. Generally it's better to ask for a scalar value (indicated by $). The difference is that $foo{&bar} always behaves like a scalar, both when assigning to it and when evaluating its argument, while @foo{&bar} behaves like a list when you assign to it, and provides a list context to its subscript, which can do weird things if you're expecting only one subscript.
On the other hand, if you were actually hoping to treat the hash element as a list, you need to look into how references work, because Perl will not magically convert between scalars and lists for you. See perlref.
(F) The lexer couldn't find the final delimiter of a // or m{} construct. Remember that bracketing delimiters count nesting level. Missing the leading $ from a variable $m may cause this error.
Note that since Perl 5.9.0 a // can also be the defined-or construct, not just the empty search pattern. Therefore code written in Perl 5.9.0 or later that uses the // as the defined-or can be misparsed by pre-5.9.0 Perls as a non-terminated search pattern.
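A sketch of the defined-or reading of // (values illustrative; the operator became stable in Perl 5.10):

```perl
use strict;
use warnings;

my $val;                         # undef
my $out = $val // "default";     # defined-or: falls through to "default"
print "$out\n";                  # default

my $zero = 0;
print $zero // "default", "\n";  # 0 (defined, even though false)
```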
(F) The lexer couldn't find the final delimiter of a ?PATTERN? construct. The question mark is also used as part of the ternary operator (as in foo ? 0 : 1), leading to some ambiguous constructions being wrongly parsed. One way to disambiguate the parsing is to put parentheses around the conditional expression, i.e. (foo) ? 0 : 1.
(W io) The dirhandle you are doing a seekdir() on is either closed or not really a dirhandle. Check your control flow.
(W unopened) You tried to use the seek() or sysseek() function on a filehandle that was either never opened or has since been closed.
(F) This machine doesn't implement the select() system call.
(F) Self-ties of arrays and hashes are not supported in the current implementation.
(W semicolon) A nearby syntax error was probably caused by a missing semicolon, or possibly some other missing operator, such as a comma.
(S internal) The internal newSVsv() routine was called to duplicate a scalar that had previously been marked as free.
(F) You don't have System V semaphore IPC on your system.
(W closed) The socket you're sending to got itself closed sometime before now. Check your control flow.
(F) A regular expression ended with an incomplete extension (?. The <-- HERE shows whereabouts in the regular expression the problem was discovered. See perlre.
(F) A proposed regular expression extension has the character reserved but has not yet been written. The <-- HERE shows whereabouts in the regular expression the problem was discovered. See perlre.
(F) You used a regular expression extension that doesn't make sense. The
<-- HERE shows whereabouts in the regular expression the problem was
discovered. This happens when using the (?^...)
construct to tell
Perl to use the default regular expression modifiers, and you
redundantly specify a default modifier. For other
causes, see perlre.
(F) The regular expression expects a mandatory argument following the escape sequence and this has been omitted or incorrectly written.
(F) A regular expression comment must be terminated by a closing parenthesis. Embedded parentheses aren't allowed. See perlre.
(F) The end of the perl code contained within the {...} must be followed immediately by a ')'.
See Server error.
(A) This is the error message generally seen in a browser window when trying to run a CGI program (including SSI) over the web. The actual error text varies widely from server to server. The most frequently-seen variants are "500 Server error", "Method (something) not permitted", "Document contains no data", "Premature end of script headers", and "Did not produce a valid header".
This is a CGI error, not a Perl error.
You need to make sure your script is executable, is accessible by the user CGI is running the script under (which is probably not the user account you tested it under), does not rely on any environment variables (like PATH) from the user it isn't running under, and isn't in a location where the CGI server can't find it. Please see the following for more information:
- http://www.perl.org/CGI_MetaFAQ.html
- http://www.htmlhelp.org/faq/cgifaq.html
- http://www.w3.org/Security/Faq/
You should also look at perlfaq9.
(F) You tried to assign to $), and your operating system doesn't support the setegid() system call (or equivalent), or at least Configure didn't think so.
(F) You tried to assign to $>, and your operating system doesn't support the seteuid() system call (or equivalent), or at least Configure didn't think so.
(F) Your system has the setpgrp() from BSD 4.2, which takes no arguments, unlike POSIX setpgid(), which takes a process ID and process group ID.
(F) You tried to assign to $(, and your operating system doesn't support the setrgid() system call (or equivalent), or at least Configure didn't think so.
(F) You tried to assign to $<, and your operating system doesn't support the setruid() system call (or equivalent), or at least Configure didn't think so.
(W closed) You tried to set a socket option on a closed socket. Did you forget to check the return value of your socket() call? See setsockopt.
(F) You don't have System V shared memory IPC on your system.
(W syntax) The non-matching operator is !~, not !=~. !=~ will be interpreted as the != (numeric not equal) and ~ (1's complement) operators: probably not what you intended.
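A minimal sketch of the correct non-match operator (the string and pattern are illustrative):

```perl
use strict;
use warnings;

my $str = "hello";

my $no_digits = ($str !~ /\d/) ? 1 : 0;   # true: "hello" has no digits
print "$no_digits\n";                     # 1

# "$str !=~ /\d/" would instead be parsed as $str != ~/\d/,
# a numeric comparison against the bitwise complement of a match.
```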
(F) You wrote require <file> when you should have written require 'file'.
(W syntax) You have used a pattern where Perl expected to find a string,
as in the first argument to join. Perl will treat the true or false
result of matching the pattern against $_ as the string, which is
probably not what you had in mind.
(W closed) You tried to do a shutdown on a closed socket. Seems a bit superfluous.
(W signal) The signal handler named in %SIG doesn't, in fact, exist. Perhaps you put it into the wrong package?
(S) If you see this message, then something is seriously wrong with the internal bookkeeping of op trees. An op tree needed to be freed after a compilation error, but could not be found, so it was leaked instead.
(W overflow) You called sleep with a number that was larger than
it can reliably handle and sleep probably slept for less time than
requested.
(S experimental::smartmatch) This warning is emitted if you use the smartmatch (~~) operator. This is currently an experimental feature, and its details are subject to change in future releases of Perl. Particularly, its current behavior is noticed for being unnecessarily complex and unintuitive, and is very likely to be overhauled.
(F) You should not use the ~~ operator on an object that does not overload it: Perl refuses to use the object's underlying structure for the smart match.
(F) An ancient error message that almost nobody ever runs into anymore. But before sort was a keyword, people sometimes used it as a filehandle.
(F) A sort comparison subroutine written in XS must return exactly one item. See sort.
(F) You tried to activate a source filter (usually by loading a source filter module) within a string passed to eval. This is not permitted under the unicode_eval feature. Consider using evalbytes instead. See feature.
(W misc) You attempted to specify an offset that was past the end of the array passed to splice(). Splicing will instead commence at the end of the array, rather than past it. If this isn't what you want, try explicitly pre-extending the array by assigning $#array = $offset. See splice.
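A sketch of both behaviors (the array and offsets are illustrative):

```perl
use warnings;
no warnings 'misc';   # silence the past-end-of-array warning

my @a = (1, 2, 3);
splice(@a, 10, 0, "x");          # offset past the end: commences at the end
print scalar(@a), " $a[-1]\n";   # 4 x

my @b = (1, 2, 3);
$#b = 9;                  # pre-extend to 10 elements (indices 0..9)
splice(@b, 10, 0, "x");   # offset 10 is now exactly the end: no warning
print scalar(@b), "\n";   # 11
```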
(P) The split was looping infinitely. (Obviously, a split shouldn't iterate more times than there are characters of input, which is what happened.) See split.
(W exec) You did an exec() with some statement after it other than a die(). This is almost always an error, because exec() never returns unless there was a failure. You probably wanted to use system() instead, which does return. To suppress this warning, put the exec() in a block by itself.
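A sketch of the two idioms (the command is illustrative; $^X is the currently running perl):

```perl
use strict;
use warnings;

# system() returns, so follow-up code is meaningful:
my $rc = system($^X, "-e", "exit 0");
die "system failed: $?" if $rc != 0;
print "back in the caller\n";

# With exec(), the only sensible statement after it is a die():
#   exec($^X, "-e", "exit 0") or die "exec failed: $!";
```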
(F) Lexically scoped variables aren't in a package, so it doesn't make sense to try to declare one with a package qualifier on the front. Use local() if you want to localize a package variable.
(W syntax) The package variables $a and $b are used for sort comparisons. You used $a or $b as an operand to the <=> or cmp operator inside a sort comparison block, and the variable had earlier been declared as a lexical variable. Either qualify the sort variable with the package name, or rename the lexical variable.
(W unopened) You tried to use the stat() function on a filehandle that was either never opened or has since been closed.
(P) Overloading resolution over the @ISA tree may be broken by importation stubs. Stubs should never be implicitly created, but explicit calls to can() may break this.
(W closure) During compilation, an inner named subroutine or eval is attempting to capture an outer lexical subroutine that is not currently available. This can happen for one of two reasons. First, the lexical subroutine may be declared in an outer anonymous subroutine that has not yet been created. (Remember that named subs are created at compile time, while anonymous subs are created at run-time.) For example,
At the time that f is created, it can't capture the current "a" sub, since the anonymous subroutine hasn't been created yet. Conversely, the following won't give a warning since the anonymous subroutine has by now been created and is live:
The second situation is caused by an eval accessing a variable that has gone out of scope, for example,
Here, when the '\&a' in the eval is being compiled, f() is not currently being executed, so its &a is not available for capture.
(W misc) A "my" or "state" subroutine has been redeclared in the current scope or statement, effectively eliminating all access to the previous instance. This is almost always a typographical error. Note that the earlier subroutine will still exist until the end of the scope or until all closure references to it are destroyed.
(W redefine) You redefined a subroutine. To suppress this warning, say
(P) The substitution was looping infinitely. (Obviously, a substitution shouldn't iterate more times than there are characters of input, which is what happened.) See the discussion of substitution in Regexp Quote-Like Operators in perlop.
(F) The lexer couldn't find the interior delimiter of an s/// or s{}{} construct. Remember that bracketing delimiters count nesting level. Missing the leading $ from variable $s may cause this error.
(F) The lexer couldn't find the final delimiter of an s/// or s{}{} construct. Remember that bracketing delimiters count nesting level. Missing the leading $ from variable $s may cause this error.
(W substr)(F) You tried to reference a substr() that pointed outside of a string. That is, the absolute value of the offset was larger than the length of the string. See substr. This warning is fatal if substr is used in an lvalue context (as the left hand side of an assignment or as a subroutine argument for example).
(P) Perl tried to force the upgrade of an SV to a type which was actually inferior to its current type.
(F) A (?(condition)if-clause|else-clause) construct can have at most two branches (the if-clause and the else-clause). If you want one or both to contain alternation, such as using this|that|other, enclose it in clustering parentheses:
- (?(condition)(?:this|that|other)|else-clause)
The <-- HERE shows whereabouts in the regular expression the problem was discovered. See perlre.
(F) If the argument to the (?(...)if-clause|else-clause) construct is a number, it can be only a number. The <-- HERE shows whereabouts in the regular expression the problem was discovered. See perlre.
(F) While under the use filetest pragma, we cannot switch the real and effective uids or gids.
(F) The final summary message when a perl -c succeeds.
(F) Probably means you had a syntax error. Common reasons include:
- A keyword is misspelled.
- A semicolon is missing.
- A comma is missing.
- An opening or closing parenthesis is missing.
- An opening or closing brace is missing.
- A closing quote is missing.
Often there will be another error message associated with the syntax error giving more information. (Sometimes it helps to turn on -w.) The error message itself often tells you where it was in the line when it decided to give up. Sometimes the actual error is several tokens before this, because Perl is good at understanding random input. Occasionally the line number may be misleading, and once in a blue moon the only way to figure out what's triggering the error is to call perl -c repeatedly, chopping away half the program each time to see if the error went away. Sort of the cybernetic version of 20 questions.
(A) You've accidentally run your script through the Bourne shell instead of Perl. Check the #! line, or manually feed your script into Perl yourself.
(F) This error is likely to occur if you run a perl5 script through a perl4 interpreter, especially if the next 2 tokens are "use strict" or "my $var" or "our $var".
(W closed) You tried to read from a closed filehandle.
(W unopened) You tried to read from a filehandle that was never opened.
(F) Perl could not figure out what you meant inside this construct; this notifies you that it is giving up trying.
(F) You tried to do something with a function beginning with "sem", "shm", or "msg" but that System V IPC is not implemented in your machine. In some machines the functionality can exist but be unconfigured. Consult your system support.
(W closed) The filehandle you're writing to got itself closed sometime before now. Check your control flow.
-T and -B not implemented on filehandles
(F) Perl can't peek at the stdio buffer of filehandles when it doesn't know about your kind of stdio. You'll have to use a filename instead.
(F) You tried to use goto to reach a label that was too deeply nested for Perl to reach. Perl is doing you a favor by refusing.
(W io) The dirhandle you tried to telldir() is either closed or not really a dirhandle. Check your control flow.
(W unopened) You tried to use the tell() function on a filehandle that was either never opened or has since been closed.
(F) Assignment to $[ is now strictly circumscribed, and interpreted as a compiler directive. You may say only one of
This is to prevent the problem of one module changing the array base out from under another module inadvertently. See $[ in perlvar and arybase.
(F) Configure couldn't find the crypt() function on your machine, probably because your vendor didn't supply it, probably because they think the U.S. Government thinks it's a secret, or at least that they will continue to pretend that it is. And if you quote me on that, I will deny it.
(S experimental::lexical_subs) This warning is emitted if you declare a sub with my or state. Simply suppress the warning if you want to use the feature, but know that in doing so you are taking the risk of using an experimental feature which may change or be removed in a future Perl version:
(S experimental::regex_sets) This warning is emitted if you use the syntax (?[ ]) in a regular expression. The details of this feature are subject to change. If you want to use it, but know that in doing so you are taking the risk of using an experimental feature which may change in a future Perl version, you can do this to silence the warning:
- no warnings "experimental::regex_sets";
(S experimental) This warning is emitted if you enable an experimental feature via use feature. Simply suppress the warning if you want to use the feature, but know that in doing so you are taking the risk of using an experimental feature which may change or be removed in a future Perl version:
(F) The function indicated isn't implemented on this architecture, according to the probings of Configure.
(F) It makes no sense to test the current stat buffer for symbolic linkhood if the last stat that wrote to the stat buffer already went past the symlink to get to the real file. Use an actual filename instead.
(F) This attribute was never supported on my or sub declarations.
(W internal) Warnings peculiar to VMS. You tried to change or delete an element of the CRTL's internal environ array, but your copy of Perl wasn't built with a CRTL that contained the setenv() function. You'll need to rebuild Perl with a CRTL that does, or redefine PERL_ENV_TABLES (see perlvms) so that the environ array isn't the target of the change to %ENV which produced the warning.
(F) Something has attempted to use an internal API call which depends on Perl being compiled with the default support for randomized hash key traversal, but this Perl has been compiled without it. You should report this warning to the relevant upstream party, or recompile perl with default options.
(W threads)(S) The entry point function of threads->create() failed for some reason.
(F) Your version of the C library apparently doesn't do times(). I suspect you're not running on Unix.
(X) The #! line (or local equivalent) in a Perl script contains the -T option (or the -t option), but Perl was not invoked with -T in its command line. This is an error because, by the time Perl discovers a -T in a script, it's too late to properly taint everything from the environment. So Perl gives up.
If the Perl script is being executed as a command using the #! mechanism (or its local equivalent), this error can usually be fixed by editing the #! line so that the -%c option is a part of Perl's first argument: e.g. change perl -n -%c to perl -%c -n.
If the Perl script is being executed as perl scriptname, then the -%c option must appear on the command line: perl -%c scriptname.
(F) You tried to define a customized To-mapping for lc(), lcfirst, uc(), or ucfirst() (or their string-inlined versions), but you specified an illegal mapping. See User-Defined Character Properties in perlunicode.
(F) Your template contains ()-groups with a ridiculously deep nesting level.
(F) There has to be at least one argument to syscall() to specify the system call to call, silly dilly.
(X) The #! line (or local equivalent) in a Perl script contains the -M, -m or -C option.
In the case of -M and -m, this is an error because those options are not intended for use inside scripts. Use the use pragma instead.
The -C option only works if it is specified on the command line as well (with the same sequence of letters or numbers following). Either specify this option on the command line, or, if your system supports it, make your script executable and run it directly instead of passing it to perl.
(W void) A CHECK or INIT block is being defined during run time proper, when the opportunity to run them has already passed. Perhaps you are loading a file with require or do when you should be using use instead. Or perhaps you should put the require or do inside a BEGIN block.
(F) Perl supports a maximum of only 14 args to syscall().
(F) The function requires fewer arguments than you specified.
(A) You've accidentally run your script through csh instead of Perl. Check the #! line, or manually feed your script into Perl yourself.
(A) You've accidentally run your script through csh instead of Perl. Check the #! line, or manually feed your script into Perl yourself.
(F) The regular expression ends with an unbackslashed backslash. Backslash it. See perlre.
(D) You defined a character name which ended in a space character. Remove the trailing space(s). Usually these names are defined in the :alias import argument to use charnames, but they could be defined by a translator installed into $^H{charnames}. See CUSTOM ALIASES in charnames.
(F) The lexer couldn't find the interior delimiter of a tr/// or tr[][] or y/// or y[][] construct. Missing the leading $ from variables $tr or $y may cause this error.
(F) The lexer couldn't find the final delimiter of a tr///, tr[][], y/// or y[][] construct.
(F) You tried to use an operator from a Safe compartment in which it's disallowed. See Safe.
(F) Your machine doesn't implement a file truncation mechanism that Configure knows about.
(F) The subroutine in question in the CORE package requires its argument to be a hard reference to data of the specified type. Overloading is ignored, so a reference to an object that is not the specified type, but nonetheless has overloading to handle it, will still not be accepted.
(F) This function requires the argument in that position to be of a certain type. Arrays must be @NAME or @{EXPR}. Hashes must be %NAME or %{EXPR}. No implicit dereferencing is allowed--use the {EXPR} forms as an explicit dereference. See perlref.
(F) You called keys, values or each with a scalar argument that was not a reference to an unblessed hash or array.
(F) Your machine doesn't implement the umask function and you tried to use it to restrict permissions for yourself (EXPR & 0700).
(S internal) The exit code detected an internal inconsistency in how many execution contexts were entered and left.
(S internal) The exit code detected an internal inconsistency in how many values were temporarily localized.
(S internal) The exit code detected an internal inconsistency in how many blocks were entered and left.
(S internal) On exit, Perl found some strings remaining in the shared string table used for copy on write and for hash keys. The entries should have been freed, so this indicates a bug somewhere.
(S internal) The exit code detected an internal inconsistency in how many mortal scalars were allocated and freed.
(F) The format indicated doesn't seem to exist. Perhaps it's really in another package? See perlform.
(F) The sort comparison routine specified doesn't seem to exist. Perhaps it's in a different package? See sort.
(F) The subroutine indicated hasn't been defined, or if it was, it has since been undefined.
(F) The anonymous subroutine you're trying to call hasn't been defined, or if it was, it has since been undefined.
(F) The sort comparison routine specified is declared but doesn't seem to have been defined yet. See sort.
(F) The format indicated doesn't seem to exist. Perhaps it's really in another package? See perlform.
(W misc) An undefined value was assigned to a typeglob, a la *foo = undef. This does nothing. It's possible that you really mean undef *foo.
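A small sketch of the difference between the two spellings (the package variable $main::foo is purely illustrative):

```perl
# Assigning undef *to* a glob is a no-op; undef-ing the glob
# itself actually clears it.
$main::foo = 42;

*main::foo = undef;             # does nothing (warns under "use warnings")
my $after_assign = $main::foo;  # still 42

undef *main::foo;               # wipes every slot of the glob
my $after_undef = defined $main::foo ? 'still here' : 'gone';
```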
(A) You've accidentally run your script through csh instead of Perl. Check the #! line, or manually feed your script into Perl yourself.
(F) The unexec() routine failed for some reason. See your local FSF representative, who probably put it there in the first place.
(F) You had something like this:
- (?[ \p{Digit} ( \p{Lao} + \p{Thai} ) ])
There should be an operator before the "(", as there's no indication as to how the digits are to be combined with the characters in the Lao and Thai scripts.
(F) You had something like this:
- (?[ ( \p{Digit} + ) ])
The ")" is out-of-place. Something apparently was supposed to be combined with the digits, or the "+" shouldn't be there, or something like that. Perl can't figure out what was intended.
(F) You had something like this:
- (?[ | \p{Digit} ])
where the "|" is a binary operator with an operand on the right, but no operand on the left.
(F) You had something like this:
- (?[ z ])
Within (?[ ]), no literal characters are allowed unless they are within an inner pair of square brackets, like
- (?[ [ z ] ])
Another possibility is that you forgot a backslash. Perl isn't smart enough to figure out what you really meant.
(P) When compiling a subroutine call in lvalue context, Perl failed an internal consistency check. It encountered a malformed op tree.
(S utf8, nonchar) Certain codepoints, such as U+FFFE and U+FFFF, are defined by the Unicode standard to be non-characters. Those are legal codepoints, but are reserved for internal use; so, applications shouldn't attempt to exchange them. If you know what you are doing you can turn off this warning by no warnings 'nonchar';.
(S utf8, surrogate) You had a UTF-16 surrogate in a context where they are not considered acceptable. These code points, between U+D800 and U+DFFF (inclusive), are used by Unicode only for UTF-16. However, Perl internally allows all unsigned integer code points (up to the size limit available on your platform), including surrogates. But these can cause problems when being input or output, which is likely where this message came from. If you really really know what you are doing you can turn off this warning by no warnings 'surrogate';.
(F) There are no byte-swapping functions for a machine with this byte order.
(F) The name you used inside \N{} is unknown to Perl. Check the spelling. You can say use charnames ":loose" to not have to be so precise about spaces, hyphens, and capitalization on standard Unicode names. (Any custom aliases that have been created must be specified exactly, regardless of whether :loose is used or not.) This error may also happen if the \N{} is not in the scope of the corresponding use charnames.
(P) Perl was about to print an error message in $@, but the $@ variable did not exist, even after an attempt to create it.
(F) The second argument of 3-argument open() is not among the list of valid modes: <, >, >>, +<, +>, +>>, -|, |-, <&, >&.
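The most common modes can be sketched as follows (using a core File::Temp scratch file so the example is self-contained):

```perl
use strict;
use warnings;
use File::Temp qw(tempfile);

my ($tmp, $file) = tempfile(UNLINK => 1);
close $tmp;

# Each mode string must come from the documented set; C-style
# strings like "rw" raise "Unknown open() mode".
open(my $out, '>', $file)  or die "open for write: $!";
print {$out} "line 1\n";
close $out;

open(my $add, '>>', $file) or die "open for append: $!";
print {$add} "line 2\n";
close $add;

open(my $in, '<', $file)   or die "open for read: $!";
my @lines = <$in>;
close $in;
```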
(W layer) An attempt was made to push an unknown layer onto the Perl I/O system. (Layers take care of transforming data between external and internal representations.) Note that some layers, such as mmap, are not supported in all environments. If your program didn't explicitly request the failing operation, it may be the result of the value of the environment variable PERLIO.
(P) An error peculiar to VMS. Perl was reading values for %ENV before iterating over it, and someone else stuck a message in the stream of data Perl expected. Someone's very confused, or perhaps trying to subvert Perl's population of %ENV for nefarious purposes.
(W) You tried to use an unknown subpragma of the "re" pragma.
(F) Alphanumerics immediately following the closing delimiter of a regular expression pattern are interpreted by Perl as modifier flags for the regex. One of the ones you specified is invalid. One way this can happen is if you didn't put in white space between the end of the regex and a following alphanumeric operator:
The "a" is a valid modifier flag, but the "n" is not, and raises this error. Likely what was meant instead was:
(F) The condition part of a (?(condition)if-clause|else-clause) construct is not known. The condition must be one of the following:
- (1) (2) ... true if 1st, 2nd, etc., capture matched
- (<NAME>) ('NAME') true if named capture matched
- (?=...) (?<=...) true if subpattern matches
- (?!...) (?<!...) true if subpattern fails to match
- (?{ CODE }) true if code returns a true value
- (R) true if evaluating inside recursion
- (R1) (R2) ... true if directly inside capture group 1, 2, etc.
- (R&NAME) true if directly inside named capture
- (DEFINE) always false; for defining named subpatterns
The <-- HERE shows whereabouts in the regular expression the problem was discovered. See perlre.
(F) You specified an unknown Unicode option. See perlrun documentation of the -C switch for the list of known options.
(F) You specified an unknown Unicode option. See perlrun documentation of the -C switch for the list of known options.
(F) You either made a typo or have incorrectly put a * quantifier after an open brace in your pattern. Check the pattern and review perlre for details on legal verb patterns.
(F) An error issued by the warnings pragma. You specified a warnings category that is unknown to perl at this point.
Note that if you want to enable a warnings category registered by a module (e.g. use warnings 'File::Find'), you must have loaded this module first.
You had something like this:
- (?[ [:alnum] ])
There should be a second ":", like this:
- (?[ [:alnum:] ])
(F) The brackets around a character class must match. If you wish to include a closing bracket in a character class, backslash it or put it first. The <-- HERE shows whereabouts in the regular expression the problem was discovered. See perlre.
(F) You had something like this:
- (?[ [:digit: ])
That should be written:
- (?[ [:digit:] ])
(F) Unbackslashed parentheses must always be balanced in regular expressions. If you're a vi user, the % key is valuable for finding the matching parenthesis. The <-- HERE shows whereabouts in the regular expression the problem was discovered. See perlre.
(F) The lexer counted more closing curly or square brackets than opening ones, so you're probably missing a matching opening bracket. As a general rule, you'll find the missing one (so to speak) near the place you were last editing.
(W reserved) You used a bareword that might someday be claimed as a reserved word. It's best to put such a word in quotes, or capitalize it somehow, or insert an underbar into it. You might also declare it as a subroutine.
(F) The Perl parser has no idea what to do with the specified character in your Perl script (or eval) near the specified column. Perhaps you tried to run a compressed script, a binary program, or a directory as a Perl program.
(F) You used a backslash-character combination which is not recognized by Perl inside character classes. This is a fatal error when the character class is used within (?[ ]).
(W regexp) You used a backslash-character combination which is not recognized by Perl inside character classes. The character was understood literally, but this may change in a future version of Perl. The <-- HERE shows whereabouts in the regular expression the escape was discovered.
(W misc) You used a backslash-character combination which is not recognized by Perl. The character was understood literally, but this may change in a future version of Perl.
(W regexp) You used a backslash-character combination which is not recognized by Perl. The character(s) were understood literally, but this may change in a future version of Perl. The <-- HERE shows whereabouts in the regular expression the escape was discovered.
(F) You specified a signal name to the kill() function that was not recognized. Say kill -l in your shell to see the valid signal names on your system.
(F) You specified an illegal option to Perl. Don't do that. (If you think you didn't do that, check the #! line to see if it's supplying the bad switch on your behalf.)
(W newline) A file operation was attempted on a filename, and that operation failed, PROBABLY because the filename contained a newline, PROBABLY because you forgot to chomp() it off. See chomp.
(F) Your machine doesn't support opendir() and readdir().
(F) This machine doesn't implement the indicated function, apparently. At least, Configure doesn't think so.
(F) Your version of executable does not support forking.
Note that under some systems, like OS/2, there may be different flavors of Perl executables, some of which may support fork, some not. Try changing the name you call Perl by to perl_, perl__, and so on.
(F) Your program file begins with a Unicode Byte Order Mark (BOM) which declares it to be in a Unicode encoding that Perl cannot read.
(F) Your machine doesn't support the Berkeley socket mechanism, or at least that's what Configure thought.
(F) The lexer found something other than a simple identifier at the start of an attribute, and it wasn't a semicolon or the start of a block. Perhaps you terminated the parameter list of the previous attribute too soon. See attributes.
(F) The lexer saw an opening (left) parenthesis character while parsing an attribute list, but the matching closing (right) parenthesis character was not found. You may need to add (or remove) a backslash character to get your parentheses to balance. See attributes.
(F) An argument to unpack("w",...) was incompatible with the BER compressed integer format and could not be converted to an integer. See pack.
(F) This message occurs when a here document label has an initial quotation mark but the final quotation mark is missing. Perhaps you wrote:
- <<"foo
instead of:
- <<"foo"
(F) You missed a close brace on a \g{..} pattern (group reference) in a regular expression. Fix the pattern and retry.
(F) The lexer saw a left angle bracket in a place where it was expecting a term, so it's looking for the corresponding right angle bracket, and not finding it. Chances are you left some needed parentheses out earlier in the line, and you really meant a "less than".
(F) You used a pattern of the form (*VERB:ARG) but did not terminate the pattern with a ). Fix the pattern and retry.
(F) You used a pattern of the form (*VERB) but did not terminate the pattern with a ). Fix the pattern and retry.
(W untie) A copy of the object returned from tie (or tied) was still valid when untie was called.
(F) You called a POSIX function with incorrect arguments. See FUNCTIONS in POSIX for more information.
(F) You called a Win32 function with incorrect arguments. See Win32 for more information.
(W syntax) You used $[ in a comparison, such as:
- if ($[ > 5.006) {
- ...
- }
You probably meant to use $] instead. $[ is the base for indexing arrays. $] is the Perl version number in decimal.
(F) In a regular expression, you said something like
- (?[ [ \xBEEF ] ])
Perl isn't sure if you meant this
- (?[ [ \x{BEEF} ] ])
or if you meant this
- (?[ [ \x{BE} E F ] ])
You need to add either braces or blanks to disambiguate.
(S internal) The behavior of each() after insertion is undefined; it may skip items, or visit items more than once. Consider using keys() instead of each().
(W misc) You assigned to an lvalue subroutine, but what the subroutine returned was a temporary scalar about to be discarded, so the assignment had no effect.
(W regexp) You have used an internal modifier such as (?-o) that has no meaning unless removed from the entire regexp:
- if ($string =~ /(?-o)$pattern/o) { ... }
must be written as
- if ($string =~ /$pattern/) { ... }
The <-- HERE shows whereabouts in the regular expression the problem was discovered. See perlre.
(W syntax) The localization of lvalues such as local($x=10) is legal, but in fact the local() currently has no effect. This may change at some point in the future, but in the meantime such code is discouraged.
(W regexp) You have used an internal modifier such as (?o) that has no meaning unless applied to the entire regexp:
- if ($string =~ /(?o)$pattern/) { ... }
must be written as
- if ($string =~ /$pattern/o) { ... }
The <-- HERE shows whereabouts in the regular expression the problem was discovered. See perlre.
(W misc) You have used the /d modifier where the searchlist has the same length as the replacelist. See perlop for more information about the /d modifier.
(D deprecated) You wrote a regular expression pattern something like one of these:
- m{ \x\{FF\} }x
- m{foo\{1,3\}}
- qr(foo\(bar\))
- s[foo\[a-z\]bar][baz]
The interior braces, square brackets, and parentheses are treated as metacharacters even though they are backslashed; instead write:
- m{ \x{FF} }x
- m{foo{1,3}}
- qr(foo(bar))
- s[foo[a-z]bar][baz]
The backslashes have no effect when a regular expression pattern is delimited by {}, [], or (), which ordinarily are metacharacters, and the delimiters are also used, paired, within the interior of the pattern. It is planned that a future Perl release will change the meaning of constructs like these so that the backslashes will have an effect, so remove them from your code.
(W misc) You have a \E in a double-quotish string without a \U, \L or \Q preceding it.
(W void) You did something without a side effect in a context that does nothing with the return value, such as a statement that doesn't return a value from a block, or the left side of a scalar comma operator. Very often this points not to stupidity on your part, but a failure of Perl to parse your program the way you thought it would. For example, you'd get this if you mixed up your C precedence with Python precedence and said
- $one, $two = 1, 2;
when you meant to say
- ($one, $two) = (1, 2);
Another common error is to use ordinary parentheses to construct a list reference when you should be using square or curly brackets, for example, if you say
- $array = (1,2);
when you should have said
- $array = [1,2];
The square brackets explicitly turn a list value into a scalar value, while parentheses do not. So when a parenthesized list is evaluated in a scalar context, the comma is treated like C's comma operator, which throws away the left argument, which is not what you want. See perlref for more on this.
This warning will not be issued for numerical constants equal to 0 or 1 since they are often used in statements like
- 1 while sub_with_side_effects();
String constants that would normally evaluate to 0 or 1 are warned about.
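The difference between the parenthesized list and the bracketed reference can be sketched directly (values are illustrative):

```perl
use strict;
use warnings;
no warnings 'void';   # silence the very warning described above

my $scalar = (1, 2, 3);   # C-style comma operator: keeps only the last value, 3
my $aref   = [1, 2, 3];   # square brackets: reference to a three-element array
my $count  = scalar @$aref;
```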
(W) You did use re; without any arguments. That isn't very useful.
(W void) You used sort in scalar context, as in:
This is not very useful, and perl currently optimizes this away.
(W regexp) The p modifier cannot be turned off once set. Trying to do so is futile.
(W syntax) You used the push() or unshift() function with no arguments apart from the array, like push(@x) or unshift(@foo). That won't usually have any effect on the array, so is completely useless. It's possible in principle that push(@tied_array) could have some effect if the array is tied to a class which implements a PUSH method. If so, you can write it as push(@tied_array,()) to avoid this warning.
(W regexp) The /g and /o regular expression modifiers are global and can't be turned off once set; hence things like (?g) or (?-o:) do nothing.
(W regexp) The /c regular expression modifier is global, can't be turned off once set, and doesn't do anything without the /g modifier being specified as well; hence things like (?c) or (?-c:) do nothing, nor do things like (?gc) or (?-gc:).
(F) The "use" keyword is recognized and executed at compile time, and returns no useful value. See perlmod.
(D deprecated) The $[ variable (index of the first element in an array) is deprecated. See $[ in perlvar.
(D deprecated) You are now encouraged to use the explicitly quoted form if you wish to use an empty line as the terminator of the here-document.
(D deprecated) The values you give to a format should be separated by commas, not just aligned on a line.
(D deprecated) chdir() with no arguments is documented to change to $ENV{HOME} or $ENV{LOGDIR}. chdir(undef) and chdir('') share this behavior, but that has been deprecated. In future versions they will simply fail.
Be careful to check that what you pass to chdir() is defined and not blank, else you might find yourself in your home directory.
(W regexp) You used the /c modifier in a substitution. The /c modifier is not presently meaningful in substitutions.
(W regexp) You used the /c modifier with a regex operand, but didn't use the /g modifier. Currently, /c is meaningful only when /g is used. (This may change in the future.)
(F) The construction my $x := 42 used to parse as equivalent to my $x : = 42 (applying an empty attribute list to $x). This construct was deprecated in 5.12.0, and has now been made a syntax error, so := can be reclaimed as a new operator in the future.
If you need an empty attribute list, for example in a code generator, add a space before the =.
(F) Perhaps you modified the iterated array within the loop? This error is typically caused by code like the following:
- @a = (3,4);
- @a = () for (1,2,@a);
You are not supposed to modify arrays while they are being iterated over. For speed and efficiency reasons, Perl internally does not do full reference-counting of iterated items, hence deleting such an item in the middle of an iteration causes Perl to see a freed value.
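One safe rewrite of the problem code above (data values taken from the example) is to snapshot the list before the loop, so reassigning the array mid-iteration cannot free a live iterator item:

```perl
use strict;
use warnings;

my @a = (3, 4);

# Iterate a copy instead of the array being modified:
my @snapshot = (1, 2, @a);
@a = () for @snapshot;     # emptying @a is now harmless
```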
(D deprecated) You are now encouraged to use the shorter *glob{IO} form to access the filehandle slot within a typeglob.
(W regexp) You used the /g modifier on the pattern for a split operator. Since split always tries to match the pattern repeatedly, the /g has no effect.
(D deprecated) Using goto to jump from an outer scope into an inner scope is deprecated and should be avoided.
(D deprecated) As an (ahem) accidental feature, AUTOLOAD subroutines are looked up as methods (using the @ISA hierarchy) even when the subroutines to be autoloaded were called as plain functions (e.g. Foo::bar()), not as methods (e.g. Foo->bar() or $obj->bar()).
This bug will be rectified in future by using method lookup only for methods' AUTOLOADs. However, there is a significant base of existing code that may be using the old behavior. So, as an interim step, Perl currently issues an optional warning when non-methods use inherited AUTOLOADs.
The simple rule is: Inheritance will not work when autoloading non-methods. The simple fix for old code is: In any module that used to depend on inheriting AUTOLOAD for non-methods from a base class named BaseClass, execute *AUTOLOAD = \&BaseClass::AUTOLOAD during startup.
In code that currently says use AutoLoader; @ISA = qw(AutoLoader); you should remove AutoLoader from @ISA and change use AutoLoader; to use AutoLoader 'AUTOLOAD';.
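The aliasing fix described in this entry can be sketched as follows (package and sub names are illustrative only):

```perl
use strict;
use warnings;

package BaseClass;
our $AUTOLOAD;
sub AUTOLOAD {
    # $AUTOLOAD is set in the package where the sub was compiled,
    # which is why direct aliasing works where @ISA lookup would not.
    (my $name = $AUTOLOAD) =~ s/.*:://;
    return "autoloaded $name";
}

package Derived;
# The interim fix: alias the base class's AUTOLOAD directly
# instead of relying on @ISA inheritance for non-method calls.
no warnings 'once';
*Derived::AUTOLOAD = \&BaseClass::AUTOLOAD;

package main;
my $got = Derived::frobnicate();   # plain function call, not a method
```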
(F) You attempted to use a feature of printf that is accessible from only C. This usually means there's a better way to do it in Perl.
(D deprecated) The construct indicated is no longer recommended for use, generally because there's a better way to do it, and also because the old way has bad side effects.
(W io) A filehandle represents an opened file, and when you opened the file
it already went past any symlink you are presumably trying to look for.
The operation returned undef. Use a filename instead.
(S experimental::lexical_topic) Lexical $_ is an experimental feature and its behavior may change or even be removed in any future release of perl. See the explanation under $_ in perlvar.
(D deprecated) You used tie, tied or untie on a scalar but that scalar happens to hold a typeglob, which means its filehandle will be tied. If you mean to tie a handle, use an explicit * as in tie *$handle.
This was a long-standing bug that was removed in Perl 5.16, as there was no way to tie the scalar itself when it held a typeglob, and no way to untie a scalar that had had a typeglob assigned to it. If you see this message, you must be using an older version.
(D deprecated) You have written something like ?\w?, for a regular expression that matches only once. Starting this term directly with the question mark delimiter is now deprecated, so that the question mark will be available for use in new operators in the future. Write m?\w? instead, explicitly using the m operator: the question mark delimiter still invokes match-once behaviour.
(W misc) You tried to use a reference as an array index; this probably isn't what you mean, because references in numerical context tend to be huge numbers, and so usually indicates programmer error.
If you really do mean it, explicitly numify your reference, like so: $array[0+$ref]. This warning is not given for overloaded objects, however, because you can overload the numification and stringification operators and then you presumably know what you are doing.
(S experimental::lexical_topic) Lexical $_ is an experimental feature and its behavior may change or even be removed in any future release of perl. See the explanation under $_ in perlvar.
(W taint, deprecated) You have supplied system() or exec() with multiple
arguments and at least one of them is tainted. This used to be allowed
but will become a fatal error in a future version of perl. Untaint your
arguments. See perlsec.
(W uninitialized) An undefined value was used as if it were already defined. It was interpreted as a "" or a 0, but maybe it was a mistake. To suppress this warning assign a defined value to your variables.
To help you figure out what was undefined, perl will try to tell you the name of the variable (if any) that was undefined. In some cases it cannot do this, so it also tells you what operation you used the undefined value in. Note, however, that perl optimizes your program and the operation displayed in the warning may not necessarily appear literally in your program. For example, "that $foo" is usually optimized into "that " . $foo, and the warning will refer to the concatenation (.) operator, even though there is no . in your program.
(D deprecated) You tried to use a hash as a reference, as in %foo->{"bar"} or %$ref->{"hello"}. Versions of perl <= 5.6.1 used to allow this syntax, but shouldn't have. It is now deprecated, and will be removed in a future version.
(D deprecated) You tried to use an array as a reference, as in @foo->[23] or @$ref->[99]. Versions of perl <= 5.6.1 used to allow this syntax, but shouldn't have. It is now deprecated, and will be removed in a future version.
(W regexp) A charnames handler may return a sequence of more than one character. Currently all but the first one are discarded when used in a regular expression pattern bracketed character class.
(F) Using the !~ operator with s///r, tr///r or y///r is currently reserved for future use, as the exact behaviour has not been decided. (Simply returning the boolean opposite of the modified string is usually not particularly useful.)
(S utf8, surrogate) You had a UTF-16 surrogate in a context where they are not considered acceptable. These code points, between U+D800 and U+DFFF (inclusive), are used by Unicode only for UTF-16. However, Perl internally allows all unsigned integer code points (up to the size limit available on your platform), including surrogates. But these can cause problems when being input or output, which is likely where this message came from. If you really really know what you are doing you can turn off this warning by no warnings 'surrogate';.
(W misc) In a conditional expression, you used <HANDLE>, <*> (glob),
each(), or readdir() as a boolean value. Each of these constructs
can return a value of "0"; that would make the conditional expression
false, which is probably not what you intended. When using these
constructs in conditional expressions, test their values with the
defined operator.
(W misc) A warning peculiar to VMS. Perl tried to read the value of an %ENV element from a CLI symbol table, and found a resultant string longer than 1024 characters. The return value has been truncated to 1024 characters.
(W closure) During compilation, an inner named subroutine or eval is attempting to capture an outer lexical that is not currently available. This can happen for one of two reasons. First, the outer lexical may be declared in an outer anonymous subroutine that has not yet been created. (Remember that named subs are created at compile time, while anonymous subs are created at run-time.) For example,
At the time that f is created, it can't capture the current value of $a, since the anonymous subroutine hasn't been created yet. Conversely, the following won't give a warning since the anonymous subroutine has by now been created and is live:
The second situation is caused by an eval accessing a variable that has gone out of scope, for example,
Here, when the '$a' in the eval is being compiled, f() is not currently being executed, so its $a is not available for capture.
(S misc) With "use strict" in effect, you referred to a global variable that you apparently thought was imported from another module, because something else of the same name (usually a subroutine) is exported by that module. It usually means you put the wrong funny character on the front of your variable.
(F) Lookbehind is allowed only for subexpressions whose length is fixed and known at compile time. See perlre.
(W misc) A "my", "our" or "state" variable has been redeclared in the current scope or statement, effectively eliminating all access to the previous instance. This is almost always a typographical error. Note that the earlier variable will still exist until the end of the scope or until all closure references to it are destroyed.
(A) You've accidentally run your script through csh instead of Perl. Check the #! line, or manually feed your script into Perl yourself.
(W closure) An inner (nested) named subroutine is referencing a lexical variable defined in an outer named subroutine.
When the inner subroutine is called, it will see the value of the outer subroutine's variable as it was before and during the *first* call to the outer subroutine; in this case, after the first call to the outer subroutine is complete, the inner and outer subroutines will no longer share a common value for the variable. In other words, the variable will no longer be shared.
This problem can usually be solved by making the inner subroutine anonymous, using the sub {} syntax. When inner anonymous subs that reference variables in outer subroutines are created, they are automatically rebound to the current values of such variables.
(S printf) The %vd (s)printf format does not support version objects with alpha parts.
(F) You used a verb pattern that requires an argument. Supply an argument or check that you are using the right verb.
(F) You used a verb pattern that is not allowed an argument. Remove the argument or check that you are using the right verb.
(P) The attempt to translate a use Module n.n LIST statement into its equivalent BEGIN block found an internal inconsistency with the version number.
(W misc) The version string contains invalid characters at the end, which are being ignored.
(W) You passed warn() an empty string (the equivalent of warn "") or you called it with no args and $@ was empty.
(S) The implicit close() done by an open() got an error indication on the close(). This usually indicates your file system ran out of disk space.
(S ambiguous) You wrote a unary operator followed by something that looks like a binary operator that could also have been interpreted as a term or unary operator. For instance, if you know that the rand function has a default argument of 1.0, and you write
- rand + 5;
you may THINK you wrote the same thing as
- rand() + 5;
but in actual fact, you got
- rand(+5);
So put in parentheses to say what you really mean.
(S experimental::smartmatch) when depends on smartmatch, which is experimental. Additionally, it has several special cases that may not be immediately obvious, and their behavior may change or even be removed in any future release of perl. See the explanation under Experimental Details on given and when in perlsyn.
(S utf8) Perl met a wide character (>255) when it wasn't expecting one. This warning is by default on for I/O (like print). The easiest way to quiet this warning is simply to add the :utf8 layer to the output, e.g. binmode STDOUT, ':utf8'. Another way to turn off the warning is to add no warnings 'utf8'; but that is often closer to cheating. In general, you are supposed to explicitly mark the filehandle with an encoding, see open and binmode.
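Marking the handle with a layer, as described above, can be sketched with a one-liner (assumes perl on PATH; the smiley code point is just an arbitrary wide character):

```shell
perl -we '
    binmode STDOUT, ":utf8";    # mark STDOUT as UTF-8 output
    print chr(0x263A), "\n";    # a wide character (>255): no warning now
'
```

Without the binmode line, the same print would emit "Wide character in print" on STDERR.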
(F) The count in the (un)pack template may be replaced by [TEMPLATE] only if TEMPLATE always matches the same amount of packed bytes that can be determined from the template alone. This is not possible if it contains any of the codes @, /, U, u, w or a *-length. Redesign the template.
(W closed) The filehandle you're writing to got itself closed sometime before now. Check your control flow.
(F) When reading in different encodings, Perl tries to map everything into Unicode characters. The bytes you read in are not legal in this encoding. For example
- utf8 "\xE4" does not map to Unicode
if you try to read in the a-diaereses Latin-1 as UTF-8.
(F) You had a (un)pack template that specified a relative position before the beginning of the string being (un)packed. See pack.
(F) You had a pack template that specified a relative position after the end of the string being unpacked. See pack.
(F) And you probably never will, because you probably don't have the sources to your kernel, and your vendor probably doesn't give a rip about what you want. Your best bet is to put a setuid C wrapper around your script.
(W syntax) You assigned a bareword as a signal handler name. Unfortunately, you already have a subroutine of that name declared, which means that Perl 5 will try to call the subroutine when the assignment is executed, which is probably not what you want. (If it IS what you want, put an & in front.)
(F) When trying to initialise the random seed for hashes, Perl could not get any randomness out of your system. This usually indicates Something Very Wrong.
warnings, perllexwarn, diagnostics.
- perldoc [-h] [-D] [-t] [-u] [-m] [-l] [-F]
- [-i] [-V] [-T] [-r]
- [-d destination_file]
- [-o formatname]
- [-M FormatterClassName]
- [-w formatteroption:value]
- [-n nroff-replacement]
- [-X]
- [-L language_code]
- PageName|ModuleName|ProgramName|URL
Examples:
- perldoc -f BuiltinFunction
- perldoc -L it -f BuiltinFunction
- perldoc -q FAQ Keyword
- perldoc -L fr -q FAQ Keyword
- perldoc -v PerlVariable
See below for more description of the switches.
perldoc looks up a piece of documentation in .pod format that is embedded in the perl installation tree or in a perl script, and displays it via groff -man | $PAGER. (In addition, if running under HP-UX, col -x will be used.) This is primarily used for the documentation for the perl library modules.
Your system may also have man pages installed for those modules, in which case you can probably just use the man(1) command.
If you are looking for a table of contents to the Perl library modules documentation, see the perltoc page.
Prints out a brief help message.
Describes search for the item in detail.
Display docs using plain text converter, instead of nroff. This may be faster, but it probably won't look as nice.
Skip the real Pod formatting, and just show the raw Pod source (unformatted).
Display the entire module: both code and unformatted pod documentation. This may be useful if the docs don't explain a function in the detail you need, and you'd like to inspect the code directly; perldoc will find the file for you and simply hand it off for display.
Display only the file name of the module found.
Consider arguments as file names; no search in directories will be performed.
The -f option followed by the name of a perl built-in function will extract the documentation of this function from perlfunc.
Example:
- perldoc -f sprintf
The -q option takes a regular expression as an argument. It will search the question headings in perlfaq[1-9] and print the entries matching the regular expression.
Example:
- perldoc -q shuffle
The -v option followed by the name of a Perl predefined variable will extract the documentation of this variable from perlvar.
Examples:
- perldoc -v '$"'
- perldoc -v @+
- perldoc -v DATA
This specifies that the output is not to be sent to a pager, but is to be sent directly to STDOUT.
This specifies that the output is to be sent neither to a pager nor
to STDOUT, but is to be saved to the specified filename. Example:
perldoc -oLaTeX -dtextwrapdocs.tex Text::Wrap
This specifies that you want Perldoc to try using a Pod-formatting class for the output format that you specify. For example: -oman. This is actually just a wrapper around the -M switch; using -oformatname just looks for a loadable class by adding that format name (with different capitalizations) to the end of different classname prefixes.
For example, -oLaTeX currently tries all of the following classes: Pod::Perldoc::ToLaTeX Pod::Perldoc::Tolatex Pod::Perldoc::ToLatex Pod::Perldoc::ToLATEX Pod::Simple::LaTeX Pod::Simple::latex Pod::Simple::Latex Pod::Simple::LATEX Pod::LaTeX Pod::latex Pod::Latex Pod::LATEX.
This specifies the module that you want to try using for formatting the pod. The class must at least provide a parse_from_file method. For example: perldoc -MPod::Perldoc::ToChecker.
You can specify several classes to try by joining them with commas or semicolons, as in -MTk::SuperPod;Tk::Pod.
This specifies an option to call the formatter with. For example, -w textsize:15 will call $formatter->textsize(15) on the formatter object before it is used to format the object. For this to be valid, the formatter class must provide such a method, and the value you pass should be valid. (So if textsize expects an integer, and you do -w textsize:big, expect trouble.)
You can use -w optionname (without a value) as shorthand for -w optionname:TRUE. This is presumably useful in cases of on/off features like: -w page_numbering.
You can use an "=" instead of the ":", as in: -w textsize=15. This might be more (or less) convenient, depending on what shell you use.
Use an index if it is present. The -X option looks for an entry whose basename matches the name given on the command line in the file $Config{archlib}/pod.idx. The pod.idx file should contain fully qualified filenames, one per line.
This allows one to specify the language code for the desired language translation. If the POD2::<language_code> package isn't installed in your system, the switch is ignored.
All available translation packages are to be found under the POD2:: namespace. See POD2::IT (or POD2::FR) to see how to create new localized POD2::* documentation packages and integrate them into Pod::Perldoc.
The item you want to look up. Nested modules (such as File::Basename) are specified either as File::Basename or File/Basename. You may also give a descriptive name of a page, such as perlfunc. For URLs, HTTP and HTTPS are the only kinds currently supported.
For simple names like 'foo', when the normal search fails to find a matching page, a search with the "perl" prefix is tried as well. So "perldoc intro" is enough to find/render "perlintro.pod".
Specify a replacement for groff.
Recursive search.
Ignore case.
Displays the version of perldoc you're running.
Because perldoc does not run properly tainted, and is known to have security issues, when run as the superuser it will attempt to drop privileges by setting the effective and real IDs to nobody's or nouser's account, or -2 if unavailable. If it cannot relinquish its privileges, it will not run.
Any switches in the PERLDOC environment variable will be used before the command line arguments.
Useful values for PERLDOC include -oterm, -otext, -ortf, -oxml, and so on, depending on what modules you have on hand; or the formatter class may be specified exactly with -MPod::Perldoc::ToTerm or the like.
perldoc also searches directories specified by the PERL5LIB (or PERLLIB if PERL5LIB is not defined) and PATH environment variables. (The latter is so that embedded pods for executables, such as perldoc itself, are available.)
In directories where either Makefile.PL or Build.PL exists, perldoc will add . and lib first to its search path, and as long as you're not the superuser will add blib too. This is really helpful if you're working inside of a build directory and want to read through the docs even if you have a version of a module previously installed.
perldoc will use, in order of preference, the pager defined in PERLDOC_PAGER, MANPAGER, or PAGER before trying to find a pager on its own. (MANPAGER is not used if perldoc was told to display plain text or unformatted pod.)
One useful value for PERLDOC_PAGER is less -+C -E.
Having PERLDOCDEBUG set to a positive integer will make perldoc emit even more descriptive output than the -D switch does; the higher the number, the more it emits.
Up to 3.14_05, the switch -v was used to produce verbose messages of perldoc operation, which is now enabled by -D.
Current maintainer: Mark Allen <mallen@cpan.org>
Past contributors are:
brian d foy <bdfoy@cpan.org>, Adriano R. Ferreira <ferreira@cpan.org>, Sean M. Burke <sburke@cpan.org>, Kenneth Albanowski <kjahds@kjahds.com>, Andy Dougherty <doughera@lafcol.lafayette.edu>, and many others.
perldos - Perl under DOS, W31, W95.
These are instructions for building Perl under DOS (or w??), using DJGPP v2.03 or later. Under w95 long filenames are supported.
Before you start, you should glance through the README file found in the top-level directory where the Perl distribution was extracted. Make sure you read and understand the terms under which this software is being distributed.
This port currently supports MakeMaker (the set of modules that is used to build extensions to perl). Therefore, you should be able to build and install most extensions found in the CPAN sites.
Detailed instructions on how to build and install perl extension modules, including XS-type modules, are included. See 'BUILDING AND INSTALLING MODULES'.
DJGPP is a port of GNU C/C++ compiler and development tools to 32-bit, protected-mode environment on Intel 32-bit CPUs running MS-DOS and compatible operating systems, by DJ Delorie <dj@delorie.com> and friends.
For more details (FAQ), check out the home of DJGPP at:
- http://www.delorie.com/djgpp/
If you have questions about DJGPP, try posting to the DJGPP newsgroup: comp.os.msdos.djgpp, or use the email gateway djgpp@delorie.com.
You can find the full DJGPP distribution on any of the mirrors listed here:
- http://www.delorie.com/djgpp/getting.html
You need the following files to build perl (or add new modules):
- v2/djdev203.zip
- v2gnu/bnu2112b.zip
- v2gnu/gcc2953b.zip
- v2gnu/bsh204b.zip
- v2gnu/mak3791b.zip
- v2gnu/fil40b.zip
- v2gnu/sed3028b.zip
- v2gnu/txt20b.zip
- v2gnu/dif272b.zip
- v2gnu/grep24b.zip
- v2gnu/shl20jb.zip
- v2gnu/gwk306b.zip
- v2misc/csdpmi5b.zip
or possibly any newer version.
Thread support is not tested in this version of the djgpp perl.
Perl under DOS lacks some features of perl under UNIX because of deficiencies in the UNIX-emulation, most notably:
fork() and pipe()
some features of the UNIX filesystem regarding link count and file dates
in-place operation is a little bit broken with short filenames
sockets
Unpack the source package perl5.8*.tar.gz with djtarx. If you want to use long file names under w95 and also to get Perl to pass all its tests, don't forget to use
- set LFN=y
- set FNCASE=y
before unpacking the archive.
Create a "symlink" or copy your bash.exe to sh.exe in your ($DJDIR)/bin directory.
- ln -s bash.exe sh.exe
[If you have the recommended version of bash for DJGPP, this is already done for you.]
And make the SHELL environment variable point to this sh.exe:
- set SHELL=c:/djgpp/bin/sh.exe (use full path name!)
You can do this in djgpp.env too. Add this line BEFORE any section definition:
- +SHELL=%DJDIR%/bin/sh.exe
If you have split.exe and gsplit.exe in your path, then rename split.exe to djsplit.exe, and gsplit.exe to split.exe. Copy or link gecho.exe to echo.exe if you don't have echo.exe. Copy or link gawk.exe to awk.exe if you don't have awk.exe.
[If you have the recommended versions of djdev, shell utilities and gawk, all these are already done for you, and you will not need to do anything.]
Chdir to the djgpp subdirectory of perl toplevel and type the following commands:
- set FNCASE=y
- configure.bat
This will do some preprocessing then run the Configure script for you. The Configure script is interactive, but in most cases you just need to press ENTER. The "set" command ensures that DJGPP preserves the letter case of file names when reading directories. If you already issued this set command when unpacking the archive, and you are in the same DOS session as when you unpacked the archive, you don't have to issue the set command again. This command is necessary *before* you start to (re)configure or (re)build perl in order to ensure both that perl builds correctly and that building XS-type modules can succeed. See the DJGPP info entry for "_preserve_fncase" for more information:
- info libc alphabetical _preserve_fncase
If the script says that your package is incomplete, and asks whether to continue, just answer with Y (this can only happen if you don't use long filenames or forget to issue "set FNCASE=y" first).
When Configure asks about the extensions, I suggest IO and Fcntl, and if you want database handling then SDBM_File or GDBM_File (you need to install gdbm for this one). If you want to use the POSIX extension (this is the default), make sure that the stack size of your cc1.exe is at least 512kbyte (you can check this with: stubedit cc1.exe).
You can use the Configure script in non-interactive mode too. When I built my perl.exe, I used something like this:
- configure.bat -des
You can find more info about Configure's command line switches in the INSTALL file.
When the script ends, and you want to change some values in the generated config.sh file, then run
- sh Configure -S
after you made your modifications.
IMPORTANT: if you use this -S switch, be sure to delete the CONFIG environment variable before running the script:
- set CONFIG=
Now you can compile Perl. Type:
- make
Type:
- make test
If you're lucky you should see "All tests successful". But there can be a few failed subtests (less than 5 hopefully) depending on some external conditions (e.g. some subtests fail under linux/dosemu or plain dos with short filenames only).
Type:
- make install
This will copy the newly compiled perl and libraries into your DJGPP directory structure. Perl.exe and the utilities go into ($DJDIR)/bin, and the library goes under ($DJDIR)/lib/perl5. The pod documentation goes under ($DJDIR)/lib/perl5/pod.
For building and installing non-XS modules, all you need is a working perl under DJGPP. Non-XS modules do not require re-linking the perl binary, and so are simpler to build and install.
XS-type modules do require re-linking the perl binary, because part of an XS module is written in "C", and has to be linked together with the perl binary to be executed. This is required because perl under DJGPP is built with the "static link" option, due to the lack of "dynamic linking" in the DJGPP environment.
Because XS modules require re-linking of the perl binary, you need both the perl binary distribution and the perl source distribution to build an XS extension module. In addition, you will have to have built your perl binary from the source distribution so that all of the components of the perl binary are available for the required link step.
First, download the module package from CPAN (e.g., the "Comma Separated Value" text package, Text-CSV-0.01.tar.gz). Then expand the contents of the package into some location on your disk. Most CPAN modules are built with an internal directory structure, so it is usually safe to expand it in the root of your DJGPP installation. Some people prefer to locate source trees under /usr/src (i.e., ($DJDIR)/usr/src), but you may put it wherever seems most logical to you, *EXCEPT* under the same directory as your perl source code. There are special rules that apply to modules which live in the perl source tree that do not apply to most of the modules in CPAN.
Unlike other DJGPP packages, which are normal "zip" files, most CPAN module packages are "gzipped tarballs". Recent versions of WinZip will safely unpack and expand them, *UNLESS* they have zero-length files. It is a known WinZip bug (as of v7.0) that it will not extract zero-length files.
From the command line, you can use the djtar utility provided with DJGPP to unpack and expand these files. For example:
- C:\djgpp>djtarx -v Text-CSV-0.01.tar.gz
This will create the new directory ($DJDIR)/Text-CSV-0.01, filling it with the source for this module.
To build a non-XS module, you can use the standard module-building instructions distributed with perl modules.
- perl Makefile.PL
- make
- make test
- make install
This is sufficient because non-XS modules install only ".pm" files and (sometimes) pod and/or man documentation. No re-linking of the perl binary is needed to build, install or use non-XS modules.
To build an XS module, you must use the standard module-building instructions distributed with perl modules *PLUS* three extra instructions specific to the DJGPP "static link" build environment.
- set FNCASE=y
- perl Makefile.PL
- make
- make perl
- make test
- make -f Makefile.aperl inst_perl MAP_TARGET=perl.exe
- make install
The first extra instruction sets DJGPP's FNCASE environment variable so that the new perl binary which you must build for an XS-type module will build correctly. The second extra instruction re-builds the perl binary in your module directory before you run "make test", so that you are testing with the new module code you built with "make". The third extra instruction installs the perl binary from your module directory into the standard DJGPP binary directory, ($DJDIR)/bin, replacing your previous perl binary.
Note that the MAP_TARGET value *must* have the ".exe" extension or you will not create a "perl.exe" to replace the one in ($DJDIR)/bin.
When you are done, the XS-module install process will have added information to your "perllocal" information telling that the perl binary has been replaced, and what module was installed. You can view this information at any time by using the command:
- perl -S perldoc perllocal
Laszlo Molnar, laszlo.molnar@eth.ericsson.se [Installing/building perl]
Peter J. Farley III pjfarley@banet.net [Building/installing modules]
perl(1).
perldsc - Perl Data Structures Cookbook
Perl lets us have complex data structures. You can write something like this and all of a sudden, you'd have an array with three dimensions!
- for $x (1 .. 10) {
- for $y (1 .. 10) {
- for $z (1 .. 10) {
- $AoA[$x][$y][$z] =
- $x ** $y + $z;
- }
- }
- }
Alas, however simple this may appear, underneath it's a much more elaborate construct than meets the eye!
How do you print it out? Why can't you say just print @AoA? How do you sort it? How can you pass it to a function or get one of these back from a function? Is it an object? Can you save it to disk to read back later? How do you access whole rows or columns of that matrix? Do all the values have to be numeric?
As you see, it's quite easy to become confused. While some small portion of the blame for this can be attributed to the reference-based implementation, it's really more due to a lack of existing documentation with examples designed for the beginner.
This document is meant to be a detailed but understandable treatment of the many different sorts of data structures you might want to develop. It should also serve as a cookbook of examples. That way, when you need to create one of these complex data structures, you can just pinch, pilfer, or purloin a drop-in example from here.
Let's look at each of these possible constructs in detail. There are separate sections on each of the following:
But for now, let's look at general issues common to all these types of data structures.
The most important thing to understand about all data structures in Perl--including multidimensional arrays--is that even though they might appear otherwise, Perl @ARRAYs and %HASHes are all internally one-dimensional. They can hold only scalar values (meaning a string, number, or a reference). They cannot directly contain other arrays or hashes, but instead contain references to other arrays or hashes.
You can't use a reference to an array or hash in quite the same way that you would a real array or hash. For C or C++ programmers unused to distinguishing between arrays and pointers to the same, this can be confusing. If so, just think of it as the difference between a structure and a pointer to a structure.
You can (and should) read more about references in perlref. Briefly, references are rather like pointers that know what they point to. (Objects are also a kind of reference, but we won't be needing them right away--if ever.) This means that when you have something which looks to you like an access to a two-or-more-dimensional array and/or hash, what's really going on is that the base type is merely a one-dimensional entity that contains references to the next level. It's just that you can use it as though it were a two-dimensional one. This is actually the way almost all C multidimensional arrays work as well.
- $array[7][12] # array of arrays
- $array[7]{string} # array of hashes
- $hash{string}[7] # hash of arrays
- $hash{string}{'another string'} # hash of hashes
Now, because the top level contains only references, if you try to print out your array with a simple print() function, you'll get something that doesn't look very nice, like this:
- @AoA = ( [2, 3], [4, 5, 7], [0] );
- print $AoA[1][2];
- 7
- print @AoA;
- ARRAY(0x83c38)ARRAY(0x8b194)ARRAY(0x8b1d0)
That's because Perl doesn't (ever) implicitly dereference your variables. If you want to get at the thing a reference is referring to, then you have to do this yourself using either prefix typing indicators, like ${$blah}, @{$blah}, @{$blah[$i]}, or else postfix pointer arrows, like $a->[3], $h->{fred}, or even $ob->method()->[3].
The two most common mistakes made in constructing something like an array of arrays are either accidentally counting the number of elements or else taking a reference to the same memory location repeatedly. Here's the case where you just get the count instead of a nested array:
- for $i (1..10) {
- @array = somefunc($i);
- $AoA[$i] = @array; # WRONG!
- }
That's just the simple case of assigning an array to a scalar and getting its element count. If that's what you really and truly want, then you might do well to consider being a tad more explicit about it, like this:
- for $i (1..10) {
- @array = somefunc($i);
- $counts[$i] = scalar @array;
- }
Here's the case of taking a reference to the same memory location again and again:
- for $i (1..10) {
- @array = somefunc($i);
- $AoA[$i] = \@array; # WRONG!
- }
So, what's the big problem with that? It looks right, doesn't it? After all, I just told you that you need an array of references, so by golly, you've made me one!
Unfortunately, while this is true, it's still broken. All the references in @AoA refer to the very same place, and they will therefore all hold whatever was last in @array! It's similar to the problem demonstrated in the following C program:
- #include <pwd.h>
- #include <stdio.h>
- int main(void) {
-     struct passwd *rp, *dp;
-     rp = getpwnam("root");
-     dp = getpwnam("daemon");
-     printf("daemon name is %s\nroot name is %s\n",
-         dp->pw_name, rp->pw_name);
-     return 0;
- }
This will print:
- daemon name is daemon
- root name is daemon
The problem is that both rp
and dp
are pointers to the same location
in memory! In C, you'd have to remember to malloc() yourself some new
memory. In Perl, you'll want to use the array constructor []
or the
hash constructor {}
instead. Here's the right way to do the preceding
broken code fragments:
- for $i (1..10) {
- @array = somefunc($i);
- $AoA[$i] = [ @array ];
- }
The square brackets make a reference to a new array with a copy of what's in @array at the time of the assignment. This is what you want.
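To see that copy semantics in isolation, here is a small stand-alone illustration:

```perl
my @array = (1, 2, 3);
my $ref   = [ @array ];   # takes a copy of @array's contents right now
@array    = (9, 9);       # later changes to @array...
print "@$ref\n";          # ...don't affect the copy: prints "1 2 3"
```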
Note that this will produce something similar, but it's much harder to read:
- for $i (1..10) {
- @array = 0 .. $i;
- @{$AoA[$i]} = @array;
- }
Is it the same? Well, maybe so--and maybe not. The subtle difference
is that when you assign something in square brackets, you know for sure
it's always a brand new reference with a new copy of the data.
Something else could be going on in this new case with the @{$AoA[$i]}
dereference on the left-hand-side of the assignment. It all depends on
whether $AoA[$i]
had been undefined to start with, or whether it
already contained a reference. If you had already populated @AoA with
references, as in
- $AoA[3] = \@another_array;
Then the assignment with the indirection on the left-hand-side would use the existing reference that was already there:
- @{$AoA[3]} = @array;
Of course, this would have the "interesting" effect of clobbering @another_array. (Have you ever noticed how when a programmer says something is "interesting", that rather than meaning "intriguing", they're disturbingly more apt to mean that it's "annoying", "difficult", or both? :-)
So just remember always to use the array or hash constructors with []
or {}
, and you'll be fine, although it's not always optimally
efficient.
Surprisingly, the following dangerous-looking construct will actually work out fine:
- for $i (1..10) {
- my @array = somefunc($i);
- $AoA[$i] = \@array;
- }
That's because my() is more of a run-time statement than it is a
compile-time declaration per se. This means that the my() variable is
remade afresh each time through the loop. So even though it looks as
though you stored the same variable reference each time, you actually did
not! This is a subtle distinction that can produce more efficient code at
the risk of misleading all but the most experienced of programmers. So I
usually advise against teaching it to beginners. In fact, except for
passing arguments to functions, I seldom like to see the gimme-a-reference
operator (backslash) used much at all in code. Instead, I advise
beginners that they (and most of the rest of us) should try to use the
much more easily understood constructors []
and {}
instead of
relying upon lexical (or dynamic) scoping and hidden reference-counting to
do the right thing behind the scenes.
In summary:
- $AoA[$i] = [ @array ]; # usually best
- $AoA[$i] = \@array; # perilous; just how my() was that array?
- @{ $AoA[$i] } = @array; # way too tricky for most programmers
Speaking of things like @{$AoA[$i]}
, the following are actually the
same thing:
- $aref->[2][2] # clear
- $$aref[2][2] # confusing
That's because Perl's precedence rules on its five prefix dereferencers
(which look like someone swearing: $ @ * % &
) make them bind more
tightly than the postfix subscripting brackets or braces! This will no
doubt come as a great shock to the C or C++ programmer, who is quite
accustomed to using *a[i]
to mean what's pointed to by the i'th
element of a
. That is, they first take the subscript, and only then
dereference the thing at that subscript. That's fine in C, but this isn't C.
The seemingly equivalent construct in Perl, $$aref[$i]
, first does
the deref of $aref, treating $aref as a reference to an
array, then dereferences that, and finally tells you the i'th value
of the array pointed to by $aref. If you wanted the C notion, you'd have to
write ${$aref[$i]}
to force the $aref[$i]
to get evaluated first
before the leading $
dereferencer.
use strict
If this is starting to sound scarier than it's worth, relax. Perl has some features to help you avoid its most common pitfalls. The best way to avoid getting confused is to start every program like this:
- #!/usr/bin/perl -w
- use strict;
This way, you'll be forced to declare all your variables with my() and also disallow accidental "symbolic dereferencing". Therefore if you'd done this:
- print $aref[2][2]
The compiler would immediately flag that as an error at compile time,
because you were accidentally accessing @aref
, an undeclared
variable, and it would thereby remind you to write instead:
- print $aref->[2][2]
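Under use strict, the loops shown earlier also need their variables declared. As a minimal strict-clean sketch of the array-of-arrays loop (somefunc() here is a hypothetical stand-in for your own list-returning function):

```perl
use strict;
use warnings;

sub somefunc { my $i = shift; return 0 .. $i }   # stand-in list generator

my @AoA;
for my $i (1 .. 10) {
    my @array = somefunc($i);
    $AoA[$i] = [ @array ];    # a fresh reference each time through
}
print "$AoA[3][2]\n";         # prints 2
```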
You can use the debugger's x
command to dump out complex data structures.
For example, given the assignment to $AoA above, here's the debugger output:
- DB<1> x $AoA
- $AoA = ARRAY(0x13b5a0)
- 0 ARRAY(0x1f0a24)
- 0 'fred'
- 1 'barney'
- 2 'pebbles'
- 3 'bambam'
- 4 'dino'
- 1 ARRAY(0x13b558)
- 0 'homer'
- 1 'bart'
- 2 'marge'
- 3 'maggie'
- 2 ARRAY(0x13b540)
- 0 'george'
- 1 'jane'
- 2 'elroy'
- 3 'judy'
Presented with little comment (these will get their own manpages someday) here are short code examples illustrating access of various types of data structures.
- @AoA = (
- [ "fred", "barney" ],
- [ "george", "jane", "elroy" ],
- [ "homer", "marge", "bart" ],
- );
- # one element
- $AoA[0][0] = "Fred";
- # another element
- $AoA[1][1] =~ s/(\w)/\u$1/;
- # print the whole thing with refs
- for $aref ( @AoA ) {
- print "\t [ @$aref ],\n";
- }
- # print the whole thing with indices
- for $i ( 0 .. $#AoA ) {
- print "\t [ @{$AoA[$i]} ],\n";
- }
- # print the whole thing one at a time
- for $i ( 0 .. $#AoA ) {
- for $j ( 0 .. $#{ $AoA[$i] } ) {
- print "elt $i $j is $AoA[$i][$j]\n";
- }
- }
- %HoA = (
- flintstones => [ "fred", "barney" ],
- jetsons => [ "george", "jane", "elroy" ],
- simpsons => [ "homer", "marge", "bart" ],
- );
- # reading from file
- # flintstones: fred barney wilma dino
- while ( <> ) {
- next unless s/^(.*?):\s*//;
- $HoA{$1} = [ split ];
- }
- # reading from file; more temps
- # flintstones: fred barney wilma dino
- while ( $line = <> ) {
- ($who, $rest) = split /:\s*/, $line, 2;
- @fields = split ' ', $rest;
- $HoA{$who} = [ @fields ];
- }
- # calling a function that returns a list
- for $group ( "simpsons", "jetsons", "flintstones" ) {
- $HoA{$group} = [ get_family($group) ];
- }
- # likewise, but using temps
- for $group ( "simpsons", "jetsons", "flintstones" ) {
- @members = get_family($group);
- $HoA{$group} = [ @members ];
- }
- # append new members to an existing family
- push @{ $HoA{"flintstones"} }, "wilma", "betty";
- # one element
- $HoA{flintstones}[0] = "Fred";
- # another element
- $HoA{simpsons}[1] =~ s/(\w)/\u$1/;
- # print the whole thing
- foreach $family ( keys %HoA ) {
- print "$family: @{ $HoA{$family} }\n"
- }
- # print the whole thing with indices
- foreach $family ( keys %HoA ) {
- print "family: ";
- foreach $i ( 0 .. $#{ $HoA{$family} } ) {
- print " $i = $HoA{$family}[$i]";
- }
- print "\n";
- }
- # print the whole thing sorted by number of members
- foreach $family ( sort { @{$HoA{$b}} <=> @{$HoA{$a}} } keys %HoA ) {
- print "$family: @{ $HoA{$family} }\n"
- }
- # print the whole thing sorted by number of members and name
- foreach $family ( sort {
- @{$HoA{$b}} <=> @{$HoA{$a}}
- ||
- $a cmp $b
- } keys %HoA )
- {
- print "$family: ", join(", ", sort @{ $HoA{$family} }), "\n";
- }
- @AoH = (
- {
- Lead => "fred",
- Friend => "barney",
- },
- {
- Lead => "george",
- Wife => "jane",
- Son => "elroy",
- },
- {
- Lead => "homer",
- Wife => "marge",
- Son => "bart",
- }
- );
- # reading from file
- # format: LEAD=fred FRIEND=barney
- while ( <> ) {
- $rec = {};
- for $field ( split ) {
- ($key, $value) = split /=/, $field;
- $rec->{$key} = $value;
- }
- push @AoH, $rec;
- }
- # reading from file
- # format: LEAD=fred FRIEND=barney
- # no temp
- while ( <> ) {
- push @AoH, { split /[\s+=]/ };
- }
- # calling a function that returns a key/value pair list, like
- # "lead","fred","daughter","pebbles"
- while ( %fields = getnextpairset() ) {
- push @AoH, { %fields };
- }
- # likewise, but using no temp vars
- while (<>) {
- push @AoH, { parsepairs($_) };
- }
- # add key/value to an element
- $AoH[0]{pet} = "dino";
- $AoH[2]{pet} = "santa's little helper";
- # one element
- $AoH[0]{lead} = "fred";
- # another element
- $AoH[1]{lead} =~ s/(\w)/\u$1/;
- # print the whole thing with refs
- for $href ( @AoH ) {
- print "{ ";
- for $role ( keys %$href ) {
- print "$role=$href->{$role} ";
- }
- print "}\n";
- }
- # print the whole thing with indices
- for $i ( 0 .. $#AoH ) {
- print "$i is { ";
- for $role ( keys %{ $AoH[$i] } ) {
- print "$role=$AoH[$i]{$role} ";
- }
- print "}\n";
- }
- # print the whole thing one at a time
- for $i ( 0 .. $#AoH ) {
- for $role ( keys %{ $AoH[$i] } ) {
- print "elt $i $role is $AoH[$i]{$role}\n";
- }
- }
- %HoH = (
- flintstones => {
- lead => "fred",
- pal => "barney",
- },
- jetsons => {
- lead => "george",
- wife => "jane",
- "his boy" => "elroy",
- },
- simpsons => {
- lead => "homer",
- wife => "marge",
- kid => "bart",
- },
- );
- # reading from file
- # flintstones: lead=fred pal=barney wife=wilma pet=dino
- while ( <> ) {
- next unless s/^(.*?):\s*//;
- $who = $1;
- for $field ( split ) {
- ($key, $value) = split /=/, $field;
- $HoH{$who}{$key} = $value;
- }
- }
- # reading from file; more temps
- while ( <> ) {
- next unless s/^(.*?):\s*//;
- $who = $1;
- $rec = {};
- $HoH{$who} = $rec;
- for $field ( split ) {
- ($key, $value) = split /=/, $field;
- $rec->{$key} = $value;
- }
- }
- # calling a function that returns a key,value hash
- for $group ( "simpsons", "jetsons", "flintstones" ) {
- $HoH{$group} = { get_family($group) };
- }
- # likewise, but using temps
- for $group ( "simpsons", "jetsons", "flintstones" ) {
- %members = get_family($group);
- $HoH{$group} = { %members };
- }
- # append new members to an existing family
- %new_folks = (
- wife => "wilma",
- pet => "dino",
- );
- for $what (keys %new_folks) {
- $HoH{flintstones}{$what} = $new_folks{$what};
- }
- # one element
- $HoH{flintstones}{wife} = "wilma";
- # another element
- $HoH{simpsons}{lead} =~ s/(\w)/\u$1/;
- # print the whole thing
- foreach $family ( keys %HoH ) {
- print "$family: { ";
- for $role ( keys %{ $HoH{$family} } ) {
- print "$role=$HoH{$family}{$role} ";
- }
- print "}\n";
- }
- # print the whole thing somewhat sorted
- foreach $family ( sort keys %HoH ) {
- print "$family: { ";
- for $role ( sort keys %{ $HoH{$family} } ) {
- print "$role=$HoH{$family}{$role} ";
- }
- print "}\n";
- }
- # print the whole thing sorted by number of members
- foreach $family ( sort { keys %{$HoH{$b}} <=> keys %{$HoH{$a}} } keys %HoH ) {
- print "$family: { ";
- for $role ( sort keys %{ $HoH{$family} } ) {
- print "$role=$HoH{$family}{$role} ";
- }
- print "}\n";
- }
- # establish a sort order (rank) for each role
- $i = 0;
- for ( qw(lead wife son daughter pal pet) ) { $rank{$_} = ++$i }
- # now print the whole thing sorted by number of members
- foreach $family ( sort { keys %{ $HoH{$b} } <=> keys %{ $HoH{$a} } } keys %HoH ) {
- print "$family: { ";
- # and print these according to rank order
- for $role ( sort { $rank{$a} <=> $rank{$b} } keys %{ $HoH{$family} } ) {
- print "$role=$HoH{$family}{$role} ";
- }
- print "}\n";
- }
Here's a sample showing how to create and use a record whose fields are of many different sorts:
- $rec = {
- TEXT => $string,
- SEQUENCE => [ @old_values ],
- LOOKUP => { %some_table },
- THATCODE => \&some_function,
- THISCODE => sub { $_[0] ** $_[1] },
- HANDLE => \*STDOUT,
- };
- print $rec->{TEXT};
- print $rec->{SEQUENCE}[0];
- $last = pop @{ $rec->{SEQUENCE} };
- print $rec->{LOOKUP}{"key"};
- ($first_k, $first_v) = each %{ $rec->{LOOKUP} };
- $answer = $rec->{THATCODE}->($arg);
- $answer = $rec->{THISCODE}->($arg1, $arg2);
- # careful of extra block braces on fh ref
- print { $rec->{HANDLE} } "a string\n";
- use FileHandle;
- $rec->{HANDLE}->autoflush(1);
- $rec->{HANDLE}->print(" a string\n");
- %TV = (
- flintstones => {
- series => "flintstones",
- nights => [ qw(monday thursday friday) ],
- members => [
- { name => "fred", role => "lead", age => 36, },
- { name => "wilma", role => "wife", age => 31, },
- { name => "pebbles", role => "kid", age => 4, },
- ],
- },
- jetsons => {
- series => "jetsons",
- nights => [ qw(wednesday saturday) ],
- members => [
- { name => "george", role => "lead", age => 41, },
- { name => "jane", role => "wife", age => 39, },
- { name => "elroy", role => "kid", age => 9, },
- ],
- },
- simpsons => {
- series => "simpsons",
- nights => [ qw(monday) ],
- members => [
- { name => "homer", role => "lead", age => 34, },
- { name => "marge", role => "wife", age => 37, },
- { name => "bart", role => "kid", age => 11, },
- ],
- },
- );
- # reading from file
- # this is most easily done by having the file itself be
- # in the raw data format as shown above. perl is happy
- # to parse complex data structures if declared as data, so
- # sometimes it's easiest to do that
- # here's a piece by piece build up
- $rec = {};
- $rec->{series} = "flintstones";
- $rec->{nights} = [ find_days() ];
- @members = ();
- # assume this file in field=value syntax
- while (<>) {
- %fields = split /[\s=]+/;
- push @members, { %fields };
- }
- $rec->{members} = [ @members ];
- # now remember the whole thing
- $TV{ $rec->{series} } = $rec;
- ###########################################################
- # now, you might want to make interesting extra fields that
- # include pointers back into the same data structure so if you
- # change one piece, it changes everywhere, like for example
- # if you wanted a {kids} field that was a reference
- # to an array of the kids' records without having duplicate
- # records and thus update problems.
- ###########################################################
- foreach $family (keys %TV) {
- $rec = $TV{$family}; # temp pointer
- @kids = ();
- for $person ( @{ $rec->{members} } ) {
- if ($person->{role} =~ /kid|son|daughter/) {
- push @kids, $person;
- }
- }
- # REMEMBER: $rec and $TV{$family} point to same data!!
- $rec->{kids} = [ @kids ];
- }
- # you copied the array, but the array itself contains pointers
- # to uncopied objects. this means that if you make bart get
- # older via
- $TV{simpsons}{kids}[0]{age}++;
- # then this would also change in
- print $TV{simpsons}{members}[2]{age};
- # because $TV{simpsons}{kids}[0] and $TV{simpsons}{members}[2]
- # both point to the same underlying anonymous hash table
- # print the whole thing
- foreach $family ( keys %TV ) {
- print "the $family";
- print " is on during @{ $TV{$family}{nights} }\n";
- print "its members are:\n";
- for $who ( @{ $TV{$family}{members} } ) {
- print " $who->{name} ($who->{role}), age $who->{age}\n";
- }
- print "it turns out that $TV{$family}{lead} has ";
- print scalar ( @{ $TV{$family}{kids} } ), " kids named ";
- print join (", ", map { $_->{name} } @{ $TV{$family}{kids} } );
- print "\n";
- }
You cannot easily tie a multilevel data structure (such as a hash of hashes) to a dbm file. The first problem is that all but GDBM and Berkeley DB have size limitations, but beyond that, you also have problems with how references are to be represented on disk. One experimental module that does partially attempt to address this need is the MLDBM module. Check your nearest CPAN site as described in perlmodlib for source code to MLDBM.
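As a sketch of how MLDBM papers over the second problem (this assumes the CPAN MLDBM module and an underlying DBM are available; details vary by version):

```perl
use MLDBM qw(SDBM_File);   # serialize nested values through SDBM
use Fcntl;

tie my %family, 'MLDBM', 'familydb', O_CREAT|O_RDWR, 0640
    or die "can't tie: $!";
$family{flintstones} = { lead => "fred", pal => "barney" };  # stored whole
print $family{flintstones}{lead}, "\n";  # each fetch deserializes a copy
```

Note that because each fetch returns a fresh copy, modifying a nested element in place silently does nothing; you must fetch the whole value, change it, and store it back.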
perlref, perllol, perldata, perlobj
Tom Christiansen <tchrist@perl.com>
perldtrace - Perl's support for DTrace
- # dtrace -Zn 'perl::sub-entry, perl::sub-return { trace(copyinstr(arg0)) }'
- dtrace: description 'perl::sub-entry, perl::sub-return ' matched 10 probes
- # perl -E 'sub outer { inner(@_) } sub inner { say shift } outer("hello")'
- hello
- (dtrace output)
- CPU ID FUNCTION:NAME
- 0 75915 Perl_pp_entersub:sub-entry BEGIN
- 0 75915 Perl_pp_entersub:sub-entry import
- 0 75922 Perl_pp_leavesub:sub-return import
- 0 75922 Perl_pp_leavesub:sub-return BEGIN
- 0 75915 Perl_pp_entersub:sub-entry outer
- 0 75915 Perl_pp_entersub:sub-entry inner
- 0 75922 Perl_pp_leavesub:sub-return inner
- 0 75922 Perl_pp_leavesub:sub-return outer
DTrace is a framework for comprehensive system- and application-level tracing. Perl is a DTrace provider, meaning it exposes several probes for instrumentation. You can use these in conjunction with kernel-level probes, as well as probes from other providers such as MySQL, in order to diagnose software defects, or even just your application's bottlenecks.
Perl must be compiled with the -Dusedtrace
option in order to
make use of the provided probes. While DTrace aims to have no
overhead when its instrumentation is not active, Perl's support
itself cannot uphold that guarantee, so it is built without DTrace
probes under most systems. One notable exception is that Mac OS X
ships a /usr/bin/perl with DTrace support enabled.
Perl's initial DTrace support was added, providing sub-entry and
sub-return probes.
The sub-entry and sub-return probes gain a fourth argument: the
package name of the function.
The phase-change
probe was added.
The op-entry
, loading-file
, and loaded-file
probes were added.
Traces the entry of any subroutine. Note that all of the variables refer to the subroutine that is being invoked; there is currently no way to get ahold of any information about the subroutine's caller from a DTrace action.
- :*perl*::sub-entry {
- printf("%s::%s entered at %s line %d\n",
- copyinstr(arg3), copyinstr(arg0), copyinstr(arg1), arg2);
- }
Traces the exit of any subroutine. Note that all of the variables refer to the subroutine that is returning; there is currently no way to get ahold of any information about the subroutine's caller from a DTrace action.
Traces changes to Perl's interpreter state. You can internalize this
as tracing changes to Perl's ${^GLOBAL_PHASE}
variable, especially
since the values for NEWPHASE
and OLDPHASE
are the strings that
${^GLOBAL_PHASE}
reports.
- :*perl*::phase-change {
- printf("Phase changed from %s to %s\n",
- copyinstr(arg1), copyinstr(arg0));
- }
Traces the execution of each opcode in the Perl runloop. This probe is fired before the opcode is executed. When the Perl debugger is enabled, the DTrace probe is fired after the debugger hooks (but still before the opcode itself is executed).
- :*perl*::op-entry {
- printf("About to execute opcode %s\n", copyinstr(arg0));
- }
Fires when Perl is about to load an individual file, whether from
use, require, or do. This probe fires before the file is
read from disk. The filename argument is converted to a local
filesystem path instead of a Module::Name
-style name.
- :*perl*::loading-file {
- printf("About to load %s\n", copyinstr(arg0));
- }
Fires when Perl has successfully loaded an individual file, whether
from use, require, or do. This probe fires after the file
is read from disk and its contents evaluated. The filename argument
is converted to a local filesystem path instead of a
Module::Name
-style name.
- :*perl*::loaded-file {
- printf("Successfully loaded %s\n", copyinstr(arg0));
- }
- # dtrace -qZn 'sub-entry { @[strjoin(strjoin(copyinstr(arg3),"::"),copyinstr(arg0))] = count() } END {trunc(@, 10)}'
- Class::MOP::Attribute::slots 400
- Try::Tiny::catch 411
- Try::Tiny::try 411
- Class::MOP::Instance::inline_slot_access 451
- Class::MOP::Class::Immutable::Trait:::around 472
- Class::MOP::Mixin::AttributeCore::has_initializer 496
- Class::MOP::Method::Wrapped::__ANON__ 544
- Class::MOP::Package::_package_stash 737
- Class::MOP::Class::initialize 1128
- Class::MOP::get_metaclass_by_name 1204
- # dtrace -qFZn 'sub-entry, sub-return { trace(copyinstr(arg0)) }'
- 0 -> Perl_pp_entersub BEGIN
- 0 <- Perl_pp_leavesub BEGIN
- 0 -> Perl_pp_entersub BEGIN
- 0 -> Perl_pp_entersub import
- 0 <- Perl_pp_leavesub import
- 0 <- Perl_pp_leavesub BEGIN
- 0 -> Perl_pp_entersub BEGIN
- 0 -> Perl_pp_entersub dress
- 0 <- Perl_pp_leavesub dress
- 0 -> Perl_pp_entersub dirty
- 0 <- Perl_pp_leavesub dirty
- 0 -> Perl_pp_entersub whiten
- 0 <- Perl_pp_leavesub whiten
- 0 <- Perl_dounwind BEGIN
- # dtrace -Zn 'phase-change /copyinstr(arg0) == "END"/ { self->ending = 1 } sub-entry /self->ending/ { trace(copyinstr(arg0)) }'
- CPU ID FUNCTION:NAME
- 1 77214 Perl_pp_entersub:sub-entry END
- 1 77214 Perl_pp_entersub:sub-entry END
- 1 77214 Perl_pp_entersub:sub-entry cleanup
- 1 77214 Perl_pp_entersub:sub-entry _force_writable
- 1 77214 Perl_pp_entersub:sub-entry _force_writable
- # dtrace -qZn 'phase-change /copyinstr(arg0) == "START"/ { self->interesting = 1 } phase-change /copyinstr(arg0) == "RUN"/ { self->interesting = 0 } syscall::: /self->interesting/ { @[probefunc] = count() } END { trunc(@, 3) }'
- lseek 310
- read 374
- stat64 1056
- # dtrace -qZn 'sub-entry { self->fqn = strjoin(copyinstr(arg3), strjoin("::", copyinstr(arg0))) } op-entry /self->fqn != ""/ { @[self->fqn] = count() } END { trunc(@, 3) }'
- warnings::unimport 4589
- Exporter::Heavy::_rebuild_cache 5039
- Exporter::import 14578
http://www.amazon.com/DTrace-Dynamic-Tracing-Solaris-FreeBSD/dp/0132091518/
This CPAN module lets you create application-level DTrace probes written in Perl.
Shawn M Moore sartak@gmail.com
perlebcdic - Considerations for running Perl on EBCDIC platforms
An exploration of some of the issues facing Perl programmers on EBCDIC-based computers. We do not cover localization, internationalization, or multi-byte character set issues other than some discussion of UTF-8 and UTF-EBCDIC.
Portions that are still incomplete are marked with XXX.
Perl used to work on EBCDIC machines, but there are now areas of the code where it doesn't. If you want to use Perl on an EBCDIC machine, please let us know by sending mail to perlbug@perl.org.
The American Standard Code for Information Interchange (ASCII or US-ASCII) is a set of integers running from 0 to 127 (decimal) that imply character interpretation by the display and other systems of computers. The range 0..127 can be represented in seven binary digits (bits), hence the set is sometimes referred to as "7-bit ASCII". ASCII was described by the American National Standards Institute document ANSI X3.4-1986. It was also described by ISO 646:1991 (with localization for currency symbols). The full ASCII set is given in the table below as the first 128 elements. Languages that can be written adequately with the characters in ASCII include English, Hawaiian, Indonesian, Swahili and some Native American languages.
There are many character sets that extend the range of integers from 0..2**7-1 up to 2**8-1, or 8 bit bytes (octets if you prefer). One common one is the ISO 8859-1 character set.
The ISO 8859-$n are a collection of character code sets from the International Organization for Standardization (ISO), each of which adds characters to the ASCII set that are typically found in European languages, many of which are based on the Roman, or Latin, alphabet.
A particular 8-bit extension to ASCII that includes grave and acute accented Latin characters. Languages that can employ ISO 8859-1 include all the languages covered by ASCII as well as Afrikaans, Albanian, Basque, Catalan, Danish, Faroese, Finnish, Norwegian, Portuguese, Spanish, and Swedish. Dutch is covered albeit without the ij ligature. French is covered too but without the oe ligature. German can use ISO 8859-1 but must do so without German-style quotation marks. This set is based on Western European extensions to ASCII and is commonly encountered in world wide web work. In IBM character code set identification terminology ISO 8859-1 is also known as CCSID 819 (or sometimes 0819 or even 00819).
The Extended Binary Coded Decimal Interchange Code refers to a large collection of single- and multi-byte coded character sets that are different from ASCII or ISO 8859-1 and are all slightly different from each other; they typically run on host computers. The EBCDIC encodings derive from 8-bit byte extensions of Hollerith punched card encodings. The layout on the cards was such that high bits were set for the upper and lower case alphabet characters [a-z] and [A-Z], but there were gaps within each Latin alphabet range.
Some IBM EBCDIC character sets may be known by character code set identification numbers (CCSID numbers) or code page numbers.
Perl can be compiled on platforms that run any of three commonly used EBCDIC character sets, listed below.
Among IBM EBCDIC character code sets there are 13 characters that are often mapped to different integer values. Those characters are known as the 13 "variant" characters and are:
- \ [ ] { } ^ ~ ! # | $ @ `
When Perl is compiled for a platform, it looks at some of these characters to guess which EBCDIC character set the platform uses, and adapts itself accordingly to that platform. If the platform uses a character set that is not one of the three Perl knows about, Perl will either fail to compile, or mistakenly and silently choose one of the three. They are:
Character code set ID 0037 is a mapping of the ASCII plus Latin-1 characters (i.e. ISO 8859-1) to an EBCDIC set. 0037 is used in North American English locales on the OS/400 operating system that runs on AS/400 computers. CCSID 0037 differs from ISO 8859-1 in 237 places, in other words they agree on only 19 code point values.
Character code set ID 1047 is also a mapping of the ASCII plus Latin-1 characters (i.e. ISO 8859-1) to an EBCDIC set. 1047 is used under Unix System Services for OS/390 or z/OS, and OpenEdition for VM/ESA. CCSID 1047 differs from CCSID 0037 in eight places.
The EBCDIC code page in use on Siemens' BS2000 system is distinct from 1047 and 0037. It is identified below as the POSIX-BC set.
In Unicode terminology a code point is the number assigned to a character: for example, in EBCDIC the character "A" is usually assigned the number 193. In Unicode the character "A" is assigned the number 65. This causes a problem with the semantics of the pack/unpack "U", which are supposed to pack Unicode code points to characters and back to numbers. The problem is: which code points to use for code points less than 256? (for 256 and over there's no problem: Unicode code points are used) In EBCDIC, for the low 256 the EBCDIC code points are used. This means that the equivalences
- pack("U", ord("A")) eq "A" # i.e., eq chr(193) in EBCDIC
- unpack("U", "A") == ord("A") # i.e., == 193, not 65
will hold. (If Unicode code points were applied consistently over all the possible code points, pack("U",ord("A")) would in EBCDIC equal A with acute or chr(101), and unpack("U", "A") would equal 65, or non-breaking space, not 193, or ord "A".)
Many of the remaining problems seem to be related to case-insensitive matching
The extensions Unicode::Collate and Unicode::Normalize are not supported under EBCDIC, likewise for the encoding pragma.
UTF stands for Unicode Transformation Format
.
UTF-8 is an encoding of Unicode into a sequence of 8-bit byte chunks, based on
ASCII and Latin-1.
The length of a sequence required to represent a Unicode code point
depends on the ordinal number of that code point,
with larger numbers requiring more bytes.
UTF-EBCDIC is like UTF-8, but based on EBCDIC.
You may see the term invariant
character or code point.
This simply means that the character has the same numeric
value when encoded as when not.
(Note that this is a very different concept from The 13 variant characters
mentioned above.)
For example, the ordinal value of 'A' is 193 in most EBCDIC code pages,
and also is 193 when encoded in UTF-EBCDIC.
All variant code points occupy at least two bytes when encoded.
In UTF-8, the code points corresponding to the lowest 128
ordinal numbers (0 - 127: the ASCII characters) are invariant.
In UTF-EBCDIC, there are 160 invariant characters.
(If you care, the EBCDIC invariants are those characters
which have ASCII equivalents, plus those that correspond to
the C1 controls (80..9f on ASCII platforms).)
A string encoded in UTF-EBCDIC may be longer (but never shorter) than one encoded in UTF-8.
Starting from Perl 5.8 you can use the standard module Encode to translate from EBCDIC to Latin-1 code points (cp1047 here stands in for whichever EBCDIC code page your platform uses):
- use Encode 'from_to';
- from_to($data, 'cp1047', 'latin1'); # from EBCDIC to Latin-1
and from Latin-1 code points to EBCDIC code points:
- from_to($data, 'latin1', 'cp1047'); # from Latin-1 to EBCDIC
Encode knows about more EBCDIC character sets than Perl can currently be compiled to run on.
For doing I/O it is suggested that you use the autotranslating features of PerlIO, see perluniintro.
Since version 5.8 Perl uses the new PerlIO I/O library. This enables you to use different encodings per IO channel. For example you may use
- use Encode;
- open(F, ">:encoding(ascii)", "test.ascii"); print F "Hello World!\n"; close F;
- open(F, ">:encoding(cp37)", "test.ebcdic"); print F "Hello World!\n"; close F;
- open(F, ">:encoding(latin1)", "test.latin1"); print F "Hello World!\n"; close F;
- open(F, ">:encoding(utf8)", "test.utf8"); print F "Hello World!\n"; close F;
to get four files containing "Hello World!\n" in ASCII, CP 0037 EBCDIC, ISO 8859-1 (Latin-1) (in this example identical to ASCII since only ASCII characters were printed), and UTF-EBCDIC (in this example identical to normal EBCDIC since only characters that don't differ between EBCDIC and UTF-EBCDIC were printed). See the documentation of Encode::PerlIO for details.
As the PerlIO layer uses raw IO (bytes) internally, all this totally ignores things like the type of your filesystem (ASCII or EBCDIC).
The following tables list the ASCII and Latin-1 ordered sets including the subsets: C0 controls (0..31), ASCII graphics (32..7e), delete (7f), C1 controls (80..9f), and Latin-1 (a.k.a. ISO 8859-1) (a0..ff). In the table, names of the Latin-1 extensions to ASCII have been labelled with character names roughly corresponding to The Unicode Standard, Version 6.1 albeit with substitutions such as s/LATIN// and s/VULGAR// in all cases, s/CAPITAL LETTER// in some cases, and s/SMALL LETTER ([A-Z])/\l$1/ in some other cases. Controls are listed using their Unicode 6.1 abbreviations. The differences between the 0037 and 1047 sets are flagged with **. The differences between the 1047 and POSIX-BC sets are flagged with ##. All ord() numbers listed are decimal. If you would rather see this table listing octal values, then run the table (that is, the pod source text of this document, since this recipe may not work with a pod2_other_format translation) through:
- perl -ne 'if(/(.{29})(\d+)\s+(\d+)\s+(\d+)\s+(\d+)/)' \
- -e '{printf("%s%-5.03o%-5.03o%-5.03o%.03o\n",$1,$2,$3,$4,$5)}' \
- perlebcdic.pod
If you want to retain the UTF-x code points then in script form you might want to write:
- open(FH,"<perlebcdic.pod") or die "Could not open perlebcdic.pod: $!";
- while (<FH>) {
- if (/(.{29})(\d+)\s+(\d+)\s+(\d+)\s+(\d+)\s+(\d+)\.?(\d*)\s+(\d+)\.?(\d*)/)
- {
- if ($7 ne '' && $9 ne '') {
- printf(
- "%s%-5.03o%-5.03o%-5.03o%-5.03o%-3o.%-5o%-3o.%.03o\n",
- $1,$2,$3,$4,$5,$6,$7,$8,$9);
- }
- elsif ($7 ne '') {
- printf("%s%-5.03o%-5.03o%-5.03o%-5.03o%-3o.%-5o%.03o\n",
- $1,$2,$3,$4,$5,$6,$7,$8);
- }
- else {
- printf("%s%-5.03o%-5.03o%-5.03o%-5.03o%-5.03o%.03o\n",
- $1,$2,$3,$4,$5,$6,$8);
- }
- }
- }
If you would rather see this table listing hexadecimal values then run the table through:
- perl -ne 'if(/(.{29})(\d+)\s+(\d+)\s+(\d+)\s+(\d+)/)' \
- -e '{printf("%s%-5.02X%-5.02X%-5.02X%.02X\n",$1,$2,$3,$4,$5)}' \
- perlebcdic.pod
Or, in order to retain the UTF-x code points in hexadecimal:
- open(FH,"<perlebcdic.pod") or die "Could not open perlebcdic.pod: $!";
- while (<FH>) {
- if (/(.{29})(\d+)\s+(\d+)\s+(\d+)\s+(\d+)\s+(\d+)\.?(\d*)\s+(\d+)\.?(\d*)/)
- {
- if ($7 ne '' && $9 ne '') {
- printf(
- "%s%-5.02X%-5.02X%-5.02X%-5.02X%-2X.%-6.02X%02X.%02X\n",
- $1,$2,$3,$4,$5,$6,$7,$8,$9);
- }
- elsif ($7 ne '') {
- printf("%s%-5.02X%-5.02X%-5.02X%-5.02X%-2X.%-6.02X%02X\n",
- $1,$2,$3,$4,$5,$6,$7,$8);
- }
- else {
- printf("%s%-5.02X%-5.02X%-5.02X%-5.02X%-5.02X%02X\n",
- $1,$2,$3,$4,$5,$6,$8);
- }
- }
- }
-                               ISO
-                               8859-1                POS-
-                               CCSID  CCSID  CCSID   IX-
- chr                           0819   0037   1047    BC     UTF-8   UTF-EBCDIC
- ---------------------------------------------------------------------
- <NUL> 0 0 0 0 0 0
- <SOH> 1 1 1 1 1 1
- <STX> 2 2 2 2 2 2
- <ETX> 3 3 3 3 3 3
- <EOT> 4 55 55 55 4 55
- <ENQ> 5 45 45 45 5 45
- <ACK> 6 46 46 46 6 46
- <BEL> 7 47 47 47 7 47
- <BS> 8 22 22 22 8 22
- <HT> 9 5 5 5 9 5
- <LF> 10 37 21 21 10 21 **
- <VT> 11 11 11 11 11 11
- <FF> 12 12 12 12 12 12
- <CR> 13 13 13 13 13 13
- <SO> 14 14 14 14 14 14
- <SI> 15 15 15 15 15 15
- <DLE> 16 16 16 16 16 16
- <DC1> 17 17 17 17 17 17
- <DC2> 18 18 18 18 18 18
- <DC3> 19 19 19 19 19 19
- <DC4> 20 60 60 60 20 60
- <NAK> 21 61 61 61 21 61
- <SYN> 22 50 50 50 22 50
- <ETB> 23 38 38 38 23 38
- <CAN> 24 24 24 24 24 24
- <EOM> 25 25 25 25 25 25
- <SUB> 26 63 63 63 26 63
- <ESC> 27 39 39 39 27 39
- <FS> 28 28 28 28 28 28
- <GS> 29 29 29 29 29 29
- <RS> 30 30 30 30 30 30
- <US> 31 31 31 31 31 31
- <SPACE> 32 64 64 64 32 64
- ! 33 90 90 90 33 90
- " 34 127 127 127 34 127
- # 35 123 123 123 35 123
- $ 36 91 91 91 36 91
- % 37 108 108 108 37 108
- & 38 80 80 80 38 80
- ' 39 125 125 125 39 125
- ( 40 77 77 77 40 77
- ) 41 93 93 93 41 93
- * 42 92 92 92 42 92
- + 43 78 78 78 43 78
- , 44 107 107 107 44 107
- - 45 96 96 96 45 96
- . 46 75 75 75 46 75
- / 47 97 97 97 47 97
- 0 48 240 240 240 48 240
- 1 49 241 241 241 49 241
- 2 50 242 242 242 50 242
- 3 51 243 243 243 51 243
- 4 52 244 244 244 52 244
- 5 53 245 245 245 53 245
- 6 54 246 246 246 54 246
- 7 55 247 247 247 55 247
- 8 56 248 248 248 56 248
- 9 57 249 249 249 57 249
- : 58 122 122 122 58 122
- ; 59 94 94 94 59 94
- < 60 76 76 76 60 76
- = 61 126 126 126 61 126
- > 62 110 110 110 62 110
- ? 63 111 111 111 63 111
- @ 64 124 124 124 64 124
- A 65 193 193 193 65 193
- B 66 194 194 194 66 194
- C 67 195 195 195 67 195
- D 68 196 196 196 68 196
- E 69 197 197 197 69 197
- F 70 198 198 198 70 198
- G 71 199 199 199 71 199
- H 72 200 200 200 72 200
- I 73 201 201 201 73 201
- J 74 209 209 209 74 209
- K 75 210 210 210 75 210
- L 76 211 211 211 76 211
- M 77 212 212 212 77 212
- N 78 213 213 213 78 213
- O 79 214 214 214 79 214
- P 80 215 215 215 80 215
- Q 81 216 216 216 81 216
- R 82 217 217 217 82 217
- S 83 226 226 226 83 226
- T 84 227 227 227 84 227
- U 85 228 228 228 85 228
- V 86 229 229 229 86 229
- W 87 230 230 230 87 230
- X 88 231 231 231 88 231
- Y 89 232 232 232 89 232
- Z 90 233 233 233 90 233
- [ 91 186 173 187 91 173 ** ##
- \ 92 224 224 188 92 224 ##
- ] 93 187 189 189 93 189 **
- ^ 94 176 95 106 94 95 ** ##
- _ 95 109 109 109 95 109
- ` 96 121 121 74 96 121 ##
- a 97 129 129 129 97 129
- b 98 130 130 130 98 130
- c 99 131 131 131 99 131
- d 100 132 132 132 100 132
- e 101 133 133 133 101 133
- f 102 134 134 134 102 134
- g 103 135 135 135 103 135
- h 104 136 136 136 104 136
- i 105 137 137 137 105 137
- j 106 145 145 145 106 145
- k 107 146 146 146 107 146
- l 108 147 147 147 108 147
- m 109 148 148 148 109 148
- n 110 149 149 149 110 149
- o 111 150 150 150 111 150
- p 112 151 151 151 112 151
- q 113 152 152 152 113 152
- r 114 153 153 153 114 153
- s 115 162 162 162 115 162
- t 116 163 163 163 116 163
- u 117 164 164 164 117 164
- v 118 165 165 165 118 165
- w 119 166 166 166 119 166
- x 120 167 167 167 120 167
- y 121 168 168 168 121 168
- z 122 169 169 169 122 169
- { 123 192 192 251 123 192 ##
- | 124 79 79 79 124 79
- } 125 208 208 253 125 208 ##
- ~ 126 161 161 255 126 161 ##
- <DEL> 127 7 7 7 127 7
- <PAD> 128 32 32 32 194.128 32
- <HOP> 129 33 33 33 194.129 33
- <BPH> 130 34 34 34 194.130 34
- <NBH> 131 35 35 35 194.131 35
- <IND> 132 36 36 36 194.132 36
- <NEL> 133 21 37 37 194.133 37 **
- <SSA> 134 6 6 6 194.134 6
- <ESA> 135 23 23 23 194.135 23
- <HTS> 136 40 40 40 194.136 40
- <HTJ> 137 41 41 41 194.137 41
- <VTS> 138 42 42 42 194.138 42
- <PLD> 139 43 43 43 194.139 43
- <PLU> 140 44 44 44 194.140 44
- <RI> 141 9 9 9 194.141 9
- <SS2> 142 10 10 10 194.142 10
- <SS3> 143 27 27 27 194.143 27
- <DCS> 144 48 48 48 194.144 48
- <PU1> 145 49 49 49 194.145 49
- <PU2> 146 26 26 26 194.146 26
- <STS> 147 51 51 51 194.147 51
- <CCH> 148 52 52 52 194.148 52
- <MW> 149 53 53 53 194.149 53
- <SPA> 150 54 54 54 194.150 54
- <EPA> 151 8 8 8 194.151 8
- <SOS> 152 56 56 56 194.152 56
- <SGC> 153 57 57 57 194.153 57
- <SCI> 154 58 58 58 194.154 58
- <CSI> 155 59 59 59 194.155 59
- <ST> 156 4 4 4 194.156 4
- <OSC> 157 20 20 20 194.157 20
- <PM> 158 62 62 62 194.158 62
- <APC> 159 255 255 95 194.159 255 ##
- <NON-BREAKING SPACE> 160 65 65 65 194.160 128.65
- <INVERTED "!" > 161 170 170 170 194.161 128.66
- <CENT SIGN> 162 74 74 176 194.162 128.67 ##
- <POUND SIGN> 163 177 177 177 194.163 128.68
- <CURRENCY SIGN> 164 159 159 159 194.164 128.69
- <YEN SIGN> 165 178 178 178 194.165 128.70
- <BROKEN BAR> 166 106 106 208 194.166 128.71 ##
- <SECTION SIGN> 167 181 181 181 194.167 128.72
- <DIAERESIS> 168 189 187 121 194.168 128.73 ** ##
- <COPYRIGHT SIGN> 169 180 180 180 194.169 128.74
- <FEMININE ORDINAL> 170 154 154 154 194.170 128.81
- <LEFT POINTING GUILLEMET> 171 138 138 138 194.171 128.82
- <NOT SIGN> 172 95 176 186 194.172 128.83 ** ##
- <SOFT HYPHEN> 173 202 202 202 194.173 128.84
- <REGISTERED TRADE MARK> 174 175 175 175 194.174 128.85
- <MACRON> 175 188 188 161 194.175 128.86 ##
- <DEGREE SIGN> 176 144 144 144 194.176 128.87
- <PLUS-OR-MINUS SIGN> 177 143 143 143 194.177 128.88
- <SUPERSCRIPT TWO> 178 234 234 234 194.178 128.89
- <SUPERSCRIPT THREE> 179 250 250 250 194.179 128.98
- <ACUTE ACCENT> 180 190 190 190 194.180 128.99
- <MICRO SIGN> 181 160 160 160 194.181 128.100
- <PARAGRAPH SIGN> 182 182 182 182 194.182 128.101
- <MIDDLE DOT> 183 179 179 179 194.183 128.102
- <CEDILLA> 184 157 157 157 194.184 128.103
- <SUPERSCRIPT ONE> 185 218 218 218 194.185 128.104
- <MASC. ORDINAL INDICATOR> 186 155 155 155 194.186 128.105
- <RIGHT POINTING GUILLEMET> 187 139 139 139 194.187 128.106
- <FRACTION ONE QUARTER> 188 183 183 183 194.188 128.112
- <FRACTION ONE HALF> 189 184 184 184 194.189 128.113
- <FRACTION THREE QUARTERS> 190 185 185 185 194.190 128.114
- <INVERTED QUESTION MARK> 191 171 171 171 194.191 128.115
- <A WITH GRAVE> 192 100 100 100 195.128 138.65
- <A WITH ACUTE> 193 101 101 101 195.129 138.66
- <A WITH CIRCUMFLEX> 194 98 98 98 195.130 138.67
- <A WITH TILDE> 195 102 102 102 195.131 138.68
- <A WITH DIAERESIS> 196 99 99 99 195.132 138.69
- <A WITH RING ABOVE> 197 103 103 103 195.133 138.70
- <CAPITAL LIGATURE AE> 198 158 158 158 195.134 138.71
- <C WITH CEDILLA> 199 104 104 104 195.135 138.72
- <E WITH GRAVE> 200 116 116 116 195.136 138.73
- <E WITH ACUTE> 201 113 113 113 195.137 138.74
- <E WITH CIRCUMFLEX> 202 114 114 114 195.138 138.81
- <E WITH DIAERESIS> 203 115 115 115 195.139 138.82
- <I WITH GRAVE> 204 120 120 120 195.140 138.83
- <I WITH ACUTE> 205 117 117 117 195.141 138.84
- <I WITH CIRCUMFLEX> 206 118 118 118 195.142 138.85
- <I WITH DIAERESIS> 207 119 119 119 195.143 138.86
- <CAPITAL LETTER ETH> 208 172 172 172 195.144 138.87
- <N WITH TILDE> 209 105 105 105 195.145 138.88
- <O WITH GRAVE> 210 237 237 237 195.146 138.89
- <O WITH ACUTE> 211 238 238 238 195.147 138.98
- <O WITH CIRCUMFLEX> 212 235 235 235 195.148 138.99
- <O WITH TILDE> 213 239 239 239 195.149 138.100
- <O WITH DIAERESIS> 214 236 236 236 195.150 138.101
- <MULTIPLICATION SIGN> 215 191 191 191 195.151 138.102
- <O WITH STROKE> 216 128 128 128 195.152 138.103
- <U WITH GRAVE> 217 253 253 224 195.153 138.104 ##
- <U WITH ACUTE> 218 254 254 254 195.154 138.105
- <U WITH CIRCUMFLEX> 219 251 251 221 195.155 138.106 ##
- <U WITH DIAERESIS> 220 252 252 252 195.156 138.112
- <Y WITH ACUTE> 221 173 186 173 195.157 138.113 ** ##
- <CAPITAL LETTER THORN> 222 174 174 174 195.158 138.114
- <SMALL LETTER SHARP S> 223 89 89 89 195.159 138.115
- <a WITH GRAVE> 224 68 68 68 195.160 139.65
- <a WITH ACUTE> 225 69 69 69 195.161 139.66
- <a WITH CIRCUMFLEX> 226 66 66 66 195.162 139.67
- <a WITH TILDE> 227 70 70 70 195.163 139.68
- <a WITH DIAERESIS> 228 67 67 67 195.164 139.69
- <a WITH RING ABOVE> 229 71 71 71 195.165 139.70
- <SMALL LIGATURE ae> 230 156 156 156 195.166 139.71
- <c WITH CEDILLA> 231 72 72 72 195.167 139.72
- <e WITH GRAVE> 232 84 84 84 195.168 139.73
- <e WITH ACUTE> 233 81 81 81 195.169 139.74
- <e WITH CIRCUMFLEX> 234 82 82 82 195.170 139.81
- <e WITH DIAERESIS> 235 83 83 83 195.171 139.82
- <i WITH GRAVE> 236 88 88 88 195.172 139.83
- <i WITH ACUTE> 237 85 85 85 195.173 139.84
- <i WITH CIRCUMFLEX> 238 86 86 86 195.174 139.85
- <i WITH DIAERESIS> 239 87 87 87 195.175 139.86
- <SMALL LETTER eth> 240 140 140 140 195.176 139.87
- <n WITH TILDE> 241 73 73 73 195.177 139.88
- <o WITH GRAVE> 242 205 205 205 195.178 139.89
- <o WITH ACUTE> 243 206 206 206 195.179 139.98
- <o WITH CIRCUMFLEX> 244 203 203 203 195.180 139.99
- <o WITH TILDE> 245 207 207 207 195.181 139.100
- <o WITH DIAERESIS> 246 204 204 204 195.182 139.101
- <DIVISION SIGN> 247 225 225 225 195.183 139.102
- <o WITH STROKE> 248 112 112 112 195.184 139.103
- <u WITH GRAVE> 249 221 221 192 195.185 139.104 ##
- <u WITH ACUTE> 250 222 222 222 195.186 139.105
- <u WITH CIRCUMFLEX> 251 219 219 219 195.187 139.106
- <u WITH DIAERESIS> 252 220 220 220 195.188 139.112
- <y WITH ACUTE> 253 141 141 141 195.189 139.113
- <SMALL LETTER thorn> 254 142 142 142 195.190 139.114
- <y WITH DIAERESIS> 255 223 223 223 195.191 139.115
If you would rather see the above table in CCSID 0037 order rather than ASCII + Latin-1 order then run the table through:
- perl \
- -ne 'if(/.{29}\d{1,3}\s{2,4}\d{1,3}\s{2,4}\d{1,3}\s{2,4}\d{1,3}/)'\
- -e '{push(@l,$_)}' \
- -e 'END{print map{$_->[0]}' \
- -e ' sort{$a->[1] <=> $b->[1]}' \
- -e ' map{[$_,substr($_,34,3)]}@l;}' perlebcdic.pod
If you would rather see it in CCSID 1047 order then change the number 34 in the last line to 39, like this:
- perl \
- -ne 'if(/.{29}\d{1,3}\s{2,4}\d{1,3}\s{2,4}\d{1,3}\s{2,4}\d{1,3}/)'\
- -e '{push(@l,$_)}' \
- -e 'END{print map{$_->[0]}' \
- -e ' sort{$a->[1] <=> $b->[1]}' \
- -e ' map{[$_,substr($_,39,3)]}@l;}' perlebcdic.pod
If you would rather see it in POSIX-BC order then change the number 39 in the last line to 44, like this:
- perl \
- -ne 'if(/.{29}\d{1,3}\s{2,4}\d{1,3}\s{2,4}\d{1,3}\s{2,4}\d{1,3}/)'\
- -e '{push(@l,$_)}' \
- -e 'END{print map{$_->[0]}' \
- -e ' sort{$a->[1] <=> $b->[1]}' \
- -e ' map{[$_,substr($_,44,3)]}@l;}' perlebcdic.pod
To determine the character set you are running under from perl, one could use the return value of ord() or chr() to test one or more character values. For example, ord("A") returns 65 on an ASCII platform but 193 on an EBCDIC platform. Also, "\t" is a HORIZONTAL TABULATION character, so "\t" eq chr(9) holds on ASCII platforms, while "\t" eq chr(5) holds on EBCDIC platforms.
To distinguish EBCDIC code pages, try looking at one or more of the characters that differ between them. For example, "\n" is chr(37) under CCSID 0037 but chr(21) under 1047 and POSIX-BC. Or better still, choose a character that is uniquely encoded in each of the code sets, e.g. '^', whose ord() value is 94 in ASCII, 176 in 0037, 95 in 1047, and 106 in POSIX-BC. However, it would be unwise to write tests such as $is_ascii = "\r" ne chr(13); or $is_ascii = "\n" ne chr(10);.
Obviously the first of these will fail to distinguish most ASCII platforms from either a CCSID 0037, a 1047, or a POSIX-BC EBCDIC platform, since "\r" eq chr(13) under all of those coded character sets. But note too that because "\n" is chr(13) and "\r" is chr(10) on the Macintosh (which is an ASCII platform), the second $is_ascii test will lead to trouble there.
To determine whether or not perl was built under an EBCDIC code page you can use the Config module like so:
- use Config;
- $is_ebcdic = $Config{'ebcdic'} eq 'define';
The functions utf8::unicode_to_native() and utf8::native_to_unicode() take an input numeric code point in one encoding and return its equivalent value in the other.
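Where an off-platform cross-check of those mappings is wanted, the same round trip can be sketched with any complete single-byte EBCDIC codec. A minimal sketch in Python (an illustrative stand-in, not part of perl's utf8 API), using Python's built-in cp037 codec since its standard library ships no cp1047 codec:

```python
def unicode_to_native(codepoint: int) -> int:
    """Latin-1/Unicode code point -> CCSID 0037 code point."""
    return chr(codepoint).encode("cp037")[0]

def native_to_unicode(codepoint: int) -> int:
    """CCSID 0037 code point -> Latin-1/Unicode code point."""
    return ord(bytes([codepoint]).decode("cp037"))

assert unicode_to_native(ord("A")) == 193   # the "A" row of the table above
assert native_to_unicode(193) == ord("A")
assert unicode_to_native(10) == 37          # LF is 37 in CCSID 0037
```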
In order to convert a string of characters from one character set to another a simple list of numbers, such as in the right columns in the above table, along with perl's tr/// operator is all that is needed. The data in the table are in ASCII/Latin1 order, hence the EBCDIC columns provide easy-to-use ASCII/Latin1 to EBCDIC operations that are also easily reversed.
For example, to convert ASCII/Latin1 to code page 037 take the output of the second numbers column from the output of recipe 2 (modified to add '\' characters), and use it in tr/// like so:
- $cp_037 =
- '\x00\x01\x02\x03\x37\x2D\x2E\x2F\x16\x05\x25\x0B\x0C\x0D\x0E\x0F' .
- '\x10\x11\x12\x13\x3C\x3D\x32\x26\x18\x19\x3F\x27\x1C\x1D\x1E\x1F' .
- '\x40\x5A\x7F\x7B\x5B\x6C\x50\x7D\x4D\x5D\x5C\x4E\x6B\x60\x4B\x61' .
- '\xF0\xF1\xF2\xF3\xF4\xF5\xF6\xF7\xF8\xF9\x7A\x5E\x4C\x7E\x6E\x6F' .
- '\x7C\xC1\xC2\xC3\xC4\xC5\xC6\xC7\xC8\xC9\xD1\xD2\xD3\xD4\xD5\xD6' .
- '\xD7\xD8\xD9\xE2\xE3\xE4\xE5\xE6\xE7\xE8\xE9\xBA\xE0\xBB\xB0\x6D' .
- '\x79\x81\x82\x83\x84\x85\x86\x87\x88\x89\x91\x92\x93\x94\x95\x96' .
- '\x97\x98\x99\xA2\xA3\xA4\xA5\xA6\xA7\xA8\xA9\xC0\x4F\xD0\xA1\x07' .
- '\x20\x21\x22\x23\x24\x15\x06\x17\x28\x29\x2A\x2B\x2C\x09\x0A\x1B' .
- '\x30\x31\x1A\x33\x34\x35\x36\x08\x38\x39\x3A\x3B\x04\x14\x3E\xFF' .
- '\x41\xAA\x4A\xB1\x9F\xB2\x6A\xB5\xBD\xB4\x9A\x8A\x5F\xCA\xAF\xBC' .
- '\x90\x8F\xEA\xFA\xBE\xA0\xB6\xB3\x9D\xDA\x9B\x8B\xB7\xB8\xB9\xAB' .
- '\x64\x65\x62\x66\x63\x67\x9E\x68\x74\x71\x72\x73\x78\x75\x76\x77' .
- '\xAC\x69\xED\xEE\xEB\xEF\xEC\xBF\x80\xFD\xFE\xFB\xFC\xAD\xAE\x59' .
- '\x44\x45\x42\x46\x43\x47\x9C\x48\x54\x51\x52\x53\x58\x55\x56\x57' .
- '\x8C\x49\xCD\xCE\xCB\xCF\xCC\xE1\x70\xDD\xDE\xDB\xDC\x8D\x8E\xDF';
- my $ebcdic_string = $ascii_string;
- eval '$ebcdic_string =~ tr/\000-\377/' . $cp_037 . '/';
To convert from EBCDIC 0037 to ASCII just reverse the order of the tr/// arguments, like so:
- eval '$ascii_string =~ tr/' . $cp_037 . '/\000-\377/';
Similarly one could take the output of the third numbers column from recipe 2 to obtain a $cp_1047 table. The fourth numbers column of the output from recipe 2 could provide a $cp_posix_bc table suitable for transcoding as well.
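The table-driven approach itself is portable across languages. A sketch of the same technique in Python, assuming its built-in cp037 codec as a stand-in for the 1047 and POSIX-BC tables (which have no stock Python codec); bytes.translate() plays the role of tr///:

```python
# Build the 256-entry translation tables from the codec itself,
# one output byte per input byte, then apply them with translate().
a2e = bytes(chr(i).encode("cp037")[0] for i in range(256))
e2a = bytes(ord(bytes([i]).decode("cp037")) for i in range(256))

ascii_string = b"hello world"
ebcdic_string = ascii_string.translate(a2e)

assert ebcdic_string != ascii_string
# Reversing the table order recovers the original, as with tr///.
assert ebcdic_string.translate(e2a) == ascii_string
```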
If you wanted to see the inverse tables, you would first have to sort on the desired numbers column as in recipes 4, 5 or 6, then take the output of the first numbers column.
XPG operability often implies the presence of an iconv utility available from the shell or from the C library. Consult your system's documentation for information on iconv.
On OS/390 or z/OS see the iconv(1) manpage. One way to invoke the iconv shell utility from within perl would be to:
- # OS/390 or z/OS example
- $ascii_data = `echo '$ebcdic_data'| iconv -f IBM-1047 -t ISO8859-1`
or the inverse map:
- # OS/390 or z/OS example
- $ebcdic_data = `echo '$ascii_data'| iconv -f ISO8859-1 -t IBM-1047`
For other perl-based conversion options see the Convert::* modules on CPAN.
The OS/390 and z/OS C run-time libraries provide _atoe() and _etoa() functions.
The .. range operator treats certain character ranges with care on EBCDIC platforms. For example, the following array will have twenty-six elements on either an EBCDIC platform or an ASCII platform:
- @alphabet = ('A'..'Z'); # $#alphabet == 25
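The gap characters that the range operator skips can be counted directly. A cross-check in Python, again with the cp037 codec as an assumed stand-in for the native EBCDIC page:

```python
import string

# 'A' is 193 and 'Z' is 233 in CCSID 0037, so the raw code-point span
# covers 41 values, yet only 26 of them are the letters that Perl's
# 'A'..'Z' range yields; the rest are the "gap" characters.
span = [bytes([i]).decode("cp037") for i in range(193, 234)]
letters = [c for c in span if c in string.ascii_uppercase]

assert len(span) == 41
assert len(letters) == 26
```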
The bitwise operators such as & ^ | may return different results when operating on string or character data in a perl program running on an EBCDIC platform than when run on an ASCII platform; see perlop for an example.
An interesting property of the 32 C0 control characters in the ASCII table is that they can "literally" be constructed as control characters in perl, e.g. chr(0) eq "\c@", chr(1) eq "\cA", and so on. Perl on EBCDIC platforms has been ported to take "\c@" to chr(0) and "\cA" to chr(1), etc. as well, but the thirty-three characters that result depend on which code page you are using. The table below uses the standard acronyms for the controls. The POSIX-BC and 1047 sets are identical throughout this range and differ from the 0037 set at only one spot (21 decimal). Note that the LINE FEED character may be generated by "\cJ" on ASCII platforms but by "\cU" on 1047 or POSIX-BC platforms, and cannot be generated as a "\c<letter>" control character on 0037 platforms. Note also that "\c\" cannot be the final element in a string or regex, as it will absorb the terminator, but "\c\X" is a FILE SEPARATOR concatenated with X for all X.
- chr ord 8859-1 0037 1047 && POSIX-BC
- -----------------------------------------------------------------------
- \c? 127 <DEL> " "
- \c@ 0 <NUL> <NUL> <NUL>
- \cA 1 <SOH> <SOH> <SOH>
- \cB 2 <STX> <STX> <STX>
- \cC 3 <ETX> <ETX> <ETX>
- \cD 4 <EOT> <ST> <ST>
- \cE 5 <ENQ> <HT> <HT>
- \cF 6 <ACK> <SSA> <SSA>
- \cG 7 <BEL> <DEL> <DEL>
- \cH 8 <BS> <EPA> <EPA>
- \cI 9 <HT> <RI> <RI>
- \cJ 10 <LF> <SS2> <SS2>
- \cK 11 <VT> <VT> <VT>
- \cL 12 <FF> <FF> <FF>
- \cM 13 <CR> <CR> <CR>
- \cN 14 <SO> <SO> <SO>
- \cO 15 <SI> <SI> <SI>
- \cP 16 <DLE> <DLE> <DLE>
- \cQ 17 <DC1> <DC1> <DC1>
- \cR 18 <DC2> <DC2> <DC2>
- \cS 19 <DC3> <DC3> <DC3>
- \cT 20 <DC4> <OSC> <OSC>
- \cU 21 <NAK> <NEL> <LF> **
- \cV 22 <SYN> <BS> <BS>
- \cW 23 <ETB> <ESA> <ESA>
- \cX 24 <CAN> <CAN> <CAN>
- \cY 25 <EOM> <EOM> <EOM>
- \cZ 26 <SUB> <PU2> <PU2>
- \c[ 27 <ESC> <SS3> <SS3>
- \c\X 28 <FS>X <FS>X <FS>X
- \c] 29 <GS> <GS> <GS>
- \c^ 30 <RS> <RS> <RS>
- \c_ 31 <US> <US> <US>
chr() must be given an EBCDIC code number argument to yield a desired character return value on an EBCDIC platform. For example:
- $CAPITAL_LETTER_A = chr(193);
ord() will return EBCDIC code number values on an EBCDIC platform. For example:
- $the_number_193 = ord("A");
The c and C templates for pack() are dependent upon character set encoding. For example, on an EBCDIC platform pack("C4",193,194,195,196) yields the string "ABCD", whereas on an ASCII platform it would take pack("C4",65,66,67,68) to do so.
One must be careful with scalars and strings that are passed to print that contain ASCII encodings. One common place for this to occur is in the output of the MIME type header for CGI script writing. For example, many perl programming guides recommend something similar to:
- print "Content-type:\ttext/html\015\012\015\012";
- # this may be wrong on EBCDIC
Under the IBM OS/390 USS Web Server or WebSphere on z/OS for example you should instead write that as:
- print "Content-type:\ttext/html\r\n\r\n"; # OK for DGW et al
That is because the translation from EBCDIC to ASCII is done by the web server in this case (such code will not be appropriate for the Macintosh however). Consult your web server's documentation for further details.
The formats that can convert characters to numbers and vice versa will be different from their ASCII counterparts when executed on an EBCDIC platform. Examples include:
- printf("%c%c%c",193,194,195); # prints ABC
EBCDIC sort results may differ from ASCII sort results especially for mixed case strings. This is discussed in more detail below.
See the discussion of printf() above. An example of the use of sprintf would be:
- $CAPITAL_LETTER_A = sprintf("%c",193);
See the discussion of pack() above.
As of perl 5.005_03 the letter range regular expressions such as [A-Z] and [a-z] have been especially coded to not pick up gap characters. For example, characters such as ô (o WITH CIRCUMFLEX) that lie between I and J would not be matched by the regular expression range /[H-K]/. This works in the other direction, too, if either of the range end points is explicitly numeric: [\x89-\x91] will match \x8E, even though \x89 is i and \x91 is j, and \x8E is a gap character from the alphabetic viewpoint. If you do want to match the alphabet gap characters in a single octet regular expression, try matching the hex or octal code, such as /\313/ on EBCDIC or /\364/ on ASCII platforms, to have your regular expression match ô (o WITH CIRCUMFLEX).
Another construct to be wary of is the inappropriate use of hex or octal constants in regular expressions. Consider the following set of subs:
- sub is_c0 {
- my $char = substr(shift,0,1);
- $char =~ /[\000-\037]/;
- }
- sub is_print_ascii {
- my $char = substr(shift,0,1);
- $char =~ /[\040-\176]/;
- }
- sub is_delete {
- my $char = substr(shift,0,1);
- $char eq "\177";
- }
- sub is_c1 {
- my $char = substr(shift,0,1);
- $char =~ /[\200-\237]/;
- }
- sub is_latin_1 {
- my $char = substr(shift,0,1);
- $char =~ /[\240-\377]/;
- }
The above would be adequate if the concern was only with numeric code points. However, the concern may be with characters rather than code points, and on an EBCDIC platform it may be desirable for constructs such as if (is_print_ascii("A")) {print "A is a printable character\n";} to print out the expected message. One way to represent the above collection of character classification subs that is capable of working across the four coded character sets discussed in this document is as follows:
- sub Is_c0 {
- my $char = substr(shift,0,1);
- if (ord('^')==94) { # ascii
- return $char =~ /[\000-\037]/;
- }
- if (ord('^')==176) { # 0037
- return $char =~ /[\000-\003\067\055-\057\026\005\045\013-\023\074\075\062\046\030\031\077\047\034-\037]/;
- }
- if (ord('^')==95 || ord('^')==106) { # 1047 || posix-bc
- return $char =~ /[\000-\003\067\055-\057\026\005\025\013-\023\074\075\062\046\030\031\077\047\034-\037]/;
- }
- }
- sub Is_print_ascii {
- my $char = substr(shift,0,1);
- $char =~ /[ !"\#\$%&'()*+,\-.\/0-9:;<=>?\@A-Z[\\\]^_`a-z{|}~]/;
- }
- sub Is_delete {
- my $char = substr(shift,0,1);
- if (ord('^')==94) { # ascii
- return $char eq "\177";
- }
- else { # ebcdic
- return $char eq "\007";
- }
- }
- sub Is_c1 {
- my $char = substr(shift,0,1);
- if (ord('^')==94) { # ascii
- return $char =~ /[\200-\237]/;
- }
- if (ord('^')==176) { # 0037
- return $char =~ /[\040-\044\025\006\027\050-\054\011\012\033\060\061\032\063-\066\010\070-\073\040\024\076\377]/;
- }
- if (ord('^')==95) { # 1047
- return $char =~ /[\040-\045\006\027\050-\054\011\012\033\060\061\032\063-\066\010\070-\073\040\024\076\377]/;
- }
- if (ord('^')==106) { # posix-bc
- return $char =~
- /[\040-\045\006\027\050-\054\011\012\033\060\061\032\063-\066\010\070-\073\040\024\076\137]/;
- }
- }
- sub Is_latin_1 {
- my $char = substr(shift,0,1);
- if (ord('^')==94) { # ascii
- return $char =~ /[\240-\377]/;
- }
- if (ord('^')==176) { # 0037
- return $char =~
- /[\101\252\112\261\237\262\152\265\275\264\232\212\137\312\257\274\220\217\352\372\276\240\266\263\235\332\233\213\267\270\271\253\144\145\142\146\143\147\236\150\164\161-\163\170\165-\167\254\151\355\356\353\357\354\277\200\375\376\373\374\255\256\131\104\105\102\106\103\107\234\110\124\121-\123\130\125-\127\214\111\315\316\313\317\314\341\160\335\336\333\334\215\216\337]/;
- }
- if (ord('^')==95) { # 1047
- return $char =~
- /[\101\252\112\261\237\262\152\265\273\264\232\212\260\312\257\274\220\217\352\372\276\240\266\263\235\332\233\213\267\270\271\253\144\145\142\146\143\147\236\150\164\161-\163\170\165-\167\254\151\355\356\353\357\354\277\200\375\376\373\374\272\256\131\104\105\102\106\103\107\234\110\124\121-\123\130\125-\127\214\111\315\316\313\317\314\341\160\335\336\333\334\215\216\337]/;
- }
- if (ord('^')==106) { # posix-bc
- return $char =~
- /[\101\252\260\261\237\262\320\265\171\264\232\212\272\312\257\241\220\217\352\372\276\240\266\263\235\332\233\213\267\270\271\253\144\145\142\146\143\147\236\150\164\161-\163\170\165-\167\254\151\355\356\353\357\354\277\200\340\376\335\374\255\256\131\104\105\102\106\103\107\234\110\124\121-\123\130\125-\127\214\111\315\316\313\317\314\341\160\300\336\333\334\215\216\337]/;
- }
- }
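An alternative to hand-maintained octal classes per code page is to decode the native byte to Unicode first and classify there. A sketch of that idea in Python, with the cp037 codec as a stand-in assumption (it is not one of the 1047 or POSIX-BC pages discussed above):

```python
def is_c0(byte_value: int, codec: str = "cp037") -> bool:
    """True if the native byte decodes to a C0 control (U+0000..U+001F)."""
    return ord(bytes([byte_value]).decode(codec)) <= 0x1F

assert is_c0(0x25)              # LF is 37 (0x25) in CCSID 0037
assert not is_c0(0xC1)          # "A" is 193 (0xC1)
assert is_c0(0x0A, "latin-1")   # and LF is 10 on ASCII platforms
```

The classification logic is then written once, against Unicode, instead of once per coded character set.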
Note however that only the Is_print_ascii() sub is really independent of coded character set. Another way to write Is_latin_1() would be to use the characters in the range explicitly as literals, although that form may run into trouble in network transit (due to the presence of 8 bit characters) or on non ISO-Latin character sets.
Most socket programming assumes ASCII character encodings in network byte order. Exceptions can include CGI script writing under a host web server where the server may take care of translation for you. Most host web servers convert EBCDIC data to ISO-8859-1 or Unicode on output.
One big difference between ASCII-based character sets and EBCDIC ones is the relative position of upper and lower case letters, and of the letters compared to the digits. If sorted on an ASCII-based platform, the two-letter abbreviation for a physician comes before the two-letter abbreviation for drive; that is:
- @sorted = sort(qw(Dr. dr.)); # @sorted holds ('Dr.','dr.') on ASCII,
- # but ('dr.','Dr.') on EBCDIC
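The same inversion can be reproduced anywhere by comparing the encoded bytes rather than the platform's native strings. A Python sketch, with cp037 as a stand-in EBCDIC page:

```python
# Sort the same two strings by their Latin-1 bytes and by their
# CCSID 0037 bytes; only the byte values differ, not the strings.
words = ["Dr.", "dr."]
latin1_order = sorted(words, key=lambda w: w.encode("latin-1"))
ebcdic_order = sorted(words, key=lambda w: w.encode("cp037"))

assert latin1_order == ["Dr.", "dr."]   # 'D' (68) < 'd' (100) in ASCII
assert ebcdic_order == ["dr.", "Dr."]   # 'd' (132) < 'D' (196) in EBCDIC
```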
The property of lowercase before uppercase letters in EBCDIC is even carried to the Latin 1 EBCDIC pages such as 0037 and 1047. An example would be that Ë (E WITH DIAERESIS, 203) comes before ë (e WITH DIAERESIS, 235) on an ASCII platform, but the latter (83) comes before the former (115) on an EBCDIC platform. (Astute readers will note that the uppercase version of ß SMALL LETTER SHARP S is simply "SS" and that the uppercase version of ÿ y WITH DIAERESIS is not in the 0..255 range but at U+0178 in Unicode, or "\x{178}" in a Unicode enabled Perl.)
The sort order will cause differences between results obtained on ASCII platforms versus EBCDIC platforms. What follows are some suggestions on how to deal with these differences.
This is the least computationally expensive strategy. It may require some user education.
In order to minimize the expense of mono casing mixed-case text, try to tr/// towards the character set case most employed within the data. If the data are primarily UPPERCASE non Latin 1, then apply tr/[a-z]/[A-Z]/ and then sort(). If the data are primarily lowercase non Latin 1, then apply tr/[A-Z]/[a-z]/ before sorting. If the data are primarily UPPERCASE and include Latin-1 characters, then apply:
- tr/[a-z]/[A-Z]/;
- tr/[àáâãäåæçèéêëìíîïðñòóôõöøùúûüýþ]/[ÀÁÂÃÄÅÆÇÈÉÊËÌÍÎÏÐÑÒÓÔÕÖØÙÚÛÜÝÞ]/;
- s/ß/SS/g;
then sort(). Do note however that such Latin-1 manipulation does not address the ÿ y WITH DIAERESIS character, which will remain at code point 255 on ASCII platforms but 223 on most EBCDIC platforms, where it will sort to a place less than the EBCDIC numerals. With a Unicode-enabled Perl you might try:
- tr/ÿ/\x{178}/;
The strategy of mono casing data before sorting does not preserve the case of the data and may not be acceptable for that reason.
This is the most expensive proposition that does not employ a network connection.
This strategy can employ a network connection. As such it would be computationally expensive.
There are a variety of ways of transforming data with an intra character set mapping that serve a variety of purposes. Sorting was discussed in the previous section and a few of the other more popular mapping techniques are discussed next.
Note that some URLs have hexadecimal ASCII code points in them in an attempt to overcome character or protocol limitation issues. For example the tilde character is not on every keyboard hence a URL of the form:
- http://www.pvhp.com/~pvhp/
may also be expressed as either of:
- http://www.pvhp.com/%7Epvhp/
- http://www.pvhp.com/%7epvhp/
where 7E is the hexadecimal ASCII code point for '~'. Here is an example of decoding such a URL under CCSID 1047:
- $url = 'http://www.pvhp.com/%7Epvhp/';
- # this array assumes code page 1047
- my @a2e_1047 = (
- 0, 1, 2, 3, 55, 45, 46, 47, 22, 5, 21, 11, 12, 13, 14, 15,
- 16, 17, 18, 19, 60, 61, 50, 38, 24, 25, 63, 39, 28, 29, 30, 31,
- 64, 90,127,123, 91,108, 80,125, 77, 93, 92, 78,107, 96, 75, 97,
- 240,241,242,243,244,245,246,247,248,249,122, 94, 76,126,110,111,
- 124,193,194,195,196,197,198,199,200,201,209,210,211,212,213,214,
- 215,216,217,226,227,228,229,230,231,232,233,173,224,189, 95,109,
- 121,129,130,131,132,133,134,135,136,137,145,146,147,148,149,150,
- 151,152,153,162,163,164,165,166,167,168,169,192, 79,208,161, 7,
- 32, 33, 34, 35, 36, 37, 6, 23, 40, 41, 42, 43, 44, 9, 10, 27,
- 48, 49, 26, 51, 52, 53, 54, 8, 56, 57, 58, 59, 4, 20, 62,255,
- 65,170, 74,177,159,178,106,181,187,180,154,138,176,202,175,188,
- 144,143,234,250,190,160,182,179,157,218,155,139,183,184,185,171,
- 100,101, 98,102, 99,103,158,104,116,113,114,115,120,117,118,119,
- 172,105,237,238,235,239,236,191,128,253,254,251,252,186,174, 89,
- 68, 69, 66, 70, 67, 71,156, 72, 84, 81, 82, 83, 88, 85, 86, 87,
- 140, 73,205,206,203,207,204,225,112,221,222,219,220,141,142,223
- );
- $url =~ s/%([0-9a-fA-F]{2})/pack("c",$a2e_1047[hex($1)])/ge;
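The decoding step generalizes to any percent-escaped URL: decode the escapes to ASCII bytes first, then map the whole string through an ASCII-to-EBCDIC table. A sketch in Python, using its cp037 codec as a stand-in since Python has no cp1047 codec:

```python
from urllib.parse import unquote_to_bytes

url = "http://www.pvhp.com/%7Epvhp/"
ascii_bytes = unquote_to_bytes(url)                   # %7E becomes "~"
ebcdic_bytes = ascii_bytes.decode("ascii").encode("cp037")

assert ascii_bytes == b"http://www.pvhp.com/~pvhp/"
assert ebcdic_bytes[ascii_bytes.index(b"~")] == 161   # "~" is 161 in 0037
```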
Conversely, here is a partial solution for the task of encoding such a URL under the 1047 code page:
- $url = 'http://www.pvhp.com/~pvhp/';
- # this array assumes code page 1047
- my @e2a_1047 = (
- 0, 1, 2, 3,156, 9,134,127,151,141,142, 11, 12, 13, 14, 15,
- 16, 17, 18, 19,157, 10, 8,135, 24, 25,146,143, 28, 29, 30, 31,
- 128,129,130,131,132,133, 23, 27,136,137,138,139,140, 5, 6, 7,
- 144,145, 22,147,148,149,150, 4,152,153,154,155, 20, 21,158, 26,
- 32,160,226,228,224,225,227,229,231,241,162, 46, 60, 40, 43,124,
- 38,233,234,235,232,237,238,239,236,223, 33, 36, 42, 41, 59, 94,
- 45, 47,194,196,192,193,195,197,199,209,166, 44, 37, 95, 62, 63,
- 248,201,202,203,200,205,206,207,204, 96, 58, 35, 64, 39, 61, 34,
- 216, 97, 98, 99,100,101,102,103,104,105,171,187,240,253,254,177,
- 176,106,107,108,109,110,111,112,113,114,170,186,230,184,198,164,
- 181,126,115,116,117,118,119,120,121,122,161,191,208, 91,222,174,
- 172,163,165,183,169,167,182,188,189,190,221,168,175, 93,180,215,
- 123, 65, 66, 67, 68, 69, 70, 71, 72, 73,173,244,246,242,243,245,
- 125, 74, 75, 76, 77, 78, 79, 80, 81, 82,185,251,252,249,250,255,
- 92,247, 83, 84, 85, 86, 87, 88, 89, 90,178,212,214,210,211,213,
- 48, 49, 50, 51, 52, 53, 54, 55, 56, 57,179,219,220,217,218,159
- );
- # The following regular expression does not address the
- # mappings for: ('.' => '%2E', '/' => '%2F', ':' => '%3A')
- $url =~ s/([\t "#%&\(\),;<=>\?\@\[\\\]^`{|}~])/sprintf("%%%02X",$e2a_1047[ord($1)])/ge;
where a more complete solution would split the URL into components and apply a full s/// substitution only to the appropriate parts.
In the remaining examples a @e2a or @a2e array may be employed but the assignment will not be shown explicitly. For code page 1047 you could use the @a2e_1047 or @e2a_1047 arrays just shown.
The u template to pack() or unpack() will render EBCDIC data in EBCDIC characters equivalent to their ASCII counterparts. For example, the following will print "Yes indeed\n" on either an ASCII or EBCDIC computer:
- $all_byte_chrs = '';
- for (0..255) { $all_byte_chrs .= chr($_); }
- $uuencode_byte_chrs = pack('u', $all_byte_chrs);
- ($uu = <<'ENDOFHEREDOC') =~ s/^\s*//gm;
- M``$"`P0%!@<("0H+#`T.#Q`1$A,4%187&!D:&QP='A\@(2(C)"4F)R@I*BLL
- M+2XO,#$R,S0U-C<X.3H[/#T^/T!!0D-$149'2$E*2TQ-3D]045)35%565UA9
- M6EM<75Y?8&%B8V1E9F=H:6IK;&UN;W!Q<G-T=79W>'EZ>WQ]?G^`@8*#A(6&
- MAXB)BHN,C8Z/D)&2DY25EI>8F9J;G)V>GZ"AHJ.DI::GJ*FJJZRMKJ^PL;*S
- MM+6VM[BYNKN\O;Z_P,'"P\3%QL?(R<K+S,W.S]#1TM/4U=;7V-G:V]S=WM_@
- ?X>+CY.7FY^CIZNOL[>[O\/'R\_3U]O?X^?K[_/W^_P``
- ENDOFHEREDOC
- if ($uuencode_byte_chrs eq $uu) {
- print "Yes ";
- }
- $uudecode_byte_chrs = unpack('u', $uuencode_byte_chrs);
- if ($uudecode_byte_chrs eq $all_byte_chrs) {
- print "indeed\n";
- }
Here is a very spartan uudecoder that will work on EBCDIC provided that the @e2a array is filled in appropriately:
- #!/usr/local/bin/perl
- @e2a = ( # this must be filled in
- );
- $_ = <> until ($mode,$file) = /^begin\s*(\d*)\s*(\S*)/;
- open(OUT, "> $file") if $file ne "";
- while(<>) {
- last if /^end/;
- next if /[a-z]/;
- next unless int(((($e2a[ord()] - 32 ) & 077) + 2) / 3) ==
- int(length() / 4);
- print OUT unpack("u", $_);
- }
- close(OUT);
- chmod oct($mode), $file;
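The decoder needs @e2a because the uuencode alphabet is defined in terms of ASCII code points: each 6-bit value is stored as that value plus 32, i.e. as one of the characters from SPACE through underscore. A round-trip cross-check in Python via its standard binascii module:

```python
import binascii

data = b"Cat"
line = binascii.b2a_uu(data)        # one uuencoded line, length byte first
assert line[0:1] == b"#"            # "#" is chr(3 + 32): three payload bytes
assert binascii.a2b_uu(line) == data
```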
On ASCII-encoded platforms it is possible to quoted-printable encode characters outside of the printable set using:
- # This QP encoder works on ASCII only
- $qp_string =~ s/([=\x00-\x1F\x80-\xFF])/sprintf("=%02X",ord($1))/ge;
Whereas a QP encoder that works on both ASCII and EBCDIC platforms would look somewhat like the following (where the EBCDIC branch @e2a array is omitted for brevity):
- if (ord('A') == 65) { # ASCII
- $delete = "\x7F"; # ASCII
- @e2a = (0 .. 255) # ASCII to ASCII identity map
- }
- else { # EBCDIC
- $delete = "\x07"; # EBCDIC
- @e2a = # EBCDIC to ASCII map (as shown above)
- }
- $qp_string =~
- s/([^ !"\#\$%&'()*+,\-.\/0-9:;<>?\@A-Z[\\\]^_`a-z{|}~$delete])/sprintf("=%02X",$e2a[ord($1)])/ge;
(although in production code the substitutions might be done in the EBCDIC branch with the @e2a array and separately in the ASCII branch without the expense of the identity map).
Such QP strings can be decoded with:
- # This QP decoder is limited to ASCII only
- $string =~ s/=([0-9A-Fa-f][0-9A-Fa-f])/chr hex $1/ge;
- $string =~ s/=[\n\r]+$//;
Whereas a QP decoder that works on both ASCII and EBCDIC platforms would look somewhat like the following (where the @a2e array is omitted for brevity):
- $string =~ s/=([0-9A-Fa-f][0-9A-Fa-f])/chr $a2e[hex $1]/ge;
- $string =~ s/=[\n\r]+$//;
The practice of shifting an alphabet one or more characters for encipherment dates back thousands of years and was explicitly detailed by Gaius Julius Caesar in his Gallic Wars text. A single alphabet shift is sometimes referred to as a rotation, and the shift amount is given as a number $n after the string 'rot', as in "rot$n". Rot0 and rot26 designate identity maps on the 26-letter English version of the Latin alphabet. Rot13 has the interesting property that applying it twice yields the identity map (that is, rot13 is its own non-trivial inverse in the group of 26 alphabet rotations). Hence the following is a rot13 encoder and decoder that will work on ASCII and EBCDIC platforms:
- #!/usr/local/bin/perl
- while(<>){
- tr/n-za-mN-ZA-M/a-zA-Z/;
- print;
- }
In one-liner form:
- perl -ne 'tr/n-za-mN-ZA-M/a-zA-Z/;print'
To the extent that code depends on hashing order, hashes stored on an ASCII-based platform may differ from hashes stored on an EBCDIC-based platform.
Internationalization (I18N) and localization (L10N) are supported at least in principle even on EBCDIC platforms. The details are system-dependent and discussed in the OS ISSUES section below.
Perl may work with an internal UTF-EBCDIC encoding form for wide characters on EBCDIC platforms, in a manner analogous to the way it works with the UTF-8 internal encoding form on ASCII-based platforms.
Legacy multi-byte EBCDIC code pages are not covered here.
There may be a few system-dependent issues of concern to EBCDIC Perl programmers.
The PASE environment is a runtime environment for OS/400 that can run executables built for PowerPC AIX in OS/400; see perlos400. PASE is ASCII-based, not EBCDIC-based like the ILE.
Perl runs under UNIX System Services (USS).
chcp is supported as a shell utility for displaying and changing one's code page. See also chcp(1).
For sequential data set access try:
- my @ds_records = `cat //DSNAME`;
or:
- my @ds_records = `cat //'HLQ.DSNAME'`;
See also the OS390::Stdio module on CPAN.
iconv is supported as both a shell utility and a C RTL routine. See also the iconv(1) and iconv(3) manual pages.
On OS/390 or z/OS see locale for information on locales. The L10N files are in /usr/nls/locale. $Config{d_setlocale} is 'define' on OS/390 or z/OS.
This pod document contains literal Latin 1 characters and may encounter
translation difficulties. In particular one popular nroff implementation
was known to strip accented characters to their unaccented counterparts
while attempting to view this document through the pod2man program
(for example, you may see a plain y rather than one with a diaeresis
as in ÿ). Another nroff truncated the resultant manpage at
the first occurrence of 8 bit characters.
Not all shells will allow multiple -e
string arguments to perl to
be concatenated together properly as recipes 0, 2, 4, 5, and 6 might
seem to imply.
perllocale, perlfunc, perlunicode, utf8.
http://anubis.dkuug.dk/i18n/charmaps
http://www.unicode.org/unicode/reports/tr16/
http://www.wps.com/projects/codes/ ASCII: American Standard Code for Information Infiltration Tom Jennings, September 1999.
The Unicode Standard, Version 3.0 The Unicode Consortium, Lisa Moore ed., ISBN 0-201-61633-5, Addison Wesley Developers Press, February 2000.
CDRA: IBM - Character Data Representation Architecture - Reference and Registry, IBM SC09-2190-00, December 1996.
"Demystifying Character Sets", Andrea Vine, Multilingual Computing & Technology, #26 Vol. 10 Issue 4, August/September 1999; ISSN 1523-0309; Multilingual Computing Inc. Sandpoint ID, USA.
Codes, Ciphers, and Other Cryptic and Clandestine Communication Fred B. Wrixon, ISBN 1-57912-040-7, Black Dog & Leventhal Publishers, 1998.
http://www.bobbemer.com/P-BIT.HTM IBM - EBCDIC and the P-bit; The biggest Computer Goof Ever Robert Bemer.
15 April 2001: added UTF-8 and UTF-EBCDIC to main table, pvhp.
Peter Prymmer pvhp@best.com wrote this in 1999 and 2000 with CCSID 0819 and 0037 help from Chris Leach and André Pirard A.Pirard@ulg.ac.be as well as POSIX-BC help from Thomas Dorner Thomas.Dorner@start.de. Thanks also to Vickie Cooper, Philip Newton, William Raffloer, and Joe Smith. Trademarks, registered trademarks, service marks and registered service marks used in this document are the property of their respective owners.
perlembed - how to embed perl in your C program
Do you want to:
Use a Unix program from Perl? Read about back-quotes and about system and exec in perlfunc.
Use C from C? Rethink your design.
Use Perl from C? Read on...
Compiling your C program
Adding a Perl interpreter to your C program
Calling a Perl subroutine from your C program
Evaluating a Perl statement from your C program
Performing Perl pattern matches and substitutions from your C program
Fiddling with the Perl stack from your C program
Maintaining a persistent interpreter
Maintaining multiple interpreter instances
Using Perl modules, which themselves use C libraries, from your C program
Embedding Perl under Win32
If you have trouble compiling the scripts in this documentation, you're not alone. The cardinal rule: COMPILE THE PROGRAMS IN EXACTLY THE SAME WAY THAT YOUR PERL WAS COMPILED. (Sorry for yelling.)
Also, every C program that uses Perl must link in the perl library. What's that, you ask? Perl is itself written in C; the perl library is the collection of compiled C programs that were used to create your perl executable (/usr/bin/perl or equivalent). (Corollary: you can't use Perl from your C program unless Perl has been compiled on your machine, or installed properly--that's why you shouldn't blithely copy Perl executables from machine to machine without also copying the lib directory.)
When you use Perl from C, your C program will--usually--allocate, "run", and deallocate a PerlInterpreter object, which is defined by the perl library.
If your copy of Perl is recent enough to contain this documentation (version 5.002 or later), then the perl library (and EXTERN.h and perl.h, which you'll also need) will reside in a directory that looks like this:
- /usr/local/lib/perl5/your_architecture_here/CORE
or perhaps just
- /usr/local/lib/perl5/CORE
or maybe something like
- /usr/opt/perl5/CORE
Execute this statement for a hint about where to find CORE:
- perl -MConfig -e 'print $Config{archlib}'
Here's how you'd compile the example in the next section, Adding a Perl interpreter to your C program, on my Linux box:
- % gcc -O2 -Dbool=char -DHAS_BOOL -I/usr/local/include
- -I/usr/local/lib/perl5/i586-linux/5.003/CORE
- -L/usr/local/lib/perl5/i586-linux/5.003/CORE
- -o interp interp.c -lperl -lm
(That's all one line.) On my DEC Alpha running old 5.003_05, the incantation is a bit different:
- % cc -O2 -Olimit 2900 -DSTANDARD_C -I/usr/local/include
- -I/usr/local/lib/perl5/alpha-dec_osf/5.00305/CORE
- -L/usr/local/lib/perl5/alpha-dec_osf/5.00305/CORE -L/usr/local/lib
- -D__LANGUAGE_C__ -D_NO_PROTO -o interp interp.c -lperl -lm
How can you figure out what to add? Assuming your Perl is post-5.001,
execute a perl -V
command and pay special attention to the "cc" and
"ccflags" information.
You'll have to choose the appropriate compiler (cc, gcc, et al.) for
your machine: perl -MConfig -e 'print $Config{cc}'
will tell you what
to use.
You'll also have to choose the appropriate library directory (/usr/local/lib/...) for your machine. If your compiler complains that certain functions are undefined, or that it can't locate -lperl, then you need to change the path following the -L. If it complains that it can't find EXTERN.h and perl.h, you need to change the path following the -I.
You may have to add extra libraries as well. Which ones? Perhaps those printed by
- perl -MConfig -e 'print $Config{libs}'
Provided your perl binary was properly configured and installed, the ExtUtils::Embed module will determine all of this information for you:
- % cc -o interp interp.c `perl -MExtUtils::Embed -e ccopts -e ldopts`
If the ExtUtils::Embed module isn't part of your Perl distribution, you can retrieve it from http://www.perl.com/perl/CPAN/modules/by-module/ExtUtils/. (If this documentation came from your Perl distribution, then you're running 5.004 or better and you already have it.)
The ExtUtils::Embed kit on CPAN also contains all source code for the examples in this document, tests, additional examples and other information you may find useful.
In a sense, perl (the C program) is a good example of embedding Perl (the language), so I'll demonstrate embedding with miniperlmain.c, included in the source distribution. Here's a bastardized, non-portable version of miniperlmain.c containing the essentials of embedding:
- #include <EXTERN.h> /* from the Perl distribution */
- #include <perl.h> /* from the Perl distribution */
- static PerlInterpreter *my_perl; /*** The Perl interpreter ***/
- int main(int argc, char **argv, char **env)
- {
- PERL_SYS_INIT3(&argc,&argv,&env);
- my_perl = perl_alloc();
- perl_construct(my_perl);
- PL_exit_flags |= PERL_EXIT_DESTRUCT_END;
- perl_parse(my_perl, NULL, argc, argv, (char **)NULL);
- perl_run(my_perl);
- perl_destruct(my_perl);
- perl_free(my_perl);
- PERL_SYS_TERM();
- }
Notice that we don't use the env pointer. Normally handed to perl_parse as its final argument, env here is replaced by NULL, which means that the current environment will be used.
The macros PERL_SYS_INIT3() and PERL_SYS_TERM() provide system-specific tune up of the C runtime environment necessary to run Perl interpreters; they should only be called once regardless of how many interpreters you create or destroy. Call PERL_SYS_INIT3() before you create your first interpreter, and PERL_SYS_TERM() after you free your last interpreter.
Since PERL_SYS_INIT3() may change env
, it may be more appropriate to
provide env
as an argument to perl_parse().
Also notice that no matter what arguments you pass to perl_parse(), PERL_SYS_INIT3() must be invoked on the C main()'s argc, argv, and env, and only once.
Now compile this program (I'll call it interp.c) into an executable:
- % cc -o interp interp.c `perl -MExtUtils::Embed -e ccopts -e ldopts`
After a successful compilation, you'll be able to use interp just like perl itself:
- % interp
- print "Pretty Good Perl \n";
- print "10890 - 9801 is ", 10890 - 9801;
- <CTRL-D>
- Pretty Good Perl
- 10890 - 9801 is 1089
or
- % interp -e 'printf("%x", 3735928559)'
- deadbeef
You can also read and execute Perl statements from a file while in the midst of your C program, by placing the filename in argv[1] before calling perl_run.
To call individual Perl subroutines, you can use any of the call_*
functions documented in perlcall.
In this example we'll use call_argv
.
That's shown below, in a program I'll call showtime.c.
- #include <EXTERN.h>
- #include <perl.h>
- static PerlInterpreter *my_perl;
- int main(int argc, char **argv, char **env)
- {
- char *args[] = { NULL };
- PERL_SYS_INIT3(&argc,&argv,&env);
- my_perl = perl_alloc();
- perl_construct(my_perl);
- perl_parse(my_perl, NULL, argc, argv, NULL);
- PL_exit_flags |= PERL_EXIT_DESTRUCT_END;
- /*** skipping perl_run() ***/
- call_argv("showtime", G_DISCARD | G_NOARGS, args);
- perl_destruct(my_perl);
- perl_free(my_perl);
- PERL_SYS_TERM();
- }
where showtime is a Perl subroutine that takes no arguments (that's the G_NOARGS) and for which I'll ignore the return value (that's the G_DISCARD). Those flags, and others, are discussed in perlcall.
I'll define the showtime subroutine in a file called showtime.pl:
- print "I shan't be printed.";
- sub showtime {
- print time;
- }
Simple enough. Now compile and run:
- % cc -o showtime showtime.c `perl -MExtUtils::Embed -e ccopts -e ldopts`
- % showtime showtime.pl
- 818284590
yielding the number of seconds that elapsed between January 1, 1970 (the beginning of the Unix epoch), and the moment I began writing this sentence.
In this particular case we don't have to call perl_run, as we set the PERL_EXIT_DESTRUCT_END flag in PL_exit_flags, which makes perl_destruct execute END blocks.
If you want to pass arguments to the Perl subroutine, you can add
strings to the NULL
-terminated args
list passed to
call_argv. For other data types, or to examine return values,
you'll need to manipulate the Perl stack. That's demonstrated in
Fiddling with the Perl stack from your C program.
Perl provides two API functions to evaluate pieces of Perl code: eval_sv and eval_pv, both documented in perlapi.
Arguably, these are the only routines you'll ever need to execute snippets of Perl code from within your C program. Your code can be as long as you wish; it can contain multiple statements; it can employ use, require, and do to include external Perl files.
eval_pv lets us evaluate individual Perl strings, and then
extract variables for coercion into C types. The following program,
string.c, executes three Perl strings, extracting an int from
the first, a float
from the second, and a char *
from the third.
- #include <EXTERN.h>
- #include <perl.h>
- static PerlInterpreter *my_perl;
- int main(int argc, char **argv, char **env)
- {
- char *embedding[] = { "", "-e", "0" };
- PERL_SYS_INIT3(&argc,&argv,&env);
- my_perl = perl_alloc();
- perl_construct( my_perl );
- perl_parse(my_perl, NULL, 3, embedding, NULL);
- PL_exit_flags |= PERL_EXIT_DESTRUCT_END;
- perl_run(my_perl);
- /** Treat $a as an integer **/
- eval_pv("$a = 3; $a **= 2", TRUE);
- printf("a = %d\n", SvIV(get_sv("a", 0)));
- /** Treat $a as a float **/
- eval_pv("$a = 3.14; $a **= 2", TRUE);
- printf("a = %f\n", SvNV(get_sv("a", 0)));
- /** Treat $a as a string **/
- eval_pv("$a = 'rekcaH lreP rehtonA tsuJ'; $a = reverse($a);", TRUE);
- printf("a = %s\n", SvPV_nolen(get_sv("a", 0)));
- perl_destruct(my_perl);
- perl_free(my_perl);
- PERL_SYS_TERM();
- }
All of those strange functions with sv in their names help convert Perl scalars to C types. They're described in perlguts and perlapi.
If you compile and run string.c, you'll see the results of using SvIV() to create an int, SvNV() to create a float, and SvPV() to create a string:
- a = 9
- a = 9.859600
- a = Just Another Perl Hacker
In the example above, we've created a global variable to temporarily store the computed value of our eval'ed expression. It is also possible and in most cases a better strategy to fetch the return value from eval_pv() instead. Example:
- ...
- SV *val = eval_pv("reverse 'rekcaH lreP rehtonA tsuJ'", TRUE);
- printf("%s\n", SvPV_nolen(val));
- ...
This way, we avoid namespace pollution by not creating global variables and we've simplified our code as well.
The eval_sv() function lets us evaluate strings of Perl code, so we can define some functions that use it to "specialize" in matches and substitutions: match(), substitute(), and matches().
- I32 match(SV *string, char *pattern);
Given a string and a pattern (e.g., m/clasp/ or /\b\w*\b/, which in your C program might appear as "/\\b\\w*\\b/"), match() returns 1 if the string matches the pattern and 0 otherwise.
- int substitute(SV **string, char *pattern);
Given a pointer to an SV and an =~ operation (e.g., s/bob/robert/g or tr[A-Z][a-z]), substitute() modifies the string within the SV according to the operation, returning the number of substitutions made.
- int matches(SV *string, char *pattern, AV **matches);
Given an SV, a pattern, and a pointer to an empty AV, matches() evaluates $string =~ $pattern in list context, and fills in matches with the array elements, returning the number of matches found.
Here's a sample program, match.c, that uses all three (long lines have been wrapped here):
- #include <EXTERN.h>
- #include <perl.h>
- static PerlInterpreter *my_perl;
- /** my_eval_sv(code, error_check)
- ** kinda like eval_sv(),
- ** but we pop the return value off the stack
- **/
- SV* my_eval_sv(SV *sv, I32 croak_on_error)
- {
- dSP;
- SV* retval;
- PUSHMARK(SP);
- eval_sv(sv, G_SCALAR);
- SPAGAIN;
- retval = POPs;
- PUTBACK;
- if (croak_on_error && SvTRUE(ERRSV))
- croak(SvPVx_nolen(ERRSV));
- return retval;
- }
- /** match(string, pattern)
- **
- ** Used for matches in a scalar context.
- **
- ** Returns 1 if the match was successful; 0 otherwise.
- **/
- I32 match(SV *string, char *pattern)
- {
- SV *command = newSV(0), *retval;
- sv_setpvf(command, "my $string = '%s'; $string =~ %s",
- SvPV_nolen(string), pattern);
- retval = my_eval_sv(command, TRUE);
- SvREFCNT_dec(command);
- return SvIV(retval);
- }
- /** substitute(string, pattern)
- **
- ** Used for =~ operations that modify their left-hand side (s/// and tr///)
- **
- ** Returns the number of successful matches, and
- ** modifies the input string if there were any.
- **/
- I32 substitute(SV **string, char *pattern)
- {
- SV *command = newSV(0), *retval;
- sv_setpvf(command, "$string = '%s'; ($string =~ %s)",
- SvPV_nolen(*string), pattern);
- retval = my_eval_sv(command, TRUE);
- SvREFCNT_dec(command);
- *string = get_sv("string", 0);
- return SvIV(retval);
- }
- /** matches(string, pattern, matches)
- **
- ** Used for matches in a list context.
- **
- ** Returns the number of matches,
- ** and fills in **matches with the matching substrings
- **/
- I32 matches(SV *string, char *pattern, AV **match_list)
- {
- SV *command = newSV(0);
- I32 num_matches;
- sv_setpvf(command, "my $string = '%s'; @array = ($string =~ %s)",
- SvPV_nolen(string), pattern);
- my_eval_sv(command, TRUE);
- SvREFCNT_dec(command);
- *match_list = get_av("array", 0);
- num_matches = av_top_index(*match_list) + 1;
- return num_matches;
- }
- int main(int argc, char **argv, char **env)
- {
- char *embedding[] = { "", "-e", "0" };
- AV *match_list;
- I32 num_matches, i;
- SV *text;
- PERL_SYS_INIT3(&argc,&argv,&env);
- my_perl = perl_alloc();
- perl_construct(my_perl);
- perl_parse(my_perl, NULL, 3, embedding, NULL);
- PL_exit_flags |= PERL_EXIT_DESTRUCT_END;
- text = newSV(0);
- sv_setpv(text, "When he is at a convenience store and the "
- "bill comes to some amount like 76 cents, Maynard is "
- "aware that there is something he *should* do, something "
- "that will enable him to get back a quarter, but he has "
- "no idea *what*. He fumbles through his red squeezey "
- "changepurse and gives the boy three extra pennies with "
- "his dollar, hoping that he might luck into the correct "
- "amount. The boy gives him back two of his own pennies "
- "and then the big shiny quarter that is his prize. "
- "-RICHH");
- if (match(text, "m/quarter/")) /** Does text contain 'quarter'? **/
- printf("match: Text contains the word 'quarter'.\n\n");
- else
- printf("match: Text doesn't contain the word 'quarter'.\n\n");
- if (match(text, "m/eighth/")) /** Does text contain 'eighth'? **/
- printf("match: Text contains the word 'eighth'.\n\n");
- else
- printf("match: Text doesn't contain the word 'eighth'.\n\n");
- /** Match all occurrences of /wi../ **/
- num_matches = matches(text, "m/(wi..)/g", &match_list);
- printf("matches: m/(wi..)/g found %d matches...\n", num_matches);
- for (i = 0; i < num_matches; i++)
- printf("match: %s\n", SvPV_nolen(*av_fetch(match_list, i, FALSE)));
- printf("\n");
- /** Remove all vowels from text **/
- num_matches = substitute(&text, "s/[aeiou]//gi");
- if (num_matches) {
- printf("substitute: s/[aeiou]//gi...%d substitutions made.\n",
- num_matches);
- printf("Now text is: %s\n\n", SvPV_nolen(text));
- }
- /** Attempt a substitution **/
- if (!substitute(&text, "s/Perl/C/")) {
- printf("substitute: s/Perl/C...No substitution made.\n\n");
- }
- SvREFCNT_dec(text);
- PL_perl_destruct_level = 1;
- perl_destruct(my_perl);
- perl_free(my_perl);
- PERL_SYS_TERM();
- }
which produces the output (again, long lines have been wrapped here)
- match: Text contains the word 'quarter'.
- match: Text doesn't contain the word 'eighth'.
- matches: m/(wi..)/g found 2 matches...
- match: will
- match: with
- substitute: s/[aeiou]//gi...139 substitutions made.
- Now text is: Whn h s t cnvnnc str nd th bll cms t sm mnt lk 76 cnts,
- Mynrd s wr tht thr s smthng h *shld* d, smthng tht wll nbl hm t gt bck
- qrtr, bt h hs n d *wht*. H fmbls thrgh hs rd sqzy chngprs nd gvs th by
- thr xtr pnns wth hs dllr, hpng tht h mght lck nt th crrct mnt. Th by gvs
- hm bck tw f hs wn pnns nd thn th bg shny qrtr tht s hs prz. -RCHH
- substitute: s/Perl/C...No substitution made.
When trying to explain stacks, most computer science textbooks mumble something about spring-loaded columns of cafeteria plates: the last thing you pushed on the stack is the first thing you pop off. That'll do for our purposes: your C program will push some arguments onto "the Perl stack", shut its eyes while some magic happens, and then pop the results--the return value of your Perl subroutine--off the stack.
First you'll need to know how to convert between C types and Perl types, with newSViv() and sv_setnv() and newAV() and all their friends. They're described in perlguts and perlapi.
Then you'll need to know how to manipulate the Perl stack. That's described in perlcall.
Once you've understood those, embedding Perl in C is easy.
Because C has no built-in function for integer exponentiation, let's make Perl's ** operator available to it (this is less useful than it sounds, because Perl implements ** with C's pow() function). First I'll create a stub exponentiation function in power.pl:
- sub expo {
- my ($a, $b) = @_;
- return $a ** $b;
- }
Now I'll create a C program, power.c, with a function PerlPower() that contains all the perlguts necessary to push the two arguments into expo() and to pop the return value out. Take a deep breath...
- #include <EXTERN.h>
- #include <perl.h>
- static PerlInterpreter *my_perl;
- static void
- PerlPower(int a, int b)
- {
- dSP; /* initialize stack pointer */
- ENTER; /* everything created after here */
- SAVETMPS; /* ...is a temporary variable. */
- PUSHMARK(SP); /* remember the stack pointer */
- XPUSHs(sv_2mortal(newSViv(a))); /* push the base onto the stack */
- XPUSHs(sv_2mortal(newSViv(b))); /* push the exponent onto stack */
- PUTBACK; /* make local stack pointer global */
- call_pv("expo", G_SCALAR); /* call the function */
- SPAGAIN; /* refresh stack pointer */
- /* pop the return value from stack */
- printf ("%d to the %dth power is %d.\n", a, b, POPi);
- PUTBACK;
- FREETMPS; /* free that return value */
- LEAVE; /* ...and the XPUSHed "mortal" args.*/
- }
- int main (int argc, char **argv, char **env)
- {
- char *my_argv[] = { "", "power.pl" };
- PERL_SYS_INIT3(&argc,&argv,&env);
- my_perl = perl_alloc();
- perl_construct( my_perl );
- perl_parse(my_perl, NULL, 2, my_argv, (char **)NULL);
- PL_exit_flags |= PERL_EXIT_DESTRUCT_END;
- perl_run(my_perl);
- PerlPower(3, 4); /*** Compute 3 ** 4 ***/
- perl_destruct(my_perl);
- perl_free(my_perl);
- PERL_SYS_TERM();
- }
Compile and run:
- % cc -o power power.c `perl -MExtUtils::Embed -e ccopts -e ldopts`
- % power
- 3 to the 4th power is 81.
When developing interactive and/or potentially long-running applications, it's a good idea to maintain a persistent interpreter rather than allocating and constructing a new interpreter multiple times. The major reason is speed, since Perl will only be loaded into memory once.
However, you have to be more cautious with namespace and variable
scoping when using a persistent interpreter. In previous examples
we've been using global variables in the default package main
. We
knew exactly what code would be run, and assumed we could avoid
variable collisions and outrageous symbol table growth.
Let's say your application is a server that will occasionally run Perl code from some arbitrary file. Your server has no way of knowing what code it's going to run. Very dangerous.
If the file is pulled in by perl_parse()
, compiled into a newly
constructed interpreter, and subsequently cleaned out with
perl_destruct()
afterwards, you're shielded from most namespace
troubles.
One way to avoid namespace collisions in this scenario is to translate
the filename into a guaranteed-unique package name, and then compile
the code into that package using eval. In the example
below, each file will only be compiled once. Or, the application
might choose to clean out the symbol table associated with the file
after it's no longer needed. Using call_argv (see perlapi), we'll
call the subroutine Embed::Persistent::eval_file
which lives in the
file persistent.pl
and pass the filename and boolean cleanup/cache
flag as arguments.
Note that the process will continue to grow for each file that it
uses. In addition, there might be AUTOLOAD
ed subroutines and other
conditions that cause Perl's symbol table to grow. You might want to
add some logic that keeps track of the process size, or restarts
itself after a certain number of requests, to ensure that memory
consumption is minimized. You'll also want to scope your variables
with my whenever possible.
- package Embed::Persistent;
- #persistent.pl
- use strict;
- our %Cache;
- use Symbol qw(delete_package);
- sub valid_package_name {
- my($string) = @_;
- $string =~ s/([^A-Za-z0-9\/])/sprintf("_%2x",unpack("C",$1))/eg;
- # second pass only for words starting with a digit
- $string =~ s|/(\d)|sprintf("/_%2x",unpack("C",$1))|eg;
- # Dress it up as a real package name
- $string =~ s|/|::|g;
- return "Embed" . $string;
- }
- sub eval_file {
- my($filename, $delete) = @_;
- my $package = valid_package_name($filename);
- my $mtime = -M $filename;
- if(defined $Cache{$package}{mtime}
- &&
- $Cache{$package}{mtime} <= $mtime)
- {
- # we have compiled this subroutine already,
- # it has not been updated on disk, nothing left to do
- print STDERR "already compiled $package->handler\n";
- }
- else {
- local *FH;
- open FH, $filename or die "open '$filename' $!";
- local($/) = undef;
- my $sub = <FH>;
- close FH;
- #wrap the code into a subroutine inside our unique package
- my $eval = qq{package $package; sub handler { $sub; }};
- {
- # hide our variables within this block
- my($filename,$mtime,$package,$sub);
- eval $eval;
- }
- die $@ if $@;
- #cache it unless we're cleaning out each time
- $Cache{$package}{mtime} = $mtime unless $delete;
- }
- eval {$package->handler;};
- die $@ if $@;
- delete_package($package) if $delete;
- #take a look if you want
- #print Devel::Symdump->rnew($package)->as_string, $/;
- }
- 1;
- __END__
- /* persistent.c */
- #include <EXTERN.h>
- #include <perl.h>
- /* "1" = clean out filename's symbol table after each request, "0" = don't */
- #ifndef DO_CLEAN
- #define DO_CLEAN "0"
- #endif
- #define BUFFER_SIZE 1024
- static PerlInterpreter *my_perl = NULL;
- int
- main(int argc, char **argv, char **env)
- {
- char *embedding[] = { "", "persistent.pl" };
- char *args[] = { "", DO_CLEAN, NULL };
- char filename[BUFFER_SIZE];
- int exitstatus = 0;
- PERL_SYS_INIT3(&argc,&argv,&env);
- if((my_perl = perl_alloc()) == NULL) {
- fprintf(stderr, "no memory!");
- exit(1);
- }
- perl_construct(my_perl);
- PL_origalen = 1; /* don't let $0 assignment update the proctitle or embedding[0] */
- exitstatus = perl_parse(my_perl, NULL, 2, embedding, NULL);
- PL_exit_flags |= PERL_EXIT_DESTRUCT_END;
- if(!exitstatus) {
- exitstatus = perl_run(my_perl);
- while(printf("Enter file name: ") &&
- fgets(filename, BUFFER_SIZE, stdin)) {
- filename[strlen(filename)-1] = '\0'; /* strip \n */
- /* call the subroutine, passing it the filename as an argument */
- args[0] = filename;
- call_argv("Embed::Persistent::eval_file",
- G_DISCARD | G_EVAL, args);
- /* check $@ */
- if(SvTRUE(ERRSV))
- fprintf(stderr, "eval error: %s\n", SvPV_nolen(ERRSV));
- }
- }
- PL_perl_destruct_level = 0;
- perl_destruct(my_perl);
- perl_free(my_perl);
- PERL_SYS_TERM();
- exit(exitstatus);
- }
Now compile:
- % cc -o persistent persistent.c `perl -MExtUtils::Embed -e ccopts -e ldopts`
Here's an example script file, test.pl:
- my $string = "hello";
- foo($string);
- sub foo {
- print "foo says: @_\n";
- }
Now run:
- % persistent
- Enter file name: test.pl
- foo says: hello
- Enter file name: test.pl
- already compiled Embed::test_2epl->handler
- foo says: hello
- Enter file name: ^C
Traditionally END blocks have been executed at the end of perl_run. This causes problems for applications that never call perl_run. Since perl 5.7.2 you can specify PL_exit_flags |= PERL_EXIT_DESTRUCT_END to get the new behaviour. This also enables the running of END blocks if perl_parse fails, and perl_destruct will return the exit value.
When a perl script assigns a value to $0 then the perl runtime will
try to make this value show up as the program name reported by "ps" by
updating the memory pointed to by the argv passed to perl_parse() and
also calling API functions like setproctitle() where available. This
behaviour might not be appropriate when embedding perl and can be
disabled by assigning the value 1
to the variable PL_origalen
before perl_parse() is called.
The persistent.c example above is for instance likely to segfault when $0 is assigned to if the PL_origalen = 1; assignment is removed. This is because perl will try to write to the read-only memory of the embedding[] strings.
Some rare applications will need to create more than one interpreter during a session. Such an application might sporadically decide to release any resources associated with the interpreter.
The program must take care to ensure that this takes place before
the next interpreter is constructed. By default, when perl is not
built with any special options, the global variable
PL_perl_destruct_level
is set to 0
, since extra cleaning isn't
usually needed when a program only ever creates a single interpreter
in its entire lifetime.
Setting PL_perl_destruct_level
to 1
makes everything squeaky clean:
- while(1) {
- ...
- /* reset global variables here with PL_perl_destruct_level = 1 */
- PL_perl_destruct_level = 1;
- perl_construct(my_perl);
- ...
- /* clean and reset _everything_ during perl_destruct */
- PL_perl_destruct_level = 1;
- perl_destruct(my_perl);
- perl_free(my_perl);
- ...
- /* let's go do it again! */
- }
When perl_destruct() is called, the interpreter's syntax parse tree
and symbol tables are cleaned up, and global variables are reset. The
second assignment to PL_perl_destruct_level
is needed because
perl_construct resets it to 0
.
Now suppose we have more than one interpreter instance running at the
same time. This is feasible, but only if you used the Configure option
-Dusemultiplicity
or the options -Dusethreads -Duseithreads
when
building perl. By default, enabling one of these Configure options
sets the per-interpreter global variable PL_perl_destruct_level
to
1
, so that thorough cleaning is automatic and interpreter variables
are initialized correctly. Even if you don't intend to run two or
more interpreters at the same time, but to run them sequentially, like
in the above example, it is recommended to build perl with the
-Dusemultiplicity
option otherwise some interpreter variables may
not be initialized correctly between consecutive runs and your
application may crash.
See also Thread-aware system interfaces in perlxs.
Using -Dusethreads -Duseithreads
rather than -Dusemultiplicity
is more appropriate if you intend to run multiple interpreters
concurrently in different threads, because it enables support for
linking in the thread libraries of your system with the interpreter.
Let's give it a try:
- #include <EXTERN.h>
- #include <perl.h>
- /* we're going to embed two interpreters */
- #define SAY_HELLO "-e", "print qq(Hi, I'm $^X\n)"
- int main(int argc, char **argv, char **env)
- {
- PerlInterpreter *one_perl, *two_perl;
- char *one_args[] = { "one_perl", SAY_HELLO };
- char *two_args[] = { "two_perl", SAY_HELLO };
- PERL_SYS_INIT3(&argc,&argv,&env);
- one_perl = perl_alloc();
- two_perl = perl_alloc();
- PERL_SET_CONTEXT(one_perl);
- perl_construct(one_perl);
- PERL_SET_CONTEXT(two_perl);
- perl_construct(two_perl);
- PERL_SET_CONTEXT(one_perl);
- perl_parse(one_perl, NULL, 3, one_args, (char **)NULL);
- PERL_SET_CONTEXT(two_perl);
- perl_parse(two_perl, NULL, 3, two_args, (char **)NULL);
- PERL_SET_CONTEXT(one_perl);
- perl_run(one_perl);
- PERL_SET_CONTEXT(two_perl);
- perl_run(two_perl);
- PERL_SET_CONTEXT(one_perl);
- perl_destruct(one_perl);
- PERL_SET_CONTEXT(two_perl);
- perl_destruct(two_perl);
- PERL_SET_CONTEXT(one_perl);
- perl_free(one_perl);
- PERL_SET_CONTEXT(two_perl);
- perl_free(two_perl);
- PERL_SYS_TERM();
- }
Note the calls to PERL_SET_CONTEXT(). These are necessary to initialize the global state that tracks which interpreter is the "current" one on the particular process or thread that may be running it. It should always be used if you have more than one interpreter and are making perl API calls on both interpreters in an interleaved fashion.
PERL_SET_CONTEXT(interp) should also be called whenever interp is used by a thread that did not create it (using either perl_alloc(), or the more esoteric perl_clone()).
Compile as usual:
- % cc -o multiplicity multiplicity.c `perl -MExtUtils::Embed -e ccopts -e ldopts`
Run it:
- % multiplicity
- Hi, I'm one_perl
- Hi, I'm two_perl
If you've played with the examples above and tried to embed a script that use()s a Perl module (such as Socket) which itself uses a C or C++ library, this probably happened:
- Can't load module Socket, dynamic loading not available in this perl.
- (You may need to build a new perl executable which either supports
- dynamic loading or has the Socket module statically linked into it.)
What's wrong?
Your interpreter doesn't know how to communicate with these extensions on its own. A little glue will help. Up until now you've been calling perl_parse(), handing it NULL for the second argument:
- perl_parse(my_perl, NULL, argc, my_argv, NULL);
That's where the glue code can be inserted to create the initial contact between Perl and linked C/C++ routines. Let's take a look at some pieces of perlmain.c to see how Perl does this:
- static void xs_init (pTHX);
- EXTERN_C void boot_DynaLoader (pTHX_ CV* cv);
- EXTERN_C void boot_Socket (pTHX_ CV* cv);
- EXTERN_C void
- xs_init(pTHX)
- {
- char *file = __FILE__;
- /* DynaLoader is a special case */
- newXS("DynaLoader::boot_DynaLoader", boot_DynaLoader, file);
- newXS("Socket::bootstrap", boot_Socket, file);
- }
Simply put: for each extension linked with your Perl executable (determined during its initial configuration on your computer or when adding a new extension), a Perl subroutine is created to incorporate the extension's routines. Normally, that subroutine is named Module::bootstrap() and is invoked when you say use Module. In turn, this hooks into an XSUB, boot_Module, which creates a Perl counterpart for each of the extension's XSUBs. Don't worry about this part; leave that to the xsubpp and extension authors. If your extension is dynamically loaded, DynaLoader creates Module::bootstrap() for you on the fly. In fact, if you have a working DynaLoader then there is rarely any need to link in any other extensions statically.
Once you have this code, slap it into the second argument of perl_parse():
- perl_parse(my_perl, xs_init, argc, my_argv, NULL);
Then compile:
- % cc -o interp interp.c `perl -MExtUtils::Embed -e ccopts -e ldopts`
- % interp
- use Socket;
- use SomeDynamicallyLoadedModule;
- print "Now I can use extensions!\n"
ExtUtils::Embed can also automate writing the xs_init glue code.
- % perl -MExtUtils::Embed -e xsinit -- -o perlxsi.c
- % cc -c perlxsi.c `perl -MExtUtils::Embed -e ccopts`
- % cc -c interp.c `perl -MExtUtils::Embed -e ccopts`
- % cc -o interp perlxsi.o interp.o `perl -MExtUtils::Embed -e ldopts`
Consult perlxs, perlguts, and perlapi for more details.
To completely hide the short forms of the Perl public API, add -DPERL_NO_SHORT_NAMES to the compilation flags. This means that, for example, instead of writing
- warn("%d bottles of beer on the wall", bottlecount);
you will have to write the explicit full form
- Perl_warn(aTHX_ "%d bottles of beer on the wall", bottlecount);
(See Background and PERL_IMPLICIT_CONTEXT in perlguts for the explanation of the aTHX_.) Hiding the short forms is very useful for avoiding all sorts of nasty (C preprocessor or otherwise) conflicts with other software packages (Perl defines about 2400 APIs with these short names, give or take a few hundred, so there certainly is room for conflict).
You can sometimes write faster code in C, but you can always write code faster in Perl. Because you can use each from the other, combine them as you wish.
Jon Orwant <orwant@media.mit.edu> and Doug MacEachern <dougm@covalent.net>, with small contributions from Tim Bunce, Tom Christiansen, Guy Decoux, Hallvard Furuseth, Dov Grobgeld, and Ilya Zakharevich.
Doug MacEachern has an article on embedding in Volume 1, Issue 4 of The Perl Journal ( http://www.tpj.com/ ). Doug is also the developer of the most widely-used Perl embedding: the mod_perl system (perl.apache.org), which embeds Perl in the Apache web server. Oracle, Binary Evolution, ActiveState, and Ben Sugars's nsapi_perl have used this model for Oracle, Netscape and Internet Information Server Perl plugins.
Copyright (C) 1995, 1996, 1997, 1998 Doug MacEachern and Jon Orwant. All Rights Reserved.
This document may be distributed under the same terms as Perl itself.
perlexperiment - A listing of experimental features in Perl
This document lists the current and past experimental features in the perl core. Although all of these are documented with their appropriate topics, this succinct listing gives you an overview and basic facts about their status.
So far we've merely tried to find and list the experimental features and infer their inception, versions, etc. There's a lot of speculation here.
Introduced in Perl 5.6.0
Introduced in Perl 5.7.0
our can now have an experimental optional attribute unique
Introduced in Perl 5.8.0
Deprecated in Perl 5.10.0
Introduced in Perl 5.9.2
See also Socket
See also perlrun
See also perlrun
See also perldsc
See also perlguts
Introduced in Perl 5.13.7
%^H
Introduced in Perl 5.13.7
See also cophh_ in perlapi.
Introduced in Perl 5.18.0
Introduced in Perl 5.16.0
Introduced in Perl 5.16.0
Introduced in Perl 5.6.0
See also perlsub
installhtml target in the Makefile.
(?{code})
See also perlre
(??{ code })
See also perlre
Smart match (~~)
Introduced in Perl 5.10.0
Modified in Perl 5.10.1, 5.12.0
$_
Introduced in Perl 5.10.0
(*ACCEPT)
Introduced in: Perl 5.10
See also perlintern
See also perllinux
See PL_keyword_plugin in perlapi for the mechanism.
Introduced in: Perl 5.11.2
Introduced in Perl 5.14.0
Introduced in: Perl 5.18
See also: Lexical Subroutines in perlsub
Introduced in: Perl 5.18
See also: Extended Bracketed Character Classes in perlrecharclass
These features were so wildly successful and played so well with others that we decided to remove their experimental status and admit them as full, stable features in the world of Perl, lavishing all the benefits and luxuries thereof. They are also awarded +5 Stability and +3 Charisma.
\N regex character class
The \N character class, not to be confused with the named character sequence \N{NAME}, denotes any non-newline character in a regular expression.
Introduced in: Perl 5.12
Introduced in Perl 5.6.1
See also perlfork
Introduced in Perl 5.6.0
See also perldebug, perldebtut
Introduced in Perl 5.6.0
Introduced in Perl 5.6.0
Introduced in Perl 5.005
Introduced in Perl 5.005
These features are no longer considered experimental and their functionality has disappeared. It's your own fault if you wrote production programs using these features after we explicitly told you not to (see perlpolicy).
legacy
The experimental legacy pragma was swallowed by the feature pragma.
Introduced in: 5.11.2
Removed in: 5.11.3
The -A command line switch
Introduced in Perl 5.9.0
Removed in Perl 5.9.5
Moved from Perl 5.10.1 to CPAN
Getopt::Long upgraded to version 2.35
Removed in Perl 5.8.8
Introduced in Perl 5.6.0
Removed in Perl 5.9.0
Introduced in Perl 5.005
Removed in Perl 5.10
Introduced in Perl 5.005
Moved from Perl 5.9.0 to CPAN
brian d foy <brian.d.foy@gmail.com>
Sébastien Aperghis-Tramoni <saper@cpan.org>
Copyright 2010, brian d foy <brian.d.foy@gmail.com>
You can use and redistribute this document under the same terms as Perl itself.
perlfaq - frequently asked questions about Perl
The perlfaq comprises several documents that answer the most commonly asked questions about Perl and Perl programming. It's divided by topic into nine major sections outlined in this document.
The perlfaq is an evolving document. Read the latest version at http://learn.perl.org/faq/. It is also included in the standard Perl distribution.
The perldoc command line tool is part of the standard Perl distribution. To read the perlfaq:
- $ perldoc perlfaq
To search the perlfaq question headings:
- $ perldoc -q open
Review https://github.com/perl-doc-cats/perlfaq/wiki. If you don't find your suggestion there, create an issue or pull request against https://github.com/perl-doc-cats/perlfaq.
Once approved, changes are merged into https://github.com/tpf/perlfaq, the repository which drives http://learn.perl.org/faq/, and they are distributed with the next Perl 5 release.
Try the resources in perlfaq2.
This section of the FAQ answers very general, high-level questions about Perl.
What is Perl?
Who supports Perl? Who develops it? Why is it free?
Which version of Perl should I use?
What are Perl 4, Perl 5, or Perl 6?
What is Perl 6?
How stable is Perl?
Is Perl difficult to learn?
How does Perl compare with other languages like Java, Python, REXX, Scheme, or Tcl?
Can I do [task] in Perl?
When shouldn't I program in Perl?
What's the difference between "perl" and "Perl"?
What is a JAPH?
How can I convince others to use Perl?
This section of the FAQ answers questions about where to find source and documentation for Perl, support, and related matters.
What machines support Perl? Where do I get it?
How can I get a binary version of Perl?
I don't have a C compiler. How can I build my own Perl interpreter?
I copied the Perl binary from one machine to another, but scripts don't work.
I grabbed the sources and tried to compile but gdbm/dynamic loading/malloc/linking/... failed. How do I make it work?
What modules and extensions are available for Perl? What is CPAN?
Where can I get information on Perl?
What is perl.com? Perl Mongers? pm.org? perl.org? cpan.org?
Where can I post questions?
Perl Books
Which magazines have Perl content?
Which Perl blogs should I read?
What mailing lists are there for Perl?
Where can I buy a commercial version of Perl?
Where do I send bug reports?
This section of the FAQ answers questions related to programmer tools and programming support.
How do I do (anything)?
How can I use Perl interactively?
How do I find which modules are installed on my system?
How do I debug my Perl programs?
How do I profile my Perl programs?
How do I cross-reference my Perl programs?
Is there a pretty-printer (formatter) for Perl?
Is there an IDE or Windows Perl Editor?
Where can I get Perl macros for vi?
Where can I get perl-mode or cperl-mode for emacs?
How can I use curses with Perl?
How can I write a GUI (X, Tk, Gtk, etc.) in Perl?
How can I make my Perl program run faster?
How can I make my Perl program take less memory?
Is it safe to return a reference to local or lexical data?
How can I free an array or hash so my program shrinks?
How can I make my CGI script more efficient?
How can I hide the source for my Perl program?
How can I compile my Perl program into byte code or C?
How can I get #!perl to work on [MS-DOS,NT,...]?
Can I write useful Perl programs on the command line?
Why don't Perl one-liners work on my DOS/Mac/VMS system?
Where can I learn about CGI or Web programming in Perl?
Where can I learn about object-oriented Perl programming?
Where can I learn about linking C with Perl?
I've read perlembed, perlguts, etc., but I can't embed perl in my C program; what am I doing wrong?
When I tried to run my script, I got this message. What does it mean?
What's MakeMaker?
This section of the FAQ answers questions related to manipulating numbers, dates, strings, arrays, hashes, and miscellaneous data issues.
Why am I getting long decimals (eg, 19.9499999999999) instead of the numbers I should be getting (eg, 19.95)?
Why is int() broken?
Why isn't my octal data interpreted correctly?
Does Perl have a round() function? What about ceil() and floor()? Trig functions?
How do I convert between numeric representations/bases/radixes?
Why doesn't & work the way I want it to?
How do I multiply matrices?
How do I perform an operation on a series of integers?
How can I output Roman numerals?
Why aren't my random numbers random?
How do I get a random number between X and Y?
How do I find the day or week of the year?
How do I find the current century or millennium?
How can I compare two dates and find the difference?
How can I take a string and turn it into epoch seconds?
How can I find the Julian Day?
How do I find yesterday's date?
Does Perl have a Year 2000 or 2038 problem? Is Perl Y2K compliant?
How do I validate input?
How do I unescape a string?
How do I remove consecutive pairs of characters?
How do I expand function calls in a string?
How do I find matching/nesting anything?
How do I reverse a string?
How do I expand tabs in a string?
How do I reformat a paragraph?
How can I access or change N characters of a string?
How do I change the Nth occurrence of something?
How can I count the number of occurrences of a substring within a string?
How do I capitalize all the words on one line?
How can I split a [character]-delimited string except when inside [character]?
How do I strip blank space from the beginning/end of a string?
How do I pad a string with blanks or pad a number with zeroes?
How do I extract selected columns from a string?
How do I find the soundex value of a string?
How can I expand variables in text strings?
What's wrong with always quoting "$vars"?
Why don't my <<HERE documents work?
What is the difference between a list and an array?
What is the difference between $array[1] and @array[1]?
How can I remove duplicate elements from a list or array?
How can I tell whether a certain element is contained in a list or array?
How do I compute the difference of two arrays? How do I compute the intersection of two arrays?
How do I test whether two arrays or hashes are equal?
How do I find the first array element for which a condition is true?
How do I handle linked lists?
How do I handle circular lists?
How do I shuffle an array randomly?
How do I process/modify each element of an array?
How do I select a random element from an array?
How do I permute N elements of a list?
How do I sort an array by (anything)?
How do I manipulate arrays of bits?
Why does defined() return true on empty arrays and hashes?
How do I process an entire hash?
How do I merge two hashes?
What happens if I add or remove keys from a hash while iterating over it?
How do I look up a hash element by value?
How can I know how many entries are in a hash?
How do I sort a hash (optionally by value instead of key)?
How can I always keep my hash sorted?
What's the difference between "delete" and "undef" with hashes?
Why don't my tied hashes make the defined/exists distinction?
How do I reset an each() operation part-way through?
How can I get the unique keys from two hashes?
How can I store a multidimensional array in a DBM file?
How can I make my hash remember the order I put elements into it?
Why does passing a subroutine an undefined element in a hash create it?
How can I make the Perl equivalent of a C structure/C++ class/hash or array of hashes or arrays?
How can I use a reference as a hash key?
How can I check if a key exists in a multilevel hash?
How can I prevent addition of unwanted keys into a hash?
How do I handle binary data correctly?
How do I determine whether a scalar is a number/whole/integer/float?
How do I keep persistent data across program calls?
How do I print out or copy a recursive data structure?
How do I define methods for every class/object?
How do I verify a credit card checksum?
How do I pack arrays of doubles or floats for XS code?
This section deals with I/O and the "f" issues: filehandles, flushing, formats, and footers.
How do I flush/unbuffer an output filehandle? Why must I do this?
How do I change, delete, or insert a line in a file, or append to the beginning of a file?
How do I count the number of lines in a file?
How do I delete the last N lines from a file?
How can I use Perl's -i option from within a program?
How can I copy a file?
How do I make a temporary file name?
How can I manipulate fixed-record-length files?
How can I make a filehandle local to a subroutine? How do I pass filehandles between subroutines? How do I make an array of filehandles?
How can I use a filehandle indirectly?
How can I set up a footer format to be used with write()?
How can I write() into a string?
How can I open a filehandle to a string?
How can I output my numbers with commas added?
How can I translate tildes (~) in a filename?
How come when I open a file read-write it wipes it out?
Why do I sometimes get an "Argument list too long" when I use <*>?
How can I open a file with a leading ">" or trailing blanks?
How can I reliably rename a file?
How can I lock a file?
Why can't I just open(FH, ">file.lock")?
I still don't get locking. I just want to increment the number in the file. How can I do this?
All I want to do is append a small amount of text to the end of a file. Do I still have to use locking?
How do I randomly update a binary file?
How do I get a file's timestamp in perl?
How do I set a file's timestamp in perl?
How do I print to more than one file at once?
How can I read in an entire file all at once?
How can I read in a file by paragraphs?
How can I read a single character from a file? From the keyboard?
How can I tell whether there's a character waiting on a filehandle?
How do I do a tail -f in perl?
How do I dup() a filehandle in Perl?
How do I close a file descriptor by number?
Why can't I use "C:\temp\foo" in DOS paths? Why doesn't `C:\temp\foo.exe` work?
Why doesn't glob("*.*") get all the files?
Why does Perl let me delete read-only files? Why does -i clobber protected files? Isn't this a bug in Perl?
How do I select a random line from a file?
Why do I get weird spaces when I print an array of lines?
How do I traverse a directory tree?
How do I delete a directory tree?
How do I copy an entire directory?
This section is surprisingly small because the rest of the FAQ is littered with answers involving regular expressions. For example, decoding a URL and checking whether something is a number can be handled with regular expressions, but those answers are found elsewhere in this document (in perlfaq9 : "How do I decode or create those %-encodings on the web" and perlfaq4 : "How do I determine whether a scalar is a number/whole/integer/float", to be precise).
How can I hope to use regular expressions without creating illegible and unmaintainable code?
I'm having trouble matching over more than one line. What's wrong?
How can I pull out lines between two patterns that are themselves on different lines?
How do I match XML, HTML, or other nasty, ugly things with a regex?
I put a regular expression into $/ but it didn't work. What's wrong?
How do I substitute case-insensitively on the LHS while preserving case on the RHS?
How can I make \w match national character sets?
How can I match a locale-smart version of /[a-zA-Z]/?
How can I quote a variable to use in a regex?
What is /o really for?
How do I use a regular expression to strip C-style comments from a file?
Can I use Perl regular expressions to match balanced text?
What does it mean that regexes are greedy? How can I get around it?
How do I process each word on each line?
How can I print out a word-frequency or line-frequency summary?
How can I do approximate matching?
How do I efficiently match many regular expressions at once?
Why don't word-boundary searches with \b work for me?
Why does using $&, $`, or $' slow my program down?
What good is \G in a regular expression?
Are Perl regexes DFAs or NFAs? Are they POSIX compliant?
What's wrong with using grep in a void context?
How can I match strings with multibyte characters?
How do I match a regular expression that's in a variable?
This section deals with general Perl language issues that don't clearly fit into any of the other sections.
Can I get a BNF/yacc/RE for the Perl language?
What are all these $@%&* punctuation signs, and how do I know when to use them?
Do I always/never have to quote my strings or use semicolons and commas?
How do I skip some return values?
How do I temporarily block warnings?
What's an extension?
Why do Perl operators have different precedence than C operators?
How do I declare/create a structure?
How do I create a module?
How do I adopt or take over a module already on CPAN?
How do I create a class?
How can I tell if a variable is tainted?
What's a closure?
What is variable suicide and how can I prevent it?
How can I pass/return a {Function, FileHandle, Array, Hash, Method, Regex}?
How do I create a static variable?
What's the difference between dynamic and lexical (static) scoping? Between local() and my()?
How can I access a dynamic variable while a similarly named lexical is in scope?
What's the difference between deep and shallow binding?
Why doesn't "my($foo) = <$fh>;" work right?
How do I redefine a builtin function, operator, or method?
What's the difference between calling a function as &foo and foo()?
How do I create a switch or case statement?
How can I catch accesses to undefined variables, functions, or methods?
Why can't a method included in this same file be found?
How can I find out my current or calling package?
How can I comment out a large block of Perl code?
How do I clear a package?
How can I use a variable as a variable name?
What does "bad interpreter" mean?
This section of the Perl FAQ covers questions involving operating system interaction. Topics include interprocess communication (IPC), control over the user-interface (keyboard, screen and pointing devices), and most anything else not related to data manipulation.
How do I find out which operating system I'm running under?
How come exec() doesn't return?
How do I do fancy stuff with the keyboard/screen/mouse?
How do I print something out in color?
How do I read just one key without waiting for a return key?
How do I check whether input is ready on the keyboard?
How do I clear the screen?
How do I get the screen size?
How do I ask the user for a password?
How do I read and write the serial port?
How do I decode encrypted password files?
How do I start a process in the background?
How do I trap control characters/signals?
How do I modify the shadow password file on a Unix system?
How do I set the time and date?
How can I sleep() or alarm() for under a second?
How can I measure time under a second?
How can I do an atexit() or setjmp()/longjmp()? (Exception handling)
Why doesn't my sockets program work under System V (Solaris)? What does the error message "Protocol not supported" mean?
How can I call my system's unique C functions from Perl?
Where do I get the include files to do ioctl() or syscall()?
Why do setuid perl scripts complain about kernel problems?
How can I open a pipe both to and from a command?
Why can't I get the output of a command with system()?
How can I capture STDERR from an external command?
Why doesn't open() return an error when a pipe open fails?
What's wrong with using backticks in a void context?
How can I call backticks without shell processing?
Why can't my script read from STDIN after I gave it EOF (^D on Unix, ^Z on MS-DOS)?
How can I convert my shell script to perl?
Can I use perl to run a telnet or ftp session?
How can I write expect in Perl?
Is there a way to hide perl's command line from programs such as "ps"?
I {changed directory, modified my environment} in a perl script. How come the change disappeared when I exited the script? How do I get my changes to be visible?
How do I close a process's filehandle without waiting for it to complete?
How do I fork a daemon process?
How do I find out if I'm running interactively or not?
How do I timeout a slow event?
How do I set CPU limits?
How do I avoid zombies on a Unix system?
How do I use an SQL database?
How do I make a system() exit on control-C?
How do I open a file without blocking?
How do I tell the difference between errors from the shell and perl?
How do I install a module from CPAN?
What's the difference between require and use?
How do I keep my own module/library directory?
How do I add the directory my program lives in to the module/library search path?
How do I add a directory to my include path (@INC) at runtime?
What is socket.ph and where do I get it?
This section deals with questions related to running web sites, sending and receiving email as well as general networking.
Should I use a web framework?
Which web framework should I use?
What is Plack and PSGI?
How do I remove HTML from a string?
How do I extract URLs?
How do I fetch an HTML file?
How do I automate an HTML form submission?
How do I decode or create those %-encodings on the web?
How do I redirect to another page?
How do I put a password on my web pages?
How do I make sure users can't enter values into a form that causes my CGI script to do bad things?
How do I parse a mail header?
How do I check a valid mail address?
How do I decode a MIME/BASE64 string?
How do I find the user's mail address?
How do I send email?
How do I use MIME to make an attachment to a mail message?
How do I read email?
How do I find out my hostname, domainname, or IP address?
How do I fetch/put an (S)FTP file?
How can I do RPC in Perl?
Tom Christiansen wrote the original perlfaq, then expanded it with the help of Nat Torkington. brian d foy substantially edited and expanded the perlfaq. The perlfaq-workers and others have also supplied feedback, patches and corrections over the years.
Tom Christiansen wrote the original version of this document.
brian d foy <bdfoy@cpan.org> wrote this version. See the individual perlfaq documents for additional copyright information.
This document is available under the same terms as Perl itself. Code examples in all the perlfaq documents are in the public domain. Use them as you see fit (and at your own risk with no warranty from anyone).
perlfaq1 - General Questions About Perl
This section of the FAQ answers very general, high-level questions about Perl.
Perl is a high-level programming language with an eclectic heritage written by Larry Wall and a cast of thousands.
Perl's process, file, and text manipulation facilities make it particularly well-suited for tasks involving quick prototyping, system utilities, software tools, system management tasks, database access, graphical programming, networking, and web programming.
Perl derives from the ubiquitous C programming language and to a lesser extent from sed, awk, the Unix shell, and many other tools and languages.
These strengths make it especially popular with web developers and system administrators. Mathematicians, geneticists, journalists, managers and many other people also use Perl.
The original culture of the pre-populist Internet and the deeply-held beliefs of Perl's author, Larry Wall, gave rise to the free and open distribution policy of Perl. Perl is supported by its users. The core, the standard Perl library, the optional modules, and the documentation you're reading now were all written by volunteers.
The core development team (known as the Perl Porters) are a group of highly altruistic individuals committed to producing better software for free than you could hope to purchase for money. You may snoop on pending developments via the archives or read the faq, or you can subscribe to the mailing list by sending perl5-porters-subscribe@perl.org a subscription request (an empty message with no subject is fine).
While the GNU project includes Perl in its distributions, there's no such thing as "GNU Perl". Perl is not produced nor maintained by the Free Software Foundation. Perl's licensing terms are also more open than GNU software's tend to be.
You can get commercial support of Perl if you wish, although for most users the informal support will more than suffice. See the answer to "Where can I buy a commercial version of Perl?" for more information.
(contributed by brian d foy)
This is largely a matter of opinion and taste, and there isn't any one answer that fits everyone. In general, you want to use either the current stable release, or the stable release immediately prior to that one. Currently, those are perl5.14.x and perl5.12.x, respectively.
Beyond that, you have to consider several things and decide which is best for you.
If things aren't broken, upgrading perl may break them (or at least issue new warnings).
The latest versions of perl have more bug fixes.
The Perl community is geared toward supporting the most recent releases, so you'll have an easier time finding help for those.
Versions prior to perl5.004 had serious security problems with buffer overflows, and in some cases have CERT advisories (for instance, http://www.cert.org/advisories/CA-1997-17.html ).
The latest versions are probably the least deployed and widely tested, so you may want to wait a few months after their release and see what problems others have if you are risk averse.
The immediate, previous releases (i.e. perl5.8.x ) are usually maintained for a while, although not at the same level as the current releases.
No one is actively supporting Perl 4. Ten years ago it was a dead camel carcass (according to this document). Now it's barely a skeleton as its whitewashed bones have fractured or eroded.
The current leading implementation of Perl 6, Rakudo, released a "useful, usable, 'early adopter'" distribution of Perl 6 (called Rakudo Star) in July of 2010. Please see http://rakudo.org/ for more information.
There are really two tracks of perl development: a maintenance version and an experimental version. The maintenance versions are stable, and have an even number as the minor release (i.e. perl5.10.x, where 10 is the minor release). The experimental versions may include features that don't make it into the stable versions, and have an odd number as the minor release (i.e. perl5.9.x, where 9 is the minor release).
In short, Perl 4 is the parent to both Perl 5 and Perl 6. Perl 5 is the older sibling, and though they are different languages, someone who knows one will spot many similarities in the other.
The number after Perl (i.e. the 5 after Perl 5) is the major release of the perl interpreter as well as the version of the language. Each major version has significant differences that earlier versions cannot support.
The current major release of Perl is Perl 5, first released in 1994. It can run scripts from the previous major release, Perl 4 (March 1991), but has significant differences.
Perl 6 is a reinvention of Perl: a language in the same lineage, but not compatible with Perl 5. The two are complementary, not mutually exclusive. Perl 6 is not meant to replace Perl 5, and vice versa. See What is Perl 6? below to find out more.
See perlhist for a history of Perl revisions.
Perl 6 was originally described as the community's rewrite of Perl 5. Development started in 2002; syntax and design work continue to this day. As the language has evolved, it has become clear that it is a separate language, incompatible with Perl 5 but in the same language family.
Contrary to popular belief, Perl 6 and Perl 5 peacefully coexist with one another. Perl 6 has proven to be a fascinating source of ideas for those using Perl 5 (the Moose object system is a well-known example). There is overlap in the communities, and this overlap fosters the tradition of sharing and borrowing that have been instrumental to Perl's success. The current leading implementation of Perl 6 is Rakudo, and you can learn more about it at http://rakudo.org.
If you want to learn more about Perl 6, or have a desire to help in the crusade to make Perl a better place then read the Perl 6 developers page at http://www.perl6.org/ and get involved.
"We're really serious about reinventing everything that needs reinventing." --Larry Wall
Production releases, which incorporate bug fixes and new functionality, are widely tested before release. Since the 5.000 release, we have averaged about one production release per year.
The Perl development team occasionally make changes to the internal core of the language, but all possible efforts are made toward backward compatibility.
No, Perl is easy to start learning -- and easy to keep learning. It looks like most programming languages you're likely to have experience with, so if you've ever written a C program, an awk script, a shell script, or even a BASIC program, you're already partway there.
Most tasks only require a small subset of the Perl language. One of the guiding mottos for Perl development is "there's more than one way to do it" (TMTOWTDI, sometimes pronounced "tim toady"). Perl's learning curve is therefore shallow (easy to learn) and long (there's a whole lot you can do if you really want).
Finally, because Perl is frequently (but not always, and certainly not by definition) an interpreted language, you can write your programs and test them without an intermediate compilation step, allowing you to experiment and test/debug quickly and easily. This ease of experimentation flattens the learning curve even more.
Things that make Perl easier to learn: Unix experience, almost any kind of programming experience, an understanding of regular expressions, and the ability to understand other people's code. If there's something you need to do, then it's probably already been done, and a working example is usually available for free. Don't forget Perl modules, either. They're discussed in Part 3 of this FAQ, along with CPAN, which is discussed in Part 2.
Perl can be used for almost any coding problem, even ones which require integrating specialist C code for extra speed. As with any tool it can be used well or badly. Perl has many strengths and a few weaknesses; precisely which areas are good and bad is often a personal choice.
When choosing a language you should also be influenced by the resources, testing culture and community which surrounds it.
For comparisons to a specific language it is often best to create a small project in both languages and compare the results. Make sure to use all the resources of each language, as a language is far more than just its syntax.
Perl is flexible and extensible enough for you to use on virtually any task, from one-line file-processing tasks to large, elaborate systems.
For many people, Perl serves as a great replacement for shell scripting. For others, it serves as a convenient, high-level replacement for most of what they'd program in low-level languages like C or C++. It's ultimately up to you (and possibly your management) which tasks you'll use Perl for and which you won't.
If you have a library that provides an API, you can make any component of it available as just another Perl function or variable using a Perl extension written in C or C++ and dynamically linked into your main perl interpreter. You can also go the other direction, and write your main program in C or C++, and then link in some Perl code on the fly, to create a powerful application. See perlembed.
That said, there will always be small, focused, special-purpose languages dedicated to a specific problem domain that are simply more convenient for certain kinds of problems. Perl tries to be all things to all people, but nothing special to anyone. Examples of specialized languages that come to mind include prolog and matlab.
One good reason is when you already have an existing application written in another language that's all done (and done well), or you have an application language specifically designed for a certain task (e.g. prolog, make).
If you find that you need to speed up a specific part of a Perl application (not something you often need) you may want to use C, but you can access this from your Perl code with perlxs.
"Perl" is the name of the language. Only the "P" is capitalized. The name of the interpreter (the program which runs the Perl script) is "perl" with a lowercase "p".
You may or may not choose to follow this usage. But never write "PERL", because perl is not an acronym.
(contributed by brian d foy)
JAPH stands for "Just another Perl hacker,", which Randal Schwartz used to sign email and usenet messages starting in the late 1980s. He previously used the phrase with many subjects ("Just another x hacker,"), so to distinguish his JAPH, he started to write them as Perl programs:
- print "Just another Perl hacker,";
Other people picked up on this and started to write clever or obfuscated programs to produce the same output, spinning things quickly out of control while still providing hours of amusement for their creators and readers.
CPAN has several JAPH programs at http://www.cpan.org/misc/japh.
(contributed by brian d foy)
Appeal to their self interest! If Perl is new (and thus scary) to them, find something that Perl can do to solve one of their problems. That might mean that Perl either saves them something (time, headaches, money) or gives them something (flexibility, power, testability).
In general, the benefit of a language is closely related to the skill of the people using that language. If you or your team can be faster, better, and stronger through Perl, you'll deliver more value. Remember, people often respond better to what they get out of it. If you run into resistance, figure out what those people get out of the other choice and how Perl might satisfy that requirement.
You don't have to worry about finding or paying for Perl; it's freely available and several popular operating systems come with Perl. Community support in places such as Perlmonks ( http://www.perlmonks.com ) and the various Perl mailing lists ( http://lists.perl.org ) means that you can usually get quick answers to your problems.
Finally, keep in mind that Perl might not be the right tool for every job. You're a much better advocate if your claims are reasonable and grounded in reality. Dogmatically advocating anything tends to make people discount your message. Be honest about possible disadvantages to your choice of Perl since any choice has trade-offs.
You might find these links useful:
Copyright (c) 1997-2010 Tom Christiansen, Nathan Torkington, and other authors as noted. All rights reserved.
This documentation is free; you can redistribute it and/or modify it under the same terms as Perl itself.
Irrespective of its distribution, all code examples here are in the public domain. You are permitted and encouraged to use this code and any derivatives thereof in your own programs for fun or for profit as you see fit. A simple comment in the code giving credit to the FAQ would be courteous but is not required.
perlfaq2 - Obtaining and Learning about Perl
This section of the FAQ answers questions about where to find source and documentation for Perl, support, and related matters.
The standard release of Perl (the one maintained by the Perl development team) is distributed only in source code form. You can find the latest releases at http://www.cpan.org/src/.
Perl builds and runs on a bewildering number of platforms. Virtually all known and current Unix derivatives (perl's native platform) are supported, as are other systems like VMS, DOS, OS/2, Windows, QNX, BeOS, OS X, MPE/iX and the Amiga.
Binary distributions for some proprietary platforms can be found in the http://www.cpan.org/ports/ directory. Because these are not part of the standard distribution, they may and in fact do differ from the base perl port in a variety of ways. You'll have to check their respective release notes to see just what the differences are. These differences can be either positive (e.g. extensions for the features of the particular platform that are not supported in the source release of perl) or negative (e.g. might be based upon a less current source release of perl).
See CPAN Ports
For Windows, use a binary version of Perl; Strawberry Perl and ActivePerl come with a bundled C compiler.
Otherwise, if you really do want to build Perl, you need to get a binary version of gcc for your system first. Use a search engine to find out how to do this for your operating system.
That's probably because you forgot libraries, or library paths differ. You really should build the whole distribution on the machine it will eventually live on, and then type make install. Most other approaches are doomed to failure.
One simple way to check that things are in the right place is to print out the hard-coded @INC that perl looks through for libraries:
- % perl -le 'print for @INC'
If this command lists any paths that don't exist on your system, then you may need to move the appropriate libraries to these locations, or create symbolic links, aliases, or shortcuts appropriately. @INC is also printed as part of the output of
- % perl -V
You might also want to check out How do I keep my own module/library directory? in perlfaq8.
Read the INSTALL file, which is part of the source distribution. It describes in detail how to cope with most idiosyncrasies that the Configure script can't work around for any given system or architecture.
CPAN stands for Comprehensive Perl Archive Network, a multi-gigabyte archive replicated on hundreds of machines all over the world. CPAN contains tens of thousands of modules and extensions, source code and documentation, designed for everything from commercial database interfaces to keyboard/screen control and running large web sites.
You can search CPAN on http://metacpan.org or http://search.cpan.org/.
The master web site for CPAN is http://www.cpan.org/; http://www.cpan.org/SITES.html lists all mirrors.
See the CPAN FAQ at http://www.cpan.org/misc/cpan-faq.html for answers to the most frequently asked questions about CPAN.
The Task::Kensho module has a list of recommended modules which you should review as a good starting point.
The complete Perl documentation is available with the Perl distribution. If you have Perl installed locally, you probably have the documentation installed as well: type perldoc perl in a terminal or view it online. (Some operating system distributions may ship the documentation in a different package; for instance, on Debian, you need to install the perl-doc package.)
Many good books have been written about Perl--see the section later in perlfaq2 for more details.
Perl.com used to be part of the O'Reilly Network, a subsidiary of O'Reilly Media. Although it retains most of the original content from its O'Reilly Network, it is now hosted by The Perl Foundation.
The Perl Foundation is an advocacy organization for the Perl language which maintains the web site http://www.perl.org/ as a general advocacy site for the Perl language. It uses the domain to provide general support services to the Perl community, including the hosting of mailing lists, web sites, and other services. There are also many other sub-domains for special topics like learning Perl and jobs in Perl, such as:
Perl Mongers uses the pm.org domain for services related to local Perl user groups, including the hosting of mailing lists and web sites. See the Perl Mongers web site for more information about joining, starting, or requesting services for a Perl user group.
CPAN, or the Comprehensive Perl Archive Network http://www.cpan.org/, is a replicated, worldwide repository of Perl software. See What is CPAN?.
There are many Perl mailing lists for various topics; the beginners list in particular may be of use.
Other places to ask questions are on the PerlMonks site or stackoverflow.
There are many good books on Perl.
There's also $foo Magazin, a German magazine dedicated to Perl, at ( http://www.foo-magazin.de ). The Perl-Zeitung is another German-language magazine for Perl beginners (see http://perl-zeitung.at.tf ).
Several Unix/Linux-related magazines frequently include articles on Perl.
Perl News covers some of the major events in the Perl world; Perl Weekly is a weekly e-mail (and RSS feed) of hand-picked Perl articles.
http://blogs.perl.org/ hosts many Perl blogs. There are also several blog aggregators: Perlsphere and IronMan are two of them.
A comprehensive list of Perl-related mailing lists can be found at http://lists.perl.org/
Perl already is commercial software: it has a license that you can grab and carefully read to your manager. It is distributed in releases and comes in well-defined packages. There is a very large and supportive user community and an extensive literature.
If you still need commercial support, ActiveState offers it.
(contributed by brian d foy)
First, ensure that you've found an actual bug. Second, ensure you've found an actual bug.
If you've found a bug with the perl interpreter or one of the modules in the standard library (those that come with Perl), you can use the perlbug utility that comes with Perl (>= 5.004). It collects information about your installation to include with your message, then sends the message to the right place.
To determine if a module came with your version of Perl, you can install and use the Module::CoreList module. It has the information about the modules (with their versions) included with each release of Perl.
Every CPAN module has a bug tracker set up in RT, http://rt.cpan.org. You can submit bugs to RT either through its web interface or by email. To email a bug report, send it to bug-<distribution-name>@rt.cpan.org . For example, if you wanted to report a bug in Business::ISBN, you could send a message to bug-Business-ISBN@rt.cpan.org .
Some modules might have special reporting requirements, such as a Github or Google Code tracking system, so you should check the module documentation too.
Copyright (c) 1997-2010 Tom Christiansen, Nathan Torkington, and other authors as noted. All rights reserved.
This documentation is free; you can redistribute it and/or modify it under the same terms as Perl itself.
Irrespective of its distribution, all code examples here are in the public domain. You are permitted and encouraged to use this code and any derivatives thereof in your own programs for fun or for profit as you see fit. A simple comment in the code giving credit to the FAQ would be courteous but is not required.
perlfaq3 - Programming Tools
This section of the FAQ answers questions related to programmer tools and programming support.
Have you looked at CPAN (see perlfaq2)? The chances are that someone has already written a module that can solve your problem. Have you read the appropriate manpages? Here's a brief index:
http://www.cpan.org/misc/olddoc/FMTEYEWTK.tgz (not a man-page but still useful, a collection of various essays on Perl techniques)
A crude table of contents for the Perl manpage set is found in perltoc.
The typical approach uses the Perl debugger, described in the perldebug(1) manpage, on an "empty" program, like this:
- perl -de 42
Now just type in any legal Perl code, and it will be immediately evaluated. You can also examine the symbol table, get stack backtraces, check variable values, set breakpoints, and other operations typically found in symbolic debuggers.
You can also use Devel::REPL which is an interactive shell for Perl, commonly known as a REPL - Read, Evaluate, Print, Loop. It provides various handy features.
From the command line, you can use the cpan command's -l switch:
- $ cpan -l
You can also use cpan's -a switch to create an autobundle file that CPAN.pm understands and can use to re-install every module:
- $ cpan -a
Inside a Perl program, you can use the ExtUtils::Installed module to show all installed distributions, although it can take a while to do its magic. The standard library which comes with Perl just shows up as "Perl" (although you can get those with Module::CoreList).
If you want a list of all of the Perl module filenames, you can use File::Find::Rule:
If you do not have that module, you can do the same thing with File::Find which is part of the standard library:
If you simply need to check quickly to see if a module is available, you can check for its documentation. If you can read the documentation the module is most likely installed. If you cannot read the documentation, the module might not have any (in rare cases):
- $ perldoc Module::Name
You can also try to include the module in a one-liner to see if perl finds it:
- $ perl -MModule::Name -e1
(If you don't receive a "Can't locate ... in @INC" error message, then Perl found the module name you asked for.)
(contributed by brian d foy)
Before you do anything else, you can help yourself by ensuring that you let Perl tell you about problem areas in your code. By turning on warnings and strictures, you can head off many problems before they get too big. You can find out more about these in strict and warnings.
Beyond that, the simplest debugger is the print function. Use it to look at values as you run your program:
- print STDERR "The value is [$value]\n";
The Data::Dumper module can pretty-print Perl data structures:
Perl comes with an interactive debugger, which you can start with the -d switch. It's fully explained in perldebug.
If you'd like a graphical user interface and you have Tk, you can use ptkdb. It's on CPAN and available for free.
If you need something much more sophisticated and controllable, Leon Brocard's Devel::ebug (which you can call with the -D switch as -Debug) gives you the programmatic hooks into everything you need to write your own (without too much pain and suffering).
You can also use a commercial debugger such as Affrus (Mac OS X), Komodo from Activestate (Windows and Mac OS X), or EPIC (most platforms).
(contributed by brian d foy, updated Fri Jul 25 12:22:26 PDT 2008)
The Devel namespace has several modules which you can use to profile your Perl programs.
The Devel::NYTProf (New York Times Profiler) module does both statement and subroutine profiling. It's available from CPAN and you also invoke it with the -d switch:
- perl -d:NYTProf some_perl.pl
It creates a database of the profile information that you can turn into reports. The nytprofhtml command turns the data into an HTML report similar to the Devel::Cover report:
- nytprofhtml
You might also be interested in using the Benchmark module to measure and compare code snippets.
You can read more about profiling in Programming Perl, chapter 20, or Mastering Perl, chapter 5.
perldebguts documents creating a custom debugger if you need to create a special sort of profiler. brian d foy describes the process in The Perl Journal, "Creating a Perl Debugger", http://www.ddj.com/184404522 , and "Profiling in Perl" http://www.ddj.com/184404580 .
Perl.com has two interesting articles on profiling: "Profiling Perl", by Simon Cozens, http://www.perl.com/lpt/a/850 and "Debugging and Profiling mod_perl Applications", by Frank Wiles, http://www.perl.com/pub/a/2006/02/09/debug_mod_perl.html .
Randal L. Schwartz writes about profiling in "Speeding up Your Perl Programs" for Unix Review, http://www.stonehenge.com/merlyn/UnixReview/col49.html , and "Profiling in Template Toolkit via Overriding" for Linux Magazine, http://www.stonehenge.com/merlyn/LinuxMag/col75.html .
The B::Xref module can be used to generate cross-reference reports for Perl programs.
- perl -MO=Xref[,OPTIONS] scriptname.plx
Perl::Tidy comes with a perl script perltidy which indents and reformats Perl scripts to make them easier to read by trying to follow the rules of the perlstyle. If you write Perl, or spend much time reading Perl, you will probably find it useful.
Of course, if you simply follow the guidelines in perlstyle, you shouldn't need to reformat. The habit of formatting your code as you write it will help prevent bugs. Your editor can and should help you with this. The perl-mode or newer cperl-mode for emacs can provide remarkable amounts of help with most (but not all) code, and even less programmable editors can provide significant assistance. Tom Christiansen and many other VI users swear by the following settings in vi and its clones:
- set ai sw=4
- map! ^O {^M}^[O^T
Put that in your .exrc file (replacing the caret characters with control characters) and away you go. In insert mode, ^T is for indenting, ^D is for undenting, and ^O is for blockdenting--as it were. A more complete example, with comments, can be found at http://www.cpan.org/authors/id/TOMC/scripts/toms.exrc.gz
Perl programs are just plain text, so any editor will do.
If you're on Unix, you already have an IDE--Unix itself. The Unix philosophy is the philosophy of several small tools that each do one thing and do it well. It's like a carpenter's toolbox.
If you want an IDE, check the following (in alphabetical order, not order of preference):
The Eclipse Perl Integration Project integrates Perl editing/debugging with Eclipse.
Perl Editor by EngInSite is a complete integrated development environment (IDE) for creating, testing, and debugging Perl scripts; the tool runs on Windows 9x/NT/2000/XP or later.
A GUI editor written in Perl using wxWidgets and Scintilla, with lots of smaller features. It aims for a UI based on Perl principles like TMTOWTDI and "easy things should be easy".
http://www.ActiveState.com/Products/Komodo/
ActiveState's cross-platform (as of October 2004, that's Windows, Linux, and Solaris), multi-language IDE has Perl support, including a regular expression debugger and remote debugging.
http://open-perl-ide.sourceforge.net/
Open Perl IDE is an integrated development environment for writing and debugging Perl scripts with ActiveState's ActivePerl distribution under Windows 95/98/NT/2000.
OptiPerl is a Windows IDE with simulated CGI environment, including debugger and syntax-highlighting editor.
Padre is cross-platform IDE for Perl written in Perl using wxWidgets to provide a native look and feel. It's open source under the Artistic License. It is one of the newer Perl IDEs.
http://www.solutionsoft.com/perl.htm
PerlBuilder is an integrated development environment for Windows that supports Perl development.
http://helpconsulting.net/visiperl/index.html
From Help Consulting, for Windows.
http://www.activestate.com/Products/Visual_Perl/
Visual Perl is a Visual Studio.NET plug-in from ActiveState.
http://www.zeusedit.com/lookmain.html
Zeus for Windows is another Win32 multi-language editor/IDE that comes with support for Perl.
For editors: if you're on Unix you probably have vi or a vi clone already, and possibly an emacs too, so you may not need to download anything. In any emacs the cperl-mode (M-x cperl-mode) gives you perhaps the best available Perl editing mode in any editor.
If you are using Windows, you can use any editor that lets you work with plain text, such as NotePad or WordPad. Word processors, such as Microsoft Word or WordPerfect, typically do not work since they insert all sorts of behind-the-scenes information, although some allow you to save files as "Text Only". You can also download text editors designed specifically for programming, such as Textpad ( http://www.textpad.com/ ) and UltraEdit ( http://www.ultraedit.com/ ), among others.
If you are using MacOS, the same concerns apply. MacPerl (for Classic environments) comes with a simple editor. Popular external editors are BBEdit ( http://www.bbedit.com/ ) or Alpha ( http://www.his.com/~jguyer/Alpha/Alpha8.html ). MacOS X users can use Unix editors as well.
The following are Win32 multilanguage editor/IDEs that support Perl:
There is also a toyedit Text widget based editor written in Perl that is distributed with the Tk module on CPAN. The ptkdb ( http://ptkdb.sourceforge.net/ ) is a Perl/Tk-based debugger that acts as a development environment of sorts. Perl Composer ( http://perlcomposer.sourceforge.net/ ) is an IDE for Perl/Tk GUI creation.
In addition to an editor/IDE you might be interested in a more powerful shell environment for Win32. Your options include
bash, from the Cygwin package ( http://sources.redhat.com/cygwin/ )
ksh, from the MKS Toolkit ( http://www.mkssoftware.com/ ), or the Bourne shell of the U/WIN environment ( http://www.research.att.com/sw/tools/uwin/ )
tcsh ( ftp://ftp.astron.com/pub/tcsh/ ); see also http://www.primate.wisc.edu/software/csh-tcsh-book/
MKS and U/WIN are commercial (U/WIN is free for educational and research purposes), Cygwin is covered by the GNU General Public License (but that shouldn't matter for Perl use). The Cygwin, MKS, and U/WIN all contain (in addition to the shells) a comprehensive set of standard Unix toolkit utilities.
If you're transferring text files between Unix and Windows using FTP be sure to transfer them in ASCII mode so the ends of lines are appropriately converted.
On Mac OS the MacPerl Application comes with a simple 32k text editor that behaves like a rudimentary IDE. In contrast to the MacPerl Application the MPW Perl tool can make use of the MPW Shell itself as an editor (with no 32k limit).
Affrus is a full Perl development environment with full debugger support ( http://www.latenightsw.com ).
Alpha is an editor, written and extensible in Tcl, that nonetheless has built-in support for several popular markup and programming languages, including Perl and HTML ( http://www.his.com/~jguyer/Alpha/Alpha8.html ).
BBEdit and BBEdit Lite are text editors for Mac OS that have a Perl sensitivity mode ( http://web.barebones.com/ ).
For a complete version of Tom Christiansen's vi configuration file, see http://www.cpan.org/authors/Tom_Christiansen/scripts/toms.exrc.gz , the standard benchmark file for vi emulators. The file runs best with nvi, the current version of vi out of Berkeley, which incidentally can be built with an embedded Perl interpreter--see http://www.cpan.org/src/misc/ .
Since Emacs version 19 patchlevel 22 or so, there have been both a perl-mode.el and support for the Perl debugger built in. These should come with the standard Emacs 19 distribution.
Note that the perl-mode of emacs will have fits with "main'foo" (single quote), and mess up the indentation and highlighting. You are probably using "main::foo" in new Perl code anyway, so this shouldn't be an issue.
For CPerlMode, see http://www.emacswiki.org/cgi-bin/wiki/CPerlMode
The Curses module from CPAN provides a dynamically loadable object module interface to a curses library. A small demo can be found at the directory http://www.cpan.org/authors/Tom_Christiansen/scripts/rep.gz ; this program repeats a command and updates the screen as needed, rendering rep ps axu similar to top.
(contributed by Ben Morrow)
There are a number of modules which let you write GUIs in Perl. Most GUI toolkits have a perl interface: an incomplete list follows.
This works under Unix and Windows, and the current version doesn't look half as bad under Windows as it used to. Some of the gui elements still don't 'feel' quite right, though. The interface is very natural and 'perlish', making it easy to use in small scripts that just need a simple gui. It hasn't been updated in a while.
This is a Perl binding for the cross-platform wxWidgets toolkit ( http://www.wxwidgets.org ). It works under Unix, Win32 and Mac OS X, using native widgets (Gtk under Unix). The interface follows the C++ interface closely, but the documentation is a little sparse for someone who doesn't know the library, mostly just referring you to the C++ documentation.
These are Perl bindings for the Gtk toolkit ( http://www.gtk.org ). The interface changed significantly between versions 1 and 2 so they have separate Perl modules. It runs under Unix, Win32 and Mac OS X (currently it requires an X server on Mac OS, but a 'native' port is underway), and the widgets look the same on every platform: i.e., they don't match the native widgets. As with Wx, the Perl bindings follow the C API closely, and the documentation requires you to read the C documentation to understand it.
This provides access to most of the Win32 GUI widgets from Perl. Obviously, it only runs under Win32, and uses native widgets. The Perl interface doesn't really follow the C interface: it's been made more Perlish, and the documentation is pretty good. More advanced stuff may require familiarity with the C Win32 APIs, or reference to MSDN.
CamelBones ( http://camelbones.sourceforge.net ) is a Perl interface to Mac OS X's Cocoa GUI toolkit, and as such can be used to produce native GUIs on Mac OS X. It's not on CPAN, as it requires frameworks that CPAN.pm doesn't know how to install, but installation is via the standard OSX package installer. The Perl API is, again, very close to the ObjC API it's wrapping, and the documentation just tells you how to translate from one to the other.
There is a Perl interface to TrollTech's Qt toolkit, but it does not appear to be maintained.
Sx is an interface to the Athena widget set which comes with X, but again it appears not to be much used nowadays.
The best way to do this is to come up with a better algorithm. This can often make a dramatic difference. Jon Bentley's book Programming Pearls (that's not a misspelling!) has some good tips on optimization, too. Advice on benchmarking boils down to: benchmark and profile to make sure you're optimizing the right part, look for better algorithms instead of microtuning your code, and when all else fails consider just buying faster hardware. You will probably want to read the answer to the earlier question "How do I profile my Perl programs?" if you haven't done so already.
A different approach is to autoload seldom-used Perl code. See the AutoSplit and AutoLoader modules in the standard distribution for that. Or you could locate the bottleneck and think about writing just that part in C, the way we used to take bottlenecks in C code and write them in assembler. Similar to rewriting in C, modules that have critical sections can be written in C (for instance, the PDL module from CPAN).
If you're currently linking your perl executable to a shared libc.so, you can often gain a 10-25% performance benefit by rebuilding it to link with a static libc.a instead. This will make a bigger perl executable, but your Perl programs (and programmers) may thank you for it. See the INSTALL file in the source distribution for more information.
The undump program was an ancient attempt to speed up Perl programs by storing the already-compiled form to disk. This is no longer a viable option, as it only worked on a few architectures, and wasn't a good solution anyway.
When it comes to time-space tradeoffs, Perl nearly always prefers to throw memory at a problem. Scalars in Perl use more memory than strings in C, arrays take more than that, and hashes use even more. While there's still a lot to be done, recent releases have been addressing these issues. For example, as of 5.004, duplicate hash keys are shared amongst all hashes using them, so require no reallocation.
In some cases, using substr() or vec() to simulate arrays can be highly beneficial. For example, an array of a thousand booleans will take at least 20,000 bytes of space, but it can be turned into one 125-byte bit vector--a considerable memory savings. The standard Tie::SubstrHash module can also help for certain types of data structure. If you're working with specialist data structures (matrices, for instance) modules that implement these in C may use less memory than equivalent Perl modules.
Another thing to try is learning whether your Perl was compiled with the system malloc or with Perl's builtin malloc. Whichever one it is, try using the other one and see whether this makes a difference. Information about malloc is in the INSTALL file in the source distribution. You can find out whether you are using perl's malloc by typing perl -V:usemymalloc.
Of course, the best way to save memory is to not do anything to waste it in the first place. Good programming practices can go a long way toward this:
Don't read an entire file into memory if you can process it line by line. Or more concretely, use a loop like this:
instead of this:
When the files you're processing are small, it doesn't much matter which way you do it, but it makes a huge difference when they start getting larger.
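The two styles contrast like this (a minimal sketch; an in-memory filehandle stands in for a real file, and the processing is a placeholder):

```perl
# An in-memory filehandle as a stand-in for a real file on disk.
my $contents = "line 1\nline 2\nline 3\n";

# Memory-friendly: read and process one line at a time.
open my $fh, '<', \$contents or die "Can't open: $!";
while ( my $line = <$fh> ) {
    chomp $line;    # process $line here
}
close $fh;

# Memory-hungry: slurp every line into an array first.
open $fh, '<', \$contents or die "Can't open: $!";
my @lines = <$fh>;
foreach my $line (@lines) {
    # process $line here
}
close $fh;
```

With a real file the slurping version holds the whole file in `@lines` at once, which is exactly the cost the paragraph above warns about.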
Remember that both map and grep expect a LIST argument, so doing this:
- @wanted = grep {/pattern/} <$file_handle>;
will cause the entire file to be slurped. For large files, it's better to loop:
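The loop version keeps only one line in memory at a time. A sketch, with an in-memory filehandle and `/good/` standing in for your real pattern:

```perl
my $text = "good line\nbad line\nanother good line\n";
open my $file_handle, '<', \$text or die "Can't open: $!";

my @wanted;
while ( my $line = <$file_handle> ) {
    push @wanted, $line if $line =~ /good/;   # keep only matching lines
}
close $file_handle;
```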
Don't quote large strings unless absolutely necessary:
- my $copy = "$large_string";
makes 2 copies of $large_string (one for $copy and another for the quotes), whereas
- my $copy = $large_string;
only makes one copy.
Ditto for stringifying large arrays:
is much more memory-efficient than either
or
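The three variants might be written like this (a sketch; `$,` and `$"` are Perl's output field separator and list separator):

```perl
my @big_array = map { "line $_" } 1 .. 3;   # imagine millions of elements

# Memory-efficient: print the elements directly, no joined copy made.
{
    local $, = "\n";        # output field separator
    print @big_array, "\n";
}

# Less efficient: join() builds one large string first...
print join( "\n", @big_array ), "\n";

# ...and interpolating the array in quotes builds a copy too.
{
    local $" = "\n";        # separator used when interpolating arrays
    print "@big_array\n";
}
```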
Pass arrays and hashes by reference, not by value. For one thing, it's the only way to pass multiple lists or hashes (or both) in a single call/return. It also avoids creating a copy of all the contents. This requires some judgement, however, because any changes will be propagated back to the original data. If you really want to mangle (er, modify) a copy, you'll have to sacrifice the memory needed to make one.
For "big" data stores (i.e. ones that exceed available memory) consider using one of the DB modules to store it on disk instead of in RAM. This will incur a penalty in access time, but that's probably better than causing your hard disk to thrash due to massive swapping.
Yes. Perl's garbage collection system takes care of this so everything works out right.
(contributed by Michael Carman)
You usually can't. Memory allocated to lexicals (i.e. my() variables) cannot be reclaimed or reused even if they go out of scope. It is reserved in case the variables come back into scope. Memory allocated to global variables can be reused (within your program) by using undef() and/or delete().
On most operating systems, memory allocated to a program can never be returned to the system. That's why long-running programs sometimes re-exec themselves. Some operating systems (notably, systems that use mmap(2) for allocating large chunks of memory) can reclaim memory that is no longer used, but on such systems, perl must be configured and compiled to use the OS's malloc, not perl's.
In general, memory allocation and de-allocation isn't something you can or should be worrying about much in Perl.
See also "How can I make my Perl program take less memory?"
Beyond the normal measures described to make general Perl programs faster or smaller, a CGI program has additional issues. It may be run several times per second. Given that each time it runs it will need to be re-compiled and will often allocate a megabyte or more of system memory, this can be a killer. Compiling into C isn't going to help you because the process start-up overhead is where the bottleneck is.
There are three popular ways to avoid this overhead. One solution involves running the Apache HTTP server (available from http://www.apache.org/ ) with either of the mod_perl or mod_fastcgi plugin modules.
With mod_perl and the Apache::Registry module (distributed with mod_perl), httpd will run with an embedded Perl interpreter which pre-compiles your script and then executes it within the same address space without forking. The Apache extension also gives Perl access to the internal server API, so modules written in Perl can do just about anything a module written in C can. For more on mod_perl, see http://perl.apache.org/
With the FCGI module (from CPAN) and the mod_fastcgi module (available from http://www.fastcgi.com/ ) each of your Perl programs becomes a permanent CGI daemon process.
Finally, Plack is a Perl module and toolkit that contains PSGI middleware, helpers and adapters to web servers, allowing you to easily deploy scripts which can continue running, and provides flexibility with regards to which web server you use. It can allow existing CGI scripts to enjoy this flexibility and performance with minimal changes, or can be used along with modern Perl web frameworks to make writing and deploying web services with Perl a breeze.
These solutions can have far-reaching effects on your system and on the way you write your CGI programs, so investigate them with care.
See also http://www.cpan.org/modules/by-category/15_World_Wide_Web_HTML_HTTP_CGI/ .
Delete it. :-) Seriously, there are a number of (mostly unsatisfactory) solutions with varying levels of "security".
First of all, however, you can't take away read permission, because the source code has to be readable in order to be compiled and interpreted. (That doesn't mean that a CGI script's source is readable by people on the web, though--only by people with access to the filesystem.) So you have to leave the permissions at the socially friendly 0755 level.
Some people regard this as a security problem. If your program does insecure things and relies on people not knowing how to exploit those insecurities, it is not secure. It is often possible for someone to determine the insecure things and exploit them without viewing the source. Security through obscurity, the name for hiding your bugs instead of fixing them, is little security indeed.
You can try using encryption via source filters (Starting from Perl 5.8 the Filter::Simple and Filter::Util::Call modules are included in the standard distribution), but any decent programmer will be able to decrypt it. You can try using the byte code compiler and interpreter described later in perlfaq3, but the curious might still be able to de-compile it. You can try using the native-code compiler described later, but crackers might be able to disassemble it. These pose varying degrees of difficulty to people wanting to get at your code, but none can definitively conceal it (true of every language, not just Perl).
It is very easy to recover the source of Perl programs. You simply feed the program to the perl interpreter and use the modules in the B:: hierarchy. The B::Deparse module should be able to defeat most attempts to hide source. Again, this is not unique to Perl.
If you're concerned about people profiting from your code, then the bottom line is that nothing but a restrictive license will give you legal security. License your software and pepper it with threatening statements like "This is unpublished proprietary software of XYZ Corp. Your access to it does not give you permission to use it blah blah blah." We are not lawyers, of course, so you should see a lawyer if you want to be sure your license's wording will stand up in court.
(contributed by brian d foy)
In general, you can't do this. There are some things that may work for your situation though. People usually ask this question because they want to distribute their works without giving away the source code, and most solutions trade disk space for convenience. You probably won't see much of a speed increase either, since most solutions simply bundle a Perl interpreter in the final product (but see How can I make my Perl program run faster?).
The Perl Archive Toolkit ( http://par.perl.org/ ) is Perl's analog to Java's JAR. It's freely available and on CPAN ( http://search.cpan.org/dist/PAR/ ).
There are also some commercial products that may work for you, although you have to buy a license for them.
The Perl Dev Kit ( http://www.activestate.com/Products/Perl_Dev_Kit/ ) from ActiveState can "Turn your Perl programs into ready-to-run executables for HP-UX, Linux, Solaris and Windows."
Perl2Exe ( http://www.indigostar.com/perl2exe.htm ) is a command line program for converting perl scripts to executable files. It targets both Windows and Unix platforms.
How can I get #!perl to work on [MS-DOS, NT, ...]?

For OS/2, just use

- extproc perl -S -your_switches

as the first line in a *.cmd file (-S due to a bug in cmd.exe's "extproc" handling). For DOS, one should first invent a corresponding batch file and codify it in ALTERNATE_SHEBANG (see the dosish.h file in the source distribution for more information).
The Win95/NT installation, when using the ActiveState port of Perl, will modify the Registry to associate the .pl extension with the perl interpreter. If you install another port, perhaps even building your own Win95/NT Perl from the standard sources by using a Windows port of gcc (e.g., with cygwin or mingw32), then you'll have to modify the Registry yourself. In addition to associating .pl with the interpreter, NT people can use SET PATHEXT=%PATHEXT%;.PL to let them run the program install-linux.pl merely by typing install-linux.
Under "Classic" MacOS, a perl program will have the appropriate Creator and Type, so that double-clicking them will invoke the MacPerl application. Under Mac OS X, clickable apps can be made from any #! script using Wil Sanchez' DropScript utility: http://www.wsanchez.net/software/ .
IMPORTANT!: Whatever you do, PLEASE don't get frustrated, and just throw the perl interpreter into your cgi-bin directory, in order to get your programs working for a web server. This is an EXTREMELY big security risk. Take the time to figure out how to do it correctly.
Yes. Read perlrun for more information. Some examples follow. (These assume standard Unix shell quoting rules.)
- # sum first and last fields
- perl -lane 'print $F[0] + $F[-1]' *
- # identify text files
- perl -le 'for(@ARGV) {print if -f && -T _}' *
- # remove (most) comments from C program
- perl -0777 -pe 's{/\*.*?\*/}{}gs' foo.c
- # make file a month younger than today, defeating reaper daemons
- perl -e '$X=24*60*60; utime(time(),time() + 30 * $X,@ARGV)' *
- # find first unused uid
- perl -le '$i++ while getpwuid($i); print $i'
- # display reasonable manpath
- echo $PATH | perl -nl -072 -e '
- s![^/+]*$!man!&&-d&&!$s{$_}++&&push@m,$_;END{print"@m"}'
OK, the last one was actually an Obfuscated Perl Contest entry. :-)
The problem is usually that the command interpreters on those systems have rather different ideas about quoting than the Unix shells under which the one-liners were created. On some systems, you may have to change single-quotes to double ones, which you must NOT do on Unix or Plan9 systems. You might also have to change a single % to a %%.
For example:
- # Unix (including Mac OS X)
- perl -e 'print "Hello world\n"'
- # DOS, etc.
- perl -e "print \"Hello world\n\""
- # Mac Classic
- print "Hello world\n"
- (then Run "Myscript" or Shift-Command-R)
- # MPW
- perl -e 'print "Hello world\n"'
- # VMS
- perl -e "print ""Hello world\n"""
The problem is that none of these examples are reliable: they depend on the command interpreter. Under Unix, the first two often work. Under DOS, it's entirely possible that neither works. If 4DOS was the command shell, you'd probably have better luck like this:
- perl -e "print <Ctrl-x>"Hello world\n<Ctrl-x>""
Under the Mac, it depends which environment you are using. The MacPerl shell, or MPW, is much like Unix shells in its support for several quoting variants, except that it makes free use of the Mac's non-ASCII characters as control characters.
Using qq(), q(), and qx(), instead of "double quotes", 'single quotes', and `backticks`, may make one-liners easier to write.
There is no general solution to all of this. It is a mess.
[Some of this answer was contributed by Kenneth Albanowski.]
For modules, get the CGI or LWP modules from CPAN. For textbooks, see the two especially dedicated to web stuff in the question on books. For problems and questions related to the web, like "Why do I get 500 Errors" or "Why doesn't it run from the browser right when it runs fine on the command line", see the troubleshooting guides and references in perlfaq9 or in the CGI MetaFAQ:
Looking into Plack and modern Perl web frameworks is highly recommended, though; web programming in Perl has evolved a long way from the old days of simple CGI scripts.
A good place to start is perltoot, and you can use perlobj, perlboot, perltoot, perltooc, and perlbot for reference.
A good book on OO in Perl is "Object-Oriented Perl" by Damian Conway from Manning Publications, or "Intermediate Perl" by Randal Schwartz, brian d foy, and Tom Phoenix from O'Reilly Media.
If you want to call C from Perl, start with perlxstut, moving on to perlxs, xsubpp, and perlguts. If you want to call Perl from C, then read perlembed, perlcall, and perlguts. Don't forget that you can learn a lot from looking at how the authors of existing extension modules wrote their code and solved their problems.
You might not need all the power of XS. The Inline::C module lets you put C code directly in your Perl source. It handles all the magic to make it work. You still have to learn at least some of the perl API but you won't have to deal with the complexity of the XS support files.
Download the ExtUtils::Embed kit from CPAN and run `make test'. If the tests pass, read the pods again and again and again. If they fail, see perlbug and send a bug report with the output of make test TEST_VERBOSE=1 along with perl -V.
A complete list of Perl's error messages and warnings with explanatory text can be found in perldiag. You can also use the splain program (distributed with Perl) to explain the error messages:
- perl program 2>diag.out
- splain [-v] [-p] diag.out
or change your program to explain the messages for you:
- use diagnostics;
or
- use diagnostics -verbose;
(contributed by brian d foy)
The ExtUtils::MakeMaker module, better known simply as "MakeMaker", turns a Perl script, typically called Makefile.PL, into a Makefile. The Unix tool make uses this file to manage dependencies and actions to process and install a Perl distribution.
Copyright (c) 1997-2010 Tom Christiansen, Nathan Torkington, and other authors as noted. All rights reserved.
This documentation is free; you can redistribute it and/or modify it under the same terms as Perl itself.
Irrespective of its distribution, all code examples here are in the public domain. You are permitted and encouraged to use this code and any derivatives thereof in your own programs for fun or for profit as you see fit. A simple comment in the code giving credit to the FAQ would be courteous but is not required.
perlfaq4 - Data Manipulation
This section of the FAQ answers questions related to manipulating numbers, dates, strings, arrays, hashes, and miscellaneous data issues.
For the long explanation, see David Goldberg's "What Every Computer Scientist Should Know About Floating-Point Arithmetic" (http://web.cse.msu.edu/~cse320/Documents/FloatingPoint.pdf).
Internally, your computer represents floating-point numbers in binary. Digital (as in powers of two) computers cannot store all numbers exactly. Some real numbers lose precision in the process. This is a problem with how computers store numbers and affects all computer languages, not just Perl.
perlnumber shows the gory details of number representations and conversions.
To limit the number of decimal places in your numbers, you can use the
printf or sprintf function. See
Floating-point Arithmetic in perlop for more details.
Your int() is most probably working just fine. It's the numbers that
aren't quite what you think.
First, see the answer to "Why am I getting long decimals (eg, 19.9499999999999) instead of the numbers I should be getting (eg, 19.95)?".
For example, this
will in most computers print 0, not 1, because even such simple numbers as 0.6 and 0.2 cannot be presented exactly by floating-point numbers. What you think in the above as 'three' is really more like 2.9999999999999995559.
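A one-line illustration of this effect (a sketch; behavior shown is for IEEE-754 doubles, which nearly all modern machines use):

```perl
# 0.6/0.2 is not exactly 3 in binary floating point: it comes out just
# under 3, so after subtracting 2 we get just under 1, and int() truncates.
print int( 0.6 / 0.2 - 2 ), "\n";   # prints 0, not 1
```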
(contributed by brian d foy)
You're probably trying to convert a string to a number, which Perl only converts as a decimal number. When Perl converts a string to a number, it ignores leading spaces and zeroes, then assumes the rest of the digits are in base 10:
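A small sketch of that conversion rule:

```perl
my $str = " 0644";     # a string that merely looks octal
my $num = $str + 0;    # leading space and zero ignored, base 10 assumed
print "$num\n";        # 644 -- not 420, which the octal literal 0644 gives
```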
This problem usually involves one of the Perl built-ins that has the same name as a Unix command that uses octal numbers as arguments on the command line. In this example, chmod on the command line knows that its first argument is octal because that's what it does:
- %prompt> chmod 644 file
If you want to use the same literal digits (644) in Perl, you have to tell Perl to treat them as octal numbers, either by prefixing the digits with a 0 or by using oct(). The problem comes in when you take your numbers from something that Perl thinks is a string, such as a command line argument in @ARGV:
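The contrast might be sketched like this (the filename is a placeholder; chmod on a missing file simply changes nothing):

```perl
my $file = "hypothetical.txt";   # placeholder name

# Both of these mean rw-r--r-- (420 decimal):
chmod 0644,       $file;   # leading 0 marks the literal as octal
chmod oct("644"), $file;   # oct() converts the string "644" from octal

# Wrong: a "644" read from @ARGV is a plain string, so Perl would treat
# it as decimal 644 (octal 01204), not the mode you meant:
# chmod $ARGV[0], $file;
```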
You can always check the value you're using by printing it in octal notation to ensure it matches what you think it should be. Print it in octal and decimal format:
- printf "0%o %d", $number, $number;
Remember that int() merely truncates toward 0. For rounding to a
certain number of digits, sprintf() or printf() is usually the
easiest route.
- printf("%.3f", 3.1415926535); # prints 3.142
The POSIX module (part of the standard Perl distribution) implements ceil(), floor(), and a number of other mathematical and trigonometric functions.
In 5.000 to 5.003 perls, trigonometry was done in the Math::Complex module. With 5.004, the Math::Trig module (part of the standard Perl distribution) implements the trigonometric functions. Internally it uses the Math::Complex module and some functions can break out from the real axis into the complex plane, for example the inverse sine of 2.
Rounding in financial applications can have serious implications, and the rounding method used should be specified precisely. In these cases, it probably pays not to trust whichever system of rounding is being used by Perl, but instead to implement the rounding function you need yourself.
To see why, notice how you'll still have an issue on half-way-point alternation:
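A loop like the following shows the alternation (a sketch; the exact printed sequence depends on your C library's rounding mode, often round-half-to-even, so don't expect every half-way point to round up):

```perl
# Half-way points don't all round the same way under "%.1f":
for ( my $i = 0; $i < 1.01; $i += 0.05 ) {
    printf "%.1f ", $i;
}
print "\n";
```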
Don't blame Perl. It's the same as in C. IEEE says we have to do this. Perl numbers whose absolute values are integers under 2**31 (on 32-bit machines) will work pretty much like mathematical integers. Other numbers are not guaranteed.
As always with Perl there is more than one way to do it. Below are a few examples of approaches to making common conversions between number representations. This is intended to be representational rather than exhaustive.
Some of the examples later in perlfaq4 use the Bit::Vector module from CPAN. The reason you might choose Bit::Vector over the perl built-in functions is that it works with numbers of ANY size, that it is optimized for speed on some operations, and for at least some programmers the notation might be familiar.
Using perl's built-in conversion of 0x notation:
- my $dec = 0xDEADBEEF;
Using the hex function:
Using pack:
Using the CPAN module Bit::Vector:
Using sprintf:
Using unpack:
Using Bit::Vector:
And Bit::Vector supports odd bit counts:
Using Perl's built in conversion of numbers with leading zeros:
- my $dec = 033653337357; # note the leading 0!
Using the oct function:
Using Bit::Vector:
Using sprintf:
Using Bit::Vector:
Perl 5.6 lets you write binary numbers directly with the 0b notation:
- my $number = 0b10110110;
Using oct:
Using pack and unpack for larger strings:
Using Bit::Vector:
Using sprintf (perl 5.6+):
Using unpack:
Using Bit::Vector:
The remaining transformations (e.g. hex -> oct, bin -> hex, etc.) are left as an exercise to the inclined reader.
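Pulling the built-in conversions together in one place, using the same example values as above (a sketch using only hex(), oct(), and sprintf):

```perl
# hex -> decimal and back
my $dec = hex("DEADBEEF");          # 3735928559
my $hex = sprintf "%X", $dec;       # "DEADBEEF"

# octal -> decimal and back (oct() reads a bare string as octal)
my $dec2 = oct("33653337357");      # 3735928559 again
my $oct  = sprintf "%o", $dec2;     # "33653337357"

# binary -> decimal and back (oct() also understands a 0b prefix)
my $dec3 = oct("0b10110110");       # 182
my $bin  = sprintf "%b", $dec3;     # "10110110"
```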
The behavior of binary arithmetic operators depends on whether they're used on numbers or strings. The operators treat a string as a series of bits and work with that (the string "3" is the bit pattern 00110011). The operators work with the binary form of a number (the number 3 is treated as the bit pattern 00000011).

So, saying 11 & 3 performs the "and" operation on numbers (yielding 3). Saying "11" & "3" performs the "and" operation on strings (yielding "1").
Most problems with & and | arise because the programmer thinks they have a number but really it's a string, or vice versa. To avoid this, stringify the arguments explicitly (using "" or qq()) or convert them to numbers explicitly (using 0+$arg). The rest arise because the programmer says:
- if ("\020\020" & "\101\101") {
- # ...
- }
but a string consisting of two null bytes (the result of "\020\020" & "\101\101") is not a false value in Perl. You need:
- if ( ("\020\020" & "\101\101") !~ /[^\000]/) {
- # ...
- }
Use the Math::Matrix or Math::MatrixReal modules (available from CPAN) or the PDL extension (also available from CPAN).
To call a function on each element in an array, and collect the results, use:
For example:
To call a function on each element of an array, but ignore the results:
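The two forms might be sketched like this (`some_func` is a hypothetical placeholder for your own function):

```perl
my @numbers = ( 1, 2, 3 );

# Collect the results with map:
my @tripled = map { 3 * $_ } @numbers;   # (3, 6, 9)

# Ignore the results with a plain foreach loop:
foreach my $n (@numbers) {
    some_func($n);   # return value discarded
}

sub some_func { return $_[0] }
```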
To call a function on each integer in a (small) range, you can use:
but you should be aware that in this form, the .. operator creates a list of all integers in the range, which can take a lot of memory for large ranges. However, the problem does not occur when using .. within a for loop, because in that case the range operator is optimized to iterate over the range without creating the entire list. So
or even
will not create an intermediate list of 500,000 integers.
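The for-loop form the paragraph describes might look like this (a sketch; the map version is shown for contrast with a small range):

```perl
# Fine: the range is iterated lazily inside a foreach-style loop.
my $sum = 0;
for my $i ( 1 .. 500_000 ) {
    $sum += $i;
}

# Risky for big ranges: map receives the range as a real LIST in memory.
my @squares = map { $_ * $_ } 1 .. 1_000;
```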
Get the http://www.cpan.org/modules/by-module/Roman module.
If you're using a version of Perl before 5.004, you must call srand
once at the start of your program to seed the random number generator.
5.004 and later automatically call srand at the beginning. Don't
call srand more than once--you make your numbers less random,
rather than more.
Computers are good at being predictable and bad at being random (despite appearances caused by bugs in your programs :-). The random article in the "Far More Than You Ever Wanted To Know" collection in http://www.cpan.org/misc/olddoc/FMTEYEWTK.tgz, courtesy of Tom Phoenix, talks more about this. John von Neumann said, "Anyone who attempts to generate random numbers by deterministic means is, of course, living in a state of sin."
Perl relies on the underlying system for the implementation of rand and srand; on some systems, the generated numbers are not random enough (especially on Windows: see http://www.perlmonks.org/?node_id=803632). Several CPAN modules in the Math namespace implement better pseudorandom generators; see for example Math::Random::MT ("Mersenne Twister", fast), or Math::TrulyRandom (which uses the imperfections in the system's timer to generate random numbers, which is rather slow). More algorithms for random numbers are described in "Numerical Recipes in C" at http://www.nr.com/
To get a random number between two values, you can use the rand()
built-in to get a random number between 0 and 1. From there, you shift
that into the range that you want.
rand($x) returns a number such that 0 <= rand($x) < $x. Thus what you want to have perl figure out is a random number in the range from 0 to the difference between your X and Y.
That is, to get a number between 10 and 15, inclusive, you want a random number between 0 and 5 that you can then add to 10.
Hence you derive the following simple function to abstract that. It selects a random integer between the two given integers (inclusive). For example: random_int_between(50,120).
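One way such a function might be written (a sketch; the bound-swapping is an extra convenience, and the arguments are assumed to be integers):

```perl
sub random_int_between {
    my ( $low, $high ) = @_;
    ( $low, $high ) = ( $high, $low ) if $low > $high;   # be forgiving
    # rand(N) is in [0, N), so int() of it covers $low .. $high inclusive:
    return $low + int( rand( $high - $low + 1 ) );
}

my $n = random_int_between( 50, 120 );   # 50 <= $n <= 120
```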
The day of the year is in the list returned
by the localtime function. Without an
argument localtime uses the current time.
The POSIX module can also format a date as the day of the year or week of the year.
To get the day of year for any date, use POSIX's mktime to get a time in epoch seconds for the argument to localtime.
You can also use Time::Piece, which comes with Perl and provides a
localtime that returns an object:
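The three approaches side by side might look like this (a sketch; note that the list and Time::Piece values are 0-based while strftime's %j is 1-based):

```perl
# Element 7 of localtime's return list is the day of year, counted from 0:
my $yday = ( localtime )[7];

# POSIX strftime's %j gives a 1-based, zero-padded day of year:
use POSIX qw(strftime);
my $doy = strftime "%j", localtime;

# Time::Piece (core since 5.10) overrides localtime with an object
# that has a yday method (also 0-based):
use Time::Piece;
my $yday_too = localtime->yday;
```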
The Date::Calc module provides two functions to calculate these, too:
Use the following simple functions:
On some systems, the POSIX module's strftime() function has been extended in a non-standard way to use a %C format, which they sometimes claim is the "century". It isn't, because on most such systems this is only the first two digits of the four-digit year, and thus cannot be used to reliably determine the current century or millennium.
(contributed by brian d foy)
You could just store all your dates as a number and then subtract. Life isn't always that simple though.
The Time::Piece module, which comes with Perl, replaces localtime with a version that returns an object. It also overloads the comparison operators so you can compare them directly:
You can also get differences with a subtraction, which returns a Time::Seconds object:
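A small sketch of both the overloaded comparison and the subtraction (the dates here are arbitrary examples, parsed with strptime so the snippet doesn't depend on the current time):

```perl
use Time::Piece;
use Time::Seconds;

my $then = Time::Piece->strptime( "2024-03-01", "%Y-%m-%d" );
my $now  = Time::Piece->strptime( "2024-03-08", "%Y-%m-%d" );

print "later\n" if $now > $then;    # overloaded comparison operators

my $diff = $now - $then;            # a Time::Seconds object
print $diff->days, " days\n";       # 7 days
```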
If you want to work with formatted dates, the Date::Manip, Date::Calc, or DateTime modules can help you.
If it's a regular enough string that it always has the same format,
you can split it up and pass the parts to timelocal
in the standard
Time::Local module. Otherwise, you should look into the Date::Calc,
Date::Parse, and Date::Manip modules from CPAN.
(contributed by brian d foy and Dave Cross)
You can use the Time::Piece module, part of the Standard Library, which can convert a date/time to a Julian Day:
- $ perl -MTime::Piece -le 'print localtime->julian_day'
- 2455607.7959375
Or the modified Julian Day:
- $ perl -MTime::Piece -le 'print localtime->mjd'
- 55607.2961226851
Or even the day of the year (which is what some people think of as a Julian day):
- $ perl -MTime::Piece -le 'print localtime->yday'
- 45
You can also do the same things with the DateTime module:
You can use the Time::JulianDay module available on CPAN. Ensure that you really want to find a Julian day, though, as many people have different ideas about Julian days (see http://www.hermetic.ch/cal_stud/jdn.htm for instance):
- $ perl -MTime::JulianDay -le 'print local_julian_day( time )'
- 55608
(contributed by brian d foy)
To do it correctly, you can use one of the Date modules, since they work with calendars instead of times. The DateTime module makes it simple, and gives you the same time of day, only the day before, despite daylight saving time changes:
You can also use the Date::Calc module, using its Today_and_Now function.
Most people try to use the time rather than the calendar to figure out dates, but that assumes that days are twenty-four hours each. For most people, there are two days a year when they aren't: the switch to and from summer time throws this off. For example, the rest of the suggestions will be wrong sometimes:
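The naive arithmetic this paragraph warns about looks like this (a sketch; it is off by an hour, twice a year, in time zones that observe DST):

```perl
# Wrong on DST-change days: a calendar day is not always 24 hours.
my $yesterday = time() - 24 * 60 * 60;
print scalar localtime($yesterday), "\n";
```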
Starting with Perl 5.10, Time::Piece and Time::Seconds are part of the standard distribution, so you might think that you could do something like this:
The Time::Piece module exports a new localtime that returns an object, and Time::Seconds exports the ONE_DAY constant, which is a set number of seconds. This means that it always gives the time 24 hours ago, which is not always yesterday. This can cause problems around the end of daylight saving time, when there's one day that is 25 hours long.
You have the same problem with Time::Local, which will give the wrong answer for those same special cases:
(contributed by brian d foy)
Perl itself never had a Y2K problem, although that never stopped people
from creating Y2K problems on their own. See the documentation for
localtime for its proper use.
Starting with Perl 5.12, localtime and gmtime can handle dates past 03:14:08 January 19, 2038, when a 32-bit based time would overflow. You still might get a warning on a 32-bit perl:
- % perl5.12 -E 'say scalar localtime( 0x9FFF_FFFFFFFF )'
- Integer overflow in hexadecimal number at -e line 1.
- Wed Nov 1 19:42:39 5576711
On a 64-bit perl, you can get even larger dates for those really long running projects:
- % perl5.12 -E 'say scalar gmtime( 0x9FFF_FFFFFFFF )'
- Thu Nov 2 00:42:39 5576711
You're still out of luck if you need to keep track of decaying protons though.
(contributed by brian d foy)
There are many ways to ensure that values are what you expect or want to accept. Besides the specific examples that we cover in the perlfaq, you can also look at the modules with "Assert" and "Validate" in their names, along with other modules such as Regexp::Common.
Some modules have validation for particular types of input, such as Business::ISBN, Business::CreditCard, Email::Valid, and Data::Validate::IP.
It depends just what you mean by "escape". URL escapes are dealt with in perlfaq9. Shell escapes with the backslash (\) character are removed with
- s/\\(.)/$1/g;
This won't expand "\n" or "\t" or any other special escapes.
(contributed by brian d foy)
You can use the substitution operator to find pairs of characters (or runs of characters) and replace them with a single instance. In this substitution, we find a character in (.). The memory parentheses store the matched character in the back-reference \g1, and we use that to require that the same thing immediately follow it. We replace that part of the string with the character in $1.
- s/(.)\g1/$1/g;
We can also use the transliteration operator, tr///. In this example, the search list side of our tr/// contains nothing, but the c option complements that so it contains everything. The replacement list also contains nothing, so the transliteration is almost a no-op since it won't do any replacements (or more exactly, it replaces each character with itself). However, the s option squashes duplicated and consecutive characters in the string, so a character does not show up next to itself.
- my $str = 'Haarlem'; # in the Netherlands
- $str =~ tr///cs; # Now Harlem, like in New York
(contributed by brian d foy)
This is documented in perlref, and although it's not the easiest thing to read, it does work. In each of these examples, we call the function inside the braces used to dereference a reference. If we have more than one return value, we can construct and dereference an anonymous array. In this case, we call the function in list context.
- print "The time values are @{ [localtime] }.\n";
If we want to call the function in scalar context, we have to do a bit
more work. We can really have any code we like inside the braces, so
we simply have to end with the scalar reference, although how you do
that is up to you, and you can use code inside the braces. Note that
the use of parens creates a list context, so we need scalar to
force the scalar context on the function:
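Two ways that might be written (a sketch; the first dereferences an inline scalar reference, the second uses the anonymous-array trick with scalar() applied first):

```perl
# A scalar reference, dereferenced inline; scalar() forces scalar context:
print "The time is ${\ scalar localtime }.\n";

# The anonymous-array trick also works, with scalar() inside the brackets:
print "The time is @{ [ scalar localtime ] }.\n";
```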
If your function already returns a reference, you don't need to create the reference yourself.
The Interpolation module can also do a lot of magic for you. You can specify a variable name, in this case E, to set up a tied hash that does the interpolation for you. It has several other methods to do this as well.
In most cases, it is probably easier to simply use string concatenation, which also forces scalar context.
To find something between two single characters, a pattern like /x([^x]*)x/ will get the intervening bits in $1. For multiple ones, something more like /alpha(.*?)omega/ would be needed. For nested patterns and/or balanced expressions, see the so-called (?PARNO) construct (available since perl 5.10).
The CPAN module Regexp::Common can help to build such
regular expressions (see in particular
Regexp::Common::balanced and Regexp::Common::delimited).
More complex cases will require writing a parser, probably using a parsing module from CPAN, like Regexp::Grammars, Parse::RecDescent, Parse::Yapp, Text::Balanced, or Marpa::XS.
Use reverse() in scalar context, as documented in
reverse.
You can do it yourself:
- 1 while $string =~ s/\t+/' ' x (length($&) * 8 - length($`) % 8)/e;
Or you can just use the Text::Tabs module (part of the standard Perl distribution).
Use Text::Wrap (part of the standard Perl distribution):
The paragraphs you give to Text::Wrap should not contain embedded newlines. Text::Wrap doesn't justify the lines (flush-right).
Or use the CPAN module Text::Autoformat. Formatting files can be easily done by making a shell alias, like so:
- alias fmt="perl -i -MText::Autoformat -n0777 \
- -e 'print autoformat $_, {all=>1}' $*"
See the documentation for Text::Autoformat to appreciate its many capabilities.
You can access the first characters of a string with substr(). To get the first character, for example, start at position 0 and grab the string of length 1.
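A minimal sketch of that:

```perl
my $string = "Just another Perl hacker";
my $first  = substr( $string, 0, 1 );   # "J"
my $first4 = substr( $string, 0, 4 );   # "Just"
```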
To change part of a string, you can use the optional fourth argument which is the replacement string.
- substr( $string, 13, 4, "Perl 5.8.0" );
You can also use substr() as an lvalue.
- substr( $string, 13, 4 ) = "Perl 5.8.0";
You have to keep track of N yourself. For example, let's say you want to change the fifth occurrence of "whoever" or "whomever" into "whosoever" or "whomsoever", case insensitively. These examples all assume that $_ contains the string to be altered.
- $count = 0;
- s{((whom?)ever)}{
- ++$count == 5 # is it the 5th?
- ? "${2}soever" # yes, swap
- : $1 # renege and leave it there
- }ige;
In the more general case, you can use the /g modifier in a while
loop, keeping count of matches.
- $WANT = 3;
- $count = 0;
- $_ = "One fish two fish red fish blue fish";
- while (/(\w+)\s+fish\b/gi) {
- if (++$count == $WANT) {
- print "The third fish is a $1 one.\n";
- }
- }
That prints out: "The third fish is a red one."
You can also use a
repetition count and repeated pattern like this:
- /(?:\w+\s+fish\s+){2}(\w+)\s+fish/i;
There are a number of ways, with varying efficiency. If you want a
count of a certain single character (X) within a string, you can use the
tr/// function like so:
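A minimal sketch, counting the letter "X" in an invented sample string:

```perl
my $string = 'ExXtra Xs in this teXt';
my $count  = ( $string =~ tr/X// );   # tr/// in scalar context returns the count
print "$count\n";                     # 3
```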
This is fine if you are just looking for a single character. However,
if you are trying to count multiple character substrings within a
larger string, tr/// won't work. What you can do is wrap a while()
loop around a global pattern match. For example, let's count negative
integers:
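A sketch of that loop (the sample data is invented):

```perl
my $string = 'temps were -4, 12, -7, and -15 today';
my $count  = 0;
$count++ while $string =~ /-\d+/g;   # each /g match advances the search position
print "$count\n";                    # 3
```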
Another version uses a global match in list context, then assigns the result to a scalar, producing a count of the number of matches.
- my $count = () = $string =~ /-\d+/g;
(contributed by brian d foy)
Damian Conway's Text::Autoformat handles all of the thinking for you.
How do you want to capitalize those words?
- FRED AND BARNEY'S LODGE # all uppercase
- Fred And Barney's Lodge # title case
- Fred and Barney's Lodge # highlight case
It's not as easy a problem as it looks. How many words do you think
are in there? Wait for it... wait for it.... If you answered 5
you're right. Perl words are groups of \w+
, but that's not what
you want to capitalize. How is Perl supposed to know not to capitalize
that s after the apostrophe? You could try a regular expression:
- $string =~ s/ (
- (^\w) #at the beginning of the line
- | # or
- (\s\w) #preceded by whitespace
- )
- /\U$1/xg;
- $string =~ s/([\w']+)/\u\L$1/g;
Now, what if you don't want to capitalize that "and"? Just use Text::Autoformat and get on with the next problem. :)
Several modules can handle this sort of parsing--Text::Balanced, Text::CSV, Text::CSV_XS, and Text::ParseWords, among others.
Take the example case of trying to split a string that is
comma-separated into its different fields. You can't use split(/,/)
because you shouldn't split if the comma is inside quotes. For
example, take a data line like this:
- SAR001,"","Cimetrix, Inc","Bob Smith","CAM",N,8,1,0,7,"Error, Core Dumped"
Due to the restriction of the quotes, this is a fairly complex
problem. Thankfully, we have Jeffrey Friedl, author of
Mastering Regular Expressions, to handle these for us. He
suggests (assuming your string is contained in $text
):
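His pattern, roughly as it appears in Mastering Regular Expressions (reproduced from memory, so treat it as a sketch), collects each field into @new via $+, the last matched capture:

```perl
my $text = 'SAR001,"","Cimetrix, Inc","Bob Smith","CAM",N,8,1,0,7,"Error, Core Dumped"';
my @new  = ();
push( @new, $+ ) while $text =~ m{
    "([^\"\\]*(?:\\.[^\"\\]*)*)",?   # a quoted field, honoring \" escapes
  | ([^,]+),?                        # an unquoted field
  | ,                                # an empty field
}gx;
push( @new, undef ) if substr( $text, -1, 1 ) eq ',';
```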
If you want to represent quotation marks inside a
quotation-mark-delimited field, escape them with backslashes (eg,
"like \"this\"").
Alternatively, the Text::ParseWords module (part of the standard Perl distribution) lets you say:
- use Text::ParseWords;
- @new = quotewords(",", 0, $text);
For parsing or generating CSV, though, using Text::CSV rather than implementing it yourself is highly recommended; you'll save yourself odd bugs popping up later by just using code which has already been tried and tested in production for years.
(contributed by brian d foy)
A substitution can do this for you. For a single line, you want to replace all the leading or trailing whitespace with nothing. You can do that with a pair of substitutions:
- s/^\s+//;
- s/\s+$//;
You can also write that as a single substitution, although it turns out the combined statement is slower than the separate ones. That might not matter to you, though:
- s/^\s+|\s+$//g;
In this regular expression, the alternation matches either at the
beginning or the end of the string since the anchors have a lower
precedence than the alternation. With the /g flag, the substitution
makes all possible matches, so it gets both. Remember, the trailing
newline matches the \s+, and the $
anchor can match to the
absolute end of the string, so the newline disappears too. Just add
the newline to the output, which has the added benefit of preserving
"blank" (consisting entirely of whitespace) lines which the ^\s+
would remove all by itself:
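For example, trimming each input line and printing the newline back out:

```perl
while ( <> ) {
    s/^\s+|\s+$//g;   # also eats the trailing newline...
    print "$_\n";     # ...so add it back; all-whitespace lines stay blank
}
```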
For a multi-line string, you can apply the regular expression to each
logical line in the string by adding the /m flag (for
"multi-line"). With the /m flag, the $
matches before an
embedded newline, so it doesn't remove it. This pattern still removes
the newline at the end of the string:
- $string =~ s/^\s+|\s+$//gm;
Remember that lines consisting entirely of whitespace will disappear, since the first part of the alternation can match the entire string and replace it with nothing. If you need to keep embedded blank lines, you have to do a little more work. Instead of matching any whitespace (since that includes a newline), just match the other whitespace:
- $string =~ s/^[\t\f ]+|[\t\f ]+$//mg;
In the following examples, $pad_len
is the length to which you wish
to pad the string, $text
or $num
contains the string to be padded,
and $pad_char
contains the padding character. You can use a single
character string constant instead of the $pad_char
variable if you
know what it is in advance. And in the same way you can use an integer in
place of $pad_len
if you know the pad length in advance.
The simplest method uses the sprintf function. It can pad on the left
or right with blanks and on the left with zeroes and it will not
truncate the result. The pack function can only pad strings on the
right with blanks and it will truncate the result to a maximum length of
$pad_len
.
- # Left padding a string with blanks (no truncation):
- my $padded = sprintf("%${pad_len}s", $text);
- my $padded = sprintf("%*s", $pad_len, $text); # same thing
- # Right padding a string with blanks (no truncation):
- my $padded = sprintf("%-${pad_len}s", $text);
- my $padded = sprintf("%-*s", $pad_len, $text); # same thing
- # Left padding a number with 0 (no truncation):
- my $padded = sprintf("%0${pad_len}d", $num);
- my $padded = sprintf("%0*d", $pad_len, $num); # same thing
- # Right padding a string with blanks using pack (will truncate):
- my $padded = pack("A$pad_len",$text);
If you need to pad with a character other than blank or zero you can use
one of the following methods. They all generate a pad string with the
x
operator and combine that with $text
. These methods do
not truncate $text
.
Left and right padding with any character, creating a new string:
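A sketch, with sample values filled in for $text, $pad_len, and $pad_char:

```perl
my ( $text, $pad_len, $pad_char ) = ( 'abc', 6, '*' );
# Left padding, creating a new string:
my $left  = $pad_char x ( $pad_len - length $text ) . $text;   # '***abc'
# Right padding, creating a new string:
my $right = $text . $pad_char x ( $pad_len - length $text );   # 'abc***'
```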
Left and right padding with any character, modifying $text
directly:
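The in-place version, again with invented sample values:

```perl
my ( $text, $pad_len, $pad_char ) = ( 'abc', 6, '*' );
# Left padding in place:
$text = $pad_char x ( $pad_len - length $text ) . $text;
# Right padding in place would instead be:
#   $text .= $pad_char x ( $pad_len - length $text );
print "$text\n";   # ***abc
```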
(contributed by brian d foy)
If you know the columns that contain the data, you can
use substr to extract a single column.
You can use split if the columns are separated by whitespace or
some other delimiter, as long as whitespace or the delimiter cannot
appear as part of the data.
If you want to work with comma-separated values, don't do this since that format is a bit more complicated. Use one of the modules that handle that format, such as Text::CSV, Text::CSV_XS, or Text::CSV_PP.
If you want to break apart an entire line of fixed columns, you can use
unpack with the A (ASCII) format. By using a number after the format
specifier, you can denote the column width. See the pack and unpack
entries in perlfunc for more details.
Note that spaces in the format argument to unpack do not denote literal
spaces. If you have space separated data, you may want split instead.
(contributed by brian d foy)
You can use the Text::Soundex
module. If you want to do fuzzy or close
matching, you might also try the String::Approx, and
Text::Metaphone, and Text::DoubleMetaphone modules.
(contributed by brian d foy)
If you can avoid it, don't, or if you can use a templating system,
such as Text::Template or Template Toolkit, do that instead. You
might even be able to get the job done with sprintf or printf:
However, for the one-off simple case where I don't want to pull out a
full templating system, I'll use a string that has two Perl scalar
variables in it. In this example, I want to expand $foo
and $bar
to their variable's values:
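For instance (the values are invented for the example; note the single quotes, which prevent interpolation for now):

```perl
my( $foo, $bar ) = ( 'Fred', 'Barney' );
my $string = 'Say hello to $foo and $bar';
```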
One way I can do this involves the substitution operator and a double
/e flag. The first /e evaluates $1
on the replacement side and
turns it into $foo
. The second /e starts with $foo
and replaces
it with its value. $foo
, then, turns into 'Fred', and that's finally
what's left in the string:
- $string =~ s/(\$\w+)/$1/eeg; # 'Say hello to Fred and Barney'
The /e will also silently ignore violations of strict, replacing
undefined variable names with the empty string. Since I'm using the
/e flag (twice even!), I have all of the same security problems I
have with eval in its string form. If there's something odd in
$foo
, perhaps something like @{[ system "rm -rf /" ]}
, then
I could get myself in trouble.
To get around the security problem, I could also pull the values from
a hash instead of evaluating variable names. Using a single /e, I
can check the hash to ensure the value exists, and if it doesn't, I
can replace the missing value with a marker, in this case ??? to
signal that I missed something:
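A sketch of that safer, single-/e version (the hash contents and variable names here are my own):

```perl
my %Replacements = ( foo => 'Fred', bar => 'Barney' );
my $string = 'Say hello to $foo and $baz';
$string =~ s/\$(\w+)/
    exists $Replacements{$1} ? $Replacements{$1} : '???'
/eg;
print "$string\n";   # Say hello to Fred and ???
```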
The problem is that those double-quotes force stringification--coercing numbers and references into strings--even when you don't want them to be strings. Think of it this way: double-quote expansion is used to produce new strings. If you already have a string, why do you need more?
If you get used to writing odd things like these:
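For instance (the variables are invented):

```perl
my ( $var, $old ) = ( 42, 'x' );
print "$var";           # BAD: needless interpolation of a lone variable
my $new = "$old";       # BAD: same thing in an assignment
print length("$var");   # BAD: the function gets a fresh string copy
```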
You'll be in trouble. Those should (in 99.8% of the cases) be the simpler and more direct:
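That is (same invented variables):

```perl
my ( $var, $old ) = ( 42, 'x' );
print $var;            # no quotes needed
my $new = $old;
print length($var);
```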
Otherwise, besides slowing you down, you're going to break code when the thing in the scalar is actually neither a string nor a number, but a reference:
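A sketch of the breakage: once a reference has been interpolated, you have only its string form, and the real reference is gone:

```perl
my @coins = ( 'quarter', 'dime' );
my $aref  = \@coins;
my $copy  = "$aref";                     # stringification destroys the reference
print ref($aref) || 'not a ref', "\n";   # ARRAY
print ref($copy) || 'not a ref', "\n";   # not a ref
```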
You can also get into subtle problems on those few operations in Perl
that actually do care about the difference between a string and a
number, such as the magical ++
autoincrement operator or the
syscall() function.
Stringification also destroys arrays.
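Interpolating an array joins its elements with spaces (the $" variable), which is rarely what you want for lines:

```perl
my @lines = ( "first\n", "second\n" );
print "@lines";   # WRONG: a stray space appears before "second"
print @lines;     # right: prints the lines exactly as stored
```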
Here documents are found in perlop. Check for these three things:
- 1. There must be no space after the << part.
- 2. There (probably) should be a semicolon at the end of the opening line.
- 3. You can't (easily) have any space in front of the terminating tag.
If you want to indent the text in the here document, you can do this:
- # all in one
- (my $VAR = <<HERE_TARGET) =~ s/^\s+//gm;
- your text
- goes here
- HERE_TARGET
But the HERE_TARGET must still be flush against the margin. If you want that indented also, you'll have to quote in the indentation.
- (my $quote = <<' FINIS') =~ s/^\s+//gm;
- ...we will have peace, when you and all your works have
- perished--and the works of your dark master to whom you
- would deliver us. You are a liar, Saruman, and a corrupter
- of men's hearts. --Theoden in /usr/src/perl/taint.c
- FINIS
- $quote =~ s/\s+--/\n--/;
A nice general-purpose fixer-upper function for indented here documents follows. It expects to be called with a here document as its argument. It looks to see whether each line begins with a common substring, and if so, strips that substring off. Otherwise, it takes the amount of leading whitespace found on the first line and removes that much off each subsequent line.
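One way to write that fix() function (adapted from the classic answer; treat the detection regex as a sketch):

```perl
sub fix {
    local $_ = shift;
    my ( $white, $leader );   # common whitespace and common leading string
    if ( /^\s*(?:([^\w\s]+)(\s*).*\n)(?:\s*\1\2?.*\n)+$/ ) {
        # every line starts with the same special string, e.g. "@@@ "
        ( $white, $leader ) = ( $2, quotemeta $1 );
    }
    else {
        # fall back to the first line's leading whitespace
        ( $white, $leader ) = ( /^(\s+)/, '' );
    }
    s/^\s*?$leader(?:$white)?//gm;
    return $_;
}
```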
This works with leading special strings, dynamically determined:
- my $remember_the_main = fix<<' MAIN_INTERPRETER_LOOP';
- @@@ int
- @@@ runops() {
- @@@ SAVEI32(runlevel);
- @@@ runlevel++;
- @@@ while ( op = (*op->op_ppaddr)() );
- @@@ TAINT_NOT;
- @@@ return 0;
- @@@ }
- MAIN_INTERPRETER_LOOP
Or with a fixed amount of leading whitespace, with remaining indentation correctly preserved:
- my $poem = fix<<EVER_ON_AND_ON;
- Now far ahead the Road has gone,
- And I must follow, if I can,
- Pursuing it with eager feet,
- Until it joins some larger way
- Where many paths and errands meet.
- And whither then? I cannot say.
- --Bilbo in /usr/src/perl/pp_ctl.c
- EVER_ON_AND_ON
(contributed by brian d foy)
A list is a fixed collection of scalars. An array is a variable that holds a variable collection of scalars. An array can supply its collection for list operations, so list operations also work on arrays:
- # slices
- ( 'dog', 'cat', 'bird' )[2,3];
- @animals[2,3];
- # iteration
- foreach ( qw( dog cat bird ) ) { ... }
- foreach ( @animals ) { ... }
- my @three = grep { length == 3 } qw( dog cat bird );
- my @three = grep { length == 3 } @animals;
- # supply an argument list
- wash_animals( qw( dog cat bird ) );
- wash_animals( @animals );
Array operations, which change the scalars, rearrange them, or add
or subtract some scalars, only work on arrays. These can't work on a
list, which is fixed. Array operations include shift, unshift,
push, pop, and splice.
An array can also change its length:
- $#animals = 1; # truncate to two elements
- $#animals = 10000; # pre-extend to 10,001 elements
You can change an array element, but you can't change a list element:
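For example:

```perl
my @animals = qw( dog cat bird );
$animals[0] = 'Rottweiler';              # fine: array elements are lvalues
# qw( dog cat bird )[0] = 'Rottweiler';  # compile error: a list is not an lvalue
```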
However, if the list element is itself a variable, it appears that you can change a list element. However, the list element is the variable, not the data. You're not changing the list element, but something the list element refers to. The list element itself doesn't change: it's still the same variable.
You also have to be careful about context. You can assign an array to a scalar to get the number of elements in the array. This only works for arrays, though:
- my $count = @animals; # only works with arrays
If you try to do the same thing with what you think is a list, you get a quite different result. Although it looks like you have a list on the righthand side, Perl actually sees a bunch of scalars separated by a comma:
- my $scalar = ( 'dog', 'cat', 'bird' ); # $scalar gets bird
Since you're assigning to a scalar, the righthand side is in scalar
context. The comma operator (yes, it's an operator!) in scalar
context evaluates its lefthand side, throws away the result, and
evaluates its righthand side and returns the result. In effect,
that list-lookalike assigns to $scalar
its rightmost value. Many
people mess this up because they choose a list-lookalike whose
last element is also the count they expect:
- my $scalar = ( 1, 2, 3 ); # $scalar gets 3, accidentally
(contributed by brian d foy)
The difference is the sigil, that special character in front of the
array name. The $
sigil means "exactly one item", while the @
sigil means "zero or more items". The $
gets you a single scalar,
while the @
gets you a list.
The confusion arises because people incorrectly assume that the sigil denotes the variable type.
The $array[1]
is a single-element access to the array. It's going
to return the item in index 1 (or undef if there is no item there).
If you intend to get exactly one element from the array, this is the
form you should use.
The @array[1]
is an array slice, although it has only one index.
You can pull out multiple elements simultaneously by specifying
additional indices as a list, like @array[1,4,3,0]
.
Using a slice on the lefthand side of the assignment supplies list context to the righthand side. This can lead to unexpected results. For instance, if you want to read a single line from a filehandle, assigning to a scalar value is fine:
- $array[1] = <STDIN>;
However, in list context, the line input operator returns all of the
lines as a list. The first line goes into @array[1]
and the rest
of the lines mysteriously disappear:
- @array[1] = <STDIN>; # most likely not what you want
Either the use warnings
pragma or the -w flag will warn you when
you use an array slice with a single index.
(contributed by brian d foy)
Use a hash. When you think the words "unique" or "duplicated", think "hash keys".
If you don't care about the order of the elements, you could just
create the hash then extract the keys. It's not important how you
create that hash: just that you use keys to get the unique
elements.
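A common sketch: use the elements as hash keys, then take the keys back out (order is not preserved):

```perl
my @list   = qw( a b a c b );
my %unique = map { $_ => 1 } @list;   # duplicate keys collapse into one
my @unique = keys %unique;            # a, b, c in some order
```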
If you want to use a module, try the uniq
function from
List::MoreUtils. In list context it returns the unique elements,
preserving their order in the list. In scalar context, it returns the
number of unique elements.
You can also go through each element and skip the ones you've seen
before. Use a hash to keep track. The first time the loop sees an
element, that element has no key in %Seen
. The next statement
creates the key and immediately uses its value, which is undef, so
the loop continues to the push and increments the value for that
key. The next time the loop sees that same element, its key exists in
the hash and the value for that key is true (since it's not 0 or
undef), so the next skips that iteration and the loop goes to the
next element.
You can write this more briefly using a grep, which does the same thing.
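The grep version, which also preserves the original order:

```perl
my @list   = qw( a b a c b );
my %seen;
my @unique = grep { ! $seen{$_}++ } @list;   # first sighting passes, repeats don't
print "@unique\n";                           # a b c
```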
(portions of this answer contributed by Anno Siegel and brian d foy)
Hearing the word "in" is an indication that you probably should have used a hash, not a list or array, to store your data. Hashes are designed to answer this question quickly and efficiently. Arrays aren't.
That being said, there are several ways to approach this. In Perl 5.10 and later, you can use the smart match operator to check that an item is contained in an array or a hash:
With earlier versions of Perl, you have to do a bit more work. If you are going to make this query many times over arbitrary string values, the fastest way is probably to invert the original array and maintain a hash whose keys are the first array's values:
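For instance, with an invented list of colors:

```perl
my @blues   = qw( azure cerulean teal );
my %is_blue = ();
$is_blue{$_} = 1 for @blues;           # invert: values become lookup keys
print "yes\n" if $is_blue{'teal'};
```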
Now you can check whether $is_blue{$some_color}
. It might have
been a good idea to keep the blues all in a hash in the first place.
If the values are all small integers, you could use a simple indexed array. This kind of an array will take up less space:
Now you check whether $is_tiny_prime[$some_number].
If the values in question are integers instead of strings, you can save quite a lot of space by using bit strings instead:
Now check whether vec($read,$n,1) is true for some $n
.
These methods guarantee fast individual tests but require a re-organization of the original list or array. They only pay off if you have to test multiple values against the same array.
If you are testing only once, the standard module List::Util exports
the function first
for this purpose. It works by stopping once it
finds the element. It's written in C for speed, and its Perl equivalent
looks like this subroutine:
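Roughly, that equivalent is:

```perl
sub first (&@) {
    my $code = shift;
    foreach (@_) {
        return $_ if &{$code}();   # stop at the first element that matches
    }
    undef;
}
my $first_big = first { $_ > 10 } 2, 9, 11, 40;   # 11
```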
If speed is of little concern, the common idiom uses grep in scalar context (which returns the number of items that passed its condition) to traverse the entire list. This does have the benefit of telling you how many matches it found, though.
If you want to actually extract the matching elements, simply use grep in list context.
Use a hash. Here's code to do both and more. It assumes that each element is unique in a given array:
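One version of that code (a sketch with invented sample arrays):

```perl
my @array1 = ( 1, 2, 3 );
my @array2 = ( 2, 3, 4 );
my ( @union, @intersection, @difference );
my %count;
$count{$_}++ foreach @array1, @array2;   # works because elements are unique per array
foreach my $element ( keys %count ) {
    push @union, $element;
    push @{ $count{$element} > 1 ? \@intersection : \@difference }, $element;
}
```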
Note that this is the symmetric difference, that is, all elements in either A or in B but not in both. Think of it as an xor operation.
With Perl 5.10 and later, the smart match operator can give you the answer with the least amount of work:
The following code works for single-level arrays. It uses a stringwise comparison, and does not distinguish defined versus undefined empty strings. Modify if you have other needs.
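One such stringwise comparison, as a sketch:

```perl
sub compare_arrays {
    my ( $first, $second ) = @_;
    return 0 unless @$first == @$second;   # different lengths can't be equal
    for my $i ( 0 .. $#$first ) {
        return 0 if $first->[$i] ne $second->[$i];
    }
    return 1;
}
```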
For multilevel structures, you may wish to use an approach more like this one. It uses the CPAN module FreezeThaw:
This approach also works for comparing hashes. Here we'll demonstrate two different answers:
- use FreezeThaw qw(cmpStr cmpStrHard);
- my %a = my %b = ( "this" => "that", "extra" => [ "more", "stuff" ] );
- $a{EXTRA} = \%b;
- $b{EXTRA} = \%a;
- printf "a and b contain %s hashes\n",
- cmpStr(\%a, \%b) == 0 ? "the same" : "different";
- printf "a and b contain %s hashes\n",
- cmpStrHard(\%a, \%b) == 0 ? "the same" : "different";
The first reports that both of those hashes contain the same data, while the second reports that they do not. Which you prefer is left as an exercise to the reader.
To find the first array element which satisfies a condition, you can
use the first()
function in the List::Util module, which comes
with Perl 5.8. This example finds the first element that contains
"Perl".
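That is (the sample array is invented):

```perl
use List::Util qw(first);
my @array = ( 'nothing here', 'Perl rocks', 'more stuff' );
my $match = first { /Perl/ } @array;   # 'Perl rocks'
```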
If you cannot use List::Util, you can make your own loop to do the same thing. Once you find the element, you stop the loop with last.
If you want the array index, use the firstidx()
function from
List::MoreUtils
:
Or write it yourself, iterating through the indices and checking the array element at each index until you find one that satisfies the condition:
(contributed by brian d foy)
Perl's arrays do not have a fixed size, so you don't need linked lists
if you just want to add or remove items. You can use array operations
such as push, pop, shift, unshift, or splice to do
that.
Sometimes, however, linked lists can be useful in situations where you want to "shard" an array so you have many small arrays instead of a single big array. You can keep arrays longer than Perl's largest array index, lock smaller arrays separately in threaded programs, reallocate less memory, or quickly insert elements in the middle of the chain.
Steve Lembark goes through the details in his YAPC::NA 2009 talk "Perly Linked Lists" ( http://www.slideshare.net/lembark/perly-linked-lists ), although you can just use his LinkedList::Single module.
(contributed by brian d foy)
If you want to cycle through an array endlessly, you can increment the index modulo the number of elements in the array:
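A sketch of the modulo trick (the stopping point is arbitrary, just to end the example):

```perl
my @array = qw( a b c );
my $i = 0;
for ( 1 .. 7 ) {
    print $array[ $i++ % @array ];   # wraps: a b c a b c a
}
print "\n";
```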
You can also use Tie::Cycle to use a scalar that always has the next element of the circular array:
The Array::Iterator::Circular creates an iterator object for circular arrays:
If you either have Perl 5.8.0 or later installed, or if you have Scalar-List-Utils 1.03 or later installed, you can say:
- use List::Util 'shuffle';
- @shuffled = shuffle(@list);
If not, you can use a Fisher-Yates shuffle.
- sub fisher_yates_shuffle {
- my $deck = shift; # $deck is a reference to an array
- return unless @$deck; # must not be empty!
- my $i = @$deck;
- while (--$i) {
- my $j = int rand ($i+1);
- @$deck[$i,$j] = @$deck[$j,$i];
- }
- }
- # shuffle my mpeg collection
- #
- my @mpeg = <audio/*/*.mp3>;
- fisher_yates_shuffle( \@mpeg ); # randomize @mpeg in place
- print @mpeg;
Note that the above implementation shuffles an array in place,
unlike the List::Util::shuffle()
which takes a list and returns
a new shuffled list.
You've probably seen shuffling algorithms that work using splice, randomly picking another element to swap the current element with.
This is bad because splice is already O(N), and since you do it N times, you just invented a quadratic algorithm; that is, O(N**2). This does not scale, although Perl is so efficient that you probably won't notice this until you have rather largish arrays.
Use for
/foreach
:
- for (@lines) {
- s/foo/bar/; # change that word
- tr/XZ/ZX/; # swap those letters
- }
Here's another; let's compute spherical volumes:
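Because the foreach variable aliases each element, modifying it modifies the array:

```perl
my @radii = ( 1, 2, 3 );
foreach my $radius ( @radii ) {
    $radius **= 3;                    # cube the radius...
    $radius *= ( 4 / 3 ) * 3.14159;   # ...times 4/3 pi: @radii now holds volumes
}
```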
which can also be done with map() which is made to transform
one list into another:
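The map version leaves @radii alone and builds a new list:

```perl
my @radii   = ( 1, 2, 3 );
my @volumes = map { ( $_ ** 3 ) * ( 4 / 3 ) * 3.14159 } @radii;
```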
If you want to do the same thing to modify the values of the
hash, you can use the values function. As of Perl 5.6
the values are not copied, so if you modify $orbit (in this
case), you modify the value.
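A sketch (the hash contents are invented):

```perl
my %orbits = ( mercury => 1, venus => 2 );
for my $orbit ( values %orbits ) {
    # the loop variable aliases the hash value, so %orbits itself changes
    ( $orbit **= 3 ) *= ( 4 / 3 ) * 3.14159;
}
```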
Prior to perl 5.6 values returned copies of the values,
so older perl code often contains constructions such as
@orbits{keys %orbits}
instead of values %orbits
where
the hash is to be modified.
Use the rand() function (see rand):
Or, simply:
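Both forms in one sketch:

```perl
my @array   = qw( red green blue );
my $index   = rand @array;            # 0 up to (not including) scalar @array
my $element = $array[$index];
# Or, in one step:
my $pick    = $array[ rand @array ];
```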
Use the List::Permutor module on CPAN. If the list is actually an array, try the Algorithm::Permute module (also on CPAN). It's written in XS code and is very efficient:
For even faster execution, you could do:
Here's a little program that generates all permutations of all the
words on each line of input. The algorithm embodied in the
permute()
function is discussed in Volume 4 (still unpublished) of
Knuth's The Art of Computer Programming and will work on any list:
- #!/usr/bin/perl -n
- # Fischer-Krause ordered permutation generator
- sub permute (&@) {
- my $code = shift;
- my @idx = 0..$#_;
- while ( $code->(@_[@idx]) ) {
- my $p = $#idx;
- --$p while $idx[$p-1] > $idx[$p];
- my $q = $p or return;
- push @idx, reverse splice @idx, $p;
- ++$q while $idx[$p-1] > $idx[$q];
- @idx[$p-1,$q]=@idx[$q,$p-1];
- }
- }
- permute { print "@_\n" } split;
The Algorithm::Loops module also provides the NextPermute
and
NextPermuteNum
functions which efficiently find all unique permutations
of an array, even if it contains duplicate values, modifying it in-place:
If its elements are in reverse-sorted order, the array is reversed (making it sorted) and false is returned; otherwise the next permutation is produced.
NextPermute
uses string order and NextPermuteNum
numeric order, so
you can enumerate all the permutations of 0..9
like this:
Supply a comparison function to sort() (described in sort):
- @list = sort { $a <=> $b } @list;
The default sort function is cmp, string comparison, which would
sort (1, 2, 10)
into (1, 10, 2)
. <=>
, used above, is
the numerical comparison operator.
If you have a complicated function needed to pull out the part you want to sort on, then don't do it inside the sort function. Pull it out first, because the sort BLOCK can be called many times for the same element. Here's an example of how to pull out the first word after the first number on each item, and then sort those words case-insensitively.
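One way to precompute the keys (a sketch, with invented data):

```perl
my @data = ( "1 banana", "3 Apples", "2 Cherries" );
my @idx;
foreach ( @data ) {
    my ($item) = /\d+\s*(\S+)/;   # first word after the first number
    push @idx, uc $item;          # uppercase once, for case-insensitive compares
}
my @sorted = @data[ sort { $idx[$a] cmp $idx[$b] } 0 .. $#idx ];
```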
which could also be written this way, using a trick that's come to be known as the Schwartzian Transform:
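The Schwartzian Transform version of the same sort (same invented data):

```perl
my @data   = ( "1 banana", "3 Apples", "2 Cherries" );
my @sorted = map  { $_->[1] }                              # unwrap the element
             sort { $a->[0] cmp $b->[0] }                  # compare cached keys
             map  { [ uc( (/\d+\s*(\S+)/)[0] ), $_ ] }     # [key, element] pairs
             @data;
```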
If you need to sort on several fields, the following paradigm is useful.
This can be conveniently combined with precalculation of keys as given above.
See the sort article in the "Far More Than You Ever Wanted To Know" collection in http://www.cpan.org/misc/olddoc/FMTEYEWTK.tgz for more about this approach.
See also the question later in perlfaq4 on sorting hashes.
Use pack() and unpack(), or else vec() and the bitwise
operations.
For example, you don't have to store individual bits in an array
(which would mean that you're wasting a lot of space). To convert an
array of bits to a string, use vec() to set the right bits. This
sets $vec
to have bit N set only if $ints[N]
was set:
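A sketch, with a small invented flag array:

```perl
my @ints = ( 1, 0, 1, 1 );   # one flag per bit position
my $vec  = '';
foreach my $i ( 0 .. $#ints ) {
    vec( $vec, $i, 1 ) = 1 if $ints[$i];   # set bit $i of the bit string
}
```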
The string $vec
only takes up as many bits as it needs. For
instance, if you had 16 entries in @ints
, $vec
only needs two
bytes to store them (not counting the scalar variable overhead).
Here's how, given a vector in $vec
, you can get those bits into
your @ints
array:
- sub bitvec_to_list {
- my $vec = shift;
- my @ints;
- # Find null-byte density then select best algorithm
- if ($vec =~ tr/\0// / length $vec > 0.95) {
- use integer;
- my $i;
- # This method is faster with mostly null-bytes
- while($vec =~ /[^\0]/g ) {
- $i = -9 + 8 * pos $vec;
- push @ints, $i if vec($vec, ++$i, 1);
- push @ints, $i if vec($vec, ++$i, 1);
- push @ints, $i if vec($vec, ++$i, 1);
- push @ints, $i if vec($vec, ++$i, 1);
- push @ints, $i if vec($vec, ++$i, 1);
- push @ints, $i if vec($vec, ++$i, 1);
- push @ints, $i if vec($vec, ++$i, 1);
- push @ints, $i if vec($vec, ++$i, 1);
- }
- }
- else {
- # This method is a fast general algorithm
- use integer;
- my $bits = unpack "b*", $vec;
- push @ints, 0 if $bits =~ s/^(\d)// && $1;
- push @ints, pos $bits while($bits =~ /1/g);
- }
- return \@ints;
- }
This method gets faster the more sparse the bit vector is. (Courtesy of Tim Bunce and Winfried Koenig.)
You can make the while loop a lot shorter with this suggestion from Benjamin Goldberg:
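His version uses the @- and @+ match-offset arrays to skip runs of null bytes (reproduced from memory, so treat it as a sketch; the sample bits are invented):

```perl
my $vec = '';
vec( $vec, $_, 1 ) = 1 for 1, 9, 10;   # set a few sample bits
my @ints;
while ( $vec =~ /[^\0]+/g ) {
    # test only the bit range covered by the non-null bytes just matched
    push @ints, grep vec( $vec, $_, 1 ), $-[0] * 8 .. $+[0] * 8;
}
print "@ints\n";   # 1 9 10
```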
Or use the CPAN module Bit::Vector:
Bit::Vector provides efficient methods for bit vector, sets of small integers and "big int" math.
Here's a more extensive illustration using vec():
- # vec demo
- my $vector = "\xff\x0f\xef\xfe";
- print "Ilya's string \\xff\\x0f\\xef\\xfe represents the number ",
- unpack("N", $vector), "\n";
- my $is_set = vec($vector, 23, 1);
- print "Its 23rd bit is ", $is_set ? "set" : "clear", ".\n";
- pvec($vector);
- set_vec(1,1,1);
- set_vec(3,1,1);
- set_vec(23,1,1);
- set_vec(3,1,3);
- set_vec(3,2,3);
- set_vec(3,4,3);
- set_vec(3,4,7);
- set_vec(3,8,3);
- set_vec(3,8,7);
- set_vec(0,32,17);
- set_vec(1,32,17);
- sub set_vec {
- my ($offset, $width, $value) = @_;
- my $vector = '';
- vec($vector, $offset, $width) = $value;
- print "offset=$offset width=$width value=$value\n";
- pvec($vector);
- }
- sub pvec {
- my $vector = shift;
- my $bits = unpack("b*", $vector);
- my $i = 0;
- my $BASE = 8;
- print "vector length in bytes: ", length($vector), "\n";
- my @bytes = unpack("A8" x length($vector), $bits);
- print "bits are: @bytes\n\n";
- }
The short story is that you should probably only use defined on scalars or functions, not on aggregates (arrays and hashes). See defined in the 5.004 release or later of Perl for more detail.
(contributed by brian d foy)
There are a couple of ways that you can process an entire hash. You can get a list of keys, then go through each key, or grab a one key-value pair at a time.
To go through all of the keys, use the keys function. This extracts
all of the keys of the hash and gives them back to you as a list. You
can then get the value through the particular key you're processing:
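A minimal sketch (sample hash invented):

```perl
my %hash = ( a => 1, b => 2 );
foreach my $key ( keys %hash ) {
    my $value = $hash{$key};
    print "$key => $value\n";
}
```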
Once you have the list of keys, you can process that list before you process the hash elements. For instance, you can sort the keys so you can process them in lexical order:
Or, you might want to only process some of the items. If you only want
to deal with the keys that start with text:
, you can select just
those using grep:
If the hash is very large, you might not want to create a long list of
keys. To save some memory, you can grab one key-value pair at a time using
each(), which returns a pair you haven't seen yet:
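That loop looks like this (sample hash invented):

```perl
my %hash = ( a => 1, b => 2 );
while ( my ( $key, $value ) = each %hash ) {
    print "$key => $value\n";
}
```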
The each operator returns the pairs in apparently random order, so if
ordering matters to you, you'll have to stick with the keys method.
The each() operator can be a bit tricky though. You can't add or
delete keys of the hash while you're using it without possibly
skipping or re-processing some pairs after Perl internally rehashes
all of the elements. Additionally, a hash has only one iterator, so if
you mix keys, values, or each on the same hash, you risk resetting
the iterator and messing up your processing. See the each entry in
perlfunc for more details.
(contributed by brian d foy)
Before you decide to merge two hashes, you have to decide what to do if both hashes contain keys that are the same and if you want to leave the original hashes as they were.
If you want to preserve the original hashes, copy one hash (%hash1
)
to a new hash (%new_hash
), then add the keys from the other hash
(%hash2)
to the new hash. Checking that the key already exists in
%new_hash
gives you a chance to decide what to do with the
duplicates:
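A sketch of that copy-then-merge loop (sample hashes invented):

```perl
my %hash1    = ( a => 1, b => 2 );
my %hash2    = ( b => 99, c => 3 );
my %new_hash = %hash1;                  # start with a copy of the first hash
foreach my $key2 ( keys %hash2 ) {
    if ( exists $new_hash{$key2} ) {
        warn "Key [$key2] is in both hashes!";
        next;                           # here we keep %hash1's value
    }
    $new_hash{$key2} = $hash2{$key2};
}
```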
If you don't want to create a new hash, you can still use this looping
technique; just change the %new_hash
to %hash1
.
If you don't care that one hash overwrites keys and values from the other, you
could just use a hash slice to add one hash to another. In this case, values
from %hash2
replace values from %hash1
when they have keys in common:
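The slice version (sample hashes invented):

```perl
my %hash1 = ( a => 1, b => 2 );
my %hash2 = ( b => 99, c => 3 );
@hash1{ keys %hash2 } = values %hash2;   # %hash2's pairs win on collisions
```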
(contributed by brian d foy)
The easy answer is "Don't do that!"
If you iterate through the hash with each(), you can delete the key
most recently returned without worrying about it. If you delete or add
other keys, the iterator may skip or double up on them since perl
may rearrange the hash table. See the
entry for each() in perlfunc.
Create a reverse hash:
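That is (sample data invented; this only works cleanly if the values are unique):

```perl
my %by_key   = ( fred => 'flintstone', barney => 'rubble' );
my %by_value = reverse %by_key;    # flattens to a list, swapping pairs
print $by_value{rubble}, "\n";     # barney
```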
That's not particularly efficient. It would be more space-efficient to use:
If your hash could have repeated values, the methods above will only find one of the associated keys. This may or may not worry you. If it does worry you, you can always reverse the hash into a hash of arrays instead:
(contributed by brian d foy)
This is very similar to "How do I process an entire hash?", also in perlfaq4, but a bit simpler in the common cases.
You can use the keys() built-in function in scalar context to find out
how many entries you have in a hash:
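For example (sample hash invented):

```perl
my %hash  = ( a => 1, b => 2, c => 3 );
my $count = keys %hash;   # scalar context: 3
```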
If you want to find out how many entries have a defined value, that's
a bit different. You have to check each value. A grep is handy:
You can use that same structure to count the entries any way that you like. If you want the count of the keys with vowels in them, you just test for that instead:
The grep in scalar context returns the count. If you want the list
of matching items, just use it in list context instead:
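Sketches of the counting idioms just described:

```perl
my $count         = keys %hash;                      # keys() in scalar context
my $defined_count = grep { defined } values %hash;   # count the defined values
my @vowel_keys    = grep { /[aeiou]/ } keys %hash;   # list context: the matches themselves
```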
The keys() function also resets the iterator, which means that you may
see strange results if you use this between uses of other hash operators
such as each().
(contributed by brian d foy)
To sort a hash, start with the keys. In this example, we give the list of keys to the sort function which then compares them ASCIIbetically (which might be affected by your locale settings). The output list has the keys in ASCIIbetical order. Once we have the keys, we can go through them to create a report which lists the keys in ASCIIbetical order.
We could get more fancy in the sort() block though. Instead of
comparing the keys, we can compute a value with them and use that
value as the comparison.
For instance, to make our report order case-insensitive, we use
lc to lowercase the keys before comparing them:
Note: if the computation is expensive or the hash has many elements, you may want to look at the Schwartzian Transform to cache the computation results.
If we want to sort by the hash value instead, we use the hash key to look it up. We still get out a list of keys, but this time they are ordered by their value.
From there we can get more complex. If the hash values are the same, we can provide a secondary sort on the hash key.
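A sketch of a value sort with the key as tie-breaker (this assumes numeric values; use cmp instead of <=> for string values):

```perl
my @sorted_keys = sort {
    $hash{$a} <=> $hash{$b}    # primary: compare the values
        or
    $a cmp $b                  # secondary: fall back to the keys
} keys %hash;
```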
You can look into using the DB_File module and tie() using the $DB_BTREE hash bindings as documented in In Memory Databases in DB_File. The Tie::IxHash module from CPAN might also be instructive. Although this does keep your hash sorted, you might not like the slowdown you suffer from the tie interface. Are you sure you need to do this? :)
Hashes contain pairs of scalars: the first is the key, the second is the value. The key will be coerced to a string, although the value can be any kind of scalar: string, number, or reference. If a key $key is present in %hash, exists($hash{$key}) will return true. The value for a given key can be undef, in which case $hash{$key} will be undef while exists $hash{$key} will return true. This corresponds to ($key, undef) being in the hash.
Pictures help... Here's the %hash table:
And these conditions hold
If you now say
- undef $hash{'a'}
your table now reads:
and these conditions now hold; changes in caps:
Notice the last two: you have an undef value, but a defined key!
Now, consider this:
- delete $hash{'a'}
your table now reads:
and these conditions now hold; changes in caps:
See, the whole entry is gone!
This depends on the tied hash's implementation of EXISTS(). For example, there isn't the concept of undef with hashes that are tied to DBM* files. It also means that exists() and defined() do the same thing with a DBM* file, and what they end up doing is not what they do with ordinary hashes.
(contributed by brian d foy)
You can use the keys or values functions to reset each. To
simply reset the iterator used by each without doing anything else,
use one of them in void context:
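For example, calling keys in void context resets the iterator without building the list of keys:

```perl
keys %hash;    # void context: resets the each() iterator
```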
See the documentation for each in perlfunc.
First you extract the keys from the hashes into lists, then solve the "removing duplicates" problem described above. For example:
Or more succinctly:
Or if you really want to save space:
Either stringify the structure yourself (no fun), or else get the MLDBM (which uses Data::Dumper) module from CPAN and layer it on top of either DB_File or GDBM_File. You might also try DBM::Deep, but it can be a bit slow.
Use the Tie::IxHash module from CPAN.
(contributed by brian d foy)
Are you using a really old version of Perl?
Normally, accessing a hash key's value for a nonexistent key will not create the key.
Passing $hash{ 'foo' } to a subroutine used to be a special case, though. Since you could assign directly to $_[0], Perl had to be ready to make that assignment so it created the hash key ahead of time:
Since Perl 5.004, however, this situation is a special case and Perl creates the hash key only when you make the assignment:
However, if you want the old behavior (and think carefully about that because it's a weird side effect), you can pass a hash slice instead. Perl 5.004 didn't make this a special case:
- my_sub( @hash{ qw/foo/ } );
Usually a hash ref, perhaps like this:
- $record = {
- NAME => "Jason",
- EMPNO => 132,
- TITLE => "deputy peon",
- AGE => 23,
- SALARY => 37_000,
- PALS => [ "Norbert", "Rhys", "Phineas"],
- };
References are documented in perlref and perlreftut. Examples of complex data structures are given in perldsc and perllol. Examples of structures and object-oriented classes are in perltoot.
(contributed by brian d foy and Ben Morrow)
Hash keys are strings, so you can't really use a reference as the key. When you try to do that, perl turns the reference into its stringified form (for instance, HASH(0xDEADBEEF)). From there you can't get back the reference from the stringified form, at least without doing some extra work on your own.
Remember that the entry in the hash will still be there even if the referenced variable goes out of scope, and that it is entirely possible for Perl to subsequently allocate a different variable at the same address. This will mean a new variable might accidentally be associated with the value for an old.
If you have Perl 5.10 or later, and you just want to store a value against the reference for lookup later, you can use the core Hash::Util::Fieldhash module. This will also handle renaming the keys if you use multiple threads (which causes all variables to be reallocated at new addresses, changing their stringification), and garbage-collecting the entries when the referenced variable goes out of scope.
If you actually need to be able to get a real reference back from each hash entry, you can use the Tie::RefHash module, which does the required work for you.
(contributed by brian d foy)
The trick to this problem is avoiding accidental autovivification. If you want to check three keys deep, you might naïvely try this:
Even though you started with a completely empty hash, after that call to exists you've created the structure you needed to check for key3:
- %hash = (
- 'key1' => {
- 'key2' => {}
- }
- );
That's autovivification. You can get around this in a few ways. The
easiest way is to just turn it off. The lexical autovivification
pragma is available on CPAN. Now you don't add to the hash:
The Data::Diver module on CPAN can do it for you too. Its Dive subroutine can tell you not only if the keys exist but also get the value:
You can easily do this yourself too by checking each level of the hash before you move onto the next level. This is essentially what Data::Diver does for you:
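A sketch of that manual check, written as a hypothetical deep_exists() helper that descends one level at a time and never autovivifies:

```perl
sub deep_exists {
    my( $hash, @keys ) = @_;

    foreach my $key ( @keys ) {
        return 0 unless ref $hash eq 'HASH' and exists $hash->{$key};
        $hash = $hash->{$key};    # descend one level
    }

    return 1;
}

my %hash;
deep_exists( \%hash, qw(key1 key2 key3) );    # false, and %hash stays empty
```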
Since version 5.8.0, hashes can be restricted to a fixed number of given keys. Methods for creating and dealing with restricted hashes are exported by the Hash::Util module.
Perl is binary-clean, so it can handle binary data just fine.
On Windows or DOS, however, you have to use binmode for binary
files to avoid conversions for line endings. In general, you should
use binmode any time you want to work with binary data.
Also see binmode or perlopentut.
If you're concerned about 8-bit textual data then see perllocale. If you want to deal with multibyte characters, however, there are some gotchas. See the section on Regular Expressions.
Assuming that you don't care about IEEE notations like "NaN" or "Infinity", you probably just want to use a regular expression:
- use 5.010;
- given( $number ) {
- when( /\D/ )
- { say "\thas nondigits"; continue }
- when( /^\d+\z/ )
- { say "\tis a whole number"; continue }
- when( /^-?\d+\z/ )
- { say "\tis an integer"; continue }
- when( /^[+-]?\d+\z/ )
- { say "\tis a +/- integer"; continue }
- when( /^-?(?:\d+\.?|\.\d)\d*\z/ )
- { say "\tis a real number"; continue }
- when( /^[+-]?(?=\.?\d)\d*\.?\d*(?:e[+-]?\d+)?\z/i)
- { say "\tis a C float" }
- }
There are also some commonly used modules for the task.
Scalar::Util (distributed with 5.8) provides access to perl's internal function looks_like_number for determining whether a variable looks like a number. Data::Types exports functions that validate data types using both the above and other regular expressions. Thirdly, there is Regexp::Common which has regular expressions to match various types of numbers. Those three modules are available from the CPAN.
If you're on a POSIX system, Perl supports the POSIX::strtod function for converting strings to doubles (and also POSIX::strtol for longs). Its semantics are somewhat cumbersome, so here's a getnum wrapper function for more convenient access. This function takes a string and returns the number it found, or undef for input that isn't a C float. The is_numeric function is a front end to getnum if you just want to say, "Is this a float?"
Or you could check out the String::Scanf module on the CPAN instead.
For some specific applications, you can use one of the DBM modules.
See AnyDBM_File. More generically, you should consult the FreezeThaw or Storable modules from CPAN. Starting from Perl 5.8, Storable is part of the standard distribution. Here's one example using Storable's store and retrieve functions:
- use Storable;
- store(\%hash, "filename");
- # later on...
- $href = retrieve("filename"); # by ref
- %hash = %{ retrieve("filename") }; # direct to hash
The Data::Dumper module on CPAN (or the 5.005 release of Perl) is great for printing out data structures. The Storable module on CPAN (or the 5.8 release of Perl) provides a function called dclone that recursively copies its argument.
- use Storable qw(dclone);
- $r2 = dclone($r1);
Where $r1 can be a reference to any kind of data structure you'd like. It will be deeply copied. Because dclone takes and returns references, you'd have to add extra punctuation if you had a hash of arrays that you wanted to copy.
- %newhash = %{ dclone(\%oldhash) };
(contributed by Ben Morrow)
You can use the UNIVERSAL class (see UNIVERSAL). However, please be very careful to consider the consequences of doing this: adding methods to every object is very likely to have unintended consequences. If possible, it would be better to have all your objects inherit from some common base class, or to use an object system like Moose that supports roles.
Get the Business::CreditCard module from CPAN.
The arrays.h/arrays.c code in the PGPLOT module on CPAN does just this. If you're doing a lot of float or double processing, consider using the PDL module from CPAN instead--it makes number-crunching easy.
See http://search.cpan.org/dist/PGPLOT for the code.
Copyright (c) 1997-2010 Tom Christiansen, Nathan Torkington, and other authors as noted. All rights reserved.
This documentation is free; you can redistribute it and/or modify it under the same terms as Perl itself.
Irrespective of its distribution, all code examples in this file are hereby placed into the public domain. You are permitted and encouraged to use this code in your own programs for fun or for profit as you see fit. A simple comment in the code giving credit would be courteous but is not required.
perlfaq5 - Files and Formats
This section deals with I/O and the "f" issues: filehandles, flushing, formats, and footers.
(contributed by brian d foy)
You might like to read Mark Jason Dominus's "Suffering From Buffering" at http://perl.plover.com/FAQs/Buffering.html .
Perl normally buffers output so it doesn't make a system call for every bit of output. By saving up output, it makes fewer expensive system calls. For instance, in this little bit of code, you want to print a dot to the screen for every line you process to watch the progress of your program. Instead of seeing a dot for every line, Perl buffers the output and you have a long wait before you see a row of 50 dots all at once:
To get around this, you have to unbuffer the output filehandle, in this case STDOUT. You can set the special variable $| to a true value (mnemonic: making your filehandles "piping hot"):
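A sketch, continuing the dots-per-line example above:

```perl
$| = 1;    # unbuffer the currently selected filehandle (STDOUT by default)

while( my $line = <> ) {
    # ... process $line ...
    print '.';    # each dot now shows up immediately
}
```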
The $| is one of the per-filehandle special variables, so each filehandle has its own copy of its value. If you want to merge standard output and standard error for instance, you have to unbuffer each (although STDERR might be unbuffered by default):
- {
- my $previous_default = select(STDOUT); # save previous default
- $|++; # autoflush STDOUT
- select(STDERR);
- $|++; # autoflush STDERR, to be sure
- select($previous_default); # restore previous default
- }
- # now should alternate . and +
- while( 1 ) {
- sleep 1;
- print STDOUT ".";
- print STDERR "+";
- print STDOUT "\n" unless ++$count % 25;
- }
Besides the $| special variable, you can use binmode to give your filehandle a :unix layer, which is unbuffered:
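For instance:

```perl
binmode( STDOUT, ':unix' );    # push the unbuffered :unix layer onto STDOUT
```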
For more information on output layers, see the entries for binmode
and open in perlfunc, and the PerlIO module documentation.
If you are using IO::Handle or one of its subclasses, you can call the autoflush method to change the settings of the filehandle:
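For example:

```perl
use IO::Handle;

STDOUT->autoflush(1);    # same effect as setting $| for STDOUT
```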
The IO::Handle objects also have a flush method. You can flush the buffer any time you want without auto-buffering:
- $io_fh->flush;
(contributed by brian d foy)
The basic idea of inserting, changing, or deleting a line from a text file involves reading and printing the file to the point you want to make the change, making the change, then reading and printing the rest of the file. Perl doesn't provide random access to lines (especially since the record input separator, $/, is mutable), although modules such as Tie::File can fake it.
A Perl program to do these tasks takes the basic form of opening a file, printing its lines, then closing the file:
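A sketch of that basic form; $old and $new are illustrative filenames:

```perl
open my $in,  '<', $old or die "Can't read old file: $!";
open my $out, '>', $new or die "Can't write new file: $!";

while( <$in> ) {
    print $out $_;    # copy each line through unchanged
}

close $out or die "Can't close new file: $!";
```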
Within that basic form, add the parts that you need to insert, change, or delete lines.
To prepend lines to the beginning, print those lines before you enter the loop that prints the existing lines.
To change existing lines, insert the code to modify the lines inside the while loop. In this case, the code finds all lowercased versions of "perl" and uppercases them. This happens for every line, so be sure that you're supposed to do that on every line!
To change only a particular line, the input line number, $., is useful. First read and print the lines up to the one you want to change. Next, read the single line you want to change, change it, and print it. After that, read the rest of the lines and print those:
To skip lines, use the looping controls. The next in this example skips comment lines, and the last stops all processing once it encounters either __END__ or __DATA__.
Do the same sort of thing to delete a particular line by using next
to skip the lines you don't want to show up in the output. This
example skips every fifth line:
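A sketch, assuming $in and $out filehandles as in the basic form above:

```perl
while( <$in> ) {
    next unless $. % 5;    # $. is 5, 10, 15, ... on every fifth line
    print $out $_;
}
```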
If, for some odd reason, you really want to see the whole file at once rather than processing line-by-line, you can slurp it in (as long as you can fit the whole thing in memory!):
Modules such as File::Slurp and Tie::File can help with that too. If you can, however, avoid reading the entire file at once. Perl won't give that memory back to the operating system until the process finishes.
You can also use Perl one-liners to modify a file in-place. The following changes all 'Fred' to 'Barney' in inFile.txt, overwriting the file with the new contents. With the -p switch, Perl wraps a while loop around the code you specify with -e, and -i turns on in-place editing. The current line is in $_. With -p, Perl automatically prints the value of $_ at the end of the loop. See perlrun for more details.
- perl -pi -e 's/Fred/Barney/' inFile.txt
To make a backup of inFile.txt, give -i a file extension to add:
- perl -pi.bak -e 's/Fred/Barney/' inFile.txt
To change only the fifth line, you can add a test checking $., the input line number, then only perform the operation when the test passes:
- perl -pi -e 's/Fred/Barney/ if $. == 5' inFile.txt
To add lines before a certain line, you can add a line (or lines!) before Perl prints $_:
- perl -pi -e 'print "Put before third line\n" if $. == 3' inFile.txt
You can even add a line to the beginning of a file, since the current line prints at the end of the loop:
- perl -pi -e 'print "Put before first line\n" if $. == 1' inFile.txt
To insert a line after one already in the file, use the -n switch. It's just like -p except that it doesn't print $_ at the end of the loop, so you have to do that yourself. In this case, print $_ first, then print the line that you want to add.
- perl -ni -e 'print; print "Put after fifth line\n" if $. == 5' inFile.txt
To delete lines, only print the ones that you want.
- perl -ni -e 'print if /d/' inFile.txt
(contributed by brian d foy)
Conceptually, the easiest way to count the lines in a file is to simply read them and count them:
You don't really have to count them yourself, though, since Perl already does that with the $. variable, which is the current line number from the last filehandle read:
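For example (assuming $filename holds the file's name):

```perl
open my $fh, '<', $filename or die "Can't open $filename: $!";
1 while <$fh>;     # read (and discard) every line
my $count = $.;    # $. still holds the last line number read
```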
If you want to use $., you can reduce it to a simple one-liner, like one of these:
- % perl -lne '} print $.; {' file
- % perl -lne 'END { print $. }' file
Those can be rather inefficient though. If they aren't fast enough for you, you might just read chunks of data and count the number of newlines:
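A sketch of the chunked count (4096 is an arbitrary buffer size):

```perl
my $count = 0;
open my $fh, '<:raw', $filename or die "Can't open $filename: $!";
while( sysread $fh, my $buffer, 4096 ) {
    $count += ( $buffer =~ tr/\n// );    # tr/// in scalar context counts matches
}
```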
However, that doesn't work if the line ending isn't a newline. You might change that tr/// to a s/// so you can count the number of times the input record separator, $/, shows up:
If you don't mind shelling out, the wc command is usually the fastest, even with the extra interprocess overhead. Ensure that you have an untainted filename though:
(contributed by brian d foy)
The easiest conceptual solution is to count the lines in the file then start at the beginning and print the number of lines (minus the last N) to a new file.
Most often, the real question is how you can delete the last N lines without making more than one pass over the file, or how to do it without a lot of copying. The easy concept is the hard reality when you might have millions of lines in your file.
One trick is to use File::ReadBackwards, which starts at the end of the file. That module provides an object that wraps the real filehandle to make it easy for you to move around the file. Once you get to the spot you need, you can get the actual filehandle and work with it as normal. In this case, you get the file position at the end of the last line you want to keep and truncate the file to that point:
- use File::ReadBackwards;
- my $filename = 'test.txt';
- my $Lines_to_truncate = 2;
- my $bw = File::ReadBackwards->new( $filename )
- or die "Could not read backwards in [$filename]: $!";
- my $lines_from_end = 0;
- until( $bw->eof or $lines_from_end == $Lines_to_truncate ) {
- print "Got: ", $bw->readline;
- $lines_from_end++;
- }
- truncate( $filename, $bw->tell );
The File::ReadBackwards module also has the advantage of setting the input record separator to a regular expression.
You can also use the Tie::File module which lets you access
the lines through a tied array. You can use normal array operations
to modify your file, including setting the last index and using
splice.
How can I use Perl's -i option from within a program?
-i sets the value of Perl's $^I variable, which in turn affects the behavior of <>; see perlrun for more details. By modifying the appropriate variables directly, you can get the same behavior within a larger program. For example:
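A sketch of such a block (the substitution is illustrative):

```perl
{
    local $^I   = '.orig';       # emulate -i.orig: back up each file first
    local @ARGV = glob '*.c';    # emulate the command-line file list

    while( <> ) {
        s/\bfrobnicate\b/frobnify/g;    # whatever edit you need
        print;                          # goes to the in-place-edited file
    }
}
```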
This block modifies all the .c files in the current directory,
leaving a backup of the original data from each file in a new
.c.orig file.
(contributed by brian d foy)
Use the File::Copy module. It comes with Perl and can do a true copy across file systems, and it does its magic in a portable fashion.
If you can't use File::Copy, you'll have to do the work yourself: open the original file, open the destination file, then print to the destination file as you read the original. You also have to remember to copy the permissions, owner, and group to the new file.
If you don't need to know the name of the file, you can use open()
with undef in place of the file name. In Perl 5.8 or later, the
open() function creates an anonymous temporary file:
Otherwise, you can use the File::Temp module.
File::Temp has been a standard module since Perl 5.6.1. If you don't have a modern enough Perl installed, use the new_tmpfile class method from the IO::File module to get a filehandle opened for reading and writing. Use it if you don't need to know the file's name:
If you're committed to creating a temporary file by hand, use the process ID and/or the current time-value. If you need to have many temporary files in one process, use a counter:
- BEGIN {
- use Fcntl;
- my $temp_dir = -d '/tmp' ? '/tmp' : $ENV{TMPDIR} || $ENV{TEMP};
- my $base_name = sprintf "%s/%d-%d-0000", $temp_dir, $$, time;
- sub temp_file {
- my $fh;
- my $count = 0;
- until( defined(fileno($fh)) || $count++ > 100 ) {
- $base_name =~ s/-(\d+)$/"-" . (1 + $1)/e;
- # O_EXCL is required for security reasons.
- sysopen $fh, $base_name, O_WRONLY|O_EXCL|O_CREAT;
- }
- if( defined fileno($fh) ) {
- return ($fh, $base_name);
- }
- else {
- return ();
- }
- }
- }
The most efficient way is using pack and unpack. This is faster than using substr when taking many, many strings. It is slower for just a few.
Here is a sample chunk of code to break up and put back together again some fixed-format input lines, in this case from the output of a normal, Berkeley-style ps:
- # sample input line:
- # 15158 p5 T 0:00 perl /home/tchrist/scripts/now-what
- my $PS_T = 'A6 A4 A7 A5 A*';
- open my $ps, '-|', 'ps';
- print scalar <$ps>;
- my @fields = qw( pid tt stat time command );
- while (<$ps>) {
- my %process;
- @process{@fields} = unpack($PS_T, $_);
- for my $field ( @fields ) {
- print "$field: <$process{$field}>\n";
- }
- print 'line=', pack($PS_T, @process{@fields} ), "\n";
- }
We've used a hash slice in order to easily handle the fields of each row. Storing the keys in an array makes it easy to operate on them as a group or loop over them with for. It also avoids polluting the program with global variables and using symbolic references.
As of perl5.6, open() autovivifies file and directory handles as references if you pass it an uninitialized scalar variable. You can then pass these references just like any other scalar, and use them in the place of named handles.
If you like, you can store these filehandles in an array or a hash.
If you access them directly, they aren't simple scalars and you
need to give print a little help by placing the filehandle
reference in braces. Perl can only figure it out on its own when
the filehandle reference is a simple scalar.
Before perl5.6, you had to deal with various typeglob idioms which you may see in older code.
If you want to create many anonymous handles, you should check out the Symbol or IO::Handle modules.
An indirect filehandle is the use of something other than a symbol in a place that a filehandle is expected. Here are ways to get indirect filehandles:
- $fh = SOME_FH; # bareword is strict-subs hostile
- $fh = "SOME_FH"; # strict-refs hostile; same package only
- $fh = *SOME_FH; # typeglob
- $fh = \*SOME_FH; # ref to typeglob (bless-able)
- $fh = *SOME_FH{IO}; # blessed IO::Handle from *SOME_FH typeglob
Or, you can use the new method from one of the IO::* modules to create an anonymous filehandle and store that in a scalar variable. Then use any of those as you would a normal filehandle. Anywhere that Perl is expecting a filehandle, an indirect filehandle may be used instead. An indirect filehandle is just a scalar variable that contains a filehandle. Functions like print, open, seek, or the <FH> diamond operator will accept either a named filehandle or a scalar variable containing one:
If you're passing a filehandle to a function, you can write the function in two ways:
Or it can localize a typeglob and use the filehandle directly:
Both styles work with either objects or typeglobs of real filehandles. (They might also work with strings under some circumstances, but this is risky.)
- accept_fh(*STDOUT);
- accept_fh($handle);
In the examples above, we assigned the filehandle to a scalar variable
before using it. That is because only simple scalar variables, not
expressions or subscripts of hashes or arrays, can be used with
built-ins like print, printf, or the diamond operator. Using
something other than a simple scalar variable as a filehandle is
illegal and won't even compile:
With print and printf, you get around this by using a block and
an expression where you would place the filehandle:
That block is a proper block like any other, so you can put more complicated code there. This sends the message out to one of two places:
This approach of treating print and printf like object method calls doesn't work for the diamond operator. That's because it's a real operator, not just a function with a comma-less argument. Assuming you've been storing typeglobs in your structure as we did above, you can use the built-in function named readline to read a record just as <> does. Given the initialization shown above for @fd, this would work, but only because readline() requires a typeglob. It doesn't work with objects or strings, which might be a bug we haven't fixed yet.
- $got = readline($fd[0]);
Let it be noted that the flakiness of indirect filehandles is not related to whether they're strings, typeglobs, objects, or anything else. It's the syntax of the fundamental operators. Playing the object game doesn't help you at all here.
There's no builtin way to do this, but perlform has a couple of techniques to make it possible for the intrepid hacker.
(contributed by brian d foy)
If you want to write into a string, you just have to open a filehandle to a string, which Perl has been able to do since Perl 5.6:
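For example, you open the filehandle on a reference to a scalar:

```perl
my $string = '';
open my $fh, '>', \$string or die "Can't open string for writing: $!";
print $fh "Hello, world!\n";
# $string now contains "Hello, world!\n"
```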
Since you want to be a good programmer, you probably want to use a lexical filehandle, even though formats are designed to work with bareword filehandles since the default format names take the filehandle name. However, you can control this with some Perl special per-filehandle variables: $^, which names the top-of-page format, and $~, which shows the line format. You have to change the default filehandle to set these variables:
Although write can work with lexical or package variables, whatever variables you use have to scope in the format. That most likely means you'll want to localize some package variables:
There are also some tricks that you can play with formline and the accumulator variable $^A, but you lose a lot of the value of formats since formline won't handle paging and so on. You end up reimplementing formats when you use them.
(contributed by Peter J. Holzer, hjp-usenet2@hjp.at)
Since Perl 5.8.0 a file handle referring to a string can be created by calling open with a reference to that string instead of the filename. This file handle can then be used to read from or write to the string:
With older versions of Perl, the IO::String module provides similar functionality.
(contributed by brian d foy and Benjamin Goldberg)
You can use Number::Format to separate places in a number. It handles locale information for those of you who want to insert full stops instead (or anything else that they want to use, really).
This subroutine will add commas to your number:
This regex from Benjamin Goldberg will add commas to numbers:
- s/(^[-+]?\d+?(?=(?>(?:\d{3})+)(?!\d))|\G\d{3}(?=\d))/$1,/g;
It is easier to see with comments:
- s/(
- ^[-+]? # beginning of number.
- \d+? # first digits before first comma
- (?= # followed by, (but not included in the match) :
- (?>(?:\d{3})+) # some positive multiple of three digits.
- (?!\d) # an *exact* multiple, not x * 3 + 1 or whatever.
- )
- | # or:
- \G\d{3} # after the last group, get three digits
- (?=\d) # but they have to have more digits after them.
- )/$1,/xg;
Use the <> (glob()) operator, documented in perlfunc.
Versions of Perl older than 5.6 require that you have a shell
installed that groks tildes. Later versions of Perl have this feature
built in. The File::KGlob module (available from CPAN) gives more
portable glob functionality.
Within Perl, you may use this directly:
- $filename =~ s{
- ^ ~ # find a leading tilde
- ( # save this in $1
- [^/] # a non-slash character
- * # repeated 0 or more times (0 means me)
- )
- }{
- $1
- ? (getpwnam($1))[7]
- : ( $ENV{HOME} || $ENV{LOGDIR} )
- }ex;
Because you're using something like this, which truncates the file then gives you read-write access:
Whoops. You should instead use this, which will fail if the file doesn't exist:
Using ">" always clobbers or creates. Using "<" never does either. The "+" doesn't change this.
Here are examples of many kinds of file opens. Those using sysopen
all assume that you've pulled in the constants from Fcntl:
- use Fcntl;
To open file for reading:
To open file for writing, create new file if needed or else truncate old file:
To open file for writing, create new file, file must not exist:
To open file for appending, create if necessary:
To open file for appending, file must exist:
To open file for update, file must exist:
To open file for update, create file if necessary:
To open file for update, file must not exist:
To open a file without blocking, creating if necessary:
Be warned that neither creation nor deletion of files is guaranteed to be an atomic operation over NFS. That is, two processes might both successfully create or unlink the same file! Therefore O_EXCL isn't as exclusive as you might wish.
See also perlopentut.
The <> operator performs a globbing operation (see above). In Perl versions earlier than v5.6.0, the internal glob() operator forks csh(1) to do the actual glob expansion, but csh can't handle more than 127 items and so gives the error message Argument list too long. People who installed tcsh as csh won't have this problem, but their users may be surprised by it.
To get around this, either upgrade to Perl v5.6.0 or later, do the glob yourself with readdir() and patterns, or use a module like File::Glob, one that doesn't use the shell to do globbing.
(contributed by Brian McCauley)
The special two-argument form of Perl's open() function ignores trailing blanks in filenames and infers the mode from certain leading characters (or a trailing "|"). In older versions of Perl this was the only version of open() and so it is prevalent in old code and books.
Unless you have a particular reason to use the two-argument form you should use the three-argument form of open() which does not treat any characters in the filename as special.
If your operating system supports a proper mv(1) utility or its functional equivalent, this works:
It may be more portable to use the File::Copy module instead.
You just copy the original file to the new name (checking return values), then delete the old one. This isn't really the same semantically as a rename(), which preserves meta-information like permissions, timestamps, inode info, etc.
Perl's builtin flock() function (see perlfunc for details) will call flock(2) if that exists, fcntl(2) if it doesn't (on perl version 5.004 and later), and lockf(3) if neither of the two previous system calls exists. On some systems, it may even use a different form of native locking. Here are some gotchas with Perl's flock():
Produces a fatal error if none of the three system calls (or their close equivalent) exists.
lockf(3) does not provide shared locking, and requires that the filehandle be open for writing (or appending, or read/writing).
Some versions of flock() can't lock files over a network (e.g. on NFS file systems), so you'd need to force the use of fcntl(2) when you build Perl. But even this is dubious at best. See the flock entry of perlfunc and the INSTALL file in the source distribution for information on building Perl to do this.
Two potentially non-obvious but traditional flock semantics are that it waits indefinitely until the lock is granted, and that its locks are merely advisory. Such discretionary locks are more flexible, but offer fewer guarantees. This means that files locked with flock() may be modified by programs that do not also use flock(). Cars that stop for red lights get on well with each other, but not with cars that don't stop for red lights. See the perlport manpage, your port's specific documentation, or your system-specific local manpages for details. It's best to assume traditional behavior if you're writing portable programs. (If you're not, you should as always feel perfectly free to write for your own system's idiosyncrasies (sometimes called "features"). Slavish adherence to portability concerns shouldn't get in the way of your getting your job done.)
For more information on file locking, see also File Locking in perlopentut if you have it (new for 5.6).
A common bit of code NOT TO USE is this:
This is a classic race condition: you take two steps to do something which must be done in one. That's why computer hardware provides an atomic test-and-set instruction. In theory, this "ought" to work:
except that lamentably, file creation (and deletion) is not atomic over NFS, so this won't work (at least, not every time) over the net. Various schemes involving link() have been suggested, but these tend to involve busy-wait, which is also less than desirable.
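The code examples referred to above were stripped from this copy; a minimal sketch of both the racy check-then-create pattern and the atomic O_EXCL alternative (the lock-file name here is invented) might look like:

```perl
use strict;
use warnings;
use Fcntl qw(O_WRONLY O_CREAT O_EXCL);

my $lockfile = "myprog.lock";   # hypothetical name

# Racy -- don't do this: another process can create the file
# between the -e test and the open.
# unless (-e $lockfile) { open my $fh, '>', $lockfile or die; }

# Atomic on local filesystems: O_EXCL makes creation fail in a
# single system call if the file already exists.
sysopen my $fh, $lockfile, O_WRONLY | O_CREAT | O_EXCL
    or die "can't create $lockfile: $!";
print $fh "$$\n";               # record our pid
close $fh or die "can't close $lockfile: $!";
```

As the text notes, even O_EXCL is not atomic over NFS, so this only helps on local filesystems.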
Didn't anyone ever tell you web-page hit counters were useless? They don't count number of hits, they're a waste of time, and they serve only to stroke the writer's vanity. It's better to pick a random number; they're more realistic.
Anyway, this is what you can do if you can't help yourself.
- use Fcntl qw(:DEFAULT :flock);
- sysopen my $fh, "numfile", O_RDWR|O_CREAT or die "can't open numfile: $!";
- flock $fh, LOCK_EX or die "can't flock numfile: $!";
- my $num = <$fh> || 0;
- seek $fh, 0, 0 or die "can't rewind numfile: $!";
- truncate $fh, 0 or die "can't truncate numfile: $!";
- (print $fh $num+1, "\n") or die "can't write numfile: $!";
- close $fh or die "can't close numfile: $!";
Here's a much better web-page hit counter:
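The counter itself was stripped from this copy; a sketch in the same joking spirit derives an ever-growing pseudo-count from the clock instead of touching a file at all ($max is a hypothetical knob for how fast the count appears to grow, and the +1 simply guards against a zero divisor):

```perl
use strict;
use warnings;

my $max  = 10;    # hypothetical growth knob
my $hits = int( (time() - 850_000_000) / (rand($max) + 1) );
print "This page has been accessed $hits times.\n";
```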
If the count doesn't impress your friends, then the code might. :-)
If you are on a system that correctly implements flock and you use
the example appending code from "perldoc -f flock" everything will be
OK even if the OS you are on doesn't implement append mode correctly
(if such a system exists). So if you are happy to restrict yourself to
OSs that implement flock (and that's not really much of a
restriction) then that is what you should do.
If you know you are only going to use a system that does correctly
implement appending (i.e. not Win32) then you can omit the seek
from the code in the previous answer.
If you know you are only writing code to run on an OS and filesystem
that does implement append mode correctly (a local filesystem on a
modern Unix for example), and you keep the file in block-buffered mode
and you write less than one buffer-full of output between each manual
flushing of the buffer then each bufferload is almost guaranteed to be
written to the end of the file in one chunk without getting
intermingled with anyone else's output. You can also use the
syswrite function which is simply a wrapper around your system's
write(2) system call.
There is still a small theoretical chance that a signal will interrupt
the system-level write() operation before completion. There is also
a possibility that some STDIO implementations may call multiple system
level write()s even if the buffer was empty to start. There may be
some systems where this probability is reduced to zero, and this is
not a concern when using :perlio
instead of your system's STDIO.
If you're just trying to patch a binary, in many cases something as simple as this works:
- perl -i -pe 's{window manager}{window mangler}g' /usr/bin/emacs
However, if you have fixed sized records, then you might do something more like this:
- my $RECSIZE = 220; # size of record, in bytes
- my $recno = 37; # which record to update
- open my $fh, '+<', 'somewhere' or die "can't update somewhere: $!";
- seek $fh, $recno * $RECSIZE, 0;
- read($fh, my $record, $RECSIZE) == $RECSIZE or die "can't read record $recno: $!";
- # munge the record
- seek $fh, -$RECSIZE, 1;
- print $fh $record;
- close $fh;
Locking and error checking are left as an exercise for the reader. Don't forget them or you'll be quite sorry.
If you want to retrieve the time at which the file was last read,
written, or had its meta-data (owner, etc) changed, you use the -A,
-M, or -C file test operations as documented in perlfunc.
These retrieve the age of the file (measured against the start-time of
your program) in days as a floating point number. Some platforms may
not have all of these times. See perlport for details. To retrieve
the "raw" time in seconds since the epoch, you would call the stat
function, then use localtime(), gmtime(), or
POSIX::strftime()
to convert this into human-readable form.
Here's an example:
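The example itself is missing from this copy; a minimal sketch using stat's mtime field (element 9), run on this very script so it works as-is:

```perl
use strict;
use warnings;

my $file = $0;                       # this script, so the example runs
my $write_secs = (stat $file)[9];    # mtime: seconds since the epoch
printf "%s was last modified at %s\n",
    $file, scalar localtime $write_secs;
```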
If you prefer something more legible, use the File::stat module (part of the standard distribution in version 5.004 and later):
The POSIX::strftime() approach has the benefit of being, in theory, independent of the current locale. See perllocale for details.
You use the utime() function documented in utime. By way of example, here's a little program that copies the read and write times from its first argument to all the rest of them.
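The little program itself was stripped from this copy; here is a sketch of the same idea packaged as a subroutine (copy_times is an invented name), copying the access and modification times of a reference file onto the rest:

```perl
use strict;
use warnings;

# Copy the access and modification times of $reference onto @targets,
# much like touch -r.
sub copy_times {
    my ($reference, @targets) = @_;
    my ($atime, $mtime) = (stat $reference)[8, 9];
    utime($atime, $mtime, @targets) == @targets
        or die "couldn't set times on every target: $!";
}
```

Called as copy_times(shift @ARGV, @ARGV), it behaves like the program the answer describes.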
Error checking is, as usual, left as an exercise for the reader.
The perldoc for utime also has an example that has the same effect as touch(1) on files that already exist.
Certain file systems have a limited ability to store the times on a file at the expected level of precision. For example, the FAT and HPFS filesystem are unable to create dates on files with a finer granularity than two seconds. This is a limitation of the filesystems, not of utime().
To connect one filehandle to several output filehandles, you can use the IO::Tee or Tie::FileHandle::Multiplex modules.
If you only have to do this once, you can print individually to each filehandle.
The customary Perl approach for processing all the lines in a file is to do so one line at a time:
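The loop itself was dropped from this copy; a self-contained sketch (a small sample file is written first so the code runs as-is):

```perl
use strict;
use warnings;

# Write a small sample file so the example is self-contained.
my $file = "sample.txt";
open my $out, '>', $file or die "can't write $file: $!";
print $out "one\ntwo\nthree\n";
close $out;

# The customary idiom: read and process one line at a time.
my $count = 0;
open my $in, '<', $file or die "can't open $file: $!";
while (my $line = <$in>) {
    chomp $line;
    $count++;              # do real work with $line here
}
close $in;
print "$count lines\n";
unlink $file;
```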
This is tremendously more efficient than reading the entire file into memory as an array of lines and then processing it one element at a time, which is often--if not almost always--the wrong approach. Whenever you see someone do this:
- my @lines = <INPUT>;
You should think long and hard about why you need everything loaded at once. It's just not a scalable solution.
If you "mmap" the file with the File::Map module from CPAN, you can virtually load the entire file into a string without actually storing it in memory:
Once mapped, you can treat $string
as you would any other string.
Since you don't necessarily have to load the data, mmap-ing can be
very fast and may not increase your memory footprint.
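A minimal sketch, assuming File::Map from CPAN is installed (the file name and contents here are invented so the example is self-contained):

```perl
use strict;
use warnings;
use File::Map qw(map_file);

# Create a small file so the example is self-contained.
open my $out, '>', 'mapped.txt' or die "can't write: $!";
print $out "first line\nsecond line\n";
close $out;

# Map it: $string behaves like an ordinary (read-only) string,
# but the data is paged in on demand rather than copied.
map_file my $string, 'mapped.txt';
my $newlines = () = $string =~ /\n/g;
print "$newlines lines mapped\n";
```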
You might also find it more
fun to use the standard Tie::File module, or the DB_File module's
$DB_RECNO
bindings, which allow you to tie an array to a file so that
accessing an element of the array actually accesses the corresponding
line in the file.
If you want to load the entire file, you can use the File::Slurp module to do it in one simple and efficient step:
Or you can read the entire file contents into a scalar like this:
That temporarily undefs your record separator, and will automatically close the file at block exit. If the file is already open, just use this:
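Both forms were stripped in this copy; a sketch of each (the sample file is created here just so the code runs):

```perl
use strict;
use warnings;

my $file = 'whole.txt';
open my $out, '>', $file or die "can't write $file: $!";
print $out "line 1\nline 2\n";
close $out;

# Slurp by opening inside the block: local $/ = undef makes one
# <$fh> read the whole file, and the lexical handle closes itself
# when the block ends.
my $contents = do {
    local $/;
    open my $fh, '<', $file or die "can't open $file: $!";
    <$fh>;
};

# If the file is already open on $fh2, just localize $/ around the read:
open my $fh2, '<', $file or die "can't open $file: $!";
my $contents2 = do { local $/; <$fh2> };
close $fh2;
unlink $file;
```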
You can also use a localized @ARGV
to eliminate the open:
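That trick was also stripped here; the idiom localizes both @ARGV and $/ so that a single <> opens and slurps the named file (a sample file is created so the sketch runs):

```perl
use strict;
use warnings;

my $file = 'argv_slurp.txt';
open my $out, '>', $file or die "can't write $file: $!";
print $out "hello\nworld\n";
close $out;

# local-izing @ARGV makes <> open $file for us; local $/ slurps it.
my $contents = do { local (@ARGV, $/) = $file; <> };
unlink $file;
```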
For ordinary files you can also use the read function.
- read( $fh, $var, -s $fh );
That third argument tests the byte size of the data on the $fh filehandle and reads that many bytes into the buffer $var.
Use the $/ variable (see perlvar for details). You can either set it to "" to eliminate empty paragraphs ("abc\n\n\n\ndef", for instance, gets treated as two paragraphs and not three), or "\n\n" to accept empty paragraphs.
Note that a blank line must have no blanks in it. Thus "fred\n \nstuff\n\n" is one paragraph, but "fred\n\nstuff\n\n" is two.
You can use the builtin getc() function for most filehandles, but
it won't (easily) work on a terminal device. For STDIN, either use
the Term::ReadKey module from CPAN or use the sample code in
getc.
If your system supports the portable operating system programming interface (POSIX), you can use the following code, which you'll note turns off echo processing as well.
- #!/usr/bin/perl -w
- use strict;
- $| = 1;
- for (1..4) {
- print "gimme: ";
- my $got = getone();
- print "--> $got\n";
- }
- exit;
- BEGIN {
- use POSIX qw(:termios_h);
- my ($term, $oterm, $echo, $noecho);
- my $fd_stdin = fileno(STDIN);
- $term = POSIX::Termios->new();
- $term->getattr($fd_stdin);
- $oterm = $term->getlflag();
- $echo = ECHO | ECHOK | ICANON;
- $noecho = $oterm & ~$echo;
- sub cbreak {
- $term->setlflag($noecho);
- $term->setcc(VTIME, 1);
- $term->setattr($fd_stdin, TCSANOW);
- }
- sub cooked {
- $term->setlflag($oterm);
- $term->setcc(VTIME, 0);
- $term->setattr($fd_stdin, TCSANOW);
- }
- sub getone {
- my $key = '';
- cbreak();
- sysread(STDIN, $key, 1);
- cooked();
- return $key;
- }
- }
- END { cooked() }
The Term::ReadKey module from CPAN may be easier to use. Recent versions also include support for non-portable systems.
The very first thing you should do is look into getting the Term::ReadKey extension from CPAN. As we mentioned earlier, it now even has limited support for non-portable (read: not open systems, closed, proprietary, not POSIX, not Unix, etc.) systems.
You should also check out the Frequently Asked Questions list in comp.unix.* for things like this: the answer is essentially the same. It's very system-dependent. Here's one solution that works on BSD systems:
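The BSD-flavored code was stripped from this copy; the classic trick is to poll with four-argument select() and a zero timeout (a sketch; key_ready is an invented name):

```perl
use strict;
use warnings;

# Returns true if at least one character is waiting on STDIN.
sub key_ready {
    my $rin = '';
    vec($rin, fileno(STDIN), 1) = 1;    # watch STDIN for readability
    return scalar select($rin, undef, undef, 0);
}

print key_ready() ? "input waiting\n" : "no input waiting\n";
```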
If you want to find out how many characters are waiting, there's
also the FIONREAD ioctl call to be looked at. The h2ph tool that
comes with Perl tries to convert C include files to Perl code, which
can be required. FIONREAD ends up defined as a function in the
sys/ioctl.ph file:
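The usage example is missing in this copy; assuming h2ph has been run so that sys/ioctl.ph exists, the call looks roughly like this (remember that the handle must be a tty, pipe, or socket, not a plain file):

```perl
require 'sys/ioctl.ph';   # assumes h2ph has generated this file

printf "FIONREAD is %#x\n", FIONREAD();

# With $fh attached to a stream, you could then ask how many
# bytes are waiting to be read:
# my $size = pack 'L', 0;
# ioctl($fh, FIONREAD(), $size) or die "ioctl FIONREAD: $!";
# $size = unpack 'L', $size;
```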
If h2ph wasn't installed or doesn't work for you, you can grep the include files by hand:
- % grep FIONREAD /usr/include/*/*
- /usr/include/asm/ioctls.h:#define FIONREAD 0x541B
Or write a small C program using the editor of champions:
- % cat > fionread.c
- #include <sys/ioctl.h>
- main() {
- printf("%#08x\n", FIONREAD);
- }
- ^D
- % cc -o fionread fionread.c
- % ./fionread
- 0x4004667f
And then hard-code it, leaving porting as an exercise to your successor.
FIONREAD requires a filehandle connected to a stream, meaning that sockets, pipes, and tty devices work, but not files.
How do I do a tail -f in perl?
First try
- seek($gw_fh, 0, 1);
The statement seek($gw_fh, 0, 1)
doesn't change the current position,
but it does clear the end-of-file condition on the handle, so that the
next <$gw_fh>
makes Perl try again to read something.
If that doesn't work (it relies on features of your stdio implementation), then you need something more like this:
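The fuller version was stripped in this copy; a sketch packages one polling step as a subroutine (read_new_lines is an invented name) so you can call it from a sleep loop:

```perl
use strict;
use warnings;

# One step of a tail -f: grab any new lines, then seek back to the
# position after the last complete read to clear the EOF condition.
sub read_new_lines {
    my ($fh) = @_;
    my @lines;
    my $curpos = tell $fh;
    while (my $line = <$fh>) {
        push @lines, $line;
        $curpos = tell $fh;
    }
    seek $fh, $curpos, 0;   # clear EOF so the next call tries again
    return @lines;
}

# Typical driver:
# open my $fh, '<', $logfile or die;
# for (;;) { print read_new_lines($fh); sleep 1; }
```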
If this still doesn't work, look into the clearerr
method
from IO::Handle, which resets the error and end-of-file states
on the handle.
There's also a File::Tail module from CPAN.
If you check open, you'll see that several of the ways to call open() should do the trick. For example:
Or even with a literal numeric descriptor:
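The examples were stripped in this copy; sketches of both forms (a dup with "<&", and an alias with "<&=" and a numeric descriptor):

```perl
use strict;
use warnings;

# "<&" duplicates the descriptor: $copy gets a new fd of its own.
open my $copy, '<&', \*STDIN
    or die "can't dup STDIN: $!";

# "<&=" aliases an existing numeric descriptor: same fd, new handle.
open my $alias, '<&=', fileno(STDIN)
    or die "can't alias STDIN's descriptor: $!";

printf "STDIN=%d copy=%d alias=%d\n",
    fileno(STDIN), fileno($copy), fileno($alias);
```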
Note that "<&STDIN" makes a copy, but "<&=STDIN" makes an alias. That means if you close an aliased handle, all aliases become inaccessible. This is not true with a copied one.
Error checking, as always, has been left as an exercise for the reader.
If, for some reason, you have a file descriptor instead of a filehandle (perhaps you used POSIX::open), you can use the close() function from the POSIX module:
- use POSIX ();
- POSIX::close( $fd );
This should rarely be necessary, as the Perl close() function is to be
used for things that Perl opened itself, even if it was a dup of a
numeric descriptor as with MHCONTEXT
above. But if you really have
to, you may be able to do this:
Or, just use the fdopen(3S) feature of open():
Whoops! You just put a tab and a formfeed into that filename! Remember that within double quoted strings ("like\this"), the backslash is an escape character. The full list of these is in Quote and Quote-like Operators in perlop. Unsurprisingly, you don't have a file called "c:(tab)emp(formfeed)oo" or "c:(tab)emp(formfeed)oo.exe" on your legacy DOS filesystem.
Either single-quote your strings, or (preferably) use forward slashes.
Since all DOS and Windows versions since something like MS-DOS 2.0 or so
have treated / and \
the same in a path, you might as well use the
one that doesn't clash with Perl--or the POSIX shell, ANSI C and C++,
awk, Tcl, Java, or Python, just to mention a few. POSIX paths
are more portable, too.
Because even on non-Unix ports, Perl's glob function follows standard
Unix globbing semantics. You'll need glob("*") to get all (non-hidden)
files. This makes glob() portable even to legacy systems. Your
port may include proprietary globbing functions as well. Check its
documentation for details.
Why does -i clobber protected files? Isn't this a bug in Perl?
This is elaborately and painstakingly described in the file-dir-perms article in the "Far More Than You Ever Wanted To Know" collection in http://www.cpan.org/misc/olddoc/FMTEYEWTK.tgz .
The executive summary: learn how your filesystem works. The permissions on a file say what can happen to the data in that file. The permissions on a directory say what can happen to the list of files in that directory. If you delete a file, you're removing its name from the directory (so the operation depends on the permissions of the directory, not of the file). If you try to write to the file, the permissions of the file govern whether you're allowed to.
Short of loading the file into a database or pre-indexing the lines in the file, there are a couple of things that you can do.
Here's a reservoir-sampling algorithm from the Camel Book:
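The algorithm itself was stripped from this copy; the idea is that the Nth line read replaces the current pick with probability 1/N, which leaves every line equally likely to be the final pick. A sketch as a subroutine (random_line is an invented wrapper):

```perl
use strict;
use warnings;

sub random_line {
    my ($fh) = @_;
    my $line;
    while (<$fh>) {
        # $. is the current line number: keep this line with
        # probability 1/$., which yields a uniform choice overall.
        $line = $_ if rand($.) < 1;
    }
    return $line;
}
```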
This has a significant advantage in space over reading the whole file in. You can find a proof of this method in The Art of Computer Programming, Volume 2, Section 3.4.2, by Donald E. Knuth.
You can use the File::Random module which provides a function for that algorithm:
Another way is to use the Tie::File module, which treats the entire file as an array. Simply access a random array element.
(contributed by brian d foy)
If you are seeing spaces between the elements of your array when you print the array, you are probably interpolating the array in double quotes:
It's the double quotes, not the print, doing this. Whenever you interpolate an array in a double quote context, Perl joins the elements with spaces (or whatever is in $", which is a space by default):
- animals are: camel llama alpaca vicuna
This is different than printing the array without the interpolation:
Now the output doesn't have the spaces between the elements because
the elements of @animals
simply become part of the list to
print:
- animals are: camelllamaalpacavicuna
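Both behaviors side by side, as a sketch (the @animals list is taken from the outputs shown above):

```perl
use strict;
use warnings;

my @animals = qw(camel llama alpaca vicuna);

# Interpolated: elements joined with $", a space by default.
print "animals are: @animals\n";        # animals are: camel llama alpaca vicuna

# Plain list: elements printed with nothing between them.
print "animals are: ", @animals, "\n";  # animals are: camelllamaalpacavicuna
```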
You might notice this when each of the elements of @array ends with a newline. You expect to print one element per line, but notice that every line after the first is indented:
- this is a line
- this is another line
- this is the third line
That extra space comes from the interpolation of the array. If you don't want to put anything between your array elements, don't use the array in double quotes. You can send it to print without them:
- print @lines;
(contributed by brian d foy)
The File::Find module, which comes with Perl, does all of the hard work of traversing a directory structure. You simply call the find subroutine with a callback subroutine and the directories you want to traverse:
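The call itself is missing from this copy; a minimal sketch that collects every name under the current directory (the starting-point list is an assumption):

```perl
use strict;
use warnings;
use File::Find;

my @directories = ('.');    # hypothetical starting points
my @found;

# find() walks each directory and calls the callback once per entry;
# $File::Find::name holds the path relative to the starting point.
find( sub { push @found, $File::Find::name }, @directories );

print "$_\n" for @found;
```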
The File::Find::Closures module, which you can download from CPAN, provides many ready-to-use subroutines that you can use with File::Find.
The File::Finder module, which you can download from CPAN, can help you create the callback subroutine using something closer to the syntax of the find command-line utility:
The File::Find::Rule module, which you can download from CPAN, has a similar interface, but does the traversal for you too:
(contributed by brian d foy)
If you have an empty directory, you can use Perl's built-in rmdir.
If the directory is not empty (it contains files or subdirectories), you either have to empty it yourself (a lot of work) or use a module to help you.
The File::Path module, which comes with Perl, has a remove_tree function which can take care of all of the hard work for you:
- use File::Path qw(remove_tree);
- remove_tree( @directories );
The File::Path module also has a legacy interface to the older rmtree subroutine.
(contributed by Shlomi Fish)
To do the equivalent of cp -R
(i.e. copy an entire directory tree
recursively) in portable Perl, you'll either need to write something yourself
or find a good CPAN module such as File::Copy::Recursive.
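A sketch, assuming File::Copy::Recursive from CPAN is installed (the directory names are invented, and a tiny source tree is created so the example runs):

```perl
use strict;
use warnings;
use File::Copy::Recursive qw(dircopy);

my ($src, $dst) = ('src_tree', 'dst_tree');   # hypothetical names

# Build a tiny source tree so the example is self-contained.
mkdir $src unless -d $src;
open my $fh, '>', "$src/a.txt" or die "can't write: $!";
print $fh "payload\n";
close $fh;

# dircopy copies the whole tree, like cp -R.
dircopy($src, $dst) or die "couldn't copy $src: $!";
```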
Copyright (c) 1997-2010 Tom Christiansen, Nathan Torkington, and other authors as noted. All rights reserved.
This documentation is free; you can redistribute it and/or modify it under the same terms as Perl itself.
Irrespective of its distribution, all code examples here are in the public domain. You are permitted and encouraged to use this code and any derivatives thereof in your own programs for fun or for profit as you see fit. A simple comment in the code giving credit to the FAQ would be courteous but is not required.
perlfaq6 - Regular Expressions
This section is surprisingly small because the rest of the FAQ is littered with answers involving regular expressions. For example, decoding a URL and checking whether something is a number can be handled with regular expressions, but those answers are found elsewhere in this document (in perlfaq9: "How do I decode or create those %-encodings on the web" and perlfaq4: "How do I determine whether a scalar is a number/whole/integer/float", to be precise).
Three techniques can make regular expressions maintainable and understandable.
Describe what you're doing and how you're doing it, using normal Perl comments.
- # turn the line into the first word, a colon, and the
- # number of characters on the rest of the line
- s/^(\w+)(.*)/ lc($1) . ":" . length($2) /meg;
The /x modifier causes whitespace to be ignored in a regex pattern
(except in a character class and a few other places), and also allows you to
use normal comments there, too. As you can imagine, whitespace and comments
help a lot.
/x lets you turn this:
- s{<(?:[^>'"]*|".*?"|'.*?')+>}{}gs;
into this:
- s{ < # opening angle bracket
- (?: # Non-backreffing grouping paren
- [^>'"] * # 0 or more things that are neither > nor ' nor "
- | # or else
- ".*?" # a section between double quotes (stingy match)
- | # or else
- '.*?' # a section between single quotes (stingy match)
- ) + # all occurring one or more times
- > # closing angle bracket
- }{}gsx; # replace with nothing, i.e. delete
It's still not quite so clear as prose, but it is very useful for describing the meaning of each part of the pattern.
While we normally think of patterns as being delimited with / characters, they can be delimited by almost any character. perlre describes this. For example, the s/// above uses braces as delimiters. Selecting another delimiter can avoid quoting the delimiter within the pattern:
- s/\/usr\/local/\/usr\/share/g; # bad delimiter choice
- s#/usr/local#/usr/share#g; # better
Using logically paired delimiters can be even more readable:
- s{/usr/local/}{/usr/share}g; # better still
Either you don't have more than one line in the string you're looking at (probably), or else you aren't using the correct modifier(s) on your pattern (possibly).
There are many ways to get multiline data into a string. If you want
it to happen automatically while reading input, you'll want to set $/
(probably to '' for paragraphs or undef for the whole file) to
allow you to read more than one line at a time.
Read perlre to help you decide which of /s and /m (or both)
you might want to use: /s allows dot to include newline, and /m
allows caret and dollar to match next to a newline, not just at the
end of the string. You do need to make sure that you've actually
got a multiline string in there.
For example, this program detects duplicate words, even when they span
line breaks (but not paragraph ones). For this example, we don't need
/s because we aren't using dot in a regular expression that we want
to cross line boundaries. Neither do we need /m because we don't
want caret or dollar to match at any point inside the record next
to newlines. But it's imperative that $/ be set to something other
than the default, or else we won't actually ever have a multiline
record read in.
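The program itself is missing from this copy; a sketch of the core idea as a testable subroutine (find_dups is an invented name), with the paragraph-reading driver shown in comments:

```perl
use strict;
use warnings;

# Return words that appear twice in a row, even across a line break.
sub find_dups {
    my ($text) = @_;
    my @dups;
    while ( $text =~ /\b([\w'-]+)(\s+\1)+\b/gi ) {
        push @dups, $1;
    }
    return @dups;
}

# Driver, per the answer: read whole paragraphs, not single lines.
# local $/ = '';
# while (<>) { print "dup '$_'\n" for find_dups($_); }
```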
Here's some code that finds sentences that begin with "From " (which would be mangled by many mailers):
Here's code that finds everything between START and END in a paragraph:
You can use Perl's somewhat exotic .. operator (documented in perlop):
- perl -ne 'print if /START/ .. /END/' file1 file2 ...
If you wanted text and not lines, you would use
- perl -0777 -ne 'print "$1\n" while /START(.*?)END/gs' file1 file2 ...
But if you want nested occurrences of START through END, you'll run up against the problem described in the question in this section on matching balanced text.
Here's another example of using ..:
Do not use regexes. Use a module and forget about the regular expressions. The XML::LibXML, HTML::TokeParser and HTML::TreeBuilder modules are good starts, although each namespace has other parsing modules specialized for certain tasks and different ways of doing it. Start at CPAN Search ( http://metacpan.org/ ) and wonder at all the work people have done for you already! :)
$/ has to be a string. You can use these examples if you really need to do this.
If you have File::Stream, this is easy.
If you don't have File::Stream, you have to do a little more work.
You can use the four-argument form of sysread to continually add to a buffer. After you add to the buffer, you check if you have a complete line (using your regular expression).
You can do the same thing with foreach and a match using the c flag and the \G anchor, if you do not mind your entire file being in memory at the end.
Here's a lovely Perlish solution by Larry Rosler. It exploits properties of bitwise xor on ASCII strings.
- $_= "this is a TEsT case";
- $old = 'test';
- $new = 'success';
- s{(\Q$old\E)}
- { uc $new | (uc $1 ^ $1) .
- (uc(substr $1, -1) ^ substr $1, -1) x
- (length($new) - length $1)
- }egi;
- print;
And here it is as a subroutine, modeled after the above:
This prints:
- this is a SUcCESS case
As an alternative, to keep the case of the replacement word if it is longer than the original, you can use this code, by Jeff Pinyan:
This changes the sentence to "this is a SUcCess case."
Just to show that C programmers can write C in any programming language, if you prefer a more C-like solution, the following script makes the substitution have the same case, letter by letter, as the original. (It also happens to run about 240% slower than the Perlish solution runs.) If the substitution has more characters than the string being substituted, the case of the last character is used for the rest of the substitution.
- # Original by Nathan Torkington, massaged by Jeffrey Friedl
- #
- sub preserve_case
- {
- my ($old, $new) = @_;
- my $state = 0; # 0 = no change; 1 = lc; 2 = uc
- my ($i, $oldlen, $newlen, $c) = (0, length($old), length($new));
- my $len = $oldlen < $newlen ? $oldlen : $newlen;
- for ($i = 0; $i < $len; $i++) {
- if ($c = substr($old, $i, 1), $c =~ /[\W\d_]/) {
- $state = 0;
- } elsif (lc $c eq $c) {
- substr($new, $i, 1) = lc(substr($new, $i, 1));
- $state = 1;
- } else {
- substr($new, $i, 1) = uc(substr($new, $i, 1));
- $state = 2;
- }
- }
- # finish up with any remaining new (for when new is longer than old)
- if ($newlen > $oldlen) {
- if ($state == 1) {
- substr($new, $oldlen) = lc(substr($new, $oldlen));
- } elsif ($state == 2) {
- substr($new, $oldlen) = uc(substr($new, $oldlen));
- }
- }
- return $new;
- }
Does \w match national character sets?
Put use locale; in your script. The \w character class is taken from the current locale. See perllocale for details.
How do I match a locale-smart version of /[a-zA-Z]/?
You can use the POSIX character class syntax /[[:alpha:]]/ documented in perlre.
No matter which locale you are in, the alphabetic characters are
the characters in \w without the digits and the underscore.
As a regex, that looks like /[^\W\d_]/. Its complement, the non-alphabetics, is then everything in \W along with the digits and the underscore, or /[\W\d_]/.
The Perl parser will expand $variable and @variable references in
regular expressions unless the delimiter is a single quote. Remember,
too, that the right-hand side of a s/// substitution is considered
a double-quoted string (see perlop for more details). Remember
also that any regex special characters will be acted on unless you
precede the substitution with \Q. Here's an example:
- $string = "Placido P. Octopus";
- $regex = "P.";
- $string =~ s/$regex/Polyp/;
- # $string is now "Polypacido P. Octopus"
Because . is special in regular expressions, and can match any single character, the regex P. here has matched the "Pl" in the original string.
To escape the special meaning of ., we use \Q:
- $string = "Placido P. Octopus";
- $regex = "P.";
- $string =~ s/\Q$regex/Polyp/;
- # $string is now "Placido Polyp Octopus"
The use of \Q causes the . in the regex to be treated as a regular character, so that P. matches a P followed by a dot.
What is /o really for?
(contributed by brian d foy)
The /o option for regular expressions (documented in perlop and
perlreref) tells Perl to compile the regular expression only once.
This is only useful when the pattern contains a variable. Perls 5.6
and later handle this automatically if the pattern does not change.
Since the match operator m//, the substitution operator s///,
and the regular expression quoting operator qr// are double-quotish
constructs, you can interpolate variables into the pattern. See the
answer to "How can I quote a variable to use in a regex?" for more
details.
This example takes a regular expression from the argument list and prints the lines of input that match it:
Versions of Perl prior to 5.6 would recompile the regular expression
for each iteration, even if $pattern
had not changed. The /o
would prevent this by telling Perl to compile the pattern the first
time, then reuse that for subsequent iterations:
In versions 5.6 and later, Perl won't recompile the regular expression
if the variable hasn't changed, so you probably don't need the /o
option. It doesn't hurt, but it doesn't help either. If you want any
version of Perl to compile the regular expression only once even if
the variable changes (thus, only using its initial value), you still
need the /o.
You can watch Perl's regular expression engine at work to verify for yourself if Perl is recompiling a regular expression. The use re 'debug' pragma (which comes with Perl 5.005 and later) shows the details. With Perls before 5.6, you should see re reporting that it's compiling the regular expression on each iteration. With Perl 5.6 or later, you should only see re report that for the first iteration.
While this actually can be done, it's much harder than you'd think. For example, this one-liner
- perl -0777 -pe 's{/\*.*?\*/}{}gs' foo.c
will work in many but not all cases. You see, it's too simple-minded for certain kinds of C programs, in particular, those with what appear to be comments in quoted strings. For that, you'd need something like this, created by Jeffrey Friedl and later modified by Fred Curtis.
This could, of course, be more legibly written with the /x modifier, adding
whitespace and comments. Here it is expanded, courtesy of Fred Curtis.
- s{
- /\* ## Start of /* ... */ comment
- [^*]*\*+ ## Non-* followed by 1-or-more *'s
- (
- [^/*][^*]*\*+
- )* ## 0-or-more things which don't start with /
- ## but do end with '*'
- / ## End of /* ... */ comment
- | ## OR various things which aren't comments:
- (
- " ## Start of " ... " string
- (
- \\. ## Escaped char
- | ## OR
- [^"\\] ## Non "\
- )*
- " ## End of " ... " string
- | ## OR
- ' ## Start of ' ... ' string
- (
- \\. ## Escaped char
- | ## OR
- [^'\\] ## Non '\
- )*
- ' ## End of ' ... ' string
- | ## OR
- . ## Any other char
- [^/"'\\]* ## Chars which don't start a comment, string or escape
- )
- }{defined $2 ? $2 : ""}gxse;
A slight modification also removes C++ comments, possibly spanning multiple lines using a continuation character:
- s#/\*[^*]*\*+([^/*][^*]*\*+)*/|//([^\\]|[^\n][\n]?)*?\n|("(\\.|[^"\\])*"|'(\\.|[^'\\])*'|.[^/"'\\]*)#defined $3 ? $3 : ""#gse;
(contributed by brian d foy)
Your first try should probably be the Text::Balanced module, which is in the Perl standard library since Perl 5.8. It has a variety of functions to deal with tricky text. The Regexp::Common module can also help by providing canned patterns you can use.
As of Perl 5.10, you can match balanced text with regular expressions using recursive patterns. Before Perl 5.10, you had to resort to various tricks such as using Perl code in (??{}) sequences.
Here's an example using a recursive regular expression. The goal is to capture all of the text within angle brackets, including the text in nested angle brackets. This sample text has two "major" groups: a group with one level of nesting and a group with two levels of nesting. There are five total groups in angle brackets:
The regular expression to match the balanced text uses two new (to Perl 5.10) regular expression features. These are covered in perlre and this example is a modified version of one in that documentation.
First, adding the new possessive +
to any quantifier finds the
longest match and does not backtrack. That's important since you want
to handle any angle brackets through the recursion, not backtracking.
The group [^<>]++ finds one or more non-angle brackets without
backtracking.
Second, the new (?PARNO) refers to the sub-pattern in the particular capture group given by PARNO. In the following regex, the first capture group finds (and remembers) the balanced text, and you need that same pattern within the first buffer to get past the nested text. That's the recursive part. The (?1) uses the pattern in the outer capture group as an independent part of the regex.
Putting it all together, you have:
- #!/usr/local/bin/perl5.10.0
- my $string =<<"HERE";
- I have some <brackets in <nested brackets> > and
- <another group <nested once <nested twice> > >
- and that's it.
- HERE
- my @groups = $string =~ m/
- ( # start of capture group 1
- < # match an opening angle bracket
- (?:
- [^<>]++ # one or more non angle brackets, non backtracking
- |
- (?1) # found < or >, so recurse to capture group 1
- )*
- > # match a closing angle bracket
- ) # end of capture group 1
- /xg;
- $" = "\n\t";
- print "Found:\n\t@groups\n";
The output shows that Perl found the two major groups:
- Found:
- <brackets in <nested brackets> >
- <another group <nested once <nested twice> > >
With a little extra work, you can get all of the groups in angle brackets even if they are in other angle brackets too. Each time you get a balanced match, remove its outer delimiter (that's the one you just matched, so don't match it again) and add it to a queue of strings to process. Keep doing that until you get no matches:
- #!/usr/local/bin/perl5.10.0
- my @queue =<<"HERE";
- I have some <brackets in <nested brackets> > and
- <another group <nested once <nested twice> > >
- and that's it.
- HERE
- my $regex = qr/
- ( # start of bracket 1
- < # match an opening angle bracket
- (?:
- [^<>]++ # one or more non angle brackets, non backtracking
- |
- (?1) # recurse to bracket 1
- )*
- > # match a closing angle bracket
- ) # end of bracket 1
- /x;
- $" = "\n\t";
- while( @queue ) {
- my $string = shift @queue;
- my @groups = $string =~ m/$regex/g;
- print "Found:\n\t@groups\n\n" if @groups;
- unshift @queue, map { s/^<//; s/>$//; $_ } @groups;
- }
The output shows all of the groups. The outermost matches show up first and the nested matches show up later:
- Found:
- <brackets in <nested brackets> >
- <another group <nested once <nested twice> > >
- Found:
- <nested brackets>
- Found:
- <nested once <nested twice> >
- Found:
- <nested twice>
Most people mean that greedy regexes match as much as they can. Technically speaking, it's actually the quantifiers (?, *, +, {}) that are greedy rather than the whole pattern; Perl prefers local greed and immediate gratification to overall greed. To get non-greedy versions of the same quantifiers, use (??, *?, +?, {}?).
An example:
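The example was stripped from this copy; a sketch reconstructing it from the description that follows (the sentence is chosen so the difference shows):

```perl
use strict;
use warnings;

my $s1 = my $s2 = "I am very very cold";

$s1 =~ s/ve.*y //;     # greedy: .* runs to the last "y "
$s2 =~ s/ve.*?y //;    # stingy: .*? stops at the first "y "

print "$s1\n";         # I am cold
print "$s2\n";         # I am very cold
```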
Notice how the second substitution stopped matching as soon as it
encountered "y ". The *?
quantifier effectively tells the regular
expression engine to find a match as quickly as possible and pass
control on to whatever is next in line, as you would if you were
playing hot potato.
Use the split function:
Note that these aren't really words in the English sense; they're just chunks of consecutive non-whitespace characters.
To work with only alphanumeric sequences (including underscores), you might consider
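Both approaches can be sketched as follows (the sample string is an assumption):

```perl
my $string = "calling all carbon-based units";

# whitespace-separated chunks (the loose "word" of the previous paragraph)
my @chunks = split ' ', $string;      # "calling", "all", "carbon-based", "units"

# runs of alphanumerics and underscores only (splits "carbon-based" in two)
my @alnums = $string =~ /(\w+)/g;     # "calling", "all", "carbon", "based", "units"
```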
To do this, you have to parse out each word in the input stream. We'll pretend that by word you mean chunk of alphabetics, hyphens, or apostrophes, rather than the non-whitespace chunk idea of a word given in the previous question:
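A sketch of that parse, counting occurrences in a hash (the input text is an assumption):

```perl
my %seen;
my $text = "It's a dog-eat-dog world, and it's not pretty";
while ( $text =~ /(\w['\w-]*)/g ) {   # alphabetics plus apostrophes and hyphens
    $seen{ lc $1 }++;
}
# %seen now maps each lowercased word to its count
```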
If you wanted to do the same thing for lines, you wouldn't need a regular expression:
If you want these output in a sorted order, see perlfaq4: "How do I sort a hash (optionally by value instead of key)?".
See the module String::Approx available from CPAN.
(contributed by brian d foy)
If you have Perl 5.10 or later, this is almost trivial. You just smart match against an array of regular expression objects:
The smart match stops when it finds a match, so it doesn't have to try every expression.
Earlier than Perl 5.10, you have a bit of work to do. You want to avoid compiling a regular expression every time you want to match it. In this example, perl must recompile the regular expression for every iteration of the foreach loop since it has no way to know what $pattern will be:
The qr// operator showed up in perl 5.005. It compiles a regular expression, but doesn't apply it. When you use the pre-compiled version of the regex, perl does less work. In this example, I inserted a map to turn each pattern into its pre-compiled form. The rest of the script is the same, but faster:
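A sketch of that approach (the patterns and test strings are made up):

```perl
# compile each pattern once, up front
my @patterns = map { qr/$_/ } qw( fo+ ba[rz] quu?x );

my @matched;
foreach my $string ( qw( food bart quux nope ) ) {
    foreach my $pattern ( @patterns ) {
        # perl applies the pre-compiled pattern without recompiling it
        if ( $string =~ $pattern ) { push @matched, $string; last }
    }
}
# @matched now holds every string that matched at least one pattern
```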
In some cases, you may be able to make several patterns into a single regular expression. Beware of situations that require backtracking though.
For more details on regular expression efficiency, see Mastering Regular Expressions by Jeffrey Friedl. He explains how the regular expressions engine works and why some patterns are surprisingly inefficient. Once you understand how perl applies regular expressions, you can tune them for individual situations.
Why don't word-boundary searches with \b work for me?
(contributed by brian d foy)
Ensure that you know what \b really does: it's the boundary between a word character, \w, and something that isn't a word character. That thing that isn't a word character might be \W, but it can also be the start or end of the string.
It's not (not!) the boundary between whitespace and non-whitespace, and it's not the stuff between words we use to create sentences.
In regex speak, a word boundary (\b) is a "zero width assertion", meaning that it doesn't represent a character in the string, but a condition at a certain position.
For the regular expression, /\bPerl\b/, there has to be a word boundary before the "P" and after the "l". As long as something other than a word character precedes the "P" and succeeds the "l", the pattern will match. These strings match /\bPerl\b/.
- "Perl" # no word char before P or after l
- "Perl " # same as previous (space is not a word char)
- "'Perl'" # the ' char is not a word char
- "Perl's" # no word char before P, non-word char after "l"
These strings do not match /\bPerl\b/.
- "Perl_" # _ is a word char!
- "Perler" # no word char before P, but one after l
You don't have to use \b to match words though. You can look for non-word characters surrounded by word characters. These strings match the pattern /\b'\b/.
- "don't" # the ' char is surrounded by "n" and "t"
- "qep'a'" # the ' char is surrounded by "p" and "a"
These strings do not match /\b'\b/.
- "foo'" # there is no word char after non-word '
You can also use the complement of \b, \B, to specify that there should not be a word boundary.
In the pattern /\Bam\B/, there must be a word character before the "a" and after the "m". These strings match /\Bam\B/:
- "llama" # "am" surrounded by word chars
- "Samuel" # same
These strings do not match /\Bam\B/:
- "Sam" # no word boundary before "a", but one after "m"
- "I am Sam" # "am" surrounded by non-word chars
(contributed by Anno Siegel)
Once Perl sees that you need one of these variables anywhere in the program, it provides them on each and every pattern match. That means that on every pattern match the entire string will be copied, part of it to $`, part to $&, and part to $'. Thus the penalty is most severe with long strings and patterns that match often. Avoid $&, $', and $` if you can, but if you can't, once you've used them at all, use them at will because you've already paid the price. Remember that some algorithms really appreciate them. As of the 5.005 release, the $& variable is no longer "expensive" the way the other two are.
Since Perl 5.6.1 the special variables @- and @+ can functionally replace $`, $& and $'. These arrays contain pointers to the beginning and end of each match (see perlvar for the full story), so they give you essentially the same information, but without the risk of excessive string copying.
Perl 5.10 added three specials, ${^MATCH}, ${^PREMATCH}, and ${^POSTMATCH}, to do the same job but without the global performance penalty. Perl 5.10 only sets these variables if you compile or execute the regular expression with the /p modifier.
What good is \G in a regular expression?
You use the \G anchor to start the next match on the same string where the last match left off. The regular expression engine cannot skip over any characters to find the next match with this anchor, so \G is similar to the beginning of string anchor, ^. The \G anchor is typically used with the g flag. It uses the value of pos() as the position to start the next match. As the match operator makes successive matches, it updates pos() with the position of the next character past the last match (or the first character of the next match, depending on how you like to look at it). Each string has its own pos() value.
Suppose you want to match all of the consecutive pairs of digits in a string like "1122a44" and stop matching when you encounter non-digits. You want to match 11 and 22, but the letter a shows up between 22 and 44 and you want to stop at a. Simply matching pairs of digits skips over the a and still matches 44.
- $_ = "1122a44";
- my @pairs = m/(\d\d)/g; # qw( 11 22 44 )
If you use the \G anchor, you force the match after 22 to start with the a. The regular expression cannot match there since it does not find a digit, so the next match fails and the match operator returns the pairs it already found.
- $_ = "1122a44";
- my @pairs = m/\G(\d\d)/g; # qw( 11 22 )
You can also use the \G anchor in scalar context. You still need the g flag.
After the match fails at the letter a, perl resets pos() and the next match on the same string starts at the beginning. You can disable pos() resets on fail with the c flag, documented in perlop and perlreref. Subsequent matches start where the last successful match ended (the value of pos()) even if a match on the same string has failed in the meantime. In this case, the match after the while() loop starts at the a (where the last match stopped), and since it does not use any anchor it can skip over the a to find 44.
Typically you use the \G anchor with the c flag when you want to try a different match if one fails, such as in a tokenizer. Jeffrey Friedl offers this example, which works in 5.004 or later.
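A sketch consistent with the description that follows (the input line is an assumption, and tokens are collected rather than printed):

```perl
my $line = "12 apples + 34 pears";
my @tokens;
PARSER: {
    # each alternative resumes at pos(); /c keeps pos() on failure
    $line =~ m/ \G( \d+\b    )/gcx && do { push @tokens, "number:$1"; redo PARSER; };
    $line =~ m/ \G( \w+      )/gcx && do { push @tokens, "word:$1";   redo PARSER; };
    $line =~ m/ \G( \s+      )/gcx && do { push @tokens, "space";     redo PARSER; };
    $line =~ m/ \G( [^\w\d]+ )/gcx && do { push @tokens, "other:$1";  redo PARSER; };
}
# @tokens: number, space, word, space, other, number, space, word
```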
For each line, the PARSER
loop first tries to match a series
of digits followed by a word boundary. This match has to
start at the place the last match left off (or the beginning
of the string on the first match). Since m/ \G( \d+\b
)/gcx
uses the c
flag, if the string does not match that
regular expression, perl does not reset pos() and the next
match starts at the same position to try a different
pattern.
While it's true that Perl's regular expressions resemble the DFAs (deterministic finite automata) of the egrep(1) program, they are in fact implemented as NFAs (non-deterministic finite automata) to allow backtracking and backreferencing. And they aren't POSIX-style either, because those guarantee worst-case behavior for all cases. (It seems that some people prefer guarantees of consistency, even when what's guaranteed is slowness.) See the book "Mastering Regular Expressions" (from O'Reilly) by Jeffrey Friedl for all the details you could ever hope to know on these matters (a full citation appears in perlfaq2).
The problem is that grep builds a return list, regardless of the context. This means you're making Perl go to the trouble of building a list that you then just throw away. If the list is large, you waste both time and space. If your intent is to iterate over the list, then use a for loop for this purpose.
In perls older than 5.8.1, map suffers from this problem as well. But since 5.8.1, this has been fixed, and map is context aware - in void context, no lists are constructed.
Starting from Perl 5.6 Perl has had some level of multibyte character support. Perl 5.8 or later is recommended. Supported multibyte character repertoires include Unicode, and legacy encodings through the Encode module. See perluniintro, perlunicode, and Encode.
If you are stuck with older Perls, you can do Unicode with the Unicode::String module, and character conversions using the Unicode::Map8 and Unicode::Map modules. If you are using Japanese encodings, you might try using the jperl 5.005_03.
Finally, the following set of approaches was offered by Jeffrey Friedl, whose article in issue #5 of The Perl Journal talks about this very matter.
Let's suppose you have some weird Martian encoding where pairs of ASCII uppercase letters encode single Martian letters (i.e. the two bytes "CV" make a single Martian letter, as do the two bytes "SG", "VS", "XX", etc.). Other bytes represent single characters, just like ASCII.
So, the string of Martian "I am CVSGXX!" uses 12 bytes to encode the nine characters 'I', ' ', 'a', 'm', ' ', 'CV', 'SG', 'XX', '!'.
Now, say you want to search for the single character /GX/. Perl doesn't know about Martian, so it'll find the two bytes "GX" in the "I am CVSGXX!" string, even though that character isn't there: it just looks like it is because "SG" is next to "XX", but there's no real "GX". This is a big problem.
Here are a few ways, all painful, to deal with it:
Or like this:
Or like this:
Here's another, slightly less painful, way to do it from Benjamin Goldberg, who uses a zero-width negative look-behind assertion.
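A sketch of that assertion, reconstructed from the description below (the test string comes from the Martian example above):

```perl
my $martian = "I am CVSGXX!";
my $found = $martian =~ m/
    (?<![A-Z])          # not preceded by an uppercase byte
    (?:[A-Z][A-Z])*?    # any number of complete two-byte Martian letters
    GX                  # then the Martian letter GX
/x;
# $found is false here: the apparent "GX" straddles "SG" and "XX"
```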
This succeeds if the "martian" character GX is in the string, and fails otherwise. If you don't like using (?<!), a zero-width negative look-behind assertion, you can replace (?<![A-Z]) with (?:^|[^A-Z]).
It does have the drawback of putting the wrong thing in $-[0] and $+[0], but this usually can be worked around.
(contributed by brian d foy)
We don't have to hard-code patterns into the match operator (or anything else that works with regular expressions). We can put the pattern in a variable for later use.
The match operator is a double quote context, so you can interpolate your variable just like a double quoted string. In this case, you read the regular expression as user input and store it in $regex. Once you have the pattern in $regex, you use that variable in the match operator.
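A sketch, using a literal in place of actual user input:

```perl
my $string = "Perl is a dynamic language";
my $regex  = "d[a-z]+c";              # pretend this came from the user
if ( $string =~ m/($regex)/ ) {       # interpolated like a double-quoted string
    print "Matched: $1\n";
}
```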
Any regular expression special characters in $regex are still special, and the pattern still has to be valid or Perl will complain. For instance, in this pattern there is an unpaired parenthesis.
- my $regex = "Unmatched ( paren";
- "Two parens to bind them all" =~ m/$regex/;
When Perl compiles the regular expression, it treats the parenthesis as the start of a memory match. When it doesn't find the closing parenthesis, it complains:
- Unmatched ( in regex; marked by <-- HERE in m/Unmatched ( <-- HERE paren/ at script line 3.
You can get around this in several ways depending on your situation.
First, if you don't want any of the characters in the string to be
special, you can escape them with quotemeta before you use the string.
You can also do this directly in the match operator using the \Q and \E sequences. The \Q tells Perl where to start escaping special characters, and the \E tells it where to stop (see perlop for more details).
Alternately, you can use qr//, the regular expression quote operator (see perlop for more details). It quotes and perhaps compiles the pattern, and you can apply regular expression flags to the pattern.
You might also want to trap any errors by wrapping an eval block
around the whole thing.
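A sketch of the eval wrapper, continuing the unpaired-parenthesis example:

```perl
my $regex    = "Unmatched ( paren";     # invalid as a pattern
my $compiled = eval { qr/$regex/ };     # dies inside the eval, not the program
if ( $@ ) {
    print "Bad pattern: $@";
}
```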
Or...
Copyright (c) 1997-2010 Tom Christiansen, Nathan Torkington, and other authors as noted. All rights reserved.
This documentation is free; you can redistribute it and/or modify it under the same terms as Perl itself.
Irrespective of its distribution, all code examples in this file are hereby placed into the public domain. You are permitted and encouraged to use this code in your own programs for fun or for profit as you see fit. A simple comment in the code giving credit would be courteous but is not required.
perlfaq7 - General Perl Language Issues
This section deals with general Perl language issues that don't clearly fit into any of the other sections.
There is no BNF, but you can paw your way through the yacc grammar in perly.y in the source distribution if you're particularly brave. The grammar relies on very smart tokenizing code, so be prepared to venture into toke.c as well.
In the words of Chaim Frenkel: "Perl's grammar can not be reduced to BNF. The work of parsing perl is distributed between yacc, the lexer, smoke and mirrors."
They are type specifiers, as detailed in perldata:
There are a couple of other symbols that you're likely to encounter that aren't really type specifiers:
- <> are used for inputting a record from a filehandle.
- \ takes a reference to something.
Note that <FILE> is neither the type specifier for files nor the name of the handle. It is the <> operator applied to the handle FILE. It reads one line (well, record--see $/ in perlvar) from the handle FILE in scalar context, or all lines in list context. When performing open, close, or any other operation besides <> on files, or even when talking about the handle, do not use the brackets. These are correct: eof(FH), seek(FH, 0, 2) and "copying from STDIN to FILE".
Normally, a bareword doesn't need to be quoted, but in most cases probably should be (and must be under use strict). But a hash key consisting of a simple word and the left-hand operand to the => operator both count as though they were quoted:
- This is like this
- ------------ ---------------
- $foo{line} $foo{'line'}
- bar => stuff 'bar' => stuff
The final semicolon in a block is optional, as is the final comma in a list. Good style (see perlstyle) says to put them in except for one-liners:
One way is to treat the return values as a list and index into it:
- $dir = (getpwnam($user))[7];
Another way is to use undef as an element on the left-hand-side:
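A sketch of that approach (localtime is used here just as a convenient list-returning builtin; the same shape works for stat or getpwnam):

```perl
# undef in the assignment list discards the fields you skip:
my ( undef, $min, $hour ) = localtime();
# for stat it would look like:
#   my ($dev, $ino, undef, undef, $uid, $gid) = stat($file);
```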
You can also use a list slice to select only the elements that you need:
- ($dev, $ino, $uid, $gid) = ( stat($file) )[0,1,4,5];
If you are running Perl 5.6.0 or better, the use warnings pragma allows fine control of what warnings are produced. See perllexwarn for more details.
- {
- no warnings; # temporarily turn off warnings
- $x = $y + $z; # I know these might be undef
- }
Additionally, you can enable and disable categories of warnings. You turn off the categories you want to ignore and you can still get other categories of warnings. See perllexwarn for the complete details, including the category names and hierarchy.
- {
- no warnings 'uninitialized';
- $x = $y + $z;
- }
If you have an older version of Perl, the $^W variable (documented in perlvar) controls runtime warnings for a block:
- {
- local $^W = 0; # temporarily turn off warnings
- $x = $y + $z; # I know these might be undef
- }
Note that like all the punctuation variables, you cannot currently use my() on $^W, only local().
An extension is a way of calling compiled C code from Perl. Reading perlxstut is a good place to learn more about extensions.
Actually, they don't. All C operators that Perl copies have the same precedence in Perl as they do in C. The problem is with operators that C doesn't have, especially functions that give a list context to everything on their right, eg. print, chmod, exec, and so on. Such functions are called "list operators" and appear as such in the precedence table in perlop.
A common mistake is to write:
This gets interpreted as:
To avoid this problem, either put in extra parentheses or use the super low precedence or operator:
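A runnable sketch of the classic unlink case (the filename is an assumption and deliberately does not exist):

```perl
my $file = "no_such_file_12345.txt";

# The mistake: || binds tighter than the list operator unlink, so this
# parses as unlink( $file || die "snafu" ) and the die never fires,
# even though no file was removed.
my $removed = unlink $file || die "snafu";   # $removed is 0, no exception

# With the low-precedence "or" (or extra parentheses), the failure is caught:
eval { unlink $file or die "snafu" };
print "caught: $@" if $@;
```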
The "English" operators (and
, or
, xor
, and not
)
deliberately have precedence lower than that of list operators for
just such situations as the one above.
Another operator with surprising precedence is exponentiation. It binds more tightly even than unary minus, making -2**2 produce a negative four and not a positive one. It is also right-associating, meaning that 2**3**2 is two raised to the ninth power, not eight squared.
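Both claims are easy to check:

```perl
my $neg   = -2**2;      # -(2**2) == -4
my $paren = (-2)**2;    #  4, with explicit parentheses
my $right = 2**3**2;    # 2**(3**2) == 512, right-associative
my $left  = (2**3)**2;  # 64
```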
Although it has the same precedence as in C, Perl's ?: operator produces an lvalue. This assigns $x to either $if_true or $if_false, depending on the trueness of $maybe:
- ($maybe ? $if_true : $if_false) = $x;
In general, you don't "declare" a structure. Just use a (probably anonymous) hash reference. See perlref and perldsc for details. Here's an example:
- $person = {}; # new anonymous hash
- $person->{AGE} = 24; # set field AGE to 24
- $person->{NAME} = "Nat"; # set field NAME to "Nat"
If you're looking for something a bit more rigorous, try perltoot.
perlnewmod is a good place to start, ignore the bits about uploading to CPAN if you don't want to make your module publicly available.
ExtUtils::ModuleMaker and Module::Starter are also good places to start. Many CPAN authors now use Dist::Zilla to automate as much as possible.
Detailed documentation about modules can be found at: perlmod, perlmodlib, perlmodstyle.
If you need to include C code or C library interfaces use h2xs. h2xs will create the module distribution structure and the initial interface files. perlxs and perlxstut explain the details.
Ask the current maintainer to make you a co-maintainer or transfer the module to you.
If you cannot reach the author for some reason, contact the PAUSE admins at modules@perl.org, who may be able to help, but each case is treated separately.
Get a login for the Perl Authors Upload Server (PAUSE) if you don't already have one: http://pause.perl.org
Write to modules@perl.org explaining what you did to contact the current maintainer. The PAUSE admins will also try to reach the maintainer.
Post a public message in a heavily trafficked site announcing your intention to take over the module.
Wait a bit. The PAUSE admins don't want to act too quickly in case the current maintainer is on holiday. If there's no response to private communication or the public post, a PAUSE admin can transfer it to you.
(contributed by brian d foy)
In Perl, a class is just a package, and methods are just subroutines. Perl doesn't get more formal than that and lets you set up the package just the way that you like it (that is, it doesn't set up anything for you).
The Perl documentation has several tutorials that cover class creation, including perlboot (Barnyard Object Oriented Tutorial), perltoot (Tom's Object Oriented Tutorial), perlbot (Bag o' Object Tricks), and perlobj.
You can use the tainted() function of the Scalar::Util module, available from CPAN (or included with Perl since release 5.8.0). See also Laundering and Detecting Tainted Data in perlsec.
Closures are documented in perlref.
Closure is a computer science term with a precise but hard-to-explain meaning. Usually, closures are implemented in Perl as anonymous subroutines with lasting references to lexical variables outside their own scopes. These lexicals magically refer to the variables that were around when the subroutine was defined (deep binding).
Closures are most often used in programming languages where you can have the return value of a function be itself a function, as you can in Perl. Note that some languages provide anonymous functions but are not capable of providing proper closures: the Python language, for example. For more information on closures, check out any textbook on functional programming. Scheme is a language that not only supports but encourages closures.
Here's a classic non-closure function-generating function:
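A sketch matching the name used in the discussion below:

```perl
sub add_function_generator {
    # the returned sub refers to no outside lexicals: not a closure
    return sub { shift() + shift() };
}

my $add_sub = add_function_generator();
my $sum     = $add_sub->(4, 5);    # 9
```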
The anonymous subroutine returned by add_function_generator() isn't technically a closure because it refers to no lexicals outside its own scope. Using a closure gives you a function template with some customization slots left out to be filled later.
Contrast this with the following make_adder() function, in which the returned anonymous function contains a reference to a lexical variable outside the scope of that function itself. Such a reference requires that Perl return a proper closure, thus locking in for all time the value that the lexical had when the function was created.
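A sketch consistent with the discussion that follows:

```perl
sub make_adder {
    my $addpiece = shift;
    return sub { shift() + $addpiece };   # closes over $addpiece
}

my $f1 = make_adder(20);
my $f2 = make_adder(555);
```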
Now $f1->($n) is always 20 plus whatever $n you pass in, whereas $f2->($n) is always 555 plus whatever $n you pass in. The $addpiece in the closure sticks around.
Closures are often used for less esoteric purposes. For example, when you want to pass in a bit of code into a function:
If the code to execute had been passed in as a string, '$line = <STDIN>', there would have been no way for the hypothetical timeout() function to access the lexical variable $line back in its caller's scope.
Another use for a closure is to make a variable private to a named subroutine, e.g. a counter that gets initialized at creation time of the sub and can only be modified from within the sub. This is sometimes used with a BEGIN block in package files to make sure a variable doesn't get meddled with during the lifetime of the package:
This is discussed in more detail in perlsub; see the entry on Persistent Private Variables.
This problem was fixed in perl 5.004_05, so preventing it means upgrading your version of perl. ;)
Variable suicide is when you (temporarily or permanently) lose the value of a variable. It is caused by scoping through my() and local() interacting with either closures or aliased foreach() iterator variables and subroutine arguments. It used to be easy to inadvertently lose a variable's value this way, but now it's much harder. Take this code:
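A sketch of the classic case (output is collected rather than printed directly; modern perls produce the expected output shown further below):

```perl
my $f = 'foo';
my @lines;
sub T {
    my $i = 0;
    while ( $i++ < 3 ) { my $f = $f; $f .= "bar"; push @lines, $f }
}
T();
push @lines, "Finally $f";
print "$_\n" for @lines;
```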
If you are experiencing variable suicide, that my $f in the subroutine doesn't pick up a fresh copy of the $f whose value is 'foo'. The output shows that inside the subroutine the value of $f leaks through when it shouldn't, as in this output:
- foobar
- foobarbar
- foobarbarbar
- Finally foo
The $f that has "bar" added to it three times should be a new $f: my $f should create a new lexical variable each time through the loop.
The expected output is:
- foobar
- foobar
- foobar
- Finally foo
You need to pass references to these objects. See Pass by Reference in perlsub for this particular question, and perlref for information on references.
Regular variables and functions are quite easy to pass: just pass in a reference to an existing or anonymous variable or function:
- func( \$some_scalar );
- func( \@some_array );
- func( [ 1 .. 10 ] );
- func( \%some_hash );
- func( { this => 10, that => 20 } );
- func( \&some_func );
- func( sub { $_[0] ** $_[1] } );
As of Perl 5.6, you can represent filehandles with scalar variables which you treat as any other scalar.
Before Perl 5.6, you had to use the *FH or \*FH notations. These are "typeglobs"--see Typeglobs and Filehandles in perldata and especially Pass by Reference in perlsub for more information.
Here's an example of how to pass in a string and a regular expression for it to match against. You construct the pattern with the qr// operator:
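A sketch of that technique (the subroutine and sample values are made up):

```perl
sub compare {
    my ($string, $regex) = @_;
    return $string =~ $regex;       # qr// objects work directly in a match
}

my $match = compare( "old McDonald", qr/d.*D/ );   # true: matches "d McD"
```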
To pass an object method into a subroutine, you can do this:
- call_a_lot(10, $some_obj, "methname")
- sub call_a_lot {
- my ($count, $widget, $trick) = @_;
- for (my $i = 0; $i < $count; $i++) {
- $widget->$trick();
- }
- }
Or, you can use a closure to bundle up the object, its method call, and arguments:
You could also investigate the can() method in the UNIVERSAL class (part of the standard perl distribution).
(contributed by brian d foy)
In Perl 5.10, declare the variable with state. The state declaration creates the lexical variable that persists between calls to the subroutine:
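A minimal sketch:

```perl
use feature 'state';

sub counter {
    state $count = 0;    # initialized once, persists across calls
    return ++$count;
}

print counter(), " ", counter(), " ", counter(), "\n";   # 1 2 3
```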
You can fake a static variable by using a lexical variable which goes out of scope. In this example, you define the subroutine counter, and it uses the lexical variable $count. Since you wrap this in a BEGIN block, $count is defined at compile-time, but also goes out of scope at the end of the BEGIN block. The BEGIN block also ensures that the subroutine and the value it uses are defined at compile-time so the subroutine is ready to use just like any other subroutine, and you can put this code in the same place as other subroutines in the program text (i.e. at the end of the code, typically). The subroutine counter still has a reference to the data, and is the only way you can access the value (and each time you do, you increment the value). The data in the chunk of memory defined by $count is private to counter.
In the previous example, you created a function-private variable because only one function remembered its reference. You could define multiple functions while the variable is in scope, and each function can share the "private" variable. It's not really "static" because you can access it outside the function while the lexical variable is in scope, and even create references to it. In this example, increment_count and return_count share the variable. One function adds to the value and the other simply returns the value. They can both access $count, and since it has gone out of scope, there is no other way to access it.
- BEGIN {
- my $count = 1;
- sub increment_count { $count++ }
- sub return_count { $count }
- }
To declare a file-private variable, you still use a lexical variable. A file is also a scope, so a lexical variable defined in the file cannot be seen from any other file.
See Persistent Private Variables in perlsub for more information. The discussion of closures in perlref may help you even though we did not use anonymous subroutines in this answer.
local($x) saves away the old value of the global variable $x and assigns a new value for the duration of the subroutine which is visible in other functions called from that subroutine. This is done at run-time, so is called dynamic scoping. local() always affects global variables, also called package variables or dynamic variables.

my($x) creates a new variable that is only visible in the current subroutine. This is done at compile-time, so it is called lexical or static scoping. my() always affects private variables, also called lexical variables or (improperly) static(ly scoped) variables.
For instance:
- sub visible {
- print "var has value $var\n";
- }
- sub dynamic {
- local $var = 'local'; # new temporary value for the still-global
- visible(); # variable called $var
- }
- sub lexical {
- my $var = 'private'; # new private variable, $var
- visible(); # (invisible outside of sub scope)
- }
- $var = 'global';
- visible(); # prints global
- dynamic(); # prints local
- lexical(); # prints global
Notice how at no point does the value "private" get printed. That's because $var only has that value within the block of the lexical() function, and it is hidden from the called subroutine.
In summary, local() doesn't make what you think of as private, local variables. It gives a global variable a temporary value. my() is what you're looking for if you want private variables.
See Private Variables via my() in perlsub and Temporary Values via local() in perlsub for excruciating details.
If you know your package, you can just mention it explicitly, as in $Some_Pack::var. Note that the notation $::var is not the dynamic $var in the current package, but rather the one in the "main" package, as though you had written $main::var.
Alternatively you can use the compiler directive our() to bring a dynamic variable into the current lexical scope.
In deep binding, lexical variables mentioned in anonymous subroutines are the same ones that were in scope when the subroutine was created. In shallow binding, they are whichever variables with the same names happen to be in scope when the subroutine is called. Perl always uses deep binding of lexical variables (i.e., those created with my()). However, dynamic variables (aka global, local, or package variables) are effectively shallowly bound. Consider this just one more reason not to use them. See the answer to What's a closure?.
my() and local() give list context to the right hand side of =. The <$fh> read operation, like so many of Perl's functions and operators, can tell which context it was called in and behaves appropriately. In general, the scalar() function can help. This function does nothing to the data itself (contrary to popular myth) but rather tells its argument to behave in whatever its scalar fashion is. If that function doesn't have a defined scalar behavior, this of course doesn't help you (such as with sort()).
To enforce scalar context in this particular case, however, you need merely omit the parentheses:
You should probably be using lexical variables anyway, although the issue is the same here:
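A runnable sketch of the context difference, using an in-memory filehandle (the data is an assumption):

```perl
my $data = "line1\nline2\nline3\n";

open my $fh, '<', \$data or die $!;
my ($first) = <$fh>;       # parentheses force list context: reads ALL lines
my $drained = eof($fh);    # true; the handle is exhausted
close $fh;

open $fh, '<', \$data or die $!;
my $line = <$fh>;          # scalar context: reads just one line
my $left = !eof($fh);      # true; two lines remain
close $fh;
```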
Why do you want to do that? :-)
If you want to override a predefined function, such as open(), then you'll have to import the new definition from a different module. See Overriding Built-in Functions in perlsub.
If you want to overload a Perl operator, such as + or **, then you'll want to use the use overload pragma, documented in overload.
If you're talking about obscuring method calls in parent classes, see Overridden Methods in perltoot.
(contributed by brian d foy)
Calling a subroutine as &foo with no trailing parentheses ignores the prototype of foo and passes it the current value of the argument list, @_. Here's an example; the bar subroutine calls &foo, which prints its arguments list:
- sub bar { &foo }
- sub foo { print "Args in foo are: @_\n" }
- bar( qw( a b c ) );
When you call bar with arguments, you see that foo got the same @_:
- Args in foo are: a b c
Calling the subroutine with trailing parentheses, with or without arguments, does not use the current @_ and respects the subroutine prototype. Changing the example to put parentheses after the call to foo changes the program:
- sub bar { &foo() }
- sub foo { print "Args in foo are: @_\n" }
- bar( qw( a b c ) );
Now the output shows that foo doesn't get the @_ from its caller.
- Args in foo are:
The main use of the @_ pass-through feature is to write subroutines whose main job it is to call other subroutines for you. For further details, see perlsub.
In Perl 5.10, use the given-when construct described in perlsyn:
If one wants to use pure Perl and to be compatible with Perl versions prior to 5.10, the general answer is to use if-elsif-else:
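A minimal sketch of the chain (the values are made up):

```perl
my $reply = "stop";
my $action;

if    ( $reply eq "send" ) { $action = "send" }
elsif ( $reply eq "stop" ) { $action = "stop" }
elsif ( $reply eq "list" ) { $action = "list" }
else                       { $action = "unknown" }

print "Action is $action\n";
```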
Here's a simple example of a switch based on pattern matching, lined up in a way to make it look more like a switch statement. We'll do a multiway conditional based on the type of reference stored in $whatchamacallit:
- SWITCH: for (ref $whatchamacallit) {
- /^$/ && die "not a reference";
- /SCALAR/ && do {
- print_scalar($$whatchamacallit);
- last SWITCH;
- };
- /ARRAY/ && do {
- print_array(@$whatchamacallit);
- last SWITCH;
- };
- /HASH/ && do {
- print_hash(%$whatchamacallit);
- last SWITCH;
- };
- /CODE/ && do {
- warn "can't print function ref";
- last SWITCH;
- };
- # DEFAULT
- warn "User defined type skipped";
- }
See perlsyn for other examples in this style.
Sometimes you should change the positions of the constant and the variable.
For example, let's say you wanted to test which of many answers you were
given, but in a case-insensitive way that also allows abbreviations.
You can use the following technique if the strings all start with different characters or if you want to arrange the matches so that one takes precedence over another, as "SEND" has precedence over "STOP" here:
- chomp($answer = <>);
- if ("SEND" =~ /^\Q$answer/i) { print "Action is send\n" }
- elsif ("STOP" =~ /^\Q$answer/i) { print "Action is stop\n" }
- elsif ("ABORT" =~ /^\Q$answer/i) { print "Action is abort\n" }
- elsif ("LIST" =~ /^\Q$answer/i) { print "Action is list\n" }
- elsif ("EDIT" =~ /^\Q$answer/i) { print "Action is edit\n" }
A totally different approach is to create a hash of function references.
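For instance, a sketch of that approach (the action names are hypothetical): each answer maps to a code reference, so adding a case means adding one hash entry rather than another elsif.

```perl
my %actions = (
    send => sub { return "Action is send" },
    stop => sub { return "Action is stop" },
    list => sub { return "Action is list" },
);

my $answer = 'SEND';   # pretend this came from <>
my $result = ( $actions{ lc $answer } || sub { "Unknown action" } )->();
print "$result\n";
```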
Starting from Perl 5.8, a source filter module, Switch, can also be used to get switch and case. Its use is now discouraged, because it's not fully compatible with the native switch of Perl 5.10, and because, as it's implemented as a source filter, it doesn't always work as intended when complex syntax is involved.
The AUTOLOAD method, discussed in Autoloading in perlsub and AUTOLOAD: Proxy Methods in perltoot, lets you capture calls to undefined functions and methods.
When it comes to undefined variables that would trigger a warning under use warnings, you can promote the warning to an error.
- use warnings FATAL => qw(uninitialized);
Some possible reasons: your inheritance is getting confused, you've misspelled the method name, or the object is of the wrong type. Check out perltoot for details about any of the above cases. You may also use print ref($object) to find out the class $object was blessed into.
Another possible reason for problems is that you've used the indirect object syntax (e.g., find Guru "Samy") on a class name before Perl has seen that such a package exists. It's wisest to make sure your packages are all defined before you start using them, which will be taken care of if you use the use statement instead of require. If not, make sure to use arrow notation (e.g., Guru->find("Samy")) instead. Object notation is explained in perlobj.
Make sure to read about creating modules in perlmod and the perils of indirect objects in Method Invocation in perlobj.
(contributed by brian d foy)
To find the package you are currently in, use the special literal __PACKAGE__, as documented in perldata. You can only use the special literals as separate tokens, so you can't interpolate them into strings like you can with variables:
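A short illustration (the package name is hypothetical):

```perl
package Widget;

my $pkg = __PACKAGE__;             # a separate token: yields "Widget"
print "$pkg\n";
print "We are in __PACKAGE__\n";   # NOT interpolated: prints the literal text
```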
If you want to find the package calling your code, perhaps to give better
diagnostics as Carp does, use the caller built-in:
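A sketch of that, with a hypothetical helper package:

```perl
package Helper;

sub complain {
    my ($pkg, $file, $line) = caller;   # who called us?
    my $msg = "Trouble in $pkg at line $line\n";
    print $msg;
    return $msg;
}

package main;

my $msg = Helper::complain();
```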
By default, your program starts in package main, so you will always be in some package.
This is different from finding out the package an object is blessed into, which might not be the current package. For that, use blessed from Scalar::Util, part of the Standard Library since Perl 5.8:
Most of the time, you shouldn't care what package an object is blessed into, however, as long as it claims to inherit from that class:
And, with Perl 5.10 and later, you don't have to check for an inheritance to see if the object can handle a role. For that, you can use DOES, which comes from UNIVERSAL:
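A small sketch covering blessed, isa, and DOES together (the class names are hypothetical; DOES requires Perl 5.10+):

```perl
use Scalar::Util qw(blessed);

package Animal;
sub new { return bless {}, shift }

package Dog;
our @ISA = ('Animal');

package main;

my $pet = Dog->new;
print blessed($pet), "\n";                        # the class it was blessed into
print "It's an Animal\n" if $pet->isa('Animal');  # inheritance check
print "It does Animal\n" if $pet->DOES('Animal'); # role check, Perl 5.10+
```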
You can safely replace isa with DOES (although the converse is not true).
(contributed by brian d foy)
The quick-and-dirty way to comment out more than one line of Perl is to surround those lines with Pod directives. You have to put these directives at the beginning of the line and somewhere where Perl expects a new statement (so not in the middle of statements like the # comments). You end the comment with =cut, ending the Pod section:
- =pod
- my $object = NotGonnaHappen->new();
- ignored_sub();
- $wont_be_assigned = 37;
- =cut
The quick-and-dirty method only works well when you don't plan to leave the commented code in the source. If a Pod parser comes along, your multiline comment is going to show up in the Pod translation. A better way hides it from Pod parsers as well.
The =begin directive can mark a section for a particular purpose. If the Pod parser doesn't want to handle it, it just ignores it. Label the comments with comment. End the comment using =end with the same label. You still need the =cut to go back to Perl code from the Pod comment:
- =begin comment
- my $object = NotGonnaHappen->new();
- ignored_sub();
- $wont_be_assigned = 37;
- =end comment
- =cut
For more information on Pod, check out perlpod and perlpodspec.
Use this code, provided by Mark-Jason Dominus:
- sub scrub_package {
- no strict 'refs';
- my $pack = shift;
- die "Shouldn't delete main package"
- if $pack eq "" || $pack eq "main";
- my $stash = *{$pack . '::'}{HASH};
- foreach my $name (keys %$stash) {
- my $fullname = $pack . '::' . $name;
- # Get rid of everything with that name.
- undef $$fullname;
- undef @$fullname;
- undef %$fullname;
- undef &$fullname;
- undef *$fullname;
- }
- }
Or, if you're using a recent release of Perl, you can just use the Symbol::delete_package() function instead.
Beginners often think they want to have a variable contain the name of a variable.
- $fred = 23;
- $varname = "fred";
- ++$$varname; # $fred now 24
This works sometimes, but it is a very bad idea for two reasons.
The first reason is that this technique only works on global variables. That means that if $fred is a lexical variable created with my() in the above example, the code wouldn't work at all: you'd accidentally access the global and skip right over the private lexical altogether. Global variables are bad because they can easily collide accidentally and in general make for non-scalable and confusing code.
Symbolic references are forbidden under the use strict pragma. They are not true references and consequently are not reference-counted or garbage-collected.
The other reason why using a variable to hold the name of another variable is a bad idea is that the question often stems from a lack of understanding of Perl data structures, particularly hashes. By using symbolic references, you are just using the package's symbol-table hash (like %main::) instead of a user-defined hash. The solution is to use your own hash or a real reference instead.
- $USER_VARS{"fred"} = 23;
- my $varname = "fred";
- $USER_VARS{$varname}++; # not $$varname++
There we're using the %USER_VARS hash instead of symbolic references. Sometimes this comes up in reading strings from the user with variable references and wanting to expand them to the values of your perl program's variables. This is also a bad idea because it conflates the program-addressable namespace and the user-addressable one. Instead of reading a string and expanding it to the actual contents of your program's own variables:
- $str = 'this has a $fred and $barney in it';
- $str =~ s/(\$\w+)/$1/eeg; # need double eval
it would be better to keep a hash around like %USER_VARS and have variable references actually refer to entries in that hash:
- $str =~ s/\$(\w+)/$USER_VARS{$1}/g; # no /e here at all
That's faster, cleaner, and safer than the previous approach. Of course, you don't need to use a dollar sign. You could use your own scheme to make it less confusing, like bracketed percent symbols, etc.
- $str = 'this has a %fred% and %barney% in it';
- $str =~ s/%(\w+)%/$USER_VARS{$1}/g; # no /e here at all
Another reason that folks sometimes think they want a variable to contain the name of a variable is that they don't know how to build proper data structures using hashes. For example, let's say they wanted two hashes in their program: %fred and %barney, and that they wanted to use another scalar variable to refer to those by name.
- $name = "fred";
- $$name{WIFE} = "wilma"; # set %fred
- $name = "barney";
- $$name{WIFE} = "betty"; # set %barney
This is still a symbolic reference, and is still saddled with the problems enumerated above. It would be far better to write:
- $folks{"fred"}{WIFE} = "wilma";
- $folks{"barney"}{WIFE} = "betty";
And just use a multilevel hash to start with.
The only times that you absolutely must use symbolic references are when you really must refer to the symbol table. This may be because it's something that one can't take a real reference to, such as a format name. Doing so may also be important for method calls, since these always go through the symbol table for resolution.
In those cases, you would turn off strict 'refs' temporarily so you can play around with the symbol table. For example:
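The example this passage refers to runs along these lines (reconstructed; the generated sub names and the HTML output are illustrative). One anonymous sub body, compiled only once, is installed in the symbol table under several names:

```perl
no strict 'refs';   # we really do want to poke at the symbol table here

for my $name (qw(red blue green)) {
    # install &red, &blue, &green; each closes over its own $name
    *{$name} = sub { "<FONT COLOR='$name'>@_</FONT>" };
}

print red("careful"), "\n";
print blue("danger"), "\n";
```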
All those functions (red(), blue(), green(), etc.) appear to be separate, but the real code in the closure actually was compiled only once.
So, sometimes you might want to use symbolic references to manipulate the symbol table directly. This doesn't matter for formats, handles, and subroutines, because they are always global--you can't use my() on them. For scalars, arrays, and hashes, though--and usually for subroutines-- you probably only want to use hard references.
(contributed by brian d foy)
The "bad interpreter" message comes from the shell, not perl. The actual message may vary depending on your platform, shell, and locale settings.
If you see "bad interpreter - no such file or directory", the first line in your perl script (the "shebang" line) does not contain the right path to perl (or any other program capable of running scripts). Sometimes this happens when you move the script from one machine to another and each machine has a different path to perl--/usr/bin/perl versus /usr/local/bin/perl for instance. It may also indicate that the source machine has CRLF line terminators and the destination machine has LF only: the shell tries to find /usr/bin/perl<CR>, but can't.
If you see "bad interpreter: Permission denied", you need to make your script executable.
In either case, you should still be able to run the scripts with perl explicitly:
- % perl script.pl
If you get a message like "perl: command not found", perl is not in your PATH, which might also mean that the location of perl is not where you expect it so you need to adjust your shebang line.
Copyright (c) 1997-2010 Tom Christiansen, Nathan Torkington, and other authors as noted. All rights reserved.
This documentation is free; you can redistribute it and/or modify it under the same terms as Perl itself.
Irrespective of its distribution, all code examples in this file are hereby placed into the public domain. You are permitted and encouraged to use this code in your own programs for fun or for profit as you see fit. A simple comment in the code giving credit would be courteous but is not required.
perlfaq8 - System Interaction
This section of the Perl FAQ covers questions involving operating system interaction. Topics include interprocess communication (IPC), control over the user-interface (keyboard, screen and pointing devices), and most anything else not related to data manipulation.
Read the FAQs and documentation specific to the port of perl to your operating system (eg, perlvms, perlplan9, ...). These should contain more detailed information on the vagaries of your perl.
The $^O variable ($OSNAME if you use English) contains an indication of the name of the operating system (not its release number) that your perl binary was built for.
(contributed by brian d foy)
The exec function's job is to turn your process into another command and never to return. If that's not what you want to do, don't use exec. :)
If you want to run an external command and still keep your Perl process going, look at a piped open, fork, or system.
How you access/control keyboards, screens, and pointing devices ("mice") is system-dependent. Try the following modules:
- Term::Cap Standard perl distribution
- Term::ReadKey CPAN
- Term::ReadLine::Gnu CPAN
- Term::ReadLine::Perl CPAN
- Term::Screen CPAN
- Term::Cap Standard perl distribution
- Curses CPAN
- Term::ANSIColor CPAN
- Tk CPAN
- Wx CPAN
- Gtk2 CPAN
- Qt4 kdebindings4 package
Some of these specific cases are shown as examples in other answers in this section of the perlfaq.
In general, you don't, because you don't know whether the recipient has a color-aware display device. If you know that they have an ANSI terminal that understands color, you can use the Term::ANSIColor module from CPAN:
Or like this:
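For example, both the constant-style and function-style interfaces (Term::ANSIColor has shipped with Perl since 5.6):

```perl
use Term::ANSIColor;

# interleave color-changing escape sequences with your text...
print color('bold blue'), "This text is bold blue.\n", color('reset');

# ...or wrap a string in one call with colored()
my $line = colored("This text is yellow on magenta.\n", 'yellow on_magenta');
print $line;
```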
Controlling input buffering is a remarkably system-dependent matter. On many systems, you can just use the stty command as shown in getc, but as you see, that's already getting you into portability snags.
The Term::ReadKey module from CPAN offers an easy-to-use interface that should be more efficient than shelling out to stty for each key. It even includes limited support for Windows.
- use Term::ReadKey;
- ReadMode('cbreak');
- $key = ReadKey(0);
- ReadMode('normal');
However, using the code requires that you have a working C compiler and can use it to build and install a CPAN module. Here's a solution using the standard POSIX module, which is already on your system (assuming your system supports POSIX).
- use HotKey;
- $key = readkey();
And here's the HotKey module, which hides the somewhat mystifying calls to manipulate the POSIX termios structures.
- # HotKey.pm
- package HotKey;
- use strict;
- use warnings;
- use parent 'Exporter';
- our @EXPORT = qw(cbreak cooked readkey);
- use POSIX qw(:termios_h);
- my ($term, $oterm, $echo, $noecho, $fd_stdin);
- $fd_stdin = fileno(STDIN);
- $term = POSIX::Termios->new();
- $term->getattr($fd_stdin);
- $oterm = $term->getlflag();
- $echo = ECHO | ECHOK | ICANON;
- $noecho = $oterm & ~$echo;
- sub cbreak {
- $term->setlflag($noecho); # ok, so i don't want echo either
- $term->setcc(VTIME, 1);
- $term->setattr($fd_stdin, TCSANOW);
- }
- sub cooked {
- $term->setlflag($oterm);
- $term->setcc(VTIME, 0);
- $term->setattr($fd_stdin, TCSANOW);
- }
- sub readkey {
- my $key = '';
- cbreak();
- sysread(STDIN, $key, 1);
- cooked();
- return $key;
- }
- END { cooked() }
- 1;
The easiest way to do this is to read a key in nonblocking mode with the Term::ReadKey module from CPAN, passing it an argument of -1 to indicate not to block:
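A sketch of that (requires the Term::ReadKey CPAN module and a terminal, so it is not run here):

```perl
use Term::ReadKey;

ReadMode('cbreak');                          # single-character mode
if ( defined( my $char = ReadKey(-1) ) ) {
    print "A key was waiting: $char\n";      # input had already been typed
} else {
    print "No input was waiting\n";          # -1 means: don't block
}
ReadMode('normal');                          # restore the terminal
```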
(contributed by brian d foy)
To clear the screen, you just have to print the special sequence that tells the terminal to clear the screen. Once you have that sequence, output it when you want to clear the screen.
You can use the Term::ANSIScreen module to get the special sequence. Import the cls function (or the :screen tag):
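For instance (Term::ANSIScreen is a CPAN module, so this assumes it is installed):

```perl
use Term::ANSIScreen qw(cls);

print cls();   # emits the terminal's clear-screen escape sequence
```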
The Term::Cap module can also get the special sequence if you want to deal with the low-level details of terminal control. The Tputs method returns the string for the given capability:
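A sketch of the Term::Cap route (it consults the termcap database, so it needs a sensible TERM setting):

```perl
use Term::Cap;

my $terminal = Term::Cap->Tgetent( { OSPEED => 9600 } );
my $clear    = $terminal->Tputs('cl');   # 'cl' is the clear-screen capability
print $clear;
```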
On Windows, you can use the Win32::Console module. After creating an object for the output filehandle you want to affect, call the Cls method:
If you have a command-line program that does the job, you can call it in backticks to capture whatever it outputs so you can use it later:
If you have Term::ReadKey module installed from CPAN, you can use it to fetch the width and height in characters and in pixels:
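A sketch of that call (it needs Term::ReadKey from CPAN and a real terminal to query):

```perl
use Term::ReadKey;

my ($wchar, $hchar, $wpixels, $hpixels) = GetTerminalSize();
print "Screen is $wchar characters wide and $hchar high\n";
print "That's $wpixels x $hpixels pixels\n" if $wpixels || $hpixels;
```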
This is more portable than the raw ioctl, but not as illustrative:
- require 'sys/ioctl.ph';
- die "no TIOCGWINSZ " unless defined &TIOCGWINSZ;
- open(my $tty_fh, "+</dev/tty") or die "No tty: $!";
- unless (ioctl($tty_fh, &TIOCGWINSZ, $winsize='')) {
- die sprintf "$0: ioctl TIOCGWINSZ (%08x: $!)\n", &TIOCGWINSZ;
- }
- my ($row, $col, $xpixel, $ypixel) = unpack('S4', $winsize);
- print "(row,col) = ($row,$col)";
- print " (xpixel,ypixel) = ($xpixel,$ypixel)" if $xpixel || $ypixel;
- print "\n";
(This question has nothing to do with the web. See a different FAQ for that.)
There's an example of this in crypt in perlfunc. First, you put the terminal into "no echo" mode, then just read the password normally. You may do this with an old-style ioctl() function, POSIX terminal control (see POSIX, or the Camel Book), or a call to the stty program, with varying degrees of portability.
You can also do this for most systems using the Term::ReadKey module from CPAN, which is easier to use and in theory more portable.
This depends on which operating system your program is running on. In the case of Unix, the serial ports will be accessible through files in /dev; on other systems, device names will doubtless differ. Several problem areas common to all device interaction are the following:
Your system may use lockfiles to control multiple access. Make sure you follow the correct protocol. Unpredictable behavior can result from multiple processes reading from one device.
If you expect to use both read and write operations on the device, you'll have to open it for update (see open in perlfunc for details). You may wish to open it without running the risk of blocking by using sysopen() and O_RDWR|O_NDELAY|O_NOCTTY from the Fcntl module (part of the standard perl distribution). See sysopen in perlfunc for more on this approach.
Some devices will be expecting a "\r" at the end of each line rather than a "\n". In some ports of perl, "\r" and "\n" are different from their usual (Unix) ASCII values of "\015" and "\012". You may have to give the numeric values you want directly, using octal ("\015"), hex ("\x0D"), or as a control-character specification ("\cM").
Even though with normal text files a "\n" will do the trick, there is still no unified scheme for terminating a line that is portable between Unix, DOS/Win, and Macintosh, except to terminate ALL line ends with "\015\012", and strip what you don't need from the output. This applies especially to socket I/O and autoflushing, discussed next.
If you expect characters to get to your device when you print() them, you'll want to autoflush that filehandle. You can use select() and the $| variable to control autoflushing (see $| in perlvar and select in perlfunc, or perlfaq5, "How do I flush/unbuffer an output filehandle? Why must I do this?"):
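A sketch of the classic idiom (the in-memory file here is only a stand-in for the device handle, so the snippet runs on its own):

```perl
open(my $dev_fh, '>', \my $buffer) or die "open: $!";  # stand-in for a device

my $oldh = select($dev_fh);   # make the device handle the default
$| = 1;                       # turn on autoflush for it
select($oldh);                # put the old default handle back
```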
You'll also see code that does this without a temporary variable, as in
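For example (again using an in-memory file as the stand-in device handle):

```perl
open(my $dev_fh, '>', \my $buffer) or die "open: $!";

# select() returns the previously selected handle, so this sets $| on
# $dev_fh and restores the old default handle in a single expression:
select( ( select($dev_fh), $| = 1 )[0] );
```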
Or if you don't mind pulling in a few thousand lines of code just because you're afraid of a little $| variable:
- use IO::Handle;
- $dev_fh->autoflush(1);
As mentioned in the previous item, this still doesn't work when using socket I/O between Unix and Macintosh. You'll need to hard code your line terminators, in that case.
If you are doing a blocking read() or sysread(), you'll have to arrange for an alarm handler to provide a timeout (see alarm in perlfunc). If you have a non-blocking open, you'll likely have a non-blocking read, which means you may have to use a 4-arg select() to determine whether I/O is ready on that device (see select in perlfunc).
While trying to read from his caller-id box, the notorious Jamie Zawinski <jwz@netscape.com>, after much gnashing of teeth and fighting with sysread, sysopen, POSIX's tcgetattr business, and various other functions that go bump in the night, finally came up with this:
- sub open_modem {
- use IPC::Open2;
- my $stty = `/bin/stty -g`;
- open2( \*MODEM_IN, \*MODEM_OUT, "cu -l$modem_device -s2400 2>&1");
- # starting cu hoses /dev/tty's stty settings, even when it has
- # been opened on a pipe...
- system("/bin/stty $stty");
- $_ = <MODEM_IN>;
- chomp;
- if ( !m/^Connected/ ) {
- print STDERR "$0: cu printed `$_' instead of `Connected'\n";
- }
- }
You spend lots and lots of money on dedicated hardware, but this is bound to get you talked about.
Seriously, you can't if they are Unix password files--the Unix password system employs one-way encryption. It's more like hashing than encryption. The best you can do is check whether something else hashes to the same string. You can't turn a hash back into the original string. Programs like Crack can forcibly (and intelligently) try to guess passwords, but don't (can't) guarantee quick success.
If you're worried about users selecting bad passwords, you should proactively check when they try to change their password (by modifying passwd(1), for example).
(contributed by brian d foy)
There's not a single way to run code in the background so you don't have to wait for it to finish before your program moves on to other tasks. Process management depends on your particular operating system, and many of the techniques are covered in perlipc.
Several CPAN modules may be able to help, including IPC::Open2 or IPC::Open3, IPC::Run, Parallel::Jobs, Parallel::ForkManager, POE, Proc::Background, and Win32::Process. There are many other modules you might use, so check those namespaces for other options too.
If you are on a Unix-like system, you might be able to get away with a system call where you put an & on the end of the command:
- system("cmd &")
You can also try using fork, as described in perlfunc (although
this is the same thing that many of the modules will do for you).
Both the main process and the backgrounded one (the "child" process) share the same STDIN, STDOUT, and STDERR filehandles. If both try to access them at once, strange things can happen. You may want to close or reopen these for the child. You can get around this by opening a pipe (see open in perlfunc), but on some systems this means that the child process cannot outlive the parent.
You'll have to catch the SIGCHLD signal, and possibly SIGPIPE too. SIGCHLD is sent when the backgrounded process finishes. SIGPIPE is sent when you write to a filehandle whose child process has closed (an untrapped SIGPIPE can cause your program to silently die). This is not an issue with system("cmd&").
You have to be prepared to "reap" the child process when it finishes.
You can also use a double fork. You immediately wait() for your first child, and the init daemon will wait() for your grandchild once it exits.
See Signals in perlipc for other examples of code to do this.
Zombies are not an issue with system("prog &").
You don't actually "trap" a control character. Instead, that character generates a signal which is sent to your terminal's currently foregrounded process group, which you then trap in your process. Signals are documented in Signals in perlipc and the section on "Signals" in the Camel.
You can set the values of the %SIG hash to be the functions you want to handle the signal. After perl catches the signal, it looks in %SIG for a key with the same name as the signal, then calls the subroutine value for that key.
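A small sketch (Unix-specific: it sends the process its own SIGINT so the handler fires):

```perl
my $caught = 0;
$SIG{INT} = sub { $caught++ };   # the value may be a code ref or a sub name

kill 'INT', $$;                  # deliver SIGINT to this very process
print "Caught $caught SIGINT\n"; # the deferred signal ran before this line
```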
Perl versions before 5.8 had, in their C source code, signal handlers which would catch the signal and possibly run a Perl function that you had set in %SIG. This violated the rules of signal handling at that level, causing perl to dump core. Since version 5.8.0, perl looks at %SIG after the signal has been caught, rather than while it is being caught. Previous versions of this answer were incorrect.
If perl was installed correctly and your shadow library was written properly, the getpw*() functions described in perlfunc should in theory provide (read-only) access to entries in the shadow password file. To change the file, make a new shadow password file (the format varies from system to system--see passwd(1) for specifics) and use pwd_mkdb(8) to install it (see pwd_mkdb(8) for more details).
Assuming you're running under sufficient permissions, you should be able to set the system-wide date and time by running the date(1) program. (There is no way to set the time and date on a per-process basis.) This mechanism will work for Unix, MS-DOS, Windows, and NT; the VMS equivalent is set time.
However, if all you want to do is change your time zone, you can probably get away with setting an environment variable:
- $ENV{TZ} = "MST7MDT"; # Unixish
- $ENV{'SYS$TIMEZONE_DIFFERENTIAL'} = "-5"; # vms
- system('trn', 'comp.lang.perl.misc');
If you want finer granularity than the 1 second that the sleep() function provides, the easiest way is to use the select() function as documented in select in perlfunc. Try the Time::HiRes and the BSD::Itimer modules (available from CPAN; starting from Perl 5.8, Time::HiRes is part of the standard distribution).
(contributed by brian d foy)
The Time::HiRes module (part of the standard distribution as of Perl 5.8) measures time with the gettimeofday() system call, which returns the time in microseconds since the epoch. If you can't install Time::HiRes for older Perls and you are on a Unixish system, you may be able to call gettimeofday(2) directly. See syscall in perlfunc.
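For example, Time::HiRes can transparently replace time() and sleep() with high-resolution versions:

```perl
use Time::HiRes qw(time sleep);   # hi-res drop-in replacements

my $start = time();               # floating-point seconds since the epoch
sleep 0.25;                       # fractional sleeps now work
my $elapsed = time() - $start;
printf "elapsed: %.2f seconds\n", $elapsed;
```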
You can use the END block to simulate atexit(). Each package's END block is called when the program or thread ends. See the perlmod manpage for more details about END blocks.
For example, you can use this to make sure your filter program managed to finish its output without filling up the disk:
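A sketch of that check: closing STDOUT explicitly at exit catches a full disk, which a silently buffered exit would not.

```perl
print "filter output\n";   # stand-in for the filter's real work

END {
    # an error here (e.g. disk full) is reported instead of lost
    close(STDOUT) || die "stdout close failed: $!";
}
```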
The END block isn't called when untrapped signals kill the program, though, so if you use END blocks you should also use
- use sigtrap qw(die normal-signals);
Perl's exception-handling mechanism is its eval() operator. You can use eval() as setjmp and die() as longjmp. For details of this, see the section on signals, especially the time-out handler for a blocking flock() in Signals in perlipc or the section on "Signals" in Programming Perl.
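The basic pattern looks like this (the divide helper is hypothetical):

```perl
sub divide {
    my ($num, $den) = @_;
    die "division by zero\n" if $den == 0;   # the "longjmp"
    return $num / $den;
}

my $result = eval { divide(10, 0) };         # the "setjmp"
if ($@) {
    print "caught: $@";                      # $@ holds the die() message
}
```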
If exception handling is all you're interested in, use one of the many CPAN modules that handle exceptions, such as Try::Tiny.
If you want the atexit() syntax (and an rmexit() as well), try the AtExit module available from CPAN.
Some Sys-V based systems, notably Solaris 2.X, redefined some of the standard socket constants. Since these were constant across all architectures, they were often hardwired into perl code. The proper way to deal with this is to "use Socket" to get the correct values.
Note that even though SunOS and Solaris are binary compatible, these values are different. Go figure.
In most cases, you write an external module to do it--see the answer
to "Where can I learn about linking C with Perl? [h2xs, xsubpp]".
However, if the function is a system call, and your system supports
syscall(), you can use the syscall function (documented in
perlfunc).
Remember to check the modules that came with your distribution, and CPAN as well--someone may already have written a module to do it. On Windows, try Win32::API. On Macs, try Mac::Carbon. If no module has an interface to the C function, you can inline a bit of C in your Perl source with Inline::C.
Historically, these would be generated by the h2ph tool, part of the standard perl distribution. This program converts cpp(1) directives in C header files to files containing subroutine definitions, like SYS_getitimer(), which you can use as arguments to your functions. It doesn't work perfectly, but it usually gets most of the job done. Simple files like errno.h, syscall.h, and socket.h were fine, but the hard ones like ioctl.h nearly always need to be hand-edited. Here's how to install the *.ph files:
- 1. Become the super-user
- 2. cd /usr/include
- 3. h2ph *.h */*.h
If your system supports dynamic loading, for reasons of portability and sanity you probably ought to use h2xs (also part of the standard perl distribution). This tool converts C header files to Perl extensions. See perlxstut for how to get started with h2xs.
If your system doesn't support dynamic loading, you still probably ought to use h2xs. See perlxstut and ExtUtils::MakeMaker for more information (in brief, just use make perl instead of a plain make to rebuild perl with a new static extension).
Some operating systems have bugs in the kernel that make setuid scripts inherently insecure. Perl gives you a number of options (described in perlsec) to work around such systems.
The IPC::Open2 module (part of the standard perl distribution) is an easy-to-use approach that internally uses pipe(), fork(), and exec() to do the job. Make sure you read the deadlock warnings in its documentation, though (see IPC::Open2). See Bidirectional Communication with Another Process in perlipc and Bidirectional Communication with Yourself in perlipc.
You may also use the IPC::Open3 module (part of the standard perl distribution), but be warned that it has a different order of arguments from IPC::Open2 (see IPC::Open3).
You're confusing the purpose of system() and backticks (``). system() runs a command and returns exit status information (as a 16-bit value: the low 7 bits are the signal the process died from, if any, and the high 8 bits are the actual exit value). Backticks (``) run a command and return what it sent to STDOUT.
There are three basic ways of running external commands:
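The three ways, side by side (using echo as a stand-in command):

```perl
# 1. system(): run it; the command's output goes to your STDOUT,
#    and you get back the exit status (also available in $?).
my $status = system('echo', 'hello');

# 2. backticks: run it and capture its STDOUT as a string.
my $output = `echo hello`;

# 3. a piped open: read its STDOUT a line at a time.
open(my $pipe_fh, '-|', 'echo', 'hello') or die "pipe: $!";
my $line = <$pipe_fh>;
close($pipe_fh);
```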
With system(), both STDOUT and STDERR will go the same place as the script's STDOUT and STDERR, unless the system() command redirects them. Backticks and open() read only the STDOUT of your command.
You can also use the open3() function from IPC::Open3. Benjamin Goldberg provides some sample code:
To capture a program's STDOUT, but discard its STDERR:
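A sketch of that case, sending the child's STDERR to the null device (the perl one-liner is just a stand-in command that writes to both streams):

```perl
use IPC::Open3;
use File::Spec;
use Symbol qw(gensym);

open(NULL, '>', File::Spec->devnull) or die "devnull: $!";
my $pid = open3(gensym, \*PH, '>&NULL',
                q{perl -e 'print "to stdout\n"; warn "to stderr\n"'});
my @lines = <PH>;            # only the STDOUT line arrives
waitpid($pid, 0);
print @lines;
```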
To capture a program's STDERR, but discard its STDOUT:
To capture a program's STDERR, and let its STDOUT go to our own STDERR:
To read both a command's STDOUT and its STDERR separately, you can redirect them to temp files, let the command run, then read the temp files:
- use IPC::Open3;
- use Symbol qw(gensym);
- use IO::File;
- local *CATCHOUT = IO::File->new_tmpfile;
- local *CATCHERR = IO::File->new_tmpfile;
- my $pid = open3(gensym, ">&CATCHOUT", ">&CATCHERR", "cmd");
- waitpid($pid, 0);
- seek $_, 0, 0 for \*CATCHOUT, \*CATCHERR;
- while( <CATCHOUT> ) {}
- while( <CATCHERR> ) {}
But there's no real need for both to be tempfiles... the following should work just as well, without deadlocking:
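A sketch of that variant: read STDOUT straight off a pipe as it arrives, and keep only STDERR in a temp file (again using a perl one-liner as the stand-in command):

```perl
use IPC::Open3;
use Symbol qw(gensym);
use IO::File;

local *CATCHERR = IO::File->new_tmpfile;
my $pid = open3(gensym, \*CATCHOUT, '>&CATCHERR',
                q{perl -e 'print "out\n"; warn "err\n"'});
my @stdout_lines = <CATCHOUT>;   # streamed as the command produces it
waitpid($pid, 0);
seek CATCHERR, 0, 0;             # rewind the temp file to read STDERR
my @stderr_lines = <CATCHERR>;
print "stdout: @stdout_lines";
print "stderr: @stderr_lines";
```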
And it'll be faster, too, since we can begin processing the program's stdout immediately, rather than waiting for the program to finish.
With any of these, you can change file descriptors before the call:
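For instance (this reopens your own STDOUT, so every command you run afterwards inherits the redirection):

```perl
open(STDOUT, '>', 'logfile') or die "redirect: $!";
system('ls');   # ls inherits the redirected STDOUT and writes to logfile
```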
or you can use Bourne shell file-descriptor redirection:
- $output = `$cmd 2>some_file`;
- open (PIPE, "cmd 2>some_file |");
You can also use file-descriptor redirection to make STDERR a duplicate of STDOUT:
- $output = `$cmd 2>&1`;
- open (PIPE, "cmd 2>&1 |");
Note that you cannot simply open STDERR to be a dup of STDOUT in your Perl program and avoid calling the shell to do the redirection. This doesn't work:
- open(STDERR, ">&STDOUT");
- $alloutput = `cmd args`; # stderr still escapes
This fails because the open() makes STDERR go to where STDOUT was going at the time of the open(). The backticks then make STDOUT go to a string, but don't change STDERR (which still goes to the old STDOUT).
Note that you must use Bourne shell (sh(1)) redirection syntax in backticks, not csh(1)! Details on why Perl's system() and backtick and pipe opens all use the Bourne shell are in the versus/csh.whynot article in the "Far More Than You Ever Wanted To Know" collection in http://www.cpan.org/misc/olddoc/FMTEYEWTK.tgz . To capture a command's STDERR and STDOUT together:
- $output = `cmd 2>&1`; # either with backticks
- $pid = open(PH, "cmd 2>&1 |"); # or with an open pipe
- while (<PH>) { } # plus a read
To capture a command's STDOUT but discard its STDERR:
- $output = `cmd 2>/dev/null`; # either with backticks
- $pid = open(PH, "cmd 2>/dev/null |"); # or with an open pipe
- while (<PH>) { } # plus a read
To capture a command's STDERR but discard its STDOUT:
- $output = `cmd 2>&1 1>/dev/null`; # either with backticks
- $pid = open(PH, "cmd 2>&1 1>/dev/null |"); # or with an open pipe
- while (<PH>) { } # plus a read
To exchange a command's STDOUT and STDERR in order to capture the STDERR but leave its STDOUT to come out our old STDERR:
- $output = `cmd 3>&1 1>&2 2>&3 3>&-`; # either with backticks
- $pid = open(PH, "cmd 3>&1 1>&2 2>&3 3>&-|");# or with an open pipe
- while (<PH>) { } # plus a read
To read both a command's STDOUT and its STDERR separately, it's easiest to redirect them separately to files, and then read from those files when the program is done:
- system("program args 1>program.stdout 2>program.stderr");
Ordering is important in all these examples. That's because the shell processes file descriptor redirections in strictly left to right order.
The first command sends both standard out and standard error to the temporary file. The second command sends only the old standard output there, and the old standard error shows up on the old standard out.
If the second argument to a piped open() contains shell metacharacters, perl fork()s, then exec()s a shell to decode the metacharacters and eventually run the desired program. If the program couldn't be run, it's the shell that gets the message, not Perl. All your Perl program can find out is whether the shell itself could be successfully started. You can still capture the shell's STDERR and check it for error messages. See "How can I capture STDERR from an external command?" elsewhere in this document, or use the IPC::Open3 module.
If there are no shell metacharacters in the argument of open(), Perl
runs the command directly, without using the shell, and can correctly
report whether the command started.
Strictly speaking, nothing. Stylistically speaking, it's not a good
way to write maintainable code. Perl has several operators for
running external commands. Backticks are one; they collect the output
from the command for use in your program. The system function is
another; it doesn't do this.
Writing backticks in your program sends a clear message to the readers of your code that you wanted to collect the output of the command. Why send a clear message that isn't true?
Consider this line:
- `cat /etc/termcap`;
You forgot to check $?
to see whether the program even ran
correctly. Even if you wrote
- print `cat /etc/termcap`;
this code could and probably should be written as
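something like the following; a sketch in which perl itself (`$^X`) stands in for an external command such as cat, so the example runs anywhere:

```perl
# system() echoes the command's output as it runs, and returns the
# exit status, so check the status instead of throwing away the
# output you collected with backticks for no reason.
# ($^X, the running perl, stands in for an external command here.)
system($^X, '-e', 'print "termcap contents...\n"') == 0
    or die "program failed!";
```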
which will echo the cat command's output as it is generated, instead of waiting until the program has completed to print it out. It also checks the return value.
system also provides direct control over whether shell wildcard
processing may take place, whereas backticks do not.
This is a bit tricky. You can't simply write the command like this:
- @ok = `grep @opts '$search_string' @filenames`;
As of Perl 5.8.0, you can use open() with multiple arguments.
Just like the list forms of system() and exec(), no shell
escapes happen.
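For example, this sketch (grep -F on a temporary file stands in for the @opts/@filenames of the question) shows that shell metacharacters in the data survive untouched:

```perl
use File::Temp qw(tempfile);

# Write a line containing text a shell would mangle.
my ($tmp, $tmpname) = tempfile();
print {$tmp} 'match me: $PATH stays literal', "\n";
close $tmp or die "close: $!";

my $search_string = '$PATH';    # no shell, so no variable expansion
open my $fh, '-|', 'grep', '-F', $search_string, $tmpname
    or die "can't start grep: $!";
my @ok = <$fh>;
close $fh;
print "matched: @ok";
```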
You can also:
Just as with system(), no shell escapes happen when you exec() a
list. Further examples of this can be found in Safe Pipe Opens in perlipc.
Note that if you're using Windows, no solution to this vexing issue is
even possible. Even though Perl emulates fork(), you'll still be
stuck, because Windows does not have an argc/argv-style API.
This happens only if your perl is compiled to use stdio instead of
perlio (perlio has been the default since Perl 5.8). Some, maybe all,
stdios set error and eof flags that you may need to clear. The POSIX
module defines clearerr() that you can use. That is the technically
correct way to do it. Here are some less reliable workarounds:
Try keeping around the seekpointer and go there, like this:
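A self-contained sketch of the tell/seek trick, using a temporary file that grows after we hit end of file:

```perl
use File::Temp qw(tempfile);

my ($wfh, $name) = tempfile();
print {$wfh} "line 1\n";
close $wfh or die "close: $!";

open my $log_fh, '<', $name or die "open: $!";
my $first = <$log_fh>;      # "line 1\n"
my $eof   = <$log_fh>;      # undef: we've hit end of file

# Another process (here: us) appends to the file...
open my $app, '>>', $name or die "append: $!";
print {$app} "line 2\n";
close $app or die "close: $!";

# ...so remember where we are and seek back there, which clears
# the EOF flag and lets the next read see the new data.
my $where = tell($log_fh);
seek($log_fh, $where, 0) or die "seek: $!";
my $second = <$log_fh>;     # "line 2\n"
```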
If that doesn't work, try seeking to a different part of the file and then back.
If that doesn't work, try seeking to a different part of the file, reading something, and then seeking back.
If that doesn't work, give up on your stdio package and use sysread.
Learn Perl and rewrite it. Seriously, there's no simple converter. Things that are awkward to do in the shell are easy to do in Perl, and this very awkwardness is what would make a shell->perl converter nigh-on impossible to write. By rewriting it, you'll think about what you're really trying to do, and hopefully will escape the shell's pipeline datastream paradigm, which while convenient for some matters, causes many inefficiencies.
Try the Net::FTP, TCP::Client, and Net::Telnet modules (available from CPAN). http://www.cpan.org/scripts/netstuff/telnet.emul.shar will also help for emulating the telnet protocol, but Net::Telnet is quite probably easier to use.
If all you want to do is pretend to be telnet but don't need the initial telnet handshaking, then the standard dual-process approach will suffice:
- use IO::Socket; # new in 5.004
- my $handle = IO::Socket::INET->new('www.perl.com:80')
- or die "can't connect to port 80 on www.perl.com $!";
- $handle->autoflush(1);
- if (fork()) { # XXX: undef means failure
- select($handle);
- print while <STDIN>; # everything from stdin to socket
- } else {
- print while <$handle>; # everything from socket to stdout
- }
- close $handle;
- exit;
Once upon a time, there was a library called chat2.pl (part of the standard perl distribution), which never really got finished. If you find it somewhere, don't use it. These days, your best bet is to look at the Expect module available from CPAN, which also requires two other modules from CPAN, IO::Pty and IO::Stty.
First of all note that if you're doing this for security reasons (to avoid people seeing passwords, for example) then you should rewrite your program so that critical information is never given as an argument. Hiding the arguments won't make your program completely secure.
To actually alter the visible command line, you can assign to the variable $0 as documented in perlvar. This won't work on all operating systems, though. Daemon programs like sendmail place their state there, as in:
- $0 = "orcus [accepting connections]";
In the strictest sense, it can't be done--the script executes as a
different process from the shell it was started from. Changes to a
process are not reflected in its parent--only in any children
created after the change. There is shell magic that may allow you to
fake it by eval()ing the script's output in your shell; check out the
comp.unix.questions FAQ for details.
Assuming your system supports such things, just send an appropriate signal to the process (see kill). It's common to first send a TERM signal, wait a little bit, and then send a KILL signal to finish it off.
If by daemon process you mean one that's detached (disassociated from its tty), then the following process is reported to work on most Unixish systems. Non-Unix users should check their Your_OS::Process module for other solutions.
Open /dev/tty and use the TIOCNOTTY ioctl on it. See tty(1) for details. Or better yet, you can just use the POSIX::setsid() function, so you don't have to worry about process groups.
Change directory to /
Reopen STDIN, STDOUT, and STDERR so they're not connected to the old tty.
Background yourself like this:
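A minimal sketch of those steps (Unix only; the Proc::Daemon module mentioned below wraps all of this for you):

```perl
use POSIX qw(setsid);

chdir '/' or die "chdir: $!";
defined(my $pid = fork) or die "fork: $!";
exit if $pid;                    # parent returns to the shell
setsid() or die "setsid: $!";    # new session: no controlling tty
# (a real daemon would also reopen STDIN/STDOUT/STDERR here)
```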
The Proc::Daemon module, available from CPAN, provides a function to perform these actions for you.
(contributed by brian d foy)
This is a difficult question to answer, and the best answer is only a guess.
What do you really want to know? If you merely want to know if one of your filehandles is connected to a terminal, you can try the -t file test:
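For example, this sketch reports whether STDIN is attached to a terminal (it is false under cron, in pipelines, and in most test harnesses):

```perl
# -t is true only when the filehandle is connected to a tty.
my $on_a_tty = -t STDIN;
print $on_a_tty ? "looks interactive\n" : "no terminal on STDIN\n";
```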
However, you might be out of luck if you expect that means there is a real person on the other side. With the Expect module, another program can pretend to be a person. The program might even come close to passing the Turing test.
The IO::Interactive module does the best it can to give you an answer. Its is_interactive function returns an output filehandle; that filehandle points to standard output if the module thinks the session is interactive. Otherwise, the filehandle is a null handle that simply discards the output:
This still doesn't guarantee that a real person is answering your prompts or reading your output.
If you want to know how to handle automated testing for your distribution, you can check the environment. The CPAN Testers, for instance, set the value of AUTOMATED_TESTING:
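A sketch of that check, wrapped in a tiny helper (the helper name is my own invention):

```perl
# CPAN Testers smokers set AUTOMATED_TESTING in the environment;
# check it before doing anything interactive.
sub under_automated_testing {
    return !! $ENV{AUTOMATED_TESTING};
}

if ( under_automated_testing() ) {
    print "automated run: skipping interactive prompts\n";
}
```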
Use the alarm() function, probably in conjunction with a signal
handler, as documented in Signals in perlipc and the section on
"Signals" in the Camel. You may instead use the more flexible
Sys::AlarmCall module available from CPAN.
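The usual pattern wraps the slow operation in eval and has the ALRM handler die out of it; sketched here with sleep standing in for the slow call:

```perl
sub slow_operation { sleep 10; return "done" }   # stand-in for real work

my $result = eval {
    local $SIG{ALRM} = sub { die "timeout\n" };
    alarm 1;                     # allow one second
    my $value = slow_operation();
    alarm 0;                     # cancel the alarm on success
    $value;
};
alarm 0;                         # make sure no alarm is left pending
my $timed_out = !defined($result) && $@ eq "timeout\n";
print $timed_out ? "timed out\n" : "finished: $result\n";
```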
The alarm() function is not implemented on all versions of Windows.
Check the documentation for your specific version of Perl.
(contributed by Xho)
Use the BSD::Resource module from CPAN. As an example:
This sets the soft and hard limits to 10 and 20 seconds, respectively. After 10 seconds of time spent running on the CPU (not "wall" time), the process will be sent a signal (XCPU on some systems) which, if not trapped, will cause the process to terminate. If that signal is trapped, then after 10 more seconds (20 seconds in total) the process will be killed with a non-trappable signal.
See BSD::Resource and your system's documentation for the gory details.
Use the reaper code from Signals in perlipc to call wait() when a
SIGCHLD is received, or else use the double-fork technique described
in How do I start a process in the background? in perlfaq8.
The DBI module provides an abstract interface to most database servers and types, including Oracle, DB2, Sybase, MySQL, PostgreSQL, ODBC, and flat files. The DBI module accesses each database type through a database driver, or DBD. You can see a complete list of available drivers on CPAN: http://www.cpan.org/modules/by-module/DBD/ . You can read more about DBI on http://dbi.perl.org/ .
Other modules provide more specific access: Win32::ODBC, Alzabo, iodbc, and others found on CPAN Search: http://search.cpan.org/ .
You can't. You need to imitate the system() call (see perlipc for
sample code) and then have a signal handler for the INT signal that
passes the signal on to the subprocess. Or you can check for it:
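Checking means inspecting $? after system() returns; its low seven bits hold the number of the signal that killed the child, if any. A sketch, with a child that interrupts itself:

```perl
# The child sends itself SIGINT; the parent inspects $?.
system($^X, '-e', 'kill "INT", $$; sleep 1');
my $signal = $? & 127;           # 0 if the child exited normally
print $signal
    ? "child died on signal $signal\n"
    : "child exited normally\n";
```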
If you're lucky enough to be using a system that supports non-blocking reads (most Unixish systems do), you need only to use the O_NDELAY or O_NONBLOCK flag from the Fcntl module in conjunction with sysopen():
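A sketch with sysopen; O_NONBLOCK really matters on fifos, ttys, and sockets, but an empty temporary file is enough to show the immediate-return behaviour:

```perl
use Fcntl qw(O_RDONLY O_NONBLOCK);
use File::Temp qw(tempfile);

my ($tmpfh, $name) = tempfile();
# With O_NONBLOCK, reads return at once instead of waiting for data.
sysopen my $fh, $name, O_RDONLY | O_NONBLOCK
    or die "can't open $name: $!";
my $n = sysread($fh, my $buf, 512);   # returns immediately
print "read ", defined $n ? $n : "undef", " bytes\n";
```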
(answer contributed by brian d foy)
When you run a Perl script, something else is running the script for you, and that something else may output error messages. The script might emit its own warnings and error messages. Most of the time you cannot tell who said what.
You probably cannot fix the thing that runs perl, but you can change how perl outputs its warnings by defining custom warning and die handlers.
Consider this script, which has an error you may not notice immediately.
- #!/usr/locl/bin/perl
- print "Hello World\n";
I get an error when I run this from my shell (which happens to be
bash). That may look like perl forgot it has a print() function,
but my shebang line is not the path to perl, so the shell runs the
script, and I get the error.
- $ ./test
- ./test: line 3: print: command not found
A quick and dirty fix involves a little bit of code, but this may be all you need to figure out the problem.
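A sketch of that quick and dirty fix: install __WARN__ and __DIE__ handlers in a BEGIN block so every message perl produces is prefixed with "Perl:":

```perl
BEGIN {
    # Tag every warning and fatal error so we can tell perl's
    # messages apart from the shell's.
    $SIG{__WARN__} = sub { print STDERR "Perl: ", @_ };
    $SIG{__DIE__}  = sub { print STDERR "Perl: ", @_; exit 1 };
}

warn "something to worry about\n";  # prints "Perl: something to worry about"
```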
The perl message comes out with "Perl" in front. The BEGIN block works at compile time so all of the compilation errors and warnings get the "Perl:" prefix too.
- Perl: Useless use of division (/) in void context at ./test line 9.
- Perl: Name "main::a" used only once: possible typo at ./test line 8.
- Perl: Name "main::x" used only once: possible typo at ./test line 9.
- Perl: Use of uninitialized value in addition (+) at ./test line 8.
- Perl: Use of uninitialized value in division (/) at ./test line 9.
- Perl: Illegal division by zero at ./test line 9.
- Perl: Illegal division by zero at -e line 3.
If I don't see that "Perl:", it's not from perl.
You could also just know all the perl errors, and although there are some people who may know all of them, you probably don't. However, they all should be in the perldiag manpage. If you don't find the error in there, it probably isn't a perl error.
Looking up every message is not the easiest way, so let perl do it for you. Use the diagnostics pragma, which turns perl's normal messages into longer discussions on the topic.
- use diagnostics;
If you don't get a paragraph or two of expanded discussion, it might not be perl's message.
(contributed by brian d foy)
The easiest way is to have a module also named CPAN do it for you by using
the cpan
command that comes with Perl. You can give it a list of modules
to install:
- $ cpan IO::Interactive Getopt::Whatever
If you prefer CPANPLUS, it's just as easy:
- $ cpanp i IO::Interactive Getopt::Whatever
If you want to install a distribution from the current directory, you can tell CPAN.pm to install . (the full stop):
- $ cpan .
See the documentation for either of those commands to see what else you can do.
If you want to try to install a distribution by yourself, resolving all dependencies on your own, you follow one of two possible build paths.
For distributions that use Makefile.PL:
- $ perl Makefile.PL
- $ make test install
For distributions that use Build.PL:
- $ perl Build.PL
- $ ./Build test
- $ ./Build install
Some distributions may need to link to libraries or other third-party code and their build and installation sequences may be more complicated. Check any README or INSTALL files that you may find.
(contributed by brian d foy)
Perl runs the require statement at run-time. Once Perl loads, compiles, and runs the file, it doesn't do anything else. The use statement is the same as a require run at compile-time, but Perl also calls the import method for the loaded package. These two are the same:
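For example, with the core File::Basename standing in for MODULE, these two forms are equivalent:

```perl
# Form 1: use, at compile time
use File::Basename;

# Form 2: require plus import, wrapped in BEGIN to match use's timing
BEGIN {
    require File::Basename;
    File::Basename->import;
}

# either way, dirname() has been imported:
print dirname("/usr/local/bin/perl"), "\n";   # prints "/usr/local/bin"
```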
However, you can suppress the import by using an explicit, empty
import list. Both of these still happen at compile-time:
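Again with File::Basename as the stand-in, both of these load the module at compile time without importing anything:

```perl
use File::Basename ();      # empty list: no import() call

BEGIN {
    require File::Basename; # same thing, spelled out
}

# nothing was exported, so use the fully qualified name:
my $dir = File::Basename::dirname("/usr/bin/perl");
print "$dir\n";             # prints "/usr/bin"
```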
Since use will also call the import method, the actual value for MODULE must be a bareword. That is, use cannot load files by name, although require can:
- require "$ENV{HOME}/lib/Foo.pm"; # no @INC searching!
See the entry for use in perlfunc for more details.
When you build modules, tell Perl where to install the modules.
If you want to install modules for your own use, the easiest way might be local::lib, which you can download from CPAN. It sets various installation settings for you, and uses those same settings within your programs.
If you want more flexibility, you need to configure your CPAN client for your particular situation.
For Makefile.PL-based distributions, use the INSTALL_BASE option when generating Makefiles:
- perl Makefile.PL INSTALL_BASE=/mydir/perl
You can set this in your CPAN.pm configuration so modules automatically install in your private library directory when you use the CPAN.pm shell:
- % cpan
- cpan> o conf makepl_arg INSTALL_BASE=/mydir/perl
- cpan> o conf commit
For Build.PL-based distributions, use the --install_base option:
- perl Build.PL --install_base /mydir/perl
You can configure CPAN.pm to automatically use this option too:
- % cpan
- cpan> o conf mbuild_arg "--install_base /mydir/perl"
- cpan> o conf commit
INSTALL_BASE tells these tools to put your modules into /mydir/perl/lib/perl5. See How do I add a directory to my include path (@INC) at runtime? for details on how to run your newly installed modules.
There is one caveat with INSTALL_BASE, though, since it acts differently from the PREFIX and LIB settings that older versions of ExtUtils::MakeMaker advocated. INSTALL_BASE does not support installing modules for multiple versions of Perl or different architectures under the same directory. You should consider whether you really want that and, if you do, use the older PREFIX and LIB settings. See the ExtUtils::MakeMaker documentation for more details.
(contributed by brian d foy)
If you know the directory already, you can add it to @INC as you would for any other directory. You might use lib if you know the directory at compile time:
- use lib $directory;
The trick in this task is to find the directory. Before your script does anything else (such as a chdir), you can get the current working directory with the Cwd module, which comes with Perl:
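For instance, a sketch using the core Cwd module; the BEGIN block runs before use lib needs the value:

```perl
use Cwd qw(cwd);

our $directory;
BEGIN { $directory = cwd() }   # capture it before anything can chdir
use lib $directory;
```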
You can do a similar thing with the value of $0, which holds the script name. That might hold a relative path, but rel2abs can turn it into an absolute path. Once you have the script's absolute path, you can extract its directory with dirname from File::Basename and add that to @INC.
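A sketch of that approach with core modules:

```perl
use File::Spec::Functions qw(rel2abs);
use File::Basename qw(dirname);

our $directory;
BEGIN { $directory = dirname( rel2abs($0) ) }
use lib $directory;    # the script's own directory is now searched
```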
The FindBin module, which comes with Perl, might work. It finds the directory of the currently running script and puts it in $Bin, which you can then use to construct the right library path:
- use FindBin qw($Bin);
You can also use local::lib to do much of the same thing. Install modules using local::lib's settings then use the module in your program:
- use local::lib; # sets up a local lib at ~/perl5
See the local::lib documentation for more details.
Here are the suggested ways of modifying your include path, including environment variables, run-time switches, and in-code statements:
The PERLLIB environment variable:
- $ export PERLLIB=/path/to/my/dir
- $ perl program.pl
The PERL5LIB environment variable:
- $ export PERL5LIB=/path/to/my/dir
- $ perl program.pl
The perl -Idir command-line flag:
- $ perl -I/path/to/my/dir program.pl
The lib pragma:
- use lib "$ENV{HOME}/myown_perllib";
The last is particularly useful because it knows about machine-dependent architectures. The lib.pm pragmatic module was first included with the 5.002 release of Perl.
It's a Perl 4 style file defining values for system networking
constants. Sometimes it is built using h2ph when Perl is installed,
but other times it is not. Modern programs should use use Socket;
instead.
Copyright (c) 1997-2010 Tom Christiansen, Nathan Torkington, and other authors as noted. All rights reserved.
This documentation is free; you can redistribute it and/or modify it under the same terms as Perl itself.
Irrespective of its distribution, all code examples in this file are hereby placed into the public domain. You are permitted and encouraged to use this code in your own programs for fun or for profit as you see fit. A simple comment in the code giving credit would be courteous but is not required.
perlfaq9 - Web, Email and Networking
This section deals with questions related to running web sites, sending and receiving email as well as general networking.
Yes. If you are building a web site with any level of interactivity (forms / users / databases), you will want to use a framework to make handling requests and responses easier.
If there is no interactivity then you may still want to look at using something like Template Toolkit or Plack::Middleware::TemplateToolkit so maintenance of your HTML files (and other assets) is easier.
There is no simple answer to this question. Perl frameworks can run everything from basic file servers and small scale intranets to massive multinational multilingual websites that are the core to international businesses.
Below is a list of a few frameworks with comments which might help you in making a decision, depending on your specific requirements. Start by reading the docs, then ask questions on the relevant mailing list or IRC channel.
Strongly object-oriented and fully-featured with a long development history and a large community and addon ecosystem. It is excellent for large and complex applications, where you have full control over the server.
Young and free of legacy weight, providing a lightweight and easy to learn API. Has a growing addon ecosystem. It is best used for smaller projects and very easy to learn for beginners.
Fairly young with a focus on HTML5 and real-time web technologies such as WebSockets.
Currently experimental, strongly object-oriented, built for speed and intended as a toolkit for building micro web apps, custom frameworks, or for tying together existing Plack-compatible web applications with one central dispatcher.
All of these interact with or use Plack which is worth understanding the basics of when building a website in Perl (there is a lot of useful Plack::Middleware).
PSGI is the Perl Web Server Gateway Interface specification. It is a standard that many Perl web frameworks use; you should not need to understand it to build a web site. The part you might want to use is Plack.
Plack is a set of tools for using the PSGI stack. It contains middleware components, a reference server and utilities for Web application frameworks. Plack is like Ruby's Rack or Python's Paste for WSGI.
You could build a web site using Plack and your own code, but for anything other than a very basic web site, using a web framework (that uses Plack) is a better option.
Use HTML::Strip, or HTML::FormatText which not only removes HTML but also attempts to do a little simple formatting of the resulting plain text.
HTML::SimpleLinkExtor will extract URLs from HTML; it handles anchors, images, objects, frames, and many other tags that can contain a URL. If you need anything more complex, you can create your own subclass of HTML::LinkExtor or HTML::Parser. You might even use HTML::SimpleLinkExtor as an example for something specifically suited to your needs.
You can use URI::Find to extract URLs from an arbitrary text document.
(contributed by brian d foy)
Use the libwww-perl distribution. The LWP::Simple module can fetch web resources and give their content back to you as a string:
- use LWP::Simple qw(get);
- my $html = get( "http://www.example.com/index.html" );
It can also store the resource directly in a file:
- use LWP::Simple qw(getstore);
- getstore( "http://www.example.com/index.html", "foo.html" );
If you need to do something more complicated, you can use LWP::UserAgent module to create your own user-agent (e.g. browser) to get the job done. If you want to simulate an interactive web browser, you can use the WWW::Mechanize module.
If you are doing something complex, such as moving through many pages and forms or a web site, you can use WWW::Mechanize. See its documentation for all the details.
If you're submitting values using the GET method, create a URL and encode the form using the query_form method:
- use LWP::Simple;
- use URI::URL;
- my $url = url('http://www.perl.com/cgi-bin/cpan_mod');
- $url->query_form(module => 'DB_File', readme => 1);
- $content = get($url);
If you're using the POST method, create your own user agent and encode the content appropriately.
Most of the time you should not need to do this yourself: when you build a web site, your web framework handles it, and when you make a request, LWP or another module does it for you.
To encode a string yourself, use the URI::Escape module. The uri_escape function returns the escaped string:
To decode the string, use the uri_unescape function:
Remember not to encode a full URI; you need to escape each component separately and then join them together.
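Both directions, sketched with URI::Escape (which ships with the URI distribution on CPAN, not with the Perl core):

```perl
use URI::Escape qw(uri_escape uri_unescape);

# Escape each component separately, never the assembled URI.
my $component = 'name=10% & rising';
my $escaped   = uri_escape($component);
my $restored  = uri_unescape($escaped);
print "$escaped\n";
```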
Most Perl Web Frameworks will have a mechanism for doing this, using the Catalyst framework it would be:
- $c->res->redirect($url);
- $c->detach();
If you are using Plack (which most frameworks do), then Plack::Middleware::Rewrite is worth looking at if you are migrating from Apache or have URLs you want to always redirect.
See if the web framework you are using has an authentication system and if that fits your needs.
Alternatively, look at Plack::Middleware::Auth::Basic, or one of the other Plack authentication options.
(contributed by brian d foy)
You can't prevent people from sending your script bad data. Even if you add some client-side checks, people may disable them or bypass them completely. For instance, someone might use a module such as LWP to submit to your web site. If you want to prevent data that try to use SQL injection or other sorts of attacks (and you should want to), you have to not trust any data that enter your program.
The perlsec documentation has general advice about data security.
If you are using the DBI module, use placeholders to fill in data.
If you are running external programs with system or exec, use
the list forms. There are many other precautions that you should take,
too many to list here, and most of them fall under the category of not
using any data that you don't intend to use. Trust no one.
Use the Email::MIME module. It's well-tested and supports all the craziness that you'll see in the real world (comment-folding whitespace, encodings, comments, etc.).
If you've already got some other kind of email object, consider passing it to Email::Abstract and then using its cast method to get an Email::MIME object:
(partly contributed by Aaron Sherman)
This isn't as simple a question as it sounds. There are two parts:
a) How do I verify that an email address is correctly formatted?
b) How do I verify that an email address targets a valid recipient?
Without sending mail to the address and seeing whether there's a human on the other end to answer you, you cannot fully answer part b, but the Email::Valid module will do both part a and part b as far as you can in real-time.
Our best advice for verifying a person's mail address is to have them enter their address twice, just as you normally do to change a password. This usually weeds out typos. If both versions match, send mail to that address with a personal message. If you get the message back and they've followed your directions, you can be reasonably assured that it's real.
A related strategy that's less open to forgery is to give them a PIN (personal ID number). Record the address and PIN (best that it be a random one) for later processing. In the mail you send, include a link to your site with the PIN included. If the mail bounces, you know it's not valid. If they don't click on the link, either they forged the address or (assuming they got the message) following through wasn't important so you don't need to worry about it.
The MIME::Base64 package handles this as well as the MIME/QP encoding. Decoding base 64 becomes as simple as:
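With the core MIME::Base64 module, each direction is one call (the sample string is the classic HTTP Basic-auth example):

```perl
use MIME::Base64 qw(encode_base64 decode_base64);

my $encoded = encode_base64("Aladdin:open sesame");
my $decoded = decode_base64($encoded);
print $encoded;     # QWxhZGRpbjpvcGVuIHNlc2FtZQ==
```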
The Email::MIME module can decode base 64-encoded email message parts transparently so the developer doesn't need to worry about it.
Ask them for it. There are so many email providers available that it's unlikely the local system has any idea how to determine a user's email address.
The exception is for organization-specific email (e.g. foo@yourcompany.com) where policy can be codified in your program. In that case, you could look at $ENV{USER}, $ENV{LOGNAME}, and getpwuid($<) in scalar context, like so:
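A sketch of that lookup; yourcompany.com is of course a placeholder for your own policy domain:

```perl
# Try the environment first, then the password database (Unix).
my $user  = $ENV{USER} || $ENV{LOGNAME} || scalar getpwuid($<);
my $email = $user . '@yourcompany.com';   # hypothetical policy domain
print "$email\n";
```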
But you still cannot make assumptions about whether this is correct, unless your policy says it is. You really are best off asking the user.
Use the Email::MIME and Email::Sender::Simple modules, like so:
- # first, create your message
- my $message = Email::MIME->create(
- header_str => [
- From => 'you@example.com',
- To => 'friend@example.com',
- Subject => 'Happy birthday!',
- ],
- attributes => {
- encoding => 'quoted-printable',
- charset => 'utf-8',
- },
- body_str => "Happy birthday to you!\n",
- );
- use Email::Sender::Simple qw(sendmail);
- sendmail($message);
By default, Email::Sender::Simple will try `sendmail` first, if it exists in your $PATH. This generally isn't the case. If there's a remote mail server you use to send mail, consider investigating one of the Transport classes. At time of writing, the available transports include:
This is the default. If you can use the mail(1) or mailx(1) program to send mail from the machine where your code runs, you should be able to use this.
This transport contacts a remote SMTP server over TCP. It optionally uses SSL and can authenticate to the server via SASL.
This is like the SMTP transport, but uses TLS security. You can authenticate with this module as well, using any mechanisms your server supports after STARTTLS.
Telling Email::Sender::Simple to use your transport is straightforward.
- sendmail(
- $message,
- {
- transport => $email_sender_transport_object,
- }
- );
Email::MIME directly supports multipart messages. Email::MIME objects themselves are parts and can be attached to other Email::MIME objects. Consult the Email::MIME documentation for more information, including all of the supported methods and examples of their use.
Use the Email::Folder module, like so:
There are different classes in the Email::Folder namespace for supporting various mailbox types. Note that these modules are generally rather limited and only support reading rather than writing.
(contributed by brian d foy)
The Net::Domain module, which is part of the Standard Library starting in Perl 5.7.3, can get you the fully qualified domain name (FQDN), the host name, or the domain name.
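Net::Domain exports each piece separately; what it returns depends on the local resolver configuration:

```perl
use Net::Domain qw(hostname hostfqdn hostdomain);

my $host   = hostname();    # unqualified host name
my $fqdn   = hostfqdn();    # fully qualified, if it can be determined
my $domain = hostdomain();  # the domain part alone
print "host: $host\n";
```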
The Sys::Hostname module, part of the Standard Library, can also get the hostname:
- use Sys::Hostname;
- $host = hostname();
The Sys::Hostname::Long module takes a different approach and tries harder to return the fully qualified hostname:
To get the IP address, you can use the gethostbyname built-in function to turn the name into a number. To turn that number into the dotted-octet form (a.b.c.d) that most people expect, use the inet_ntoa function from the Socket module, which also comes with perl.
- use Socket;
- my $address = inet_ntoa(
- scalar gethostbyname( $host || 'localhost' )
- );
Net::FTP and Net::SFTP allow you to interact with FTP and SFTP (Secure FTP) servers.
Use one of the RPC modules ( https://metacpan.org/search?q=RPC ).
Copyright (c) 1997-2010 Tom Christiansen, Nathan Torkington, and other authors as noted. All rights reserved.
This documentation is free; you can redistribute it and/or modify it under the same terms as Perl itself.
Irrespective of its distribution, all code examples in this file are hereby placed into the public domain. You are permitted and encouraged to use this code in your own programs for fun or for profit as you see fit. A simple comment in the code giving credit would be courteous but is not required.
perlfilter - Source Filters
This article is about a little-known feature of Perl called source filters. Source filters alter the program text of a module before Perl sees it, much as a C preprocessor alters the source text of a C program before the compiler sees it. This article tells you more about what source filters are, how they work, and how to write your own.
The original purpose of source filters was to let you encrypt your program source to prevent casual piracy. This isn't all they can do, as you'll soon learn. But first, the basics.
Before the Perl interpreter can execute a Perl script, it must first
read it from a file into memory for parsing and compilation. If that
script itself includes other scripts with a use or require
statement, then each of those scripts will have to be read from their
respective files as well.
Now think of each logical connection between the Perl parser and an
individual file as a source stream. A source stream is created when
the Perl parser opens a file, it continues to exist as the source code
is read into memory, and it is destroyed when Perl is finished parsing
the file. If the parser encounters a require or use statement in
a source stream, a new and distinct stream is created just for that
file.
The diagram below represents a single source stream, with the flow of source from a Perl script file on the left into the Perl parser on the right. This is how Perl normally operates.
- file -------> parser
There are two important points to remember:
Although there can be any number of source streams in existence at any given time, only one will be active.
Every source stream is associated with only one file.
A source filter is a special kind of Perl module that intercepts and modifies a source stream before it reaches the parser. A source filter changes our diagram like this:
- file ----> filter ----> parser
If that doesn't make much sense, consider the analogy of a command pipeline. Say you have a shell script stored in the compressed file trial.gz. The simple pipeline command below runs the script without needing to create a temporary file to hold the uncompressed file.
- gunzip -c trial.gz | sh
In this case, the data flow from the pipeline can be represented as follows:
- trial.gz ----> gunzip ----> sh
With source filters, you can store the text of your script compressed and use a source filter to uncompress it for Perl's parser:
- compressed gunzip
- Perl program ---> source filter ---> parser
So how do you use a source filter in a Perl script? Above, I said that a source filter is just a special kind of module. Like all Perl modules, a source filter is invoked with a use statement.
Say you want to pass your Perl source through the C preprocessor before execution. As it happens, the source filters distribution comes with a C preprocessor filter module called Filter::cpp.
Below is an example program, cpp_test, which makes use of this filter. Line numbers have been added to allow specific lines to be referenced easily.
- 1: use Filter::cpp;
- 2: #define TRUE 1
- 3: $a = TRUE;
- 4: print "a = $a\n";
When you execute this script, Perl creates a source stream for the file. Before the parser processes any of the lines from the file, the source stream looks like this:
- cpp_test ---------> parser
Line 1, use Filter::cpp, includes and installs the cpp filter module. All source filters work this way. The use statement is compiled and executed at compile time, before any more of the file is read, and it attaches the cpp filter to the source stream behind the scenes. Now the data flow looks like this:
- cpp_test ----> cpp filter ----> parser
As the parser reads the second and subsequent lines from the source stream, it feeds those lines through the cpp source filter before processing them. The cpp filter simply passes each line through the real C preprocessor. The output from the C preprocessor is then inserted back into the source stream by the filter.
- .-> cpp --.
- | |
- | |
- | <-'
- cpp_test ----> cpp filter ----> parser
The parser then sees the following code:
- 1: use Filter::cpp;
- 2:
- 3: $a = 1;
- 4: print "a = $a\n";
Let's consider what happens when the filtered code includes another module with use:
- 1: use Filter::cpp;
- 2: #define TRUE 1
- 3: use Fred;
- 4: $a = TRUE;
- 5: print "a = $a\n";
The cpp filter does not apply to the text of the Fred module, only to the text of the file that used it (cpp_test). Although the use statement on line 3 will pass through the cpp filter, the module that gets included (Fred) will not. The source streams look like this after line 3 has been parsed and before line 4 is parsed:
- cpp_test ---> cpp filter ---> parser (INACTIVE)
- Fred.pm ----> parser
As you can see, a new stream has been created for reading the source from Fred.pm. This stream will remain active until all of Fred.pm has been parsed. The source stream for cpp_test will still exist, but is inactive. Once the parser has finished reading Fred.pm, the source stream associated with it will be destroyed. The source stream for cpp_test then becomes active again and the parser reads line 4 and subsequent lines from cpp_test.
You can use more than one source filter on a single file. Similarly, you can reuse the same filter in as many files as you like.
For example, if you have a uuencoded and compressed source file, it is possible to stack a uudecode filter and an uncompression filter like this:
- use Filter::uudecode; use Filter::uncompress;
- M'XL(".H<US4''V9I;F%L')Q;>7/;1I;_>_I3=&E=%:F*I"T?22Q/
- M6]9*<IQCO*XFT"0[PL%%'Y+IG?WN^ZYN-$'J.[.JE$,20/?K=_[>
- ...
Once the first line has been processed, the flow will look like this:
- file ---> uudecode ---> uncompress ---> parser
- filter filter
Data flows through filters in the same order they appear in the source file. The uudecode filter appeared before the uncompress filter, so the source file will be uudecoded before it's uncompressed.
There are three ways to write your own source filter. You can write it in C, use an external program as a filter, or write the filter in Perl. I won't cover the first two in any great detail, so I'll get them out of the way first. Writing the filter in Perl is most convenient, so I'll devote the most space to it.
The first of the three available techniques is to write the filter completely in C. The external module you create interfaces directly with the source filter hooks provided by Perl.
The advantage of this technique is that you have complete control over the implementation of your filter. The big disadvantage is the increased complexity required to write the filter: not only do you need to understand the source filter hooks, but you also need a reasonable knowledge of Perl guts. One of the few times it is worth going to this trouble is when writing a source scrambler. The decrypt filter (which unscrambles the source before Perl parses it) included with the source filter distribution is an example of a C source filter (see Decryption Filters, below).
All decryption filters work on the principle of "security through obscurity." Regardless of how well you write a decryption filter and how strong your encryption algorithm is, anyone determined enough can retrieve the original source code. The reason is quite simple - once the decryption filter has decrypted the source back to its original form, fragments of it will be stored in the computer's memory as Perl parses it. The source might only be in memory for a short period of time, but anyone possessing a debugger, skill, and lots of patience can eventually reconstruct your program.
That said, there are a number of steps that can be taken to make life difficult for the potential cracker. The most important: Write your decryption filter in C and statically link the decryption module into the Perl binary. For further tips to make life difficult for the potential cracker, see the file decrypt.pm in the source filters distribution.
An alternative to writing the filter in C is to create a separate executable in the language of your choice. The separate executable reads from standard input, does whatever processing is necessary, and writes the filtered data to standard output. Filter::cpp is an example of a source filter implemented as a separate executable: the executable is the C preprocessor bundled with your C compiler.
The source filter distribution includes two modules that simplify this task: Filter::exec and Filter::sh. Both allow you to run any external executable, and both use a coprocess to control the flow of data into and out of it. (For details on coprocesses, see Stevens, W.R., "Advanced Programming in the UNIX Environment," Addison-Wesley, ISBN 0-201-56317-7, pages 441-445.) The difference between them is that Filter::exec spawns the external command directly, while Filter::sh spawns a shell to execute the external command. (Unix uses the Bourne shell; NT uses the cmd shell.) Spawning a shell allows you to make use of the shell metacharacters and redirection facilities.
Here is an example script that uses Filter::sh:
- use Filter::sh 'tr XYZ PQR';
- $a = 1;
- print "XYZ a = $a\n";
The output you'll get when the script is executed:
- PQR a = 1
Writing a source filter as a separate executable works fine, but a
small performance penalty is incurred. For example, if you execute the
small example above, a separate subprocess will be created to run the
Unix tr command. Each use of the filter requires its own subprocess.
If creating subprocesses is expensive on your system, you might want to
consider one of the other options for creating source filters.
The easiest and most portable option available for creating your own source filter is to write it completely in Perl. To distinguish this from the previous two techniques, I'll call it a Perl source filter.
To help understand how to write a Perl source filter we need an example to study. Here is a complete source filter that performs rot13 decoding. (Rot13 is a very simple encryption scheme used in Usenet postings to hide the contents of offensive posts. It moves every letter forward thirteen places, so that A becomes N, B becomes O, and Z becomes M.)
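A minimal implementation along those lines, using the Filter::Util::Call module discussed below (the package name Rot13 matches the one that the mkrot13 helper shown later installs):

```perl
package Rot13;

use strict;
use warnings;
use Filter::Util::Call;

sub import {
    my ($type) = @_;
    my ($ref) = [];            # no context needed; any reference will do
    filter_add(bless $ref);    # attach the filter object to the source stream
}

sub filter {
    my ($self) = @_;
    my ($status);
    # filter_read() appends the next source line to $_;
    # decode it in place with a reversed rot13 transliteration
    tr/n-za-mN-ZA-M/a-zA-Z/
        if ($status = filter_read()) > 0;
    $status;
}

1;
```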
All Perl source filters are implemented as Perl classes and have the same basic structure as the example above.
First, we include the Filter::Util::Call module, which exports a number of functions into your filter's namespace. The filter shown above uses two of these functions, filter_add() and filter_read().
Next, we create the filter object and associate it with the source
stream by defining the import function. If you know Perl well
enough, you know that import is called automatically every time a
module is included with a use statement. This makes import the ideal
place to both create and install a filter object.
In the example filter, the object ($ref) is blessed just like any other Perl object. Our example uses an anonymous array, but this isn't a requirement. Because this example doesn't need to store any context information, we could have used a scalar or hash reference just as well. The next section demonstrates context data.
The association between the filter object and the source stream is made with the filter_add() function. This takes a filter object as a parameter ($ref in this case) and installs it in the source stream.
Finally, there is the code that actually does the filtering. For this type of Perl source filter, all the filtering is done in a method called filter(). (It is also possible to write a Perl source filter using a closure. See the Filter::Util::Call manual page for more details.) It's called every time the Perl parser needs another line of source to process. The filter() method, in turn, reads lines from the source stream using the filter_read() function.
If a line was available from the source stream, filter_read() returns a status value greater than zero and appends the line to $_. A status value of zero indicates end-of-file; less than zero means an error. The filter function itself is expected to return its status in the same way, and to put the filtered line it wants written to the source stream in $_. The use of $_ accounts for the brevity of most Perl source filters.
In order to make use of the rot13 filter we need some way of encoding the source file in rot13 format. The script below, mkrot13, does just that.
- die "usage mkrot13 filename\n" unless @ARGV;
- my $in = $ARGV[0];
- my $out = "$in.tmp";
- open(IN, "<$in") or die "Cannot open file $in: $!\n";
- open(OUT, ">$out") or die "Cannot open file $out: $!\n";
- print OUT "use Rot13;\n";
- while (<IN>) {
- tr/a-zA-Z/n-za-mN-ZA-M/;
- print OUT;
- }
- close IN;
- close OUT;
- unlink $in;
- rename $out, $in;
If we encrypt this with mkrot13:
- print " hello fred \n";
the result will be this:
- use Rot13;
- cevag " uryyb serq \a";
Running it produces this output:
- hello fred
The rot13 example was a trivial example. Here's another demonstration that shows off a few more features.
Say you wanted to include a lot of debugging code in your Perl script during development, but you didn't want it available in the released product. Source filters offer a solution. In order to keep the example simple, let's say you wanted the debugging output to be controlled by an environment variable, DEBUG. Debugging code is enabled if the variable exists; otherwise it is disabled.
Two special marker lines will bracket debugging code, like this:
- ## DEBUG_BEGIN
- if ($year > 1999) {
- warn "Debug: millennium bug in year $year\n";
- }
- ## DEBUG_END
The filter ensures that Perl parses the code between the DEBUG_BEGIN and DEBUG_END markers only when the DEBUG environment variable exists. That means that when DEBUG does exist, the code above should be passed through the filter unchanged. The marker lines can also be passed through as-is, because the Perl parser will see them as comment lines. When DEBUG isn't set, we need a way to disable the debug code. A simple way to achieve that is to convert the lines between the two markers into comments:
- ## DEBUG_BEGIN
- #if ($year > 1999) {
- # warn "Debug: millennium bug in year $year\n";
- #}
- ## DEBUG_END
Here is the complete Debug filter:
- package Debug;
- use strict;
- use warnings;
- use Filter::Util::Call;
- use constant TRUE => 1;
- use constant FALSE => 0;
- sub import {
- my ($type) = @_;
- my (%context) = (
- Enabled => defined $ENV{DEBUG},
- InTraceBlock => FALSE,
- Filename => (caller)[1],
- LineNo => 0,
- LastBegin => 0,
- );
- filter_add(bless \%context);
- }
- sub Die {
- my ($self) = shift;
- my ($message) = shift;
- my ($line_no) = shift || $self->{LastBegin};
- die "$message at $self->{Filename} line $line_no.\n"
- }
- sub filter {
- my ($self) = @_;
- my ($status);
- $status = filter_read();
- ++ $self->{LineNo};
- # deal with EOF/error first
- if ($status <= 0) {
- $self->Die("DEBUG_BEGIN has no DEBUG_END")
- if $self->{InTraceBlock};
- return $status;
- }
- if ($self->{InTraceBlock}) {
- if (/^\s*##\s*DEBUG_BEGIN/ ) {
- $self->Die("Nested DEBUG_BEGIN", $self->{LineNo})
- } elsif (/^\s*##\s*DEBUG_END/) {
- $self->{InTraceBlock} = FALSE;
- }
- # comment out the debug lines when the filter is disabled
- s/^/#/ if ! $self->{Enabled};
- } elsif ( /^\s*##\s*DEBUG_BEGIN/ ) {
- $self->{InTraceBlock} = TRUE;
- $self->{LastBegin} = $self->{LineNo};
- } elsif ( /^\s*##\s*DEBUG_END/ ) {
- $self->Die("DEBUG_END has no DEBUG_BEGIN", $self->{LineNo});
- }
- return $status;
- }
- 1;
The big difference between this filter and the previous example is the use of context data in the filter object. The filter object is based on a hash reference and is used to keep various pieces of context information between calls to the filter function. All but two of the hash fields are used for error reporting. The first of those two, Enabled, is used by the filter to determine whether the debugging code should be given to the Perl parser. The second, InTraceBlock, is true when the filter has encountered a DEBUG_BEGIN line but has not yet encountered the following DEBUG_END line.
If you ignore all the error checking that most of the code does, the essence of the filter is as follows:
- sub filter {
- my ($self) = @_;
- my ($status);
- $status = filter_read();
- # deal with EOF/error first
- return $status if $status <= 0;
- if ($self->{InTraceBlock}) {
- if (/^\s*##\s*DEBUG_END/) {
- $self->{InTraceBlock} = FALSE
- }
- # comment out debug lines when the filter is disabled
- s/^/#/ if ! $self->{Enabled};
- } elsif ( /^\s*##\s*DEBUG_BEGIN/ ) {
- $self->{InTraceBlock} = TRUE;
- }
- return $status;
- }
Be warned: just as the C preprocessor doesn't know C, the Debug filter doesn't know Perl. It can be fooled quite easily:
- print <<EOM;
- ##DEBUG_BEGIN
- EOM
Such things aside, you can see that a lot can be achieved with a modest amount of code.
You now have a better understanding of what a source filter is, and you might even have a possible use for them. If you feel like playing with source filters but need a bit of inspiration, here are some extra features you could add to the Debug filter.
First, an easy one. Rather than having debugging code that is all-or-nothing, it would be much more useful to be able to control which specific blocks of debugging code get included. Try extending the syntax for debug blocks to allow each to be identified. The contents of the DEBUG environment variable can then be used to control which blocks get included.
Once you can identify individual blocks, try allowing them to be nested. That isn't difficult either.
Here is an interesting idea that doesn't involve the Debug filter. Currently Perl subroutines have fairly limited support for formal parameter lists. You can specify the number of parameters and their type, but you still have to manually take them out of the @_ array yourself. Write a source filter that allows you to have a named parameter list. Such a filter would turn this:
- sub MySub ($first, $second, @rest) { ... }
into this:
- sub MySub { my ($first) = shift; my ($second) = shift; my (@rest) = @_; ... }
Finally, if you feel like a real challenge, have a go at writing a full-blown Perl macro preprocessor as a source filter. Borrow the useful features from the C preprocessor and any other macro processors you know. The tricky bit will be choosing how much knowledge of Perl's syntax you want your filter to have.
DATA Handle
Some source filters use the DATA handle to read the calling program. When using these source filters you cannot rely on this handle, nor expect any particular kind of behavior when operating on it. Filters based on Filter::Util::Call (and therefore Filter::Simple) do not alter the DATA filehandle.
The Source Filters distribution is available on CPAN, in
- CPAN/modules/by-module/Filter
Starting from Perl 5.8 Filter::Util::Call (the core part of the Source Filters distribution) is part of the standard Perl distribution. Also included is a friendlier interface called Filter::Simple, by Damian Conway.
Paul Marquess <Paul.Marquess@btinternet.com>
This article originally appeared in The Perl Journal #11, and is copyright 1998 The Perl Journal. It appears courtesy of Jon Orwant and The Perl Journal. This document may be distributed under the same terms as Perl itself.
perlfork - Perl's fork() emulation
- NOTE: As of the 5.8.0 release, fork() emulation has considerably
- matured. However, there are still a few known bugs and differences
- from real fork() that might affect you. See the "BUGS" and
- "CAVEATS AND LIMITATIONS" sections below.
Perl provides a fork() keyword that corresponds to the Unix system call of the same name. On most Unix-like platforms where the fork() system call is available, Perl's fork() simply calls it.
On some platforms such as Windows where the fork() system call is not available, Perl can be built to emulate fork() at the interpreter level. While the emulation is designed to be as compatible as possible with the real fork() at the level of the Perl program, there are certain important differences that stem from the fact that all the pseudo child "processes" created this way live in the same real process as far as the operating system is concerned.
This document provides a general overview of the capabilities and limitations of the fork() emulation. Note that the issues discussed here are not applicable to platforms where a real fork() is available and Perl has been configured to use it.
The fork() emulation is implemented at the level of the Perl interpreter. What this means in general is that running fork() will actually clone the running interpreter and all its state, and run the cloned interpreter in a separate thread, beginning execution in the new thread just after the point where the fork() was called in the parent. We will refer to the thread that implements this child "process" as the pseudo-process.
To the Perl program that called fork(), all this is designed to be transparent. The parent returns from the fork() with a pseudo-process ID that can be subsequently used in any process-manipulation functions; the child returns from the fork() with a value of 0 to signify that it is the child pseudo-process.
Most Perl features behave in a natural way within pseudo-processes.
This special variable is correctly set to the pseudo-process ID. It can be used to identify pseudo-processes within a particular session. Note that this value is subject to recycling if any pseudo-processes are launched after others have been wait()-ed on.
Each pseudo-process maintains its own virtual environment. Modifications to %ENV affect the virtual environment, and are only visible within that pseudo-process, and in any processes (or pseudo-processes) launched from it.
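The visible behavior matches that of a real fork(), so it is easy to demonstrate from the parent's side (the variable name PSEUDO_DEMO is arbitrary):

```perl
use strict;
use warnings;

my $pid = fork();
die "fork() failed: $!" unless defined $pid;

if ($pid == 0) {
    # Changes to %ENV in the child (or pseudo-process) are only
    # visible to the child and anything it launches...
    $ENV{PSEUDO_DEMO} = "child only";
    exit 0;
}

waitpid($pid, 0);
# ...so the parent's environment is unaffected
print exists $ENV{PSEUDO_DEMO} ? "leaked\n" : "isolated\n";
```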
Each pseudo-process maintains its own virtual idea of the current directory. Modifications to the current directory using chdir() are only visible within that pseudo-process, and in any processes (or pseudo-processes) launched from it. All file and directory accesses from the pseudo-process will correctly map the virtual working directory to the real working directory appropriately.
wait() and waitpid() can be passed a pseudo-process ID returned by fork(). These calls will properly wait for the termination of the pseudo-process and return its status.
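A minimal sketch of reaping a (pseudo-)child with waitpid() follows; the exit status 7 is arbitrary:

```perl
use strict;
use warnings;

my $pid = fork();
die "fork() failed: $!" unless defined $pid;

if ($pid == 0) {
    # child (a pseudo-process under the emulation)
    exit 7;
}

# parent: wait for the child to terminate and collect its status
waitpid($pid, 0);
my $status = $? >> 8;    # the exit status lives in the high byte of $?
print "child exited with status $status\n";
```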
kill('KILL', ...) can be used to terminate a pseudo-process by passing it the ID returned by fork(). The outcome of kill on a pseudo-process is unpredictable and it should not be used except under dire circumstances, because the operating system may not guarantee integrity of the process resources when a running thread is terminated. The process that implements the pseudo-processes can be blocked, and the whole Perl interpreter can hang. Note that using kill('KILL', ...) on a pseudo-process will typically cause memory leaks, because the thread that implements the pseudo-process does not get a chance to clean up its resources.
kill('TERM', ...) can also be used on pseudo-processes, but the signal will not be delivered while the pseudo-process is blocked by a system call, e.g. waiting for a socket to connect or trying to read from a socket with no data available. Starting in Perl 5.14, the parent process will not wait for children to exit once they have been signalled with kill('TERM', ...), in order to avoid deadlock during process exit. You will have to explicitly call waitpid() to make sure the child has time to clean up after itself, but you are then also responsible for ensuring that the child is not blocking on I/O.
Calling exec() within a pseudo-process actually spawns the requested executable in a separate process and waits for it to complete before exiting with the same exit status as that process. This means that the process ID reported within the running executable will be different from what the earlier Perl fork() might have returned. Similarly, any process manipulation functions applied to the ID returned by fork() will affect the waiting pseudo-process that called exec(), not the real process it is waiting for after the exec().
When exec() is called inside a pseudo-process then DESTROY methods and END blocks will still be called after the external process returns.
exit() always exits just the executing pseudo-process, after automatically wait()-ing for any outstanding child pseudo-processes. Note that this means that the process as a whole will not exit unless all running pseudo-processes have exited. See below for some limitations with open filehandles.
All open handles are dup()-ed in pseudo-processes, so that closing any handles in one process does not affect the others. See below for some limitations.
In the eyes of the operating system, pseudo-processes created via the fork() emulation are simply threads in the same process. This means that any process-level limits imposed by the operating system apply to all pseudo-processes taken together. This includes any limits imposed by the operating system on the number of open file, directory and socket handles, limits on disk space usage, limits on memory size, limits on CPU utilization etc.
If the parent process is killed (either using Perl's kill() builtin, or using some external means) all the pseudo-processes are killed as well, and the whole process exits.
During the normal course of events, the parent process and every pseudo-process started by it will wait for their respective pseudo-children to complete before they exit. This means that the parent and every pseudo-child created by it that is also a pseudo-parent will only exit after their pseudo-children have exited.
Starting with Perl 5.14, a parent will not wait() automatically for any child that has been signalled with kill('TERM', ...), to avoid a deadlock in case the child is blocking on I/O and never receives the signal.
The fork() emulation will not work entirely correctly when called from within a BEGIN block. The forked copy will run the contents of the BEGIN block, but will not continue parsing the source stream after the BEGIN block. For example, consider the following code:
- BEGIN {
-     fork and exit;          # fork child and exit the parent
-     print "inner\n";
- }
- print "outer\n";
This will print:
- inner
rather than the expected:
- inner
- outer
This limitation arises from fundamental technical difficulties in cloning and restarting the stacks used by the Perl parser in the middle of a parse.
Any filehandles open at the time of the fork() will be dup()-ed. Thus, the files can be closed independently in the parent and child, but beware that the dup()-ed handles will still share the same seek pointer. Changing the seek position in the parent will change it in the child and vice-versa. One can avoid this by opening files that need distinct seek pointers separately in the child.
On some operating systems, notably Solaris and Unixware, calling exit() from a child process will flush and close open filehandles in the parent, thereby corrupting the filehandles. On these systems, calling _exit() is suggested instead. _exit() is available in Perl through the POSIX module. Please consult your system's manpages for more information on this.
Perl will completely read from all open directory handles until they reach the end of the stream. It will then seekdir() back to the original location and all future readdir() requests will be fulfilled from the cache buffer. That means that neither the directory handle held by the parent process nor the one held by the child process will see any changes made to the directory after the fork() call.
Note that rewinddir() has a similar limitation on Windows and will not force readdir() to read the directory again either. Only a newly opened directory handle will reflect changes to the directory.
The open(FOO, "|-") and open(BAR, "-|") constructs are not yet implemented. This limitation can be easily worked around in new code by creating a pipe explicitly. The following example shows how to write to a forked child:
- # simulate open(FOO, "|-")
- sub pipe_to_fork ($) {
- my $parent = shift;
- pipe my $child, $parent or die;
- my $pid = fork();
- die "fork() failed: $!" unless defined $pid;
- if ($pid) {
- close $child;
- }
- else {
- close $parent;
- open(STDIN, "<&=" . fileno($child)) or die;
- }
- $pid;
- }
- if (pipe_to_fork('FOO')) {
- # parent
- print FOO "pipe_to_fork\n";
- close FOO;
- }
- else {
- # child
- while (<STDIN>) { print; }
- exit(0);
- }
And this one reads from the child:
- # simulate open(FOO, "-|")
- sub pipe_from_fork ($) {
- my $parent = shift;
- pipe $parent, my $child or die;
- my $pid = fork();
- die "fork() failed: $!" unless defined $pid;
- if ($pid) {
- close $child;
- }
- else {
- close $parent;
- open(STDOUT, ">&=" . fileno($child)) or die;
- }
- $pid;
- }
- if (pipe_from_fork('BAR')) {
- # parent
- while (<BAR>) { print; }
- close BAR;
- }
- else {
- # child
- print "pipe_from_fork\n";
- exit(0);
- }
Forking pipe open() constructs will be supported in future.
External subroutines (XSUBs) that maintain their own global state may not work correctly. Such XSUBs will either need to maintain locks to protect simultaneous access to global data from different pseudo-processes, or maintain all their state on the Perl symbol table, which is copied naturally when fork() is called. A callback mechanism that provides extensions an opportunity to clone their state will be provided in the near future.
The fork() emulation may not behave as expected when it is executed in an application which embeds a Perl interpreter and calls Perl APIs that can evaluate bits of Perl code. This stems from the fact that the emulation only has knowledge about the Perl interpreter's own data structures and knows nothing about the containing application's state. For example, any state carried on the application's own call stack is out of reach.
Since the fork() emulation runs code in multiple threads, extensions calling into non-thread-safe libraries may not work reliably when calling fork(). As Perl's threading support gradually becomes more widely adopted even on platforms with a native fork(), such extensions are expected to be fixed for thread-safety.
In portable Perl code, kill(9, $child) must not be used on forked processes.
Killing a forked process is unsafe and has unpredictable results.
See kill(), above.
Having pseudo-process IDs be negative integers breaks down for the integer -1, because the wait() and waitpid() functions treat this number as special. The tacit assumption in the current implementation is that the system never allocates a thread ID of 1 for user threads. A better representation for pseudo-process IDs will be implemented in the future.
In certain cases, the OS-level handles created by the pipe(), socket(), and accept() operators are apparently not duplicated accurately in pseudo-processes. This only happens in some situations, but where it does happen, it may result in deadlocks between the read and write ends of pipe handles, or inability to send or receive data across socket handles.
This document may be incomplete in some respects.
Support for concurrent interpreters and the fork() emulation was implemented by ActiveState, with funding from Microsoft Corporation.
This document is authored and maintained by Gurusamy Sarathy <gsar@activestate.com>.
perlform - Perl formats
Perl has a mechanism to help you generate simple reports and charts. To facilitate this, Perl helps you code up your output page close to how it will look when it's printed. It can keep track of things like how many lines are on a page, what page you're on, when to print page headers, etc. Keywords are borrowed from FORTRAN: format() to declare and write() to execute; see their entries in perlfunc. Fortunately, the layout is much more legible, more like BASIC's PRINT USING statement. Think of it as a poor man's nroff(1).
Formats, like packages and subroutines, are declared rather than executed, so they may occur at any point in your program. (Usually it's best to keep them all together though.) They have their own namespace apart from all the other "types" in Perl. This means that if you have a function named "Foo", it is not the same thing as having a format named "Foo". However, the default name for the format associated with a given filehandle is the same as the name of the filehandle. Thus, the default format for STDOUT is named "STDOUT", and the default format for filehandle TEMP is named "TEMP". They just look the same. They aren't.
Output record formats are declared as follows:
- format NAME =
- FORMLIST
- .
If the name is omitted, format "STDOUT" is defined. A single "." in column 1 is used to terminate a format. FORMLIST consists of a sequence of lines, each of which may be one of three types:
A comment, indicated by putting a '#' in the first column.
A "picture" line giving the format for one output line.
An argument line supplying values to plug into the previous picture line.
Picture lines contain output field definitions, intermingled with literal text. These lines do not undergo any kind of variable interpolation. Field definitions are made up from a set of characters, for starting and extending a field to its desired width. This is the complete set of characters for field definitions:
- @ start of regular field
- ^ start of special field
- < pad character for left justification
- | pad character for centering
- > pad character for right justification
- # pad character for a right-justified numeric field
- 0 instead of first #: pad number with leading zeroes
- . decimal point within a numeric field
- ... terminate a text field, show "..." as truncation evidence
- @* variable width field for a multi-line value
- ^* variable width field for next line of a multi-line value
- ~ suppress line with all fields empty
- ~~ repeat line until all fields are exhausted
Each field in a picture line starts with either "@" (at) or "^" (caret), indicating what we'll call, respectively, a "regular" or "special" field. The choice of pad characters determines whether a field is textual or numeric. The tilde operators are not part of a field. Let's look at the various possibilities in detail.
The length of the field is supplied by padding out the field with multiple "<", ">", or "|" characters to specify a non-numeric field with, respectively, left justification, right justification, or centering. For a regular field, the value (up to the first newline) is taken and printed according to the selected justification, truncating excess characters. If you terminate a text field with "...", three dots will be shown if the value is truncated. A special text field may be used to do rudimentary multi-line text block filling; see Using Fill Mode for details.
- Example:
- format STDOUT =
- @<<<<<< @|||||| @>>>>>>
- "left", "middle", "right"
- .
- Output:
- left middle right
Using "#" as a padding character specifies a numeric field, with right justification. An optional "." defines the position of the decimal point. With a "0" (zero) instead of the first "#", the formatted number will be padded with leading zeroes if necessary. A special numeric field is blanked out if the value is undefined. If the resulting value would exceed the width specified the field is filled with "#" as overflow evidence.
- Example:
- format STDOUT =
- @### @.### @##.### @### @### ^####
- 42, 3.1415, undef, 0, 10000, undef
- .
- Output:
- 42 3.142 0.000 0 ####
The field "@*" can be used for printing multi-line, nontruncated values; it should (but need not) appear by itself on a line. A final line feed is chomped off, but all other characters are emitted verbatim.
Like "@*", this is a variable-width field. The value supplied must be a scalar variable. Perl puts the first line (up to the first "\n") of the text into the field, and then chops off the front of the string so that the next time the variable is referenced, more of the text can be printed. The variable will not be restored.
- Example:
- $text = "line 1\nline 2\nline 3";
- format STDOUT =
- Text: ^*
- $text
- ~~ ^*
- $text
- .
- Output:
- Text: line 1
- line 2
- line 3
The values are specified on the following format line in the same order as
the picture fields. The expressions providing the values must be
separated by commas. They are all evaluated in a list context
before the line is processed, so a single list expression could produce
multiple list elements. The expressions may be spread out to more than
one line if enclosed in braces. If so, the opening brace must be the first
token on the first line. If an expression evaluates to a number with a
decimal part, and if the corresponding picture specifies that the decimal
part should appear in the output (that is, any picture except multiple "#"
characters without an embedded "."), the character used for the decimal
point is determined by the current LC_NUMERIC locale if use locale is in effect. This means that, if, for example, the run-time environment happens to specify a German locale, "," will be used instead of the default ".". See perllocale and WARNINGS for more information.
On text fields the caret enables a kind of fill mode. Instead of an
arbitrary expression, the value supplied must be a scalar variable
that contains a text string. Perl puts the next portion of the text into
the field, and then chops off the front of the string so that the next time
the variable is referenced, more of the text can be printed. (Yes, this
means that the variable itself is altered during execution of the write()
call, and is not restored.) The next portion of text is determined by
a crude line-breaking algorithm. You may use the carriage return character (\r) to force a line break. You can change which characters are legal to break on by changing the variable $: (that's $FORMAT_LINE_BREAK_CHARACTERS if you're using the English module) to a list of the desired characters.
Normally you would use a sequence of fields in a vertical stack associated with the same scalar variable to print out a block of text. You might wish to end the final field with the text "...", which will appear in the output if the text was too long to appear in its entirety.
Using caret fields can produce lines where all fields are blank. You can suppress such lines by putting a "~" (tilde) character anywhere in the line. The tilde will be translated to a space upon output.
If you put two contiguous tilde characters "~~" anywhere into a line, the line will be repeated until all the fields on the line are exhausted, i.e. undefined. For special (caret) text fields this will occur sooner or later, but if you use a text field of the at variety, the expression you supply had better not give the same value every time forever! (shift(@f) is a simple example that would work.) Don't use a regular (at) numeric field in such lines, because it will never go blank.
Top-of-form processing is by default handled by a format with the same name as the current filehandle with "_TOP" concatenated to it. It's triggered at the top of each page. See write.
Examples:
- # a report on the /etc/passwd file
- format STDOUT_TOP =
- Passwd File
- Name Login Office Uid Gid Home
- ------------------------------------------------------------------
- .
- format STDOUT =
- @<<<<<<<<<<<<<<<<<< @||||||| @<<<<<<@>>>> @>>>> @<<<<<<<<<<<<<<<<<
- $name, $login, $office,$uid,$gid, $home
- .
- # a report from a bug report form
- format STDOUT_TOP =
- Bug Reports
- @<<<<<<<<<<<<<<<<<<<<<<< @||| @>>>>>>>>>>>>>>>>>>>>>>>
- $system, $%, $date
- ------------------------------------------------------------------
- .
- format STDOUT =
- Subject: @<<<<<<<<<<<<<<<<<<<<<<<<<<<<<<<<<<<<<<<<<<<<<<<<<<<<<<<<
- $subject
- Index: @<<<<<<<<<<<<<<<<<<<<<<<<<<<< ^<<<<<<<<<<<<<<<<<<<<<<<<<<<<
- $index, $description
- Priority: @<<<<<<<<<< Date: @<<<<<<< ^<<<<<<<<<<<<<<<<<<<<<<<<<<<<
- $priority, $date, $description
- From: @<<<<<<<<<<<<<<<<<<<<<<<<<<<<< ^<<<<<<<<<<<<<<<<<<<<<<<<<<<<
- $from, $description
- Assigned to: @<<<<<<<<<<<<<<<<<<<<<< ^<<<<<<<<<<<<<<<<<<<<<<<<<<<<
- $programmer, $description
- ~ ^<<<<<<<<<<<<<<<<<<<<<<<<<<<<
- $description
- ~ ^<<<<<<<<<<<<<<<<<<<<<<<<<<<<
- $description
- ~ ^<<<<<<<<<<<<<<<<<<<<<<<<<<<<
- $description
- ~ ^<<<<<<<<<<<<<<<<<<<<<<<<<<<<
- $description
- ~ ^<<<<<<<<<<<<<<<<<<<<<<<...
- $description
- .
It is possible to intermix print()s with write()s on the same output channel, but you'll have to handle $- ($FORMAT_LINES_LEFT) yourself.
The current format name is stored in the variable $~ ($FORMAT_NAME), and the current top of form format name is in $^ ($FORMAT_TOP_NAME). The current output page number is stored in $% ($FORMAT_PAGE_NUMBER), and the number of lines on the page is in $= ($FORMAT_LINES_PER_PAGE). Whether to autoflush output on this handle is stored in $| ($OUTPUT_AUTOFLUSH). The string output before each top of page (except the first) is stored in $^L ($FORMAT_FORMFEED). These variables are set on a per-filehandle basis, so you'll need to select() into a different one to affect them:
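The classic idiom for this nests the select() calls so that the original default handle is restored afterward. A self-contained sketch (the filehandle OUTF and the output path are illustrative):

```perl
use strict;
use warnings;

# OUTF is opened here only so the example runs on its own.
open(OUTF, '>', '/tmp/outf-demo.txt') or die "open: $!";

# select() returns the name of the previously selected handle, so
# evaluating the assignments inside the list and re-selecting
# element [0] sets $~ and $^ for OUTF, then restores the default.
select((select(OUTF),
        $~ = "My_Other_Format",
        $^ = "My_Top_Format"
       )[0]);
```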
Pretty ugly, eh? It's a common idiom though, so don't be too surprised when you see it. You can at least use a temporary variable to hold the previous filehandle. This is a much better approach in general, because not only does legibility improve, you also get an intermediate stage in the expression to single-step the debugger through:
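The temporary-variable version, again with an illustrative OUTF:

```perl
use strict;
use warnings;

open(OUTF, '>', '/tmp/outf-demo.txt') or die "open: $!";

my $ofh = select(OUTF);   # remember the old default handle
$~ = "My_Other_Format";   # these are per-filehandle, so they
$^ = "My_Top_Format";     # now apply to OUTF
select($ofh);             # restore the old default handle
```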
If you use the English module, you can even read the variable names:
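With use English in effect, the same steps spell out the variable names (OUTF is again assumed):

```perl
use strict;
use warnings;
use English;

open(OUTF, '>', '/tmp/outf-demo.txt') or die "open: $!";

my $ofh = select(OUTF);
$FORMAT_NAME     = "My_Other_Format";   # i.e. $~
$FORMAT_TOP_NAME = "My_Top_Format";     # i.e. $^
select($ofh);
```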
But you still have those funny select()s. So just use the FileHandle module. Now, you can access these special variables using lowercase method names instead:
- use FileHandle;
- format_name OUTF "My_Other_Format";
- format_top_name OUTF "My_Top_Format";
Much better!
Because the values line may contain arbitrary expressions (for at fields, not caret fields), you can farm out more sophisticated processing to other functions, like sprintf() or one of your own. For example:
- format Ident =
- @<<<<<<<<<<<<<<<
- &commify($n)
- .
To get a real at or caret into the field, do this:
- format Ident =
- I have an @ here.
- "@"
- .
To center a whole line of text, do something like this:
- format Ident =
- @|||||||||||||||||||||||||||||||||||||||||||||||
- "Some text line"
- .
There is no builtin way to say "float this to the right hand side of the page, however wide it is." You have to specify where it goes. The truly desperate can generate their own format on the fly, based on the current number of columns, and then eval() it:
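A sketch of that technique ($cols and $entry are illustrative names): assemble the format source in a string and compile it with eval():

```perl
use strict;
use warnings;

our $entry = "some long text that should be wrapped to the chosen width";
my $cols = 60;

# Build the format source as one string, then compile it. The
# picture lines are generated to match the current column count.
my $format = "format STDOUT = \n"
           . '^' . '<' x $cols . "\n"
           . '$entry' . "\n"
           . "\t^" . '<' x ($cols - 8) . "~~\n"
           . '$entry' . "\n"
           . ".\n";
eval $format;
die $@ if $@;
write;
```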
Which would generate a format looking something like this:
- format STDOUT =
- ^<<<<<<<<<<<<<<<<<<<<<<<<<<<<<<<<<<<<<<<<<<<<<<<<<<<<<<<<<<
- $entry
- ^<<<<<<<<<<<<<<<<<<<<<<<<<<<<<<<<<<<<<<<<<<<<<<<<<<~~
- $entry
- .
Here's a little program that's somewhat like fmt(1):
- format =
- ^<<<<<<<<<<<<<<<<<<<<<<<<<<<<<<<<<<<<<<<<<<<<<< ~~
- $_
- .
- $/ = '';
- while (<>) {
- s/\s*\n\s*/ /g;
- write;
- }
While $FORMAT_TOP_NAME contains the name of the current header format, there is no corresponding mechanism to automatically do the same thing for a footer. Not knowing how big a format is going to be until you evaluate it is one of the major problems. It's on the TODO list.
Here's one strategy: If you have a fixed-size footer, you can get footers by checking $FORMAT_LINES_LEFT before each write() and print the footer yourself if necessary.
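A hypothetical sketch of that strategy (the REPORT formats, page length, and footer text are all invented for illustration): check $- before each write() and print the footer when only its line would remain on the page.

```perl
use strict;
use warnings;

our $rec;

format REPORT_TOP =
== page @<< ==
$%
.

format REPORT =
@<<<<<<<<<<<<<<<
$rec
.

open(REPORT, '>', '/tmp/report.txt') or die "open: $!";
my $old = select(REPORT);
$= = 5;                            # short pages, to exercise the footer
my $footer = "---- footer ----\n";

for my $r (map { "record $_" } 1 .. 8) {
    $rec = $r;
    if ($- > 0 && $- <= 1) {       # footer would no longer fit
        print $footer;             # print() does not touch $-
        $- = 0;                    # force top-of-form on the next write()
    }
    write;
}
select($old);
close REPORT;
```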
Here's another strategy: Open a pipe to yourself, using open(MYSELF, "|-")
(see open) and always write() to MYSELF instead of STDOUT.
Have your child process massage its STDIN to rearrange headers and footers
however you like. Not very convenient, but doable.
For low-level access to the formatting mechanism, you may use formline() and access $^A (the $ACCUMULATOR variable) directly. For example:
- $str = formline <<'END', 1,2,3;
- @<<< @||| @>>>
- END
- print "Wow, I just stored '$^A' in the accumulator!\n";
Or to make an swrite() subroutine, which is to write() what sprintf() is to printf(), do this:
- use Carp;
- sub swrite {
- croak "usage: swrite PICTURE ARGS" unless @_;
- my $format = shift;
- $^A = "";
- formline($format,@_);
- return $^A;
- }
- $string = swrite(<<'END', 1, 2, 3);
- Check me out
- @<<< @||| @>>>
- END
- print $string;
The lone dot that ends a format can also prematurely end a mail message passing through a misconfigured Internet mailer (and based on experience, such misconfiguration is the rule, not the exception). So when sending format code through mail, you should indent it so that the format-ending dot is not on the left margin; this will prevent SMTP cutoff.
Lexical variables (declared with "my") are not visible within a format unless the format is declared within the scope of the lexical variable.
If a program's environment specifies an LC_NUMERIC locale and use locale is in effect when the format is declared, the locale is used to specify the decimal point character in formatted output. Formatted output cannot be controlled by use locale at the time when write() is called. See perllocale for further discussion of locale handling.
Within strings that are to be displayed in a fixed-length text field, each control character is substituted by a space. (But remember the special meaning of \r when using fill mode.) This is done to avoid misalignment when control characters "disappear" on some output media.
perlfreebsd - Perl version 5 on FreeBSD systems
This document describes various features of FreeBSD that will affect how Perl version 5 (hereafter just Perl) is compiled and/or runs.
When perl is configured to use ithreads, it will use re-entrant library calls in preference to non-re-entrant versions. There is a bug in FreeBSD's readdir_r function in versions 4.5 and earlier that can cause a SEGV when reading large directories. A patch for FreeBSD libc is available (see http://www.freebsd.org/cgi/query-pr.cgi?pr=misc/30631 ) which has been integrated into FreeBSD 4.6.
perl sets $^X where possible to a full path by asking the operating system. On FreeBSD the full path of the perl interpreter is found by using sysctl with KERN_PROC_PATHNAME if that is supported, else by reading the symlink /proc/curproc/file. FreeBSD 7 and earlier have a bug where either approach sometimes returns an incorrect value (see http://www.freebsd.org/cgi/query-pr.cgi?pr=35703 ). In these cases perl will fall back to the old behaviour of using C's argv[0] value for $^X.
Nicholas Clark <nick@ccl4.org>, collating wisdom supplied by Slaven Rezic and Tim Bunce.
Please report any errors, updates, or suggestions to perlbug@perl.org.
perlfunc - Perl builtin functions
The functions in this section can serve as terms in an expression. They fall into two major categories: list operators and named unary operators. These differ in their precedence relationship with a following comma. (See the precedence table in perlop.) List operators take more than one argument, while unary operators can never take more than one argument. Thus, a comma terminates the argument of a unary operator, but merely separates the arguments of a list operator. A unary operator generally provides scalar context to its argument, while a list operator may provide either scalar or list contexts for its arguments. If it does both, scalar arguments come first and the list argument follows, and there can only ever be one such list argument. For instance, splice() has three scalar arguments followed by a list, whereas gethostbyname() has four scalar arguments.
In the syntax descriptions that follow, list operators that expect a list (and provide list context for elements of the list) are shown with LIST as an argument. Such a list may consist of any combination of scalar arguments or list values; the list values will be included in the list as if each individual element were interpolated at that point in the list, forming a longer single-dimensional list value. Commas should separate literal elements of the LIST.
Any function in the list below may be used either with or without parentheses around its arguments. (The syntax descriptions omit the parentheses.) If you use parentheses, the simple but occasionally surprising rule is this: It looks like a function, therefore it is a function, and precedence doesn't matter. Otherwise it's a list operator or unary operator, and precedence does matter. Whitespace between the function and left parenthesis doesn't count, so sometimes you need to be careful:
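The classic illustration (the comments show what each statement prints):

```perl
print 1+2+4;      # Prints 7.
print(1+2) + 4;   # Prints 3.
print (1+2)+4;    # Also prints 3!
print +(1+2)+4;   # Prints 7.
print ((1+2)+4);  # Prints 7.
```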
If you run Perl with the -w switch it can warn you about this. For example, the third line above produces:
- print (...) interpreted as function at - line 1.
- Useless use of integer addition in void context at - line 1.
A few functions take no arguments at all, and therefore work as neither unary nor list operators. These include such functions as time and endpwent. For example, time+86_400 always means time() + 86_400.
For functions that can be used in either a scalar or list context, nonabortive failure is generally indicated in scalar context by returning the undefined value, and in list context by returning the empty list.
Remember the following important rule: There is no rule that relates the behavior of an expression in list context to its behavior in scalar context, or vice versa. It might do two totally different things. Each operator and function decides which sort of value would be most appropriate to return in scalar context. Some operators return the length of the list that would have been returned in list context. Some operators return the first value in the list. Some operators return the last value in the list. Some operators return a count of successful operations. In general, they do what you want, unless you want consistency.
A named array in scalar context is quite different from what would at first glance appear to be a list in scalar context. You can't get a list like (1,2,3) into being in scalar context, because the compiler knows the context at compile time. It would generate the scalar comma operator there, not the list construction version of the comma. That means it was never a list to start with.
In general, functions in Perl that serve as wrappers for system calls ("syscalls") of the same name (like chown(2), fork(2), closedir(2), etc.) return true when they succeed and undef otherwise, as is usually mentioned in the descriptions below. This is different from the C interfaces, which return -1 on failure. Exceptions to this rule include wait, waitpid, and syscall. System calls also set the special $! variable on failure. Other functions do not, except accidentally.
Extension modules can also hook into the Perl parser to define new kinds of keyword-headed expression. These may look like functions, but may also look completely different. The syntax following the keyword is defined entirely by the extension. If you are an implementor, see PL_keyword_plugin in perlapi for the mechanism. If you are using such a module, see the module's documentation for details of the syntax that it defines.
Here are Perl's functions (including things that look like functions, like some keywords and named operators) arranged by category. Some functions appear in more than one place.
chomp, chop, chr, crypt, fc, hex, index, lc,
lcfirst, length, oct, ord, pack, q//, qq//, reverse,
rindex, sprintf, substr, tr///, uc, ucfirst, y///
fc is available only if the "fc" feature is enabled or if it is prefixed with CORE::. The "fc" feature is enabled automatically with a use v5.16 (or higher) declaration in the current scope.
abs, atan2, cos, exp, hex, int, log, oct, rand,
sin, sqrt, srand
binmode, close, closedir, dbmclose, dbmopen, die, eof,
fileno, flock, format, getc, print, printf, read,
readdir, readline, rewinddir, say, seek, seekdir, select,
syscall, sysread, sysseek, syswrite, tell, telldir,
truncate, warn, write
say is available only if the "say" feature is enabled or if it is prefixed with CORE::. The "say" feature is enabled automatically with a use v5.10 (or higher) declaration in the current scope.
pack, read, syscall, sysread, sysseek, syswrite, unpack,
vec
-X, chdir, chmod, chown, chroot, fcntl, glob,
ioctl, link, lstat, mkdir, open, opendir,
readlink, rename, rmdir, stat, symlink, sysopen,
umask, unlink, utime
break, caller, continue, die, do, dump, eval, evalbytes, exit,
__FILE__, goto, last, __LINE__, next, __PACKAGE__, redo, return,
sub, __SUB__, wantarray
break is available only if you enable the experimental "switch" feature or use the CORE:: prefix. The "switch" feature also enables the default, given, and when statements, which are documented in Switch Statements in perlsyn. The "switch" feature is enabled automatically with a use v5.10 (or higher) declaration in the current scope. In Perl v5.14 and earlier, continue required the "switch" feature, like the other keywords.
evalbytes is only available with the "evalbytes" feature (see feature) or if prefixed with CORE::. __SUB__ is only available with the "current_sub" feature or if prefixed with CORE::. Both the "evalbytes" and "current_sub" features are enabled automatically with a use v5.16 (or higher) declaration in the current scope.
caller, import, local, my, our, package, state, use
state is available only if the "state" feature is enabled or if it is prefixed with CORE::. The "state" feature is enabled automatically with a use v5.10 (or higher) declaration in the current scope.
alarm, exec, fork, getpgrp, getppid, getpriority, kill,
pipe, qx//, readpipe, setpgrp,
setpriority, sleep, system,
times, wait, waitpid
bless, dbmclose, dbmopen, package, ref, tie, tied,
untie, use
accept, bind, connect, getpeername, getsockname,
getsockopt, listen, recv, send, setsockopt, shutdown,
socket, socketpair
msgctl, msgget, msgrcv, msgsnd, semctl, semget, semop,
shmctl, shmget, shmread, shmwrite
endgrent, endhostent, endnetent, endpwent, getgrent,
getgrgid, getgrnam, getlogin, getpwent, getpwnam,
getpwuid, setgrent, setpwent
endprotoent, endservent, gethostbyaddr, gethostbyname,
gethostent, getnetbyaddr, getnetbyname, getnetent,
getprotobyname, getprotobynumber, getprotoent,
getservbyname, getservbyport, getservent, sethostent,
setnetent, setprotoent, setservent
and, AUTOLOAD, BEGIN, CHECK, cmp, CORE, __DATA__, default, DESTROY,
else, elseif, elsif, END, __END__, eq, for, foreach, ge, given, gt,
if, INIT, le, lt, ne, not, or, UNITCHECK, unless, until, when,
while, x, xor
Perl was born in Unix and can therefore access all common Unix system calls. In non-Unix environments, the functionality of some Unix system calls may not be available or details of the available functionality may differ slightly. The Perl functions affected by this are:
-X, binmode, chmod, chown, chroot, crypt,
dbmclose, dbmopen, dump, endgrent, endhostent,
endnetent, endprotoent, endpwent, endservent, exec,
fcntl, flock, fork, getgrent, getgrgid, gethostbyname,
gethostent, getlogin, getnetbyaddr, getnetbyname, getnetent,
getppid, getpgrp, getpriority, getprotobynumber,
getprotoent, getpwent, getpwnam, getpwuid,
getservbyport, getservent, getsockopt, glob, ioctl,
kill, link, lstat, msgctl, msgget, msgrcv,
msgsnd, open, pipe, readlink, rename, select, semctl,
semget, semop, setgrent, sethostent, setnetent,
setpgrp, setpriority, setprotoent, setpwent,
setservent, setsockopt, shmctl, shmget, shmread,
shmwrite, socket, socketpair,
stat, symlink, syscall, sysopen, system,
times, truncate, umask, unlink,
utime, wait, waitpid
For more information about the portability of these functions, see perlport and other available platform-specific documentation.
A file test, where X is one of the letters listed below. This unary operator takes one argument, either a filename, a filehandle, or a dirhandle, and tests the associated file to see if something is true about it. If the argument is omitted, tests $_, except for -t, which tests STDIN. Unless otherwise documented, it returns 1 for true and '' for false, or the undefined value if the file doesn't exist. Despite the funny names, precedence is the same as any other named unary operator. The operator may be any of:
- -r File is readable by effective uid/gid.
- -w File is writable by effective uid/gid.
- -x File is executable by effective uid/gid.
- -o File is owned by effective uid.
- -R File is readable by real uid/gid.
- -W File is writable by real uid/gid.
- -X File is executable by real uid/gid.
- -O File is owned by real uid.
- -e File exists.
- -z File has zero size (is empty).
- -s File has nonzero size (returns size in bytes).
- -f File is a plain file.
- -d File is a directory.
- -l File is a symbolic link.
- -p File is a named pipe (FIFO), or Filehandle is a pipe.
- -S File is a socket.
- -b File is a block special file.
- -c File is a character special file.
- -t Filehandle is opened to a tty.
- -u File has setuid bit set.
- -g File has setgid bit set.
- -k File has sticky bit set.
- -T File is an ASCII text file (heuristic guess).
- -B File is a "binary" file (opposite of -T).
- -M Script start time minus file modification time, in days.
- -A Same for access time.
- -C Same for inode change time (Unix, may differ for other platforms).
Example:
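A small illustration (the names tested are arbitrary; $0 is the running script itself):

```perl
use strict;
use warnings;

for my $name ($0, '/tmp') {
    if (-f $name) {
        print "$name is a plain file\n";
    }
    elsif (-d $name) {
        print "$name is a directory\n";
    }
}
```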
Note that -s/a/b/ does not do a negated substitution. Saying -exp($foo) still works as expected, however: only single letters following a minus are interpreted as file tests.
These operators are exempt from the "looks like a function rule" described above. That is, an opening parenthesis after the operator does not affect how much of the following code constitutes the argument. Put the opening parentheses before the operator to separate it from code that follows (this applies only to operators with higher precedence than unary operators, of course):
- -s($file) + 1024 # probably wrong; same as -s($file + 1024)
- (-s $file) + 1024 # correct
The interpretation of the file permission operators -r, -R, -w, -W, -x, and -X is by default based solely on the mode of the file and the uids and gids of the user. There may be other reasons you can't actually read, write, or execute the file: for example network filesystem access controls, ACLs (access control lists), read-only filesystems, and unrecognized executable formats. Note that the use of these six specific operators to verify if some operation is possible is usually a mistake, because it may be open to race conditions.
Also note that, for the superuser on the local filesystems, the -r, -R, -w, and -W tests always return 1, and -x and -X return 1 if any execute bit is set in the mode. Scripts run by the superuser may thus need to do a stat() to determine the actual mode of the file, or temporarily set their effective uid to something else.
If you are using ACLs, there is a pragma called filetest that may produce more accurate results than the bare stat() mode bits. When under use filetest 'access' the above-mentioned filetests test whether the permission can(not) be granted using the access(2) family of system calls. Also note that -x and -X may under this pragma return true even if there are no execute permission bits set (nor any extra execute permission ACLs). This strangeness is due to the underlying system calls' definitions. Note also that, due to the implementation of use filetest 'access', the _ special filehandle won't cache the results of the file tests when this pragma is in effect. Read the documentation for the filetest pragma for more information.
The -T and -B switches work as follows. The first block or so of the file is examined for odd characters such as strange control codes or characters with the high bit set. If too many strange characters (>30%) are found, it's a -B file; otherwise it's a -T file. Also, any file containing a zero byte in the first block is considered a binary file. If -T or -B is used on a filehandle, the current IO buffer is examined rather than the first block. Both -T and -B return true on an empty file, or a file at EOF when testing a filehandle. Because you have to read a file to do the -T test, on most occasions you want to use a -f against the file first, as in next unless -f $file && -T $file.
If any of the file tests (or either the stat or lstat operator) is given the special filehandle consisting of a solitary underline, then the stat structure of the previous file test (or stat operator) is used, saving a system call. (This doesn't work with -t, and you need to remember that lstat() and -l leave values in the stat structure for the symbolic link, not the real file.) (Also, if the stat buffer was filled by an lstat call, -T and -B will reset it with the results of stat _).
Example:
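For instance, reusing the stat structure through the _ filehandle so only one stat(2) call is made:

```perl
use strict;
use warnings;

my $file = $0;                 # test the running script itself
stat($file);                   # one syscall fills the stat buffer
print "Readable\n"   if -r _;  # each of these reuses the buffer
print "Writable\n"   if -w _;
print "Executable\n" if -x _;
```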
As of Perl 5.10.0, as a form of purely syntactic sugar, you can stack file test operators, in a way that -f -w -x $file is equivalent to -x $file && -w _ && -f _. (This is only syntactic sugar: if you use the return value of -f $file as an argument to another filetest operator, no special magic will happen.)
Portability issues: -X in perlport.
To avoid confusing would-be users of your code with mysterious syntax errors, put something like this at the top of your script:
- use 5.010; # so filetest ops can stack
Returns the absolute value of its argument. If VALUE is omitted, uses $_.
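For example:

```perl
print abs(-17.4), "\n";   # 17.4
$_ = -42;
print abs, "\n";          # 42 (no argument, so $_ is used)
```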
Accepts an incoming socket connect, just as accept(2) does. Returns the packed address if it succeeded, false otherwise. See the example in Sockets: Client/Server Communication in perlipc.
On systems that support a close-on-exec flag on files, the flag will be set for the newly opened file descriptor, as determined by the value of $^F. See $^F in perlvar.
Arranges to have a SIGALRM delivered to this process after the specified number of wallclock seconds has elapsed. If SECONDS is not specified, the value stored in $_ is used. (On some machines, unfortunately, the elapsed time may be up to one second less or more than you specified because of how seconds are counted, and process scheduling may delay the delivery of the signal even further.)
Only one timer may be counting at once. Each call disables the previous timer, and an argument of 0 may be supplied to cancel the previous timer without starting a new one. The returned value is the amount of time remaining on the previous timer.
For delays of finer granularity than one second, the Time::HiRes module
(from CPAN, and starting from Perl 5.8 part of the standard
distribution) provides ualarm(). You may also use Perl's four-argument
version of select() leaving the first three arguments undefined, or you
might be able to use the syscall interface to access setitimer(2) if
your system supports it. See perlfaq8 for details.
It is usually a mistake to intermix alarm and sleep calls, because sleep may be internally implemented on your system with alarm. If you want to use alarm to time out a system call you need to use an eval/die pair. You can't rely on the alarm causing the system call to fail with $! set to EINTR because Perl sets up signal handlers to restart system calls on some systems. Using eval/die always works, modulo the caveats given in Signals in perlipc.
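The eval/die pattern looks like this; a sketch assuming a POSIX-ish system, with a read on a never-ready pipe standing in for the blocking system call:

```perl
use strict;
use warnings;

my $timeout = 1;
my $timed_out = 0;
pipe(my $rd, my $wr) or die "pipe: $!";   # a read that blocks forever

eval {
    local $SIG{ALRM} = sub { die "alarm\n" };   # NB: the \n matters
    alarm $timeout;
    my $line = <$rd>;    # the blocking call being timed out
    alarm 0;             # cancel the timer if the call returned
};
if ($@) {
    die $@ unless $@ eq "alarm\n";   # propagate unexpected errors
    $timed_out = 1;                  # timed out
}
print $timed_out ? "timed out\n" : "completed\n";
```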
For more information see perlipc.
Portability issues: alarm in perlport.
Returns the arctangent of Y/X in the range -PI to PI.
For the tangent operation, you may use the Math::Trig::tan function, or use the familiar relation:
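That relation, written as a subroutine:

```perl
# tan(x) = sin(x) / cos(x)
sub tan { sin($_[0]) / cos($_[0]) }

print tan(0.5), "\n";
```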
The return value for atan2(0,0) is implementation-defined; consult
your atan2(3) manpage for more information.
Portability issues: atan2 in perlport.
Binds a network address to a socket, just as bind(2) does. Returns true if it succeeded, false otherwise. NAME should be a packed address of the appropriate type for the socket. See the examples in Sockets: Client/Server Communication in perlipc.
Arranges for FILEHANDLE to be read or written in "binary" or "text" mode on systems where the run-time libraries distinguish between binary and text files. If FILEHANDLE is an expression, the value is taken as the name of the filehandle. Returns true on success, otherwise it returns undef and sets $! (errno).
On some systems (in general, DOS- and Windows-based systems) binmode() is necessary when you're not working with a text file. For the sake of portability it is a good idea always to use it when appropriate, and never to use it when it isn't appropriate. Also, people can set their I/O to be by default UTF8-encoded Unicode, not bytes.
In other words: regardless of platform, use binmode() on binary data, like images, for example.
If LAYER is present it is a single string, but may contain multiple directives. The directives alter the behaviour of the filehandle. When LAYER is present, using binmode on a text file makes sense.
If LAYER is omitted or specified as :raw the filehandle is made suitable for passing binary data. This includes turning off possible CRLF translation and marking it as bytes (as opposed to Unicode characters). Note that, despite what may be implied in "Programming Perl" (the Camel, 3rd edition) or elsewhere, :raw is not simply the inverse of :crlf. Other layers that would affect the binary nature of the stream are also disabled. See PerlIO, perlrun, and the discussion about the PERLIO environment variable.
The :bytes, :crlf, :utf8, and any other directives of the form :..., are called I/O layers. The open pragma can be used to establish default I/O layers. See open.
The LAYER parameter of the binmode() function is described as "DISCIPLINE" in "Programming Perl, 3rd Edition". However, since the publishing of this book, by many known as "Camel III", the consensus of the naming of this functionality has moved from "discipline" to "layer". All documentation of this version of Perl therefore refers to "layers" rather than to "disciplines". Now back to the regularly scheduled documentation...
To mark FILEHANDLE as UTF-8, use :utf8 or :encoding(UTF-8). :utf8 just marks the data as UTF-8 without further checking, while :encoding(UTF-8) checks the data for actually being valid UTF-8. More details can be found in PerlIO::encoding.
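A minimal round-trip sketch (the path is invented): write UTF-8 through :encoding(UTF-8), read it back, and get characters rather than bytes:

```perl
use strict;
use warnings;

my $path = '/tmp/utf8-demo.txt';

open(my $out, '>', $path) or die "open: $!";
binmode($out, ':encoding(UTF-8)');
print $out "caf\x{e9}\n";              # e9 = LATIN SMALL LETTER E WITH ACUTE
close $out;

open(my $in, '<', $path) or die "open: $!";
binmode($in, ':encoding(UTF-8)');
my $line = <$in>;
close $in;

print length($line), "\n";             # 5: four characters plus "\n"
```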
In general, binmode() should be called after open() but before any I/O is done on the filehandle. Calling binmode() normally flushes any pending buffered output data (and perhaps pending input data) on the handle. An exception to this is the :encoding layer that changes the default character encoding of the handle; see open. The :encoding layer sometimes needs to be called in mid-stream, and it doesn't flush the stream. The :encoding layer also implicitly pushes on top of itself the :utf8 layer because internally Perl operates on UTF8-encoded Unicode characters.
The operating system, device drivers, C libraries, and Perl run-time system all conspire to let the programmer treat a single character (\n) as the line terminator, irrespective of external representation. On many operating systems, the native text file representation matches the internal representation, but on some platforms the external representation of \n is made up of more than one character.
All variants of Unix, Mac OS (old and new), and Stream_LF files on VMS use a single character to end each line in the external representation of text (even though that single character is CARRIAGE RETURN on old, pre-Darwin flavors of Mac OS, and is LINE FEED on Unix and most VMS files). In other systems like OS/2, DOS, and the various flavors of MS-Windows, your program sees a \n as a simple \cJ, but what's stored in text files are the two characters \cM\cJ. That means that if you don't use binmode() on these systems, \cM\cJ sequences on disk will be converted to \n on input, and any \n in your program will be converted back to \cM\cJ on output. This is what you want for text files, but it can be disastrous for binary files.
Another consequence of using binmode() (on some systems) is that special end-of-file markers will be seen as part of the data stream. For systems from the Microsoft family this means that, if your binary data contain \cZ, the I/O subsystem will regard it as the end of the file, unless you use binmode().
binmode() is important not only for readline() and print() operations,
but also when using read(), seek(), sysread(), syswrite() and tell()
(see perlport for more details). See the $/ and $\ variables in perlvar for how to manually set your input and output line-termination sequences.
Portability issues: binmode in perlport.
This function tells the thingy referenced by REF that it is now an object
in the CLASSNAME package. If CLASSNAME is omitted, the current package
is used. Because a bless is often the last thing in a constructor,
it returns the reference for convenience. Always use the two-argument
version if a derived class might inherit the function doing the blessing.
See perlobj for more about the blessing (and blessings) of objects.
Consider always blessing objects in CLASSNAMEs that are mixed case. Namespaces with all lowercase names are considered reserved for Perl pragmata. Builtin types have all uppercase names. To prevent confusion, you may wish to avoid such package names as well. Make sure that CLASSNAME is a true value.
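A conventional two-argument constructor, sketched here with a hypothetical My::Widget class, shows why the two-argument form matters for inheritance:

```perl
package My::Widget;

# Two-argument bless: a subclass calling My::Widget->new gets an
# object blessed into the subclass, not hard-coded into My::Widget.
sub new {
    my ($class, %args) = @_;
    my $self = { name => $args{name} // 'anonymous' };
    return bless $self, $class;   # bless returns the reference
}

package My::Gadget;
our @ISA = ('My::Widget');        # inherits new() unchanged

package main;
my $w = My::Widget->new(name => 'gear');
my $g = My::Gadget->new;
print ref($w), "\n";              # prints "My::Widget"
print ref($g), "\n";              # prints "My::Gadget"
```

Had new() said bless $self, 'My::Widget', the My::Gadget object would have been blessed into the wrong package.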
Break out of a given() block.
This keyword is enabled by the "switch" feature; see feature for more information on "switch". You can also access it by prefixing it with CORE::. Alternatively, include a use v5.10 or later in the current scope.
Returns the context of the current subroutine call. In scalar context,
returns the caller's package name if there is a caller (that is, if
we're in a subroutine or eval or require) and the undefined value
otherwise. In list context, returns
- # 0 1 2
- ($package, $filename, $line) = caller;
With EXPR, it returns some extra information that the debugger uses to print a stack trace. The value of EXPR indicates how many call frames to go back before the current one.
- # 0 1 2 3 4
- ($package, $filename, $line, $subroutine, $hasargs,
- # 5 6 7 8 9 10
- $wantarray, $evaltext, $is_require, $hints, $bitmask, $hinthash)
- = caller($i);
Here $subroutine may be (eval) if the frame is not a subroutine call, but an eval. In such a case additional elements $evaltext and $is_require are set: $is_require is true if the frame is created by a require or use statement, and $evaltext contains the text of the eval EXPR statement. In particular, for an eval BLOCK statement, $subroutine is (eval), but $evaltext is undefined. (Note also that each use statement creates a require frame inside an eval EXPR frame.) $subroutine may also be (unknown) if this particular subroutine happens to have been deleted from the symbol table.
$hasargs is true if a new instance of @_ was set up for the frame. $hints and $bitmask contain pragmatic hints that the caller was compiled with. $hints corresponds to $^H, and $bitmask corresponds to ${^WARNING_BITS}. The $hints and $bitmask values are subject to change between versions of Perl, and are not meant for external use.
$hinthash is a reference to a hash containing the value of %^H when the caller was compiled, or undef if %^H was empty. Do not modify the values of this hash, as they are the actual values stored in the optree.
Furthermore, when called from within the DB package in list context, and with an argument, caller returns more detailed information: it sets the list variable @DB::args to be the arguments with which the subroutine was invoked.
Be aware that the optimizer might have optimized call frames away before caller had a chance to get the information. That means that caller(N) might not return information about the call frame you expect it to, for N > 1. In particular, @DB::args might have information from the previous time caller was called.
Be aware that setting @DB::args is best effort, intended for debugging or generating backtraces, and should not be relied upon. In particular, as @_ contains aliases to the caller's arguments, Perl does not take a copy of @_, so @DB::args will contain modifications the subroutine makes to @_ or its contents, not the original values at call time. @DB::args, like @_, does not hold explicit references to its elements, so under certain cases its elements may have become freed and reallocated for other variables or temporary values. Finally, a side effect of the current implementation is that the effects of shift @_ can normally be undone (but not pop @_ or other splicing, and not if a reference to @_ has been taken, and subject to the caveat about reallocated elements), so @DB::args is actually a hybrid of the current state and initial state of @_. Buyer beware.
Changes the working directory to EXPR, if possible. If EXPR is omitted,
changes to the directory specified by $ENV{HOME}, if set; if not, changes to the directory specified by $ENV{LOGDIR}. (Under VMS, the variable $ENV{SYS$LOGIN} is also checked, and used if it is set.) If neither is set, chdir does nothing. It returns true on success, false otherwise. See the example under die.
On systems that support fchdir(2), you may pass a filehandle or directory handle as the argument. On systems that don't support fchdir(2), passing handles raises an exception.
Changes the permissions of a list of files. The first element of the
list must be the numeric mode, which should probably be an octal
number, and which definitely should not be a string of octal digits: 0644 is okay, but "0644" is not. Returns the number of files successfully changed. See also oct if all you have is a string.
On systems that support fchmod(2), you may pass filehandles among the files. On systems that don't support fchmod(2), passing filehandles raises an exception. Filehandles must be passed as globs or glob references to be recognized; barewords are considered filenames.
You can also import the symbolic S_I* constants from the Fcntl module:
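A sketch of the symbolic spelling of mode 0755; the temporary file here stands in for whatever real files you would pass:

```perl
use Fcntl qw(:mode);              # imports the S_I* mode constants
use File::Temp qw(tempfile);

my ($fh, $filename) = tempfile(); # stand-in for your real file list

# 0755 spelled symbolically: rwx for owner, r-x for group and others
my $mode  = S_IRWXU | S_IRGRP | S_IXGRP | S_IROTH | S_IXOTH;
my $count = chmod $mode, $filename;
print "changed $count file(s)\n"; # prints "changed 1 file(s)"
```

The symbolic form documents intent; the resulting value is identical to the octal literal 0755.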
Portability issues: chmod in perlport.
This safer version of chop removes any trailing string that corresponds to the current value of $/ (also known as $INPUT_RECORD_SEPARATOR in the English module). It returns the total number of characters removed from all its arguments. It's often used to remove the newline from the end of an input record when you're worried that the final record may be missing its newline. When in paragraph mode ($/ = ""), it removes all trailing newlines from the string. When in slurp mode ($/ = undef) or fixed-length record mode ($/ is a reference to an integer or the like; see perlvar) chomp() won't remove anything.
If VARIABLE is omitted, it chomps $_. Example:
If VARIABLE is a hash, it chomps the hash's values, but not its keys.
You can actually chomp anything that's an lvalue, including an assignment:
If you chomp a list, each element is chomped, and the total number of characters removed is returned.
Note that parentheses are necessary when you're chomping anything that is not a simple variable. This is because chomp $cwd = `pwd`; is interpreted as (chomp $cwd) = `pwd`;, rather than as chomp( $cwd = `pwd` ) which you might expect. Similarly, chomp $a, $b is interpreted as chomp($a), $b rather than as chomp($a, $b).
Chops off the last character of a string and returns the character
chopped. It is much more efficient than s/.$//s because it neither
scans nor copies the string. If VARIABLE is omitted, chops $_.
If VARIABLE is a hash, it chops the hash's values, but not its keys.
You can actually chop anything that's an lvalue, including an assignment.
If you chop a list, each element is chopped. Only the value of the
last chop is returned.
Note that chop returns the last character. To return all but the last character, use substr($string, 0, -1).
See also chomp.
Changes the owner (and group) of a list of files. The first two elements of the list must be the numeric uid and gid, in that order. A value of -1 in either position is interpreted by most systems to leave that value unchanged. Returns the number of files successfully changed.
On systems that support fchown(2), you may pass filehandles among the files. On systems that don't support fchown(2), passing filehandles raises an exception. Filehandles must be passed as globs or glob references to be recognized; barewords are considered filenames.
Here's an example that looks up nonnumeric uids in the passwd file:
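The original code block was lost in extraction; here is a hedged reconstruction of the idea, with a hypothetical user name and glob pattern:

```perl
# Translate a user name into numeric ids with getpwnam, then chown.
# 'nobody' and the '*.bak' pattern are made-up examples.
my $user = 'nobody';
my ($login, $pass, $uid, $gid) = getpwnam($user)
    or die "$user not in passwd file";

my @files = glob('*.bak');
chown $uid, $gid, @files;
```

getpwnam returns the passwd entry as a list; elements 2 and 3 are the numeric uid and gid that chown requires.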
On most systems, you are not allowed to change the ownership of the file unless you're the superuser, although you should be able to change the group to any of your secondary groups. On insecure systems, these restrictions may be relaxed, but this is not a portable assumption. On POSIX systems, you can detect this condition this way:
- use POSIX qw(sysconf _PC_CHOWN_RESTRICTED);
- $can_chown_giveaway = not sysconf(_PC_CHOWN_RESTRICTED);
Portability issues: chown in perlport.
Returns the character represented by that NUMBER in the character set.
For example, chr(65) is "A" in either ASCII or Unicode, and chr(0x263a) is a Unicode smiley face.
Negative values give the Unicode replacement character (chr(0xfffd)), except under the bytes pragma, where the low eight bits of the value (truncated to an integer) are used.
If NUMBER is omitted, uses $_.
For the reverse, use ord.
Note that characters from 128 to 255 (inclusive) are by default internally not encoded as UTF-8 for backward compatibility reasons.
See perlunicode for more about Unicode.
This function works like the system call by the same name: it makes the
named directory the new root directory for all further pathnames that
begin with a / by your process and all its children. (It doesn't
change your current working directory, which is unaffected.) For security
reasons, this call is restricted to the superuser. If FILENAME is
omitted, does a chroot to $_.
Portability issues: chroot in perlport.
Closes the file or pipe associated with the filehandle, flushes the IO buffers, and closes the system file descriptor. Returns true if those operations succeed and if no error was reported by any PerlIO layer. Closes the currently selected filehandle if the argument is omitted.
You don't have to close FILEHANDLE if you are immediately going to do another open on it, because open closes it for you. (See open.) However, an explicit close on an input file resets the line counter ($.), while the implicit close done by open does not.
If the filehandle came from a piped open, close returns false if one of the other syscalls involved fails or if its program exits with non-zero status. If the only problem was that the program exited non-zero, $! will be set to 0. Closing a pipe also waits for the process executing on the pipe to exit--in case you wish to look at the output of the pipe afterwards--and implicitly puts the exit status value of that command into $? and ${^CHILD_ERROR_NATIVE}.
If there are multiple threads running, close on a filehandle from a
piped open returns true without waiting for the child process to terminate,
if the filehandle is still open in another thread.
Closing the read end of a pipe before the process writing to it at the other end is done writing results in the writer receiving a SIGPIPE. If the other end can't handle that, be sure to read all the data before closing the pipe.
Example:
FILEHANDLE may be an expression whose value can be used as an indirect filehandle, usually the real filehandle name or an autovivified handle.
Closes a directory opened by opendir and returns the success of that
system call.
Attempts to connect to a remote socket, just like connect(2). Returns true if it succeeded, false otherwise. NAME should be a packed address of the appropriate type for the socket. See the examples in Sockets: Client/Server Communication in perlipc.
When followed by a BLOCK, continue is actually a flow control statement rather than a function. If there is a continue BLOCK attached to a BLOCK (typically in a while or foreach), it is always executed just before the conditional is about to be evaluated again, just like the third part of a for loop in C. Thus it can be used to increment a loop variable, even when the loop has been continued via the next statement (which is similar to the C continue statement).
last, next, or redo may appear within a continue
block; last and redo behave as if they had been executed within
the main block. So will next, but since it will execute a continue
block, it may be more entertaining.
- while (EXPR) {
- ### redo always comes here
- do_something;
- } continue {
- ### next always comes here
- do_something_else;
- # then back to the top to re-check EXPR
- }
- ### last always comes here
Omitting the continue section is equivalent to using an
empty one, logically enough, so next goes directly back
to check the condition at the top of the loop.
When there is no BLOCK, continue is a function that falls through the current when or default block instead of iterating a dynamically enclosing foreach or exiting a lexically enclosing given.
In Perl 5.14 and earlier, this form of continue was only available when the "switch" feature was enabled. See feature and Switch Statements in perlsyn for more information.
Returns the cosine of EXPR (expressed in radians). If EXPR is omitted,
takes the cosine of $_.
For the inverse cosine operation, you may use the Math::Trig::acos()
function, or use this relation:
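The relation in question, sketched as a small helper (acos here is a locally defined subroutine, not a builtin):

```perl
use POSIX ();   # only for comparison in tests; the relation itself is pure Perl

# Inverse cosine via atan2, valid for -1 <= $x <= 1:
#   acos(x) = atan2( sqrt(1 - x**2), x )
sub acos { atan2( sqrt(1 - $_[0] * $_[0]), $_[0] ) }

printf "%.4f\n", acos(0.5);   # pi/3, prints "1.0472"
```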
Creates a digest string exactly like the crypt(3) function in the C library (assuming that you actually have a version there that has not been extirpated as a potential munition).
crypt() is a one-way hash function. The PLAINTEXT and SALT are turned into a short string, called a digest, which is returned. The same PLAINTEXT and SALT will always return the same string, but there is no (known) way to get the original PLAINTEXT from the hash. Small changes in the PLAINTEXT or SALT will result in large changes in the digest.
There is no decrypt function. This function isn't all that useful for cryptography (for that, look for Crypt modules on your nearby CPAN mirror) and the name "crypt" is a bit of a misnomer. Instead it is primarily used to check if two pieces of text are the same without having to transmit or store the text itself. An example is checking if a correct password is given. The digest of the password is stored, not the password itself. The user types in a password that is crypt()'d with the same salt as the stored digest. If the two digests match, the password is correct.
When verifying an existing digest string you should use the digest as the salt (like crypt($plain, $digest) eq $digest). The SALT used to create the digest is visible as part of the digest. This ensures crypt() will hash the new string with the same salt as the digest. This allows your code to work with the standard crypt and with more exotic implementations. In other words, assume nothing about the returned string itself nor about how many bytes of SALT may matter.
Traditionally the result is a string of 13 bytes: the first two bytes of the salt, followed by 11 bytes from the set [./0-9A-Za-z], and only the first eight bytes of PLAINTEXT mattered. But alternative hashing schemes (like MD5), higher level security schemes (like C2), and implementations on non-Unix platforms may produce different strings.
When choosing a new salt create a random two-character string whose characters come from the set [./0-9A-Za-z] (like join '', ('.', '/', 0..9, 'A'..'Z', 'a'..'z')[rand 64, rand 64]). This set of characters is just a recommendation; the characters allowed in the salt depend solely on your system's crypt library, and Perl can't restrict what salts crypt() accepts.
Here's an example that makes sure that whoever runs this program knows their password:
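The announced example did not survive extraction; a hedged sketch of the same check follows. The stored digest is fabricated on the spot here; real code would fetch it from your user database (the classic version reads it from getpwuid):

```perl
# Created once, e.g. at registration time; 'ab' is an arbitrary salt.
my $stored = crypt('secret', 'ab');

print "Password: ";
chomp(my $word = <STDIN>);

# Use the digest itself as the salt when verifying.
if (crypt($word, $stored) ne $stored) {
    die "Sorry...\n";
} else {
    print "ok\n";
}
```

Of course, typing in your own password to whoever asks you for it is unwise.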
Of course, typing in your own password to whoever asks you for it is unwise.
The crypt function is unsuitable for hashing large quantities of data, not least of all because you can't get the information back. Look at the Digest module for more robust algorithms.
If using crypt() on a Unicode string (which potentially has characters with codepoints above 255), Perl tries to make sense of the situation by trying to downgrade (a copy of) the string back to an eight-bit byte string before calling crypt() (on that copy). If that works, good. If not, crypt() dies with Wide character in crypt.
Portability issues: crypt in perlport.
[This function has been largely superseded by the untie function.]
Breaks the binding between a DBM file and a hash.
Portability issues: dbmclose in perlport.
[This function has been largely superseded by the tie function.]
This binds a dbm(3), ndbm(3), sdbm(3), gdbm(3), or Berkeley DB file to a
hash. HASH is the name of the hash. (Unlike normal open, the first
argument is not a filehandle, even though it looks like one). DBNAME
is the name of the database (without the .dir or .pag extension if
any). If the database does not exist, it is created with protection
specified by MASK (as modified by the umask). To prevent creation of
the database if it doesn't exist, you may specify a MODE
of 0, and the function will return a false value if it
can't find an existing database. If your system supports
only the older DBM functions, you may make only one dbmopen call in your
program. In older versions of Perl, if your system had neither DBM nor
ndbm, calling dbmopen produced a fatal error; it now falls back to
sdbm(3).
If you don't have write access to the DBM file, you can only read hash
variables, not set them. If you want to test whether you can write,
either use file tests or try setting a dummy hash entry inside an eval
to trap the error.
Note that functions such as keys and values may return huge lists
when used on large DBM files. You may prefer to use the each
function to iterate over large DBM files. Example:
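The example code was stripped in extraction; here is a sketch in its spirit, using a throwaway database rather than a real history file:

```perl
use File::Temp qw(tempdir);

# A throwaway database for illustration; real code would name a
# persistent path such as '/usr/lib/news/history'.
my $dir = tempdir(CLEANUP => 1);

my %HIST;
dbmopen(%HIST, "$dir/history", 0666) or die "can't open DBM file: $!";
$HIST{'comp.lang.perl'} = 42;

# each() fetches one key/value pair at a time, so even a huge DBM
# file is never slurped into memory the way keys() would do.
while (my ($key, $val) = each %HIST) {
    print "$key = $val\n";
}
dbmclose(%HIST);
```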
See also AnyDBM_File for a more general description of the pros and cons of the various dbm approaches, as well as DB_File for a particularly rich implementation.
You can control which DBM library you use by loading that library before you call dbmopen():
Portability issues: dbmopen in perlport.
Returns a Boolean value telling whether EXPR has a value other than
the undefined value undef. If EXPR is not present, $_ is checked.
Many operations return undef to indicate failure, end of file, system error, uninitialized variable, and other exceptional conditions. This function allows you to distinguish undef from other values. (A simple Boolean test will not distinguish among undef, zero, the empty string, and "0", which are all equally false.) Note that since undef is a valid scalar, its presence doesn't necessarily indicate an exceptional condition: pop returns undef when its argument is an empty array, or when the element to return happens to be undef.
You may also use defined(&func) to check whether subroutine &func has ever been defined. The return value is unaffected by any forward declarations of &func. A subroutine that is not defined may still be callable: its package may have an AUTOLOAD method that makes it spring into existence the first time that it is called; see perlsub.
Use of defined on aggregates (hashes and arrays) is deprecated. It
used to report whether memory for that aggregate had ever been
allocated. This behavior may disappear in future versions of Perl.
You should instead use a simple test for size:
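For example, with made-up variables, an aggregate in boolean context is true exactly when it has elements:

```perl
my @an_array = (1, 2, 3);
my %a_hash   = (key => 'value');

# An array or hash in boolean context is true if it is non-empty;
# no defined() is needed.
if (@an_array) { print "has array elements\n" }
if (%a_hash)   { print "has hash members\n"   }
```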
When used on a hash element, it tells you whether the value is defined, not whether the key exists in the hash. Use exists for the latter purpose.
Examples:
Note: Many folks tend to overuse defined and are then surprised to discover that the number 0 and "" (the zero-length string) are, in fact, defined values. For example, if you say
- "ab" =~ /a(.*)b/;
The pattern match succeeds and $1 is defined, although it matched "nothing". It didn't really fail to match anything. Rather, it matched something that happened to be zero characters long. This is all very above-board and honest. When a function returns an undefined value, it's an admission that it couldn't give you an honest answer. So you should use defined only when questioning the integrity of what you're trying to do. At other times, a simple comparison to 0 or "" is what you want.
Given an expression that specifies an element or slice of a hash, delete
deletes the specified elements from that hash so that exists() on that element
no longer returns true. Setting a hash element to the undefined value does
not remove its key, but deleting it does; see exists.
In list context, returns the value or values deleted, or the last such element in scalar context. The return list's length always matches that of the argument list: deleting non-existent elements returns the undefined value in their corresponding positions.
delete() may also be used on arrays and array slices, but its behavior is less straightforward. Although exists() will return false for deleted entries, deleting array elements never changes indices of existing values; use shift() or splice() for that. However, if all deleted elements fall at the end of an array, the array's size shrinks to the position of the highest element that still tests true for exists(), or to 0 if none do.
WARNING: Calling delete on array values is deprecated and likely to be removed in a future version of Perl.
Deleting from %ENV modifies the environment. Deleting from a hash tied to a DBM file deletes the entry from the DBM file. Deleting from a tied hash or array may not necessarily return anything; it depends on the implementation of the tied package's DELETE method, which may do whatever it pleases.
The delete local EXPR construct localizes the deletion to the current block at run time. Until the block exits, elements locally deleted temporarily no longer exist. See Localized deletion of elements of composite types in perlsub.
The following (inefficiently) deletes all the values of %HASH and @ARRAY:
And so do these:
But both are slower than assigning the empty list or undefining %HASH or @ARRAY, which is the customary way to empty out an aggregate:
The EXPR can be arbitrarily complicated provided its final operation is an element or slice of an aggregate:
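For instance, with a made-up nested structure, only the final operation of the EXPR has to be an element or slice lookup:

```perl
my $ref = { users => [ { name => 'ada', tmp => 1, junk => 2 } ] };

# Final operation is a hash-element lookup, so delete works:
delete $ref->{users}[0]{tmp};

# A hash slice at the end works too:
delete @{ $ref->{users}[0] }{ 'junk', 'name' };
```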
die raises an exception. Inside an eval the error message is stuffed into $@ and the eval is terminated with the undefined value. If the exception is outside of all enclosing evals, then the uncaught exception prints LIST to STDERR and exits with a non-zero value. If you need to exit the process with a specific exit code, see exit.
Equivalent examples:
If the last element of LIST does not end in a newline, the current script line number and input line number (if any) are also printed, and a newline is supplied. Note that the "input line number" (also known as "chunk") is subject to whatever notion of "line" happens to be currently in effect, and is also available as the special variable $.; see $/ in perlvar and $. in perlvar.
Hint: sometimes appending ", stopped" to your message will cause it to make better sense when the string "at foo line 123" is appended.
Suppose you are running script "canasta". Then the statements
- die "/etc/games is no good";
- die "/etc/games is no good, stopped";
produce, respectively
- /etc/games is no good at canasta line 123.
- /etc/games is no good, stopped at canasta line 123.
If the output is empty and $@ already contains a value (typically from a previous eval) that value is reused after appending "\t...propagated". This is useful for propagating exceptions:
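A sketch of that pattern (risky() is a hypothetical subroutine standing in for code that may throw):

```perl
sub risky { die "Expected exception\n" }   # hypothetical failing code

eval { risky() };

# die with an empty LIST reuses $@, appending "\t...propagated",
# so anything we did not expect keeps climbing the call stack.
die unless $@ =~ /Expected exception/;
print "handled\n";
```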
If the output is empty and $@ contains an object reference that has a PROPAGATE method, that method will be called with additional file and line number parameters. The return value replaces the value in $@; i.e., as if $@ = eval { $@->PROPAGATE(__FILE__, __LINE__) }; were called.
If $@ is empty then the string "Died" is used.
If an uncaught exception results in interpreter exit, the exit code is determined from the values of $! and $? with this pseudocode:
The intent is to squeeze as much possible information about the likely cause into the limited space of the system exit code. However, as $! is the value of C's errno, which can be set by any system call, the value of the exit code used by die can be non-predictable, so it should not be relied upon, other than to be non-zero.
You can also call die with a reference argument, and if this is trapped within an eval, $@ contains that reference. This permits more elaborate exception handling using objects that maintain arbitrary state about the exception. Such a scheme is sometimes preferable to matching particular string values of $@ with regular expressions. Because $@ is a global variable and eval may be used within object implementations, be careful that analyzing the error object doesn't replace the reference in the global variable. It's easiest to make a local copy of the reference before any manipulations. Here's an example:
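The example block was lost in extraction; a hedged sketch using a hypothetical My::Exception class:

```perl
use Scalar::Util 'blessed';

package My::Exception;   # hypothetical exception class
sub new     { my ($class, %args) = @_; bless { %args }, $class }
sub message { $_[0]{message} }

package main;

eval { die My::Exception->new(message => 'out of cheese') };

if (my $ev_err = $@) {   # local copy, in case handling code uses eval
    if (blessed($ev_err) && $ev_err->isa('My::Exception')) {
        print "caught: ", $ev_err->message, "\n";
    } else {
        die $ev_err;     # not ours; re-raise it
    }
}
```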
Because Perl stringifies uncaught exception messages before display, you'll probably want to overload stringification operations on exception objects. See overload for details about that.
You can arrange for a callback to be run just before the die does its deed, by setting the $SIG{__DIE__} hook. The associated handler is called with the error text and can change the error message, if it sees fit, by calling die again. See %SIG in perlvar for details on setting %SIG entries, and eval BLOCK for some examples. Although this feature was intended to be run only right before your program was to exit, this is not currently so: the $SIG{__DIE__} hook is currently called even inside eval()ed blocks/strings! If one wants the hook to do nothing in such situations, put
- die @_ if $^S;
as the first line of the handler (see $^S in perlvar). Because this promotes strange action at a distance, this counterintuitive behavior may be fixed in a future release.
See also exit(), warn(), and the Carp module.
Not really a function. Returns the value of the last command in the sequence of commands indicated by BLOCK. When modified by the while or until loop modifier, executes the BLOCK once before testing the loop condition. (On other statements the loop modifiers test the conditional first.)
do BLOCK does not count as a loop, so the loop control statements next, last, or redo cannot be used to leave or restart the block. See perlsyn for alternative strategies.
This form of subroutine call is deprecated. SUBROUTINE can be a bareword or scalar variable.
Uses the value of EXPR as a filename and executes the contents of the file as a Perl script.
- do 'stat.pl';
is largely like
- eval `cat stat.pl`;
except that it's more concise, runs no external processes, keeps track of the current filename for error messages, searches the @INC directories, and updates %INC if the file is found. See @INC in perlvar and %INC in perlvar for these variables. It also differs in that code evaluated with do FILENAME cannot see lexicals in the enclosing scope; eval STRING does. It's the same, however, in that it does reparse the file every time you call it, so you probably don't want to do this inside a loop.
If do can read the file but cannot compile it, it returns undef and sets an error message in $@. If do cannot read the file, it returns undef and sets $! to the error. Always check $@ first, as compilation could fail in a way that also sets $!. If the file is successfully compiled, do returns the value of the last expression evaluated.
Inclusion of library modules is better done with the
use and require operators, which also do automatic error checking
and raise an exception if there's a problem.
You might like to use do to read in a program configuration
file. Manual error checking can be done this way:
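A sketch of that error-checking pattern; the config file is created on the fly here for illustration, where real code would point at a fixed path such as "$ENV{HOME}/.someprogrc":

```perl
use File::Temp qw(tempdir);

# Throwaway config file; a real program would use a fixed path.
my $dir  = tempdir(CLEANUP => 1);
my $file = "$dir/someprogrc";
open my $fh, '>', $file or die "can't write $file: $!";
print $fh "{ colour => 'green', retries => 3 };\n";
close $fh;

# Check $@ (compile errors) before $! (read errors).
my $config = do $file;
unless ($config) {
    warn "couldn't parse $file: $@" if $@;
    warn "couldn't do $file: $!"    unless defined $config;
    warn "couldn't run $file"       unless $config;
}
print "retries = $config->{retries}\n";   # prints "retries = 3"
```

The value of the last expression in the file (here, a hash reference) becomes do's return value.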
This function causes an immediate core dump. See also the -u
command-line switch in perlrun, which does the same thing.
Primarily this is so that you can use the undump program (not
supplied) to turn your core dump into an executable binary after
having initialized all your variables at the beginning of the
program. When the new binary is executed it will begin by executing
a goto LABEL (with all the restrictions that goto suffers). Think of it as a goto with an intervening core dump and reincarnation. If LABEL is omitted, restarts the program from the top. The dump EXPR form, available starting in Perl 5.18.0, allows a name to be computed at run time, being otherwise identical to dump LABEL.
WARNING: Any files opened at the time of the dump will not be open any more when the program is reincarnated, with possible resulting confusion by Perl.
This function is now largely obsolete, mostly because it's very hard to convert a core file into an executable. That's why you should now invoke it as CORE::dump(), if you don't want to be warned against a possible typo.
Unlike most named operators, this has the same precedence as assignment. It is also exempt from the looks-like-a-function rule, so dump ("foo")."bar" will cause "bar" to be part of the argument to dump.
Portability issues: dump in perlport.
When called on a hash in list context, returns a 2-element list consisting of the key and value for the next element of a hash. In Perl 5.12 and later only, it will also return the index and value for the next element of an array so that you can iterate over it; older Perls consider this a syntax error. When called in scalar context, returns only the key (not the value) in a hash, or the index in an array.
Hash entries are returned in an apparently random order. The actual random
order is specific to a given hash; the exact same series of operations
on two hashes may result in a different order for each hash. Any insertion
into the hash may change the order, as will any deletion, with the exception
that the most recent key returned by each or keys may be deleted
without changing the order. So long as a given hash is unmodified you may
rely on keys, values and each to repeatedly return the same order
as each other. See Algorithmic Complexity Attacks in perlsec for
details on why hash order is randomized. Aside from the guarantees
provided here the exact details of Perl's hash algorithm and the hash
traversal order are subject to change in any release of Perl.
After each has returned all entries from the hash or array, the next
call to each returns the empty list in list context and undef in
scalar context; the next call following that one restarts iteration.
Each hash or array has its own internal iterator, accessed by each,
keys, and values. The iterator is implicitly reset when each has
reached the end as just described; it can be explicitly reset by calling
keys or values on the hash or array. If you add or delete a hash's
elements while iterating over it, entries may be skipped or duplicated--so
don't do that. Exception: In the current implementation, it is always safe
to delete the item most recently returned by each(), so the following
code works properly:
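The code in question was lost in extraction; a minimal sketch with a made-up hash illustrates the safe-deletion exception:

```perl
my %hash = (a => 1, b => 2, c => 3);

while (my ($key, $value) = each %hash) {
    print "$key\n";
    delete $hash{$key};   # deleting the item just returned is safe
}
```

Deleting any *other* key inside the loop would fall under the general warning and could skip or duplicate entries.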
This prints out your environment like the printenv(1) program, but in a different order:

Starting with Perl 5.14, each can take a scalar EXPR, which must hold a reference to an unblessed hash or array. The argument will be dereferenced automatically. This aspect of each is considered highly experimental. The exact behaviour may change in a future version of Perl.
- while (($key,$value) = each $hashref) { ... }
As of Perl 5.18 you can use a bare each in a while loop, which will set $_ on every iteration.
To avoid confusing would-be users of your code who are running earlier versions of Perl with mysterious syntax errors, put this sort of thing at the top of your file to signal that your code will work only on Perls of a recent vintage:
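A sketch of such a guard; on an older perl these lines abort compilation with a clear minimum-version message instead of a mysterious syntax error (the comments reflect the feature history described above):

```perl
use 5.012;   # so keys/values/each work on arrays
use 5.014;   # so each works on scalar hash/array refs (experimental)
use 5.018;   # so a bare each in a while loop sets $_
```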
Returns 1 if the next read on FILEHANDLE will return end of file or if FILEHANDLE is not open. FILEHANDLE may be an expression whose value gives the real filehandle. (Note that this function actually reads a character and then ungetcs it, so isn't useful in an interactive context.) Do not read from a terminal file (or call eof(FILEHANDLE) on it) after end-of-file is reached. File types such as terminals may lose the end-of-file condition if you do.
An eof without an argument uses the last file read. Using eof() with empty parentheses is different. It refers to the pseudo file formed from the files listed on the command line and accessed via the <> operator. Since <> isn't explicitly opened, as a normal filehandle is, an eof() before <> has been used will cause @ARGV to be examined to determine if input is available. Similarly, an eof() after <> has returned end-of-file will assume you are processing another @ARGV list, and if you haven't set @ARGV, will read input from STDIN; see I/O Operators in perlop.
In a while (<>)
loop, eof or eof(ARGV) can be used to
detect the end of each file, whereas eof() will detect the end
of the very last file only. Examples:
- # reset line numbering on each input file
- while (<>) {
- next if /^\s*#/; # skip comments
- print "$.\t$_";
- } continue {
- close ARGV if eof; # Not eof()!
- }
- # insert dashes just before last line of last file
- while (<>) {
- if (eof()) { # check for end of last file
- print "--------------\n";
- }
- print;
- last if eof(); # needed if we're reading from a terminal
- }
Practical hint: you almost never need to use eof in Perl, because the
input operators typically return undef when they run out of data or
encounter an error.
In the first form, the return value of EXPR is parsed and executed as if it were a little Perl program. The value of the expression (which is itself determined within scalar context) is first parsed, and if there were no errors, executed as a block within the lexical context of the current Perl program. This means that, in particular, any outer lexical variables are visible to it, and any package variable settings or subroutine and format definitions remain afterwards.
Note that the value is parsed every time the eval executes.
If EXPR is omitted, evaluates $_. This form is typically used to
delay parsing and subsequent execution of the text of EXPR until run time.
If the unicode_eval feature is enabled (which is the default under a
use 5.16 or higher declaration), EXPR or $_ is treated as a string of
characters, so use utf8 declarations have no effect, and source filters
are forbidden. In the absence of the unicode_eval feature, the string
will sometimes be treated as characters and sometimes as bytes, depending
on the internal encoding, and source filters activated within the eval
exhibit the erratic, but historical, behaviour of affecting some outer file
scope that is still compiling. See also the evalbytes keyword, which
always treats its input as a byte stream and works properly with source
filters, and the feature pragma.
In the second form, the code within the BLOCK is parsed only once--at the
same time the code surrounding the eval itself was parsed--and executed
within the context of the current Perl program. This form is typically
used to trap exceptions more efficiently than the first (see below), while
also providing the benefit of checking the code within BLOCK at compile
time.
The final semicolon, if any, may be omitted from the value of EXPR or within the BLOCK.
In both forms, the value returned is the value of the last expression
evaluated inside the mini-program; a return statement may be also used, just
as with subroutines. The expression providing the return value is evaluated
in void, scalar, or list context, depending on the context of the eval
itself. See wantarray for more on how the evaluation context can be
determined.
If there is a syntax error or runtime error, or a die statement is
executed, eval returns undef in scalar context or an empty list in
list context, and $@ is set to the error message. (Prior to 5.16, a
bug caused undef to be returned in list context for syntax errors,
but not for runtime errors.) If there was no error, $@ is set to the
empty string. A control flow operator like last or goto can bypass
the setting of $@. Beware that using eval neither silences Perl from
printing warnings to STDERR, nor does it stuff the text of warning
messages into $@. To do either of those, you have to use the
$SIG{__WARN__} facility, or turn off warnings inside the BLOCK or EXPR
using no warnings 'all'. See warn, perlvar, warnings and perllexwarn.
Note that, because eval traps otherwise-fatal errors, it is useful for
determining whether a particular feature (such as socket or symlink)
is implemented. It is also Perl's exception-trapping mechanism, where
the die operator is used to raise exceptions.
If you want to trap errors when loading an XS module, some problems with
the binary interface (such as Perl version skew) may be fatal even with
eval unless $ENV{PERL_DL_NONLAZY}
is set. See perlrun.
If the code to be executed doesn't vary, you may use the eval-BLOCK
form to trap run-time errors without incurring the penalty of
recompiling each time. The error, if any, is still returned in $@
.
Examples:
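Two sketches of the common idioms, using division by zero as the trapped error:

```perl
# BLOCK form: trap a runtime error.
my ($num, $den) = (10, 0);
my $answer;
eval { $answer = $num / $den; };
warn $@ if $@;              # "Illegal division by zero at ..."

# STRING form: parsed at run time, so syntax errors are trapped too.
eval '$answer = ;';
print "caught: $@" if $@;
```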
Using the eval{} form as an exception trap in libraries does have some
issues. Due to the current arguably broken state of __DIE__
hooks, you
may wish not to trigger any __DIE__
hooks that user code may have installed.
You can use the local $SIG{__DIE__}
construct for this purpose,
as this example shows:
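A sketch of that construct, shielding one eval from any globally installed __DIE__ hook:

```perl
my $answer = eval {
    local $SIG{__DIE__};    # disable user-installed __DIE__ hooks here
    my ($num, $den) = (10, 0);
    $num / $den;            # dies; the error is trapped without the hook
};
warn $@ if $@;
```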
This is especially significant, given that __DIE__
hooks can call
die again, which has the effect of changing their error messages:
Because this promotes action at a distance, this counterintuitive behavior may be fixed in a future release.
With an eval, you should be especially careful to remember what's
being looked at when:
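The six cases under discussion look roughly like this (a sketch; for cases 1-4, $x holds a string of Perl code, while for cases 5-6 it holds a variable name):

```perl
my $x = '1 + 2';  # a string of Perl code
eval $x;          # CASE 1: runs the code in $x
eval "$x";        # CASE 2: same, with pointless double quotes
eval '$x';        # CASE 3: returns the value of $x
eval { $x };      # CASE 4: same, but checked at compile time

no strict 'refs';
our $n = 5;
$x = 'n';         # now $x holds a variable name
eval "\$$x++";    # CASE 5: increments $n via a string eval
$$x++;            # CASE 6: same, via a symbolic reference
```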
Cases 1 and 2 above behave identically: they run the code contained in
the variable $x. (Although case 2 has misleading double quotes making
the reader wonder what else might be happening (nothing is).) Cases 3
and 4 likewise behave in the same way: they run the code '$x'
, which
does nothing but return the value of $x. (Case 4 is preferred for
purely visual reasons, but it also has the advantage of compiling at
compile-time instead of at run-time.) Case 5 is a place where
normally you would like to use double quotes, except that in this
particular situation, you can just use symbolic references instead, as
in case 6.
Before Perl 5.14, the assignment to $@
occurred before restoration
of localized variables, which means that for your code to run on older
versions, a temporary is required if you want to mask some but not all
errors:
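A sketch of such a temporary; risky_operation() is a hypothetical stand-in for code that may die:

```perl
my $e;
{
    local $@;                       # protect any pre-existing $@
    eval { risky_operation() };     # risky_operation() is a stand-in
    $e = $@ if $@ =~ /nefarious/;   # mask all other errors
}
die $e if defined $e;
```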
eval BLOCK
does not count as a loop, so the loop control statements
next, last, or redo cannot be used to leave or restart the block.
An eval ''
executed within a subroutine defined
in the DB
package doesn't see the usual
surrounding lexical scope, but rather the scope of the first non-DB piece
of code that called it. You don't normally need to worry about this unless
you are writing a Perl debugger.
This function is like eval with a string argument, except it always
parses its argument, or $_
if EXPR is omitted, as a string of bytes. A
string containing characters whose ordinal value exceeds 255 results in an
error. Source filters activated within the evaluated code apply to the
code itself.
This function is only available under the evalbytes feature, a
use v5.16
(or higher) declaration, or with a CORE::
prefix. See
feature for more information.
The exec function executes a system command and never returns;
use system instead of exec if you want it to return. It fails and
returns false only if the command does not exist and it is executed
directly instead of via your system's command shell (see below).
Since it's a common mistake to use exec instead of system, Perl
warns you if exec is called in void context and if there is a following
statement that isn't die, warn, or exit (if -w
is set--but
you always do that, right?). If you really want to follow an exec
with some other statement, you can use one of these styles to avoid the warning:
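For example (sketches; foo stands for the command you actually want to run):

```perl
exec('foo')    or print STDERR "couldn't exec foo: $!";
{ exec('foo') }; print STDERR "couldn't exec foo: $!";
```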
If there is more than one argument in LIST, or if LIST is an array
with more than one value, calls execvp(3) with the arguments in LIST.
If there is only one scalar argument or an array with one element in it,
the argument is checked for shell metacharacters, and if there are any,
the entire argument is passed to the system's command shell for parsing
(this is /bin/sh -c on Unix platforms, but varies on other platforms).
If there are no shell metacharacters in the argument, it is split into
words and passed directly to execvp
, which is more efficient.
Examples:
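Sketches of both forms; note that a successful exec never returns, and $outfile here is assumed to hold the name of a file to sort:

```perl
exec '/bin/echo', 'Your arguments are: ', @ARGV;   # list form: no shell
exec "sort $outfile | uniq";                        # one arg with metacharacters: via the shell
```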
If you don't really want to execute the first argument, but want to lie to the program you are executing about its own name, you can specify the program you actually want to run as an "indirect object" (without a comma) in front of the LIST. (This always forces interpretation of the LIST as a multivalued list, even if there is only a single scalar in the list.) Example:
- $shell = '/bin/csh';
- exec $shell '-sh'; # pretend it's a login shell
or, more directly,
- exec {'/bin/csh'} '-sh'; # pretend it's a login shell
When the arguments get executed via the system shell, results are subject to its quirks and capabilities. See `STRING` in perlop for details.
Using an indirect object with exec or system is also more
secure. This usage (which also works fine with system()) forces
interpretation of the arguments as a multivalued list, even if the
list had just one argument. That way you're safe from the shell
expanding wildcards or splitting up words with whitespace in them.
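The two versions compared below look like this (a sketch):

```perl
my @args = ("echo surprise");
exec @args;               # subject to shell escapes if @args == 1
exec { $args[0] } @args;  # safe even with one-arg list
```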
The first version, the one without the indirect object, ran the echo
program, passing it "surprise" as an argument. The second version didn't;
it tried to run a program named "echo surprise", didn't find it, and set
$? to a non-zero value indicating failure.
Perl attempts to flush all files opened for output before the exec,
but this may not be supported on some platforms (see perlport).
To be safe, you may need to set $|
($AUTOFLUSH in English) or
call the autoflush()
method of IO::Handle
on any open handles
to avoid lost output.
Note that exec will not call your END
blocks, nor will it invoke
DESTROY
methods on your objects.
Portability issues: exec in perlport.
Given an expression that specifies an element of a hash, returns true if the specified element in the hash has ever been initialized, even if the corresponding value is undefined.
exists may also be called on array elements, but its behavior is much less obvious and is strongly tied to the use of delete on arrays. Be aware that calling exists on array values is deprecated and likely to be removed in a future version of Perl.
A hash or array element can be true only if it's defined and defined only if it exists, but the reverse doesn't necessarily hold true.
Given an expression that specifies the name of a subroutine,
returns true if the specified subroutine has ever been declared, even
if it is undefined. Mentioning a subroutine name for exists or defined
does not count as declaring it. Note that a subroutine that does not
exist may still be callable: its package may have an AUTOLOAD
method that makes it spring into existence the first time that it is
called; see perlsub.
Note that the EXPR can be arbitrarily complicated as long as the final operation is a hash or array key lookup or subroutine name:
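For instance (sketches):

```perl
my %hash = (A => { B => { color => "red" } });
my $ref  = \%hash;
my $key  = "color";
if (exists $hash{A}{B}{$key})  { print "nested hash element exists\n" }
if (exists $ref->{A}{B}{$key}) { print "same lookup through a reference\n" }
```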
Although the most deeply nested array or hash element will not spring into
existence just because its existence was tested, any intervening ones will.
Thus $ref->{"A"}
and $ref->{"A"}->{"B"}
will spring
into existence due to the existence test for the $key element above.
This happens anywhere the arrow operator is used, including even here:
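A sketch of that surprise:

```perl
my $ref;                             # undefined
if (exists $ref->{"Some key"}) { }   # the test itself is false...
print $ref, "\n";                    # ...but $ref is now a hash reference
```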
This surprising autovivification in what does not at first--or even second--glance appear to be an lvalue context may be fixed in a future release.
Use of a subroutine call, rather than a subroutine name, as an argument to exists() is an error.
Evaluates EXPR and exits immediately with that value. Example:
- $ans = <STDIN>;
- exit 0 if $ans =~ /^[Xx]/;
See also die. If EXPR is omitted, exits with 0
status. The only
universally recognized values for EXPR are 0
for success and 1
for error; other values are subject to interpretation depending on the
environment in which the Perl program is running. For example, exiting
69 (EX_UNAVAILABLE) from a sendmail incoming-mail filter will cause
the mailer to return the item undelivered, but that's not true everywhere.
Don't use exit to abort a subroutine if there's any chance that
someone might want to trap whatever error happened. Use die instead,
which can be trapped by an eval.
The exit() function does not always exit immediately. It calls any
defined END routines first, but these END routines may not themselves
abort the exit. Likewise any object destructors that need to be called
are called before the real exit. END routines and destructors can
change the exit status by modifying $?. If this is a problem, you can
call POSIX::_exit($status) to avoid END and destructor processing.
See perlmod for details.
Portability issues: exit in perlport.
Returns e (the natural logarithm base) to the power of EXPR.
If EXPR is omitted, gives exp($_).
Returns the casefolded version of EXPR. This is the internal function
implementing the \F
escape in double-quoted strings.
Casefolding is the process of mapping strings to a form where case differences are erased; comparing two strings in their casefolded form is effectively a way of asking if two strings are equal, regardless of case.
Roughly, if you ever found yourself writing this
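Such comparisons were typically spelled with lc or uc, along these lines (a sketch):

```perl
my ($this, $that) = ("Perl", "PERL");
if (lc($this) eq lc($that) || uc($this) eq uc($that)) {
    print "equal, ignoring case\n";
}
```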
Now you can write
- fc($this) eq fc($that)
and get the correct results.
Perl only implements the full form of casefolding, but you can access the
simple folds using casefold() and prop_invmap() in Unicode::UCD.
For further information on casefolding, refer to the Unicode Standard,
specifically sections 3.13 "Default Case Operations", 4.2 "Case-Normative",
and 5.18 "Case Mappings", available at
http://www.unicode.org/versions/latest/, as well as the Case Charts
available at http://www.unicode.org/charts/case/.
If EXPR is omitted, uses $_.
This function behaves the same way under various pragmata, such as within the scope of use locale, as lc does.
While the Unicode Standard defines two additional forms of casefolding,
one for Turkic languages and one that never maps one character into multiple
characters, these are not provided by the Perl core; however, the CPAN module
Unicode::Casing may be used to provide an implementation.
This keyword is available only when the "fc" feature is enabled,
or when prefixed with CORE::; see feature. Alternatively, add a
use v5.16 or later declaration to the current scope.
Implements the fcntl(2) function. You'll probably have to say
- use Fcntl;
first to get the correct constant definitions. Argument processing and
value returned work just like ioctl below.
For example:
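A sketch, querying the descriptor flags of an ordinary filehandle (the script opens itself via $0 just to have an open handle):

```perl
use Fcntl;
open(my $fh, "<", $0) or die "open: $!";
my $flags = fcntl($fh, F_GETFL, 0)
    or die "can't fcntl F_GETFL: $!";
printf "descriptor flags: %#o\n", $flags;
```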
You don't have to check for defined on the return from fcntl.
Like ioctl, it maps a 0
return from the system call into
"0 but true"
in Perl. This string is true in boolean context and 0
in numeric context. It is also exempt from the normal -w warnings
on improper numeric conversions.
Note that fcntl raises an exception if used on a machine that
doesn't implement fcntl(2). See the Fcntl module or your fcntl(2)
manpage to learn what functions are available on your system.
Here's an example of setting a filehandle named REMOTE
to be
non-blocking at the system level. You'll have to negotiate $|
on your own, though.
Portability issues: fcntl in perlport.
A special token that returns the name of the file in which it occurs.
Returns the file descriptor for a filehandle, or undefined if the
filehandle is not open. If there is no real file descriptor at the OS
level, as can happen with filehandles connected to memory objects via
open with a reference for the third argument, -1 is returned.
This is mainly useful for constructing
bitmaps for select and low-level POSIX tty-handling operations.
If FILEHANDLE is an expression, the value is taken as an indirect
filehandle, generally its name.
You can use this to find out whether two handles refer to the same underlying descriptor:
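A sketch; the "<&=" open mode re-opens on the same underlying descriptor rather than dup(2)ing a new one:

```perl
open(my $this, "<", $0)              or die "open: $!";
open(my $that, "<&=", fileno($this)) or die "fdopen: $!";  # alias, same fd
print "handles share a descriptor\n" if fileno($this) == fileno($that);
```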
Calls flock(2), or an emulation of it, on FILEHANDLE. Returns true
for success, false on failure. Produces a fatal error if used on a
machine that doesn't implement flock(2), fcntl(2) locking, or lockf(3).
flock is Perl's portable file-locking interface, although it locks
entire files only, not records.
Two potentially non-obvious but traditional flock semantics are
that it waits indefinitely until the lock is granted, and that its locks
are merely advisory. Such discretionary locks are more flexible, but
offer fewer guarantees. This means that programs that do not also use
flock may modify files locked with flock. See perlport,
your port's specific documentation, and your system-specific local manpages
for details. It's best to assume traditional behavior if you're writing
portable programs. (But if you're not, you should as always feel perfectly
free to write for your own system's idiosyncrasies (sometimes called
"features"). Slavish adherence to portability concerns shouldn't get
in the way of your getting your job done.)
OPERATION is one of LOCK_SH, LOCK_EX, or LOCK_UN, possibly combined with
LOCK_NB. These constants are traditionally valued 1, 2, 8 and 4, but
you can use the symbolic names if you import them from the Fcntl module,
either individually, or as a group using the :flock
tag. LOCK_SH
requests a shared lock, LOCK_EX requests an exclusive lock, and LOCK_UN
releases a previously requested lock. If LOCK_NB is bitwise-or'ed with
LOCK_SH or LOCK_EX, then flock returns immediately rather than blocking
waiting for the lock; check the return status to see if you got it.
To avoid the possibility of miscoordination, Perl now flushes FILEHANDLE before locking or unlocking it.
Note that the emulation built with lockf(3) doesn't provide shared locks, and it requires that FILEHANDLE be open with write intent. These are the semantics that lockf(3) implements. Most if not all systems implement lockf(3) in terms of fcntl(2) locking, though, so the differing semantics shouldn't bite too many people.
Note that the fcntl(2) emulation of flock(3) requires that FILEHANDLE be open with read intent to use LOCK_SH and requires that it be open with write intent to use LOCK_EX.
Note also that some versions of flock cannot lock things over the
network; you would need to use the more system-specific fcntl for
that. If you like you can force Perl to ignore your system's flock(2)
function, and so provide its own fcntl(2)-based emulation, by passing
the switch -Ud_flock
to the Configure program when you configure
and build a new Perl.
Here's a mailbox appender for BSD systems.
- # import LOCK_* and SEEK_END constants
- use Fcntl qw(:flock SEEK_END);
- sub lock {
- my ($fh) = @_;
- flock($fh, LOCK_EX) or die "Cannot lock mailbox - $!\n";
- # and, in case someone appended while we were waiting...
- seek($fh, 0, SEEK_END) or die "Cannot seek - $!\n";
- }
- sub unlock {
- my ($fh) = @_;
- flock($fh, LOCK_UN) or die "Cannot unlock mailbox - $!\n";
- }
- open(my $mbox, ">>", "/usr/spool/mail/$ENV{'USER'}")
- or die "Can't open mailbox: $!";
- lock($mbox);
- print $mbox $msg,"\n\n";
- unlock($mbox);
On systems that support a real flock(2), locks are inherited across fork() calls, whereas those that must resort to the more capricious fcntl(2) function lose their locks, making it seriously harder to write servers.
See also DB_File for other flock() examples.
Portability issues: flock in perlport.
Does a fork(2) system call to create a new process running the
same program at the same point. It returns the child pid to the
parent process, 0
to the child process, or undef if the fork is
unsuccessful. File descriptors (and sometimes locks on those descriptors)
are shared, while everything else is copied. On most systems supporting
fork(), great care has gone into making it extremely efficient (for
example, using copy-on-write technology on data pages), making it the
dominant paradigm for multitasking over the last few decades.
Perl attempts to flush all files opened for
output before forking the child process, but this may not be supported
on some platforms (see perlport). To be safe, you may need to set
$|
($AUTOFLUSH in English) or call the autoflush()
method of
IO::Handle
on any open handles to avoid duplicate output.
If you fork without ever waiting on your children, you will
accumulate zombies. On some systems, you can avoid this by setting
$SIG{CHLD} to "IGNORE". See also perlipc for more examples of
forking and reaping moribund children.
Note that if your forked child inherits system file descriptors like STDIN and STDOUT that are actually connected by a pipe or socket, even if you exit, then the remote server (such as, say, a CGI script or a backgrounded job launched from a remote shell) won't think you're done. You should reopen those to /dev/null if it's any issue.
On some platforms such as Windows, where the fork() system call is not available, Perl can be built to emulate fork() in the Perl interpreter. The emulation is designed, at the level of the Perl program, to be as compatible as possible with the "Unix" fork(). However it has limitations that have to be considered in code intended to be portable. See perlfork for more details.
Portability issues: fork in perlport.
Declare a picture format for use by the write function. For
example:
- format Something =
- Test: @<<<<<<<< @||||| @>>>>>
- $str, $%, '$' . int($num)
- .
- $str = "widget";
- $num = $cost/$quantity;
- $~ = 'Something';
- write;
See perlform for many details and examples.
This is an internal function used by formats, though you may call it,
too. It formats (see perlform) a list of values according to the
contents of PICTURE, placing the output into the format output
accumulator, $^A (or $ACCUMULATOR in English).
Eventually, when a write is done, the contents of $^A are written to
some filehandle. You could also read $^A and then set $^A back to "".
Note that a format typically does one formline per line of form, but the
formline function itself doesn't care how many newlines are embedded in
the PICTURE. This means that the ~ and ~~ tokens treat the entire PICTURE
as a single line. You may therefore need to use multiple formlines to
implement a single record format, just like the format compiler.
Be careful if you put double quotes around the picture, because an @
character may be taken to mean the beginning of an array name.
formline always returns true. See perlform for other examples.
If you are trying to use this instead of write to capture the output,
you may find it easier to open a filehandle to a scalar
(open $fh, ">", \$output
) and write to that instead.
Returns the next character from the input file attached to FILEHANDLE,
or the undefined value at end of file or if there was an error (in
the latter case $!
is set). If FILEHANDLE is omitted, reads from
STDIN. This is not particularly efficient. However, it cannot be
used by itself to fetch single characters without waiting for the user
to hit enter. For that, try something more like:
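The usual sketch (it requires a terminal; $BSD_STYLE selects between the BSD and SysV stty dialects):

```perl
my $BSD_STYLE = 1;   # or 0; determining this is left to the reader
if ($BSD_STYLE) {
    system "stty cbreak </dev/tty >/dev/tty 2>&1";
}
else {
    system "stty", '-icanon', 'eol', "\001";
}
my $key = getc(STDIN);
if ($BSD_STYLE) {
    system "stty -cbreak </dev/tty >/dev/tty 2>&1";
}
else {
    system 'stty', 'icanon', 'eol', '^@';   # ASCII NUL
}
print "\n";
```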
Determination of whether $BSD_STYLE should be set is left as an exercise to the reader.
The POSIX::getattr
function can do this more portably on
systems purporting POSIX compliance. See also the Term::ReadKey
module from your nearest CPAN site; details on CPAN can be found under
CPAN in perlmodlib.
This implements the C library function of the same name, which on most
systems returns the current login from /etc/utmp, if any. If it
returns the empty string, use getpwuid.
Do not consider getlogin for authentication: it is not as
secure as getpwuid.
Portability issues: getlogin in perlport.
Returns the packed sockaddr address of the other end of the SOCKET connection.
- use Socket;
- $hersockaddr = getpeername(SOCK);
- ($port, $iaddr) = sockaddr_in($hersockaddr);
- $herhostname = gethostbyaddr($iaddr, AF_INET);
- $herstraddr = inet_ntoa($iaddr);
Returns the current process group for the specified PID. Use
a PID of 0
to get the current process group for the
current process. Will raise an exception if used on a machine that
doesn't implement getpgrp(2). If PID is omitted, returns the process
group of the current process. Note that the POSIX version of getpgrp
does not accept a PID argument, so only PID==0
is truly portable.
Portability issues: getpgrp in perlport.
Returns the process id of the parent process.
Note for Linux users: between v5.8.1 and v5.16.0, Perl would work around non-POSIX thread semantics on the minority of Linux systems (and Debian GNU/kFreeBSD systems) that used LinuxThreads; this emulation has since been removed. See the documentation for $$ for details.
Portability issues: getppid in perlport.
Returns the current priority for a process, a process group, or a user. (See getpriority(2).) Will raise a fatal exception if used on a machine that doesn't implement getpriority(2).
Portability issues: getpriority in perlport.
These routines are the same as their counterparts in the system C library. In list context, the return values from the various get routines are as follows:
- ($name,$passwd,$uid,$gid,
- $quota,$comment,$gcos,$dir,$shell,$expire) = getpw*
- ($name,$passwd,$gid,$members) = getgr*
- ($name,$aliases,$addrtype,$length,@addrs) = gethost*
- ($name,$aliases,$addrtype,$net) = getnet*
- ($name,$aliases,$proto) = getproto*
- ($name,$aliases,$port,$proto) = getserv*
(If the entry doesn't exist you get an empty list.)
The exact meaning of the $gcos field varies, but it usually contains the real name of the user (as opposed to the login name) and other information pertaining to the user. Beware, however, that on many systems users are able to change this information, so it cannot be trusted; therefore $gcos is tainted (see perlsec). The $passwd and $shell fields, the user's encrypted password and login shell, are also tainted for the same reason.
In scalar context, you get the name, unless the function was a lookup by name, in which case you get the other thing, whatever it is. (If the entry doesn't exist you get the undefined value.) For example:
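Sketches of scalar-context lookups going each direction, assuming a Unix system with a root entry:

```perl
my $uid  = getpwnam("root");   # lookup by name returns the uid
my $name = getpwuid(0);        # lookup by uid returns the name
my $gid  = getgrnam($name);    # and similarly for the other get* routines
```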
In getpw*() the fields $quota, $comment, and $expire are special
in that they are unsupported on many systems. If the
$quota is unsupported, it is an empty scalar. If it is supported, it
usually encodes the disk quota. If the $comment field is unsupported,
it is an empty scalar. If it is supported it usually encodes some
administrative comment about the user. In some systems the $quota
field may be $change or $age, fields that have to do with password
aging. In some systems the $comment field may be $class. The $expire
field, if present, encodes the expiration period of the account or the
password. For the availability and the exact meaning of these fields
in your system, please consult getpwnam(3) and your system's
pwd.h file. You can also find out from within Perl what your
$quota and $comment fields mean and whether you have the $expire field
by using the Config module and the values d_pwquota, d_pwage,
d_pwchange, d_pwcomment, and d_pwexpire. Shadow password
files are supported only if your vendor has implemented them in the
intuitive fashion that calling the regular C library routines gets the
shadow versions if you're running under privilege or if there exists
the shadow(3) functions as found in System V (this includes Solaris
and Linux). Those systems that implement a proprietary shadow password
facility are unlikely to be supported.
The $members value returned by getgr*() is a space-separated list of the login names of the members of the group.
For the gethost*() functions, if the h_errno
variable is supported in
C, it will be returned to you via $?
if the function call fails. The
@addrs
value returned by a successful call is a list of raw
addresses returned by the corresponding library call. In the
Internet domain, each address is four bytes long; you can unpack it
by saying something like:
- ($a,$b,$c,$d) = unpack('W4',$addr[0]);
The Socket library makes this slightly easier:
- use Socket;
- $iaddr = inet_aton("127.1"); # or whatever address
- $name = gethostbyaddr($iaddr, AF_INET);
- # or going the other way
- $straddr = inet_ntoa($iaddr);
In the opposite way, to resolve a hostname to the IP address you can write this:
- use Socket;
- $packed_ip = gethostbyname("www.perl.org");
- if (defined $packed_ip) {
- $ip_address = inet_ntoa($packed_ip);
- }
Make sure gethostbyname() is called in SCALAR context and that
its return value is checked for definedness.
The getprotobynumber function, even though it only takes one argument,
has the precedence of a list operator, so beware:
- getprotobynumber $number eq 'icmp' # WRONG
- getprotobynumber($number eq 'icmp') # actually means this
- getprotobynumber($number) eq 'icmp' # better this way
If you get tired of remembering which element of the return list
contains which return value, by-name interfaces are provided
in standard modules: File::stat, Net::hostent, Net::netent,
Net::protoent, Net::servent, Time::gmtime, Time::localtime,
and User::grent. These override the normal built-ins, supplying
versions that return objects with the appropriate names
for each field. For example:
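A sketch along those lines, comparing a file's owner against the password database (the script stats itself via $0):

```perl
use File::stat;
use User::pwent;
my $st = stat($0) or die "stat: $!";   # File::stat's stat returns an object
my $pw = getpwuid($st->uid);           # User::pwent's getpwuid returns one too
my $is_mine = $pw && ($st->uid == $pw->uid);
```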
Even though it looks as though they're the same method calls (uid),
they aren't, because a File::stat
object is different from
a User::pwent
object.
Portability issues: getpwnam in perlport to endservent in perlport.
Returns the packed sockaddr address of this end of the SOCKET connection, in case you don't know the address because you have several different IPs that the connection might have come in on.
- use Socket;
- $mysockaddr = getsockname(SOCK);
- ($port, $myaddr) = sockaddr_in($mysockaddr);
- printf "Connect to %s [%s]\n",
- scalar gethostbyaddr($myaddr, AF_INET),
- inet_ntoa($myaddr);
Queries the option named OPTNAME associated with SOCKET at a given LEVEL.
Options may exist at multiple protocol levels depending on the socket
type, but at least the uppermost socket level SOL_SOCKET (defined in the
Socket
module) will exist. To query options at another level the
protocol number of the appropriate protocol controlling the option
should be supplied. For example, to indicate that an option is to be
interpreted by the TCP protocol, LEVEL should be set to the protocol
number of TCP, which you can get using getprotobyname.
The function returns a packed string representing the requested socket
option, or undef on error, with the reason for the error placed in
$!
. Just what is in the packed string depends on LEVEL and OPTNAME;
consult getsockopt(2) for details. A common case is that the option is an
integer, in which case the result is a packed integer, which you can decode
using unpack with the i (or I) format.
Here's an example to test whether Nagle's algorithm is enabled on a socket:
- use Socket qw(:all);
- defined(my $tcp = getprotobyname("tcp"))
- or die "Could not determine the protocol number for tcp";
- # my $tcp = IPPROTO_TCP; # Alternative
- my $packed = getsockopt($socket, $tcp, TCP_NODELAY)
- or die "getsockopt TCP_NODELAY: $!";
- my $nodelay = unpack("I", $packed);
- print "Nagle's algorithm is turned ",
- $nodelay ? "off\n" : "on\n";
Portability issues: getsockopt in perlport.
In list context, returns a (possibly empty) list of filename expansions on
the value of EXPR such as the standard Unix shell /bin/csh would do. In
scalar context, glob iterates through such filename expansions, returning
undef when the list is exhausted. This is the internal function
implementing the <*.c>
operator, but you can use it directly. If
EXPR is omitted, $_
is used. The <*.c>
operator is discussed in
more detail in I/O Operators in perlop.
Note that glob splits its arguments on whitespace and treats
each segment as a separate pattern. As such, glob("*.c *.h")
matches all files with a .c or .h extension. The expression
glob(".* *")
matches all files in the current working directory.
If you want to glob filenames that might contain whitespace, you'll
have to use extra quotes around the spacey filename to protect it.
For example, to glob filenames that have an e
followed by a space
followed by an f
, use either of:
If you had to get a variable through, you could do this:
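A sketch, interpolating $var inside the protective quotes:

```perl
my $var = "big";
my @spacies = glob "'*${var}e f*'";
@spacies    = glob qq("*${var}e f*");
```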
If non-empty braces are the only wildcard characters used in the
glob, no filenames are matched, but potentially many strings
are returned. For example, this produces nine strings, one for
each pairing of fruits and colors:
- @many = glob "{apple,tomato,cherry}={green,yellow,red}";
This operator is implemented using the standard File::Glob extension. See File::Glob for details, including bsd_glob, which does not treat whitespace as a pattern separator.
Portability issues: glob in perlport.
Works just like localtime but the returned values are localized for the standard Greenwich time zone.
Note: When called in list context, $isdst, the last value returned by gmtime, is always 0. There is no Daylight Saving Time in GMT.
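A minimal sketch of gmtime in list context (the field order is the same as for localtime, documented below):

```perl
my ($sec, $min, $hour, $mday, $mon, $year) = gmtime(time);
printf "UTC: %04d-%02d-%02d %02d:%02d:%02d\n",
       $year + 1900, $mon + 1, $mday, $hour, $min, $sec;
```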
Portability issues: gmtime in perlport.
The goto-LABEL form finds the statement labeled with LABEL and
resumes execution there. It can't be used to get out of a block or
subroutine given to sort. It can be used to go almost anywhere
else within the dynamic scope, including out of subroutines, but it's
usually better to use some other construct such as last or die.
The author of Perl has never felt the need to use this form of goto
(in Perl, that is; C is another matter). (The difference is that C
does not offer named loops combined with loop control. Perl does, and
this replaces most structured uses of goto in other languages.)
The goto-EXPR form expects a label name, whose scope will be resolved
dynamically. This allows for computed gotos per FORTRAN, but isn't
necessarily recommended if you're optimizing for maintainability:
- goto ("FOO", "BAR", "GLARCH")[$i];
As shown in this example, goto-EXPR is exempt from the "looks like a function" rule. A pair of parentheses following it does not (necessarily) delimit its argument. goto("NE")."XT" is equivalent to goto NEXT.
Also, unlike most named operators, this has the same precedence as
assignment.
Use of goto-LABEL or goto-EXPR to jump into a construct is deprecated and will issue a warning. Even then, it may not be used to go into any construct that requires initialization, such as a subroutine or a foreach loop. It also can't be used to go into a construct that is optimized away.
The goto-&NAME form is quite different from the other forms of goto. In fact, it isn't a goto in the normal sense at all, and doesn't have the stigma associated with other gotos. Instead, it exits the current subroutine (losing any changes set by local()) and immediately calls in its place the named subroutine using the current value of @_. This is used by AUTOLOAD subroutines that wish to load another subroutine and then pretend that the other subroutine had been called in the first place (except that any modifications to @_ in the current subroutine are propagated to the other subroutine.)
After the goto, not even caller will be able to tell that this
routine was called first.
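A small sketch of the goto-&NAME form (the subroutine names are illustrative):

```perl
sub real_work {
    return "handled: @_";
}

sub dispatcher {
    # Replace the current call frame with real_work, passing the
    # current @_ along; caller() inside real_work will not see dispatcher.
    goto &real_work;
}

print dispatcher("a", "b"), "\n";  # prints "handled: a b"
```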
NAME needn't be the name of a subroutine; it can be a scalar variable containing a code reference or a block that evaluates to a code reference.
This is similar in spirit to, but not the same as, grep(1) and its relatives. In particular, it is not limited to using regular expressions.
Evaluates the BLOCK or EXPR for each element of LIST (locally setting $_ to each element) and returns the list value consisting of those elements for which the expression evaluated to true. In scalar context, returns the number of times the expression was true.
- @foo = grep(!/^#/, @bar); # weed out comments
or equivalently,
- @foo = grep {!/^#/} @bar; # weed out comments
Note that $_ is an alias to the list value, so it can be used to modify the elements of the LIST. While this is useful and supported, it can cause bizarre results if the elements of LIST are not variables.
Similarly, grep returns aliases into the original list, much as a for loop's index variable aliases the list elements. That is, modifying an element of a list returned by grep (for example, in a foreach, map, or another grep) actually modifies the element in the original list. This is usually something to be avoided when writing clear code.
If $_ is lexical in the scope where the grep appears (because it has been declared with the deprecated my $_ construct) then, in addition to being locally aliased to the list elements, $_ keeps being lexical inside the block; i.e., it can't be seen from the outside, avoiding any potential side-effects.
See also map for a list composed of the results of the BLOCK or EXPR.
Interprets EXPR as a hex string and returns the corresponding value. (To convert strings that might start with either 0, 0x, or 0b, see oct.) If EXPR is omitted, uses $_.
Hex strings may only represent integers. Strings that would cause integer overflow trigger a warning. Leading whitespace is not stripped, unlike oct(). To present something as hex, look into printf, sprintf, and unpack.
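For instance, a couple of illustrative conversions:

```perl
print hex('0xAf'), "\n";  # prints 175
print hex('aF'),   "\n";  # same; the 0x prefix is optional
printf "%x\n", 175;       # going the other way: prints "af"
```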
There is no builtin import function. It is just an ordinary
method (subroutine) defined (or inherited) by modules that wish to export
names to another module. The use function calls the import method
for the package used. See also use, perlmod, and Exporter.
The index function searches for one string within another, but without
the wildcard-like behavior of a full regular-expression pattern match.
It returns the position of the first occurrence of SUBSTR in STR at
or after POSITION. If POSITION is omitted, starts searching from the
beginning of the string. POSITION before the beginning of the string
or after its end is treated as if it were the beginning or the end,
respectively. POSITION and the return value are based at zero.
If the substring is not found, index returns -1.
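For example:

```perl
my $str = "The black cat climbed the green tree";
print index($str, "cat"), "\n";      # prints 10
print index($str, "cat", 11), "\n";  # prints -1 (no later occurrence)
print index($str, "dog"), "\n";      # prints -1 (not found)
```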
Returns the integer portion of EXPR. If EXPR is omitted, uses $_.
You should not use this function for rounding: one because it truncates towards 0, and two because machine representations of floating-point numbers can sometimes produce counterintuitive results. For example, int(-6.725/0.025) produces -268 rather than the correct -269; that's because it's really more like -268.99999999999994315658 instead. Usually, the sprintf, printf, or the POSIX::floor and POSIX::ceil functions will serve you better than will int().
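For example, contrasting int() with POSIX::floor and sprintf-style rounding:

```perl
use POSIX qw(floor);

print int(-6.725 / 0.025), "\n";    # -268 (truncates toward 0)
print floor(-6.725 / 0.025), "\n";  # -269 (rounds toward -infinity)
printf "%.0f\n", 3.7;               # 4 (rounds instead of truncating)
```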
Implements the ioctl(2) function. You'll probably first have to say
- require "sys/ioctl.ph"; # probably in
- # $Config{archlib}/sys/ioctl.ph
to get the correct function definitions. If sys/ioctl.ph doesn't
exist or doesn't have the correct definitions you'll have to roll your
own, based on your C header files such as <sys/ioctl.h>.
(There is a Perl script called h2ph that comes with the Perl kit that
may help you in this, but it's nontrivial.) SCALAR will be read and/or
written depending on the FUNCTION; a C pointer to the string value of SCALAR
will be passed as the third argument of the actual ioctl call. (If SCALAR
has no string value but does have a numeric value, that value will be
passed rather than a pointer to the string value. To guarantee this to be true, add a 0 to the scalar before using it.) The pack and unpack
functions may be needed to manipulate the values of structures used by
ioctl.
The return value of ioctl (and fcntl) is as follows:
- if OS returns: then Perl returns:
- -1 undefined value
- 0 string "0 but true"
- anything else that number
Thus Perl returns true on success and false on failure, yet you can still easily determine the actual value returned by the operating system:
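The elided example is, in essence, the following ($fh, $function, and $arg stand in for real ioctl arguments):

```perl
# "0 but true" is true, so || -1 only fires on the undefined value.
my $retval = ioctl($fh, $function, $arg) || -1;
printf "ioctl actually returned %d\n", $retval;
```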
The special string "0 but true" is exempt from -w complaints about improper numeric conversions.
Portability issues: ioctl in perlport.
Joins the separate strings of LIST into a single string with fields separated by the value of EXPR, and returns that new string. Example:
- $rec = join(':', $login,$passwd,$uid,$gid,$gcos,$home,$shell);
Beware that unlike split, join doesn't take a pattern as its
first argument. Compare split.
Called in list context, returns a list consisting of all the keys of the named hash, or in Perl 5.12 or later only, the indices of an array. Perl releases prior to 5.12 will produce a syntax error if you try to use an array argument. In scalar context, returns the number of keys or indices.
Hash entries are returned in an apparently random order. The actual random
order is specific to a given hash; the exact same series of operations
on two hashes may result in a different order for each hash. Any insertion
into the hash may change the order, as will any deletion, with the exception
that the most recent key returned by each or keys may be deleted
without changing the order. So long as a given hash is unmodified you may
rely on keys, values and each to repeatedly return the same order
as each other. See Algorithmic Complexity Attacks in perlsec for
details on why hash order is randomized. Aside from the guarantees
provided here the exact details of Perl's hash algorithm and the hash
traversal order are subject to change in any release of Perl.
As a side effect, calling keys() resets the internal iterator of the HASH or ARRAY (see each). In particular, calling keys() in void context resets the iterator with no other overhead.
Here is yet another way to print your environment:
or how about sorted by key:
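A sketch of such a sorted loop:

```perl
foreach my $key (sort keys %ENV) {
    print "$key=$ENV{$key}\n";
}
```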
The returned values are copies of the original keys in the hash, so modifying them will not affect the original hash. Compare values.
To sort a hash by value, you'll need to use a sort function.
Here's a descending numeric sort of a hash by its values:
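One way to write it (the hash and its contents are illustrative):

```perl
my %count = (apple => 3, pear => 7, plum => 5);
foreach my $key (sort { $count{$b} <=> $count{$a} } keys %count) {
    printf "%4d %s\n", $count{$key}, $key;
}
# pear (7) first, then plum (5), then apple (3)
```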
Used as an lvalue, keys allows you to increase the number of hash buckets
allocated for the given hash. This can gain you a measure of efficiency if
you know the hash is going to get big. (This is similar to pre-extending
an array by assigning a larger number to $#array.) If you say
- keys %hash = 200;
then %hash will have at least 200 buckets allocated for it--256 of them, in fact, since it rounds up to the next power of two. These buckets will be retained even if you do %hash = (); use undef %hash if you want to free the storage while %hash is still in scope.
You can't shrink the number of buckets allocated for the hash using keys in this way (but you needn't worry about doing this by accident, as trying has no effect). keys @array in an lvalue context is a syntax error.
Starting with Perl 5.14, keys can take a scalar EXPR, which must contain
a reference to an unblessed hash or array. The argument will be
dereferenced automatically. This aspect of keys is considered highly
experimental. The exact behaviour may change in a future version of Perl.
To avoid confusing would-be users of your code who are running earlier versions of Perl with mysterious syntax errors, put this sort of thing at the top of your file to signal that your code will work only on Perls of a recent vintage:
Sends a signal to a list of processes. Returns the number of processes successfully signaled (which is not necessarily the same as the number actually killed).
SIGNAL may be either a signal name (a string) or a signal number. A signal name may start with a SIG prefix, thus FOO and SIGFOO refer to the same signal. The string form of SIGNAL is recommended for portability because the same signal may have different numbers in different operating systems.
A list of signal names supported by the current platform can be found in $Config{sig_name}, which is provided by the Config module. See Config for more details.
A negative signal name is the same as a negative signal number, killing process groups instead of processes. For example, kill '-KILL', $pgrp and kill -9, $pgrp will send SIGKILL to the entire process group specified. That means you usually want to use positive not negative signals.
If SIGNAL is either the number 0 or the string ZERO (or SIGZERO),
no signal is sent to
no signal is sent to
the process, but kill checks whether it's possible to send a signal to it
(that means, to be brief, that the process is owned by the same user, or we are
the super-user). This is useful to check that a child process is still
alive (even if only as a zombie) and hasn't changed its UID. See
perlport for notes on the portability of this construct.
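A sketch of the signal-0 liveness check, with the error cases the text describes (checking our own PID here so the example is self-contained):

```perl
use Errno qw(EPERM ESRCH);

my $pid = $$;  # illustrative: check our own process
if (kill 0, $pid) {
    print "process $pid is alive and signalable\n";
} elsif ($! == EPERM) {
    print "process $pid exists but belongs to another user\n";
} elsif ($! == ESRCH) {
    print "no such process\n";
}
```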
The behavior of kill when a PROCESS number is zero or negative depends on the operating system. For example, on POSIX-conforming systems, zero will signal the current process group, -1 will signal all processes, and any other negative PROCESS number will act as a negative signal number and kill the entire process group specified.
If both the SIGNAL and the PROCESS are negative, the results are undefined. A warning may be produced in a future version.
See Signals in perlipc for more details.
On some platforms, such as Windows, where the fork() system call is not available, Perl can be built to emulate fork() at the interpreter level. This emulation has limitations related to kill that have to be considered, for code running on Windows and in code intended to be portable.
See perlfork for more details.
If there is no LIST of processes, no signal is sent, and the return value is 0. This form is sometimes used, however, because it causes tainting checks to be run. But see Laundering and Detecting Tainted Data in perlsec.
Portability issues: kill in perlport.
The last command is like the break statement in C (as used in loops); it immediately exits the loop in question. If the LABEL is omitted, the command refers to the innermost enclosing loop. The last EXPR form, available starting in Perl 5.18.0, allows a label name to be computed at run time, and is otherwise identical to last LABEL. The continue block, if any, is not executed:
- LINE: while (<STDIN>) {
- last LINE if /^$/; # exit when done with header
- #...
- }
last cannot be used to exit a block that returns a value such as eval {}, sub {}, or do {}, and should not be used to exit
a grep() or map() operation.
Note that a block by itself is semantically identical to a loop
that executes once. Thus last can be used to effect an early
exit out of such a block.
See also continue for an illustration of how last, next, and
redo work.
Unlike most named operators, this has the same precedence as assignment. It is also exempt from the looks-like-a-function rule, so last ("foo")."bar" will cause "bar" to be part of the argument to last.
Returns a lowercased version of EXPR. This is the internal function implementing the \L escape in double-quoted strings. If EXPR is omitted, uses $_.
What gets returned depends on several factors:
use bytes is in effect:
The results follow ASCII semantics. Only characters A-Z change, to a-z respectively.
use locale (but not use locale ':not_characters') is in effect:
Respects current LC_CTYPE locale for code points < 256; and uses Unicode semantics for the remaining code points (this last can only happen if the UTF8 flag is also set). See perllocale.
A deficiency in this is that case changes that cross the 255/256
boundary are not well-defined. For example, the lower case of LATIN CAPITAL
LETTER SHARP S (U+1E9E) in Unicode semantics is U+00DF (on ASCII
platforms). But under use locale, the lower case of U+1E9E is
itself, because 0xDF may not be LATIN SMALL LETTER SHARP S in the
current locale, and Perl has no way of knowing if that character even
exists in the locale, much less what code point it is. Perl returns
the input character unchanged, for all instances (and there aren't
many) where the 255/256 boundary would otherwise be crossed.
Unicode semantics are used for the case change.
use feature 'unicode_strings' or use locale ':not_characters' is in effect:
Unicode semantics are used for the case change.
ASCII semantics are used for the case change. The lowercase of any character outside the ASCII range is the character itself.
Returns the value of EXPR with the first character lowercased. This is the internal function implementing the \l escape in double-quoted strings. If EXPR is omitted, uses $_.
This function behaves the same way under various pragmata, such as in a locale, as lc does.
Returns the length in characters of the value of EXPR. If EXPR is omitted, returns the length of $_. If EXPR is undefined, returns undef.
This function cannot be used on an entire array or hash to find out how many elements these have. For that, use scalar @array and scalar keys %hash, respectively.
Like all Perl character operations, length() normally deals in logical
characters, not physical bytes. For how many bytes a string encoded as
UTF-8 would take up, use length(Encode::encode_utf8(EXPR)) (you'll have to use Encode first). See Encode and perlunicode.
A special token that compiles to the current line number.
Creates a new filename linked to the old filename. Returns true for success, false otherwise.
Portability issues: link in perlport.
Does the same thing that the listen(2) system call does. Returns true if it succeeded, false otherwise. See the example in Sockets: Client/Server Communication in perlipc.
You really probably want to be using my instead, because local isn't
what most people think of as "local". See
Private Variables via my() in perlsub for details.
A local modifies the listed variables to be local to the enclosing block, file, or eval. If more than one value is listed, the list must be placed in parentheses. See Temporary Values via local() in perlsub for details, including issues with tied arrays and hashes.
The delete local EXPR construct can also be used to localize the deletion of array/hash elements to the current block. See Localized deletion of elements of composite types in perlsub.
Converts a time as returned by the time function to a 9-element list with the time analyzed for the local time zone. Typically used as follows:
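The canonical usage looks like this:

```perl
#  0    1    2     3     4    5     6     7     8
my ($sec, $min, $hour, $mday, $mon, $year, $wday, $yday, $isdst)
    = localtime(time);
```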
All list elements are numeric and come straight out of the C `struct tm'. $sec, $min, and $hour are the seconds, minutes, and hours of the specified time.
$mday is the day of the month and $mon the month in the range 0..11, with 0 indicating January and 11 indicating December.
This makes it easy to get a month name from a list:
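For example (the abbreviation list is illustrative):

```perl
my @abbr = qw(Jan Feb Mar Apr May Jun Jul Aug Sep Oct Nov Dec);
my $mon  = (localtime)[4];
print "The month is $abbr[$mon]\n";
```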
$year contains the number of years since 1900. To get a 4-digit year write:
- $year += 1900;
To get the last two digits of the year (e.g., "01" in 2001) do:
- $year = sprintf("%02d", $year % 100);
$wday is the day of the week, with 0 indicating Sunday and 3 indicating Wednesday. $yday is the day of the year, in the range 0..364 (or 0..365 in leap years.)
$isdst is true if the specified time occurs during Daylight Saving Time, false otherwise.
If EXPR is omitted, localtime() uses the current time (as returned
by time(3)).
In scalar context, localtime() returns the ctime(3) value:
- $now_string = localtime; # e.g., "Thu Oct 13 04:54:34 1994"
The format of this scalar value is not locale-dependent but built into Perl. For GMT instead of local time use the gmtime builtin. See also the Time::Local module (for converting seconds, minutes, hours, and such back to the integer value returned by time()), and the POSIX module's strftime(3) and mktime(3) functions.
To get somewhat similar but locale-dependent date strings, set up your locale environment variables appropriately (please see perllocale) and try for example:
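For example, via the POSIX module:

```perl
use POSIX qw(strftime);
# %a and %b are the locale's abbreviated weekday and month names.
my $now_string = strftime "%a %b %e %H:%M:%S %Y", localtime;
print "$now_string\n";
```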
Note that the %a and %b, the short forms of the day of the week and the month of the year, may not necessarily be three characters wide.
The Time::gmtime and Time::localtime modules provide a convenient, by-name access mechanism to the gmtime() and localtime() functions, respectively.
For a comprehensive date and time representation look at the DateTime module on CPAN.
Portability issues: localtime in perlport.
This function places an advisory lock on a shared variable or referenced object contained in THING until the lock goes out of scope.
The value returned is the scalar itself, if the argument is a scalar, or a reference, if the argument is a hash, array or subroutine.
lock() is a "weak keyword": this means that if you've defined a function by this name (before any calls to it), that function will be called instead. If you are not under use threads::shared this does nothing. See threads::shared.
Returns the natural logarithm (base e) of EXPR. If EXPR is omitted, returns the log of $_. To get the log of another base, use basic algebra:
The base-N log of a number is equal to the natural log of that number
divided by the natural log of N. For example:
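For example, a base-10 logarithm:

```perl
sub log10 {
    my $n = shift;
    return log($n) / log(10);
}
print log10(1000), "\n";  # 3, to within floating-point error
```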
See also exp for the inverse operation.
Does the same thing as the stat function (including setting the special _ filehandle) but stats a symbolic link instead of the file the symbolic link points to. If symbolic links are unimplemented on your system, a normal stat is done. For much more detailed information, please see the documentation for stat.
If EXPR is omitted, stats $_.
Portability issues: lstat in perlport.
The match operator. See Regexp Quote-Like Operators in perlop.
Evaluates the BLOCK or EXPR for each element of LIST (locally setting $_ to each element) and returns the list value composed of the
results of each such evaluation. In scalar context, returns the
total number of elements so generated. Evaluates BLOCK or EXPR in
list context, so each element of LIST may produce zero, one, or
more elements in the returned value.
translates a list of numbers to the corresponding characters.
translates a list of numbers to their squared values.
shows that number of returned elements can differ from the number of input elements. To omit an element, return an empty list (). This could also be achieved by writing
which makes the intention more clear.
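The elided snippets the preceding sentences describe correspond, roughly, to:

```perl
my @numbers = (65 .. 70, 3, 8);

my @chars   = map(chr, @numbers);        # numbers to characters
my @squares = map { $_ * $_ } @numbers;  # numbers to squared values

# The number of returned elements can differ from the input count;
# returning an empty list () omits an element:
my @big_squares  = map { $_ > 5 ? ($_ * $_) : () } @numbers;

# The same result, with the intention made clearer:
my @big_squares2 = map { $_ * $_ } grep { $_ > 5 } @numbers;
```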
Map always returns a list, which can be assigned to a hash such that the elements become key/value pairs. See perldata for more details.
- %hash = map { get_a_key_for($_) => $_ } @array;
is just a funny way to write
- %hash = ();
- foreach (@array) {
- $hash{get_a_key_for($_)} = $_;
- }
Note that $_ is an alias to the list value, so it can be used to
modify the elements of the LIST. While this is useful and supported,
it can cause bizarre results if the elements of LIST are not variables.
Using a regular foreach loop for this purpose would be clearer in
most cases. See also grep for an array composed of those items of
the original list for which the BLOCK or EXPR evaluates to true.
If $_ is lexical in the scope where the map appears (because it has been declared with the deprecated my $_ construct), then, in addition to being locally aliased to the list elements, $_ keeps being lexical inside the block; that is, it can't be seen from the outside, avoiding any potential side-effects.
{ starts both hash references and blocks, so map { ... could be either the start of map BLOCK LIST or map EXPR, LIST. Because Perl doesn't look ahead for the closing } it has to take a guess at which it's dealing with based on what it finds just after the {. Usually it gets it right, but if it doesn't it won't realize something is wrong until it gets to the } and encounters the missing (or unexpected) comma. The syntax error will be reported close to the }, but you'll need to change something near the { such as using a unary + to give Perl some help:
- %hash = map { "\L$_" => 1 } @array # perl guesses EXPR. wrong
- %hash = map { +"\L$_" => 1 } @array # perl guesses BLOCK. right
- %hash = map { ("\L$_" => 1) } @array # this also works
- %hash = map { lc($_) => 1 } @array # as does this.
- %hash = map +( lc($_) => 1 ), @array # this is EXPR and works!
- %hash = map ( lc($_), 1 ), @array # evaluates to (1, @array)
or to force an anon hash constructor use +{:
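That is, something like:

```perl
my @array  = qw(Foo BAR);
my @hashes = map +{ lc($_) => 1 }, @array;  # ({foo => 1}, {bar => 1})
```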
to get a list of anonymous hashes each with only one entry apiece.
Creates the directory specified by FILENAME, with permissions specified by MASK (as modified by umask). If it succeeds it returns true; otherwise it returns false and sets $! (errno). MASK defaults to 0777 if omitted, and FILENAME defaults to $_ if omitted.
In general, it is better to create directories with a permissive MASK
and let the user modify that with their umask than it is to supply
a restrictive MASK and give the user no way to be more permissive.
The exceptions to this rule are when the file or directory should be
kept private (mail files, for instance). The perlfunc(1) entry on
umask discusses the choice of MASK in more detail.
Note that according to POSIX 1003.1-1996 the FILENAME may have any number of trailing slashes. Some operating systems and filesystems do not get this right, so Perl automatically removes all trailing slashes to keep everyone happy.
To recursively create a directory structure, look at the mkpath function of the File::Path module.
Calls the System V IPC function msgctl(2). You'll probably have to say
- use IPC::SysV;
first to get the correct constant definitions. If CMD is IPC_STAT, then ARG must be a variable that will hold the returned msqid_ds structure. Returns like ioctl: the undefined value for error, "0 but true" for zero, or the actual return value otherwise. See also SysV IPC in perlipc and the documentation for IPC::SysV and IPC::Semaphore.
Portability issues: msgctl in perlport.
Calls the System V IPC function msgget(2). Returns the message queue id, or undef on error. See also SysV IPC in perlipc and the documentation for IPC::SysV and IPC::Msg.
Portability issues: msgget in perlport.
Calls the System V IPC function msgrcv to receive a message from message queue ID into variable VAR with a maximum message size of SIZE. Note that when a message is received, the message type as a native long integer will be the first thing in VAR, followed by the actual message. This packing may be opened with unpack("l! a*"). Taints the variable. Returns true if successful, false on error. See also SysV IPC in perlipc and the documentation for IPC::SysV and IPC::SysV::Msg.
Portability issues: msgrcv in perlport.
Calls the System V IPC function msgsnd to send the message MSG to the message queue ID. MSG must begin with the native long integer message type, be followed by the length of the actual message, and then finally the message itself. This kind of packing can be achieved with pack("l! a*", $type, $message). Returns true if successful, false on error. See also the IPC::SysV and IPC::SysV::Msg documentation.
Portability issues: msgsnd in perlport.
A my declares the listed variables to be local (lexically) to the
enclosing block, file, or eval. If more than one value is listed,
the list must be placed in parentheses.
The exact semantics and interface of TYPE and ATTRS are still evolving. TYPE is currently bound to the use of the fields pragma, and attributes are handled using the attributes pragma, or starting from Perl 5.8.0 also via the Attribute::Handlers module. See Private Variables via my() in perlsub for details, and fields, attributes, and Attribute::Handlers.
The next command is like the continue statement in C; it starts
the next iteration of the loop:
- LINE: while (<STDIN>) {
- next LINE if /^#/; # discard comments
- #...
- }
Note that if there were a continue block on the above, it would get executed even on discarded lines. If LABEL is omitted, the command refers to the innermost enclosing loop. The next EXPR form, available as of Perl 5.18.0, allows a label name to be computed at run time, being otherwise identical to next LABEL.
next cannot be used to exit a block which returns a value such as eval {}, sub {}, or do {}, and should not be used to exit a grep() or map() operation.
a grep() or map() operation.
Note that a block by itself is semantically identical to a loop
that executes once. Thus next will exit such a block early.
See also continue for an illustration of how last, next, and
redo work.
Unlike most named operators, this has the same precedence as assignment. It is also exempt from the looks-like-a-function rule, so next ("foo")."bar" will cause "bar" to be part of the argument to next.
Interprets EXPR as an octal string and returns the corresponding value. (If EXPR happens to start off with 0x, interprets it as a hex string. If EXPR starts off with 0b, it is interpreted as a binary string. Leading whitespace is ignored in all three cases.)
The following will handle decimal, binary, octal, and hex in standard
Perl notation:
- $val = oct($val) if $val =~ /^0/;
If EXPR is omitted, uses $_. To go the other way (produce a number in octal), use sprintf() or printf():
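For example:

```perl
my $number = 420;
my $oct_string = sprintf "%o", $number;
print "$oct_string\n";  # prints 644
```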
The oct() function is commonly used when a string such as 644 needs to be converted into a file mode, for example. Although Perl automatically converts strings into numbers as needed, this automatic conversion assumes base 10.
Leading white space is ignored without warning, as too are any trailing
non-digits, such as a decimal point (oct only handles non-negative
integers, not negative integers or floating point).
Opens the file whose filename is given by EXPR, and associates it with FILEHANDLE.
Simple examples to open a file for reading:
and for writing:
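The two elided examples are conventionally written as follows (the filenames are illustrative):

```perl
open(my $in, "<", "input.txt")
    or die "Can't open < input.txt: $!";

open(my $out, ">", "output.txt")
    or die "Can't open > output.txt: $!";
```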
(The following is a comprehensive reference to open(): for a gentler introduction you may consider perlopentut.)
If FILEHANDLE is an undefined scalar variable (or array or hash element), a
new filehandle is autovivified, meaning that the variable is assigned a
reference to a newly allocated anonymous filehandle. Otherwise if
FILEHANDLE is an expression, its value is the real filehandle. (This is
considered a symbolic reference, so use strict "refs" should not be in effect.)
If EXPR is omitted, the global (package) scalar variable of the same
name as the FILEHANDLE contains the filename. (Note that lexical
variables--those declared with my or state--will not work for this
purpose; so if you're using my or state, specify EXPR in your
call to open.)
If three (or more) arguments are specified, the open mode (including optional encoding) in the second argument is distinct from the filename in the third. If MODE is < or nothing, the file is opened for input. If MODE is >, the file is opened for output, with existing files first being truncated ("clobbered") and nonexisting files newly created. If MODE is >>, the file is opened for appending, again being created if necessary.
You can put a + in front of the > or < to indicate that you want both read and write access to the file; thus +< is almost always preferred for read/write updates--the +> mode would clobber the file first. You can't usually use either read-write mode for updating textfiles, since they have variable-length records. See the -i switch in perlrun for a better approach. The file is created with permissions of 0666 modified by the process's umask value.
These various prefixes correspond to the fopen(3) modes of r, r+, w, w+, a, and a+.
In the one- and two-argument forms of the call, the mode and filename should be concatenated (in that order), preferably separated by white space. You can--but shouldn't--omit the mode in these forms when that mode is <. It is always safe to use the two-argument form of open if the filename argument is a known literal.
For three or more arguments, if MODE is |-, the filename is interpreted as a command to which output is to be piped, and if MODE is -|, the filename is interpreted as a command that pipes output to us. In the two-argument (and one-argument) form, one should replace dash (-) with the command.
See Using open() for IPC in perlipc for more examples of this.
(You are not allowed to open to a command that pipes both in and
out, but see IPC::Open2, IPC::Open3, and
Bidirectional Communication with Another Process in perlipc for
alternatives.)
In the form of pipe opens taking three or more arguments, if LIST is specified
(extra arguments after the command name) then LIST becomes arguments
to the command invoked if the platform supports it. The meaning of
open with more than three arguments for non-pipe modes is not yet
defined, but experimental "layers" may give extra LIST arguments
meaning.
In the two-argument (and one-argument) form, opening <- or - opens STDIN and opening >- opens STDOUT.
You may (and usually should) use the three-argument form of open to specify I/O layers (sometimes referred to as "disciplines") to apply to the handle that affect how the input and output are processed (see open and PerlIO for more details). For example:
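A typical layered open (the filename is illustrative):

```perl
open(my $fh, "<:encoding(UTF-8)", "unicode.txt")
    or die "Can't open unicode.txt: $!";
```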
opens the UTF-8-encoded file containing Unicode characters; see perluniintro. Note that if layers are specified in the three-argument form, then default layers stored in ${^OPEN} (see perlvar; usually set by the open pragma or the switch -CioD) are ignored. Those layers will also be ignored if you specify a colon with no name following it. In that case the default layer for the operating system (:raw on Unix, :crlf on Windows) is used.
Open returns nonzero on success, the undefined value otherwise. If
the open involved a pipe, the return value happens to be the pid of
the subprocess.
If you're running Perl on a system that distinguishes between text
files and binary files, then you should check out binmode for tips
for dealing with this. The key distinction between systems that need
binmode and those that don't is their text file formats. Systems
like Unix, Mac OS, and Plan 9, that end lines with a single
character and encode that character in C as "\n", do not
need binmode. The rest need it.
When opening a file, it's seldom a good idea to continue
if the request failed, so open is frequently used with
die. Even if die won't do what you want (say, in a CGI script,
where you want to format a suitable error message, though there are
modules that can help with that problem), always check
the return value from opening a file.
As a special case the three-argument form with a read/write mode and the third
argument being undef:
opens a filehandle to an anonymous temporary file. Also using +< works
for symmetry, but you really should consider writing something to the
temporary file first. You will need to seek() to do the reading.
Perl is built using PerlIO by default; unless you've changed this
(such as by building Perl with Configure -Uuseperlio), you can
open filehandles directly to Perl scalars via:
- open($fh, ">", \$variable) || ..
To (re)open STDOUT or STDERR as an in-memory file, close it first:
General examples:
- $ARTICLE = 100;
- open(ARTICLE) or die "Can't find article $ARTICLE: $!\n";
- while (<ARTICLE>) {...
- open(LOG, ">>/usr/spool/news/twitlog"); # (log is reserved)
- # if the open fails, output is discarded
- open(my $dbase, "+<", "dbase.mine") # open for update
- or die "Can't open 'dbase.mine' for update: $!";
- open(my $dbase, "+<dbase.mine") # ditto
- or die "Can't open 'dbase.mine' for update: $!";
- open(ARTICLE, "-|", "caesar <$article") # decrypt article
- or die "Can't start caesar: $!";
- open(ARTICLE, "caesar <$article |") # ditto
- or die "Can't start caesar: $!";
- open(EXTRACT, "|sort >Tmp$$") # $$ is our process id
- or die "Can't start sort: $!";
- # in-memory files
- open(MEMORY, ">", \$var)
- or die "Can't open memory file: $!";
- print MEMORY "foo!\n"; # output will appear in $var
- # process argument list of files along with any includes
- foreach $file (@ARGV) {
- process($file, "fh00");
- }
- sub process {
- my($filename, $input) = @_;
- $input++; # this is a string increment
- unless (open($input, "<", $filename)) {
- print STDERR "Can't open $filename: $!\n";
- return;
- }
- local $_;
- while (<$input>) { # note use of indirection
- if (/^#include "(.*)"/) {
- process($1, $input);
- next;
- }
- #... # whatever
- }
- }
See perliol for detailed info on PerlIO.
You may also, in the Bourne shell tradition, specify an EXPR beginning
with >&, in which case the rest of the string is interpreted
as the name of a filehandle (or file descriptor, if numeric) to be
duped (as dup(2)) and opened. You may use & after >, >>,
<, +>, +>>, and +<.
The mode you specify should match the mode of the original filehandle.
(Duping a filehandle does not take into account any existing contents
of IO buffers.) If you use the three-argument
form, then you can pass either a
number, the name of a filehandle, or the normal "reference to a glob".
Here is a script that saves, redirects, and restores STDOUT and
STDERR using various methods:
- #!/usr/bin/perl
- open(my $oldout, ">&STDOUT") or die "Can't dup STDOUT: $!";
- open(OLDERR, ">&", \*STDERR) or die "Can't dup STDERR: $!";
- open(STDOUT, '>', "foo.out") or die "Can't redirect STDOUT: $!";
- open(STDERR, ">&STDOUT") or die "Can't dup STDOUT: $!";
- select STDERR; $| = 1; # make unbuffered
- select STDOUT; $| = 1; # make unbuffered
- print STDOUT "stdout 1\n"; # this works for
- print STDERR "stderr 1\n"; # subprocesses too
- open(STDOUT, ">&", $oldout) or die "Can't dup \$oldout: $!";
- open(STDERR, ">&OLDERR") or die "Can't dup OLDERR: $!";
- print STDOUT "stdout 2\n";
- print STDERR "stderr 2\n";
If you specify '<&=X', where X is a file descriptor number or a
filehandle, then Perl will do an equivalent of C's fdopen of that
file descriptor (and not call dup(2)); this is more parsimonious of
file descriptors. For example:
- # open for input, reusing the fileno of $fd
- open(FILEHANDLE, "<&=$fd")
or
- open(FILEHANDLE, "<&=", $fd)
or
- # open for append, using the fileno of OLDFH
- open(FH, ">>&=", OLDFH)
or
- open(FH, ">>&=OLDFH")
Being parsimonious with file descriptors is also useful when
something depends on them, such as locking with flock(). If you do
just open(A, ">>&B"), the filehandle A will not have the same file
descriptor as B, and therefore flock(A) will not flock(B), nor vice
versa. But with open(A, ">>&=B"), the filehandles will share the
same underlying system file descriptor.
Note that under Perls older than 5.8.0, Perl uses the standard C library's
fdopen() to implement the =
functionality. On many Unix systems,
fdopen() fails when file descriptors exceed a certain value, typically 255.
For Perls 5.8.0 and later, PerlIO is (most often) the default.
You can see whether your Perl was built with PerlIO by running perl -V
and looking for the useperlio= line. If useperlio is define, you
have PerlIO; otherwise you don't.
If you open a pipe on the command - (that is, specify either |- or
-| with the one- or two-argument forms of open), an implicit fork is
done, so open returns twice: in the parent process it returns the pid
of the child process, and in the child process it returns (a defined) 0.
Use defined($pid) or // to determine whether the open was successful.
For example, open with |- or -|, then branch on the defined return
value to tell the parent from the child.
The filehandle behaves normally for the parent, but I/O to that filehandle is piped from/to the STDOUT/STDIN of the child process. In the child process, the filehandle isn't opened--I/O happens from/to the new STDOUT/STDIN. Typically this is used like the normal piped open when you want to exercise more control over just how the pipe command gets executed, such as when running setuid and you don't want to have to scan shell commands for metacharacters.
The following blocks are more or less equivalent:
- open(FOO, "|tr '[a-z]' '[A-Z]'");
- open(FOO, "|-", "tr '[a-z]' '[A-Z]'");
- open(FOO, "|-") || exec 'tr', '[a-z]', '[A-Z]';
- open(FOO, "|-", "tr", '[a-z]', '[A-Z]');
- open(FOO, "cat -n '$file'|");
- open(FOO, "-|", "cat -n '$file'");
- open(FOO, "-|") || exec "cat", "-n", $file;
- open(FOO, "-|", "cat", "-n", $file);
The last two examples in each block show the pipe as "list form", which is
not yet supported on all platforms. A good rule of thumb is that if
your platform has a real fork() (in other words, if your platform is
Unix, including Linux and Mac OS X), you can use the list form. You would
want to use the list form of the pipe so you can pass literal arguments
to the command without risk of the shell interpreting any shell metacharacters
in them. However, this also bars you from opening pipes to commands
that intentionally contain shell metacharacters, such as:
See Safe Pipe Opens in perlipc for more examples of this.
Perl will attempt to flush all files opened for
output before any operation that may do a fork, but this may not be
supported on some platforms (see perlport). To be safe, you may need
to set $| ($AUTOFLUSH in English) or call the autoflush() method of
IO::Handle on any open handles.
On systems that support a close-on-exec flag on files, the flag will
be set for the newly opened file descriptor as determined by the value
of $^F. See $^F in perlvar.
Closing any piped filehandle causes the parent process to wait for the
child to finish, then returns the status value in $? and
${^CHILD_ERROR_NATIVE}.
The filename passed to the one- and two-argument forms of open() will have leading and trailing whitespace deleted and normal redirection characters honored. This property, known as "magic open", can often be used to good effect. A user could specify a filename of "rsh cat file |", or you could change certain filenames as needed:
Use the three-argument form to open a file with arbitrary weird characters in it,
otherwise it's necessary to protect any leading and trailing whitespace:
(this may not work on some bizarre filesystems). One should conscientiously choose between the magic and three-argument form of open():
will allow the user to specify an argument of the form "rsh cat file |",
but will not work on a filename that happens to have a trailing space, while
will have exactly the opposite restrictions.
If you want a "real" C open (see open(2) on your system), then you
should use the sysopen function, which involves no such magic (but may
use subtly different filemodes than Perl open(), which is mapped to C
fopen()). This is another way to protect your filenames from
interpretation. For example:
Using the constructor from the IO::Handle package (or one of its
subclasses, such as IO::File or IO::Socket), you can generate anonymous
filehandles that have the scope of the variables used to hold them, then
automatically (but silently) close once their reference counts become
zero, typically at scope exit:
- use IO::File;
- #...
- sub read_myfile_munged {
- my $ALL = shift;
- # or just leave it undef to autoviv
- my $handle = IO::File->new;
- open($handle, "<", "myfile") or die "myfile: $!";
- $first = <$handle>
- or return (); # Automatically closed here.
- mung($first) or die "mung failed"; # Or here.
- return (first, <$handle>) if $ALL; # Or here.
- return $first; # Or here.
- }
WARNING: The previous example has a bug because the automatic
close that happens when the refcount on the handle reaches zero does
not properly detect and report failures. Always close the handle
yourself and inspect the return value.
See seek for some details about mixing reading and writing.
Portability issues: open in perlport.
Opens a directory named EXPR for processing by readdir, telldir,
seekdir, rewinddir, and closedir. Returns true if successful.
DIRHANDLE may be an expression whose value can be used as an indirect
dirhandle, usually the real dirhandle name. If DIRHANDLE is an undefined
scalar variable (or array or hash element), the variable is assigned a
reference to a new anonymous dirhandle; that is, it's autovivified.
DIRHANDLEs have their own namespace separate from FILEHANDLEs.
See the example at readdir.
Returns the numeric value of the first character of EXPR.
If EXPR is an empty string, returns 0. If EXPR is omitted, uses $_.
(Note character, not byte.)
For the reverse, see chr. See perlunicode for more about Unicode.
our makes a lexical alias to a package variable of the same name in the current
package for use within the current lexical scope.
our has the same scoping rules as my or state, but our only
declares an alias, whereas my or state both declare a variable name and
allocate storage for that name within the current scope.
This means that when use strict 'vars' is in effect, our lets you use
a package variable without qualifying it with the package name, but only
within the lexical scope of the our declaration. In this way, our
differs from use vars, which allows use of an unqualified name only
within the affected package, but across scopes.
If more than one value is listed, the list must be placed in parentheses.
An our declaration declares an alias for a package variable that will be visible
across its entire lexical scope, even across package boundaries. The
package in which the variable is entered is determined at the point
of the declaration, not at the point of use. This means the following
behavior holds:
Multiple our declarations with the same name in the same lexical
scope are allowed if they are in different packages. If they happen
to be in the same package, Perl will emit warnings if you have asked
for them, just like multiple my declarations. Unlike a second
my declaration, which will bind the name to a fresh variable, a
second our declaration in the same package, in the same scope, is
merely redundant.
An our declaration may also have a list of attributes associated
with it.
The exact semantics and interface of TYPE and ATTRS are still
evolving. TYPE is currently bound to the use of the fields pragma,
and attributes are handled using the attributes pragma or, starting
from Perl 5.8.0, also via the Attribute::Handlers module. See
Private Variables via my() in perlsub for details, and fields,
attributes, and Attribute::Handlers.
Takes a LIST of values and converts it into a string using the rules given by the TEMPLATE. The resulting string is the concatenation of the converted values. Typically, each converted value looks like its machine-level representation. For example, on 32-bit machines an integer may be represented by a sequence of 4 bytes, which will in Perl be presented as a string that's 4 characters long.
See perlpacktut for an introduction to this function.
The TEMPLATE is a sequence of characters that give the order and type of values, as follows:
- a A string with arbitrary binary data, will be null padded.
- A A text (ASCII) string, will be space padded.
- Z A null-terminated (ASCIZ) string, will be null padded.
- b A bit string (ascending bit order inside each byte,
- like vec()).
- B A bit string (descending bit order inside each byte).
- h A hex string (low nybble first).
- H A hex string (high nybble first).
- c A signed char (8-bit) value.
- C An unsigned char (octet) value.
- W An unsigned char value (can be greater than 255).
- s A signed short (16-bit) value.
- S An unsigned short value.
- l A signed long (32-bit) value.
- L An unsigned long value.
- q A signed quad (64-bit) value.
- Q An unsigned quad value.
- (Quads are available only if your system supports 64-bit
- integer values _and_ if Perl has been compiled to support
- those. Raises an exception otherwise.)
- i A signed integer value.
- I An unsigned integer value.
- (This 'integer' is _at_least_ 32 bits wide. Its exact
- size depends on what a local C compiler calls 'int'.)
- n An unsigned short (16-bit) in "network" (big-endian) order.
- N An unsigned long (32-bit) in "network" (big-endian) order.
- v An unsigned short (16-bit) in "VAX" (little-endian) order.
- V An unsigned long (32-bit) in "VAX" (little-endian) order.
- j A Perl internal signed integer value (IV).
- J A Perl internal unsigned integer value (UV).
- f A single-precision float in native format.
- d A double-precision float in native format.
- F A Perl internal floating-point value (NV) in native format.
- D A float of long-double precision in native format.
- (Long doubles are available only if your system supports
- long double values _and_ if Perl has been compiled to
- support those. Raises an exception otherwise.)
- p A pointer to a null-terminated string.
- P A pointer to a structure (fixed-length string).
- u A uuencoded string.
- U A Unicode character number. Encodes to a character in char-
- acter mode and UTF-8 (or UTF-EBCDIC in EBCDIC platforms) in
- byte mode.
- w A BER compressed integer (not an ASN.1 BER, see perlpacktut
- for details). Its bytes represent an unsigned integer in
- base 128, most significant digit first, with as few digits
- as possible. Bit eight (the high bit) is set on each byte
- except the last.
- x A null byte (a.k.a ASCII NUL, "\000", chr(0))
- X Back up a byte.
- @ Null-fill or truncate to absolute position, counted from the
- start of the innermost ()-group.
- . Null-fill or truncate to absolute position specified by
- the value.
- ( Start of a ()-group.
One or more modifiers below may optionally follow certain letters in the TEMPLATE (the second column lists letters for which the modifier is valid):
- ! sSlLiI Forces native (short, long, int) sizes instead
- of fixed (16-/32-bit) sizes.
- xX Make x and X act as alignment commands.
- nNvV Treat integers as signed instead of unsigned.
- @. Specify position as byte offset in the internal
- representation of the packed string. Efficient
- but dangerous.
- > sSiIlLqQ Force big-endian byte-order on the type.
- jJfFdDpP (The "big end" touches the construct.)
- < sSiIlLqQ Force little-endian byte-order on the type.
- jJfFdDpP (The "little end" touches the construct.)
The > and < modifiers can also be used on () groups
to force a particular byte-order on all components in that group,
including all its subgroups.
The following rules apply:
Each letter may optionally be followed by a number indicating the repeat
count. A numeric repeat count may optionally be enclosed in brackets, as
in pack("C[80]", @arr). The repeat count gobbles that many values from
the LIST when used with all format types other than a, A, Z, b, B, h,
H, @, ., x, X, and P, where it means something else, described below.
Supplying a * for the repeat count instead of a number means to use
however many items are left, except for:
@, x, and X, where it is equivalent to 0.
., where it means relative to the start of the string.
u, where it is equivalent to 1 (or 45, which here is equivalent).
One can replace a numeric repeat count with a template letter enclosed in brackets to use the packed byte length of the bracketed template for the repeat count.
For example, the template x[L] skips as many bytes as in a packed long,
and the template "$t X[$t] $t" unpacks twice whatever $t (when
variable-expanded) unpacks. If the template in brackets contains alignment
commands (such as x![d]), its packed length is calculated as if the
start of the template had the maximal possible alignment.
When used with Z, a * as the repeat count is guaranteed to add a
trailing null byte, so the resulting string is always one byte longer
than the byte length of the item itself.
When used with @, the repeat count represents an offset from the start
of the innermost () group.
When used with ., the repeat count determines the starting position to
calculate the value offset as follows:
If the repeat count is 0, it's relative to the current position.
If the repeat count is *, the offset is relative to the start of the
packed string.
And if it's an integer n, the offset is relative to the start of the
nth innermost () group, or to the start of the string if n is
bigger than the group level.
The repeat count for u is interpreted as the maximal number of bytes
to encode per line of output, with 0, 1, and 2 replaced by 45. The
repeat count should not be more than 65.
The a, A, and Z types gobble just one value, but pack it as a
string of length count, padding with nulls or spaces as needed. When
unpacking, A strips trailing whitespace and nulls, Z strips everything
after the first null, and a returns data with no stripping at all.
If the value to pack is too long, the result is truncated. If it's too
long and an explicit count is provided, Z packs only $count-1 bytes,
followed by a null byte. Thus Z always packs a trailing null, except
when the count is 0.
Likewise, the b and B formats pack a string that's that many bits long.
Each such format generates 1 bit of the result. These are typically
followed by a repeat count like B8 or B64.
Each result bit is based on the least-significant bit of the
corresponding input character, i.e., on ord($char)%2. In particular,
characters "0" and "1" generate bits 0 and 1, as do characters "\000"
and "\001".
Starting from the beginning of the input string, each 8-tuple of
characters is converted to 1 character of output. With format b, the
first character of the 8-tuple determines the least-significant bit of
a character; with format B, it determines the most-significant bit of
a character.
If the length of the input string is not evenly divisible by 8, the remainder is packed as if the input string were padded by null characters at the end. Similarly during unpacking, "extra" bits are ignored.
If the input string is longer than needed, remaining characters are ignored.
A * for the repeat count uses all characters of the input field.
On unpacking, bits are converted to a string of 0s and 1s.
The h and H formats pack a string that many nybbles (4-bit groups,
representable as hexadecimal digits, "0".."9" "a".."f") long.
For each such format, pack() generates 4 bits of result.
With non-alphabetical characters, the result is based on the 4
least-significant bits of the input character, i.e., on ord($char)%16.
In particular, characters "0" and "1" generate nybbles 0 and 1, as do
bytes "\000" and "\001". For characters "a".."f" and "A".."F", the
result is compatible with the usual hexadecimal digits, so that "a"
and "A" both generate the nybble 0xA==10. Use only these specific hex
characters with this format.
Starting from the beginning of the template to pack(), each pair of
characters is converted to 1 character of output. With format h, the
first character of the pair determines the least-significant nybble of
the output character; with format H, it determines the most-significant
nybble.
If the length of the input string is not even, it behaves as if padded by a null character at the end. Similarly, "extra" nybbles are ignored during unpacking.
If the input string is longer than needed, extra characters are ignored.
A * for the repeat count uses all characters of the input field. For
unpack(), nybbles are converted to a string of hexadecimal digits.
The p format packs a pointer to a null-terminated string. You are
responsible for ensuring that the string is not a temporary value, as
that could potentially get deallocated before you got around to using
the packed result. The P format packs a pointer to a structure of the
size indicated by the length. A null pointer is created if the
corresponding value for p or P is undef; similarly with unpack(),
where a null pointer unpacks into undef.
If your system has a strange pointer size--meaning a pointer is neither as big as an int nor as big as a long--it may not be possible to pack or unpack pointers in big- or little-endian byte order. Attempting to do so raises an exception.
The / template character allows packing and unpacking of a sequence of
items where the packed structure contains a packed item count followed by
the packed items themselves. This is useful when the structure you're
unpacking has encoded the sizes or repeat counts for some of its fields
within the structure itself as separate fields.
For pack, you write length-item/sequence-item, and the
length-item describes how the length value is packed. Formats likely
to be of most use are integer-packing ones like n for Java strings,
w for ASN.1 or SNMP, and N for Sun XDR.
For pack, sequence-item may have a repeat count, in which case
the minimum of that and the number of available items is used as the argument
for length-item. If it has no repeat count or uses a '*', the number
of available items is used.
For unpack, an internal stack of integer arguments unpacked so far is
used. You write /sequence-item and the repeat count is obtained by
popping off the last element from the stack. The sequence-item must not
have a repeat count.
If sequence-item refers to a string type ("A", "a", or "Z"),
the length-item is the string length, not the number of strings. With
an explicit repeat count for pack, the packed string is adjusted to
that length. For example:
The length-item is not returned explicitly from unpack.
Supplying a count to the length-item format letter is only useful with
A, a, or Z. Packing with a length-item of a or Z may introduce
"\000" characters, which Perl does not regard as legal in numeric
strings.
The integer types s, S, l, and L may be followed by a !
modifier to specify native shorts or longs. A bare l means exactly
32 bits, although the native long as seen by the local C compiler may
be larger. This is mainly an issue on 64-bit platforms. You can see
whether using ! makes any difference this way:
i! and I! are also allowed, but only for completeness' sake:
they are identical to i and I.
The actual sizes (in bytes) of native shorts, ints, longs, and long longs on the platform where Perl was built are also available from the command line:
- $ perl -V:{short,int,long{,long}}size
- shortsize='2';
- intsize='4';
- longsize='4';
- longlongsize='8';
or programmatically via the Config module:
$Config{longlongsize} is undefined on systems without long long support.
The integer formats s, S, i, I, l, L, j, and J are
inherently non-portable between processors and operating systems because
they obey native byteorder and endianness. For example, a 4-byte integer
0x12345678 (305419896 decimal) would be ordered natively (arranged in
and handled by the CPU registers) into bytes as
- 0x12 0x34 0x56 0x78 # big-endian
- 0x78 0x56 0x34 0x12 # little-endian
Basically, Intel and VAX CPUs are little-endian, while everybody else, including Motorola m68k/88k, PPC, Sparc, HP PA, Power, and Cray, are big-endian. Alpha and MIPS can be either: Digital/Compaq uses (well, used) them in little-endian mode, but SGI/Cray uses them in big-endian mode.
The names big-endian and little-endian are comic references to the egg-eating habits of the little-endian Lilliputians and the big-endian Blefuscudians from the classic Jonathan Swift satire, Gulliver's Travels. This entered computer lingo via the paper "On Holy Wars and a Plea for Peace" by Danny Cohen, USC/ISI IEN 137, April 1, 1980.
Some systems may have even weirder byte orders such as
- 0x56 0x78 0x12 0x34
- 0x34 0x12 0x78 0x56
You can determine your system endianness with this incantation:
The byteorder on the platform where Perl was built is also available via Config:
or from the command line:
- $ perl -V:byteorder
Byteorders "1234" and "12345678" are little-endian; "4321" and
"87654321" are big-endian.
For portably packed integers, either use the formats n, N, v, and
V, or else use the > and < modifiers described immediately below.
See also perlport.
Starting with Perl 5.10.0, integer and floating-point formats, along
with the p and P formats and () groups, may all be followed by the
> or < endianness modifiers to respectively enforce big- or
little-endian byte-order. These modifiers are especially useful given
how n, N, v, and V don't cover signed integers, 64-bit integers,
or floating-point values.
Here are some concerns to keep in mind when using an endianness modifier:
Exchanging signed integers between different platforms works only when all platforms store them in the same format. Most platforms store signed integers in two's-complement notation, so usually this is not an issue.
The > or < modifiers can only be used on floating-point formats on
big- or little-endian machines. Otherwise, attempting to use them
raises an exception.
Forcing big- or little-endian byte-order on floating-point values for
data exchange can work only if all platforms use the same binary
representation, such as IEEE floating-point. Even if all platforms are
using IEEE, there may still be subtle differences. Being able to use
> or < on floating-point values can be useful, but also dangerous
if you don't know exactly what you're doing. It is not a general way
to portably store floating-point values.
When using > or < on a () group, this affects all types inside
the group that accept byte-order modifiers, including all subgroups.
It is silently ignored for all other types. You are not allowed to
override the byte-order within a group that already has a byte-order
modifier suffix.
Real numbers (floats and doubles) are in native machine format only. Due to the multiplicity of floating-point formats and the lack of a standard "network" representation for them, no facility for interchange has been made. This means that packed floating-point data written on one machine may not be readable on another, even if both use IEEE floating-point arithmetic (because the endianness of the memory representation is not part of the IEEE spec). See also perlport.
If you know exactly what you're doing, you can use the > or <
modifiers to force big- or little-endian byte-order on floating-point
values. Because Perl uses doubles (or long doubles, if configured)
internally for all numeric calculation, converting from double into
float and thence to double again loses precision, so
unpack("f", pack("f", $foo)) will not in general equal $foo.
Pack and unpack can operate in two modes: character mode (C0 mode),
where the packed string is processed per character, and UTF-8 mode
(U0 mode), where the packed string is processed in its UTF-8-encoded
Unicode form on a byte-by-byte basis. Character mode is the default
unless the format string starts with U. You can always switch mode
mid-format with an explicit C0 or U0 in the format. This mode
remains in effect until the next mode change, or until the end of the
() group it (directly) applies to.
Using C0 to get Unicode characters while using U0 to get
non-Unicode bytes is not necessarily obvious. Probably only the first
of these is what you want:
- $ perl -CS -E 'say "\x{3B1}\x{3C9}"' |
- perl -CS -ne 'printf "%v04X\n", $_ for unpack("C0A*", $_)'
- 03B1.03C9
- $ perl -CS -E 'say "\x{3B1}\x{3C9}"' |
- perl -CS -ne 'printf "%v02X\n", $_ for unpack("U0A*", $_)'
- CE.B1.CF.89
- $ perl -CS -E 'say "\x{3B1}\x{3C9}"' |
- perl -C0 -ne 'printf "%v02X\n", $_ for unpack("C0A*", $_)'
- CE.B1.CF.89
- $ perl -CS -E 'say "\x{3B1}\x{3C9}"' |
- perl -C0 -ne 'printf "%v02X\n", $_ for unpack("U0A*", $_)'
- C3.8E.C2.B1.C3.8F.C2.89
Those examples also illustrate that you should not try to use
pack/unpack as a substitute for the Encode module.
You must yourself do any alignment or padding by inserting, for
example, enough "x"es while packing. There is no way for pack() and
unpack() to know where characters are going to or coming from, so they
handle their output and input as flat sequences of characters.
A () group is a sub-TEMPLATE enclosed in parentheses. A group may
take a repeat count either as postfix or, for unpack(), also via the
/ template character. Within each repetition of a group, positioning
with @ starts over at 0. Therefore, the result of
- pack("@1A((@2A)@3A)", qw[X Y Z])
is the string "\0X\0\0YZ".
x and X accept the ! modifier to act as alignment commands: they
jump forward or back to the closest position aligned at a multiple of
count characters. For example, to pack() or unpack() a C structure
like
- struct {
- char c; /* one signed, 8-bit character */
- double d;
- char cc[2];
- }
one may need to use the template c x![d] d c[2]. This assumes that
doubles must be aligned to the size of a double.
For alignment commands, a count of 0 is equivalent to a count of 1;
both are no-ops.
n, N, v, and V accept the ! modifier to represent signed
16-/32-bit integers in big-/little-endian order. This is portable only
when all platforms sharing packed data use the same binary
representation for signed integers; for example, when all platforms use
two's-complement representation.
Comments can be embedded in a TEMPLATE using # through the end of line.
White space can separate pack codes from each other, but modifiers and
repeat counts must follow immediately. Breaking complex templates into
individual line-by-line components, suitably annotated, can do as much to
improve legibility and maintainability of pack/unpack formats as /x can
for complicated pattern matches.
If TEMPLATE requires more arguments than pack() is given, pack()
assumes additional "" arguments. If TEMPLATE requires fewer arguments
than given, extra arguments are ignored.
Examples:
- $foo = pack("WWWW",65,66,67,68);
- # foo eq "ABCD"
- $foo = pack("W4",65,66,67,68);
- # same thing
- $foo = pack("W4",0x24b6,0x24b7,0x24b8,0x24b9);
- # same thing with Unicode circled letters.
- $foo = pack("U4",0x24b6,0x24b7,0x24b8,0x24b9);
- # same thing with Unicode circled letters. You don't get the
- # UTF-8 bytes because the U at the start of the format caused
- # a switch to U0-mode, so the UTF-8 bytes get joined into
- # characters
- $foo = pack("C0U4",0x24b6,0x24b7,0x24b8,0x24b9);
- # foo eq "\xe2\x92\xb6\xe2\x92\xb7\xe2\x92\xb8\xe2\x92\xb9"
- # This is the UTF-8 encoding of the string in the
- # previous example
- $foo = pack("ccxxcc",65,66,67,68);
- # foo eq "AB\0\0CD"
- # NOTE: The examples above featuring "W" and "c" are true
- # only on ASCII and ASCII-derived systems such as ISO Latin 1
- # and UTF-8. On EBCDIC systems, the first example would be
- # $foo = pack("WWWW",193,194,195,196);
- $foo = pack("s2",1,2);
- # "\001\000\002\000" on little-endian
- # "\000\001\000\002" on big-endian
- $foo = pack("a4","abcd","x","y","z");
- # "abcd"
- $foo = pack("aaaa","abcd","x","y","z");
- # "axyz"
- $foo = pack("a14","abcdefg");
- # "abcdefg\0\0\0\0\0\0\0"
- $foo = pack("i9pl", gmtime);
- # a real struct tm (on my system anyway)
- $utmp_template = "Z8 Z8 Z16 L";
- $utmp = pack($utmp_template, @utmp1);
- # a struct utmp (BSDish)
- @utmp2 = unpack($utmp_template, $utmp);
- # "@utmp1" eq "@utmp2"
- sub bintodec {
- unpack("N", pack("B32", substr("0" x 32 . shift, -32)));
- }
- $foo = pack('sx2l', 12, 34);
- # short 12, two zero bytes padding, long 34
- $bar = pack('s@4l', 12, 34);
- # short 12, zero fill to position 4, long 34
- # $foo eq $bar
- $baz = pack('s.l', 12, 4, 34);
- # short 12, zero fill to position 4, long 34
- $foo = pack('nN', 42, 4711);
- # pack big-endian 16- and 32-bit unsigned integers
- $foo = pack('S>L>', 42, 4711);
- # exactly the same
- $foo = pack('s<l<', -42, 4711);
- # pack little-endian 16- and 32-bit signed integers
- $foo = pack('(sl)<', -42, 4711);
- # exactly the same
The same template may generally also be used in unpack().
Declares the BLOCK or the rest of the compilation unit as being in the
given namespace. The scope of the package declaration is either the
supplied code BLOCK or, in the absence of a BLOCK, from the declaration
itself through the end of current scope (the enclosing block, file, or
eval). That is, the forms without a BLOCK are operative through the end
of the current scope, just like the my, state, and our operators.
All unqualified dynamic identifiers in this scope will be in the given
namespace, except where overridden by another package declaration or
when they're one of the special identifiers that qualify into main::
,
like STDOUT
, ARGV
, ENV
, and the punctuation variables.
A package statement affects dynamic variables only, including those
you've used local on, but not lexically-scoped variables, which are created
with my, state, or our. Typically it would be the first
declaration in a file included by require or use. You can switch into a
package in more than one place, since this only determines which default
symbol table the compiler uses for the rest of that block. You can refer to
identifiers in other packages than the current one by prefixing the identifier
with the package name and a double colon, as in $SomePack::var
or ThatPack::INPUT_HANDLE
. If the package name is omitted, the main
package is assumed. That is, $::sail
is equivalent to
$main::sail
(as well as to $main'sail
, still seen in ancient
code, mostly from Perl 4).
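For instance, the BLOCK form confines the declaration to the braces (Counter is a hypothetical package name; the BLOCK form requires Perl 5.14 or later):

```perl
package Counter {                 # BLOCK form: scope ends at the closing brace
    our $count = 0;               # a package (dynamic) variable
    sub inc { return ++$count }
}
# back in the surrounding package here
Counter::inc();
Counter::inc();
print "$Counter::count\n";        # access from outside via a fully qualified name
```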
If VERSION is provided, package sets the $VERSION
variable in the given
namespace to a version object with the VERSION provided. VERSION must be a
"strict" style version number as defined by the version module: a positive
decimal number (integer or decimal-fraction) without exponentiation or else a
dotted-decimal v-string with a leading 'v' character and at least three
components. You should set $VERSION
only once per package.
See Packages in perlmod for more information about packages, modules, and classes. See perlsub for other scoping issues.
A special token that returns the name of the package in which it occurs.
Opens a pair of connected pipes like the corresponding system call.
Note that if you set up a loop of piped processes, deadlock can occur
unless you are very careful. In addition, note that Perl's pipes use
IO buffering, so you may need to set $|
to flush your WRITEHANDLE
after each command, depending on the application.
Returns true on success.
See IPC::Open2, IPC::Open3, and Bidirectional Communication with Another Process in perlipc for examples of such things.
On systems that support a close-on-exec flag on files, that flag is set
on all newly opened file descriptors whose filenos are higher than
the current value of $^F (by default 2 for STDERR
). See $^F in perlvar.
Pops and returns the last value of the array, shortening the array by one element.
Returns the undefined value if the array is empty, although this may also
happen at other times. If ARRAY is omitted, pops the @ARGV
array in the
main program, but the @_
array in subroutines, just like shift.
Starting with Perl 5.14, pop can take a scalar EXPR, which must hold a
reference to an unblessed array. The argument will be dereferenced
automatically. This aspect of pop is considered highly experimental.
The exact behaviour may change in a future version of Perl.
To avoid confusing would-be users of your code who are running earlier versions of Perl with mysterious syntax errors, put this sort of thing at the top of your file to signal that your code will work only on Perls of a recent vintage:
- use 5.014; # so push/pop/etc work on scalars (experimental)
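A minimal sketch of the ordinary array form:

```perl
my @stack = (1, 2, 3);
my $top = pop @stack;     # removes and returns the last element
# $top is 3; @stack is now (1, 2)
```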
Returns the offset of where the last m//g search left off for the
variable in question ($_
is used when the variable is not
specified). Note that 0 is a valid match offset. undef indicates
that the search position is reset (usually due to match failure, but
can also be because no match has yet been run on the scalar).
pos directly accesses the location used by the regexp engine to
store the offset, so assigning to pos will change that offset, and
so will also influence the \G
zero-width assertion in regular
expressions. Both of these effects take place for the next match, so
you can't affect the position with pos during the current match,
such as in (?{pos() = 5})
or s//pos() = 5/e
.
Setting pos also resets the "matched with zero-length" flag, described
under Repeated Patterns Matching a Zero-length Substring in perlre.
Because a failed m//gc match doesn't reset the offset, the return
from pos won't change either in this case. See perlre and
perlop.
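A small sketch of reading pos after successive m//g matches (the string is arbitrary):

```perl
my $str = "cat bat hat";
my @offsets;
while ($str =~ /at/g) {
    push @offsets, pos($str);   # offset just past each match
}
# @offsets is (3, 7, 11)
```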
Prints a string or a list of strings. Returns true if successful.
FILEHANDLE may be a scalar variable containing the name of or a reference
to the filehandle, thus introducing one level of indirection. (NOTE: If
FILEHANDLE is a variable and the next token is a term, it may be
misinterpreted as an operator unless you interpose a +
or put
parentheses around the arguments.) If FILEHANDLE is omitted, prints to the
last selected (see select) output handle. If LIST is omitted, prints
$_
to the currently selected output handle. To use FILEHANDLE alone to
print the content of $_
to it, you must use a real filehandle like
FH
, not an indirect one like $fh
. To set the default output handle
to something other than STDOUT, use the select operation.
The current value of $,
(if any) is printed between each LIST item. The
current value of $\
(if any) is printed after the entire LIST has been
printed. Because print takes a LIST, anything in the LIST is evaluated in
list context, including any subroutines whose return lists you pass to
print. Be careful not to follow the print keyword with a left
parenthesis unless you want the corresponding right parenthesis to
terminate the arguments to the print; put parentheses around all arguments
(or interpose a +
, but that doesn't look as good).
If you're storing handles in an array or hash, or in general whenever you're using any expression more complex than a bareword handle or a plain, unsubscripted scalar variable to retrieve it, you will have to use a block returning the filehandle value instead, in which case the LIST may not be omitted:
- print { $files[$i] } "stuff to print\n";
Printing to a closed pipe or socket will generate a SIGPIPE signal. See perlipc for more on signal handling.
Equivalent to print FILEHANDLE sprintf(FORMAT, LIST)
, except that $\
(the output record separator) is not appended. The FORMAT and the
LIST are actually parsed as a single list. The first argument
of the list will be interpreted as the printf format. This
means that printf(@_) will use $_[0]
as the format. See
sprintf for an
explanation of the format argument. If use locale
(including
use locale ':not_characters'
) is in effect and
POSIX::setlocale() has been called, the character used for the decimal
separator in formatted floating-point numbers is affected by the LC_NUMERIC
locale setting. See perllocale and POSIX.
For historical reasons, if you omit the list, $_
is used as the format;
to use FILEHANDLE without a list, you must use a real filehandle like
FH
, not an indirect one like $fh
. However, this will rarely do what
you want; if $_ contains formatting codes, they will be replaced with the
empty string and a warning will be emitted if warnings are enabled. Just
use print if you want to print the contents of $_.
Don't fall into the trap of using a printf when a simple
print would do. The print is more efficient and less
error prone.
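For instance (the format and values here are arbitrary):

```perl
# sprintf builds the same string that printf would print directly
my $formatted = sprintf("%-5s %05.2f %d\n", "pi:", 3.14159, 42);
print $formatted;                                  # pi:   03.14 42
printf("%-5s %05.2f %d\n", "pi:", 3.14159, 42);    # same output
```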
Returns the prototype of a function as a string (or undef if the
function has no prototype). FUNCTION is a reference to, or the name of,
the function whose prototype you want to retrieve.
If FUNCTION is a string starting with CORE::
, the rest is taken as a
name for a Perl builtin. If the builtin's arguments
cannot be adequately expressed by a prototype
(such as system), prototype() returns undef, because the builtin
does not really behave like a Perl function. Otherwise, the string
describing the equivalent prototype is returned.
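A brief sketch (mylen is a hypothetical subroutine):

```perl
sub mylen ($) { return length $_[0] }    # declared with a prototype
my $proto = prototype(\&mylen);          # "$"
my $sys   = prototype("CORE::system");   # undef: system has no adequate prototype
```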
Treats ARRAY as a stack by appending the values of LIST to the end of ARRAY. The length of ARRAY increases by the length of LIST. Has the same effect as
- for $value (LIST) {
- $ARRAY[++$#ARRAY] = $value;
- }
but is more efficient. Returns the number of elements in the array following
the completed push.
Starting with Perl 5.14, push can take a scalar EXPR, which must hold a
reference to an unblessed array. The argument will be dereferenced
automatically. This aspect of push is considered highly experimental.
The exact behaviour may change in a future version of Perl.
To avoid confusing would-be users of your code who are running earlier versions of Perl with mysterious syntax errors, put this sort of thing at the top of your file to signal that your code will work only on Perls of a recent vintage:
- use 5.014; # so push/pop/etc work on scalars (experimental)
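A minimal sketch of the ordinary array form:

```perl
my @queue = (1, 2);
my $len = push @queue, 3, 4;   # appends and returns the new length
# $len is 4; @queue is (1, 2, 3, 4)
```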
Generalized quotes. See Quote-Like Operators in perlop.
Regexp-like quote. See Regexp Quote-Like Operators in perlop.
Returns the value of EXPR with all the ASCII non-"word"
characters backslashed. (That is, all ASCII characters not matching
/[A-Za-z_0-9]/
will be preceded by a backslash in the
returned string, regardless of any locale settings.)
This is the internal function implementing
the \Q
escape in double-quoted strings.
(See below for the behavior on non-ASCII code points.)
If EXPR is omitted, uses $_
.
quotemeta (and \Q
... \E
) are useful when interpolating strings into
regular expressions, because by default an interpolated variable will be
considered a mini-regular expression. For example:
- my $sentence = 'The quick brown fox jumped over the lazy dog';
- my $substring = 'quick.*?fox';
- $sentence =~ s{$substring}{big bad wolf};
Will cause $sentence
to become 'The big bad wolf jumped over...'
.
On the other hand:
- my $sentence = 'The quick brown fox jumped over the lazy dog';
- my $substring = 'quick.*?fox';
- $sentence =~ s{\Q$substring\E}{big bad wolf};
Or:
- my $quoted_substring = quotemeta($substring);
- $sentence =~ s{$quoted_substring}{big bad wolf};
Will both leave the sentence as is.
Normally, when accepting literal string
input from the user, quotemeta() or \Q
must be used.
In Perl v5.14, all non-ASCII characters are quoted in non-UTF-8-encoded strings, but not quoted in UTF-8 strings.
Starting in Perl v5.16, Perl adopted a Unicode-defined strategy for quoting non-ASCII characters; the quoting of ASCII characters is unchanged.
Also unchanged is the quoting of non-UTF-8 strings when outside the
scope of a use feature 'unicode_strings'
, which is to quote all
characters in the upper Latin1 range. This provides complete backwards
compatibility for old programs which do not use Unicode. (Note that
unicode_strings
is automatically enabled within the scope of a
use v5.12
or greater.)
Within the scope of use locale
, all non-ASCII Latin1 code points
are quoted whether the string is encoded as UTF-8 or not. As mentioned
above, locale does not affect the quoting of ASCII-range characters.
This protects against those locales where characters such as "|"
are
considered to be word characters.
Otherwise, Perl quotes non-ASCII characters using an adaptation from Unicode (see http://www.unicode.org/reports/tr31/). The only code points that are quoted are those that have any of the Unicode properties: Pattern_Syntax, Pattern_White_Space, White_Space, Default_Ignorable_Code_Point, or General_Category=Control.
Of these properties, the two important ones are Pattern_Syntax and Pattern_White_Space. They have been set up by Unicode for exactly this purpose of deciding which characters in a regular expression pattern should be quoted. No character that can be in an identifier has these properties.
Perl promises that if we ever add regular expression pattern
metacharacters to the dozen already defined
(\ | ( ) [ { ^ $ * + ? .
), we will only use ones that have the
Pattern_Syntax property. Perl also promises that if we ever add
characters that are considered to be white space in regular expressions
(currently mostly affected by /x), they will all have the
Pattern_White_Space property.
Unicode promises that the set of code points that have these two properties will never change, so something that is not quoted in v5.16 will never need to be quoted in any future Perl release. (Not all the code points that match Pattern_Syntax have actually had characters assigned to them; so there is room to grow, but they are quoted whether assigned or not. Perl, of course, would never use an unassigned code point as an actual metacharacter.)
Quoting characters that have the other 3 properties is done to enhance the readability of the regular expression and not because they actually need to be quoted for regular expression purposes (characters with the White_Space property are likely to be indistinguishable on the page or screen from those with the Pattern_White_Space property; and the other two properties contain non-printing characters).
Returns a random fractional number greater than or equal to 0
and less
than the value of EXPR. (EXPR should be positive.) If EXPR is
omitted, the value 1
is used. Currently EXPR with the value 0
is
also special-cased as 1
(this was undocumented before Perl 5.8.0
and is subject to change in future versions of Perl). Automatically calls
srand unless srand has already been called. See also srand.
Apply int() to the value returned by rand() if you want random
integers instead of random fractional numbers. For example,
- int(rand(10))
returns a random integer between 0
and 9
, inclusive.
(Note: If your rand function consistently returns numbers that are too large or too small, then your version of Perl was probably compiled with the wrong number of RANDBITS.)
rand() is not cryptographically secure. You should not rely
on it in security-sensitive situations. As of this writing, a
number of third-party CPAN modules offer random number generators
intended by their authors to be cryptographically secure,
including: Data::Entropy, Crypt::Random, Math::Random::Secure,
and Math::TrulyRandom.
Attempts to read LENGTH characters of data into variable SCALAR
from the specified FILEHANDLE. Returns the number of characters
actually read, 0
at end of file, or undef if there was an error (in
the latter case $!
is also set). SCALAR will be grown or shrunk
so that the last character actually read is the last character of the
scalar after the read.
An OFFSET may be specified to place the read data at some place in the
string other than the beginning. A negative OFFSET specifies
placement at that many characters counting backwards from the end of
the string. A positive OFFSET greater than the length of SCALAR
results in the string being padded to the required size with "\0"
bytes before the result of the read is appended.
The call is implemented in terms of either Perl's or your system's native fread(3) library function. To get a true read(2) system call, see sysread.
Note the "characters": depending on the status of the filehandle,
either (8-bit) bytes or characters are read. By default, all
filehandles operate on bytes, but for example if the filehandle has
been opened with the :utf8
I/O layer (see open, and the open
pragma, open), the I/O will operate on UTF8-encoded Unicode
characters, not bytes. Similarly for the :encoding
pragma:
in that case pretty much any characters can be read.
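A sketch of LENGTH and OFFSET in action, using an in-memory filehandle in place of a real file:

```perl
open my $fh, '<', \"abcdefgh" or die $!;   # in-memory filehandle
my $buf = "";
read($fh, $buf, 4);       # $buf is "abcd"
read($fh, $buf, 4, 4);    # OFFSET 4 appends: $buf is now "abcdefgh"
```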
Returns the next directory entry for a directory opened by opendir.
If used in list context, returns all the rest of the entries in the
directory. If there are no more entries, returns the undefined value in
scalar context and the empty list in list context.
If you're planning to filetest the return values out of a readdir, you'd
better prepend the directory in question. Otherwise, because we didn't
chdir there, it would have been testing the wrong file.
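A sketch of the prepending described above ($some_dir is a hypothetical directory):

```perl
my $some_dir = ".";   # hypothetical directory; adjust as needed
opendir(my $dh, $some_dir) or die "Can't opendir $some_dir: $!";
# Prepend the directory so -f tests the intended path, not a file
# of the same name relative to the program's current directory.
my @plain_files = grep { -f "$some_dir/$_" } readdir($dh);
closedir($dh);
```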
As of Perl 5.12 you can use a bare readdir in a while
loop,
which will set $_
on every iteration.
To avoid confusing would-be users of your code who are running earlier versions of Perl with mysterious failures, put this sort of thing at the top of your file to signal that your code will work only on Perls of a recent vintage:
- use 5.012; # so readdir assigns to $_ in a lone while test
Reads from the filehandle whose typeglob is contained in EXPR (or from
*ARGV
if EXPR is not provided). In scalar context, each call reads and
returns the next line until end-of-file is reached, whereupon the
subsequent call returns undef. In list context, reads until end-of-file
is reached and returns a list of lines. Note that the notion of "line"
used here is whatever you may have defined with $/
or
$INPUT_RECORD_SEPARATOR
. See $/ in perlvar.
When $/
is set to undef, when readline is in scalar
context (i.e., file slurp mode), and when an empty file is read, it
returns ''
the first time, followed by undef subsequently.
This is the internal function implementing the <EXPR>
operator, but you can use it directly. The <EXPR>
operator is discussed in more detail in I/O Operators in perlop.
- $line = <STDIN>;
- $line = readline(*STDIN); # same thing
If readline encounters an operating system error, $!
will be set
with the corresponding error message. It can be helpful to check
$!
when you are reading from filehandles you don't trust, such as a
tty or a socket. The following example uses the operator form of
readline and dies if the result is not defined.
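A sketch of that check, with an in-memory filehandle standing in for an untrusted one:

```perl
open my $fh, '<', \"line1\nline2\n" or die $!;
my @seen;
while ( ! eof($fh) ) {
    defined( my $line = <$fh> )
        or die "readline failed: $!";
    push @seen, $line;
}
```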
Note that you can't handle readline errors that way with the
ARGV
filehandle. In that case, you have to open each element of
@ARGV
yourself since eof handles ARGV
differently.
Returns the value of a symbolic link, if symbolic links are
implemented. If not, raises an exception. If there is a system
error, returns the undefined value and sets $!
(errno). If EXPR is
omitted, uses $_
.
Portability issues: readlink in perlport.
EXPR is executed as a system command.
The collected standard output of the command is returned.
In scalar context, it comes back as a single (potentially
multi-line) string. In list context, returns a list of lines
(however you've defined lines with $/
or $INPUT_RECORD_SEPARATOR
).
This is the internal function implementing the qx/EXPR/
operator, but you can use it directly. The qx/EXPR/
operator is discussed in more detail in I/O Operators in perlop.
If EXPR is omitted, uses $_
.
Receives a message on a socket. Attempts to receive LENGTH characters of data into variable SCALAR from the specified SOCKET filehandle. SCALAR will be grown or shrunk to the length actually read. Takes the same flags as the system call of the same name. Returns the address of the sender if SOCKET's protocol supports this; returns an empty string otherwise. If there's an error, returns the undefined value. This call is actually implemented in terms of the recvfrom(2) system call. See UDP: Message Passing in perlipc for examples.
Note the "characters": depending on the status of the socket, either
(8-bit) bytes or characters are received. By default all sockets
operate on bytes, but for example if the socket has been changed using
binmode() to operate with the :encoding(utf8)
I/O layer (see the
open pragma, open), the I/O will operate on UTF8-encoded Unicode
characters, not bytes. Similarly for the :encoding
pragma: in that
case pretty much any characters can be read.
The redo command restarts the loop block without evaluating the
conditional again. The continue block, if any, is not executed. If
the LABEL is omitted, the command refers to the innermost enclosing
loop. The redo EXPR
form, available starting in Perl 5.18.0, allows a
label name to be computed at run time, and is otherwise identical to redo
LABEL
. Programs that want to lie to themselves about what was just input
normally use this command:
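In place of the original sample, here is a minimal sketch of redo restarting an iteration without re-evaluating the loop control (the data is arbitrary):

```perl
my @input = ("incomplete", "done");
my $tries = 0;
for my $item (@input) {
    $tries++;
    # pretend the first item needed two more passes
    if ($item eq "incomplete" && $tries < 3) {
        redo;    # same $item again; a continue block would be skipped
    }
}
# $tries is 4: three passes for "incomplete", one for "done"
```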
redo cannot be used to retry a block that returns a value such as
eval {}
, sub {}
, or do {}
, and should not be used to exit
a grep() or map() operation.
Note that a block by itself is semantically identical to a loop
that executes once. Thus redo inside such a block will effectively
turn it into a looping construct.
See also continue for an illustration of how last, next, and
redo work.
Unlike most named operators, this has the same precedence as assignment.
It is also exempt from the looks-like-a-function rule, so
redo ("foo")."bar"
will cause "bar" to be part of the argument to
redo.
Returns a non-empty string if EXPR is a reference, the empty
string otherwise. If EXPR
is not specified, $_
will be used. The value returned depends on the
type of thing the reference is a reference to.
Builtin types include:
- SCALAR
- ARRAY
- HASH
- CODE
- REF
- GLOB
- LVALUE
- FORMAT
- IO
- VSTRING
- Regexp
If the referenced object has been blessed into a package, then that package
name is returned instead. You can think of ref as a typeof
operator.
The return value LVALUE
indicates a reference to an lvalue that is not
a variable. You get this from taking the reference of function calls like
pos() or substr(). VSTRING
is returned if the reference points
to a version string.
The result Regexp
indicates that the argument is a regular expression
resulting from qr//.
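For example (My::Class is a hypothetical package name):

```perl
my $aref = [1, 2, 3];
my $obj  = bless {}, 'My::Class';
print ref($aref), "\n";    # ARRAY
print ref({}),    "\n";    # HASH
print ref($obj),  "\n";    # My::Class (the package, not the builtin type)
print ref(\$obj), "\n";    # REF (a reference to a reference)
```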
See also perlref.
Changes the name of a file; an existing file NEWNAME will be clobbered. Returns true for success, false otherwise.
Behavior of this function varies wildly depending on your system implementation. For example, it will usually not work across file system boundaries, even though the system mv command sometimes compensates for this. Other restrictions include whether it works on directories, open files, or pre-existing files. Check perlport and either the rename(2) manpage or equivalent system documentation for details.
For a platform independent move
function look at the File::Copy
module.
Portability issues: rename in perlport.
Demands a version of Perl specified by VERSION, or demands some semantics
specified by EXPR or by $_
if EXPR is not supplied.
VERSION may be either a numeric argument such as 5.006, which will be
compared to $]
, or a literal of the form v5.6.1, which will be compared
to $^V
(aka $PERL_VERSION). An exception is raised if
VERSION is greater than the version of the current Perl interpreter.
Compare with use, which can do a similar check at compile time.
Specifying VERSION as a literal of the form v5.6.1 should generally be avoided, because it leads to misleading error messages under earlier versions of Perl that do not support this syntax. The equivalent numeric version should be used instead.
Otherwise, require demands that a library file be included if it
hasn't already been included. The file is included via the do-FILE
mechanism, which is essentially just a variety of eval with the
caveat that lexical variables in the invoking script will be invisible
to the included code. Has semantics similar to the following subroutine:
- sub require {
- my ($filename) = @_;
- if (exists $INC{$filename}) {
- return 1 if $INC{$filename};
- die "Compilation failed in require";
- }
- my ($realfilename,$result);
- ITER: {
- foreach $prefix (@INC) {
- $realfilename = "$prefix/$filename";
- if (-f $realfilename) {
- $INC{$filename} = $realfilename;
- $result = do $realfilename;
- last ITER;
- }
- }
- die "Can't find $filename in \@INC";
- }
- if ($@) {
- $INC{$filename} = undef;
- die $@;
- } elsif (!$result) {
- delete $INC{$filename};
- die "$filename did not return true value";
- } else {
- return $result;
- }
- }
Note that the file will not be included twice under the same specified name.
The file must return true as the last statement to indicate
successful execution of any initialization code, so it's customary to
end such a file with 1;
unless you're sure it'll return true
otherwise. But it's better just to put the 1;
, in case you add more
statements.
If EXPR is a bareword, the require assumes a ".pm" extension and replaces "::" with "/" in the filename for you, to make it easy to load standard modules. This form of loading of modules does not risk altering your namespace.
In other words, if you try this:
- require Foo::Bar; # a splendid bareword
The require function will actually look for the "Foo/Bar.pm" file in the
directories specified in the @INC
array.
But if you try this:
- $class = 'Foo::Bar';
- require $class; # $class is not a bareword
or this:
- require "Foo::Bar"; # quoted, so not a bareword
The require function will look for the "Foo::Bar" file in the @INC array and will complain about not finding "Foo::Bar" there. In this case you can do:
- eval "require $class";
Now that you understand how require looks for files with a
bareword argument, there is a little extra functionality going on behind
the scenes. Before require looks for a ".pm" extension, it will
first look for a similar filename with a ".pmc" extension. If this file
is found, it will be loaded in place of any file ending in a ".pm"
extension.
You can also insert hooks into the import facility by putting Perl code directly into the @INC array. There are three forms of hooks: subroutine references, array references, and blessed objects.
Subroutine references are the simplest case. When the inclusion system walks through @INC and encounters a subroutine, this subroutine gets called with two parameters, the first a reference to itself, and the second the name of the file to be included (e.g., "Foo/Bar.pm"). The subroutine should return either nothing or else a list of up to three values in the following order:
A filehandle, from which the file will be read.
A reference to a subroutine. If there is no filehandle (previous item),
then this subroutine is expected to generate one line of source code per
call, writing the line into $_
and returning 1, then finally at end of
file returning 0. If there is a filehandle, then the subroutine will be
called to act as a simple source filter, with the line as read in $_
.
Again, return 1 for each valid line, and 0 after all lines have been
returned.
Optional state for the subroutine. The state is passed in as $_[1]
. A
reference to the subroutine itself is passed in as $_[0]
.
If an empty list, undef, or nothing that matches the first 3 values above
is returned, then require looks at the remaining elements of @INC.
Note that this filehandle must be a real filehandle (strictly a typeglob
or reference to a typeglob, whether blessed or unblessed); tied filehandles
will be ignored and processing will stop there.
If the hook is an array reference, its first element must be a subroutine reference. This subroutine is called as above, but the first parameter is the array reference. This lets you indirectly pass arguments to the subroutine.
In other words, you can write:
- push @INC, \&my_sub;
- sub my_sub {
- my ($coderef, $filename) = @_; # $coderef is \&my_sub
- ...
- }
or:
- push @INC, [ \&my_sub, $x, $y, ... ];
- sub my_sub {
- my ($arrayref, $filename) = @_;
- # Retrieve $x, $y, ...
- my (undef, @parameters) = @$arrayref;
- ...
- }
If the hook is an object, it must provide an INC method that will be
called as above, the first parameter being the object itself. (Note that
you must fully qualify the sub's name, as unqualified INC
is always forced
into package main
.) Here is a typical code layout:
- # In Foo.pm
- package Foo;
- sub new { ... }
- sub Foo::INC {
- my ($self, $filename) = @_;
- ...
- }
- # In the main program
- push @INC, Foo->new(...);
These hooks are also permitted to set the %INC entry corresponding to the files they have loaded. See %INC in perlvar.
For a yet-more-powerful import facility, see use and perlmod.
Generally used in a continue block at the end of a loop to clear
variables and reset ??
searches so that they work again. The
expression is interpreted as a list of single characters (hyphens
allowed for ranges). All variables and arrays beginning with one of
those letters are reset to their pristine state. If the expression is
omitted, one-match searches (?pattern?
) are reset to match again.
Only resets variables or searches in the current package. Always returns
1. Examples:
- reset 'X'; # reset all X variables
- reset 'a-z'; # reset lower case variables
- reset; # just reset ?one-time? searches
Resetting "A-Z"
is not recommended because you'll wipe out your
@ARGV
and @INC
arrays and your %ENV
hash. Resets only package
variables; lexical variables are unaffected, but they clean themselves
up on scope exit anyway, so you'll probably want to use them instead.
See my.
Returns from a subroutine, eval, or do FILE
with the value
given in EXPR. Evaluation of EXPR may be in list, scalar, or void
context, depending on how the return value will be used, and the context
may vary from one execution to the next (see wantarray). If no EXPR
is given, returns an empty list in list context, the undefined value in
scalar context, and (of course) nothing at all in void context.
(In the absence of an explicit return, a subroutine, eval,
or do FILE automatically returns the value of the last expression
evaluated.)
Unlike most named operators, this is also exempt from the
looks-like-a-function rule, so return ("foo")."bar"
will
cause "bar" to be part of the argument to return.
In list context, returns a list value consisting of the elements of LIST in the opposite order. In scalar context, concatenates the elements of LIST and returns a string value with all characters in the opposite order.
Used without arguments in scalar context, reverse() reverses $_
.
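For example:

```perl
my @backward = reverse(1, 2, 3);      # (3, 2, 1) in list context
my $drome    = reverse("stressed");   # "desserts" in scalar context
```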
Note that reversing an array to itself (as in @a = reverse @a
) will
preserve non-existent elements whenever possible; i.e., for non-magical
arrays or for tied arrays with EXISTS
and DELETE
methods.
This operator is also handy for inverting a hash, although there are some caveats. If a value is duplicated in the original hash, only one of those can be represented as a key in the inverted hash. Also, this has to unwind one hash and build a whole new one, which may take some time on a large hash, such as from a DBM file.
- %by_name = reverse %by_address; # Invert the hash
Sets the current position to the beginning of the directory for the
readdir routine on DIRHANDLE.
Portability issues: rewinddir in perlport.
Works just like index() except that it returns the position of the last occurrence of SUBSTR in STR. If POSITION is specified, returns the last occurrence beginning at or before that position.
Deletes the directory specified by FILENAME if that directory is
empty. If it succeeds it returns true; otherwise it returns false and
sets $!
(errno). If FILENAME is omitted, uses $_
.
To remove a directory tree recursively (rm -rf
on Unix) look at
the rmtree
function of the File::Path module.
The substitution operator. See Regexp Quote-Like Operators in perlop.
Just like print, but implicitly appends a newline. say LIST
is
simply an abbreviation for { local $\ = "\n"; print LIST }
. To use
FILEHANDLE without a LIST to print the contents of $_
to it, you must
use a real filehandle like FH
, not an indirect one like $fh
.
This keyword is available only when the "say"
feature
is enabled, or when prefixed with CORE::
; see
feature. Alternately, include a use v5.10
or later to the current
scope.
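For example:

```perl
use feature 'say';      # or: use v5.10;
say "Hello, world";     # prints "Hello, world" plus a newline
say for 1 .. 3;         # prints 1, 2, 3, each on its own line
```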
Forces EXPR to be interpreted in scalar context and returns the value of EXPR.
There is no equivalent operator to force an expression to
be interpolated in list context because in practice, this is never
needed. If you really wanted to do so, however, you could use
the construction @{[ (some expression) ]}
, but usually a simple
(some expression)
suffices.
Because scalar is a unary operator, if you accidentally use a
parenthesized list for the EXPR, this behaves as a scalar comma expression,
evaluating all but the last element in void context and returning the final
element evaluated in scalar context. This is seldom what you want.
The following single statement:
- print uc(scalar(&foo, $bar)), $baz;
is the moral equivalent of these two:
- &foo;
- print(uc($bar), $baz);
See perlop for more details on unary operators and the comma operator.
Sets FILEHANDLE's position, just like the fseek call of stdio.
FILEHANDLE may be an expression whose value gives the name of the
filehandle. The values for WHENCE are 0 to set the new position
in bytes to POSITION; 1 to set it to the current position plus
POSITION; and 2 to set it to EOF plus POSITION, typically negative.
For WHENCE you may use the constants SEEK_SET, SEEK_CUR, and
SEEK_END (start of the file, current position, end of the file) from
the Fcntl module. Returns 1 on success, false otherwise.
Note the "in bytes": even if the filehandle has been set to operate
on characters (for example by using the :encoding(utf8) open layer),
tell() will return byte offsets, not character offsets (because
implementing that would render seek() and tell() rather slow).
If you want to position the file for sysread or syswrite, don't use
seek, because buffering makes its effect on the file's read-write position
unpredictable and non-portable. Use sysseek instead.
Due to the rules and rigors of ANSI C, on some systems you have to do a
seek whenever you switch between reading and writing. Amongst other
things, this may have the effect of calling stdio's clearerr(3).
A WHENCE of 1 (SEEK_CUR) is useful for not moving the file position:
- seek(TEST,0,1);
This is also useful for applications emulating tail -f. Once you hit
EOF on your read and then sleep for a while, you (probably) have to
stick in a dummy seek() to reset things. The seek doesn't change the
position, but it does clear the end-of-file condition on the handle,
so that the next <FILE> makes Perl try again to read something.
(We hope.)
If that doesn't work (some I/O implementations are particularly cantankerous), you might need something like this:
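The basic no-op-seek idiom above can be demonstrated in a self-contained way (the file name demo.log is invented for this sketch):

```perl
use strict;
use warnings;
use IO::Handle;

# After hitting EOF, a dummy seek clears the end-of-file condition
# so the next read tries again: the core of the tail -f idiom.
open my $out, '>', 'demo.log' or die "open: $!";
$out->autoflush(1);
print $out "first\n";

open my $in, '<', 'demo.log' or die "open: $!";
my $line = <$in>;                 # reads "first\n"
my $nothing = <$in>;              # undef: we are now at EOF

print $out "second\n";            # new data arrives after our EOF

seek($in, tell($in), 0);          # position unchanged, EOF flag cleared
my $more = <$in>;                 # now reads "second\n"
print $more;

close $_ for $in, $out;
unlink 'demo.log';
```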
Sets the current position for the readdir routine on DIRHANDLE. POS
must be a value returned by telldir. seekdir also has the same caveats
about possible directory compaction as the corresponding system library
routine.
Returns the currently selected filehandle. If FILEHANDLE is supplied,
sets the new current default filehandle for output. This has two
effects: first, a write or a print without a filehandle defaults
to this FILEHANDLE; second, references to variables related to
output will refer to this output channel.
For example, to set the top-of-form format for more than one output channel, you might do the following:
FILEHANDLE may be an expression whose value gives the name of the actual filehandle. Thus:
Some programmers may prefer to think of filehandles as objects with methods, preferring to write the last example as:
- use IO::Handle;
- STDERR->autoflush(1);
Portability issues: select in perlport.
This calls the select(2) syscall with the bit masks specified, which
can be constructed using fileno and vec, along these lines:
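A sketch of the usual mask construction (here polling STDIN and STDOUT with a zero timeout; copies of the masks are passed so the originals can be reused):

```perl
use strict;
use warnings;

# Build select() bit masks with fileno and vec:
# one bit per file descriptor of interest.
my ($rin, $win) = ('', '');
vec($rin, fileno(STDIN),  1) = 1;    # watch STDIN for readability
vec($win, fileno(STDOUT), 1) = 1;    # watch STDOUT for writability
my $ein = $rin | $win;               # watch both for errors

# Poll with a zero timeout; select() modifies the masks in place,
# so pass copies and keep the originals for the next call.
my ($rout, $wout, $eout) = ($rin, $win, $ein);
my $nfound = select($rout, $wout, $eout, 0);

print "STDOUT is ready for writing\n"
    if vec($wout, fileno(STDOUT), 1);
```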
If you want to select on many filehandles, you may wish to write a subroutine like this:
The usual idiom is:
- ($nfound,$timeleft) =
- select($rout=$rin, $wout=$win, $eout=$ein, $timeout);
or to block until something becomes ready just do this:
- $nfound = select($rout=$rin, $wout=$win, $eout=$ein, undef);
Most systems do not bother to return anything useful in $timeleft, so calling select() in scalar context just returns $nfound.
Any of the bit masks can also be undef. The timeout, if specified, is in seconds, which may be fractional. Note: not all implementations are capable of returning the $timeleft. If not, they always return $timeleft equal to the supplied $timeout.
You can effect a sleep of 250 milliseconds this way:
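The idiom is a four-argument select with all three bit masks undefined, so nothing is watched and only the fractional timeout matters:

```perl
use strict;
use warnings;

# Sleep for roughly 250 milliseconds: no masks, just the timeout.
select(undef, undef, undef, 0.25);
print "slept about a quarter second\n";
```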
Note that whether select gets restarted after signals (say, SIGALRM)
is implementation-dependent. See also perlport for notes on the
portability of select.
On error, select behaves just like select(2): it returns
-1 and sets $!
.
On some Unixes, select(2) may report a socket file descriptor as "ready for
reading" even when no data is available, and thus any subsequent read
would block. This can be avoided if you always use O_NONBLOCK on the
socket. See select(2) and fcntl(2) for further details.
The standard IO::Select module provides a user-friendlier interface
to select, mostly because it does all the bit-mask work for you.
WARNING: One should not attempt to mix buffered I/O (like read
or <FH>) with select, except as permitted by POSIX, and even
then only on POSIX systems. You have to use sysread instead.
Portability issues: select in perlport.
Calls the System V IPC function semctl(2). You'll probably have to say
- use IPC::SysV;
first to get the correct constant definitions. If CMD is IPC_STAT or
GETALL, then ARG must be a variable that will hold the returned
semid_ds structure or semaphore value array. Returns like ioctl:
the undefined value for error, "0 but true" for zero, or the actual
return value otherwise. The ARG must consist of a vector of native
short integers, which may be created with pack("s!",(0)x$nsem).
See also SysV IPC in perlipc, and the IPC::SysV and IPC::Semaphore
documentation.
Portability issues: semctl in perlport.
Calls the System V IPC function semget(2). Returns the semaphore id,
or the undefined value on error. See also SysV IPC in perlipc, and
the IPC::SysV and IPC::Semaphore documentation.
Portability issues: semget in perlport.
Calls the System V IPC function semop(2) for semaphore operations
such as signalling and waiting. OPSTRING must be a packed array of
semop structures. Each semop structure can be generated with
pack("s!3", $semnum, $semop, $semflag). The length of OPSTRING
implies the number of semaphore operations. Returns true if
successful, false on error. As an example, the following code waits
on semaphore $semnum of semaphore id $semid:
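A hedged sketch of that wait: the OPSTRING is built with pack, but the semop call itself is commented out because it needs a live semaphore id from semget (the values for $semnum and $semflag are invented for the example):

```perl
use strict;
use warnings;

# Each operation is three native shorts: semaphore number,
# operation (-1 = wait/decrement, 1 = signal/increment), and flags.
my ($semnum, $semflag) = (0, 0);    # invented values for this sketch
my $opstring = pack("s!3", $semnum, -1, $semflag);

# With a real $semid from semget(), the wait would be:
# semop($semid, $opstring) or die "Semaphore trouble: $!\n";

# One operation occupies exactly three native shorts:
printf "opstring is %d bytes\n", length $opstring;
```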
To signal the semaphore, replace -1 with 1. See also
SysV IPC in perlipc, and the IPC::SysV and IPC::Semaphore
documentation.
Portability issues: semop in perlport.
Sends a message on a socket. Attempts to send the scalar MSG to the SOCKET filehandle. Takes the same flags as the system call of the same name. On unconnected sockets, you must specify a destination to send to, in which case it does a sendto(2) syscall. Returns the number of characters sent, or the undefined value on error. The sendmsg(2) syscall is currently unimplemented. See UDP: Message Passing in perlipc for examples.
Note the "characters": depending on the status of the socket, either
(8-bit) bytes or characters are sent. By default all sockets operate
on bytes, but for example if the socket has been changed using
binmode() to operate with the :encoding(utf8) I/O layer (see open,
or the open pragma), the I/O will operate on UTF-8 encoded Unicode
characters, not bytes. Similarly for the :encoding pragma: in that
case pretty much any characters can be sent.
Sets the current process group for the specified PID, 0 for the
current process. Raises an exception when used on a machine that
doesn't implement POSIX setpgid(2) or BSD setpgrp(2). If the
arguments are omitted, it defaults to 0,0. Note that the BSD 4.2
version of setpgrp does not accept any arguments, so only
setpgrp(0,0) is portable. See also POSIX::setsid().
Portability issues: setpgrp in perlport.
Sets the current priority for a process, a process group, or a user. (See setpriority(2).) Raises an exception when used on a machine that doesn't implement setpriority(2).
Portability issues: setpriority in perlport.
Sets the socket option requested. Returns undef on error.
Use integer constants provided by the Socket
module for
LEVEL and OPNAME. Values for LEVEL can also be obtained from
getprotobyname. OPTVAL might either be a packed string or an integer.
An integer OPTVAL is shorthand for pack("i", OPTVAL).
An example disabling Nagle's algorithm on a socket:
- use Socket qw(IPPROTO_TCP TCP_NODELAY);
- setsockopt($socket, IPPROTO_TCP, TCP_NODELAY, 1);
Portability issues: setsockopt in perlport.
Shifts the first value of the array off and returns it, shortening the
array by 1 and moving everything down. If there are no elements in the
array, returns the undefined value. If ARRAY is omitted, shifts the
@_ array within the lexical scope of subroutines and formats, and the
@ARGV array outside a subroutine and also within the lexical scopes
established by the eval STRING, BEGIN {}, INIT {}, CHECK {},
UNITCHECK {}, and END {} constructs.
Starting with Perl 5.14, shift can take a scalar EXPR, which must hold a
reference to an unblessed array. The argument will be dereferenced
automatically. This aspect of shift is considered highly experimental.
The exact behaviour may change in a future version of Perl.
To avoid confusing would-be users of your code who are running earlier versions of Perl with mysterious syntax errors, put this sort of thing at the top of your file to signal that your code will work only on Perls of a recent vintage:
- use 5.014; # so push/pop/etc work on scalars (experimental)
See also unshift, push, and pop. shift and unshift do the
same thing to the left end of an array that pop and push do to the
right end.
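That left/right symmetry can be illustrated in a few lines:

```perl
use strict;
use warnings;

# shift/unshift work on the left end; pop/push on the right.
my @queue = (1, 2, 3);

my $first = shift @queue;   # 1; @queue is now (2, 3)
unshift @queue, 0;          # @queue is now (0, 2, 3)
my $last  = pop @queue;     # 3; @queue is now (0, 2)
push @queue, 9;             # @queue is now (0, 2, 9)

print "@queue\n";           # prints "0 2 9"
```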
Calls the System V IPC function shmctl. You'll probably have to say
- use IPC::SysV;
first to get the correct constant definitions. If CMD is IPC_STAT,
then ARG must be a variable that will hold the returned shmid_ds
structure. Returns like ioctl: undef for error; "0 but true" for
zero; and the actual return value otherwise.
See also SysV IPC in perlipc and the IPC::SysV documentation.
Portability issues: shmctl in perlport.
Calls the System V IPC function shmget. Returns the shared memory
segment id, or undef on error.
See also SysV IPC in perlipc and the IPC::SysV documentation.
Portability issues: shmget in perlport.
Reads or writes the System V shared memory segment ID starting at
position POS for size SIZE by attaching to it, copying in/out, and
detaching from it. When reading, VAR must be a variable that will
hold the data read. When writing, if STRING is too long, only SIZE
bytes are used; if STRING is too short, nulls are written to fill out
SIZE bytes. Returns true if successful, false on error.
shmread() taints the variable. See also SysV IPC in perlipc,
IPC::SysV, and the IPC::Shareable module from CPAN.
Portability issues: shmread in perlport and shmwrite in perlport.
Shuts down a socket connection in the manner indicated by HOW, which has the same interpretation as in the syscall of the same name.
This is useful with sockets when you want to tell the other side you're done writing but not done reading, or vice versa. It's also a more insistent form of close because it also disables the file descriptor in any forked copies in other processes.
Returns 1 for success; on error, returns undef if the first argument
is not a valid filehandle, or returns 0 and sets $! for any other
failure.
Returns the sine of EXPR (expressed in radians). If EXPR is omitted,
returns the sine of $_.
For the inverse sine operation, you may use the Math::Trig::asin
function, or use this relation:
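The relation in question is asin(x) = atan2(x, sqrt(1 - x*x)), which can be expressed with Perl's built-ins alone:

```perl
use strict;
use warnings;

# Inverse sine via the identity asin(x) = atan2(x, sqrt(1 - x*x)).
sub asin { atan2($_[0], sqrt(1 - $_[0] * $_[0])) }

my $pi = 4 * atan2(1, 1);
printf "asin(1)   = %.6f\n", asin(1);     # pi/2, about 1.570796
printf "asin(0.5) = %.6f\n", asin(0.5);   # pi/6, about 0.523599
```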
Causes the script to sleep for (integer) EXPR seconds, or forever if no argument is given. Returns the integer number of seconds actually slept.
May be interrupted if the process receives a signal such as SIGALRM
.
You probably cannot mix alarm and sleep calls, because sleep
is often implemented using alarm.
On some older systems, it may sleep up to a full second less than what you requested, depending on how it counts seconds. Most modern systems always sleep the full amount. They may appear to sleep longer than that, however, because your process might not be scheduled right away in a busy multitasking system.
For delays of finer granularity than one second, the Time::HiRes module
(from CPAN, and starting from Perl 5.8 part of the standard
distribution) provides usleep(). You may also use Perl's four-argument
version of select() leaving the first three arguments undefined, or you
might be able to use the syscall interface to access setitimer(2) if
your system supports it. See perlfaq8 for details.
See also the POSIX module's pause
function.
Opens a socket of the specified kind and attaches it to filehandle
SOCKET. DOMAIN, TYPE, and PROTOCOL are specified the same as for the
syscall of the same name. You should use Socket first to get the
proper definitions imported. See the examples in
Sockets: Client/Server Communication in perlipc.
On systems that support a close-on-exec flag on files, the flag will be set for the newly opened file descriptor, as determined by the value of $^F. See $^F in perlvar.
Creates an unnamed pair of sockets in the specified domain, of the specified type. DOMAIN, TYPE, and PROTOCOL are specified the same as for the syscall of the same name. If unimplemented, raises an exception. Returns true if successful.
On systems that support a close-on-exec flag on files, the flag will be set for the newly opened file descriptors, as determined by the value of $^F. See $^F in perlvar.
Some systems define pipe in terms of socketpair, in which case a call
to pipe(Rdr, Wtr) is essentially:
- use Socket;
- socketpair(Rdr, Wtr, AF_UNIX, SOCK_STREAM, PF_UNSPEC);
- shutdown(Rdr, 1); # no more writing for reader
- shutdown(Wtr, 0); # no more reading for writer
See perlipc for an example of socketpair use. Perl 5.8 and later will emulate socketpair using IP sockets to localhost if your system implements sockets but not socketpair.
Portability issues: socketpair in perlport.
In list context, this sorts the LIST and returns the sorted list value.
In scalar context, the behaviour of sort() is undefined.
If SUBNAME or BLOCK is omitted, sorts in standard string comparison
order. If SUBNAME is specified, it gives the name of a subroutine
that returns an integer less than, equal to, or greater than 0,
depending on how the elements of the list are to be ordered. (The
<=> and cmp operators are extremely useful in such routines.)
SUBNAME may be a scalar variable name (unsubscripted), in which case
the value provides the name of (or a reference to) the actual
subroutine to use. In place of a SUBNAME, you can provide a BLOCK as
an anonymous, in-line sort subroutine.
If the subroutine's prototype is ($$), the elements to be compared
are passed by reference in @_, as for a normal subroutine. This is
slower than unprototyped subroutines, where the elements to be
compared are passed into the subroutine as the package global
variables $a and $b (see example below). Note that in the latter
case, it is usually highly counter-productive to declare $a and $b
as lexicals.
If the subroutine is an XSUB, the elements to be compared are pushed on to the stack, the way arguments are usually passed to XSUBs. $a and $b are not set.
The values to be compared are always passed by reference and should not be modified.
You also cannot exit out of the sort block or subroutine using any of the
loop control operators described in perlsyn or with goto.
When use locale (but not use locale 'not_characters') is in effect,
sort LIST sorts LIST according to the current collation locale.
See perllocale.
sort() returns aliases into the original list, much as a for loop's index
variable aliases the list elements. That is, modifying an element of a
list returned by sort() (for example, in a foreach
, map or grep)
actually modifies the element in the original list. This is usually
something to be avoided when writing clear code.
Perl 5.6 and earlier used a quicksort algorithm to implement sort. That algorithm was not stable, so could go quadratic. (A stable sort preserves the input order of elements that compare equal. Although quicksort's run time is O(NlogN) when averaged over all arrays of length N, the time can be O(N**2), quadratic behavior, for some inputs.) In 5.7, the quicksort implementation was replaced with a stable mergesort algorithm whose worst-case behavior is O(NlogN). But benchmarks indicated that for some inputs, on some platforms, the original quicksort was faster. 5.8 has a sort pragma for limited control of the sort. Its rather blunt control of the underlying algorithm may not persist into future Perls, but the ability to characterize the input or output in implementation independent ways quite probably will. See the sort pragma.
Examples:
- # sort lexically
- @articles = sort @files;
- # same thing, but with explicit sort routine
- @articles = sort {$a cmp $b} @files;
- # now case-insensitively
- @articles = sort {fc($a) cmp fc($b)} @files;
- # same thing in reversed order
- @articles = sort {$b cmp $a} @files;
- # sort numerically ascending
- @articles = sort {$a <=> $b} @files;
- # sort numerically descending
- @articles = sort {$b <=> $a} @files;
- # this sorts the %age hash by value instead of key
- # using an in-line function
- @eldest = sort { $age{$b} <=> $age{$a} } keys %age;
- # sort using explicit subroutine name
- sub byage {
- $age{$a} <=> $age{$b}; # presuming numeric
- }
- @sortedclass = sort byage @class;
- sub backwards { $b cmp $a }
- @harry = qw(dog cat x Cain Abel);
- @george = qw(gone chased yz Punished Axed);
- print sort @harry;
- # prints AbelCaincatdogx
- print sort backwards @harry;
- # prints xdogcatCainAbel
- print sort @george, 'to', @harry;
- # prints AbelAxedCainPunishedcatchaseddoggonetoxyz
- # inefficiently sort by descending numeric compare using
- # the first integer after the first = sign, or the
- # whole record case-insensitively otherwise
- my @new = sort {
- ($b =~ /=(\d+)/)[0] <=> ($a =~ /=(\d+)/)[0]
- ||
- fc($a) cmp fc($b)
- } @old;
- # same thing, but much more efficiently;
- # we'll build auxiliary indices instead
- # for speed
- my @nums = @caps = ();
- for (@old) {
- push @nums, ( /=(\d+)/ ? $1 : undef );
- push @caps, fc($_);
- }
- my @new = @old[ sort {
- $nums[$b] <=> $nums[$a]
- ||
- $caps[$a] cmp $caps[$b]
- } 0..$#old
- ];
- # same thing, but without any temps
- @new = map { $_->[0] }
- sort { $b->[1] <=> $a->[1]
- ||
- $a->[2] cmp $b->[2]
- } map { [$_, /=(\d+)/, fc($_)] } @old;
- # using a prototype allows you to use any comparison subroutine
- # as a sort subroutine (including other package's subroutines)
- package other;
- sub backwards ($$) { $_[1] cmp $_[0]; } # $a and $b are
- # not set here
- package main;
- @new = sort other::backwards @old;
- # guarantee stability, regardless of algorithm
- use sort 'stable';
- @new = sort { substr($a, 3, 5) cmp substr($b, 3, 5) } @old;
- # force use of mergesort (not portable outside Perl 5.8)
- use sort '_mergesort'; # note discouraging _
- @new = sort { substr($a, 3, 5) cmp substr($b, 3, 5) } @old;
Warning: syntactical care is required when sorting the list returned
from a function. If you want to sort the list returned by the
function call find_records(@key), you can use:
- @contact = sort { $a cmp $b } find_records @key;
- @contact = sort +find_records(@key);
- @contact = sort(find_records(@key));
If instead you want to sort the array @key with the comparison
routine find_records() then you can use:
- @contact = sort find_records @key;
If you're using strict, you must not declare $a and $b as lexicals.
They are package globals. That means that if you're in the main
package and type
- @articles = sort {$b <=> $a} @files;
then $a and $b are $main::a and $main::b (or $::a and $::b), but if
you're in the FooPack package, it's the same as typing
- @articles = sort {$FooPack::b <=> $FooPack::a} @files;
The comparison function is required to behave. If it returns
inconsistent results (sometimes saying $x[1] is less than $x[2] and
sometimes saying the opposite, for example) the results are not
well-defined.
Because <=> returns undef when either operand is NaN (not-a-number),
be careful when sorting with a comparison function like $a <=> $b
any lists that might contain a NaN. The following example takes
advantage of the fact that NaN != NaN to eliminate any NaNs from the
input list.
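A minimal runnable sketch of that idiom (it assumes IEEE floating point, where Inf - Inf produces NaN):

```perl
use strict;
use warnings;

# NaN is the only value for which $x == $x is false, so a grep
# strips NaNs before the numeric sort sees them.
my $nan = 9**9**9 - 9**9**9;      # Inf - Inf = NaN on IEEE systems
my @input = (3, $nan, 1, 2);

my @result = sort { $a <=> $b } grep { $_ == $_ } @input;
print "@result\n";                # prints "1 2 3"
```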
Removes the elements designated by OFFSET and LENGTH from an array, and
replaces them with the elements of LIST, if any. In list context,
returns the elements removed from the array. In scalar context,
returns the last element removed, or undef if no elements are
removed. The array grows or shrinks as necessary.
If OFFSET is negative then it starts that far from the end of the array.
If LENGTH is omitted, removes everything from OFFSET onward.
If LENGTH is negative, removes the elements from OFFSET onward
except for -LENGTH elements at the end of the array.
If both OFFSET and LENGTH are omitted, removes everything. If OFFSET is
past the end of the array, Perl issues a warning, and splices at the
end of the array.
The following equivalences hold (assuming $#a >= $i):
- push(@a,$x,$y) splice(@a,@a,0,$x,$y)
- pop(@a) splice(@a,-1)
- shift(@a) splice(@a,0,1)
- unshift(@a,$x,$y) splice(@a,0,0,$x,$y)
- $a[$i] = $y splice(@a,$i,1,$y)
Example, assuming array lengths are passed before arrays:
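One way to write such a comparison (the helper name aeq is invented for this sketch): splice peels each sublist off the front of @_, with the first shift supplying its length.

```perl
use strict;
use warnings;

# Compare two list values passed as (len1, LIST1, len2, LIST2).
sub aeq {
    my @a = splice(@_, 0, shift);   # shift yields len1, splice takes LIST1
    my @b = splice(@_, 0, shift);   # likewise for LIST2
    return 0 unless @a == @b;       # different lengths?
    while (@a) {
        return 0 if pop(@a) ne pop(@b);
    }
    return 1;
}

print aeq(3, 1, 2, 3,  3, 1, 2, 3) ? "equal\n" : "differ\n";  # equal
print aeq(2, 1, 2,  2, 1, 9)       ? "equal\n" : "differ\n";  # differ
```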
Starting with Perl 5.14, splice can take scalar EXPR, which must hold a
reference to an unblessed array. The argument will be dereferenced
automatically. This aspect of splice is considered highly experimental.
The exact behaviour may change in a future version of Perl.
To avoid confusing would-be users of your code who are running earlier versions of Perl with mysterious syntax errors, put this sort of thing at the top of your file to signal that your code will work only on Perls of a recent vintage:
- use 5.014; # so push/pop/etc work on scalars (experimental)
Splits the string EXPR into a list of strings and returns the list in list context, or the size of the list in scalar context.
If only PATTERN is given, EXPR defaults to $_.
Anything in EXPR that matches PATTERN is taken to be a separator that separates the EXPR into substrings (called "fields") that do not include the separator. Note that a separator may be longer than one character or even have no characters at all (the empty string, which is a zero-width match).
The PATTERN need not be constant; an expression may be used to specify a pattern that varies at runtime.
If PATTERN matches the empty string, the EXPR is split at the match
position (between characters). As an example, the following:
- print join(':', split(/b/, 'abc')), "\n";
uses the 'b' in 'abc' as a separator to produce the output 'a:c'.
However, this:
- print join(':', split(//, 'abc')), "\n";
uses empty string matches as separators to produce the output
'a:b:c'; thus, the empty string may be used to split EXPR into a
list of its component characters.
As a special case for split, the empty pattern given in match
operator syntax (//) specifically matches the empty string, which is
contrary to its usual interpretation as the last successful match.
If PATTERN is /^/, then it is treated as if it used the multiline
modifier (/^/m), since it isn't much use otherwise.
As another special case, split emulates the default behavior of the
command line tool awk when the PATTERN is either omitted or a literal
string composed of a single space character (such as ' ' or "\x20",
but not e.g. / /). In this case, any leading whitespace in EXPR is
removed before splitting occurs, and the PATTERN is instead treated
as if it were /\s+/; in particular, this means that any contiguous
whitespace (not just a single space character) is used as a
separator. However, this special treatment can be avoided by
specifying the pattern / / instead of the string " ", thereby
allowing only a single space character to be a separator. In earlier
Perls this special case was restricted to the use of a plain " " as
the pattern argument to split; in Perl 5.18.0 and later this special
case is triggered by any expression which evaluates to the simple
string " ".
If omitted, PATTERN defaults to a single space, " ", triggering the
previously described awk emulation.
If LIMIT is specified and positive, it represents the maximum number
of fields into which the EXPR may be split; in other words, LIMIT is
one greater than the maximum number of times EXPR may be split. Thus,
the LIMIT value 1 means that EXPR may be split a maximum of zero
times, producing a maximum of one field (namely, the entire value of
EXPR). For instance:
- print join(':', split(//, 'abc', 1)), "\n";
produces the output 'abc', and this:
- print join(':', split(//, 'abc', 2)), "\n";
produces the output 'a:bc', and each of these:
- print join(':', split(//, 'abc', 3)), "\n";
- print join(':', split(//, 'abc', 4)), "\n";
produces the output 'a:b:c'.
If LIMIT is negative, it is treated as if it were instead arbitrarily large; as many fields as possible are produced.
If LIMIT is omitted (or, equivalently, zero), then it is usually
treated as if it were instead negative but with the exception that
trailing empty fields are stripped (empty leading fields are always
preserved); if all fields are empty, then all fields are considered
to be trailing (and are thus stripped in this case). Thus, the
following:
- print join(':', split(/,/, 'a,b,c,,,')), "\n";
produces the output 'a:b:c', but the following:
- print join(':', split(/,/, 'a,b,c,,,', -1)), "\n";
produces the output 'a:b:c:::'.
In time-critical applications, it is worthwhile to avoid splitting into more fields than necessary. Thus, when assigning to a list, if LIMIT is omitted (or zero), then LIMIT is treated as though it were one larger than the number of variables in the list; for the following, LIMIT is implicitly 3:
- ($login, $passwd) = split(/:/);
Note that splitting an EXPR that evaluates to the empty string always produces zero fields, regardless of the LIMIT specified.
An empty leading field is produced when there is a positive-width
match at the beginning of EXPR. For instance:
- print join(':', split(/ /, ' abc')), "\n";
produces the output ':abc'. However, a zero-width match at the
beginning of EXPR never produces an empty field, so that:
- print join(':', split(//, ' abc')), "\n";
produces the output ' :a:b:c' (rather than ': :a:b:c').
An empty trailing field, on the other hand, is produced when there is
a match at the end of EXPR, regardless of the length of the match (of
course, unless a non-zero LIMIT is given explicitly, such fields are
removed, as in the last example). Thus:
- print join(':', split(//, ' abc ')), "\n";
produces the output ' :a:b:c: '.
If the PATTERN contains
capturing groups,
then for each separator, an additional field is produced for each substring
captured by a group (in the order in which the groups are specified,
as per backreferences); if any group does not
match, then it captures the undef value instead of a substring. Also,
note that any such additional field is produced whenever there is a
separator (that is, whenever a split occurs), and such an additional field
does not count towards the LIMIT. Consider the following expressions
evaluated in list context (each returned list is provided in the associated
comment):
- split(/-|,/, "1-10,20", 3)
- # ('1', '10', '20')
- split(/(-|,)/, "1-10,20", 3)
- # ('1', '-', '10', ',', '20')
- split(/-|(,)/, "1-10,20", 3)
- # ('1', undef, '10', ',', '20')
- split(/(-)|,/, "1-10,20", 3)
- # ('1', '-', '10', undef, '20')
- split(/(-)|(,)/, "1-10,20", 3)
- # ('1', '-', undef, '10', undef, ',', '20')
Returns a string formatted by the usual printf conventions of the C
library function sprintf. See below for more details and see
sprintf(3) or printf(3) on your system for an explanation of the
general principles.
For example:
- # Format number with up to 8 leading zeroes
- $result = sprintf("%08d", $number);
- # Round number to 3 digits after decimal point
- $rounded = sprintf("%.3f", $number);
Perl does its own sprintf formatting: it emulates the C
function sprintf(3), but doesn't use it except for floating-point
numbers, and even then only standard modifiers are allowed.
Non-standard extensions in your local sprintf(3) are
therefore unavailable from Perl.
Unlike printf, sprintf does not do what you probably mean when you
pass it an array as your first argument.
The array is given scalar context,
and instead of using the 0th element of the array as the format, Perl will
use the count of elements in the array as the format, which is almost never
useful.
Perl's sprintf permits the following universally-known conversions:
- %% a percent sign
- %c a character with the given number
- %s a string
- %d a signed integer, in decimal
- %u an unsigned integer, in decimal
- %o an unsigned integer, in octal
- %x an unsigned integer, in hexadecimal
- %e a floating-point number, in scientific notation
- %f a floating-point number, in fixed decimal notation
- %g a floating-point number, in %e or %f notation
In addition, Perl permits the following widely-supported conversions:
- %X like %x, but using upper-case letters
- %E like %e, but using an upper-case "E"
- %G like %g, but with an upper-case "E" (if applicable)
- %b an unsigned integer, in binary
- %B like %b, but using an upper-case "B" with the # flag
- %p a pointer (outputs the Perl value's address in hexadecimal)
- %n special: *stores* the number of characters output so far
- into the next argument in the parameter list
Finally, for backward (and we do mean "backward") compatibility, Perl permits these unnecessary but widely-supported conversions:
- %i a synonym for %d
- %D a synonym for %ld
- %U a synonym for %lu
- %O a synonym for %lo
- %F a synonym for %f
Note that the number of exponent digits in the scientific notation
produced by %e, %E, %g and %G for numbers with the modulus of the
exponent less than 100 is system-dependent: it may be three or less
(zero-padded as necessary). In other words, 1.23 times ten to the
99th may be either "1.23e99" or "1.23e099".
Between the %
and the format letter, you may specify several
additional attributes controlling the interpretation of the format.
In order, these are:
An explicit format parameter index, such as 2$. By default sprintf
will format the next unused argument in the list, but this allows you
to take the arguments out of order:
- printf '%2$d %1$d', 12, 34;      # prints "34 12"
- printf '%3$d %d %1$d', 1, 2, 3;  # prints "3 1 1"
one or more of:
- space prefix non-negative number with a space
- + prefix non-negative number with a plus sign
- - left-justify within the field
- 0 use zeros, not spaces, to right-justify
- # ensure the leading "0" for any octal, prefix non-zero hexadecimal with "0x" or "0X", prefix non-zero binary with "0b" or "0B"
For example:
- printf '<% d>', 12; # prints "< 12>"
- printf '<%+d>', 12; # prints "<+12>"
- printf '<%6s>', 12; # prints "< 12>"
- printf '<%-6s>', 12; # prints "<12 >"
- printf '<%06s>', 12; # prints "<000012>"
- printf '<%#o>', 12; # prints "<014>"
- printf '<%#x>', 12; # prints "<0xc>"
- printf '<%#X>', 12; # prints "<0XC>"
- printf '<%#b>', 12; # prints "<0b1100>"
- printf '<%#B>', 12; # prints "<0B1100>"
When a space and a plus sign are given as the flags at once, a plus sign is used to prefix a positive number.
When the # flag and a precision are given in the %o conversion, the precision is incremented if it's necessary for the leading "0".
This flag tells Perl to interpret the supplied string as a vector of
integers, one for each character in the string. Perl applies the
format to each integer in turn, then joins the resulting strings with
a separator (a dot . by default). This can be useful for displaying
ordinal values of characters in arbitrary strings:
- printf "%vd", "1.2.3";           # 49.46.50.46.51
- printf "version is v%vd\n", $^V; # Perl's version
Put an asterisk * before the v to override the string to use to
separate the numbers:
- printf "address is %*vX\n", ":", $addr;  # IPv6 address
- printf "bits are %0*v8b\n", " ", $bits;  # random bitstring
You can also explicitly specify the argument number to use for the
join string using something like *2$v; for example:
- printf '%*4$vX %*4$vX %*4$vX', # 3 IPv6 addresses
- @addr[1..3], ":";
Arguments are usually formatted to be only as wide as required to
display the given value. You can override the width by putting a
number here, or get the width from the next argument (with *) or
from a specified argument (e.g., with *2$):
- printf "<%s>", "a";       # prints "<a>"
- printf "<%6s>", "a";      # prints "<     a>"
- printf "<%*s>", 6, "a";   # prints "<     a>"
- printf '<%*2$s>', "a", 6; # prints "<     a>"
- printf "<%2s>", "long";   # prints "<long>" (does not truncate)
If a field width obtained through * is negative, it has the same
effect as the - flag: left-justification.
You can specify a precision (for numeric conversions) or a maximum
width (for string conversions) by specifying a . followed by a
number. For floating-point formats except g and G, this specifies
how many places right of the decimal point to show (the default
being 6). For example:
- # These examples are subject to system-specific variation.
- printf '<%f>', 1;    # prints "<1.000000>"
- printf '<%.1f>', 1;  # prints "<1.0>"
- printf '<%.0f>', 1;  # prints "<1>"
- printf '<%e>', 10;   # prints "<1.000000e+01>"
- printf '<%.1e>', 10; # prints "<1.0e+01>"
For "g" and "G", this specifies the maximum number of digits to show, including those prior to the decimal point and those after it; for example:
- # These examples are subject to system-specific variation.
- printf '<%g>', 1; # prints "<1>"
- printf '<%.10g>', 1; # prints "<1>"
- printf '<%g>', 100; # prints "<100>"
- printf '<%.1g>', 100; # prints "<1e+02>"
- printf '<%.2g>', 100.01; # prints "<1e+02>"
- printf '<%.5g>', 100.01; # prints "<100.01>"
- printf '<%.4g>', 100.01; # prints "<100>"
For integer conversions, specifying a precision implies that the output of the number itself should be zero-padded to this width, where the 0 flag is ignored:
- printf '<%.6d>', 1; # prints "<000001>"
- printf '<%+.6d>', 1; # prints "<+000001>"
- printf '<%-10.6d>', 1; # prints "<000001 >"
- printf '<%10.6d>', 1; # prints "< 000001>"
- printf '<%010.6d>', 1; # prints "< 000001>"
- printf '<%+10.6d>', 1; # prints "< +000001>"
- printf '<%.6x>', 1; # prints "<000001>"
- printf '<%#.6x>', 1; # prints "<0x000001>"
- printf '<%-10.6x>', 1; # prints "<000001 >"
- printf '<%10.6x>', 1; # prints "< 000001>"
- printf '<%010.6x>', 1; # prints "< 000001>"
- printf '<%#10.6x>', 1; # prints "< 0x000001>"
For string conversions, specifying a precision truncates the string to fit the specified width:
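For example:

```perl
printf "<%.5s>\n",   "truncated";   # prints "<trunc>"
printf "<%10.5s>\n", "truncated";   # prints "<     trunc>"
```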
You can also get the precision from the next argument using .*:
If a precision obtained through * is negative, it counts as having no precision at all.
- printf '<%.*s>', 7, "string"; # prints "<string>"
- printf '<%.*s>', 3, "string"; # prints "<str>"
- printf '<%.*s>', 0, "string"; # prints "<>"
- printf '<%.*s>', -1, "string"; # prints "<string>"
- printf '<%.*d>', 1, 0; # prints "<0>"
- printf '<%.*d>', 0, 0; # prints "<>"
- printf '<%.*d>', -1, 0; # prints "<0>"
You cannot currently get the precision from a specified number,
but it is intended that this will be possible in the future, for
example using .*2$:
- printf '<%.*2$x>', 1, 6; # INVALID, but in future will print
- # "<000001>"
For numeric conversions, you can specify the size to interpret the number as using l, h, V, q, L, or ll. For integer conversions (d u o x X b i D U O), numbers are usually assumed to be whatever the default integer size is on your platform (usually 32 or 64 bits), but you can override this to use instead one of the standard C types, as supported by the compiler used to build Perl:
- hh interpret integer as C type "char" or "unsigned
- char" on Perl 5.14 or later
- h interpret integer as C type "short" or
- "unsigned short"
- j interpret integer as C type "intmax_t" on Perl
- 5.14 or later, and only with a C99 compiler
- (unportable)
- l interpret integer as C type "long" or
- "unsigned long"
- q, L, or ll interpret integer as C type "long long",
- "unsigned long long", or "quad" (typically
- 64-bit integers)
- t interpret integer as C type "ptrdiff_t" on Perl
- 5.14 or later
- z interpret integer as C type "size_t" on Perl 5.14
- or later
As of 5.14, none of these raises an exception if they are not supported on
your platform. However, if warnings are enabled, a warning of the
printf warning class is issued on an unsupported conversion flag.
Should you instead prefer an exception, do this:
- use warnings FATAL => "printf";
If you would like to know about a version dependency before you start running the program, put something like this at its top:
- use 5.014; # for hh/j/t/z/ printf modifiers
You can find out whether your Perl supports quads via Config:
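One way to check (a sketch using the Config keys use64bitint and longsize):

```perl
use Config;
# Quads are available if Perl was built with 64-bit integers,
# or if the native long is already 8 bytes wide.
if ($Config{use64bitint} eq "define" || $Config{longsize} >= 8) {
    print "Nice quads!\n";
}
```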
For floating-point conversions (e f g E F G), numbers are usually assumed to be the default floating-point size on your platform (double or long double), but you can force "long double" with q, L, or ll if your platform supports them. You can find out whether your Perl supports long doubles via Config:
You can find out whether Perl considers "long double" to be the default floating-point size to use on your platform via Config:
It can also be that long doubles and doubles are the same thing:
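The three checks just described might look like this (a sketch; the Config keys assumed are d_longdbl, uselongedouble spelled uselongdouble, doublesize, and longdblsize):

```perl
use Config;
# Does this Perl support long doubles at all?
print "long doubles\n"            if $Config{d_longdbl}     eq "define";
# Are long doubles the default floating-point size?
print "long doubles by default\n" if $Config{uselongdouble} eq "define";
# On some platforms double and long double are the same size.
print "doubles are long doubles\n"
    if $Config{doublesize} == $Config{longdblsize};
```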
The size specifier V has no effect for Perl code, but is supported for compatibility with XS code. It means "use the standard size for a Perl integer or floating-point number", which is the default.
Normally, sprintf() takes the next unused argument as the value to format for each format specification. If the format specification uses * to require additional arguments, these are consumed from the argument list in the order they appear in the format specification before the value to format. Where an argument is specified by an explicit index, this does not affect the normal order for the arguments, even when the explicitly specified index would have been the next argument.
So:
- printf "<%*.*s>", $a, $b, $c;
uses $a for the width, $b for the precision, and $c as the value to format; while:
- printf '<%*1$.*s>', $a, $b;
would use $a for the width and precision, and $b as the value to format.
Here are some more examples; be aware that when using an explicit index, the $ may need escaping:
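For instance (a sketch; in a double-quoted format the $ must be backslashed so Perl doesn't interpolate it as a variable):

```perl
printf "<%2\$d>\n", 12, 34;       # double quotes: escape the $; prints "<34>"
printf '<%2$d>' . "\n", 12, 34;   # single quotes: no escaping needed
```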
If use locale (including use locale 'not_characters') is in effect and POSIX::setlocale() has been called, the character used for the decimal separator in formatted floating-point numbers is affected by the LC_NUMERIC locale. See perllocale and POSIX.
Return the positive square root of EXPR. If EXPR is omitted, uses $_. Works only for non-negative operands unless you've loaded the Math::Complex module.
Sets and returns the random number seed for the rand operator.
The point of the function is to "seed" the rand function so that rand
can produce a different sequence each time you run your program. When
called with a parameter, srand uses that for the seed; otherwise it
(semi-)randomly chooses a seed. In either case, starting with Perl 5.14,
it returns the seed. To signal that your code will work only on Perls
of a recent vintage:
- use 5.014; # so srand returns the seed
If srand() is not called explicitly, it is called implicitly without a
parameter at the first use of the rand operator.
However, there are a few situations where programs are likely to
want to call srand. One is for generating predictable results, generally for
testing or debugging. There, you use srand($seed), with the same $seed
each time. Another case is that you may want to call srand()
after a fork() to avoid child processes sharing the same seed value as the
parent (and consequently each other).
Do not call srand() (i.e., without an argument) more than once per
process. The internal state of the random number generator should
contain more entropy than can be provided by any seed, so calling
srand() again actually loses randomness.
Most implementations of srand take an integer and will silently
truncate decimal numbers. This means srand(42) will usually
produce the same results as srand(42.1). To be safe, always pass
srand an integer.
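A minimal sketch of the testing/debugging use, where a fixed seed makes rand reproducible:

```perl
# Re-seeding with the same integer replays the same sequence.
srand(42);
my @first  = map { int rand 100 } 1 .. 3;
srand(42);
my @second = map { int rand 100 } 1 .. 3;
# @first and @second now hold identical values
```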
A typical use of the returned seed is for a test program which has too many combinations to test comprehensively in the time available to it each run. It can test a random subset each time, and should there be a failure, log the seed used for that run so that it can later be used to reproduce the same results.
rand() is not cryptographically secure. You should not rely
on it in security-sensitive situations. As of this writing, a
number of third-party CPAN modules offer random number generators
intended by their authors to be cryptographically secure,
including: Data::Entropy, Crypt::Random, Math::Random::Secure,
and Math::TrulyRandom.
Returns a 13-element list giving the status info for a file, either the file opened via FILEHANDLE or DIRHANDLE, or named by EXPR. If EXPR is omitted, it stats $_ (not _!). Returns the empty list if stat fails. Typically used as follows:
- ($dev,$ino,$mode,$nlink,$uid,$gid,$rdev,$size,
- $atime,$mtime,$ctime,$blksize,$blocks)
- = stat($filename);
Not all fields are supported on all filesystem types. Here are the meanings of the fields:
- 0 dev device number of filesystem
- 1 ino inode number
- 2 mode file mode (type and permissions)
- 3 nlink number of (hard) links to the file
- 4 uid numeric user ID of file's owner
- 5 gid numeric group ID of file's owner
- 6 rdev the device identifier (special files only)
- 7 size total size of file, in bytes
- 8 atime last access time in seconds since the epoch
- 9 mtime last modify time in seconds since the epoch
- 10 ctime inode change time in seconds since the epoch (*)
- 11 blksize preferred I/O size in bytes for interacting with the
- file (may vary from file to file)
- 12 blocks actual number of system-specific blocks allocated
- on disk (often, but not always, 512 bytes each)
(The epoch was at 00:00 January 1, 1970 GMT.)
(*) Not all fields are supported on all filesystem types. Notably, the ctime field is non-portable. In particular, you cannot expect it to be a "creation time"; see Files and Filesystems in perlport for details.
If stat is passed the special filehandle consisting of an underline, no
stat is done, but the current contents of the stat structure from the
last stat, lstat, or filetest are returned. Example:
(This works on machines only for which the device number is negative under NFS.)
Because the mode contains both the file type and its permissions, you
should mask off the file type portion and (s)printf using a "%o"
if you want to see the real permissions.
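For example (a sketch; 07777 masks off the file-type bits, and $filename is a hypothetical existing file):

```perl
my $mode = (stat($filename))[2];
printf "Permissions are %04o\n", $mode & 07777;
```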
In scalar context, stat returns a boolean value indicating success or failure, and, if successful, sets the information associated with the special filehandle _.
The File::stat module provides a convenient, by-name access mechanism:
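A sketch of its use (assumes $filename names an existing file):

```perl
use File::stat;
my $sb = stat($filename) or die "Can't stat $filename: $!";
printf "File is %s, size is %s, perm %04o, mtime %s\n",
    $filename, $sb->size, $sb->mode & 07777,
    scalar localtime $sb->mtime;
```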
You can import symbolic mode constants (S_IF*) and functions (S_IS*) from the Fcntl module:
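A sketch of their use (assumes $filename names an existing file):

```perl
use Fcntl ':mode';
my $mode = (stat($filename))[2];

my $user_rwx      = ($mode & S_IRWXU) >> 6;   # user bits, shifted down
my $group_read    = ($mode & S_IRGRP) >> 3;
my $other_execute =  $mode & S_IXOTH;

printf "Permissions are %04o\n", S_IMODE($mode);

my $is_setuid     = $mode & S_ISUID;
my $is_directory  = S_ISDIR($mode);
```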
You could write the last two using the -u and -d operators.
Commonly available S_IF* constants are:
- # Permissions: read, write, execute, for user, group, others.
- S_IRWXU S_IRUSR S_IWUSR S_IXUSR
- S_IRWXG S_IRGRP S_IWGRP S_IXGRP
- S_IRWXO S_IROTH S_IWOTH S_IXOTH
- # Setuid/Setgid/Stickiness/SaveText.
- # Note that the exact meaning of these is system-dependent.
- S_ISUID S_ISGID S_ISVTX S_ISTXT
- # File types. Not all are necessarily available on
- # your system.
- S_IFREG S_IFDIR S_IFLNK S_IFBLK S_IFCHR
- S_IFIFO S_IFSOCK S_IFWHT S_ENFMT
- # The following are compatibility aliases for S_IRUSR,
- # S_IWUSR, and S_IXUSR.
- S_IREAD S_IWRITE S_IEXEC
and the S_IF* functions are:
- S_IMODE($mode) the part of $mode containing the permission
- bits and the setuid/setgid/sticky bits
- S_IFMT($mode) the part of $mode containing the file type
- which can be bit-anded with (for example)
- S_IFREG or with the following functions
- # The operators -f, -d, -l, -b, -c, -p, and -S.
- S_ISREG($mode) S_ISDIR($mode) S_ISLNK($mode)
- S_ISBLK($mode) S_ISCHR($mode) S_ISFIFO($mode) S_ISSOCK($mode)
- # No direct -X operator counterpart, but for the first one
- # the -g operator is often equivalent. The ENFMT stands for
- # record flocking enforcement, a platform-dependent feature.
- S_ISENFMT($mode) S_ISWHT($mode)
See your native chmod(2) and stat(2) documentation for more details about the S_* constants. To get status info for a symbolic link instead of the target file behind the link, use the lstat function.
Portability issues: stat in perlport.
state declares a lexically scoped variable, just like my.
However, those variables will never be reinitialized, contrary to
lexical variables that are reinitialized each time their enclosing block
is entered.
See Persistent Private Variables in perlsub for details.
state variables are enabled only when the use feature "state" pragma is in effect, unless the keyword is written as CORE::state. See also feature.
Takes extra time to study SCALAR ($_ if unspecified) in anticipation of
doing many pattern matches on the string before it is next modified.
This may or may not save time, depending on the nature and number of
patterns you are searching and the distribution of character
frequencies in the string to be searched; you probably want to compare
run times with and without it to see which is faster. Those loops
that scan for many short constant strings (including the constant
parts of more complex patterns) will benefit most.
(The way study works is this: a linked list of every character in the string to be searched is made, so we know, for example, where all the 'k' characters are. From each search string, the rarest character is selected, based on some static frequency tables constructed from some C programs and English text. Only those places that contain this "rarest" character are examined.)
For example, here is a loop that inserts index producing entries before any line containing a certain pattern:
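A sketch of such a loop (it emits troff .IX index entries ahead of each matching line):

```perl
while (<>) {
    study;                             # speed up the matches below
    print ".IX foo\n"    if /\bfoo\b/;
    print ".IX bar\n"    if /\bbar\b/;
    print ".IX blurfl\n" if /\bblurfl\b/;
    # ...
    print;                             # pass the line through
}
```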
In searching for /\bfoo\b/, only locations in $_ that contain f will be looked at, because f is rarer than o. In general, this is a big win except in pathological cases. The only question is whether it saves you more time than it took to build the linked list in the first place.
Note that if you have to look for strings that you don't know till runtime, you can build an entire loop as a string and eval that to avoid recompiling all your patterns all the time. Together with undefining $/ to input entire files as one record, this can be quite fast, often faster than specialized programs like fgrep(1). The following scans a list of files (@files) for a list of words (@words), and prints out the names of those files that contain a match:
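A sketch of that technique (the loop body is assembled as a string, then eval'd once):

```perl
my $search = 'while (<>) { study;';
foreach my $word (@words) {
    # one test per word, counting matches per file
    $search .= "++\$seen{\$ARGV} if /\\b$word\\b/;\n";
}
$search .= "}";
@ARGV = @files;
undef $/;              # slurp each file as one record
eval $search;
die $@ if $@;
$/ = "\n";             # restore the input record separator
foreach my $file (sort keys %seen) {
    print $file, "\n";
}
```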
This is a subroutine definition, not a real function per se. Without a BLOCK it's just a forward declaration. Without a NAME, it's an anonymous function declaration, and does return a value: the CODE ref of the closure just created.
See perlsub and perlref for details about subroutines and references; see attributes and Attribute::Handlers for more information about attributes.
A special token that returns a reference to the current subroutine, or
undef outside of a subroutine.
The behaviour of __SUB__ within a regex code block (such as /(?{...})/) is subject to change.
This token is available only under use v5.16 or the "current_sub" feature. See feature.
Extracts a substring out of EXPR and returns it. First character is at offset zero. If OFFSET is negative, starts that far back from the end of the string. If LENGTH is omitted, returns everything through the end of the string. If LENGTH is negative, leaves that many characters off the end of the string.
You can use the substr() function as an lvalue, in which case EXPR
must itself be an lvalue. If you assign something shorter than LENGTH,
the string will shrink, and if you assign something longer than LENGTH,
the string will grow to accommodate it. To keep the string the same
length, you may need to pad or chop your value using sprintf.
If OFFSET and LENGTH specify a substring that is partly outside the string, only the part within the string is returned. If the substring is beyond either end of the string, substr() returns the undefined value and produces a warning. When used as an lvalue, specifying a substring that is entirely outside the string raises an exception. Here's an example showing the behavior for boundary cases:
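A sketch of those boundary cases:

```perl
my $name = 'fred';
substr($name, 4) = 'dy';         # $name is now 'freddy'
my $null = substr $name, 6, 2;   # returns "" (no warning)
my $oops = substr $name, 7;      # returns undef, with a warning
substr($name, 7) = 'gap';        # raises an exception
```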
An alternative to using substr() as an lvalue is to specify the replacement string as the 4th argument. This allows you to replace parts of the EXPR and return what was there before in one operation, just as you can with splice().
Note that the lvalue returned by the three-argument version of substr() acts as a 'magic bullet'; each time it is assigned to, it remembers which part of the original string is being modified; for example:
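A sketch using for to alias $_ to the lvalue:

```perl
my $x = '1234';
for (substr($x, 1, 2)) {
    $_ = 'a';   print $x, "\n";   # prints 1a4
    $_ = 'xyz'; print $x, "\n";   # prints 1xyz4
    $x = '56789';
    $_ = 'pq';  print $x, "\n";   # prints 5pq9
}
```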
With negative offsets, it remembers its position from the end of the string when the target string is modified:
Prior to Perl version 5.10, the result of using an lvalue multiple times was unspecified. Prior to 5.16, the result with negative offsets was unspecified.
Creates a new filename symbolically linked to the old filename. Returns 1 for success, 0 otherwise. On systems that don't support symbolic links, raises an exception. To check for that, use eval:
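For example:

```perl
# True if symlink is implemented; the dummy call itself
# fails harmlessly on systems that do support it.
my $symlink_exists = eval { symlink("", ""); 1 };
```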
Portability issues: symlink in perlport.
Calls the system call specified as the first element of the list,
passing the remaining elements as arguments to the system call. If
unimplemented, raises an exception. The arguments are interpreted
as follows: if a given argument is numeric, the argument is passed as
an int. If not, the pointer to the string value is passed. You are responsible for making sure a string is pre-extended long enough to receive any result that might be written into a string. You can't use a string literal (or other read-only string) as an argument to syscall because Perl has to assume that any string pointer might be written through. If your integer arguments are not literals and have never been interpreted in a numeric context, you may need to add 0 to them to force them to look like numbers. This emulates the syswrite function (or vice versa):
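A sketch (assumes syscall.ph has been generated with h2ph on your system, so SYS_write() is defined):

```perl
require 'syscall.ph';            # may need to run h2ph first

my $s = "hi there\n";
syscall(SYS_write(), fileno(STDOUT), $s, length $s);
```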
Note that Perl supports passing only up to 14 arguments to your syscall, which in practice should (usually) suffice.
syscall returns whatever value was returned by the system call it invokes. If the system call fails, syscall returns -1 and sets $! (errno). Note that some system calls can legitimately return -1. The proper way to handle such calls is to assign $! = 0 before the call, then check the value of $! if syscall returns -1.
There's a problem with syscall(&SYS_pipe): it returns the file
number of the read end of the pipe it creates, but there is no way
to retrieve the file number of the other end. You can avoid this
problem by using pipe instead.
Portability issues: syscall in perlport.
Opens the file whose filename is given by FILENAME, and associates it with FILEHANDLE. If FILEHANDLE is an expression, its value is used as the real filehandle wanted; an undefined scalar will be suitably autovivified. This function calls the underlying operating system's open(2) function with the parameters FILENAME, MODE, and PERMS.
The possible values and flag bits of the MODE parameter are system-dependent; they are available via the standard module Fcntl. See the documentation of your operating system's open(2) syscall to see which values and flag bits are available. You may combine several flags using the |-operator.
Some of the most common values are O_RDONLY for opening the file in read-only mode, O_WRONLY for opening the file in write-only mode, and O_RDWR for opening the file in read-write mode.
For historical reasons, some values work on almost every system supported by Perl: 0 means read-only, 1 means write-only, and 2 means read/write. We know that these values do not work under OS/390 and on the Macintosh; you probably don't want to use them in new code.
If the file named by FILENAME does not exist and the open call creates it (typically because MODE includes the O_CREAT flag), then the value of PERMS specifies the permissions of the newly created file. If you omit the PERMS argument to sysopen, Perl uses the octal value 0666.
These permission values need to be in octal, and are modified by your
process's current umask.
In many systems the O_EXCL flag is available for opening files in exclusive mode. This is not locking: exclusiveness means here that if the file already exists, sysopen() fails. O_EXCL may not work on network filesystems, and has no effect unless the O_CREAT flag is set as well. Setting O_CREAT|O_EXCL prevents the file from being opened if it is a symbolic link. It does not protect against symbolic links in the file's path.
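A minimal sketch of exclusive creation ($path is a hypothetical filename):

```perl
use Fcntl qw(O_WRONLY O_CREAT O_EXCL);

# Succeeds only if $path does not already exist.
sysopen(my $fh, $path, O_WRONLY | O_CREAT | O_EXCL)
    or die "Can't create $path: $!";
```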
Sometimes you may want to truncate an already-existing file. This can be done using the O_TRUNC flag. The behavior of O_TRUNC with O_RDONLY is undefined.
You should seldom if ever use 0644 as argument to sysopen, because that takes away the user's option to have a more permissive umask. Better to omit it. See the perlfunc(1) entry on umask for more on this.
Note that sysopen depends on the fdopen() C library function.
On many Unix systems, fdopen() is known to fail when file descriptors
exceed a certain value, typically 255. If you need more file
descriptors than that, consider rebuilding Perl to use the sfio
library, or perhaps using the POSIX::open() function.
See perlopentut for a kinder, gentler explanation of opening files.
Portability issues: sysopen in perlport.
Attempts to read LENGTH bytes of data into variable SCALAR from the specified FILEHANDLE, using read(2). It bypasses buffered IO, so mixing this with other kinds of reads, print, write, seek, tell, or eof can cause confusion because the perlio or stdio layers usually buffer data. Returns the number of bytes actually read, 0 at end of file, or undef if there was an error (in the latter case $! is also set). SCALAR will be grown or shrunk so that the last byte actually read is the last byte of the scalar after the read.
An OFFSET may be specified to place the read data at some place in the
string other than the beginning. A negative OFFSET specifies
placement at that many characters counting backwards from the end of
the string. A positive OFFSET greater than the length of SCALAR
results in the string being padded to the required size with "\0"
bytes before the result of the read is appended.
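A minimal sketch ($fh is a hypothetical handle opened elsewhere):

```perl
# Read up to 4096 bytes, appending at the end of $buf.
my $buf = '';
my $n = sysread($fh, $buf, 4096, length $buf);
defined $n or die "sysread failed: $!";
# $n == 0 means end of file
```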
There is no syseof() function, which is ok, since eof() doesn't work well on device files (like ttys) anyway. Use sysread() and check for a return value of 0 to decide whether you're done.
Note that if the filehandle has been marked as :utf8, Unicode characters are read instead of bytes (the LENGTH, OFFSET, and the return value of sysread() are in Unicode characters). The :encoding(...) layer implicitly introduces the :utf8 layer. See binmode, open, and the open pragma.
Sets FILEHANDLE's system position in bytes using lseek(2). FILEHANDLE may be an expression whose value gives the name of the filehandle. The values for WHENCE are 0 to set the new position to POSITION; 1 to set it to the current position plus POSITION; and 2 to set it to EOF plus POSITION, typically negative.
Note the "in bytes": even if the filehandle has been set to operate on characters (for example by using the :encoding(utf8) I/O layer), tell() will return byte offsets, not character offsets (because implementing that would render sysseek() unacceptably slow).
sysseek() bypasses normal buffered IO, so mixing it with reads other than sysread (for example <> or read()), print, write, seek, tell, or eof may cause confusion.
For WHENCE, you may also use the constants SEEK_SET, SEEK_CUR, and SEEK_END (start of the file, current position, end of the file) from the Fcntl module. Use of the constants is also more portable than relying on 0, 1, and 2. For example, to define a "systell" function:
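For example:

```perl
use Fcntl 'SEEK_CUR';

# Seeking zero bytes from the current position returns that
# position without moving it.
sub systell { sysseek($_[0], 0, SEEK_CUR) }
```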
Returns the new position, or the undefined value on failure. A position of zero is returned as the string "0 but true"; thus sysseek returns true on success and false on failure, yet you can still easily determine the new position.
Does exactly the same thing as exec LIST, except that a fork is
done first and the parent process waits for the child process to
done first and the parent process waits for the child process to
exit. Note that argument processing varies depending on the
number of arguments. If there is more than one argument in LIST,
or if LIST is an array with more than one value, starts the program
given by the first element of the list with arguments given by the
rest of the list. If there is only one scalar argument, the argument
is checked for shell metacharacters, and if there are any, the
entire argument is passed to the system's command shell for parsing
(this is /bin/sh -c on Unix platforms, but varies on other
platforms). If there are no shell metacharacters in the argument,
it is split into words and passed directly to execvp
, which is
more efficient.
Perl will attempt to flush all files opened for output before any operation that may do a fork, but this may not be supported on some platforms (see perlport). To be safe, you may need to set $| ($AUTOFLUSH in English) or call the autoflush() method of IO::Handle on any open handles.
The return value is the exit status of the program as returned by the wait call. To get the actual exit value, shift right by eight (see below). See also exec. This is not what you want to use to capture the output from a command; for that you should merely use backticks or qx//, as described in `STRING` in perlop. A return value of -1 indicates a failure to start the program or an error of the wait(2) system call (inspect $! for the reason).
If you'd like to make system (and many other bits of Perl) die on error,
have a look at the autodie pragma.
Like exec, system allows you to lie to a program about its name if you use the system PROGRAM LIST syntax. Again, see exec.
Since SIGINT and SIGQUIT are ignored during the execution of system, if you expect your program to terminate on receipt of these signals you will need to arrange to do so yourself based on the return value.
If you'd like to manually inspect system's failure, you can check all possible failure modes by inspecting $? like this:
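A sketch of that inspection:

```perl
if ($? == -1) {
    print "failed to execute: $!\n";
}
elsif ($? & 127) {
    printf "child died with signal %d, %s coredump\n",
        ($? & 127), ($? & 128) ? 'with' : 'without';
}
else {
    printf "child exited with value %d\n", $? >> 8;
}
```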
Alternatively, you may inspect the value of ${^CHILD_ERROR_NATIVE} with the W*() calls from the POSIX module.
When system's arguments are executed indirectly by the shell,
results and return codes are subject to its quirks.
See `STRING` in perlop and exec for details.
Since system does a fork and wait it may affect a SIGCHLD handler. See perlipc for details.
Portability issues: system in perlport.
Attempts to write LENGTH bytes of data from variable SCALAR to the specified FILEHANDLE, using write(2). If LENGTH is not specified, writes the whole SCALAR. It bypasses buffered IO, so mixing this with reads (other than sysread()), print, write, seek, tell, or eof may cause confusion because the perlio and stdio layers usually buffer data. Returns the number of bytes actually written, or undef if there was an error (in this case the errno variable $! is also set). If the LENGTH is greater than the data available in the SCALAR after the OFFSET, only as much data as is available will be written.
An OFFSET may be specified to write the data from some part of the string other than the beginning. A negative OFFSET specifies writing that many characters counting backwards from the end of the string. If SCALAR is of length zero, you can only use an OFFSET of 0.
WARNING: If the filehandle is marked :utf8, Unicode characters encoded in UTF-8 are written instead of bytes, and the LENGTH, OFFSET, and return value of syswrite() are in (UTF8-encoded Unicode) characters. The :encoding(...) layer implicitly introduces the :utf8 layer. Alternately, if the handle is not marked with an encoding but you attempt to write characters with code points over 255, an exception is raised. See binmode, open, and the open pragma.
Returns the current position in bytes for FILEHANDLE, or -1 on error. FILEHANDLE may be an expression whose value gives the name of the actual filehandle. If FILEHANDLE is omitted, assumes the file last read.
Note the "in bytes": even if the filehandle has been set to operate on characters (for example by using the :encoding(utf8) open layer), tell() will return byte offsets, not character offsets (because that would render seek() and tell() rather slow).
The return value of tell() for the standard streams like STDIN depends on the operating system: it may return -1 or something else. tell() on pipes, fifos, and sockets usually returns -1.
There is no systell function. Use sysseek(FH, 0, 1) for that.
Do not use tell() (or other buffered I/O operations) on a filehandle that has been manipulated by sysread(), syswrite(), or sysseek(). Those functions ignore the buffering, while tell() does not.
Returns the current position of the readdir routines on DIRHANDLE.
Value may be given to seekdir to access a particular location in a
directory. telldir has the same caveats about possible directory
compaction as the corresponding system library routine.
This function binds a variable to a package class that will provide the implementation for the variable. VARIABLE is the name of the variable to be enchanted. CLASSNAME is the name of a class implementing objects of correct type. Any additional arguments are passed to the appropriate constructor method of the class (meaning TIESCALAR, TIEHANDLE, TIEARRAY, or TIEHASH). Typically these are arguments such as might be passed to the dbm_open() function of C. The object returned by the constructor is also returned by the tie function, which would be useful if you want to access other methods in CLASSNAME.
Note that functions such as keys and values may return huge lists
when used on large objects, like DBM files. You may prefer to use the
each function to iterate over such. Example:
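A sketch (the history-file path is illustrative, and assumes NDBM support is built in):

```perl
# Print out history file offsets.
use NDBM_File;
tie(my %HIST, 'NDBM_File', '/usr/lib/news/history', 1, 0);
while (my ($key, $val) = each %HIST) {
    print $key, ' = ', unpack('L', $val), "\n";
}
untie(%HIST);
```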
A class implementing a hash should have the following methods:
- TIEHASH classname, LIST
- FETCH this, key
- STORE this, key, value
- DELETE this, key
- CLEAR this
- EXISTS this, key
- FIRSTKEY this
- NEXTKEY this, lastkey
- SCALAR this
- DESTROY this
- UNTIE this
A class implementing an ordinary array should have the following methods:
A class implementing a filehandle should have the following methods:
- TIEHANDLE classname, LIST
- READ this, scalar, length, offset
- READLINE this
- GETC this
- WRITE this, scalar, length, offset
- PRINT this, LIST
- PRINTF this, format, LIST
- BINMODE this
- EOF this
- FILENO this
- SEEK this, position, whence
- TELL this
- OPEN this, mode, LIST
- CLOSE this
- DESTROY this
- UNTIE this
A class implementing a scalar should have the following methods:
- TIESCALAR classname, LIST
- FETCH this,
- STORE this, value
- DESTROY this
- UNTIE this
Not all methods indicated above need be implemented. See perltie, Tie::Hash, Tie::Array, Tie::Scalar, and Tie::Handle.
Unlike dbmopen, the tie function will not use or require a module for you; you need to do that explicitly yourself. See DB_File or the Config module for interesting tie implementations.
For further details see perltie, tied VARIABLE.
Returns a reference to the object underlying VARIABLE (the same value
that was originally returned by the tie call that bound the variable
to a package.) Returns the undefined value if VARIABLE isn't tied to a
package.
Returns the number of non-leap seconds since whatever time the system
considers to be the epoch, suitable for feeding to gmtime and
localtime. On most systems the epoch is 00:00:00 UTC, January 1, 1970;
a prominent exception being Mac OS Classic which uses 00:00:00, January 1,
1904 in the current local time zone for its epoch.
For measuring time in better granularity than one second, use the
Time::HiRes module from Perl 5.8 onwards (or from CPAN before then), or,
if you have gettimeofday(2), you may be able to use the syscall
interface of Perl. See perlfaq8 for details.
For date and time processing look at the many related modules on CPAN. For a comprehensive date and time representation look at the DateTime module.
Returns a four-element list giving the user and system times in seconds for this process and any exited children of this process.
- ($user,$system,$cuser,$csystem) = times;
In scalar context, times returns $user.
Children's times are only included for terminated children.
Portability issues: times in perlport.
The transliteration operator. Same as y///. See
Quote and Quote-like Operators in perlop.
Truncates the file opened on FILEHANDLE, or named by EXPR, to the
specified length. Raises an exception if truncate isn't implemented
on your system. Returns true if successful, undef on error.
The behavior is undefined if LENGTH is greater than the length of the file.
The position in the file of FILEHANDLE is left unchanged. You may want to call seek before writing to the file.
Portability issues: truncate in perlport.
Returns an uppercased version of EXPR. This is the internal function implementing the \U escape in double-quoted strings. It does not attempt to do titlecase mapping on initial letters. See ucfirst for that.
If EXPR is omitted, uses $_.
This function behaves the same way under various pragmata, such as in a locale, as lc does.
Returns the value of EXPR with the first character in uppercase (titlecase in Unicode). This is the internal function implementing the \u escape in double-quoted strings. If EXPR is omitted, uses $_.
This function behaves the same way under various pragmata, such as in a locale, as lc does.
Sets the umask for the process to EXPR and returns the previous value. If EXPR is omitted, merely returns the current umask.
The Unix permission rwxr-x--- is represented as three sets of three bits, or three octal digits: 0750 (the leading 0 indicates octal and isn't one of the digits). The umask value is such a number representing disabled permission bits. The permission (or "mode") values you pass to mkdir or sysopen are modified by your umask, so even if you tell sysopen to create a file with permissions 0777, if your umask is 0022, then the file will actually be created with permissions 0755. If your umask were 0027 (group can't write; others can't read, write, or execute), then passing sysopen 0666 would create a file with mode 0640 (because 0666 & ~027 is 0640).
Here's some advice: supply a creation mode of 0666 for regular files (in sysopen) and one of 0777 for directories (in mkdir) and executable files. This gives users the freedom of choice: if they want protected files, they might choose process umasks of 022, 027, or even the particularly antisocial mask of 077.
Programs should rarely if ever make policy decisions better left to
the user. The exception to this is when writing files that should be
kept private: mail files, web browser cookies, .rhosts files, and
so on.
If umask(2) is not implemented on your system and you are trying to restrict access for yourself (i.e., (EXPR & 0700) > 0), raises an exception. If umask(2) is not implemented and you are not trying to restrict access for yourself, returns undef.
Remember that a umask is a number, usually given in octal; it is not a string of octal digits. See also oct, if all you have is a string.
Portability issues: umask in perlport.
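The interaction described above can be sketched as follows (the filename private.dat is hypothetical):

```perl
use Fcntl;    # for O_WRONLY, O_CREAT

my $old = umask 0027;          # group can't write; others get nothing
# With a requested mode of 0666 and a umask of 0027,
# the file is actually created with mode 0640.
sysopen(my $fh, "private.dat", O_WRONLY | O_CREAT, 0666)
    or die "sysopen: $!";
close $fh;
umask $old;                    # restore the previous umask
```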
Undefines the value of EXPR, which must be an lvalue. Use only on a scalar value, an array (using @), a hash (using %), a subroutine (using &), or a typeglob (using *). Saying undef $hash{$key} will probably not do what you expect on most predefined variables or DBM list values, so don't do that; see delete. Always returns the undefined value. You can omit the EXPR, in which case nothing is undefined, but you still get an undefined value that you could, for instance, return from a subroutine, assign to a variable, or pass as a parameter. Examples:
- undef $foo;
- undef $bar{'blurfl'}; # Compare to: delete $bar{'blurfl'};
- undef @ary;
- undef %hash;
- undef &mysub;
- undef *xyz; # destroys $xyz, @xyz, %xyz, &xyz, etc.
- return (wantarray ? (undef, $errmsg) : undef) if $they_blew_it;
- select undef, undef, undef, 0.25;
- ($a, $b, undef, $c) = &foo; # Ignore third value returned
Note that this is a unary operator, not a list operator.
Deletes a list of files. On success, it returns the number of files it successfully deleted. On failure, it returns false and sets $! (errno).
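For instance (the filenames here are hypothetical):

```perl
my @goners   = ('old.tmp', 'older.tmp');
my $unlinked = unlink 'a.txt', 'b.txt', @goners;
warn "could only delete $unlinked of 4 files: $!"
    if $unlinked != 4;
```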
On error, unlink will not tell you which files it could not remove.
If you want to know which files you could not remove, try them one
at a time:
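A sketch of the one-at-a-time approach (filenames hypothetical):

```perl
my @goners = ('old.tmp', 'older.tmp');
foreach my $file (@goners) {
    unlink $file or warn "Could not unlink $file: $!";
}
```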
Note: unlink will not attempt to delete directories unless you are
superuser and the -U flag is supplied to Perl. Even if these
conditions are met, be warned that unlinking a directory can inflict
damage on your filesystem. Finally, using unlink on directories is
not supported on many operating systems. Use rmdir instead.
If LIST is omitted, unlink uses $_.
unpack does the reverse of pack: it takes a string
and expands it out into a list of values.
(In scalar context, it returns merely the first value produced.)
If EXPR is omitted, unpacks the $_ string.
See perlpacktut for an introduction to this function.
The string is broken into chunks described by the TEMPLATE. Each chunk
is converted separately to a value. Typically, either the string is a result
of pack, or the characters of the string represent a C structure of some
kind.
The TEMPLATE has the same format as in the pack function.
Here's a subroutine that does substring:
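A sketch of such a subroutine, using the x format to skip bytes and the a format to grab them (named my_substr here so it doesn't shadow the builtin):

```perl
sub my_substr {
    my ($what, $where, $howmuch) = @_;
    # Skip $where bytes, then take $howmuch bytes.
    return unpack("x$where a$howmuch", $what);
}

print my_substr("hello world", 6, 5), "\n";   # prints "world"
```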
and then there's
- sub ordinal { unpack("W",$_[0]); } # same as ord()
In addition to fields allowed in pack(), you may prefix a field with a %<number> to indicate that you want a <number>-bit checksum of the items instead of the items themselves. The default is a 16-bit checksum. The checksum is calculated by summing the numeric values of the expanded values (for string fields, the sum of ord($char) is taken; for bit fields, the sum of zeroes and ones).
For example, the following computes the same number as the System V sum program:
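That computation can be sketched as follows ("%32W*" sums the byte values of the whole input into a 32-bit checksum):

```perl
my $checksum = do {
    local $/;                      # slurp the whole input at once
    unpack("%32W*", <>) % 65535;
};
print "$checksum\n";
```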
The following efficiently counts the number of set bits in a bit vector:
- $setbits = unpack("%32b*", $selectmask);
The p and P formats should be used with care. Since Perl has no way of checking whether the value passed to unpack() corresponds to a valid memory location, passing a pointer value that's not known to be valid is likely to have disastrous consequences.
If there are more pack codes or if the repeat count of a field or a group
is larger than what the remainder of the input string allows, the result
is not well defined: the repeat count may be decreased, or
unpack() may produce empty strings or zeros, or it may raise an exception.
If the input string is longer than one described by the TEMPLATE,
the remainder of that input string is ignored.
See pack for more examples and notes.
Does the opposite of a shift. Or the opposite of a push,
depending on how you look at it. Prepends list to the front of the
array and returns the new number of elements in the array.
- unshift(@ARGV, '-e') unless $ARGV[0] =~ /^-/;
Note the LIST is prepended whole, not one element at a time, so the
prepended elements stay in the same order. Use reverse to do the
reverse.
Starting with Perl 5.14, unshift can take a scalar EXPR, which must hold
a reference to an unblessed array. The argument will be dereferenced
automatically. This aspect of unshift is considered highly
experimental. The exact behaviour may change in a future version of Perl.
To avoid confusing would-be users of your code who are running earlier versions of Perl with mysterious syntax errors, put this sort of thing at the top of your file to signal that your code will work only on Perls of a recent vintage:
- use 5.014; # so push/pop/etc work on scalars (experimental)
Breaks the binding between a variable and a package. (See tie.) Has no effect if the variable is not tied.
Imports some semantics into the current package from the named module, generally by aliasing certain subroutine or variable names into your package. It is exactly equivalent to
- BEGIN { require Module; Module->import( LIST ); }
except that Module must be a bareword. The importation can be made conditional by using the if module.
In the peculiar use VERSION form, VERSION may be either a positive decimal fraction such as 5.006, which will be compared to $], or a v-string of the form v5.6.1, which will be compared to $^V (aka $PERL_VERSION). An exception is raised if VERSION is greater than the version of the current Perl interpreter; Perl will not attempt to parse the rest of the file. Compare with require, which can do a similar check at run time. Symmetrically, no VERSION allows you to specify that you want a version of Perl older than the specified one.
Specifying VERSION as a literal of the form v5.6.1 should generally be avoided, because it leads to misleading error messages under earlier versions of Perl (that is, prior to 5.6.0) that do not support this syntax. The equivalent numeric version should be used instead.
This is often useful if you need to check the current Perl version before
useing library modules that won't work with older versions of Perl.
(We try not to do this more than we have to.)
use VERSION also enables all features available in the requested version as defined by the feature pragma, disabling any features not in the requested version's feature bundle. See feature. Similarly, if the specified Perl version is greater than or equal to 5.12.0, strictures are enabled lexically as with use strict. Any explicit use of use strict or no strict overrides use VERSION, even if it comes before it. In both cases, the feature.pm and strict.pm files are not actually loaded.
The BEGIN forces the require and import to happen at compile time. The require makes sure the module is loaded into memory if it hasn't been yet. The import is not a builtin; it's just an ordinary static method call into the Module package to tell the module to import the list of features back into the current package. The module can implement its import method any way it likes, though most modules just choose to derive their import method via inheritance from the Exporter class that is defined in the Exporter module. See Exporter. If no import method can be found, then the call is skipped, even if there is an AUTOLOAD method.
If you do not want to call the package's import method (for instance,
to stop your namespace from being altered), explicitly supply the empty list:
- use Module ();
That is exactly equivalent to
- BEGIN { require Module }
If the VERSION argument is present between Module and LIST, then the use will call the VERSION method in class Module with the given version as an argument. The default VERSION method, inherited from the UNIVERSAL class, croaks if the given version is larger than the value of the variable $Module::VERSION.
Again, there is a distinction between omitting LIST (import called with no arguments) and an explicit empty LIST () (import not called). Note that there is no comma after VERSION!
Because this is a wide-open interface, pragmas (compiler directives) are also implemented this way. Currently implemented pragmas are:
Some of these pseudo-modules import semantics into the current block scope (like strict or integer), unlike ordinary modules, which import symbols into the current package (which are effective through the end of the file).
Because use takes effect at compile time, it doesn't respect the
ordinary flow control of the code being compiled. In particular, putting
a use inside the false branch of a conditional doesn't prevent it
from being processed. If a module or pragma only needs to be loaded
conditionally, this can be done using the if pragma:
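A sketch of such a conditional load (the module name Module::Compat is hypothetical, standing in for whatever you actually need):

```perl
# Load a compatibility module only under Perls older than 5.10.
use if $] < 5.010, 'Module::Compat';
```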
There's a corresponding no declaration that unimports meanings imported by use, i.e., it calls unimport Module LIST instead of import. It behaves just as import does with VERSION, an omitted or empty LIST, or no unimport method being found.
Care should be taken when using the no VERSION form of no. It is only meant to be used to assert that the running Perl is of an earlier version than its argument, not to undo the feature-enabling side effects of use VERSION.
See perlmodlib for a list of standard modules and pragmas. See perlrun for the -M and -m command-line options to Perl that give use functionality from the command line.
Changes the access and modification times on each file of a list of files. The first two elements of the list must be the NUMERIC access and modification times, in that order. Returns the number of files successfully changed. The inode change time of each file is set to the current time. For example, this code has the same effect as the Unix touch(1) command when the files already exist and belong to the user running the program:
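Such a touch(1) equivalent can be sketched as:

```perl
#!/usr/bin/perl
# Rough equivalent of touch(1) for files that already exist
# and belong to the user running the program.
my $now = time;
utime($now, $now, @ARGV) == @ARGV
    or warn "couldn't touch every file: $!";
```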
Since Perl 5.8.0, if the first two elements of the list are undef,
the utime(2) syscall from your C library is called with a null second
argument. On most systems, this will set the file's access and
modification times to the current time (i.e., equivalent to the example
above) and will work even on files you don't own provided you have write
permission:
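A sketch of that form:

```perl
#!/usr/bin/perl
# Passing undef for both times makes utime(2) receive a null
# second argument, setting the times to "now" even on files
# you don't own (given write permission).
utime(undef, undef, @ARGV) == @ARGV
    or warn "couldn't touch every file: $!";
```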
Under NFS this will use the time of the NFS server, not the time of the local machine. If there is a time synchronization problem, the NFS server and local machine will have different times. The Unix touch(1) command will in fact normally use this form instead of the one shown in the first example.
Passing only one of the first two elements as undef is
equivalent to passing a 0 and will not have the effect
described when both are undef. This also triggers an
uninitialized warning.
On systems that support futimes(2), you may pass filehandles among the files. On systems that don't support futimes(2), passing filehandles raises an exception. Filehandles must be passed as globs or glob references to be recognized; barewords are considered filenames.
Portability issues: utime in perlport.
In list context, returns a list consisting of all the values of the named hash. In Perl 5.12 or later only, will also return a list of the values of an array; prior to that release, attempting to use an array argument will produce a syntax error. In scalar context, returns the number of values.
Hash entries are returned in an apparently random order. The actual random
order is specific to a given hash; the exact same series of operations
on two hashes may result in a different order for each hash. Any insertion
into the hash may change the order, as will any deletion, with the exception
that the most recent key returned by each or keys may be deleted
without changing the order. So long as a given hash is unmodified you may
rely on keys, values and each to repeatedly return the same order
as each other. See Algorithmic Complexity Attacks in perlsec for
details on why hash order is randomized. Aside from the guarantees
provided here the exact details of Perl's hash algorithm and the hash
traversal order are subject to change in any release of Perl.
As a side effect, calling values() resets the HASH or ARRAY's internal iterator; see each. (In particular, calling values() in void context resets the iterator with no other overhead.) Apart from resetting the iterator, values @array in list context is the same as plain @array. (We recommend that you use void context keys @array for this, but reasoned that taking values @array out would require more documentation than leaving it in.)
Note that the values are not copied, which means modifying them will modify the contents of the hash:
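For example:

```perl
my %hash = (apples => "foo1", oranges => "foo2");
# values returns aliases, so modifying them modifies the hash:
s/foo/bar/ for values %hash;
# %hash now holds (apples => "bar1", oranges => "bar2")
```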
Starting with Perl 5.14, values can take a scalar EXPR, which must hold
a reference to an unblessed hash or array. The argument will be
dereferenced automatically. This aspect of values is considered highly
experimental. The exact behaviour may change in a future version of Perl.
To avoid confusing would-be users of your code who are running earlier versions of Perl with mysterious syntax errors, put this sort of thing at the top of your file to signal that your code will work only on Perls of a recent vintage:
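By analogy with the unshift example earlier, the declaration would read:

```perl
use 5.012;  # so keys/values/each work on arrays
use 5.014;  # so keys/values/each work on scalars (experimental)
```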
Treats the string in EXPR as a bit vector made up of elements of width BITS and returns the value of the element specified by OFFSET as an unsigned integer. BITS therefore specifies the number of bits that are reserved for each element in the bit vector. This must be a power of two from 1 to 32 (or 64, if your platform supports that).
If BITS is 8, "elements" coincide with bytes of the input string.
If BITS is 16 or more, bytes of the input string are grouped into chunks of size BITS/8, and each group is converted to a number as with pack()/unpack() with big-endian formats n/N (and analogously for BITS==64). See pack for details.
If BITS is 4 or less, the string is broken into bytes, then the bits of each byte are broken into 8/BITS groups. Bits of a byte are numbered in a little-endian-ish way, as in 0x01, 0x02, 0x04, 0x08, 0x10, 0x20, 0x40, 0x80. For example, breaking the single input byte chr(0x36) into two groups gives a list (0x6, 0x3); breaking it into 4 groups gives (0x2, 0x1, 0x3, 0x0).
vec may also be assigned to, in which case parentheses are needed
to give the expression the correct precedence as in
- vec($image, $max_x * $x + $y, 8) = 3;
If the selected element is outside the string, the value 0 is returned. If an element off the end of the string is written to, Perl will first extend the string with sufficiently many zero bytes. It is an error to try to write off the beginning of the string (i.e., negative OFFSET).
If the string happens to be encoded as UTF-8 internally (and thus has
the UTF8 flag set), this is ignored by vec, and it operates on the
internal byte string, not the conceptual character string, even if you
only have characters with values less than 256.
Strings created with vec can also be manipulated with the logical operators |, &, ^, and ~. These operators will assume a bit vector operation is desired when both operands are strings. See Bitwise String Operators in perlop.
The following code will build up an ASCII string saying 'PerlPerlPerl'. The comments show the string after each step. Note that this code works in the same way on big-endian or little-endian machines.
- my $foo = '';
- vec($foo, 0, 32) = 0x5065726C; # 'Perl'
- # $foo eq "Perl" eq "\x50\x65\x72\x6C", 32 bits
- print vec($foo, 0, 8); # prints 80 == 0x50 == ord('P')
- vec($foo, 2, 16) = 0x5065; # 'PerlPe'
- vec($foo, 3, 16) = 0x726C; # 'PerlPerl'
- vec($foo, 8, 8) = 0x50; # 'PerlPerlP'
- vec($foo, 9, 8) = 0x65; # 'PerlPerlPe'
- vec($foo, 20, 4) = 2; # 'PerlPerlPe' . "\x02"
- vec($foo, 21, 4) = 7; # 'PerlPerlPer'
- # 'r' is "\x72"
- vec($foo, 45, 2) = 3; # 'PerlPerlPer' . "\x0c"
- vec($foo, 93, 1) = 1; # 'PerlPerlPer' . "\x2c"
- vec($foo, 94, 1) = 1; # 'PerlPerlPerl'
- # 'l' is "\x6c"
To transform a bit vector into a string or list of 0's and 1's, use these:
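A sketch of both transformations, using the little-endian "b*" unpack format:

```perl
my $vector = chr(0x36);                        # example bit vector
my $bits   = unpack("b*", $vector);            # string of 0's and 1's
my @bits   = split //, unpack("b*", $vector);  # list of 0's and 1's
print "$bits\n";                               # prints "01101100"
```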
If you know the exact length in bits, it can be used in place of the *.
Here is an example to illustrate how the bits actually fall in place:
- #!/usr/bin/perl -wl
- print <<'EOT';
- 0 1 2 3
- unpack("V",$_) 01234567890123456789012345678901
- ------------------------------------------------------------------
- EOT
- for $w (0..3) {
- $width = 2**$w;
- for ($shift=0; $shift < $width; ++$shift) {
- for ($off=0; $off < 32/$width; ++$off) {
- $str = pack("B*", "0"x32);
- $bits = (1<<$shift);
- vec($str, $off, $width) = $bits;
- $res = unpack("b*",$str);
- $val = unpack("V", $str);
- write;
- }
- }
- }
- format STDOUT =
- vec($_,@#,@#) = @<< == @######### @>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>
- $off, $width, $bits, $val, $res
- .
- __END__
Regardless of the machine architecture on which it runs, the example above should print the following table:
- 0 1 2 3
- unpack("V",$_) 01234567890123456789012345678901
- ------------------------------------------------------------------
- vec($_, 0, 1) = 1 == 1 10000000000000000000000000000000
- vec($_, 1, 1) = 1 == 2 01000000000000000000000000000000
- vec($_, 2, 1) = 1 == 4 00100000000000000000000000000000
- vec($_, 3, 1) = 1 == 8 00010000000000000000000000000000
- vec($_, 4, 1) = 1 == 16 00001000000000000000000000000000
- vec($_, 5, 1) = 1 == 32 00000100000000000000000000000000
- vec($_, 6, 1) = 1 == 64 00000010000000000000000000000000
- vec($_, 7, 1) = 1 == 128 00000001000000000000000000000000
- vec($_, 8, 1) = 1 == 256 00000000100000000000000000000000
- vec($_, 9, 1) = 1 == 512 00000000010000000000000000000000
- vec($_,10, 1) = 1 == 1024 00000000001000000000000000000000
- vec($_,11, 1) = 1 == 2048 00000000000100000000000000000000
- vec($_,12, 1) = 1 == 4096 00000000000010000000000000000000
- vec($_,13, 1) = 1 == 8192 00000000000001000000000000000000
- vec($_,14, 1) = 1 == 16384 00000000000000100000000000000000
- vec($_,15, 1) = 1 == 32768 00000000000000010000000000000000
- vec($_,16, 1) = 1 == 65536 00000000000000001000000000000000
- vec($_,17, 1) = 1 == 131072 00000000000000000100000000000000
- vec($_,18, 1) = 1 == 262144 00000000000000000010000000000000
- vec($_,19, 1) = 1 == 524288 00000000000000000001000000000000
- vec($_,20, 1) = 1 == 1048576 00000000000000000000100000000000
- vec($_,21, 1) = 1 == 2097152 00000000000000000000010000000000
- vec($_,22, 1) = 1 == 4194304 00000000000000000000001000000000
- vec($_,23, 1) = 1 == 8388608 00000000000000000000000100000000
- vec($_,24, 1) = 1 == 16777216 00000000000000000000000010000000
- vec($_,25, 1) = 1 == 33554432 00000000000000000000000001000000
- vec($_,26, 1) = 1 == 67108864 00000000000000000000000000100000
- vec($_,27, 1) = 1 == 134217728 00000000000000000000000000010000
- vec($_,28, 1) = 1 == 268435456 00000000000000000000000000001000
- vec($_,29, 1) = 1 == 536870912 00000000000000000000000000000100
- vec($_,30, 1) = 1 == 1073741824 00000000000000000000000000000010
- vec($_,31, 1) = 1 == 2147483648 00000000000000000000000000000001
- vec($_, 0, 2) = 1 == 1 10000000000000000000000000000000
- vec($_, 1, 2) = 1 == 4 00100000000000000000000000000000
- vec($_, 2, 2) = 1 == 16 00001000000000000000000000000000
- vec($_, 3, 2) = 1 == 64 00000010000000000000000000000000
- vec($_, 4, 2) = 1 == 256 00000000100000000000000000000000
- vec($_, 5, 2) = 1 == 1024 00000000001000000000000000000000
- vec($_, 6, 2) = 1 == 4096 00000000000010000000000000000000
- vec($_, 7, 2) = 1 == 16384 00000000000000100000000000000000
- vec($_, 8, 2) = 1 == 65536 00000000000000001000000000000000
- vec($_, 9, 2) = 1 == 262144 00000000000000000010000000000000
- vec($_,10, 2) = 1 == 1048576 00000000000000000000100000000000
- vec($_,11, 2) = 1 == 4194304 00000000000000000000001000000000
- vec($_,12, 2) = 1 == 16777216 00000000000000000000000010000000
- vec($_,13, 2) = 1 == 67108864 00000000000000000000000000100000
- vec($_,14, 2) = 1 == 268435456 00000000000000000000000000001000
- vec($_,15, 2) = 1 == 1073741824 00000000000000000000000000000010
- vec($_, 0, 2) = 2 == 2 01000000000000000000000000000000
- vec($_, 1, 2) = 2 == 8 00010000000000000000000000000000
- vec($_, 2, 2) = 2 == 32 00000100000000000000000000000000
- vec($_, 3, 2) = 2 == 128 00000001000000000000000000000000
- vec($_, 4, 2) = 2 == 512 00000000010000000000000000000000
- vec($_, 5, 2) = 2 == 2048 00000000000100000000000000000000
- vec($_, 6, 2) = 2 == 8192 00000000000001000000000000000000
- vec($_, 7, 2) = 2 == 32768 00000000000000010000000000000000
- vec($_, 8, 2) = 2 == 131072 00000000000000000100000000000000
- vec($_, 9, 2) = 2 == 524288 00000000000000000001000000000000
- vec($_,10, 2) = 2 == 2097152 00000000000000000000010000000000
- vec($_,11, 2) = 2 == 8388608 00000000000000000000000100000000
- vec($_,12, 2) = 2 == 33554432 00000000000000000000000001000000
- vec($_,13, 2) = 2 == 134217728 00000000000000000000000000010000
- vec($_,14, 2) = 2 == 536870912 00000000000000000000000000000100
- vec($_,15, 2) = 2 == 2147483648 00000000000000000000000000000001
- vec($_, 0, 4) = 1 == 1 10000000000000000000000000000000
- vec($_, 1, 4) = 1 == 16 00001000000000000000000000000000
- vec($_, 2, 4) = 1 == 256 00000000100000000000000000000000
- vec($_, 3, 4) = 1 == 4096 00000000000010000000000000000000
- vec($_, 4, 4) = 1 == 65536 00000000000000001000000000000000
- vec($_, 5, 4) = 1 == 1048576 00000000000000000000100000000000
- vec($_, 6, 4) = 1 == 16777216 00000000000000000000000010000000
- vec($_, 7, 4) = 1 == 268435456 00000000000000000000000000001000
- vec($_, 0, 4) = 2 == 2 01000000000000000000000000000000
- vec($_, 1, 4) = 2 == 32 00000100000000000000000000000000
- vec($_, 2, 4) = 2 == 512 00000000010000000000000000000000
- vec($_, 3, 4) = 2 == 8192 00000000000001000000000000000000
- vec($_, 4, 4) = 2 == 131072 00000000000000000100000000000000
- vec($_, 5, 4) = 2 == 2097152 00000000000000000000010000000000
- vec($_, 6, 4) = 2 == 33554432 00000000000000000000000001000000
- vec($_, 7, 4) = 2 == 536870912 00000000000000000000000000000100
- vec($_, 0, 4) = 4 == 4 00100000000000000000000000000000
- vec($_, 1, 4) = 4 == 64 00000010000000000000000000000000
- vec($_, 2, 4) = 4 == 1024 00000000001000000000000000000000
- vec($_, 3, 4) = 4 == 16384 00000000000000100000000000000000
- vec($_, 4, 4) = 4 == 262144 00000000000000000010000000000000
- vec($_, 5, 4) = 4 == 4194304 00000000000000000000001000000000
- vec($_, 6, 4) = 4 == 67108864 00000000000000000000000000100000
- vec($_, 7, 4) = 4 == 1073741824 00000000000000000000000000000010
- vec($_, 0, 4) = 8 == 8 00010000000000000000000000000000
- vec($_, 1, 4) = 8 == 128 00000001000000000000000000000000
- vec($_, 2, 4) = 8 == 2048 00000000000100000000000000000000
- vec($_, 3, 4) = 8 == 32768 00000000000000010000000000000000
- vec($_, 4, 4) = 8 == 524288 00000000000000000001000000000000
- vec($_, 5, 4) = 8 == 8388608 00000000000000000000000100000000
- vec($_, 6, 4) = 8 == 134217728 00000000000000000000000000010000
- vec($_, 7, 4) = 8 == 2147483648 00000000000000000000000000000001
- vec($_, 0, 8) = 1 == 1 10000000000000000000000000000000
- vec($_, 1, 8) = 1 == 256 00000000100000000000000000000000
- vec($_, 2, 8) = 1 == 65536 00000000000000001000000000000000
- vec($_, 3, 8) = 1 == 16777216 00000000000000000000000010000000
- vec($_, 0, 8) = 2 == 2 01000000000000000000000000000000
- vec($_, 1, 8) = 2 == 512 00000000010000000000000000000000
- vec($_, 2, 8) = 2 == 131072 00000000000000000100000000000000
- vec($_, 3, 8) = 2 == 33554432 00000000000000000000000001000000
- vec($_, 0, 8) = 4 == 4 00100000000000000000000000000000
- vec($_, 1, 8) = 4 == 1024 00000000001000000000000000000000
- vec($_, 2, 8) = 4 == 262144 00000000000000000010000000000000
- vec($_, 3, 8) = 4 == 67108864 00000000000000000000000000100000
- vec($_, 0, 8) = 8 == 8 00010000000000000000000000000000
- vec($_, 1, 8) = 8 == 2048 00000000000100000000000000000000
- vec($_, 2, 8) = 8 == 524288 00000000000000000001000000000000
- vec($_, 3, 8) = 8 == 134217728 00000000000000000000000000010000
- vec($_, 0, 8) = 16 == 16 00001000000000000000000000000000
- vec($_, 1, 8) = 16 == 4096 00000000000010000000000000000000
- vec($_, 2, 8) = 16 == 1048576 00000000000000000000100000000000
- vec($_, 3, 8) = 16 == 268435456 00000000000000000000000000001000
- vec($_, 0, 8) = 32 == 32 00000100000000000000000000000000
- vec($_, 1, 8) = 32 == 8192 00000000000001000000000000000000
- vec($_, 2, 8) = 32 == 2097152 00000000000000000000010000000000
- vec($_, 3, 8) = 32 == 536870912 00000000000000000000000000000100
- vec($_, 0, 8) = 64 == 64 00000010000000000000000000000000
- vec($_, 1, 8) = 64 == 16384 00000000000000100000000000000000
- vec($_, 2, 8) = 64 == 4194304 00000000000000000000001000000000
- vec($_, 3, 8) = 64 == 1073741824 00000000000000000000000000000010
- vec($_, 0, 8) = 128 == 128 00000001000000000000000000000000
- vec($_, 1, 8) = 128 == 32768 00000000000000010000000000000000
- vec($_, 2, 8) = 128 == 8388608 00000000000000000000000100000000
- vec($_, 3, 8) = 128 == 2147483648 00000000000000000000000000000001
Behaves like wait(2) on your system: it waits for a child process to terminate and returns the pid of the deceased process, or -1 if there are no child processes. The status is returned in $? and ${^CHILD_ERROR_NATIVE}.
Note that a return value of -1 could mean that child processes are being automatically reaped, as described in perlipc.
If you use wait in your handler for $SIG{CHLD}, it may accidentally wait for the child created by qx() or system(). See perlipc for details.
Portability issues: wait in perlport.
Waits for a particular child process to terminate and returns the pid of the deceased process, or -1 if there is no such child process. On some systems, a value of 0 indicates that there are processes still running. The status is returned in $? and ${^CHILD_ERROR_NATIVE}. If you pass a PID of -1 and a FLAGS value of WNOHANG (from the POSIX module), you can do a non-blocking wait for all pending zombie processes.
Non-blocking wait is available on machines supporting either the waitpid(2) or wait4(2) syscalls. However, waiting for a particular pid with FLAGS of 0 is implemented everywhere. (Perl emulates the system call by remembering the status values of processes that have exited but have not been harvested by the Perl script yet.)
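The conventional non-blocking reap loop looks like this:

```perl
use POSIX ":sys_wait_h";

my $kid;
do {
    $kid = waitpid(-1, WNOHANG);  # -1: any child; WNOHANG: don't block
} while $kid > 0;
```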
Note that on some systems, a return value of -1 could mean that child processes are being automatically reaped. See perlipc for details, and for other examples.
Portability issues: waitpid in perlport.
Returns true if the context of the currently executing subroutine or
eval is looking for a list value. Returns false if the context is
looking for a scalar. Returns the undefined value if the context is
looking for no value (void context).
wantarray()'s result is unspecified in the top level of a file, in a BEGIN, UNITCHECK, CHECK, INIT, or END block, or in a DESTROY method.
This function should have been named wantlist() instead.
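A small sketch of the three possible contexts (the sub name context is hypothetical):

```perl
sub context {
    return wantarray          ? "list"
         : defined(wantarray) ? "scalar"
         :                      "void";
}

my @l = context();   # returns "list"
my $s = context();   # returns "scalar"
context();           # void context; return value discarded
```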
Prints the value of LIST to STDERR. If the last element of LIST does
not end in a newline, it appends the same file/line number text as die
does.
If the output is empty and $@ already contains a value (typically from a previous eval), that value is used after appending "\t...caught" to $@. This is useful for staying almost, but not entirely, similar to die. If $@ is empty, then the string "Warning: Something's wrong" is used.
No message is printed if there is a $SIG{__WARN__}
handler
installed. It is the handler's responsibility to deal with the message
as it sees fit (like, for instance, converting it into a die). Most
handlers must therefore arrange to actually display the
warnings that they are not prepared to deal with, by calling warn
again in the handler. Note that this is quite safe and will not
produce an endless loop, since __WARN__
hooks are not called from
inside one.
You will find this behavior is slightly different from that of $SIG{__DIE__} handlers (which don't suppress the error text, but can instead call die again to change it).
Using a __WARN__
handler provides a powerful way to silence all
warnings (even the so-called mandatory ones). An example:
- # wipe out *all* compile-time warnings
- BEGIN { $SIG{'__WARN__'} = sub { warn $_[0] if $DOWARN } }
- my $foo = 10;
- my $foo = 20; # no warning about duplicate my $foo,
- # but hey, you asked for it!
- # no compile-time or run-time warnings before here
- $DOWARN = 1;
- # run-time warnings enabled after here
- warn "\$foo is alive and $foo!"; # does show up
See perlvar for details on setting %SIG
entries and for more
examples. See the Carp module for other kinds of warnings using its
carp() and cluck() functions.
Writes a formatted record (possibly multi-line) to the specified FILEHANDLE, using the format associated with that file. By default the format for a file is the one having the same name as the filehandle, but the format for the current output channel (see the select function) may be set explicitly by assigning the name of the format to the $~ variable.
Top of form processing is handled automatically: if there is insufficient room on the current page for the formatted record, the page is advanced by writing a form feed, and a special top-of-page format is used to format the new page header before the record is written. By default, the top-of-page format is the name of the filehandle with "_TOP" appended. This would be a problem with autovivified filehandles, but it may be dynamically set to the format of your choice by assigning the name to the $^ variable while that filehandle is selected. The number of lines remaining on the current page is in variable $-, which can be set to 0 to force a new page.
If FILEHANDLE is unspecified, output goes to the current default output
channel, which starts out as STDOUT but may be changed by the
select operator. If the FILEHANDLE is an EXPR, then the expression
is evaluated and the resulting string is used to look up the name of
the FILEHANDLE at run time. For more on formats, see perlform.
Note that write is not the opposite of read. Unfortunately.
The transliteration operator. Same as tr///. See
Quote and Quote-like Operators in perlop.
These keywords are documented in Special Literals in perldata.
These compile phase keywords are documented in BEGIN, UNITCHECK, CHECK, INIT and END in perlmod.
This method keyword is documented in Destructors in perlobj.
These operators are documented in perlop.
This keyword is documented in Autoloading in perlsub.
These flow-control keywords are documented in Compound Statements in perlsyn.
These flow-control keywords related to the experimental switch feature are documented in Switch Statements in perlsyn.
perlglossary - Perl Glossary
A glossary of terms (technical and otherwise) used in the Perl documentation, derived from the Glossary of Programming Perl, Fourth Edition. Words or phrases in bold are defined elsewhere in this glossary.
Other useful sources include the Unicode Glossary http://unicode.org/glossary/, the Free On-Line Dictionary of Computing http://foldoc.org/, the Jargon File http://catb.org/~esr/jargon/, and Wikipedia http://www.wikipedia.org/.
A method used to indirectly inspect or update an object’s state (its instance variables).
The scalar values that you supply to a function or subroutine when you call it. For instance, when you call power("puff"), the string "puff" is the actual argument. See also argument and formal arguments.
Some languages work directly with the memory addresses of values, but this can be like playing with fire. Perl provides a set of asbestos gloves for handling all memory management. The closest to an address operator in Perl is the backslash operator, but it gives you a hard reference, which is much safer than a memory address.
A well-defined sequence of steps, explained clearly enough that even a computer could do them.
A nickname for something, which behaves in all ways as though you’d used the original name instead of the nickname. Temporary aliases are implicitly created in the loop variable for foreach loops, in the $_ variable for map or grep operators, in $a and $b during sort’s comparison function, and in each element of @_ for the actual arguments of a subroutine call. Permanent aliases are explicitly created in packages by importing symbols or by assignment to typeglobs. Lexically scoped aliases for package variables are explicitly created by the our declaration.
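A minimal sketch of the implicit aliasing described above, using only core Perl:

```perl
use strict;
use warnings;

my @nums = (1, 2, 3);
for my $n (@nums) {
    $n *= 10;            # $n aliases the array element, so this
}                        # modifies @nums itself
print "@nums\n";         # 10 20 30

my @squares = map { $_ * $_ } @nums;   # $_ aliases each element here too
print "@squares\n";      # 100 400 900
```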
The sort of characters we put into words. In Unicode, this is all letters including all ideographs and certain diacritics, letter numbers like Roman numerals, and various combining marks.
A list of possible choices from which you may select only one, as in, “Would you like door A, B, or C?” Alternatives in regular expressions are separated with a single vertical bar: |. Alternatives in normal Perl expressions are separated with a double vertical bar: ||. Logical alternatives in Boolean expressions are separated with either || or or.
Used to describe a referent that is not directly accessible through a named variable. Such a referent must be indirectly accessible through at least one hard reference. When the last hard reference goes away, the anonymous referent is destroyed without pity.
A bigger, fancier sort of program with a fancier name so people don’t realize they are using a program.
The kind of computer you’re working on, where one “kind of computer” means all those computers sharing a compatible machine language. Since Perl programs are (typically) simple text files, not executable images, a Perl program is much less sensitive to the architecture it’s running on than programs in other languages, such as C, that are compiled into machine code. See also platform and operating system.
A piece of data supplied to a program, subroutine, function, or method to tell it what it’s supposed to do. Also called a “parameter”.
The name of the array containing the argument vector from the command line. If you use the empty <> operator, ARGV is the name of both the filehandle used to traverse the arguments and the scalar containing the name of the current input file.
A symbol such as + or / that tells Perl to do the arithmetic you were supposed to learn in grade school.
An ordered sequence of values, stored such that you can easily access any of the values using an integer subscript that specifies the value’s offset in the sequence.
An archaic expression for what is more correctly referred to as list context.
The open source license that Larry Wall created for Perl, maximizing Perl’s usefulness, availability, and modifiability. The current version is 2. (http://www.opensource.org/licenses/artistic-license.php).
The American Standard Code for Information Interchange (a 7-bit character set adequate only for poorly representing English text). Often used loosely to describe the lowest 128 values of the various ISO-8859-X character sets, a bunch of mutually incompatible 8-bit codes best described as half ASCII. See also Unicode.
A component of a regular expression that must be true for the pattern to match but does not necessarily match any characters itself. Often used specifically to mean a zero-width assertion.
An operator whose assigned mission in life is to change the value of a variable.
Either a regular assignment or a compound operator composed of an ordinary assignment and some other operator that changes the value of a variable in place; that is, relative to its old value. For example, $a += 2 adds 2 to $a.
See hash. Please. The term associative array is the old Perl 4 term for a hash. Some languages call it a dictionary.
Determines whether you do the left operator first or the right operator first when you have “A operator B operator C”, and the two operators are of the same precedence. Operators like + are left associative, while operators like ** are right associative. See Camel chapter 3, “Unary and Binary Operators”, for a list of operators and their associativity.
Said of events or activities whose relative temporal ordering is indeterminate because too many things are going on at once. Hence, an asynchronous event is one you didn’t know when to expect.
A regular expression component potentially matching a substring containing one or more characters and treated as an indivisible syntactic unit by any following quantifier. (Contrast with an assertion that matches something of zero width and may not be quantified.)
When Democritus gave the word “atom” to the indivisible bits of matter, he meant literally something that could not be cut: ἀ- (not) + -τομος (cuttable). An atomic operation is an action that can’t be interrupted, not one forbidden in a nuclear-free zone.
A new feature that allows the declaration of variables and subroutines with modifiers, as in sub foo : locked method. Also another name for an instance variable of an object.
A feature of operator overloading of objects, whereby the behavior of certain operators can be reasonably deduced using more fundamental operators. This assumes that the overloaded operators will often have the same relationships as the regular operators. See Camel chapter 13, “Overloading”.
To add one to something automatically, hence the name of the ++ operator. To instead subtract one from something automatically is known as an “autodecrement”.
To load on demand. (Also called “lazy” loading.)
Specifically, to call an AUTOLOAD
subroutine on behalf of an undefined
subroutine.
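A sketch of the AUTOLOAD mechanism just described; the Greeter package and its messages are hypothetical:

```perl
use strict;
use warnings;

package Greeter;
our $AUTOLOAD;
sub AUTOLOAD {
    my ($class) = @_;
    (my $name = $AUTOLOAD) =~ s/.*:://;   # strip the package qualifier
    return if $name eq 'DESTROY';         # don't autoload destructors
    return "Greeter handled '$name'";
}

package main;
# hello() is never defined, so Perl calls Greeter::AUTOLOAD on its behalf:
my $msg = Greeter->hello;
print "$msg\n";   # Greeter handled 'hello'
```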
To split a string automatically, as the -a switch does when running under -p or -n in order to emulate awk. (See also the AutoSplit module, which has nothing to do with the -a switch but a lot to do with autoloading.)
A Graeco-Roman word meaning “to bring oneself to life”.
In Perl, storage locations (lvalues) spontaneously generate themselves as
needed, including the creation of any hard reference values to point to
the next level of storage. The assignment $a[5][5][5][5][5] = "quintet"
potentially creates five scalar storage locations, plus four references (in
the first four scalar locations) pointing to four new anonymous arrays (to
hold the last four scalar locations). But the point of autovivification is
that you don’t have to worry about it.
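A smaller instance of the same spontaneous generation, using only core Perl:

```perl
use strict;
use warnings;

my @a;
$a[2][1][0] = "quintet";    # autovivifies two levels of anonymous arrays
print ref $a[2], "\n";      # ARRAY   (sprang into existence)
print $a[2][1][0], "\n";    # quintet
```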
Short for “array value”, which refers to one of Perl’s internal data types that holds an array. The AV type is a subclass of SV.
Descriptive editing term—short for “awkward”. Also coincidentally refers to a venerable text-processing language from which Perl derived some of its high-level ideas.
A substring captured by a subpattern within unadorned parentheses in a regex. Backslashed decimal numbers (\1, \2, etc.) later in the same pattern refer back to the corresponding subpattern in the current match. Outside the pattern, the numbered variables ($1, $2, etc.) continue to refer to these same values, as long as the pattern was the last successful match of the current dynamic scope.
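A minimal sketch of both uses, inside the pattern (\1) and after it ($1); the string is arbitrary:

```perl
use strict;
use warnings;

my $s = "bonbon";
my $captured;
if ($s =~ /^(bon)\1$/) {    # \1 must match what (bon) just captured
    $captured = $1;         # "bon", still available outside the pattern
}
print "$captured\n";   # bon
```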
The practice of saying, “If I had to do it all over, I’d do it differently,” and then actually going back and doing it all over differently. Mathematically speaking, it’s returning from an unsuccessful recursion on a tree of possibilities. Perl backtracks when it attempts to match patterns with a regular expression, and its earlier attempts don’t pan out. See the section “The Little Engine That /Could(n’t)/” in Camel chapter 5, “Pattern Matching”.
Means you can still run your old program because we didn’t break any of the features or bugs it was relying on.
A word sufficiently ambiguous to be deemed illegal under use strict 'subs'. In the absence of that stricture, a bareword is treated as if quotes were around it.
A generic object type; that is, a class from which other, more specific classes are derived genetically by inheritance. Also called a “superclass” by people who respect their ancestors.
From Swift: someone who eats eggs big end first. Also used of computers that store the most significant byte of a word at a lower byte address than the least significant byte. Often considered superior to little-endian machines. See also little-endian.
Having to do with numbers represented in base 2. That means there’s basically two numbers: 0 and 1. Also used to describe a file of “nontext”, presumably because such a file makes full use of all the binary bits in its bytes. With the advent of Unicode, this distinction, already suspect, loses even more of its meaning.
An operator that takes two operands.
To assign a specific network address to a socket.
An integer in the range from 0 to 1, inclusive. The smallest possible unit of information storage. An eighth of a byte or of a dollar. (The term “Pieces of Eight” comes from being able to split the old Spanish dollar into 8 bits, each of which still counted for money. That’s why a 25-cent piece today is still “two bits”.)
The movement of bits left or right in a computer word, which has the effect of multiplying or dividing by a power of 2.
A sequence of bits that is actually being thought of as a sequence of bits, for once.
In corporate life, to grant official
approval to a thing, as in, “The VP of Engineering has blessed our
WebCruncher project.” Similarly, in Perl, to grant official approval to a
referent so that it can function as an object, such as a WebCruncher
object. See the bless function in Camel chapter 27, “Functions”.
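A sketch of a blessing in code, following the WebCruncher example above (the pages_crunched field is hypothetical):

```perl
use strict;
use warnings;

package WebCruncher;
sub new {
    my ($class) = @_;
    my $self = { pages_crunched => 0 };
    return bless $self, $class;   # $self can now function as an object
}

package main;
my $obj = WebCruncher->new;
print ref $obj, "\n";   # WebCruncher
```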
What a process does when it has to wait for something: “My process blocked waiting for the disk.” As an unrelated noun, it refers to a large chunk of data, of a size that the operating system likes to deal with (normally a power of 2 such as 512 or 8192). Typically refers to a chunk of data that’s coming from or going to a disk file.
A syntactic construct consisting of a sequence of Perl statements that is delimited by braces. The if and while statements are defined in terms of BLOCKs, for instance. Sometimes we also say “block” to mean a lexical scope; that is, a sequence of statements that acts like a BLOCK, such as within an eval or a file, even though the statements aren’t delimited by braces.
A method of making input and output efficient by passing one block at a time. By default, Perl does block buffering to disk files. See buffer and command buffering.
A value that is either true or false.
A special kind of scalar context used in conditionals to decide whether the scalar value returned by an expression is true or false. Does not evaluate as either a string or a number. See context.
A spot in your program where you’ve told the debugger to stop execution so you can poke around and see whether anything is wrong yet.
To send a datagram to multiple destinations simultaneously.
A psychoactive drug, popular in the ’80s, probably developed at UC Berkeley or thereabouts. Similar in many ways to the prescription-only medication called “System V”, but infinitely more useful. (Or, at least, more fun.) The full chemical name is “Berkeley Standard Distribution”.
A location in a hash table containing (potentially) multiple entries whose keys “hash” to the same hash value according to its hash function. (As internal policy, you don’t have to worry about it unless you’re into internals, or policy.)
A temporary holding location for data. Block buffering means that the data is passed on to its destination whenever the buffer is full. Line buffering means that it’s passed on whenever a complete line is received. Command buffering means that it’s passed every time you do a print command (or equivalent). If your output is unbuffered, the system processes it one byte at a time without the use of a holding area. This can be rather inefficient.
A function that is predefined in the language. Even when hidden by overriding, you can always get at a built-in function by qualifying its name with the CORE:: pseudopackage.
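A sketch of hiding a built-in and reaching it anyway via CORE::; the override here is deliberately silly:

```perl
use strict;
use warnings;
use subs 'length';            # predeclare so our sub overrides the built-in

sub length { return "mine" }  # hides the built-in length

my $hidden = length("perl");        # calls our sub: "mine"
my $real   = CORE::length("perl");  # the built-in, despite the override
print "$hidden $real\n";            # mine 4
```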
A group of related modules on CPAN. (Also sometimes refers to a group of command-line switches grouped into one switch cluster.)
A piece of data worth eight bits in most places.
A pidgin-like lingo spoken among ’droids when they don’t wish to reveal their orientation (see endian). Named after some similar languages spoken (for similar reasons) between compilers and interpreters in the late 20ᵗʰ century. These languages are characterized by representing everything as a nonarchitecture-dependent sequence of bytes.
A language beloved by many for its inside-out type definitions, inscrutable precedence rules, and heavy overloading of the function-call mechanism. (Well, actually, people first switched to C because they found lowercase identifiers easier to read than upper.) Perl is written in C, so it’s not surprising that Perl borrowed a few ideas from it.
A data repository. Instead of computing expensive answers several times, compute it once and save the result.
A handler that you register with some other part of your program in the hope that the other part of your program will trigger your handler when some event of interest transpires.
An argument-passing mechanism in which the formal arguments refer directly to the actual arguments, and the subroutine can change the actual arguments by changing the formal arguments. That is, the formal argument is an alias for the actual argument. See also call by value.
An argument-passing mechanism in which the formal arguments refer to a copy of the actual arguments, and the subroutine cannot change the actual arguments by changing the formal arguments. See also call by reference.
Reduced to a standard form to facilitate comparison.
The variables—such as $1 and $2, and %+ and %-—that hold the text remembered in a pattern match. See Camel chapter 5, “Pattern Matching”.
The use of parentheses around a subpattern in a regular expression to store the matched substring as a backreference. (Captured strings are also returned as a list in list context.) See Camel chapter 5, “Pattern Matching”.
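A small sketch of numbered captures, named captures via %+, and the list-context return; the date string is arbitrary:

```perl
use strict;
use warnings;

"2024-05-01" =~ /(?<year>\d{4})-(\d{2})-(\d{2})/ or die "no match";
my ($y, $m) = ($+{year}, $2);   # named capture, then numbered capture
print "$y $m\n";                # 2024 05

# In list context, the match returns the captured substrings:
my @parts = "2024-05-01" =~ /(\d+)-(\d+)-(\d+)/;
print "@parts\n";               # 2024 05 01
```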
Copying and pasting code without understanding it, while superstitiously believing in its value. This term originated from preindustrial cultures dealing with the detritus of explorers and colonizers of technologically advanced cultures. See The Gods Must Be Crazy.
A property of certain characters. Originally, typesetters stored capital letters in the upper of two cases and small letters in the lower one. Unicode recognizes three cases: lowercase (character property \p{lower}), titlecase (\p{title}), and uppercase (\p{upper}). A fourth casemapping called foldcase is not itself a distinct case, but it is used internally to implement casefolding. Not all letters have case, and some nonletters have case.
Comparing or matching a string case-insensitively. In Perl, it is implemented with the /i pattern modifier, the fc function, and the \F double-quote translation escape.
The process of converting a string to one of the four Unicode
casemaps; in Perl, it is implemented with the fc, lc, ucfirst,
and uc functions.
The smallest individual element of a string. Computers store characters as integers, but Perl lets you operate on them as text. The integer used to represent a particular character is called that character’s codepoint.
A square-bracketed list of characters used in a regular expression to indicate that any character of the set may occur at a given point. Loosely, any predefined set of characters so used.
A predefined character class matchable by the \p
or \P
metasymbol. Unicode defines hundreds of standard properties
for every possible codepoint, and Perl defines a few of its own, too.
An operator that surrounds its operand, like the angle operator, or parentheses, or a hug.
A user-defined type, implemented in Perl via a package that provides (either directly or by inheritance) methods (that is, subroutines) to handle instances of the class (its objects). See also inheritance.
A method whose invocant is a package name, not an object reference. A method associated with the class as a whole. Also see instance method.
In networking, a process that initiates contact with a server process in order to exchange data and perhaps receive a service.
An anonymous subroutine that, when a reference to it is generated at runtime, keeps track of the identities of externally visible lexical variables, even after those lexical variables have supposedly gone out of scope. They’re called “closures” because this sort of behavior gives mathematicians a sense of closure.
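A minimal closure sketch: each counter closes over its own $count, which survives the end of make_counter’s scope.

```perl
use strict;
use warnings;

sub make_counter {
    my $count = 0;
    return sub { return ++$count };   # closes over this $count
}

my $c1 = make_counter();
my $c2 = make_counter();              # an independent $count
my @ticks = ($c1->(), $c1->(), $c2->());
print "@ticks\n";   # 1 2 1
```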
A parenthesized subpattern used to group parts of a regular expression into a single atom.
The word returned by the ref
function when you apply it to a reference to a subroutine. See also CV.
A system that writes code for you in a low-level language, such as code to implement the backend of a compiler. See program generator.
The integer a computer uses to represent a given character. ASCII codepoints are in the range 0 to 127; Unicode codepoints are in the range 0 to 0x1F_FFFF; and Perl codepoints are in the range 0 to 2³²−1 or 0 to 2⁶⁴−1, depending on your native integer size. In Perl Culture, sometimes called ordinals.
A regular expression subpattern
whose real purpose is to execute some Perl code—for example, the (?{...})
and (??{...})
subpatterns.
The order into which characters sort. This is used by string comparison routines to decide, for example, where in this glossary to put “collating sequence”.
A person with permissions to index a namespace in PAUSE. Anyone can upload any namespace, but only primary and co-maintainers get their contributions indexed.
Any character with the General Category of Combining Mark (\p{GC=M}), which may be spacing or nonspacing. Some are even invisible. A sequence of combining characters following a grapheme base character together make up a single user-visible character called a grapheme. Most but not all diacritics are combining characters, and vice versa.
In shell programming, the syntactic combination of a program name and its arguments. More loosely, anything you type to a shell (a command interpreter) that starts it doing something. Even more loosely, a Perl statement, which might start with a label and typically ends with a semicolon.
A mechanism in Perl that lets you store up the output of each Perl command and then flush it out as a single request to the operating system. It’s enabled by setting the $| ($AUTOFLUSH) variable to a true value. It’s used when you don’t want data sitting around, not going where it’s supposed to, which may happen because the default on a file or pipe is to use block buffering.
The values you supply along with a program name when you tell a shell to execute a command. These values are passed to a Perl program through @ARGV.
The name of the program currently executing, as typed on the command line. In C, the command name is passed to the program as the first command-line argument. In Perl, it comes in separately as $0.
A remark that doesn’t affect the meaning of the program. In Perl, a comment is introduced by a # character and continues to the end of the line.
The file (or string, in the case of eval) that
is currently being compiled.
The process of turning source code into a machine-usable form. See compile phase.
Any time before Perl starts running your main program. See also run phase. Compile phase is mostly spent in compile time, but may also be spent in runtime when BEGIN blocks, use or no declarations, or constant subexpressions are being evaluated. The startup and import code of any use declaration is also run during compile phase.
Strictly speaking, a program that munches up another program and spits out yet another file containing the program in a “more executable” form, typically containing native machine instructions. The perl program is not a compiler by this definition, but it does contain a kind of compiler that takes a program and turns it into a more executable form (syntax trees) within the perl process itself, which the interpreter then interprets. There are, however, extension modules to get Perl to act more like a “real” compiler. See Camel chapter 16, “Compiling”.
The time when Perl is trying to make sense of your code, as opposed to when it thinks it knows what your code means and is merely trying to do what it thinks your code says to do, which is runtime.
A “constructor” for a referent that isn’t really an object, like an anonymous array or a hash (or a sonata, for that matter). For example, a pair of braces acts as a composer for a hash, and a pair of brackets acts as a composer for an array. See the section “Creating References” in Camel chapter 8, “References”.
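The two composers mentioned above, in a short sketch (the sonata keys are arbitrary):

```perl
use strict;
use warnings;

my $aref = [ 1, 2, 3 ];                            # brackets compose an
                                                   # anonymous array
my $href = { sonata => 'moonlight', opus => 27 };  # braces compose an
                                                   # anonymous hash
print $aref->[1], "\n";        # 2
print $href->{sonata}, "\n";   # moonlight
```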
The process of gluing one cat’s nose to another cat’s tail. Also a similar operation on two strings.
Something “iffy”. See Boolean context.
In telephony, the temporary electrical circuit between the caller’s and the callee’s phone. In networking, the same kind of temporary circuit between a client and a server.
As a noun, a piece of syntax made up of smaller pieces. As a transitive verb, to create an object using a constructor.
Any class method, instance, or subroutine that composes, initializes, blesses, and returns an object. Sometimes we use the term loosely to mean a composer.
The surroundings or environment. The context given by the surrounding code determines what kind of data a particular expression is expected to return. The three primary contexts are list context, scalar, and void context. Scalar context is sometimes subdivided into Boolean context, numeric context, string context, and void context. There’s also a “don’t care” context (which is dealt with in Camel chapter 2, “Bits and Pieces”, if you care).
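A sketch of the same array evaluated in three of those contexts:

```perl
use strict;
use warnings;

my @words = ('camel', 'llama', 'alpaca');
my @copy  = @words;      # list context: the elements themselves
my $count = @words;      # scalar context: 3, the element count
my $some  = @words ? 'yes' : 'no';   # Boolean context: true, nonempty
print "$count $some\n";  # 3 yes
```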
The treatment of more than one physical line as a single logical line. Makefile lines are continued by putting a backslash before the newline. Mail headers, as defined by RFC 822, are continued by putting a space or tab after the newline. In general, lines in Perl do not need any form of continuation mark, because whitespace (including newlines) is gleefully ignored. Usually.
The corpse of a process, in the form of a file left in the working directory of the process, usually as a result of certain kinds of fatal errors.
The Comprehensive Perl Archive Network. (See the Camel Preface and Camel chapter 19, “CPAN” for details.)
The typical C compiler’s first pass, which processes lines beginning with # for conditional compilation and macro definition, and does various manipulations of the program text based on the current definitions. Also known as cpp(1).
Someone who breaks security on computer systems. A cracker may be a true hacker or only a script kiddie.
The last filehandle that was designated with select(FILEHANDLE); STDOUT, if no filehandle has been selected.
The package in which the current statement is compiled. Scan backward in the text of your program through the current lexical scope or any enclosing lexical scopes until you find a package declaration. That’s your current package name.
See working directory.
In academia, a curriculum vitæ, a fancy kind of résumé. In Perl, an internal “code value” typedef holding a subroutine. The CV type is a subclass of SV.
A bare, single statement, without any braces, hanging off an if or while conditional. C allows them. Perl doesn’t.
A packet of data, such as a UDP message, that (from the viewpoint of the programs involved) can be sent independently over the network. (In fact, all packets are sent independently at the IP level, but stream protocols such as TCP hide this from your program.)
How your various pieces of data relate to each other and what shape they make when you put them all together, as in a rectangular table or a triangular tree.
A set of possible values, together with all the operations that know how to deal with those values. For example, a numeric data type has a certain set of numbers that you can work with, as well as various mathematical operations that you can do on the numbers, but would make little sense on, say, a string such as "Kilroy". Strings have their own operations, such as concatenation. Compound types made of a number of smaller pieces generally have operations to compose and decompose them, and perhaps to rearrange them. Objects that model things in the real world often have operations that correspond to real activities. For instance, if you model an elevator, your elevator object might have an open_door method.
Stands for “Database Management” routines, a set of routines that emulate an
associative array using disk files. The routines use a dynamic hashing
scheme to locate any entry with only two disk accesses. DBM files allow a
Perl program to keep a persistent hash across multiple invocations. You
can tie your hash variables to various DBM implementations.
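A sketch of tying a hash to a DBM file, using SDBM_File (which ships with Perl); the file name and key are hypothetical, and a temporary directory stands in for a real data file:

```perl
use strict;
use warnings;
use Fcntl;
use SDBM_File;
use File::Temp qw(tempdir);

my $dir = tempdir(CLEANUP => 1);
tie my %db, 'SDBM_File', "$dir/cache", O_RDWR | O_CREAT, 0666
    or die "tie: $!";
$db{answer} = 42;        # stored on disk, not just in memory
untie %db;

# A later invocation (or here, a second tie) sees the same data:
tie my %again, 'SDBM_File', "$dir/cache", O_RDONLY, 0666
    or die "tie: $!";
my $back = $again{answer};
untie %again;
print "$back\n";   # 42
```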
An assertion that states something exists and perhaps describes what it’s like, without giving any commitment as to how or where you’ll use it. A declaration is like the part of your recipe that says, “two cups flour, one large egg, four or five tadpoles…” See statement for its opposite. Note that some declarations also function as statements. Subroutine declarations also act as definitions if a body is supplied.
Something that tells your program what sort of variable
you’d like. Perl doesn’t require you to declare variables, but you can use
my, our, or state to denote that you want something other than
the default.
To subtract a value from a variable, as in “decrement $x” (meaning to remove 1 from its value) or “decrement $x by 3”.
A value chosen for you if you don’t supply a value of your own.
Having a meaning. Perl thinks that some of the things
people try to do are devoid of meaning; in particular, making use of
variables that have never been given a value and performing certain
operations on data that isn’t there. For example, if you try to read data
past the end of a file, Perl will hand you back an undefined value. See also
false and the defined entry in Camel chapter 27, “Functions”.
A character or string that sets bounds to an arbitrarily sized textual object, not to be confused with a separator or terminator. “To delimit” really just means “to surround” or “to enclose” (like these parentheses are doing).
A fancy computer science term meaning “to follow a reference to what it points to”. The “de” part of it refers to the fact that you’re taking away one level of indirection.
A class that defines some of its methods in terms of a more generic class, called a base class. Note that classes aren’t classified exclusively into base classes or derived classes: a class can function as both a derived class and a base class simultaneously, which is kind of classy.
See file descriptor.
To deallocate the memory of a referent (first triggering its DESTROY method, if it has one).
A special method that is called when an object is thinking about destroying itself. A Perl program’s DESTROY method doesn’t do the actual destruction; Perl just triggers the method in case the class wants to do any associated cleanup.
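A sketch of the trigger just described; the Resource class and its cleanup counter are hypothetical:

```perl
use strict;
use warnings;

our $cleanups = 0;

package Resource;
sub new     { return bless {}, shift }
sub DESTROY { $main::cleanups++ }   # associated cleanup, not destruction

package main;
{
    my $r = Resource->new;
}   # $r's last reference goes away here, so DESTROY fires
print "$cleanups\n";   # 1
```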
A whiz-bang hardware gizmo (like a disk or tape drive or a modem or a joystick or a mouse) attached to your computer, which the operating system tries to make look like a file (or a bunch of files). Under Unix, these fake files tend to live in the /dev directory.
A pod directive. See Camel chapter 23, “Plain Old Documentation”.
A special file that contains other files. Some operating systems call these “folders”, “drawers”, “catalogues”, or “catalogs”.
A name that represents a particular instance of opening a
directory to read it, until you close it. See the opendir function.
Some people need this and some people avoid it. For Perl, it’s an old way to say I/O layer.
To send something to its correct destination. Often used metaphorically to indicate a transfer of programmatic control to a destination selected algorithmically, often by lookup in a table of function references or, in the case of object methods, by traversing the inheritance tree looking for the most specific definition for the method.
A standard, bundled release of a system of software. The default usage implies source code is included. If that is not the case, it will be called a “binary-only” distribution.
Some modules live both in the Standard Library and on CPAN. These modules might be developed on two tracks as people modify either version. The trend currently is to untangle these situations.
An enchantment, illusion, phantasm, or jugglery. Said when Perl’s magical dwimmer effects don’t do what you expect, but rather seem to be the product of arcane dweomercraft, sorcery, or wonder working. [From Middle English.]
DWIM is an acronym for “Do What I Mean”, the principle that something should just do what you want it to do without an undue amount of fuss. A bit of code that does “dwimming” is a “dwimmer”. Dwimming can require a great deal of behind-the-scenes magic, which (if it doesn’t stay properly behind the scenes) is called a dweomer instead.
Dynamic scoping works over a dynamic
scope, making variables visible throughout the rest of the block in
which they are first used and in any subroutines that are called by the
rest of the block. Dynamically scoped variables can have their values
temporarily changed (and implicitly restored later) by a local operator.
(Compare lexical scoping.) Used more loosely to mean how a subroutine
that is in the middle of calling another subroutine “contains” that
subroutine at runtime.
Derived from many sources. Some would say too many.
A basic building block. When you’re talking about an array, it’s one of the items that make up the array.
When something is contained in something else, particularly when that might be considered surprising: “I’ve embedded a complete Perl interpreter in my editor!”
The notion that an empty derived class should behave exactly like its base class.
The veil of abstraction separating the interface from the implementation (whether enforced or not), which mandates that all access to an object’s state be through methods alone.
See little-endian and big-endian.
When you change a value as it is being copied. [From French “in passing”, as in the exotic pawn-capturing maneuver in chess.]
The collective set of environment variables your process inherits from its parent. Accessed via %ENV.
A mechanism by which some high-level agent such as a user can pass its preferences down to its future offspring (child processes, grandchild processes, great-grandchild processes, and so on). Each environment variable is a key/value pair, like one entry in a hash.
End of File. Sometimes used metaphorically as the terminating string of a here document.
The error number returned by a syscall when it fails. Perl refers to the error by the name $! (or $OS_ERROR if you use the English module).
See exception or fatal error.
See metasymbol.
A fancy term for an error. See fatal error.
The way a program responds to an error. The
exception-handling mechanism in Perl is the eval operator.
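A minimal sketch of eval as the exception-handling mechanism; the exception text is arbitrary:

```perl
use strict;
use warnings;

my $result = eval {
    die "boom\n";    # throw an exception
    42;              # never reached
};
my $error = $@;      # eval put the exception text here
print defined $result ? "ok\n" : "caught: $error";   # caught: boom
```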
To throw away the current process’s program and replace it with another, without exiting the process or relinquishing any resources held (apart from the old memory image).
A file that is specially marked to tell the operating system that it’s okay to run this file as a program. Usually shortened to “executable”.
To run a program or subroutine. (Has nothing to do
with the kill built-in, unless you’re trying to run a signal handler.)
The special mark that tells the operating system it can run this program. There are actually three execute bits under Unix, and which bit gets used depends on whether you own the file singularly, collectively, or not at all.
See status.
Used as a noun in this case, this refers to a known way to compromise a program to get it to do something the author didn’t intend. Your task is to write unexploitable programs.
To make symbols from a module available for import by other modules.
Anything you can legally say in a spot where a value is required. Typically composed of literals, variables, operators, functions, and subroutine calls, not necessarily in that order.
A Perl module that also pulls in compiled C or C++ code. More generally, any experimental option that can be compiled into Perl, such as multithreading.
In Perl, any value that would look like "" or "0" if evaluated in a string context. Since undefined values evaluate to "", all undefined values are false, but not all false values are undefined.
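A quick sanity check of which values are false and which merely look false:

```perl
# All of these are false: they stringify to "" or "0".
for my $v (0, 0.0, "", "0", undef) {
    die "unexpectedly true" if $v;
}
# But these are true, zeroish appearances notwithstanding:
for my $v ("0.0", "00", " ", "0E0") {
    die "unexpectedly false" unless $v;
}
print "all as expected\n";
```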
Frequently Asked Question (although not necessarily frequently answered, especially if the answer appears in the Perl FAQ shipped standard with Perl).
An uncaught exception, which causes termination of the process after printing a message on your standard error stream. Errors that happen inside an eval are not fatal. Instead, the eval terminates after placing the exception message in the $@ ($EVAL_ERROR) variable. You can try to provoke a fatal error with the die operator (known as throwing or raising an exception), but this may be caught by a dynamically enclosing eval. If not caught, the die becomes a fatal error.
A spoonerism of “creeping featurism”, noting the biological urge to add just one more feature to a program.
A single piece of numeric or string data that is part of a
longer string, record, or line. Variable-width fields are usually
split up by separators (so use split to extract the fields), while
fixed-width fields are usually at fixed positions (so use unpack).
Instance variables are also known as “fields”.
First In, First Out. See also LIFO. Also a nickname for a named pipe.
A named collection of data, usually stored on disk in a directory in a filesystem. Roughly like a document, if you’re into office metaphors. In modern filesystems, you can actually give a file more than one name. Some files have special properties, like directories and devices.
The little number the operating system uses to keep track of which opened file you’re talking about. Perl hides the file descriptor inside a standard I/O stream and then attaches the stream to a filehandle.
A “wildcard” match on filenames. See the glob function.
An identifier (not necessarily related to the real name of a file) that represents a particular instance of opening a file, until you close it. If you’re going to open and close several different files in succession, it’s fine to open each of them with the same filehandle, so you don’t have to write out separate code to process each file.
One name for a file. This name is listed in a
directory. You can use it in an open to tell the operating system
exactly which file you want to open, and associate the file with a
filehandle, which will carry the subsequent identity of that file in
your program, until you close it.
A set of directories and files residing on a partition of the disk. Sometimes known as a “partition”. You can change the file’s name or even move a file around from directory to directory within a filesystem without actually moving the file itself, at least under Unix.
A built-in unary operator that you use to determine whether something is true about a file, such as -o $filename to test whether you’re the owner of the file.
A program designed to take a stream of input and transform it into a stream of output.
The first PAUSE author to upload a namespace automatically becomes the primary maintainer for that namespace. The “first come” permissions distinguish a primary maintainer who was assigned that role from one who received it automatically.
We tend to avoid this term because it means so many things. It may mean a command-line switch that takes no argument itself (such as Perl’s -n and -p flags) or, less frequently, a single-bit indicator (such as the O_CREAT and O_EXCL flags used in sysopen). Sometimes informally used to refer to certain regex modifiers.
A method of storing numbers in “scientific notation”, such that the precision of the number is independent of its magnitude (the decimal point “floats”). Perl does its numeric work with floating-point numbers (sometimes called “floats”) when it can’t get away with using integers. Floating-point numbers are mere approximations of real numbers.
The act of emptying a buffer, often before it’s full.
Far More Than Everything You Ever Wanted To Know. An exhaustive treatise on one narrow topic, something of a super-FAQ. See Tom for far more.
The casemap used in Unicode when comparing or matching without regard to case. Comparing lower-, title-, or uppercase are all unreliable due to Unicode’s complex, one-to-many case mappings. Foldcase is a lowercase variant (using a partially decomposed normalization form for certain codepoints) created specifically to resolve this.
To create a child process identical to the parent process at its moment of conception, at least until it gets ideas of its own. A thread with protected memory.
The generic names by which a subroutine knows its arguments. In many languages, formal arguments are always given individual names; in Perl, the formal arguments are just the elements of an array. The formal arguments to a Perl program are $ARGV[0], $ARGV[1], and so on. Similarly, the formal arguments to a Perl subroutine are $_[0], $_[1], and so on. You may give the arguments individual names by assigning the values to a my list. See also actual arguments.
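Both styles side by side, in a small sketch:

```perl
sub max {
    my ($x, $y) = @_;          # give the formal arguments individual names
    return $x > $y ? $x : $y;  # (could also have said $_[0] and $_[1])
}
print max(3, 7), "\n";         # prints 7
```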
A specification of how many spaces and digits and things to put somewhere so that whatever you’re printing comes out nice and pretty.
Means you don’t have to pay money to get it, but the copyright on it may still belong to someone else (like Larry).
Means you’re not in legal trouble if you give a bootleg copy of it to your friends and we find out about it. In fact, we’d rather you gave a copy to all your friends.
Historically, any software that you give away, particularly if you make the source code available as well. Now often called open source software. Recently there has been a trend to use the term in contradistinction to open source software, to refer only to free software released under the Free Software Foundation’s GPL (General Public License), but this is difficult to justify etymologically.
Mathematically, a mapping of each of a set of input values to a particular output value. In computers, refers to a subroutine or operator that returns a value. It may or may not have input values (called arguments).
Someone like Larry, or one of his peculiar friends. Also refers to the strange prefixes that Perl requires as noun markers on its variables.
A misnamed feature—it should be called, “expecting your mother to pick up after you”. Strictly speaking, Perl doesn’t do this, but it relies on a reference-counting mechanism to keep things tidy. However, we rarely speak strictly and will often refer to the reference-counting scheme as a form of garbage collection. (If it’s any comfort, when your interpreter exits, a “real” garbage collector runs to make sure everything is cleaned up if you’ve been messy with circular references and such.)
Group ID—in Unix, the numeric group ID that the operating system uses to identify you and members of your group.
Strictly, the shell’s * character, which will match a “glob” of characters when you’re trying to generate a list of filenames. Loosely, the act of using globs and similar symbols to do pattern matching. See also fileglob and typeglob.
Something you can see from anywhere, usually used of
variables and subroutines that are visible everywhere in your
program. In Perl, only certain special variables are truly global—most
variables (and all subroutines) exist only in the current package.
Global variables can be declared with our. See “Global Declarations” in
Camel chapter 4, “Statements and Declarations”.
The garbage collection of globals (and the running of any associated object destructors) that takes place when a Perl interpreter is being shut down. Global destruction should not be confused with the Apocalypse, except perhaps when it should.
A language such as Perl that is good at hooking things together that weren’t intended to be hooked together.
The size of the pieces you’re dealing with, mentally speaking.
A graphene is an allotrope of carbon arranged in a hexagonal crystal lattice one atom thick. A grapheme, or more fully, a grapheme cluster string is a single user-visible character, which may in turn be several characters (codepoints) long. For example, a carriage return plus a line feed is a single grapheme but two characters, while a “ȫ” is a single grapheme but one, two, or even three characters, depending on normalization.
A subpattern whose quantifier wants to match as many things as possible.
Originally from the old Unix editor command for “Globally
search for a Regular Expression and Print it”, now used in the general
sense of any kind of search, especially text searches. Perl has a built-in
grep function that searches a list for elements matching any given
criterion, whereas the grep(1) program searches for lines matching a
regular expression in one or more files.
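Unlike grep(1), Perl’s grep filters a list by any criterion, not just a regular expression:

```perl
my @evens = grep { $_ % 2 == 0 } 1 .. 10;            # (2, 4, 6, 8, 10)
my @words = grep { /^p/ } qw(perl python awk sed);   # ("perl", "python")
print "@evens\n@words\n";
```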
A set of users of which you are a member. In some operating systems (like Unix), you can give certain file access permissions to other members of your group.
An internal “glob value” typedef, holding a typeglob. The GV type is a subclass of SV.
Someone who is brilliantly persistent in solving technical problems, whether these involve golfing, fighting orcs, or programming. Hacker is a neutral term, morally speaking. Good hackers are not to be confused with evil crackers or clueless script kiddies. If you confuse them, we will presume that you are either evil or clueless.
A subroutine or method that Perl calls when your program needs to respond to some internal event, such as a signal, or an encounter with an operator subject to operator overloading. See also callback.
A scalar value containing the actual address of a referent, such that the referent’s reference count accounts for it. (Some hard references are held internally, such as the implicit reference from one of a typeglob’s variable slots to its corresponding referent.) A hard reference is different from a symbolic reference.
An unordered association of key/value pairs, stored such that you can easily use a string key to look up its associated data value. This glossary is like a hash, where the word to be defined is the key and the definition is the value. A hash is also sometimes septisyllabically called an “associative array”, which is a pretty good reason for simply calling it a “hash” instead.
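This glossary-as-a-hash notion, in code:

```perl
# The term is the key; the definition is the value.
my %glossary = (
    hash => "an unordered association of key/value pairs",
    key  => "the string index to a hash",
);
print $glossary{key}, "\n";   # look up a value by its string key
```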
A data structure used internally by Perl for implementing associative arrays (hashes) efficiently. See also bucket.
A file containing certain required
definitions that you must include “ahead” of the rest of your program to do
certain obscure operations. A C header file has a .h extension. Perl
doesn’t really have header files, though historically Perl has sometimes
used translated .h files with a .ph extension. See require in
Camel chapter 27, “Functions”. (Header files have been superseded by the
module mechanism.)
So called because of a similar construct in shells that pretends that the lines following the command are a separate file to be fed to the command, up to some terminating string. In Perl, however, it’s just a fancy form of quoting.
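That fancy quoting looks like this:

```perl
my $name   = "Gandalf";
my $letter = <<"END";    # a double-quoted here document, terminated by END
Dear $name,
You shall not parse.
END
print $letter;
```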
A number in base 16, “hex” for short. The digits for 10 through 15 are customarily represented by the letters a through f. Hexadecimal constants in Perl start with 0x. See also the hex function in Camel chapter 27, “Functions”.
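Going both directions between hex and decimal:

```perl
print 0xff, "\n";          # hexadecimal constant: prints 255
print hex("1a"), "\n";     # hex() converts a hex string: prints 26
printf "%x\n", 255;        # %x goes the other way: prints ff
```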
The directory you are put into when you log in. On a Unix system, the name is often placed into $ENV{HOME} or $ENV{LOGDIR} by login, but you can also find it with (getpwuid($<))[7]. (Some platforms do not have a concept of a home directory.)
The computer on which a program or other data resides.
Excessive pride, the sort of thing for which Zeus zaps you. Also the quality that makes you write (and maintain) programs that other people won’t want to say bad things about. Hence, the third great virtue of a programmer. See also laziness and impatience.
Short for a “hash value” typedef, which holds Perl’s internal representation of a hash. The HV type is a subclass of SV.
A legally formed name for most anything in which a computer program might be interested. Many languages (including Perl) allow identifiers to start with an alphabetic character, and then contain alphabetics and digits. Perl also allows connector punctuation like the underscore character wherever it allows alphabetics. (Perl also has more complicated names, like qualified names.)
The anger you feel when the computer is being lazy. This makes you write programs that don’t just react to your needs, but actually anticipate them. Or at least that pretend to. Hence, the second great virtue of a programmer. See also laziness and hubris.
How a piece of code actually goes about doing its job. Users of the code should not count on implementation details staying the same unless they are part of the published interface.
To gain access to symbols that are exported from another
module. See use in Camel chapter 27, “Functions”.
To increase the value of something by 1 (or by some other number, if so specified).
In olden days, the act of looking up a key in an
actual index (such as a phone book). But now it's merely the act of using
any kind of key or position to find the corresponding value, even if no
index is involved. Things have degenerated to the point that Perl’s
index function merely locates the position (index) of one string in
another.
An expression that evaluates to something that can be used as a filehandle: a string (filehandle name), a typeglob, a typeglob reference, or a low-level IO object.
If something in a program isn’t the value you’re looking for but indicates where the value is, that’s indirection. This can be done with either symbolic references or hard references.
In English grammar, a short noun phrase between a verb and its direct object indicating the beneficiary or recipient of the action. In Perl, print STDOUT "$foo\n"; can be understood as “verb indirect-object object”, where STDOUT is the recipient of the print action, and "$foo" is the object being printed. Similarly, when invoking a method, you might place the invocant in the dative slot between the method and its arguments:
- $gollum = new Pathetic::Creature "Sméagol";
- give $gollum "Fisssssh!";
- give $gollum "Precious!";
The syntactic position falling between a method call and its arguments when using the indirect object invocation syntax. (The slot is distinguished by the absence of a comma between it and the next argument.) STDERR is in the indirect object slot here:
- print STDERR "Awake! Awake! Fear, Fire, Foes! Awake!\n";
An operator that comes in between its operands, such as multiplication in 24 * 7.
What you get from your ancestors, genetically or otherwise. If you happen to be a class, your ancestors are called base classes and your descendants are called derived classes. See single inheritance and multiple inheritance.
Short for “an instance of a class”, meaning an object of that class.
See instance variable.
A method of an object, as opposed to a class method.
A method whose invocant is an object, not a package name. Every object of a class shares all the methods of that class, so an instance method applies to all instances of the class, rather than applying to a particular instance. Also see class method.
An attribute of an object; data stored with the particular object rather than with the class as a whole.
A number with no fractional (decimal) part. A counting number, like 1, 2, 3, and so on, but including 0 and the negatives.
The services a piece of code promises to provide forever, in contrast to its implementation, which it should feel free to change whenever it likes.
The insertion of a scalar or list value somewhere in the middle of another value, such that it appears to have been there all along. In Perl, variable interpolation happens in double-quoted strings and patterns, and list interpolation occurs when constructing the list of values to pass to a list operator or other such construct that takes a LIST.
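Both kinds of interpolation at once:

```perl
my $name  = "Amelia";
my @humps = ("one", "two");
my $greeting = "Hi, $name: @humps humps";  # variable interpolation
my @all      = (0, @humps, 3);             # list interpolation: 4 elements
print "$greeting\n";
```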
Strictly speaking, a program that reads a second program and does what the second program says directly without turning the program into a different form first, which is what compilers do. Perl is not an interpreter by this definition, because it contains a kind of compiler that takes a program and turns it into a more executable form (syntax trees) within the perl process itself, which the Perl runtime system then interprets.
The agent on whose behalf a method is invoked. In a class method, the invocant is a package name. In an instance method, the invocant is an object reference.
The act of calling up a deity, daemon, program, method, subroutine, or function to get it to do what you think it’s supposed to do. We usually “call” subroutines but “invoke” methods, since it sounds cooler.
Input from, or output to, a file or device.
An internal I/O object. Can also mean indirect object.
One of the filters between the data and what you get as input or what you end up with as output.
India Pale Ale. Also the International Phonetic Alphabet, the standard alphabet used for phonetic notation worldwide. Draws heavily on Unicode, including many combining characters.
Internet Protocol, or Intellectual Property.
Interprocess Communication.
A relationship between two objects in which one object is considered to be a more specific version of the other, generic object: “A camel is a mammal.” Since the generic object really only exists in a Platonic sense, we usually add a little abstraction to the notion of objects and think of the relationship as being between a generic base class and a specific derived class. Oddly enough, Platonic classes don’t always have Platonic relationships—see inheritance.
Doing something repeatedly.
A special programming gizmo that keeps track of where you are in something that you’re trying to iterate over. The foreach loop in Perl contains an iterator; so does a hash, allowing you to each through it.
The integer four, not to be confused with six, Tom’s favorite editor. IV also means an internal Integer Value of the type a scalar can hold, not to be confused with an NV.
“Just Another Perl Hacker”, a clever but cryptic bit of Perl code that, when executed, evaluates to that string. Often used to illustrate a particular Perl feature, and something of an ongoing Obfuscated Perl Contest seen in USENET signatures.
The string index to a hash, used to look up the value associated with that key.
See reserved words.
A name you give to a statement so that you can talk about that statement elsewhere in the program.
The quality that makes you go to great effort to reduce overall energy expenditure. It makes you write labor-saving programs that other people will find useful, and then document what you wrote so you don’t have to answer so many questions about it. Hence, the first great virtue of a programmer. Also hence, this book. See also impatience and hubris.
The preference of the regular expression engine to match the leftmost occurrence of a pattern, then given a position at which a match will occur, the preference for the longest match (presuming the use of a greedy quantifier). See Camel chapter 5, “Pattern Matching” for much more on this subject.
A bit shift that multiplies the number by some power of 2.
Fancy term for a token.
Fancy term for a tokener.
Fancy term for tokenizing.
Looking at your Oxford English Dictionary through a microscope. (Also known as static scoping, because dictionaries don’t change very fast.) Similarly, looking at variables stored in a private dictionary (namespace) for each scope, which are visible only from their point of declaration down to the end of the lexical scope in which they are declared. —Syn. static scoping. —Ant. dynamic scoping.
A variable subject to
lexical scoping, declared by my. Often just called a “lexical”. (The
our declaration declares a lexically scoped name for a global variable,
which is not itself a lexical variable.)
Generally, a collection of procedures. In ancient days, referred to a collection of subroutines in a .pl file. In modern times, refers more often to the entire collection of Perl modules on your system.
Last In, First Out. See also FIFO. A LIFO is usually called a stack.
In Unix, a sequence of zero or more nonnewline characters terminated with a newline character. On non-Unix machines, this is emulated by the C library even if the underlying operating system has different ideas.
A grapheme consisting of either a carriage return followed by a line feed or any character with the Unicode Vertical Space character property.
Used by a standard I/O output stream that flushes its buffer after every newline. Many standard I/O libraries automatically set up line buffering on output that is going to the terminal.
The number of lines read previous to this one, plus 1. Perl keeps a separate line number for each source or input file it opens. The current source file’s line number is represented by __LINE__. The current input line number (for the file that was most recently read via <FH>) is represented by the $. ($INPUT_LINE_NUMBER) variable. Many error messages report both values, if available.
Used as a noun, a name in a directory that represents a file. A given file can have multiple links to it. It’s like having the same phone number listed in the phone directory under different names. As a verb, to resolve a partially compiled file’s unresolved symbols into a (nearly) executable image. Linking can generally be static or dynamic, which has nothing to do with static or dynamic scoping.
A syntactic construct representing a comma-separated list of expressions, evaluated to produce a list value. Each expression in a LIST is evaluated in list context and interpolated into the list value.
An ordered set of scalar values.
The situation in which an expression is expected by its surroundings (the code calling it) to return a list of values rather than a single value. Functions that want a LIST of arguments tell those arguments that they should produce a list value. See also context.
An operator that does something with a list of
values, such as join or grep. Usually used for named built-in
operators (such as print, unlink, and system) that do not require
parentheses around their argument list.
An unnamed list of temporary scalar values that may be passed around within a program from any list-generating function to any function or construct that provides a list context.
A token in a programming language, such as a number or string, that gives you an actual value instead of merely representing possible values as a variable does.
From Swift: someone who eats eggs little end first. Also used of computers that store the least significant byte of a word at a lower byte address than the most significant byte. Often considered superior to big-endian machines. See also big-endian.
Not meaning the same thing everywhere. A global
variable in Perl can be localized inside a dynamic scope via the
local operator.
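Dynamic scoping via local, in miniature:

```perl
our $scale = 1;
sub scaled { return $scale * $_[0] }

sub doubled {
    local $scale = 2;       # temporarily change the global...
    return scaled($_[0]);   # ...and scaled() sees the new value
}
print doubled(21), "\n";    # prints 42
print scaled(21), "\n";     # $scale restored on return: prints 21
```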
Symbols representing the concepts “and”, “or”, “xor”, and “not”.
An assertion that peeks at the string to the right of the current match location.
An assertion that peeks at the string to the left of the current match location.
A construct that performs something repeatedly, like a roller coaster.
Any statement within the body of a loop that can make a loop prematurely stop looping or skip an iteration. Generally, you shouldn’t try this on roller coasters.
A kind of key or name attached to a loop (or roller coaster) so that loop control statements can talk about which loop they want to control.
In Unicode, not just characters with the General Category of Lowercase Letter, but any character with the Lowercase property, including Modifier Letters, Letter Numbers, some Other Symbols, and one Combining Mark.
Able to serve as an lvalue.
Term used by language lawyers for a storage location you can assign a new value to, such as a variable or an element of an array. The “l” is short for “left”, as in the left side of an assignment, a typical place for lvalues. An lvaluable function or expression is one to which a value may be assigned, as in pos($x) = 10.
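An lvaluable function in action:

```perl
my $x = "hello, world";
substr($x, 0, 5) = "howdy";   # substr() can serve as an lvalue
print "$x\n";                  # prints "howdy, world"
```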
An adjectival pseudofunction that
warps the meaning of an lvalue in some declarative fashion. Currently
there are three lvalue modifiers: my, our, and local.
Technically speaking, any extra semantics attached to a variable such as $!, $0, %ENV, or %SIG, or to any tied variable. Magical things happen when you diddle those variables.
An increment operator that knows how to bump up ASCII alphabetics as well as numbers.
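It carries like an odometer, letters and all:

```perl
my $id = "aa9";
$id++;                 # 9 rolls over to 0, carrying into the letters
print "$id\n";         # prints "ab0"
my $tag = "Az";
$tag++;
print "$tag\n";        # prints "Ba"
```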
Special variables that have side effects when you access them or assign to them. For example, in Perl, changing elements of the %ENV hash also changes the corresponding environment variables that subprocesses will use. Reading the $! variable gives you the current system error number or message.
A file that controls the compilation of a program. Perl programs don’t usually need a Makefile because the Perl compiler has plenty of self-control.
The Unix program that displays online documentation (manual pages) for you.
A “page” from the manuals, typically accessed via the man(1) command. A manpage contains a SYNOPSIS, a DESCRIPTION, a list of BUGS, and so on, and is typically longer than a page. There are manpages documenting commands, syscalls, library functions, devices, protocols, files, and such. In this book, we call any piece of standard Perl documentation (like perlop or perldelta) a manpage, no matter what format it’s installed in on your system.
See pattern matching.
See instance variable.
This always means your main memory, not your disk. Clouding the issue is the fact that your machine may implement virtual memory; that is, it will pretend that it has more memory than it really does, and it’ll use disk space to hold inactive bits. This can make it seem like you have a little more memory than you really do, but it’s not a substitute for real memory. The best thing that can be said about virtual memory is that it lets your performance degrade gradually rather than suddenly when you run out of real memory. But your program can die when you run out of virtual memory, too—if you haven’t thrashed your disk to death first.
A character that is not supposed to be treated normally. Which characters are to be treated specially as metacharacters varies greatly from context to context. Your shell will have certain metacharacters, double-quoted Perl strings have other metacharacters, and regular expression patterns have all the double-quote metacharacters plus some extra ones of their own.
Something we’d call a metacharacter except that it’s a sequence of more than one character. Generally, the first character in the sequence must be a true metacharacter to get the other characters in the metasymbol to misbehave along with it.
A kind of action that an object can take if you tell it to. See Camel chapter 12, “Objects”.
The path Perl takes through @INC. By default, this is a double depth-first search, once looking for defined methods and once for AUTOLOAD. However, Perl lets you configure this with mro.
A CPAN mirror that includes just the latest versions for each distribution, probably created with CPAN::Mini. See Camel chapter 19, “CPAN”.
The belief that “small is beautiful”. Paradoxically, if you say something in a small language, it turns out big, and if you say it in a big language, it turns out small. Go figure.
In the context of the stat(2) syscall, refers to the field holding the permission bits and the type of the file.
See statement modifier, regular expression, and lvalue, not necessarily in that order.
A file that defines a package of (almost) the same
name, which can either export symbols or function as an object class.
(A module’s main .pm file may also load in other files in support of the
module.) See the use built-in.
An integer divisor when you’re interested in the remainder instead of the quotient.
When you speak one language and the computer thinks you’re speaking another. You’ll see odd translations when you send UTF‑8, for instance, but the computer thinks you sent Latin-1, showing all sorts of weird characters instead. The term is written 「文字化け」 in Japanese and means “character rot”, an apt description. Pronounced [modʑibake] in standard IPA phonetics, or approximately “moh-jee-bah-keh”.
Short for one member of Perl mongers, a purveyor of Perl.
A temporary value scheduled to die when the current statement finishes.
See method resolution order.
An array with multiple subscripts for finding a single element. Perl implements these using references—see Camel chapter 9, “Data Structures”.
The features you got from your mother and father, mixed together unpredictably. (See also inheritance and single inheritance.) In computer languages (including Perl), it is the notion that a given class may have multiple direct ancestors or base classes.
A pipe with a name embedded in the filesystem so that it can be accessed by two unrelated processes.
A domain of names. You needn’t worry about whether the names in one such domain have been used in another. See package.
Not a number. The value Perl uses for certain invalid or inexpressible floating-point operations.
The most important attribute of a socket, like your telephone’s telephone number. Typically an IP address. See also port.
A single character that represents the end of a line, with the ASCII value of 012 octal under Unix (but 015 on a Mac), and represented by \n in Perl strings. For Windows machines writing text files, and for certain physical devices like terminals, the single newline gets automatically translated by your C library into a carriage return and a line feed, but normally, no translation is done.
Network File System, which allows you to mount a remote filesystem as if it were local.
Converting a text string into an alternate but equivalent canonical (or compatible) representation that can then be compared for equivalence. Unicode recognizes four different normalization forms: NFD, NFC, NFKD, and NFKC.
A character with the numeric value of zero. It’s used by C to terminate strings, but Perl allows strings to contain a null.
A list value with zero elements, represented in Perl by ().
A string containing no characters, not to be confused with a string containing a null character, which has a positive length and is true.
The situation in which an expression is expected by its surroundings (the code calling it) to return a number. See also context and string context.
(Sometimes spelled nummification and nummify.) Perl lingo for implicit conversion into a number; the related verb is numify. Numification is intended to rhyme with mummification, and numify with mummify. It is unrelated to English numen, numina, numinous. We originally forgot the extra m a long time ago, and some people got used to our funny spelling, and so just as with HTTP_REFERER’s own missing letter, our weird spelling has stuck around.
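Numification in action:

```perl
print "3 apples" + 2, "\n";    # "3 apples" numifies to 3: prints 5
print "0x10" + 0, "\n";        # no hex magic in numification: prints 0
```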
Short for Nevada, no part of which will ever be confused with civilization. NV also means an internal floating-point Numeric Value of the type a scalar can hold, not to be confused with an IV.
Half a byte, equivalent to one hexadecimal digit, and worth four bits.
An instance of a class. Something that “knows” what user-defined type (class) it is, and what it can do because of what class it is. Your program can request an object to do things, but the object gets to decide whether it wants to do them or not. Some objects are more accommodating than others.
A number in base 8. Only the digits 0 through 7 are allowed. Octal
constants in Perl start with 0, as in 013. See also the oct function.
How many things you have to skip over when moving from the beginning of a string or array to a specific position within it. Thus, the minimum offset is zero, not one, because you don’t skip anything to get to the first item.
An entire computer program crammed into one line of text.
Programs for which the source code is freely available and freely redistributable, with no commercial strings attached. For a more detailed definition, see http://www.opensource.org/osd.html.
An expression that yields a value that an operator operates on. See also precedence.
A special program that runs on the bare machine and hides the gory details of managing processes and devices. Usually used in a looser sense to indicate a particular culture of programming. The loose sense can be used at varying levels of specificity. At one extreme, you might say that all versions of Unix and Unix-lookalikes are the same operating system (upsetting many people, especially lawyers and other advocates). At the other extreme, you could say this particular version of this particular vendor’s operating system is different from any other version of this or any other vendor’s operating system. Perl is much more portable across operating systems than many other languages. See also architecture and platform.
A gizmo that transforms some number of input values to some number of output values, often built into a language with a special syntax or symbol. A given operator may have specific expectations about what types of data you give as its arguments (operands) and what type of data you want back from it.
A kind of overloading that you can do on built-in operators to make them work on objects as if the objects were ordinary scalar values, but with the actual semantics supplied by the object class. This is set up with the overload pragma—see Camel chapter 13, “Overloading”.
See either switches or regular expression modifiers.
An abstract character’s integer value. Same thing as codepoint.
Giving additional meanings to a symbol or construct. Actually, all languages do overloading to one extent or another, since people are good at figuring out things from context.
Hiding or invalidating some other definition of the same name. (Not to be confused with overloading, which adds definitions that must be disambiguated some other way.) To confuse the issue further, we use the word with two overloaded definitions: to describe how you can define your own subroutine to hide a built-in function of the same name (see the section “Overriding Built-in Functions” in Camel chapter 11, “Modules”), and to describe how you can define a replacement method in a derived class to hide a base class’s method of the same name (see Camel chapter 12, “Objects”).
The one user (apart from the superuser) who has absolute control over a file. A file may also have a group of users who may exercise joint ownership if the real owner permits it. See permission bits.
A namespace for global variables, subroutines, and the like, such that they can be kept separate from like-named symbols in other namespaces. In a sense, only the package is global, since the symbols in the package’s symbol table are only accessible from code compiled outside the package by naming the package. But in another sense, all package symbols are also globals—they’re just well-organized globals.
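A minimal sketch of how packages keep like-named globals separate (the package names Alpha and Beta are invented for illustration):

```perl
package Alpha;
our $count = 1;          # lives in the Alpha:: symbol table

package Beta;
our $count = 2;          # a different variable, in the Beta:: symbol table

package main;
# Each is a well-organized global, reachable by naming its package:
print "$Alpha::count $Beta::count\n";   # prints "1 2"
```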
Short for scratchpad.
See argument.
See base class.
See syntax tree.
The subtle but sometimes brutal art of attempting to turn your possibly malformed program into a valid syntax tree.
To fix by applying one, as it were. In the realm of hackerdom, a listing of the differences between two versions of a program as might be applied by the patch(1) program when you want to fix a bug or upgrade your old version.
The list of directories the system searches to find a program you want to execute. The list is stored as one of your environment variables, accessible in Perl as $ENV{PATH}.
A fully qualified filename such as /usr/bin/perl. Sometimes confused with PATH.
A template used in pattern matching.
Taking a pattern, usually a regular expression, and trying the pattern various ways on a string to see whether there’s any way to make it fit. Often used to pick interesting tidbits out of a file.
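A small example of trying a pattern on a string to pick out an interesting tidbit (the string and the captured fields are invented):

```perl
my $line = "Error: disk full at 03:14";
if ($line =~ /at (\d\d):(\d\d)/) {    # try the pattern on the string
    print "hour=$1 minute=$2\n";      # prints "hour=03 minute=14"
}
```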
The Perl Authors Upload SErver (http://pause.perl.org), the gateway for modules on their way to CPAN.
A Perl user group, taking the form of its name from the New York Perl mongers, the first Perl user group. Find one near you at http://www.pm.org.
Bits that the owner of a file sets
or unsets to allow or disallow access to other people. These flag bits are
part of the mode word returned by the stat built-in when you ask
about a file. On Unix systems, you can check the ls(1) manpage for more
information.
What you get when you do Perl++ twice. Doing it only once will curl your hair. You have to increment it eight times to shampoo your hair. Lather, rinse, iterate.
A direct connection that carries the output of one process to the input of another without an intermediate temporary file. Once the pipe is set up, the two processes in question can read and write as if they were talking to a normal file, with some caveats.
A series of processes all in a row, linked by pipes, where each passes its output stream to the next.
The entire hardware and software context in which a program runs. A program written in a platform-dependent language might break if you change any of the following: machine, operating system, libraries, compiler, or system configuration. The perl interpreter has to be compiled differently for each platform because it is implemented in C, but programs written in the Perl language are largely platform independent.
The markup used to embed documentation into your Perl code. Pod stands for “Plain old documentation”. See Camel chapter 23, “Plain Old Documentation”.
A sequence, such as =head1, that denotes the start of a pod section.
A variable in a language like C that contains the exact memory location of some other item. Perl handles pointers internally so you don’t have to worry about them. Instead, you just use symbolic pointers in the form of keys and variable names, or hard references, which aren’t pointers (but act like pointers and do in fact contain pointers).
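A short sketch of a hard reference acting like a pointer without the worry:

```perl
my @colors = ("red", "green");
my $ref    = \@colors;           # a hard reference to the array
push @$ref, "blue";              # dereference and modify the original
print scalar(@colors), "\n";     # prints "3"
print $ref->[2], "\n";           # arrow syntax; prints "blue"
```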
The notion that you can tell an object to do something generic, and the object will interpret the command in different ways depending on its type. [< Greek πολυ- + μορϕή, many forms.]
The part of the address of a TCP or UDP socket that directs packets to the correct process after finding the right machine, something like the phone extension you give when you reach the company operator. Also the result of converting code to run on a different platform than originally intended, or the verb denoting this conversion.
Once upon a time, C code compilable under both BSD and SysV. In general, code that can be easily converted to run on another platform, where “easily” can be defined however you like, and usually is. Anything may be considered portable if you try hard enough, such as a mobile home or London Bridge.
Someone who “carries” software from one platform to another. Porting programs written in platform-dependent languages such as C can be difficult work, but porting programs like Perl is very much worth the agony.
Said of quantifiers and groups in patterns that refuse to give up anything once they’ve gotten their mitts on it. Catchier and easier to say than the even more formal nonbacktrackable.
The Portable Operating System Interface specification.
An operator that follows its operand, as in $x++.
An internal shorthand for a “push-pop” code; that is, C code implementing Perl’s stack machine.
A standard module whose practical hints and suggestions are received (and possibly ignored) at compile time. Pragmas are named in all lowercase.
The rules of conduct that, in the absence of other guidance, determine what should happen first. For example, in the absence of parentheses, you always do multiplication before addition.
An operator that precedes its operand, as in ++$x.
What some helper process did to transform the incoming data into a form more suitable for the current process. Often done with an incoming pipe. See also C preprocessor.
The author that PAUSE allows to assign co-maintainer permissions to a namespace. A primary maintainer can give up this distinction by assigning it to another PAUSE author. See Camel chapter 19, “CPAN”.
A subroutine.
An instance of a running program. Under multitasking
systems like Unix, two or more separate processes could be running the same
program independently at the same time—in fact, the fork function is
designed to bring about this happy state of affairs. Under other operating
systems, processes are sometimes called “threads”, “tasks”, or “jobs”,
often with slight nuances in meaning.
See script.
A system that algorithmically writes code for you in a high-level language. See also code generator.
Pattern matching that picks up where it left off before.
See either instance variable or character property.
In networking, an agreed-upon way of sending messages back and forth so that neither correspondent will get too confused.
An optional part of a subroutine declaration telling the Perl compiler how many and what flavor of arguments may be passed as actual arguments, so you can write subroutine calls that parse much like built-in functions. (Or don’t parse, as the case may be.)
A construct that sometimes looks like a function but really
isn’t. Usually reserved for lvalue modifiers like my, for context
modifiers like scalar, and for the pick-your-own-quotes constructs,
q//, qq//, qx//, qw//, qr//, m//, s///, y///, and
tr///.
Formerly, a reference to an array whose initial element happens to hold a reference to a hash. You used to be able to treat a pseudohash reference as either an array reference or a hash reference. Pseudohashes are no longer supported.
An operator that looks something like a literal, such as the output-grabbing operator, `command`.
Something not owned by anybody. Perl is copyrighted and is thus not in the public domain—it’s just freely available and freely redistributable.
A notional “baton” handed around the Perl community indicating who is the lead integrator in some arena of development.
A pumpkin holder, the person in charge of pumping the pump, or at least priming it. Must be willing to play the part of the Great Pumpkin now and then.
A “pointer value”, which is Perl Internals Talk for a char*.
Possessing a complete name. The symbol $Ent::moot is qualified; $moot is unqualified. A fully qualified filename is specified from the top-level directory.
A component of a regular expression specifying how many times the foregoing atom may occur.
A race condition exists when the result of several interrelated events depends on the ordering of those events, but that order cannot be guaranteed due to nondeterministic timing effects. If two or more programs, or parts of the same program, try to go through the same series of events, one might interrupt the work of the other. This is a good way to find an exploit.
With respect to files, one that has the proper permission bit set to let you access the file. With respect to computer programs, one that’s written well enough that someone has a chance of figuring out what it’s trying to do.
The last rites performed by a parent process
on behalf of a deceased child process so that it doesn’t remain a
zombie. See the wait and waitpid function calls.
A set of related data values in a file or stream, often associated with a unique key field. In Unix, often commensurate with a line, or a blank-line–terminated set of lines (a “paragraph”). Each line of the /etc/passwd file is a record, keyed on login name, containing information about that user.
The art of defining something (at least partly) in terms of itself, which is a naughty no-no in dictionaries but often works out okay in computer programs if you’re careful not to recurse forever (which is like an infinite loop with more spectacular failure modes).
Where you look to find a pointer to information somewhere else. (See indirection.) References come in two flavors: symbolic references and hard references.
Whatever a reference refers to, which may or may not have a name. Common types of referents include scalars, arrays, hashes, and subroutines.
See regular expression.
A single entity with various interpretations, like an elephant. To a computer scientist, it’s a grammar for a little language in which some strings are legal and others aren’t. To normal people, it’s a pattern you can use to find what you’re looking for when it varies from case to case. Perl’s regular expressions are far from regular in the theoretical sense, but in regular use they work quite well. Here’s a regular expression: /Oh s.*t./. This will match strings like “Oh say can you see by the dawn's early light” and “Oh sit!”. See Camel chapter 5, “Pattern Matching”.
An option on a pattern or substitution, such as /i to render the pattern case-insensitive.
A file that’s not a directory, a device, a named pipe or socket, or a symbolic link. Perl uses the -f file test operator to identify regular files. Sometimes called a “plain” file.
An operator that says whether a particular ordering relationship is true about a pair of operands. Perl has both numeric and string relational operators. See collating sequence.
A word with a specific, built-in meaning to a compiler, such as if or delete. In many languages (not Perl), it’s illegal to use reserved words to name anything else. (Which is why they’re reserved, after all.) In Perl, you just can’t use them to name labels or filehandles. Also called “keywords”.
The value produced by a subroutine or expression when evaluated. In Perl, a return value may be either a list or a scalar.
Request For Comment, which despite the timid connotations is the name of a series of important standards documents.
A bit shift that divides a number by some power of 2.
A name for a concrete set of behaviors. A role is a way to add behavior to a class without inheritance.
The superuser (UID == 0). Also the top-level directory of the filesystem.
What you are told when someone thinks you should Read The Fine Manual.
Any time after Perl starts running your main program. See also compile phase. Run phase is mostly spent in runtime but may also be spent in compile time when require, do FILE, or eval STRING operators are executed, or when a substitution uses the /ee modifier.
The time when Perl is actually doing what your code says to do, as opposed to the earlier period of time when it was trying to figure out whether what you said made any sense whatsoever, which is compile time.
A pattern that contains one or more variables to be interpolated before parsing the pattern as a regular expression, and that therefore cannot be analyzed at compile time, but must be reanalyzed each time the pattern match operator is evaluated. Runtime patterns are useful but expensive.
A recreational vehicle, not to be confused with vehicular recreation. RV also means an internal Reference Value of the type a scalar can hold. See also IV and NV if you’re not confused yet.
A value that you might find on the right side of an assignment. See also lvalue.
A walled off area that’s not supposed to affect beyond its walls. You let kids play in the sandbox instead of running in the road. See Camel chapter 20, “Security”.
A simple, singular value; a number, string, or reference.
The situation in which an expression is expected by its surroundings (the code calling it) to return a single value rather than a list of values. See also context and list context. A scalar context sometimes imposes additional constraints on the return value—see string context and numeric context. Sometimes we talk about a Boolean context inside conditionals, but this imposes no additional constraints, since any scalar value, whether numeric or string, is already true or false.
A number or quoted string—an actual value in the text of your program, as opposed to a variable.
A value that happens to be a scalar as opposed to a list.
A variable prefixed with $ that holds a single value.
From how far away you can see a variable, looking through
one. Perl has two visibility mechanisms. It does dynamic scoping of
local variables, meaning that the rest of the block, and any
subroutines that are called by the rest of the block, can see the
variables that are local to the block. Perl does lexical scoping of
my variables, meaning that the rest of the block can see the variable,
but other subroutines called by the block cannot see the variable.
The area in which a particular invocation of a particular file or subroutine keeps some of its temporary values, including any lexically scoped variables.
A text file that is a program intended to be executed directly rather than compiled to another form of file before execution.
Also, in the context of Unicode, a writing system for a particular language or group of languages, such as Greek, Bengali, or Tengwar.
A cracker who is not a hacker but knows just enough to run canned scripts. A cargo-cult programmer.
A venerable Stream EDitor from which Perl derives some of its ideas.
A fancy kind of interlock that prevents multiple threads or processes from using up the same resources simultaneously.
A character or string that keeps two surrounding strings from being confused with each other. The split function works on separators. Not to be confused with delimiters or terminators. The “or” in the previous sentence separated the two alternatives.
Putting a fancy data structure into linear order so that it can be stored as a string in a disk file or database, or sent through a pipe. Also called marshalling.
In networking, a process that either advertises a service or just hangs around at a known location and waits for clients who need service to get in touch with it.
Something you do for someone else to make them happy, like giving them the time of day (or of their life). On some machines, well-known services are listed by the getservent function.
Same as setuid, only having to do with giving away group privileges.
Said of a program that runs with the privileges of its owner rather than (as is usually the case) the privileges of whoever is running it. Also describes the bit in the mode word (permission bits) that controls the feature. This bit must be explicitly set by the owner to enable this feature, and the program must be carefully written not to give away more privileges than it ought to.
A piece of memory accessible by two different processes who otherwise would not see each other’s memory.
Irish for the whole McGillicuddy. In Perl culture, a portmanteau of “sharp” and “bang”, meaning the #! sequence that tells the system where to find the interpreter.
A command-line interpreter. The program that interactively gives you a prompt, accepts one or more lines of input, and executes the programs you mentioned, feeding each of them their proper arguments and input data. Shells can also execute scripts containing such commands. Under Unix, typical shells include the Bourne shell (/bin/sh), the C shell (/bin/csh), and the Korn shell (/bin/ksh). Perl is not strictly a shell because it’s not interactive (although Perl programs can be interactive).
Something extra that happens when you evaluate an expression. Nowadays it can refer to almost anything. For example, evaluating a simple assignment statement typically has the “side effect” of assigning a value to a variable. (And you thought assigning the value was your primary intent in the first place!) Likewise, assigning a value to the special variable $| ($AUTOFLUSH) has the side effect of forcing a flush after every write or print on the currently selected filehandle.
A glyph used in magic. Or, for Perl, the symbol in front of a variable name, such as $, @, and %.
A bolt out of the blue; that is, an event triggered by the operating system, probably when you’re least expecting it.
A subroutine that, instead of being content to be called in the normal fashion, sits around waiting for a bolt out of the blue before it will deign to execute. Under Perl, bolts out of the blue are called signals, and you send them with the kill built-in. See the %SIG hash in Camel chapter 25, “Special Names” and the section “Signals” in Camel chapter 15, “Interprocess Communication”.
The features you got from your mother, if she told you that you don’t have a father. (See also inheritance and multiple inheritance.) In computer languages, the idea that classes reproduce asexually so that a given class can only have one direct ancestor or base class. Perl supplies no such restriction, though you may certainly program Perl that way if you like.
A selection of any number of elements from a list, array, or hash.
To read an entire file into a string in one operation.
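One common way to slurp, sketched here as a self-contained example (the filename slurp_demo.txt is invented, and the block writes it first so the read has something to read):

```perl
# Write a small file so the example is self-contained.
open my $out, '>', 'slurp_demo.txt' or die "can't write: $!";
print $out "line one\nline two\n";
close $out;

my $contents = do {
    local $/;                                         # undefine $/ ...
    open my $in, '<', 'slurp_demo.txt' or die "can't read: $!";
    <$in>;                                            # ...so one read grabs the whole file
};
print length($contents), "\n";                        # prints "18"
```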
An endpoint for network communication among multiple processes that works much like a telephone or a post office box. The most important thing about a socket is its network address (like a phone number). Different kinds of sockets have different kinds of addresses—some look like filenames, and some don’t.
See symbolic reference.
A special kind of module that does preprocessing on your script just before it gets to the tokener.
A device you can put things on the top of, and later take them back off in the opposite order in which you put them on. See LIFO.
Included in the official Perl distribution, as in a standard module, a standard tool, or a standard Perl manpage.
The default output stream for nasty remarks that don’t belong in standard output. Represented within a Perl program by the filehandle STDERR. You can use this stream explicitly, but the die and warn built-ins write to your standard error stream automatically (unless trapped or otherwise intercepted).
The default input stream for your program, which if possible shouldn’t care where its data is coming from. Represented within a Perl program by the filehandle STDIN.
A standard C library for doing buffered input
and output to the operating system. (The “standard” of standard I/O is
at most marginally related to the “standard” of standard input and output.)
In general, Perl relies on whatever implementation of standard I/O a given
operating system supplies, so the buffering characteristics of a Perl
program on one machine may not exactly match those on another machine.
Normally this only influences efficiency, not semantics. If your standard
I/O package is doing block buffering and you want it to flush the buffer
more often, just set the $| variable to a true value.
Everything that comes with the official perl distribution. Some vendor versions of perl change their distributions, leaving out some parts or including extras. See also dual-lived.
The default output stream for your program, which if possible shouldn’t care where its data is going. Represented within a Perl program by the filehandle STDOUT.
A command to the computer about what to do next, like a step in a recipe: “Add marmalade to batter and mix until mixed.” A statement is distinguished from a declaration, which doesn’t tell the computer to do anything, but just to learn something.
A conditional or loop that you put after the statement instead of before, if you know what we mean.
Varying slowly compared to something else. (Unfortunately, everything is relatively stable compared to something else, except for certain elementary particles, and we’re not so sure about them.) In computers, where things are supposed to vary rapidly, “static” has a derogatory connotation, indicating a slightly dysfunctional variable, subroutine, or method. In Perl culture, the word is politely avoided.
If you’re a C or C++ programmer, you might be looking for Perl’s state keyword.
No such thing. See class method.
No such thing. See lexical scoping.
No such thing. Just use a lexical variable in a scope larger than your subroutine, or declare it with state instead of with my.
A special internal spot in which Perl keeps the information about the last file on which you requested information.
The value returned to the parent process when one of its child processes dies. This value is placed in the special variable $?. Its upper eight bits are the exit status of the defunct process, and its lower eight bits identify the signal (if any) that the process died from. On Unix systems, this status value is the same as the status word returned by wait(2). See system in Camel chapter 27, “Functions”.
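Unpacking the status value looks like this; a sketch, using $^X (the running perl) to start a child that exits with a known status:

```perl
system($^X, '-e', 'exit 3');    # run a child that exits with status 3
my $exit   = $? >> 8;           # upper eight bits: the exit status
my $signal = $? & 127;          # low bits: the signal number, if any
print "exit=$exit signal=$signal\n";   # prints "exit=3 signal=0"
```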
See standard error.
See standard input.
See standard I/O.
See standard output.
A flow of data into or out of a process as a steady sequence of bytes or characters, without the appearance of being broken up into packets. This is a kind of interface—the underlying implementation may well break your data up into separate packets for delivery, but this is hidden from you.
A sequence of characters such as “He said !@#*&%@#*?!”. A string does not have to be entirely printable.
The situation in which an expression is expected by its surroundings (the code calling it) to return a string. See also context and numeric context.
The process of producing a string representation of an abstract object.
C keyword introducing a structure definition or name.
See data structure.
See derived class.
A component of a regular expression pattern.
A named or otherwise accessible piece of program that can be invoked from elsewhere in the program in order to accomplish some subgoal of the program. A subroutine is often parameterized to accomplish different but related things depending on its input arguments. If the subroutine returns a meaningful value, it is also called a function.
A value that indicates the position of a particular array element in an array.
Changing parts of a string via the s/// operator. (We avoid use of this term to mean variable interpolation.)
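A small example; note that in list-ish use s///g also reports how many substitutions it made:

```perl
my $s = "the cat sat on the mat";
my $count = ($s =~ s/the/a/g);   # substitute globally, count the changes
print "$s\n";                    # prints "a cat sat on a mat"
print "$count\n";                # prints "2"
```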
A portion of a string, starting at a certain character position (offset) and proceeding for a certain number of characters.
See base class.
The person whom the operating system will let do almost anything. Typically your system administrator or someone pretending to be your system administrator. On Unix systems, the root user. On Windows systems, usually the Administrator user.
Short for “scalar value”. But within the Perl interpreter, every referent is treated as a member of a class derived from SV, in an object-oriented sort of way. Every value inside Perl is passed around as a C language SV* pointer. The SV struct knows its own “referent type”, and the code is smart enough (we hope) not to try to call a hash function on a subroutine.
An option you give on a command line to influence the way your program works, usually introduced with a minus sign. The word is also used as a nickname for a switch statement.
The combination of multiple command-line switches (e.g., -a -b -c) into one switch (e.g., -abc). Any switch with an additional argument must be the last switch in a cluster.
A program technique that lets you evaluate an expression and then, based on the value of the expression, do a multiway branch to the appropriate piece of code for that value. Also called a “case structure”, named after the similar Pascal construct. Most switch statements in Perl are spelled given. See “The given statement” in Camel chapter 4, “Statements and Declarations”.
Generally, any token or metasymbol. Often used more specifically to mean the sort of name you might find in a symbol table.
A program that lets you step through the execution of your program, stopping or printing things out here and there to see whether anything has gone wrong, and, if so, what. The “symbolic” part just means that you can talk to the debugger using the same symbols with which your program is written.
An alternate filename that points to the real filename, which in turn points to the real file. Whenever the operating system is trying to parse a pathname containing a symbolic link, it merely substitutes the new name and continues parsing.
A variable whose value is the name of another variable or subroutine. By dereferencing the first variable, you can get at the second one. Symbolic references are illegal under use strict "refs".
Where a compiler remembers symbols. A program like Perl must somehow remember all the names of all the variables, filehandles, and subroutines you’ve used. It does this by placing the names in a symbol table, which is implemented in Perl using a hash table. There is a separate symbol table for each package to give each package its own namespace.
Programming in which the orderly sequence of events can be determined; that is, when things happen one after the other, not at the same time.
An alternative way of writing something more easily; a shortcut.
From Greek σύνταξις, “with-arrangement”. How things (particularly symbols) are put together with each other.
An internal representation of your program wherein lower-level constructs dangle off the higher-level constructs enclosing them.
A function call directly to the operating
system. Many of the important subroutines and functions you use aren’t
direct system calls, but are built up in one or more layers above the
system call level. In general, Perl programmers don’t need to worry about
the distinction. However, if you do happen to know which Perl functions are
really syscalls, you can predict which of these will set the $! ($ERRNO) variable on failure. Unfortunately, beginning programmers often
confusingly employ the term “system call” to mean what happens when you
call the Perl system function, which actually involves many syscalls. To
avoid any confusion, we nearly always say “syscall” for something you could
call indirectly via Perl’s syscall function, and never for something you
would call with Perl’s system function.
The special bookkeeping Perl does to track the flow of external data through your program and disallow their use in system commands.
Said of data derived from the grubby hands of a user, and thus unsafe for a secure program to rely on. Perl does taint checks if you run a setuid (or setgid) program, or if you use the -T switch.
Running under the -T switch, marking all external data as suspect and refusing to use it with system commands. See Camel chapter 20, “Security”.
Short for Transmission Control Protocol. A protocol wrapped around the Internet Protocol to make an unreliable packet transmission mechanism appear to the application program to be a reliable stream of bytes. (Usually.)
Short for a “terminal”—that is, a leaf node of a syntax tree. A thing that functions grammatically as an operand for the operators in an expression.
A character or string that marks the end of another string. The $/
variable contains the string that terminates a readline operation, which
chomp deletes from the end. Not to be confused with delimiters or
separators. The period at the end of this sentence is a terminator.
An operator taking three operands. Sometimes pronounced trinary.
A string or file containing primarily printable characters.
Like a forked process, but without fork’s inherent memory protection. A thread is lighter weight than a full process, in that a process could have multiple threads running around in it, all fighting over the same process’s memory space unless steps are taken to protect threads from one another.
The bond between a magical variable and its
implementation class. See the tie function in Camel chapter 27,
“Functions” and Camel chapter 14, “Tied Variables”.
The case used for capitals that are followed by lowercase characters instead of by more capitals. Sometimes called sentence case or headline case. English doesn’t use Unicode titlecase, but casing rules for English titles are more complicated than simply capitalizing each word’s first character.
There’s More Than One Way To Do It, the Perl Motto. The notion that there can be more than one valid path to solving a programming problem in context. (This doesn’t mean that more ways are always better or that all possible paths are equally desirable—just that there need not be One True Way.)
A morpheme in a programming language, the smallest unit of text with semantic significance.
A module that breaks a program text into a sequence of tokens for later analysis by a parser.
Splitting up a program text into tokens. Also known as “lexing”, in which case you get “lexemes” instead of tokens.
The notion that, with a complete set of simple tools that work well together, you can build almost anything you want. Which is fine if you’re assembling a tricycle, but if you’re building a defranishizing comboflux regurgalator, you really want your own machine shop in which to build special tools. Perl is sort of a machine shop.
The thing you’re working on. Structures like
while(<>), for
, foreach
, and given
set the topic for
you by assigning to $_
, the default (topic) variable.
To turn one string
representation into another by mapping each character of the source string
to its corresponding character in the result string. Not to be confused
with translation: for example, Greek πολύχρωμος transliterates into
polychromos but translates into many-colored. See the tr///
operator in Camel chapter 5, “Pattern Matching”.
An event that causes a handler to be run.
Not a stellar system with three stars, but an operator taking three operands. Sometimes pronounced ternary.
A venerable typesetting language from which Perl derives
the name of its $%
variable and which is secretly used in the production
of Camel books.
Any scalar value that doesn’t evaluate to 0 or
""
.
Emptying a file of existing
contents, either automatically when opening a file for writing or
explicitly via the truncate function.
See data type and class.
Converting data from one type to another. C permits this. Perl does not need it. Nor want it.
A type definition in the C and C++ languages.
A lexical variable that is declared with a class
type: my Pony $bill.
Use of a single identifier, prefixed with *
. For
example, *name
stands for any or all of $name
, @name
, %name
,
&name
, or just name
. How you use it determines whether it is
interpreted as all or only one of them. See “Typeglobs and Filehandles” in
Camel chapter 2, “Bits and Pieces”.
A description of how C types may be transformed to and from Perl types within an extension module written in XS.
User Datagram Protocol, the typical way to send datagrams over the Internet.
A user ID. Often used in the context of file or process ownership.
A mask of those permission bits that should be forced
off when creating files or directories, in order to establish a policy of
whom you’ll ordinarily deny access to. See the umask function.
An operator with only one operand, like ! or
chdir. Unary operators are usually prefix operators; that is, they
precede their operand. The ++ and -- operators can be either prefix
or postfix. (Their position does change their meanings.)
A character set comprising all the major character sets of the world, more or less. See http://www.unicode.org.
A very large and constantly evolving language with several alternative and largely incompatible syntaxes, in which anyone can define anything any way they choose, and usually do. Speakers of this language think it’s easy to learn because it’s so easily twisted to one’s own ends, but dialectical differences make tribal intercommunication nearly impossible, and travelers are often reduced to a pidgin-like subset of the language. To be universally understood, a Unix shell programmer must spend years of study in the art. Many have abandoned this discipline and now communicate via an Esperanto-like language called Perl.
In ancient times, Unix was also used to refer to some code that a couple of people at Bell Labs wrote to make use of a PDP-7 computer that wasn’t doing much of anything else at the time.
In Unicode, not just characters with the General Category of Uppercase Letter, but any character with the Uppercase property, including some Letter Numbers and Symbols. Not to be confused with titlecase.
An actual piece of data, in contrast to all the variables, references, keys, indices, operators, and whatnot that you need to access the value.
A named storage location that can hold any of various kinds of value, as your program sees fit.
The interpolation of a scalar or array variable into a string.
Said of a function that happily receives an indeterminate number of actual arguments.
Mathematical jargon for a list of scalar values.
Providing the appearance of something without the reality, as in: virtual memory is not real memory. (See also memory.) The opposite of “virtual” is “transparent”, which means providing the reality of something without the appearance, as in: Perl handles the variable-length UTF‑8 character encoding transparently.
A form of scalar context in which an expression is not expected to return any value at all and is evaluated for its side effects alone.
A “version” or “vector” string
specified with a v
followed by a series of decimal integers in dot
notation, for instance, v1.20.300.4000
. Each number turns into a
character with the specified ordinal value. (The v
is optional when
there are at least three integers.)
A message printed to the STDERR
stream to the effect that something might be
wrong but isn’t worth blowing up over. See warn in Camel chapter 27,
“Functions” and the warnings
pragma in Camel chapter 28, “Pragmantic
Modules”.
An expression which, when its value changes, causes a breakpoint in the Perl debugger.
A reference that doesn’t get counted normally. When all the normal references to data disappear, the data disappears. These are useful for circular references that would never disappear otherwise.
A character that moves your cursor but doesn’t otherwise put anything on your screen. Typically refers to any of: space, tab, line feed, carriage return, or form feed. In Unicode, matches many other characters that Unicode considers whitespace, including the NO-BREAK SPACE.
In normal “computerese”, the piece of data of the size most efficiently handled by your computer, typically 32 bits or so, give or take a few powers of 2. In Perl culture, it more often refers to an alphanumeric identifier (including underscores), or to a string of nonwhitespace characters bounded by whitespace or string boundaries.
Your current directory, from
which relative pathnames are interpreted by the operating system. The
operating system knows your current directory because you told it with a
chdir, or because you started out in the place where your parent
process was when you were born.
A program or subroutine that runs some other program or subroutine for you, modifying some of its input or output to better suit your purposes.
What You See Is What You Get. Usually used when something
that appears on the screen matches how it will eventually look, like Perl’s
format declarations. Also used to mean the opposite of magic because
everything works exactly as it appears, as in the three-argument form of
open.
An extraordinarily exported, expeditiously excellent, expressly eXternal Subroutine, executed in existing C or C++ or in an exciting extension language called (exasperatingly) XS.
An external subroutine defined in XS.
Yet Another Compiler Compiler. A parser generator without which Perl probably would not have existed. See the file perly.y in the Perl source distribution.
A subpattern assertion matching the null string between characters.
A process that has died (exited) but
whose parent has not yet received proper notification of its demise by
virtue of having called wait or waitpid. If you fork, you must
clean up after your child processes when they exit; otherwise, the process
table will fill up and your system administrator will Not Be Happy with
you.
Based on the Glossary of Programming Perl, Fourth Edition, by Tom Christiansen, brian d foy, Larry Wall, & Jon Orwant. Copyright (c) 2000, 1996, 1991, 2012 O'Reilly Media, Inc. This document may be distributed under the same terms as Perl itself.
- You can refer to this document in Pod via "L<perlgpl>"
- Or you can see this document by entering "perldoc perlgpl"
Perl is free software; you can redistribute it and/or modify it under the terms of either:
- a) the GNU General Public License as published by the Free
- Software Foundation; either version 1, or (at your option) any
- later version, or
- b) the "Artistic License" which comes with this Kit.
This is the "GNU General Public License, version 1". It's here so that modules, programs, etc., that want to declare this as their distribution license can link to it.
For the Perl Artistic License, see perlartistic.
- GNU GENERAL PUBLIC LICENSE
- Version 1, February 1989
- Copyright (C) 1989 Free Software Foundation, Inc.
- 51 Franklin St, Fifth Floor, Boston, MA 02110-1301 USA
- Everyone is permitted to copy and distribute verbatim copies
- of this license document, but changing it is not allowed.
- Preamble
- The license agreements of most software companies try to keep users
- at the mercy of those companies. By contrast, our General Public
- License is intended to guarantee your freedom to share and change free
- software--to make sure the software is free for all its users. The
- General Public License applies to the Free Software Foundation's
- software and to any other program whose authors commit to using it.
- You can use it for your programs, too.
- When we speak of free software, we are referring to freedom, not
- price. Specifically, the General Public License is designed to make
- sure that you have the freedom to give away or sell copies of free
- software, that you receive source code or can get it if you want it,
- that you can change the software or use pieces of it in new free
- programs; and that you know you can do these things.
- To protect your rights, we need to make restrictions that forbid
- anyone to deny you these rights or to ask you to surrender the rights.
- These restrictions translate to certain responsibilities for you if you
- distribute copies of the software, or if you modify it.
- For example, if you distribute copies of a such a program, whether
- gratis or for a fee, you must give the recipients all the rights that
- you have. You must make sure that they, too, receive or can get the
- source code. And you must tell them their rights.
- We protect your rights with two steps: (1) copyright the software, and
- (2) offer you this license which gives you legal permission to copy,
- distribute and/or modify the software.
- Also, for each author's protection and ours, we want to make certain
- that everyone understands that there is no warranty for this free
- software. If the software is modified by someone else and passed on, we
- want its recipients to know that what they have is not the original, so
- that any problems introduced by others will not reflect on the original
- authors' reputations.
- The precise terms and conditions for copying, distribution and
- modification follow.
- GNU GENERAL PUBLIC LICENSE
- TERMS AND CONDITIONS FOR COPYING, DISTRIBUTION AND MODIFICATION
- 0. This License Agreement applies to any program or other work which
- contains a notice placed by the copyright holder saying it may be
- distributed under the terms of this General Public License. The
- "Program", below, refers to any such program or work, and a "work based
- on the Program" means either the Program or any work containing the
- Program or a portion of it, either verbatim or with modifications. Each
- licensee is addressed as "you".
- 1. You may copy and distribute verbatim copies of the Program's source
- code as you receive it, in any medium, provided that you conspicuously and
- appropriately publish on each copy an appropriate copyright notice and
- disclaimer of warranty; keep intact all the notices that refer to this
- General Public License and to the absence of any warranty; and give any
- other recipients of the Program a copy of this General Public License
- along with the Program. You may charge a fee for the physical act of
- transferring a copy.
- 2. You may modify your copy or copies of the Program or any portion of
- it, and copy and distribute such modifications under the terms of Paragraph
- 1 above, provided that you also do the following:
- a) cause the modified files to carry prominent notices stating that
- you changed the files and the date of any change; and
- b) cause the whole of any work that you distribute or publish, that
- in whole or in part contains the Program or any part thereof, either
- with or without modifications, to be licensed at no charge to all
- third parties under the terms of this General Public License (except
- that you may choose to grant warranty protection to some or all
- third parties, at your option).
- c) If the modified program normally reads commands interactively when
- run, you must cause it, when started running for such interactive use
- in the simplest and most usual way, to print or display an
- announcement including an appropriate copyright notice and a notice
- that there is no warranty (or else, saying that you provide a
- warranty) and that users may redistribute the program under these
- conditions, and telling the user how to view a copy of this General
- Public License.
- d) You may charge a fee for the physical act of transferring a
- copy, and you may at your option offer warranty protection in
- exchange for a fee.
- Mere aggregation of another independent work with the Program (or its
- derivative) on a volume of a storage or distribution medium does not bring
- the other work under the scope of these terms.
- 3. You may copy and distribute the Program (or a portion or derivative of
- it, under Paragraph 2) in object code or executable form under the terms of
- Paragraphs 1 and 2 above provided that you also do one of the following:
- a) accompany it with the complete corresponding machine-readable
- source code, which must be distributed under the terms of
- Paragraphs 1 and 2 above; or,
- b) accompany it with a written offer, valid for at least three
- years, to give any third party free (except for a nominal charge
- for the cost of distribution) a complete machine-readable copy of the
- corresponding source code, to be distributed under the terms of
- Paragraphs 1 and 2 above; or,
- c) accompany it with the information you received as to where the
- corresponding source code may be obtained. (This alternative is
- allowed only for noncommercial distribution and only if you
- received the program in object code or executable form alone.)
- Source code for a work means the preferred form of the work for making
- modifications to it. For an executable file, complete source code means
- all the source code for all modules it contains; but, as a special
- exception, it need not include source code for modules which are standard
- libraries that accompany the operating system on which the executable
- file runs, or for standard header files or definitions files that
- accompany that operating system.
- 4. You may not copy, modify, sublicense, distribute or transfer the
- Program except as expressly provided under this General Public License.
- Any attempt otherwise to copy, modify, sublicense, distribute or transfer
- the Program is void, and will automatically terminate your rights to use
- the Program under this License. However, parties who have received
- copies, or rights to use copies, from you under this General Public
- License will not have their licenses terminated so long as such parties
- remain in full compliance.
- 5. By copying, distributing or modifying the Program (or any work based
- on the Program) you indicate your acceptance of this license to do so,
- and all its terms and conditions.
- 6. Each time you redistribute the Program (or any work based on the
- Program), the recipient automatically receives a license from the original
- licensor to copy, distribute or modify the Program subject to these
- terms and conditions. You may not impose any further restrictions on the
- recipients' exercise of the rights granted herein.
- 7. The Free Software Foundation may publish revised and/or new versions
- of the General Public License from time to time. Such new versions will
- be similar in spirit to the present version, but may differ in detail to
- address new problems or concerns.
- Each version is given a distinguishing version number. If the Program
- specifies a version number of the license which applies to it and "any
- later version", you have the option of following the terms and conditions
- either of that version or of any later version published by the Free
- Software Foundation. If the Program does not specify a version number of
- the license, you may choose any version ever published by the Free Software
- Foundation.
- 8. If you wish to incorporate parts of the Program into other free
- programs whose distribution conditions are different, write to the author
- to ask for permission. For software which is copyrighted by the Free
- Software Foundation, write to the Free Software Foundation; we sometimes
- make exceptions for this. Our decision will be guided by the two goals
- of preserving the free status of all derivatives of our free software and
- of promoting the sharing and reuse of software generally.
- NO WARRANTY
- 9. BECAUSE THE PROGRAM IS LICENSED FREE OF CHARGE, THERE IS NO WARRANTY
- FOR THE PROGRAM, TO THE EXTENT PERMITTED BY APPLICABLE LAW. EXCEPT WHEN
- OTHERWISE STATED IN WRITING THE COPYRIGHT HOLDERS AND/OR OTHER PARTIES
- PROVIDE THE PROGRAM "AS IS" WITHOUT WARRANTY OF ANY KIND, EITHER EXPRESSED
- OR IMPLIED, INCLUDING, BUT NOT LIMITED TO, THE IMPLIED WARRANTIES OF
- MERCHANTABILITY AND FITNESS FOR A PARTICULAR PURPOSE. THE ENTIRE RISK AS
- TO THE QUALITY AND PERFORMANCE OF THE PROGRAM IS WITH YOU. SHOULD THE
- PROGRAM PROVE DEFECTIVE, YOU ASSUME THE COST OF ALL NECESSARY SERVICING,
- REPAIR OR CORRECTION.
- 10. IN NO EVENT UNLESS REQUIRED BY APPLICABLE LAW OR AGREED TO IN WRITING
- WILL ANY COPYRIGHT HOLDER, OR ANY OTHER PARTY WHO MAY MODIFY AND/OR
- REDISTRIBUTE THE PROGRAM AS PERMITTED ABOVE, BE LIABLE TO YOU FOR DAMAGES,
- INCLUDING ANY GENERAL, SPECIAL, INCIDENTAL OR CONSEQUENTIAL DAMAGES ARISING
- OUT OF THE USE OR INABILITY TO USE THE PROGRAM (INCLUDING BUT NOT LIMITED
- TO LOSS OF DATA OR DATA BEING RENDERED INACCURATE OR LOSSES SUSTAINED BY
- YOU OR THIRD PARTIES OR A FAILURE OF THE PROGRAM TO OPERATE WITH ANY OTHER
- PROGRAMS), EVEN IF SUCH HOLDER OR OTHER PARTY HAS BEEN ADVISED OF THE
- POSSIBILITY OF SUCH DAMAGES.
- END OF TERMS AND CONDITIONS
- Appendix: How to Apply These Terms to Your New Programs
- If you develop a new program, and you want it to be of the greatest
- possible use to humanity, the best way to achieve this is to make it
- free software which everyone can redistribute and change under these
- terms.
- To do so, attach the following notices to the program. It is safest to
- attach them to the start of each source file to most effectively convey
- the exclusion of warranty; and each file should have at least the
- "copyright" line and a pointer to where the full notice is found.
- <one line to give the program's name and a brief idea of what it does.>
- Copyright (C) 19yy <name of author>
- This program is free software; you can redistribute it and/or modify
- it under the terms of the GNU General Public License as published by
- the Free Software Foundation; either version 1, or (at your option)
- any later version.
- This program is distributed in the hope that it will be useful,
- but WITHOUT ANY WARRANTY; without even the implied warranty of
- MERCHANTABILITY or FITNESS FOR A PARTICULAR PURPOSE. See the
- GNU General Public License for more details.
- You should have received a copy of the GNU General Public License
- along with this program; if not, write to the Free Software
- Foundation, Inc., 51 Franklin Street, Fifth Floor, Boston MA
- 02110-1301 USA
- Also add information on how to contact you by electronic and paper mail.
- If the program is interactive, make it output a short notice like this
- when it starts in an interactive mode:
- Gnomovision version 69, Copyright (C) 19xx name of author
- Gnomovision comes with ABSOLUTELY NO WARRANTY; for details type 'show w'.
- This is free software, and you are welcome to redistribute it
- under certain conditions; type 'show c' for details.
- The hypothetical commands 'show w' and 'show c' should show the
- appropriate parts of the General Public License. Of course, the
- commands you use may be called something other than 'show w' and 'show
- c'; they could even be mouse-clicks or menu items--whatever suits your
- program.
- You should also get your employer (if you work as a programmer) or your
- school, if any, to sign a "copyright disclaimer" for the program, if
- necessary. Here a sample; alter the names:
- Yoyodyne, Inc., hereby disclaims all copyright interest in the
- program 'Gnomovision' (a program to direct compilers to make passes
- at assemblers) written by James Hacker.
- <signature of Ty Coon>, 1 April 1989
- Ty Coon, President of Vice
- That's all there is to it!
perlguts - Introduction to the Perl API
This document attempts to describe how to use the Perl API, as well as to provide some info on the basic workings of the Perl core. It is far from complete and probably contains many errors. Please refer any questions or comments to the author below.
Perl has three typedefs that handle Perl's three main data types:
- SV Scalar Value
- AV Array Value
- HV Hash Value
Each typedef has specific routines that manipulate the various data types.
Perl uses a special typedef IV which is a simple signed integer type that is guaranteed to be large enough to hold a pointer (as well as an integer). Additionally, there is the UV, which is simply an unsigned IV.
Perl also uses two special typedefs, I32 and I16, which will always be at least 32-bits and 16-bits long, respectively. (Again, there are U32 and U16, as well.) They will usually be exactly 32 and 16 bits long, but on Crays they will both be 64 bits.
An SV can be created and loaded with one command. There are five types of values that can be loaded: an integer value (IV), an unsigned integer value (UV), a double (NV), a string (PV), and another scalar (SV). ("PV" stands for "Pointer Value". You might think that it is misnamed because it is described as pointing only to strings. However, it is possible to have it point to other things. For example, inversion lists, used in regular expression data structures, are scalars, each consisting of an array of UVs which are accessed through PVs. But, using it for non-strings requires care, as the underlying assumption of much of the internals is that PVs are just for strings. Often, for example, a trailing NUL is tacked on automatically. The non-string use is documented only in this paragraph.)
The seven routines are:
- SV* newSViv(IV);
- SV* newSVuv(UV);
- SV* newSVnv(double);
- SV* newSVpv(const char*, STRLEN);
- SV* newSVpvn(const char*, STRLEN);
- SV* newSVpvf(const char*, ...);
- SV* newSVsv(SV*);
STRLEN
is an integer type (Size_t, usually defined as size_t in
config.h) guaranteed to be large enough to represent the size of
any string that perl can handle.
In the unlikely case of an SV requiring more complex initialisation, you
can create an empty SV with newSV(len). If len is 0, an empty SV of
type NULL is returned; otherwise an SV of type PV is returned with len + 1 (for
the NUL) bytes of storage allocated, accessible via SvPVX. In both cases
the SV has the undef value.
- SV *sv = newSV(0); /* no storage allocated */
- SV *sv = newSV(10); /* 10 (+1) bytes of uninitialised storage
- * allocated */
To change the value of an already-existing SV, there are eight routines:
- void sv_setiv(SV*, IV);
- void sv_setuv(SV*, UV);
- void sv_setnv(SV*, double);
- void sv_setpv(SV*, const char*);
- void sv_setpvn(SV*, const char*, STRLEN)
- void sv_setpvf(SV*, const char*, ...);
- void sv_vsetpvfn(SV*, const char*, STRLEN, va_list *,
- SV **, I32, bool *);
- void sv_setsv(SV*, SV*);
Notice that you can choose to specify the length of the string to be
assigned by using sv_setpvn
, newSVpvn
, or newSVpv
, or you may
allow Perl to calculate the length by using sv_setpv
or by specifying
0 as the second argument to newSVpv
. Be warned, though, that Perl will
determine the string's length by using strlen
, which depends on the
string terminating with a NUL character, and not otherwise containing
NULs.
The arguments of sv_setpvf
are processed like sprintf, and the
formatted output becomes the value.
sv_vsetpvfn
is an analogue of vsprintf
, but it allows you to specify
either a pointer to a variable argument list or the address and length of
an array of SVs. The last argument points to a boolean; on return, if that
boolean is true, then locale-specific information has been used to format
the string, and the string's contents are therefore untrustworthy (see
perlsec). This pointer may be NULL if that information is not
important. Note that this function requires you to specify the length of
the format.
The sv_set*()
functions are not generic enough to operate on values
that have "magic". See Magic Virtual Tables later in this document.
All SVs that contain strings should be terminated with a NUL character. If it is not NUL-terminated there is a risk of core dumps and corruptions from code which passes the string to C functions or system calls which expect a NUL-terminated string. Perl's own functions typically add a trailing NUL for this reason. Nevertheless, you should be very careful when you pass a string stored in an SV to a C function or system call.
To access the actual value that an SV points to, you can use the macros:
- SvIV(SV*)
- SvUV(SV*)
- SvNV(SV*)
- SvPV(SV*, STRLEN len)
- SvPV_nolen(SV*)
which will automatically coerce the actual scalar type into an IV, UV, double, or string.
In the SvPV
macro, the length of the string returned is placed into the
variable len
(this is a macro, so you do not use &len
). If you do
not care what the length of the data is, use the SvPV_nolen
macro.
Historically the SvPV
macro with the global variable PL_na
has been
used in this case. But that can be quite inefficient because PL_na
must
be accessed in thread-local storage in threaded Perl. In any case, remember
that Perl allows arbitrary strings of data that may both contain NULs and
might not be terminated by a NUL.
Also remember that C doesn't allow you to safely say foo(SvPV(s, len),
len);. It might work with your compiler, but it won't work for everyone.
Break this sort of statement up into separate assignments:
- SV *s;
- STRLEN len;
- char *ptr;
- ptr = SvPV(s, len);
- foo(ptr, len);
If you want to know if the scalar value is TRUE, you can use:
- SvTRUE(SV*)
Although Perl will automatically grow strings for you, if you need to force Perl to allocate more memory for your SV, you can use the macro
- SvGROW(SV*, STRLEN newlen)
which will determine if more memory needs to be allocated. If so, it will
call the function sv_grow
. Note that SvGROW
can only increase, not
decrease, the allocated memory of an SV and that it does not automatically
add space for the trailing NUL byte (perl's own string functions typically do
SvGROW(sv, len + 1)
).
If you have an SV and want to know what kind of data Perl thinks is stored in it, you can use the following macros to check the type of SV you have.
- SvIOK(SV*)
- SvNOK(SV*)
- SvPOK(SV*)
You can get and set the current length of the string stored in an SV with the following macros:
- SvCUR(SV*)
- SvCUR_set(SV*, I32 val)
You can also get a pointer to the end of the string stored in the SV with the macro:
- SvEND(SV*)
But note that these last three macros are valid only if SvPOK()
is true.
If you want to append something to the end of the string stored in an SV*,
you can use the following functions:
- void sv_catpv(SV*, const char*);
- void sv_catpvn(SV*, const char*, STRLEN);
- void sv_catpvf(SV*, const char*, ...);
- void sv_vcatpvfn(SV*, const char*, STRLEN, va_list *, SV **,
- I32, bool);
- void sv_catsv(SV*, SV*);
The first function calculates the length of the string to be appended by
using strlen
. In the second, you specify the length of the string
yourself. The third function processes its arguments like sprintf and
appends the formatted output. The fourth function works like vsprintf
.
You can specify the address and length of an array of SVs instead of the
va_list argument. The fifth function extends the string stored in the first
SV with the string stored in the second SV. It also forces the second SV
to be interpreted as a string.
The sv_cat*()
functions are not generic enough to operate on values that
have "magic". See Magic Virtual Tables later in this document.
If you know the name of a scalar variable, you can get a pointer to its SV by using the following:
- SV* get_sv("package::varname", 0);
This returns NULL if the variable does not exist.
If you want to know if this variable (or any other SV) is actually defined,
you can call:
- SvOK(SV*)
The scalar undef value is stored in an SV instance called PL_sv_undef
.
Its address can be used whenever an SV*
is needed. Make sure that
you don't try to compare a random sv with &PL_sv_undef
. For example
when interfacing Perl code, it'll work correctly for:
- foo(undef);
But won't work when called as:
- $x = undef;
- foo($x);
So, to repeat: always use SvOK() to check whether an sv is defined.
Also you have to be careful when using &PL_sv_undef
as a value in
AVs or HVs (see AVs, HVs and undefined values).
There are also the two values PL_sv_yes
and PL_sv_no
, which contain
boolean TRUE and FALSE values, respectively. Like PL_sv_undef
, their
addresses can be used whenever an SV*
is needed.
Do not be fooled into thinking that (SV *) 0 is the same as &PL_sv_undef
.
Take this code:
- SV* sv = (SV*) 0;
- if (I-am-to-return-a-real-value) {
- sv = sv_2mortal(newSViv(42));
- }
- sv_setsv(ST(0), sv);
This code tries to return a new SV (which contains the value 42) if it should
return a real value, or undef otherwise. Instead it has returned a NULL
pointer which, somewhere down the line, will cause a segmentation violation,
bus error, or just weird results. Change the zero to &PL_sv_undef
in the
first line and all will be well.
To free an SV that you've created, call SvREFCNT_dec(SV*)
. Normally this
call is not necessary (see Reference Counts and Mortality).
Perl provides the function sv_chop
to efficiently remove characters
from the beginning of a string; you give it an SV and a pointer to
somewhere inside the PV, and it discards everything before the
pointer. The efficiency comes by means of a little hack: instead of
actually removing the characters, sv_chop
sets the flag OOK
(offset OK) to signal to other functions that the offset hack is in
effect, and it puts the number of bytes chopped off into the IV field
of the SV. It then moves the PV pointer (called SvPVX
) forward that
many bytes, and adjusts SvCUR
and SvLEN
.
Hence, at this point, the start of the buffer that we allocated lives
at SvPVX(sv) - SvIV(sv)
in memory and the PV pointer is pointing
into the middle of this allocated storage.
This is best demonstrated by example:
- % ./perl -Ilib -MDevel::Peek -le '$a="12345"; $a=~s/.//; Dump($a)'
- SV = PVIV(0x8128450) at 0x81340f0
- REFCNT = 1
- FLAGS = (POK,OOK,pPOK)
- IV = 1 (OFFSET)
- PV = 0x8135781 ( "1" . ) "2345"\0
- CUR = 4
- LEN = 5
Here the number of bytes chopped off (1) is put into IV, and
Devel::Peek::Dump
helpfully reminds us that this is an offset. The
portion of the string between the "real" and the "fake" beginnings is
shown in parentheses, and the values of SvCUR
and SvLEN
reflect
the fake beginning, not the real one.
Something similar to the offset hack is performed on AVs to enable efficient shifting and splicing off the beginning of the array; while AvARRAY points to the first element in the array that is visible from Perl, AvALLOC points to the real start of the C array. These are usually the same, but a shift operation can be carried out by increasing AvARRAY by one and decreasing AvFILL and AvMAX. Again, the location of the real start of the C array only comes into play when freeing the array. See av_shift in av.c.
Recall that the usual method of determining the type of scalar you have is to use the Sv*OK macros. Because a scalar can be both a number and a string, usually these macros will always return TRUE and calling the Sv*V macros will do the appropriate conversion of string to integer/double or integer/double to string.
If you really need to know if you have an integer, double, or string pointer in an SV, you can use the following three macros instead:
- SvIOKp(SV*)
- SvNOKp(SV*)
- SvPOKp(SV*)
These will tell you if you truly have an integer, double, or string pointer stored in your SV. The "p" stands for private.
There are various ways in which the private and public flags may differ. For example, a tied SV may have a valid underlying value in the IV slot (so SvIOKp is true), but the data should be accessed via the FETCH routine rather than directly, so SvIOK is false. Another is when numeric conversion has occurred and precision has been lost: only the private flag is set on 'lossy' values. So when an NV is converted to an IV with loss, SvIOKp, SvNOKp and SvNOK will be set, while SvIOK won't be.
In general, though, it's best to use the Sv*V macros.
There are two ways to create and load an AV. The first method creates an empty AV:
- AV* newAV();
The second method both creates the AV and initially populates it with SVs:
- AV* av_make(I32 num, SV **ptr);
The second argument points to an array containing num SV*'s. Once the AV has been created, the SVs can be destroyed, if so desired.
Once the AV has been created, the following operations are possible on it:
- void av_push(AV*, SV*);
- SV* av_pop(AV*);
- SV* av_shift(AV*);
- void av_unshift(AV*, I32 num);
These should be familiar operations, with the exception of av_unshift. This routine adds num elements at the front of the array with the undef value. You must then use av_store (described below) to assign values to these new elements.
Here are some other functions:
- I32 av_top_index(AV*);
- SV** av_fetch(AV*, I32 key, I32 lval);
- SV** av_store(AV*, I32 key, SV* val);
The av_top_index function returns the highest index value in an array (just like $#array in Perl). If the array is empty, -1 is returned. The av_fetch function returns the value at index key, but if lval is non-zero, then av_fetch will store an undef value at that index. The av_store function stores the value val at index key, and does not increment the reference count of val. Thus the caller is responsible for taking care of that, and if av_store returns NULL, the caller will have to decrement the reference count to avoid a memory leak. Note that av_fetch and av_store both return SV**'s, not SV*'s as their return value.
A few more:
- void av_clear(AV*);
- void av_undef(AV*);
- void av_extend(AV*, I32 key);
The av_clear function deletes all the elements in the AV* array, but does not actually delete the array itself. The av_undef function will delete all the elements in the array plus the array itself. The av_extend function extends the array so that it contains at least key+1 elements. If key+1 is less than the currently allocated length of the array, then nothing is done.
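Putting the AV routines together, here is a small illustrative fragment (a sketch only: it assumes an XS or embedded-perl context with perl.h available, and do_something is a hypothetical helper, not a real API):
- /* Build an array of (10, 20), then inspect and modify it */
- AV *av = newAV();
- av_push(av, newSViv(10));
- av_push(av, newSViv(20));
- SV **elem = av_fetch(av, 1, 0);     /* like $arr[1] */
- if (elem)                           /* NULL-check before dereferencing */
-     do_something(SvIV(*elem));      /* do_something is hypothetical */
- /* av_store does not bump val's refcount; on failure drop it ourselves */
- SV *val = newSViv(30);
- if (!av_store(av, 0, val))
-     SvREFCNT_dec(val);
- av_clear(av);                       /* empties the array, keeps the AV */
- SvREFCNT_dec((SV*)av);              /* done with the AV itself */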
If you know the name of an array variable, you can get a pointer to its AV by using the following:
- AV* get_av("package::varname", 0);
This returns NULL if the variable does not exist.
See Understanding the Magic of Tied Hashes and Arrays for more information on how to use the array access functions on tied arrays.
To create an HV, you use the following routine:
- HV* newHV();
Once the HV has been created, the following operations are possible on it:
- SV** hv_store(HV*, const char* key, U32 klen, SV* val, U32 hash);
- SV** hv_fetch(HV*, const char* key, U32 klen, I32 lval);
The klen parameter is the length of the key being passed in (note that you cannot pass 0 in as a value of klen to tell Perl to measure the length of the key). The val argument contains the SV pointer to the scalar being stored, and hash is the precomputed hash value (zero if you want hv_store to calculate it for you). The lval parameter indicates whether this fetch is actually a part of a store operation, in which case a new undefined value will be added to the HV with the supplied key and hv_fetch will return as if the value had already existed.
Remember that hv_store and hv_fetch return SV**'s and not just SV*. To access the scalar value, you must first dereference the return value. However, you should check to make sure that the return value is not NULL before dereferencing it.
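As an illustration of the store/fetch pattern just described (a sketch assuming an XS or embedded-perl context; do_something is a hypothetical helper):
- HV *hv = newHV();
- SV *val = newSViv(42);
- /* klen is the real key length; a hash of 0 lets Perl compute it */
- if (!hv_store(hv, "key", 3, val, 0))
-     SvREFCNT_dec(val);              /* store failed: avoid a leak */
- SV **svp = hv_fetch(hv, "key", 3, 0);
- if (svp)                            /* always NULL-check the SV** */
-     do_something(SvIV(*svp));       /* do_something is hypothetical */
- SvREFCNT_dec((SV*)hv);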
The first of these two functions checks if a hash table entry exists, and the second deletes it.
- bool hv_exists(HV*, const char* key, U32 klen);
- SV* hv_delete(HV*, const char* key, U32 klen, I32 flags);
If flags does not include the G_DISCARD flag then hv_delete will create and return a mortal copy of the deleted value.
And more miscellaneous functions:
- void hv_clear(HV*);
- void hv_undef(HV*);
Like their AV counterparts, hv_clear deletes all the entries in the hash table but does not actually delete the hash table itself. The hv_undef function deletes both the entries and the hash table.
Perl keeps the actual data in a linked list of structures with a typedef of HE. These contain the actual key and value pointers (plus extra administrative overhead). The key is a string pointer; the value is an SV*. However, once you have an HE*, to get the actual key and value, use the routines specified below.
- I32 hv_iterinit(HV*);
- /* Prepares starting point to traverse hash table */
- HE* hv_iternext(HV*);
- /* Get the next entry, and return a pointer to a
- structure that has both the key and value */
- char* hv_iterkey(HE* entry, I32* retlen);
- /* Get the key from an HE structure and also return
- the length of the key string */
- SV* hv_iterval(HV*, HE* entry);
- /* Return an SV pointer to the value of the HE
- structure */
- SV* hv_iternextsv(HV*, char** key, I32* retlen);
- /* This convenience routine combines hv_iternext,
- hv_iterkey, and hv_iterval. The key and retlen
- arguments are return values for the key and its
- length. The value is returned in the SV* argument */
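The iteration functions above combine into the usual traversal idiom (a sketch, assuming an XS or embedded-perl context where hv is an existing HV*; do_something is a hypothetical helper):
- char *key;
- I32 klen;
- SV *val;
- hv_iterinit(hv);                    /* reset the iterator */
- while ((val = hv_iternextsv(hv, &key, &klen)) != NULL) {
-     /* key/klen describe the key; val is the value SV */
-     do_something(key, klen, val);   /* do_something is hypothetical */
- }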
If you know the name of a hash variable, you can get a pointer to its HV by using the following:
- HV* get_hv("package::varname", 0);
This returns NULL if the variable does not exist.
The hash algorithm is defined in the PERL_HASH macro:
- PERL_HASH(hash, key, klen)
The exact implementation of this macro varies by architecture and version of perl, and the return value may change per invocation, so the value is only valid for the duration of a single perl process.
See Understanding the Magic of Tied Hashes and Arrays for more information on how to use the hash access functions on tied hashes.
Beginning with version 5.004, the following functions are also supported:
- HE* hv_fetch_ent (HV* tb, SV* key, I32 lval, U32 hash);
- HE* hv_store_ent (HV* tb, SV* key, SV* val, U32 hash);
- bool hv_exists_ent (HV* tb, SV* key, U32 hash);
- SV* hv_delete_ent (HV* tb, SV* key, I32 flags, U32 hash);
- SV* hv_iterkeysv (HE* entry);
Note that these functions take SV* keys, which simplifies writing of extension code that deals with hash structures. These functions also allow passing of SV* keys to tie functions without forcing you to stringify the keys (unlike the previous set of functions). They also return and accept whole hash entries (HE*), making their use more efficient (since the hash number for a particular string doesn't have to be recomputed every time). See perlapi for detailed descriptions.
The following macros must always be used to access the contents of hash entries. Note that the arguments to these macros must be simple variables, since they may get evaluated more than once. See perlapi for detailed descriptions of these macros.
- HePV(HE* he, STRLEN len)
- HeVAL(HE* he)
- HeHASH(HE* he)
- HeSVKEY(HE* he)
- HeSVKEY_force(HE* he)
- HeSVKEY_set(HE* he, SV* sv)
These two lower-level macros are defined, but must only be used when dealing with keys that are not SV*s:
- HeKEY(HE* he)
- HeKLEN(HE* he)
Note that both hv_store and hv_store_ent do not increment the reference count of the stored val, which is the caller's responsibility. If these functions return a NULL value, the caller will usually have to decrement the reference count of val to avoid a memory leak.
Sometimes you have to store undefined values in AVs or HVs. Although this may be a rare case, it can be tricky. That's because you're used to using &PL_sv_undef if you need an undefined SV.
For example, intuition tells you that this XS code:
- AV *av = newAV();
- av_store( av, 0, &PL_sv_undef );
is equivalent to this Perl code:
- $av[0] = undef;
Unfortunately, this isn't true. AVs use &PL_sv_undef as a marker for indicating that an array element has not yet been initialized. Thus, exists $av[0] would be true for the above Perl code, but false for the array generated by the XS code.
Other problems can occur when storing &PL_sv_undef in HVs:
- hv_store( hv, "key", 3, &PL_sv_undef, 0 );
This will indeed make the value undef, but if you try to modify the value of key, you'll get the following error:
- Modification of non-creatable hash value attempted
In perl 5.8.0, &PL_sv_undef was also used to mark placeholders in restricted hashes. This caused such hash entries not to appear when iterating over the hash or when checking for the keys with the hv_exists function.
You can run into similar problems when you store &PL_sv_yes or &PL_sv_no into AVs or HVs. Trying to modify such elements will give you the following error:
- Modification of a read-only value attempted
To make a long story short, you can use the special variables &PL_sv_undef, &PL_sv_yes and &PL_sv_no with AVs and HVs, but you have to make sure you know what you're doing.
Generally, if you want to store an undefined value in an AV or HV, you should not use &PL_sv_undef, but rather create a new undefined value using the newSV function, for example:
- av_store( av, 42, newSV(0) );
- hv_store( hv, "foo", 3, newSV(0), 0 );
References are a special type of scalar that point to other data types (including other references).
To create a reference, use either of the following functions:
- SV* newRV_inc((SV*) thing);
- SV* newRV_noinc((SV*) thing);
The thing argument can be any of an SV*, AV*, or HV*. The functions are identical except that newRV_inc increments the reference count of the thing, while newRV_noinc does not. For historical reasons, newRV is a synonym for newRV_inc.
Once you have a reference, you can use the following macro to dereference the reference:
- SvRV(SV*)
then call the appropriate routines, casting the returned SV* to either an AV* or HV*, if required.
To determine if an SV is a reference, you can use the following macro:
- SvROK(SV*)
To discover what type of value the reference refers to, use the following macro and then check the return value.
- SvTYPE(SvRV(SV*))
The most useful types that will be returned are:
- < SVt_PVAV Scalar
- SVt_PVAV Array
- SVt_PVHV Hash
- SVt_PVCV Code
- SVt_PVGV Glob (possibly a file handle)
See svtype in perlapi for more details.
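A sketch of these macros in use (assuming an XS or embedded-perl context; this is illustrative, not a standalone program):
- AV *av = newAV();
- SV *rv = newRV_noinc((SV*)av);      /* like \@arr; rv now owns the AV */
- if (SvROK(rv)) {                    /* is it a reference at all? */
-     switch (SvTYPE(SvRV(rv))) {     /* what does it point to? */
-     case SVt_PVAV:
-         av_push((AV*)SvRV(rv), newSViv(1));   /* push @$rv, 1 */
-         break;
-     case SVt_PVHV:
-         /* hash reference */
-         break;
-     default:
-         /* scalar, code, glob, ... */
-         break;
-     }
- }
- SvREFCNT_dec(rv);                   /* frees the reference and the AV */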
References are also used to support object-oriented programming. In perl's OO lexicon, an object is simply a reference that has been blessed into a package (or class). Once blessed, the programmer may now use the reference to access the various methods in the class.
A reference can be blessed into a package with the following function:
- SV* sv_bless(SV* sv, HV* stash);
The sv argument must be a reference value. The stash argument specifies which class the reference will belong to. See Stashes and Globs for information on converting class names into stashes.
/* Still under construction */
The following function upgrades rv to a reference if it is not already one, creating a new SV for rv to point to. If classname is non-null, the SV is blessed into the specified class. The SV is returned.
- SV* newSVrv(SV* rv, const char* classname);
The following three functions copy an integer, unsigned integer, or double into an SV whose reference is rv. The SV is blessed if classname is non-null.
- SV* sv_setref_iv(SV* rv, const char* classname, IV iv);
- SV* sv_setref_uv(SV* rv, const char* classname, UV uv);
- SV* sv_setref_nv(SV* rv, const char* classname, NV nv);
The following function copies the pointer value (the address, not the string!) into an SV whose reference is rv. The SV is blessed if classname is non-null.
- SV* sv_setref_pv(SV* rv, const char* classname, void* pv);
The following function copies a string into an SV whose reference is rv. Set length to 0 to let Perl calculate the string length. The SV is blessed if classname is non-null.
- SV* sv_setref_pvn(SV* rv, const char* classname, char* pv,
- STRLEN length);
The following function tests whether the SV is blessed into the specified class. It does not check inheritance relationships.
- int sv_isa(SV* sv, const char* name);
The following function tests whether the SV is a reference to a blessed object.
- int sv_isobject(SV* sv);
The following function tests whether the SV is derived from the specified class. SV can be either a reference to a blessed object or a string containing a class name. This is the function implementing the UNIVERSAL::isa functionality.
- bool sv_derived_from(SV* sv, const char* name);
To check if you've got an object derived from a specific class you have to write:
- if (sv_isobject(sv) && sv_derived_from(sv, class)) { ... }
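For instance, a C pointer can be wrapped in a blessed reference and then checked (a sketch, assuming an XS or embedded-perl context; "My::Class", my_data, and make_my_data are hypothetical names, not real API):
- struct my_data *p = make_my_data();              /* hypothetical ctor */
- SV *obj = sv_setref_pv(newSV(0), "My::Class", (void*)p);
- if (sv_isobject(obj) && sv_derived_from(obj, "My::Class")) {
-     struct my_data *q = (struct my_data *)SvIV(SvRV(obj));
-     /* q == p: the address travels in the IV slot of the referent */
- }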
To create a new Perl variable with an undef value which can be accessed from your Perl script, use the following routines, depending on the variable type.
- SV* get_sv("package::varname", GV_ADD);
- AV* get_av("package::varname", GV_ADD);
- HV* get_hv("package::varname", GV_ADD);
Notice the use of GV_ADD as the second parameter. The new variable can now be set, using the routines appropriate to the data type.
There are additional macros whose values may be bitwise OR'ed with the GV_ADD argument to enable certain extra features. Those bits are:
GV_ADDMULTI
Marks the variable as multiply defined, thus preventing the:
- Name <varname> used only once: possible typo
warning.
GV_ADDWARN
Issues the warning:
- Had to create <varname> unexpectedly
if the variable did not exist before the function was called.
If you do not specify a package name, the variable is created in the current package.
Perl uses a reference count-driven garbage collection mechanism. SVs, AVs, or HVs (xV for short in the following) start their life with a reference count of 1. If the reference count of an xV ever drops to 0, then it will be destroyed and its memory made available for reuse.
This normally doesn't happen at the Perl level unless a variable is undef'ed or the last variable holding a reference to it is changed or overwritten. At the internal level, however, reference counts can be manipulated with the following macros:
- int SvREFCNT(SV* sv);
- SV* SvREFCNT_inc(SV* sv);
- void SvREFCNT_dec(SV* sv);
However, there is one other function which manipulates the reference count of its argument. The newRV_inc function, you will recall, creates a reference to the specified argument. As a side effect, it increments the argument's reference count. If this is not what you want, use newRV_noinc instead.
For example, imagine you want to return a reference from an XSUB function. Inside the XSUB routine, you create an SV which initially has a reference count of one. Then you call newRV_inc, passing it the just-created SV. This returns the reference as a new SV, but the reference count of the SV you passed to newRV_inc has been incremented to two. Now you return the reference from the XSUB routine and forget about the SV. But Perl hasn't! Whenever the returned reference is destroyed, the reference count of the original SV is decreased to one and nothing happens. The SV will hang around without any way to access it until Perl itself terminates. This is a memory leak.
The correct procedure, then, is to use newRV_noinc instead of newRV_inc. Then, if and when the last reference is destroyed, the reference count of the SV will go to zero and it will be destroyed, stopping any memory leak.
There are some convenience functions available that can help with the destruction of xVs. These functions introduce the concept of "mortality". An xV that is mortal has had its reference count marked to be decremented, but not actually decremented, until "a short time later". Generally the term "short time later" means a single Perl statement, such as a call to an XSUB function. The actual determinant for when mortal xVs have their reference count decremented depends on two macros, SAVETMPS and FREETMPS. See perlcall and perlxs for more details on these macros.
"Mortalization" then is at its simplest a deferred SvREFCNT_dec. However, if you mortalize a variable twice, the reference count will later be decremented twice.
"Mortal" SVs are mainly used for SVs that are placed on perl's stack. For example an SV which is created just to pass a number to a called sub is made mortal to have it cleaned up automatically when it's popped off the stack. Similarly, results returned by XSUBs (which are pushed on the stack) are often made mortal.
To create a mortal variable, use the functions:
- SV* sv_newmortal()
- SV* sv_2mortal(SV*)
- SV* sv_mortalcopy(SV*)
The first call creates a mortal SV (with no value), the second converts an existing SV to a mortal SV (and thus defers a call to SvREFCNT_dec), and the third creates a mortal copy of an existing SV.
Because sv_newmortal gives the new SV no value, it must normally be given one via sv_setpv, sv_setiv, etc.:
- SV *tmp = sv_newmortal();
- sv_setiv(tmp, an_integer);
As that is multiple C statements it is quite common to see this idiom instead:
- SV *tmp = sv_2mortal(newSViv(an_integer));
You should be careful about creating mortal variables. Strange things can happen if you make the same value mortal within multiple contexts, or if you make a variable mortal multiple times. Thinking of "Mortalization" as deferred SvREFCNT_dec should help to minimize such problems. For example, if you are passing an SV which you know has a high enough REFCNT to survive its use on the stack you need not do any mortalization. If you are not sure then doing an SvREFCNT_inc and sv_2mortal, or making a sv_mortalcopy, is safer.
The mortal routines are not just for SVs; AVs and HVs can be made mortal by passing their address (type-casted to SV*) to the sv_2mortal or sv_mortalcopy routines.
A stash is a hash that contains all variables that are defined within a package. Each key of the stash is a symbol name (shared by all the different types of objects that have the same name), and each value in the hash table is a GV (Glob Value). This GV in turn contains references to the various objects of that name, including (but not limited to) the following:
- Scalar Value
- Array Value
- Hash Value
- I/O Handle
- Format
- Subroutine
There is a single stash called PL_defstash that holds the items that exist in the main package. To get at the items in other packages, append the string "::" to the package name. The items in the Foo package are in the stash Foo:: in PL_defstash. The items in the Bar::Baz package are in the stash Baz:: in Bar::'s stash.
To get the stash pointer for a particular package, use the function:
- HV* gv_stashpv(const char* name, I32 flags)
- HV* gv_stashsv(SV*, I32 flags)
The first function takes a literal string, the second uses the string stored in the SV. Remember that a stash is just a hash table, so you get back an HV*. The flags flag will create a new package if it is set to GV_ADD. The name that gv_stash*v wants is the name of the package whose symbol table you want. The default package is called main. If you have multiply nested packages, pass their names to gv_stash*v, separated by :: as in the Perl language itself.
Alternately, if you have an SV that is a blessed reference, you can find out the stash pointer by using:
- HV* SvSTASH(SvRV(SV*));
then use the following to get the package name itself:
- char* HvNAME(HV* stash);
If you need to bless or re-bless an object you can use the following function:
- SV* sv_bless(SV*, HV* stash)
where the first argument, an SV*, must be a reference, and the second argument is a stash. The returned SV* can now be used in the same way as any other SV.
For more information on references and blessings, consult perlref.
Scalar variables normally contain only one type of value, an integer, double, pointer, or reference. Perl will automatically convert the actual scalar data from the stored type into the requested type.
Some scalar variables contain more than one type of scalar data. For example, the variable $! contains either the numeric value of errno or its string equivalent from either strerror or sys_errlist[].
To force multiple data values into an SV, you must do two things: use the sv_set*v routines to add the additional scalar type, then set a flag so that Perl will believe it contains more than one type of data. The four macros to set the flags are:
- SvIOK_on
- SvNOK_on
- SvPOK_on
- SvROK_on
The particular macro you must use depends on which sv_set*v routine you called first. This is because every sv_set*v routine turns on only the bit for the particular type of data being set, and turns off all the rest.
For example, to create a new Perl variable called "dberror" that contains both the numeric and descriptive string error values, you could use the following code:
- extern int dberror;
- extern char *dberror_list[];
- SV* sv = get_sv("dberror", GV_ADD);
- sv_setiv(sv, (IV) dberror);
- sv_setpv(sv, dberror_list[dberror]);
- SvIOK_on(sv);
If the order of sv_setiv and sv_setpv had been reversed, then the macro SvPOK_on would need to be called instead of SvIOK_on.
[This section still under construction. Ignore everything here. Post no bills. Everything not permitted is forbidden.]
Any SV may be magical, that is, it has special features that a normal SV does not have. These features are stored in the SV structure in a linked list of struct magic's, typedef'ed to MAGIC.
- struct magic {
- MAGIC* mg_moremagic;
- MGVTBL* mg_virtual;
- U16 mg_private;
- char mg_type;
- U8 mg_flags;
- I32 mg_len;
- SV* mg_obj;
- char* mg_ptr;
- };
Note this is current as of patchlevel 0, and could change at any time.
Perl adds magic to an SV using the sv_magic function:
- void sv_magic(SV* sv, SV* obj, int how, const char* name, I32 namlen);
The sv argument is a pointer to the SV that is to acquire a new magical feature.
If sv is not already magical, Perl uses the SvUPGRADE macro to convert sv to type SVt_PVMG. Perl then continues by adding new magic to the beginning of the linked list of magical features. Any prior entry of the same type of magic is deleted. Note that this can be overridden, and multiple instances of the same type of magic can be associated with an SV.
The name and namlen arguments are used to associate a string with the magic, typically the name of a variable. namlen is stored in the mg_len field and if name is non-null then either a savepvn copy of name or name itself is stored in the mg_ptr field, depending on whether namlen is greater than zero or equal to zero respectively. As a special case, if (name && namlen == HEf_SVKEY) then name is assumed to contain an SV* and is stored as-is with its REFCNT incremented.
The sv_magic function uses how to determine which, if any, predefined "Magic Virtual Table" should be assigned to the mg_virtual field. See the Magic Virtual Tables section below. The how argument is also stored in the mg_type field. The value of how should be chosen from the set of macros PERL_MAGIC_foo found in perl.h. Note that before these macros were added, Perl internals used to directly use character literals, so you may occasionally come across old code or documentation referring to 'U' magic rather than PERL_MAGIC_uvar for example.
The obj argument is stored in the mg_obj field of the MAGIC structure. If it is not the same as the sv argument, the reference count of the obj object is incremented. If it is the same, or if the how argument is PERL_MAGIC_arylen, or if it is a NULL pointer, then obj is merely stored, without the reference count being incremented.
See also sv_magicext in perlapi for a more flexible way to add magic to an SV.
There is also a function to add magic to an HV:
- void hv_magic(HV *hv, GV *gv, int how);
This simply calls sv_magic and coerces the gv argument into an SV.
To remove the magic from an SV, call the function sv_unmagic:
- int sv_unmagic(SV* sv, int type);
The type argument should be equal to the how value when the SV was initially made magical.
However, note that sv_unmagic removes all magic of a certain type from the SV. If you want to remove only certain magic of a type based on the magic virtual table, use sv_unmagicext instead:
- int sv_unmagicext(SV* sv, int type, MGVTBL* vtbl);
The mg_virtual field in the MAGIC structure is a pointer to an MGVTBL, which is a structure of function pointers and stands for "Magic Virtual Table", handling the various operations that might be applied to that variable.
The MGVTBL has five (or sometimes eight) pointers to the following routine types:
- int (*svt_get)(SV* sv, MAGIC* mg);
- int (*svt_set)(SV* sv, MAGIC* mg);
- U32 (*svt_len)(SV* sv, MAGIC* mg);
- int (*svt_clear)(SV* sv, MAGIC* mg);
- int (*svt_free)(SV* sv, MAGIC* mg);
- int (*svt_copy)(SV *sv, MAGIC* mg, SV *nsv,
- const char *name, I32 namlen);
- int (*svt_dup)(MAGIC *mg, CLONE_PARAMS *param);
- int (*svt_local)(SV *nsv, MAGIC *mg);
This MGVTBL structure is set at compile-time in perl.h and there are currently 32 types. These different structures contain pointers to various routines that perform additional actions depending on which function is being called.
- Function pointer Action taken
- ---------------- ------------
- svt_get Do something before the value of the SV is
- retrieved.
- svt_set Do something after the SV is assigned a value.
- svt_len Report on the SV's length.
- svt_clear Clear something the SV represents.
- svt_free Free any extra storage associated with the SV.
- svt_copy copy tied variable magic to a tied element
- svt_dup duplicate a magic structure during thread cloning
- svt_local copy magic to local value during 'local'
For instance, the MGVTBL structure called vtbl_sv (which corresponds to an mg_type of PERL_MAGIC_sv) contains:
- { magic_get, magic_set, magic_len, 0, 0 }
Thus, when an SV is determined to be magical and of type PERL_MAGIC_sv, if a get operation is being performed, the routine magic_get is called. All the various routines for the various magical types begin with magic_. NOTE: the magic routines are not considered part of the Perl API, and may not be exported by the Perl library.
The last three slots are a recent addition, and for source code compatibility they are only checked for if one of the three flags MGf_COPY, MGf_DUP or MGf_LOCAL is set in mg_flags. This means that most code can continue declaring a vtable as a 5-element value. These three are currently used exclusively by the threading code, and are highly subject to change.
The current kinds of Magic Virtual Tables are:
- mg_type
- (old-style char and macro) MGVTBL Type of magic
- -------------------------- ------ -------------
- \0 PERL_MAGIC_sv vtbl_sv Special scalar variable
- # PERL_MAGIC_arylen vtbl_arylen Array length ($#ary)
- % PERL_MAGIC_rhash (none) extra data for restricted
- hashes
- & PERL_MAGIC_proto (none) my sub prototype CV
- . PERL_MAGIC_pos vtbl_pos pos() lvalue
- : PERL_MAGIC_symtab (none) extra data for symbol
- tables
- < PERL_MAGIC_backref vtbl_backref for weak ref data
- @ PERL_MAGIC_arylen_p (none) to move arylen out of XPVAV
- B PERL_MAGIC_bm vtbl_regexp Boyer-Moore
- (fast string search)
- c PERL_MAGIC_overload_table vtbl_ovrld Holds overload table
- (AMT) on stash
- D PERL_MAGIC_regdata vtbl_regdata Regex match position data
- (@+ and @- vars)
- d PERL_MAGIC_regdatum vtbl_regdatum Regex match position data
- element
- E PERL_MAGIC_env vtbl_env %ENV hash
- e PERL_MAGIC_envelem vtbl_envelem %ENV hash element
- f PERL_MAGIC_fm vtbl_regexp Formline
- ('compiled' format)
- g PERL_MAGIC_regex_global vtbl_mglob m//g target
- H PERL_MAGIC_hints vtbl_hints %^H hash
- h PERL_MAGIC_hintselem vtbl_hintselem %^H hash element
- I PERL_MAGIC_isa vtbl_isa @ISA array
- i PERL_MAGIC_isaelem vtbl_isaelem @ISA array element
- k PERL_MAGIC_nkeys vtbl_nkeys scalar(keys()) lvalue
- L PERL_MAGIC_dbfile (none) Debugger %_<filename
- l PERL_MAGIC_dbline vtbl_dbline Debugger %_<filename
- element
- N PERL_MAGIC_shared (none) Shared between threads
- n PERL_MAGIC_shared_scalar (none) Shared between threads
- o PERL_MAGIC_collxfrm vtbl_collxfrm Locale transformation
- P PERL_MAGIC_tied vtbl_pack Tied array or hash
- p PERL_MAGIC_tiedelem vtbl_packelem Tied array or hash element
- q PERL_MAGIC_tiedscalar vtbl_packelem Tied scalar or handle
- r PERL_MAGIC_qr vtbl_regexp precompiled qr// regex
- S PERL_MAGIC_sig (none) %SIG hash
- s PERL_MAGIC_sigelem vtbl_sigelem %SIG hash element
- t PERL_MAGIC_taint vtbl_taint Taintedness
- U PERL_MAGIC_uvar vtbl_uvar Available for use by
- extensions
- u PERL_MAGIC_uvar_elem (none) Reserved for use by
- extensions
- V PERL_MAGIC_vstring (none) SV was vstring literal
- v PERL_MAGIC_vec vtbl_vec vec() lvalue
- w PERL_MAGIC_utf8 vtbl_utf8 Cached UTF-8 information
- x PERL_MAGIC_substr vtbl_substr substr() lvalue
- y PERL_MAGIC_defelem vtbl_defelem Shadow "foreach" iterator
- variable / smart parameter
- vivification
- ] PERL_MAGIC_checkcall vtbl_checkcall inlining/mutation of call
- to this CV
- ~ PERL_MAGIC_ext (none) Available for use by
- extensions
When an uppercase and lowercase letter both exist in the table, then the uppercase letter is typically used to represent some kind of composite type (a list or a hash), and the lowercase letter is used to represent an element of that composite type. Some internals code makes use of this case relationship. However, 'v' and 'V' (vec and v-string) are in no way related.
The PERL_MAGIC_ext and PERL_MAGIC_uvar magic types are defined specifically for use by extensions and will not be used by perl itself. Extensions can use PERL_MAGIC_ext magic to 'attach' private information to variables (typically objects). This is especially useful because there is no way for normal perl code to corrupt this private information (unlike using extra elements of a hash object).
Similarly, PERL_MAGIC_uvar magic can be used much like tie() to call a C function any time a scalar's value is used or changed. The MAGIC's mg_ptr field points to a ufuncs structure:
- struct ufuncs {
- I32 (*uf_val)(pTHX_ IV, SV*);
- I32 (*uf_set)(pTHX_ IV, SV*);
- IV uf_index;
- };
When the SV is read from or written to, the uf_val or uf_set function will be called with uf_index as the first arg and a pointer to the SV as the second. A simple example of how to add PERL_MAGIC_uvar magic is shown below. Note that the ufuncs structure is copied by sv_magic, so you can safely allocate it on the stack.
- void
- Umagic(sv)
- SV *sv;
- PREINIT:
- struct ufuncs uf;
- CODE:
- uf.uf_val = &my_get_fn;
- uf.uf_set = &my_set_fn;
- uf.uf_index = 0;
- sv_magic(sv, 0, PERL_MAGIC_uvar, (char*)&uf, sizeof(uf));
Attaching PERL_MAGIC_uvar to arrays is permissible but has no effect. For hashes there is a specialized hook that gives control over hash keys (but not values). This hook calls PERL_MAGIC_uvar 'get' magic if the "set" function in the ufuncs structure is NULL. The hook is activated whenever the hash is accessed with a key specified as an SV through the functions hv_store_ent, hv_fetch_ent, hv_delete_ent, and hv_exists_ent. Accessing the key as a string through the functions without the ..._ent suffix circumvents the hook. See GUTS in Hash::Util::FieldHash for a detailed description.
Note that because multiple extensions may be using PERL_MAGIC_ext or PERL_MAGIC_uvar magic, it is important for extensions to take extra care to avoid conflict. Typically only using the magic on objects blessed into the same class as the extension is sufficient. For PERL_MAGIC_ext magic, it is usually a good idea to define an MGVTBL, even if all its fields will be 0, so that individual MAGIC pointers can be identified as a particular kind of magic using their magic virtual table. mg_findext provides an easy way to do that:
- STATIC MGVTBL my_vtbl = { 0, 0, 0, 0, 0, 0, 0, 0 };
- MAGIC *mg;
- if ((mg = mg_findext(sv, PERL_MAGIC_ext, &my_vtbl))) {
- /* this is really ours, not another module's PERL_MAGIC_ext */
- my_priv_data_t *priv = (my_priv_data_t *)mg->mg_ptr;
- ...
- }
Also note that the sv_set*() and sv_cat*() functions described earlier do not invoke 'set' magic on their targets. This must be done by the user either by calling the SvSETMAGIC() macro after calling these functions, or by using one of the sv_set*_mg() or sv_cat*_mg() functions. Similarly, generic C code must call the SvGETMAGIC() macro to invoke any 'get' magic if it uses an SV obtained from external sources in functions that don't handle magic. See perlapi for a description of these functions.
For example, calls to the sv_cat*() functions typically need to be followed by SvSETMAGIC(), but they don't need a prior SvGETMAGIC() since their implementation handles 'get' magic.
- MAGIC *mg_find(SV *sv, int type); /* Finds the magic pointer of that
- * type */
This routine returns a pointer to a MAGIC structure stored in the SV. If the SV does not have that magical feature, NULL is returned. If the SV has multiple instances of that magical feature, the first one will be returned. mg_findext can be used to find a MAGIC structure of an SV based on both its magic type and its magic virtual table:
- MAGIC *mg_findext(SV *sv, int type, MGVTBL *vtbl);
Also, if the SV passed to mg_find or mg_findext is not of type SVt_PVMG, Perl may core dump.
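A defensive sketch (assuming sv may be an arbitrary SV): check the SV's type before calling mg_find, for example:
- MAGIC *mg = NULL;
- if (SvTYPE(sv) >= SVt_PVMG)
-     mg = mg_find(sv, PERL_MAGIC_tied);
- if (mg) {
-     /* sv is tied; mg->mg_obj holds the tied object */
- }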
- int mg_copy(SV* sv, SV* nsv, const char* key, STRLEN klen);
This routine checks to see what types of magic sv has. If the mg_type field is an uppercase letter, then the mg_obj is copied to nsv, but the mg_type field is changed to be the lowercase letter.
Tied hashes and arrays are magical beasts of the PERL_MAGIC_tied magic type.
WARNING: As of the 5.004 release, proper usage of the array and hash access functions requires understanding a few caveats. Some of these caveats are actually considered bugs in the API, to be fixed in later releases, and are bracketed with [MAYCHANGE] below. If you find yourself actually applying such information in this section, be aware that the behavior may change in the future, umm, without warning.
The perl tie function associates a variable with an object that implements the various GET, SET, etc methods. To perform the equivalent of the perl tie function from an XSUB, you must mimic this behaviour. The code below carries out the necessary steps - firstly it creates a new hash, and then creates a second hash which it blesses into the class which will implement the tie methods. Lastly it ties the two hashes together, and returns a reference to the new tied hash. Note that the code below does NOT call the TIEHASH method in the MyTie class - see Calling Perl Routines from within C Programs for details on how to do this.
- SV*
- mytie()
- PREINIT:
- HV *hash;
- HV *stash;
- SV *tie;
- CODE:
- hash = newHV();
- tie = newRV_noinc((SV*)newHV());
- stash = gv_stashpv("MyTie", GV_ADD);
- sv_bless(tie, stash);
- hv_magic(hash, (GV*)tie, PERL_MAGIC_tied);
- RETVAL = newRV_noinc((SV*)hash);
- OUTPUT:
- RETVAL
The av_store function, when given a tied array argument, merely copies the magic of the array onto the value to be "stored", using mg_copy. It may also return NULL, indicating that the value did not actually need to be stored in the array. [MAYCHANGE] After a call to av_store on a tied array, the caller will usually need to call mg_set(val) to actually invoke the perl level "STORE" method on the TIEARRAY object. If av_store did return NULL, a call to SvREFCNT_dec(val) will also usually be necessary to avoid a memory leak. [/MAYCHANGE]
The previous paragraph applies verbatim to tied hash access using the hv_store and hv_store_ent functions as well.
av_fetch and the corresponding hash functions hv_fetch and hv_fetch_ent actually return an undefined mortal value whose magic has been initialized using mg_copy. Note that the value so returned does not need to be deallocated, as it is already mortal. [MAYCHANGE] But you will need to call mg_get() on the returned value in order to actually invoke the perl level "FETCH" method on the underlying TIE object. Similarly, you may also call mg_set() on the return value after possibly assigning a suitable value to it using sv_setsv, which will invoke the "STORE" method on the TIE object. [/MAYCHANGE]
[MAYCHANGE] In other words, the array or hash fetch/store functions don't really fetch and store actual values in the case of tied arrays and hashes. They merely call mg_copy to attach magic to the values that were meant to be "stored" or "fetched". Later calls to mg_get and mg_set actually do the job of invoking the TIE methods on the underlying objects. Thus the magic mechanism currently implements a kind of lazy access to arrays and hashes.
Currently (as of perl version 5.004), use of the hash and array access functions requires the user to be aware of whether they are operating on "normal" hashes and arrays, or on their tied variants. The API may be changed to provide more transparent access to both tied and normal data types in future versions. [/MAYCHANGE]
You would do well to understand that the TIEARRAY and TIEHASH interfaces are mere sugar to invoke some perl method calls while using the uniform hash and array syntax. The use of this sugar imposes some overhead (typically about two to four extra opcodes per FETCH/STORE operation, in addition to the creation of all the mortal variables required to invoke the methods). This overhead will be comparatively small if the TIE methods are themselves substantial, but if they are only a few statements long, the overhead will not be insignificant.
Perl has a very handy construction
- {
- local $var = 2;
- ...
- }
This construction is approximately equivalent to
- {
- my $oldvar = $var;
- $var = 2;
- ...
- $var = $oldvar;
- }
The biggest difference is that the first construction would
reinstate the initial value of $var, irrespective of how control exits
the block: goto, return, die/eval, etc. It is a little bit
more efficient as well.
There is a way to achieve a similar task from C via the Perl API: create a pseudo-block, and arrange for some changes to be automatically undone at the end of it, either explicitly, or via a non-local exit (via die()). A block-like construct is created by a pair of ENTER/LEAVE macros (see Returning a Scalar in perlcall). Such a construct may be created specially for some important localized task, or an existing one (like the boundaries of an enclosing Perl subroutine/block, or an existing pair for freeing TMPs) may be used. (In the second case the overhead of additional localization must be almost negligible.) Note that any XSUB is automatically enclosed in an ENTER/LEAVE pair.
Inside such a pseudo-block the following service is available:
SAVEINT(int i)
SAVEIV(IV i)
SAVEI32(I32 i)
SAVELONG(long i)
These macros arrange things to restore the value of the integer variable i at the end of the enclosing pseudo-block.
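For instance, a C-level analogue of local for a plain int might be sketched like this inside an XSUB or other Perl-API code (counter is a hypothetical variable):
- static int counter;
- ENTER;
- SAVEINT(counter);   /* remember the current value */
- counter = 42;       /* temporary value, visible until LEAVE */
- /* ... code that sees counter == 42 ... */
- LEAVE;              /* counter restored, even if the code die()s */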
SAVESPTR(s)
SAVEPPTR(p)
These macros arrange things to restore the values of the pointers s and p. s must be a pointer of a type which survives conversion to SV* and back; p should be able to survive conversion to char* and back.
SAVEFREESV(SV *sv)
The refcount of sv will be decremented at the end of the pseudo-block. This is similar to sv_2mortal in that it is also a mechanism for doing a delayed SvREFCNT_dec. However, while sv_2mortal extends the lifetime of sv until the beginning of the next statement, SAVEFREESV extends it until the end of the enclosing scope. These lifetimes can be wildly different.
Also compare SAVEMORTALIZESV.
SAVEMORTALIZESV(SV *sv)
Just like SAVEFREESV, but mortalizes sv at the end of the current scope instead of decrementing its reference count. This usually has the effect of keeping sv alive until the statement that called the currently live scope has finished executing.
SAVEFREEOP(OP *op)
The OP * is op_free()ed at the end of the pseudo-block.
SAVEFREEPV(p)
The chunk of memory which is pointed to by p is Safefree()ed at the end of the pseudo-block.
SAVECLEARSV(SV *sv)
Clears a slot in the current scratchpad which corresponds to sv at the end of the pseudo-block.
SAVEDELETE(HV *hv, char *key, I32 length)
The key key of hv is deleted at the end of the pseudo-block. The string pointed to by key is Safefree()ed. If one has a key in short-lived storage, the corresponding string may be reallocated like this:
- SAVEDELETE(PL_defstash, savepv(tmpbuf), strlen(tmpbuf));
SAVEDESTRUCTOR(DESTRUCTORFUNC_NOCONTEXT_t f, void *p)
At the end of the pseudo-block the function f is called with the only argument p.
SAVEDESTRUCTOR_X(DESTRUCTORFUNC_t f, void *p)
At the end of the pseudo-block the function f is called with the implicit context argument (if any), and p.
SAVESTACK_POS()
The current offset on the Perl internal stack (cf. SP) is restored at the end of the pseudo-block.
The following API list contains functions, thus one needs to provide pointers to the modifiable data explicitly (either C pointers, or Perlish GV *s). Where the above macros take an int, a similar function takes an int *.
SV* save_scalar(GV *gv)
Equivalent to Perl code local $gv.
AV* save_ary(GV *gv)
HV* save_hash(GV *gv)
Similar to save_scalar, but localize @gv and %gv.
void save_item(SV *item)
Duplicates the current value of the SV; on exit from the current ENTER/LEAVE pseudo-block, the value of the SV will be restored using the stored value. It doesn't handle magic. Use save_scalar if magic is affected.
void save_list(SV **sarg, I32 maxsarg)
A variant of save_item which takes multiple arguments via an array sarg of SV* of length maxsarg.
SV* save_svref(SV **sptr)
Similar to save_scalar, but will reinstate an SV *.
void save_aptr(AV **aptr)
void save_hptr(HV **hptr)
Similar to save_svref, but localize AV * and HV *.
The Alias module implements localization of the basic types within the caller's scope. People who are interested in how to localize things in the containing scope should take a look there too.
The XSUB mechanism is a simple way for Perl programs to access C subroutines. An XSUB routine will have a stack that contains the arguments from the Perl program, and a way to map from the Perl data structures to a C equivalent.
The stack arguments are accessible through the ST(n) macro, which returns the n'th stack argument. Argument 0 is the first argument passed in the Perl subroutine call. These arguments are SV*s, and can be used anywhere an SV* is used.
Most of the time, output from the C routine can be handled through use of the RETVAL and OUTPUT directives. However, there are some cases where the argument stack is not already long enough to handle all the return values. An example is the POSIX tzname() call, which takes no arguments, but returns two, the local time zone's standard and summer time abbreviations.
To handle this situation, the PPCODE directive is used and the stack is extended using the macro:
- EXTEND(SP, num);
where SP is the macro that represents the local copy of the stack pointer, and num is the number of elements the stack should be extended by.
Now that there is room on the stack, values can be pushed onto it using the PUSHs macro. The pushed values will often need to be "mortal" (see Reference Counts and Mortality):
- PUSHs(sv_2mortal(newSViv(an_integer)))
- PUSHs(sv_2mortal(newSVuv(an_unsigned_integer)))
- PUSHs(sv_2mortal(newSVnv(a_double)))
- PUSHs(sv_2mortal(newSVpv("Some String",0)))
- /* Although the last example is better written as the more
- * efficient: */
- PUSHs(newSVpvs_flags("Some String", SVs_TEMP))
And in the Perl program calling tzname, the two values will be assigned as in:
- ($standard_abbrev, $summer_abbrev) = POSIX::tzname;
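Putting the pieces together, a tzname-like XSUB returning two values might be sketched like this (my_tzname and the literal strings are illustrative stand-ins; the real POSIX implementation differs):
- void
- my_tzname()
-     PPCODE:
-         EXTEND(SP, 2);
-         /* push two mortal return values */
-         PUSHs(newSVpvs_flags("EST", SVs_TEMP));
-         PUSHs(newSVpvs_flags("EDT", SVs_TEMP));
The caller then receives both strings as a two-element list.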
An alternate (and possibly simpler) method to pushing values on the stack is to use the macro:
- XPUSHs(SV*)
This macro automatically adjusts the stack for you, if needed. Thus, you do not need to call EXTEND to extend the stack.
Despite their suggestions in earlier versions of this document the macros
(X)PUSH[iunp] are not suited to XSUBs which return multiple results.
For that, either stick to the (X)PUSHs macros shown above, or use the new
m(X)PUSH[iunp] macros instead; see Putting a C value on Perl stack.
For more information, consult perlxs and perlxstut.
If an AUTOLOAD routine is an XSUB, as with Perl subroutines, Perl puts the fully-qualified name of the autoloaded subroutine in the $AUTOLOAD variable of the XSUB's package.
But it also puts the same information in certain fields of the XSUB itself:
- HV *stash = CvSTASH(cv);
- const char *subname = SvPVX(cv);
- STRLEN name_length = SvCUR(cv); /* in bytes */
- U32 is_utf8 = SvUTF8(cv);
SvPVX(cv)
contains just the sub name itself, not including the package.
For an AUTOLOAD routine in UNIVERSAL or one of its superclasses,
CvSTASH(cv)
returns NULL during a method call on a nonexistent package.
Note: Setting $AUTOLOAD stopped working in 5.6.1, which did not support XS AUTOLOAD subs at all. Perl 5.8.0 introduced the use of fields in the XSUB itself. Perl 5.16.0 restored the setting of $AUTOLOAD. If you need to support 5.8-5.14, use the XSUB's fields.
There are four routines that can be used to call a Perl subroutine from within a C program. These four are:
- I32 call_sv(SV*, I32);
- I32 call_pv(const char*, I32);
- I32 call_method(const char*, I32);
- I32 call_argv(const char*, I32, char**);
The routine most often used is call_sv. The SV* argument contains either the name of the Perl subroutine to be called, or a reference to the subroutine. The second argument consists of flags that control the context in which the subroutine is called, whether or not the subroutine is being passed arguments, how errors should be trapped, and how to treat return values.
All four routines return the number of arguments that the subroutine returned on the Perl stack.
These routines used to be called perl_call_sv, etc., before Perl v5.6.0, but those names are now deprecated; macros of the same name are provided for compatibility.
When using any of these routines (except call_argv), the programmer must manipulate the Perl stack. These include the following macros and functions:
- dSP
- SP
- PUSHMARK()
- PUTBACK
- SPAGAIN
- ENTER
- SAVETMPS
- FREETMPS
- LEAVE
- XPUSH*()
- POP*()
For a detailed description of calling conventions from C to Perl, consult perlcall.
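As a brief sketch of how the pieces fit together (perlcall is the authoritative reference), calling a Perl subroutine hypothetically named MyModule::log_it with one argument and no return value might look like:
- dSP;
- ENTER;
- SAVETMPS;
- PUSHMARK(SP);
- EXTEND(SP, 1);
- PUSHs(sv_2mortal(newSVpvs("hello")));
- PUTBACK;
- call_pv("MyModule::log_it", G_DISCARD);
- FREETMPS;
- LEAVE;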
All memory meant to be used with the Perl API functions should be manipulated using the macros described in this section. The macros provide the necessary transparency between differences in the actual malloc implementation that is used within perl.
It is suggested that you enable the version of malloc that is distributed with Perl. It keeps pools of various sizes of unallocated memory in order to satisfy allocation requests more quickly. However, on some platforms, it may cause spurious malloc or free errors.
The following three macros are used to initially allocate memory:
- Newx(pointer, number, type);
- Newxc(pointer, number, type, cast);
- Newxz(pointer, number, type);
The first argument pointer should be the name of a variable that will point to the newly allocated memory.
The second and third arguments number and type specify how many of the specified type of data structure should be allocated. The argument type is passed to sizeof. The final argument to Newxc, cast, should be used if the pointer argument is different from the type argument.
Unlike the Newx and Newxc macros, the Newxz macro calls memzero to zero out all the newly allocated memory.
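A minimal allocation sketch using these macros (my_struct_t is a hypothetical type):
- char *buf;
- my_struct_t *table;
- Newxz(buf, 80, char);          /* 80 zeroed bytes */
- Newx(table, 10, my_struct_t);  /* room for 10 structs, uninitialized */
- /* ... use buf and table ... */
- Safefree(table);
- Safefree(buf);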
- Renew(pointer, number, type);
- Renewc(pointer, number, type, cast);
- Safefree(pointer)
These three macros are used to change a memory buffer size or to free a piece of memory no longer needed. The arguments to Renew and Renewc match those of Newx and Newxc with the exception of not needing the "magic cookie" argument.
- Move(source, dest, number, type);
- Copy(source, dest, number, type);
- Zero(dest, number, type);
These three macros are used to move, copy, or zero out previously allocated memory. The source and dest arguments point to the source and destination starting points. Perl will move, copy, or zero out number instances of the size of the type data structure (using the sizeof function).
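For example, assuming src and dst are suitably allocated char buffers of at least 80 bytes:
- Copy(src, dst, 80, char);      /* non-overlapping copy, like memcpy */
- Move(dst, dst + 1, 79, char);  /* overlapping move, like memmove */
- Zero(dst, 80, char);           /* like memset(dst, 0, 80) */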
The most recent development releases of Perl have been experimenting with removing Perl's dependency on the "normal" standard I/O suite and allowing other stdio implementations to be used. This involves creating a new abstraction layer that then calls whichever implementation of stdio Perl was compiled with. All XSUBs should now use the functions in the PerlIO abstraction layer and not make any assumptions about what kind of stdio is being used.
For a complete description of the PerlIO abstraction, consult perlapio.
A lot of opcodes (this is an elementary operation in the internal perl stack machine) put an SV* on the stack. However, as an optimization the corresponding SV is (usually) not recreated each time. The opcodes reuse specially assigned SVs (targets) which are (as a corollary) not constantly freed/created.
Each of the targets is created only once (but see Scratchpads and recursion below), and when an opcode needs to put an integer, a double, or a string on stack, it just sets the corresponding parts of its target and puts the target on stack.
The macro to put this target on the stack is PUSHTARG, and it is directly used in some opcodes, as well as indirectly in zillions of others, which use it via (X)PUSH[iunp].
Because the target is reused, you must be careful when pushing multiple values on the stack. The following code will not do what you think:
- XPUSHi(10);
- XPUSHi(20);
This translates as "set TARG to 10, push a pointer to TARG onto the stack; set TARG to 20, push a pointer to TARG onto the stack". At the end of the operation, the stack does not contain the values 10 and 20, but actually contains two pointers to TARG, which we have set to 20.
If you need to push multiple different values then you should either use the (X)PUSHs macros, or else use the new m(X)PUSH[iunp] macros, none of which make use of TARG. The (X)PUSHs macros simply push an SV* on the stack, which, as noted under XSUBs and the Argument Stack, will often need to be "mortal". The new m(X)PUSH[iunp] macros make this a little easier to achieve by creating a new mortal for you (via (X)PUSHmortal), pushing that onto the stack (extending it if necessary in the case of the mXPUSH[iunp] macros), and then setting its value.
Thus, instead of writing this to "fix" the example above:
- XPUSHs(sv_2mortal(newSViv(10)))
- XPUSHs(sv_2mortal(newSViv(20)))
you can simply write:
- mXPUSHi(10)
- mXPUSHi(20)
On a related note, if you do use (X)PUSH[iunp], then you're going to need a dTARG in your variable declarations so that the *PUSH* macros can make use of the local variable TARG. See also dTARGET and dXSTARG.
The question remains on when the SVs which are targets for opcodes are created. The answer is that they are created when the current unit--a subroutine or a file (for opcodes for statements outside of subroutines)--is compiled. During this time a special anonymous Perl array is created, which is called a scratchpad for the current unit.
A scratchpad keeps SVs which are lexicals for the current unit and are targets for opcodes. One can deduce that an SV lives on a scratchpad by looking at its flags: lexicals have SVs_PADMY set, and targets have SVs_PADTMP set.
The correspondence between OPs and targets is not 1-to-1. Different OPs in the compile tree of the unit can use the same target, if this would not conflict with the expected life of the temporary.
In fact it is not 100% true that a compiled unit contains a pointer to the scratchpad AV. It actually contains a pointer to an AV of (initially) one element, and this element is the scratchpad AV. Why do we need an extra level of indirection?
The answer is recursion, and maybe threads. Both of these can create several execution pointers going into the same subroutine. So that the subroutine-child does not write over the temporaries of the subroutine-parent (whose lifespan covers the call to the child), the parent and the child should have different scratchpads. (And the lexicals should be separate anyway!)
So each subroutine is born with an array of scratchpads (of length 1). On each entry to the subroutine it is checked that the current depth of the recursion is not more than the length of this array, and if it is, a new scratchpad is created and pushed onto the array.
The targets on this scratchpad are undefs, but they are already
marked with correct flags.
Here we describe the internal form your code is converted to by Perl. Start with a simple example:
- $a = $b + $c;
This is converted to a tree similar to this one:
- assign-to
- / \
- + $a
- / \
- $b $c
(but slightly more complicated). This tree reflects the way Perl parsed your code, but has nothing to do with the execution order. There is an additional "thread" going through the nodes of the tree which shows the order of execution of the nodes. In our simplified example above it looks like:
- $b ---> $c ---> + ---> $a ---> assign-to
But with the actual compile tree for $a = $b + $c it is different: some nodes are optimized away. As a corollary, though the actual tree contains more nodes than our simplified example, the execution order is the same as in our example.
If you have your perl compiled for debugging (usually done with -DDEBUGGING on the Configure command line), you may examine the compiled tree by specifying -Dx on the Perl command line. The output takes several lines per node, and for $b+$c it looks like this:
- 5 TYPE = add ===> 6
- TARG = 1
- FLAGS = (SCALAR,KIDS)
- {
- TYPE = null ===> (4)
- (was rv2sv)
- FLAGS = (SCALAR,KIDS)
- {
- 3 TYPE = gvsv ===> 4
- FLAGS = (SCALAR)
- GV = main::b
- }
- }
- {
- TYPE = null ===> (5)
- (was rv2sv)
- FLAGS = (SCALAR,KIDS)
- {
- 4 TYPE = gvsv ===> 5
- FLAGS = (SCALAR)
- GV = main::c
- }
- }
This tree has 5 nodes (one per TYPE specifier), only 3 of which are not optimized away (one per number in the left column). The immediate children of a given node correspond to {} pairs on the same level of indentation, thus this listing corresponds to the tree:
- add
- / \
- null null
- | |
- gvsv gvsv
The execution order is indicated by ===> marks, thus it is 3 4 5 6 (node 6 is not included in the above listing), i.e., gvsv gvsv add whatever.
Each of these nodes represents an op, a fundamental operation inside the Perl core. The code which implements each operation can be found in the pp*.c files; the function which implements the op with type gvsv is pp_gvsv, and so on. As the tree above shows, different ops have different numbers of children: add is a binary operator, as one would expect, and so has two children. To accommodate the various different numbers of children, there are various types of op data structure, and they link together in different ways.
The simplest type of op structure is OP: this has no children. Unary operators, UNOPs, have one child, and this is pointed to by the op_first field. Binary operators (BINOPs) have not only an op_first field but also an op_last field. The most complex type of op is a LISTOP, which has any number of children. In this case, the first child is pointed to by op_first and the last child by op_last. The children in between can be found by iteratively following the op_sibling pointer from the first child to the last.
There are also two other op types: a PMOP holds a regular expression, and has no children, and a LOOP may or may not have children. If the op_children field is non-zero, it behaves like a LISTOP. To complicate matters, if a UNOP is actually a null op after optimization (see Compile pass 2: context propagation) it will still have children in accordance with its former type.
Another way to examine the tree is to use a compiler back-end module, such as B::Concise.
The tree is created by the compiler while yacc code feeds it the constructions it recognizes. Since yacc works bottom-up, so does the first pass of perl compilation.
What makes this pass interesting for perl developers is that some optimization may be performed on this pass. This is optimization by so-called "check routines". The correspondence between node names and corresponding check routines is described in opcode.pl (do not forget to run make regen_headers if you modify this file).
A check routine is called when the node is fully constructed except for the execution-order thread. Since at this time there are no back-links to the currently constructed node, one can do most any operation to the top-level node, including freeing it and/or creating new nodes above/below it.
The check routine returns the node which should be inserted into the tree (if the top-level node was not modified, the check routine returns its argument).
By convention, check routines have names ck_*. They are usually called from new*OP subroutines (or convert), which in turn are called from perly.y.
Immediately after the check routine is called the returned node is checked for being compile-time executable. If it is (the value is judged to be constant) it is immediately executed, and a constant node with the "return value" of the corresponding subtree is substituted instead. The subtree is deleted.
If constant folding was not performed, the execution-order thread is created.
When a context for a part of the compile tree is known, it is propagated down through the tree. At this time the context can have 5 values (instead of 2 for runtime context): void, boolean, scalar, list, and lvalue. In contrast with pass 1, this pass is processed from top to bottom: a node's context determines the context for its children.
Additional context-dependent optimizations are performed at this time. Since at this moment the compile tree contains back-references (via "thread" pointers), nodes cannot be free()d now. To allow optimized-away nodes at this stage, such nodes are null()ified instead of being free()d (i.e. their type is changed to OP_NULL).
After the compile tree for a subroutine (or for an eval or a file) is created, an additional pass over the code is performed. This pass is neither top-down nor bottom-up, but in the execution order (with additional complications for conditionals). Optimizations performed at this stage are subject to the same restrictions as in pass 2.
Peephole optimizations are done by calling the function pointed to by the global variable PL_peepp. By default, PL_peepp just calls the function pointed to by the global variable PL_rpeepp. By default, that performs some basic op fixups and optimisations along the execution-order op chain, and recursively calls PL_rpeepp for each side chain of ops (resulting from conditionals). Extensions may provide additional optimisations or fixups, hooking into either the per-subroutine or recursive stage, like this:
- static peep_t prev_peepp;
- static void my_peep(pTHX_ OP *o)
- {
- /* custom per-subroutine optimisation goes here */
- prev_peepp(aTHX_ o);
- /* custom per-subroutine optimisation may also go here */
- }
- BOOT:
- prev_peepp = PL_peepp;
- PL_peepp = my_peep;
- static peep_t prev_rpeepp;
- static void my_rpeep(pTHX_ OP *o)
- {
- OP *orig_o = o;
- for(; o; o = o->op_next) {
- /* custom per-op optimisation goes here */
- }
- prev_rpeepp(aTHX_ orig_o);
- }
- BOOT:
- prev_rpeepp = PL_rpeepp;
- PL_rpeepp = my_rpeep;
The compile tree is executed in a runops function. There are two runops functions, in run.c and in dump.c. Perl_runops_debug is used with DEBUGGING and Perl_runops_standard is used otherwise. For fine control over the execution of the compile tree it is possible to provide your own runops function.
It's probably best to copy one of the existing runops functions and change it to suit your needs. Then, in the BOOT section of your XS file, add the line:
- PL_runops = my_runops;
This function should be as efficient as possible to keep your programs running as fast as possible.
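A custom runops function is essentially the core op-dispatch loop. A minimal sketch modeled on Perl_runops_standard (per-op instrumentation is the hypothetical addition here):
- static int
- my_runops(pTHX)
- {
-     while ((PL_op = PL_op->op_ppaddr(aTHX))) {
-         /* per-op instrumentation could go here */
-         PERL_ASYNC_CHECK();
-     }
-     TAINT_NOT;
-     return 0;
- }
It is then installed in the BOOT section via PL_runops = my_runops.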
As of perl 5.14 it is possible to hook into the compile-time lexical scope mechanism using Perl_blockhook_register. This is used like this:
- STATIC void my_start_hook(pTHX_ int full);
- STATIC BHK my_hooks;
- BOOT:
- BhkENTRY_set(&my_hooks, bhk_start, my_start_hook);
- Perl_blockhook_register(aTHX_ &my_hooks);
This will arrange to have my_start_hook called at the start of compiling every lexical scope. The available hooks are:
void bhk_start(pTHX_ int full)
This is called just after starting a new lexical scope. Note that Perl code like
- if ($x) { ... }
creates two scopes: the first starts at the ( and has full == 1, the second starts at the { and has full == 0. Both end at the }, so calls to start and pre/post_end will match. Anything pushed onto the save stack by this hook will be popped just before the scope ends (between the pre_ and post_end hooks, in fact).
void bhk_pre_end(pTHX_ OP **o)
This is called at the end of a lexical scope, just before unwinding the stack. o is the root of the optree representing the scope; it is a double pointer so you can replace the OP if you need to.
void bhk_post_end(pTHX_ OP **o)
This is called at the end of a lexical scope, just after unwinding the stack. o is as above. Note that it is possible for calls to pre_ and post_end to nest, if there is something on the save stack that calls string eval.
void bhk_eval(pTHX_ OP *const o)
This is called just before starting to compile an eval STRING, do FILE, require or use, after the eval has been set up. o is the OP that requested the eval, and will normally be an OP_ENTEREVAL, OP_DOFILE or OP_REQUIRE.
Once you have your hook functions, you need a BHK structure to put them in. It's best to allocate it statically, since there is no way to free it once it's registered. The function pointers should be inserted into this structure using the BhkENTRY_set macro, which will also set flags indicating which entries are valid. If you do need to allocate your BHK dynamically for some reason, be sure to zero it before you start.
Once registered, there is no mechanism to switch these hooks off, so if that is necessary you will need to do this yourself. An entry in %^H is probably the best way, so the effect is lexically scoped; however it is also possible to use the BhkDISABLE and BhkENABLE macros to temporarily switch entries on and off. You should also be aware that generally speaking at least one scope will have opened before your extension is loaded, so you will see some pre/post_end pairs that didn't have a matching start.
Dump Functions
To aid debugging, the source file dump.c contains a number of functions which produce formatted output of internal data structures.
The most commonly used of these functions is Perl_sv_dump; it's used for dumping SVs, AVs, HVs, and CVs. The Devel::Peek module calls sv_dump to produce debugging output from Perl-space, so users of that module should already be familiar with its format.
Perl_op_dump can be used to dump an OP structure or any of its derivatives, and produces output similar to perl -Dx; in fact, Perl_dump_eval will dump the main root of the code being evaluated, exactly like -Dx.
Other useful functions are Perl_dump_sub, which turns a GV into an op tree, and Perl_dump_packsubs, which calls Perl_dump_sub on all the subroutines in a package, like so (thankfully, these are all xsubs, so there is no op tree):
- (gdb) print Perl_dump_packsubs(PL_defstash)
- SUB attributes::bootstrap = (xsub 0x811fedc 0)
- SUB UNIVERSAL::can = (xsub 0x811f50c 0)
- SUB UNIVERSAL::isa = (xsub 0x811f304 0)
- SUB UNIVERSAL::VERSION = (xsub 0x811f7ac 0)
- SUB DynaLoader::boot_DynaLoader = (xsub 0x805b188 0)
and Perl_dump_all, which dumps all the subroutines in the stash and the op tree of the main root.
The Perl interpreter can be regarded as a closed box: it has an API for feeding it code or otherwise making it do things, but it also has functions for its own use. This smells a lot like an object, and there are ways for you to build Perl so that you can have multiple interpreters, with one interpreter represented either as a C structure, or inside a thread-specific structure. These structures contain all the context, the state of that interpreter.
One macro controls the major Perl build flavor: MULTIPLICITY. The MULTIPLICITY build has a C structure that packages all the interpreter state. With multiplicity-enabled perls, PERL_IMPLICIT_CONTEXT is also normally defined, and enables the support for passing in a "hidden" first argument that represents all three data structures. MULTIPLICITY makes multi-threaded perls possible (with the ithreads threading model, related to the macro USE_ITHREADS.)
Two other "encapsulation" macros are PERL_GLOBAL_STRUCT and PERL_GLOBAL_STRUCT_PRIVATE (the latter turns on the former, and the former turns on MULTIPLICITY). PERL_GLOBAL_STRUCT causes all the internal variables of Perl to be wrapped inside a single global struct, struct perl_vars, accessible as (globals) &PL_Vars or PL_VarsPtr or via the function Perl_GetVars(). PERL_GLOBAL_STRUCT_PRIVATE goes one step further: there is still a single struct (allocated in main() either from heap or from stack), but there are no global data symbols pointing to it. In either case the global struct should be initialised as the very first thing in main() using Perl_init_global_struct(), and correspondingly torn down after perl_free() using Perl_free_global_struct(); please see miniperlmain.c for usage details. You may also need to use dVAR in your coding to "declare the global variables" when you are using them. dTHX does this for you automatically.
To see whether you have non-const data you can use a BSD-compatible nm:
- nm libperl.a | grep -v ' [TURtr] '
If this displays any D or d symbols, you have non-const data.
For backward compatibility reasons defining just PERL_GLOBAL_STRUCT doesn't actually hide all symbols inside a big global struct: some PerlIO_xxx vtables are left visible. The PERL_GLOBAL_STRUCT_PRIVATE then hides everything (see how the PERLIO_FUNCS_DECL is used).
All this obviously requires a way for the Perl internal functions to be either subroutines taking some kind of structure as the first argument, or subroutines taking nothing as the first argument. To enable these two very different ways of building the interpreter, the Perl source (as it does in so many other situations) makes heavy use of macros and subroutine naming conventions.
First problem: deciding which functions will be public API functions and which will be private. All functions whose names begin with S_ are private (think "S" for "secret" or "static"). All other functions begin with "Perl_", but just because a function begins with "Perl_" does not mean it is part of the API. (See Internal Functions.) The easiest way to be sure a function is part of the API is to find its entry in perlapi. If it exists in perlapi, it's part of the API. If it doesn't, and you think it should be (i.e., you need it for your extension), send mail via perlbug explaining why you think it should be.
Second problem: there must be a syntax so that the same subroutine declarations and calls can pass a structure as their first argument, or pass nothing. To solve this, the subroutines are named and declared in a particular way. Here's a typical start of a static function used within the Perl guts:
- STATIC void
- S_incline(pTHX_ char *s)
STATIC becomes "static" in C, and may be #define'd to nothing in some configurations in the future.
A public function (i.e. part of the internal API, but not necessarily sanctioned for use in extensions) begins like this:
- void
- Perl_sv_setiv(pTHX_ SV* dsv, IV num)
pTHX_ is one of a number of macros (in perl.h) that hide the details of the interpreter's context. THX stands for "thread", "this", or "thingy", as the case may be. (And no, George Lucas is not involved. :-) The first character could be 'p' for a prototype, 'a' for argument, or 'd' for declaration, so we have pTHX, aTHX and dTHX, and their variants.
When Perl is built without options that set PERL_IMPLICIT_CONTEXT, there is no first argument containing the interpreter's context. The trailing underscore in the pTHX_ macro indicates that the macro expansion needs a comma after the context argument because other arguments follow it. If PERL_IMPLICIT_CONTEXT is not defined, pTHX_ will be ignored, and the subroutine is not prototyped to take the extra argument. The form of the macro without the trailing underscore is used when there are no additional explicit arguments.
When a core function calls another, it must pass the context. This is normally hidden via macros. Consider sv_setiv. It expands into something like this:
- #ifdef PERL_IMPLICIT_CONTEXT
- #define sv_setiv(a,b) Perl_sv_setiv(aTHX_ a, b)
- /* can't do this for vararg functions, see below */
- #else
- #define sv_setiv Perl_sv_setiv
- #endif
This works well, and means that XS authors can gleefully write:
- sv_setiv(foo, bar);
and still have it work under all the modes Perl could have been compiled with.
This doesn't work so cleanly for varargs functions, though, as macros imply that the number of arguments is known in advance. Instead we either need to spell them out fully, passing aTHX_ as the first argument (the Perl core tends to do this with functions like Perl_warner), or use a context-free version.
The context-free version of Perl_warner is called Perl_warner_nocontext, and does not take the extra argument. Instead it does dTHX; to get the context from thread-local storage. We #define warner Perl_warner_nocontext so that extensions get source compatibility at the expense of performance. (Passing an arg is cheaper than grabbing it from thread-local storage.)
You can ignore [pad]THXx when browsing the Perl headers/sources. Those are strictly for use within the core. Extensions and embedders need only be aware of [pad]THX.
dTHR was introduced in perl 5.005 to support the older thread model. The older thread model now uses the THX mechanism to pass context pointers around, so dTHR is not useful any more. Perl 5.6.0 and later still have it for backward source compatibility, but it is defined to be a no-op.
When Perl is built with PERL_IMPLICIT_CONTEXT, extensions that call any functions in the Perl API will need to pass the initial context argument somehow. The kicker is that you will need to write it in such a way that the extension still compiles when Perl hasn't been built with PERL_IMPLICIT_CONTEXT enabled.
There are three ways to do this. First, the easy but inefficient way, which is also the default, in order to maintain source compatibility with extensions: whenever XSUB.h is #included, it redefines the aTHX and aTHX_ macros to call a function that will return the context. Thus, something like:
- sv_setiv(sv, num);
in your extension will translate to this when PERL_IMPLICIT_CONTEXT is in effect:
- Perl_sv_setiv(Perl_get_context(), sv, num);
or to this otherwise:
- Perl_sv_setiv(sv, num);
You don't have to do anything new in your extension to get this; since the Perl library provides Perl_get_context(), it will all just work.
The second, more efficient way is to use the following template for your Foo.xs:
- #define PERL_NO_GET_CONTEXT /* we want efficiency */
- #include "EXTERN.h"
- #include "perl.h"
- #include "XSUB.h"
- STATIC void my_private_function(int arg1, int arg2);
- STATIC void
- my_private_function(int arg1, int arg2)
- {
- dTHX; /* fetch context */
- ... call many Perl API functions ...
- }
- [... etc ...]
- MODULE = Foo PACKAGE = Foo
- /* typical XSUB */
- void
- my_xsub(arg)
- int arg
- CODE:
- my_private_function(arg, 10);
Note that the only two changes from the normal way of writing an extension are the addition of a #define PERL_NO_GET_CONTEXT before including the Perl headers, and a dTHX; declaration at the start of every function that will call the Perl API. (You'll know which functions need this, because the C compiler will complain that there's an undeclared identifier in those functions.) No changes are needed for the XSUBs themselves, because the XS() macro is correctly defined to pass in the implicit context if needed.
The third, even more efficient way is to ape how it is done within the Perl guts:
- #define PERL_NO_GET_CONTEXT /* we want efficiency */
- #include "EXTERN.h"
- #include "perl.h"
- #include "XSUB.h"
- /* pTHX_ only needed for functions that call Perl API */
- STATIC void my_private_function(pTHX_ int arg1, int arg2);
- STATIC void
- my_private_function(pTHX_ int arg1, int arg2)
- {
- /* dTHX; not needed here, because THX is an argument */
- ... call Perl API functions ...
- }
- [... etc ...]
- MODULE = Foo PACKAGE = Foo
- /* typical XSUB */
- void
- my_xsub(arg)
- int arg
- CODE:
- my_private_function(aTHX_ arg, 10);
This implementation never has to fetch the context using a function call, since it is always passed as an extra argument. Depending on your needs for simplicity or efficiency, you may mix the previous two approaches freely.
Never add a comma after pTHX yourself--always use the form of the macro with the underscore for functions that take explicit arguments, or the form without the underscore for functions with no explicit arguments.
If you are compiling Perl with -DPERL_GLOBAL_STRUCT, the dVAR definition is needed if the Perl global variables (see perlvars.h or globvar.sym) are accessed in the function and dTHX is not used (dTHX includes dVAR if necessary). One notices the need for dVAR only with the said compile-time define, because otherwise the Perl global variables are visible as-is.
If you create interpreters in one thread and then proceed to call them in another, you need to make sure perl's own Thread Local Storage (TLS) slot is initialized correctly in each of those threads.
The perl_alloc and perl_clone API functions will automatically set the TLS slot to the interpreter they created, so there is no need to do anything special if the interpreter is always accessed in the same thread that created it, and that thread did not create or call any other interpreters afterwards. If that is not the case, you have to set the TLS slot of the thread before calling any functions in the Perl API on that particular interpreter. This is done by calling the PERL_SET_CONTEXT macro in that thread as the first thing you do:
- /* do this before doing anything else with some_perl */
- PERL_SET_CONTEXT(some_perl);
- ... other Perl API calls on some_perl go here ...
Just as PERL_IMPLICIT_CONTEXT provides a way to bundle up everything that the interpreter knows about itself and pass it around, so too are there plans to allow the interpreter to bundle up everything it knows about the environment it's running on. This is enabled with the PERL_IMPLICIT_SYS macro. Currently it only works with USE_ITHREADS on Windows.
This provides an extra pointer (called the "host" environment) for all the system calls. This makes it possible for all the system stuff to maintain its own state, broken down into seven C structures. These are thin wrappers around the usual system calls (see win32/perllib.c) for the default perl executable, but for a more ambitious host (like the one that would do fork() emulation) all the extra work needed to pretend that different interpreters are actually different "processes" would be done here.
The Perl engine/interpreter and the host are orthogonal entities. There could be one or more interpreters in a process, and one or more "hosts", with free association between them.
All of Perl's internal functions which will be exposed to the outside world are prefixed by Perl_ so that they will not conflict with XS functions or functions used in a program in which Perl is embedded. Similarly, all global variables begin with PL_. (By convention, static functions start with S_.)
Inside the Perl core (PERL_CORE defined), you can get at the functions either with or without the Perl_ prefix, thanks to a bunch of defines that live in embed.h. Note that extension code should not set PERL_CORE; this exposes the full perl internals, and is likely to cause breakage of the XS in each new perl release.
The file embed.h is generated automatically from embed.pl and embed.fnc. embed.pl also creates the prototyping header files for the internal functions, generates the documentation and a lot of other bits and pieces. It's important that when you add a new function to the core or change an existing one, you change the data in the table in embed.fnc as well. Here's a sample entry from that table:
- Apd |SV** |av_fetch |AV* ar|I32 key|I32 lval
The second column is the return type, the third column the name. Columns after that are the arguments. The first column is a set of flags:
'A' -- This function is a part of the public API. All such functions should also have 'd'; very few do not.
'p' -- This function has a Perl_ prefix; i.e. it is defined as Perl_av_fetch.
'd' -- This function has documentation using the apidoc feature, which we'll look at in a second. Some functions have 'd' but not 'A'; docs are good.
Other available flags are:
's' -- This is a static function and is defined as STATIC S_whatever, and usually called within the sources as whatever(...).
'n' -- This does not need an interpreter context, so the definition has no pTHX, and it follows that callers don't use aTHX. (See Background and PERL_IMPLICIT_CONTEXT.)
'r' -- This function never returns; croak, exit and friends.
'f' -- This function takes a variable number of arguments, printf style. The argument list should end with ..., like this:
- Afprd |void |croak |const char* pat|...
'M' -- This function is part of the experimental development API, and may change or disappear without notice.
'o' -- This function should not have a compatibility macro to define, say, Perl_parse to parse. It must be called as Perl_parse.
'x' -- This function isn't exported out of the Perl core.
'm' -- This is implemented as a macro.
'X' -- This function is explicitly exported.
'E' -- This function is visible to extensions included in the Perl core.
'b' -- Binary backward compatibility; this function is a macro but also has a Perl_ implementation (which is exported).
See the comments at the top of embed.fnc for others.
If you edit embed.pl or embed.fnc, you will need to run make regen_headers to force a rebuild of embed.h and other auto-generated files.
If you are printing IVs, UVs, or NVs, then instead of the stdio(3) style formatting codes like %d, %ld, and %f, you should use the following macros for portability:
- IVdf IV in decimal
- UVuf UV in decimal
- UVof UV in octal
- UVxf UV in hexadecimal
- NVef NV %e-like
- NVff NV %f-like
- NVgf NV %g-like
These will take care of 64-bit integers and long doubles. For example:
- printf("IV is %"IVdf"\n", iv);
IVdf will expand to whatever the correct format is for IVs.
If you are printing addresses of pointers, use UVxf combined with PTR2UV(); do not use %lx or %p.
Because pointer size does not necessarily equal integer size, use the following macros to do it right.
- PTR2UV(pointer)
- PTR2IV(pointer)
- PTR2NV(pointer)
- INT2PTR(pointertotype, integer)
For example:
- IV iv = ...;
- SV *sv = INT2PTR(SV*, iv);
and
- AV *av = ...;
- UV uv = PTR2UV(av);
There are a couple of macros to do very basic exception handling in XS modules. You have to define NO_XSLOCKS before including XSUB.h to be able to use these macros:
- #define NO_XSLOCKS
- #include "XSUB.h"
You can use these macros if you call code that may croak, but you need to do some cleanup before giving control back to Perl. For example:
- dXCPT; /* set up necessary variables */
- XCPT_TRY_START {
- code_that_may_croak();
- } XCPT_TRY_END
- XCPT_CATCH
- {
- /* do cleanup here */
- XCPT_RETHROW;
- }
Note that you always have to rethrow an exception that has been caught. Using these macros, it is not possible to just catch the exception and ignore it. If you have to ignore the exception, you have to use the call_* functions.
The advantage of using the above macros is that you don't have to set up an extra function for call_*, and that using these macros is faster than using call_*.
There's an effort going on to document the internal functions and automatically produce reference manuals from them - perlapi is one such manual which details all the functions which are available to XS writers. perlintern is the autogenerated manual for the functions which are not part of the API and are supposedly for internal use only.
Source documentation is created by putting POD comments into the C source, like this:
- /*
- =for apidoc sv_setiv
- Copies an integer into the given SV. Does not handle 'set' magic. See
- C<sv_setiv_mg>.
- =cut
- */
Please try and supply some documentation if you add functions to the Perl core.
The Perl API changes over time. New functions are added or the interfaces of existing functions are changed. The Devel::PPPort module tries to provide compatibility code for some of these changes, so XS writers don't have to code it themselves when supporting multiple versions of Perl.
Devel::PPPort generates a C header file ppport.h that can also be run as a Perl script. To generate ppport.h, run:
- perl -MDevel::PPPort -eDevel::PPPort::WriteFile
Besides checking existing XS code, the script can also be used to retrieve compatibility information for various API calls using the --api-info command line switch. For example:
- % perl ppport.h --api-info=sv_magicext
For details, see perldoc ppport.h.
Perl 5.6.0 introduced Unicode support. It's important for porters and XS writers to understand this support and make sure that the code they write does not corrupt Unicode data.
In the olden, less enlightened times, we all used to use ASCII. Most of us did, anyway. The big problem with ASCII is that it's American. Well, no, that's not actually the problem; the problem is that it's not particularly useful for people who don't use the Roman alphabet. What used to happen was that particular languages would stick their own alphabet in the upper range of the sequence, between 128 and 255. Of course, we then ended up with plenty of variants that weren't quite ASCII, and the whole point of it being a standard was lost.
Worse still, if you've got a language like Chinese or Japanese that has hundreds or thousands of characters, then you really can't fit them into a mere 256, so they had to forget about ASCII altogether, and build their own systems using pairs of numbers to refer to one character.
To fix this, some people formed Unicode, Inc. and produced a new character set containing all the characters you can possibly think of and more. There are several ways of representing these characters, and the one Perl uses is called UTF-8. UTF-8 uses a variable number of bytes to represent a character. You can learn more about Unicode and Perl's Unicode model in perlunicode.
You can't. This is because UTF-8 data is stored in bytes just like non-UTF-8 data. The Unicode character 200 (0xC8 for you hex types), capital E with a grave accent, is represented by the two bytes v195.136. Unfortunately, the non-Unicode string chr(195).chr(136) has that byte sequence as well. So you can't tell just by looking--this is what makes Unicode input an interesting problem.
In general, you either have to know what you're dealing with, or you have to guess. The API function is_utf8_string can help; it'll tell you if a string contains only valid UTF-8 characters. However, it can't do the work for you. On a character-by-character basis, is_utf8_char_buf will tell you whether the current character in a string is valid UTF-8.
As mentioned above, UTF-8 uses a variable number of bytes to store a character. Characters with values 0...127 are stored in one byte, just like good ol' ASCII. Character 128 is stored as v194.128; this continues up to character 191, which is v194.191. Now we've run out of bits (191 is binary 10111111) so we move on; 192 is v195.128. And so it goes on, moving to three bytes at character 2048.
Assuming you know you're dealing with a UTF-8 string, you can find out how long the first character in it is with the UTF8SKIP macro:
- char *utf = "\305\233\340\240\201";
- I32 len;
- len = UTF8SKIP(utf); /* len is 2 here */
- utf += len;
- len = UTF8SKIP(utf); /* len is 3 here */
Another way to skip over characters in a UTF-8 string is to use utf8_hop, which takes a string and a number of characters to skip over. You're on your own about bounds checking, though, so don't use it lightly.
All bytes in a multi-byte UTF-8 character will have the high bit set, so you can test if you need to do something special with this character like this (the UTF8_IS_INVARIANT() is a macro that tests whether the byte can be encoded as a single byte even in UTF-8):
- U8 *utf;
- U8 *utf_end; /* 1 beyond buffer pointed to by utf */
- UV uv; /* Note: a UV, not a U8, not a char */
- STRLEN len; /* length of character in bytes */
- if (!UTF8_IS_INVARIANT(*utf))
- /* Must treat this as UTF-8 */
- uv = utf8_to_uvchr_buf(utf, utf_end, &len);
- else
- /* OK to treat this character as a byte */
- uv = *utf;
You can also see in that example that we use utf8_to_uvchr_buf to get the value of the character; the inverse function uvchr_to_utf8 is available for putting a UV into UTF-8:
- if (!UTF8_IS_INVARIANT(uv))
- /* Must treat this as UTF8 */
- utf8 = uvchr_to_utf8(utf8, uv);
- else
- /* OK to treat this character as a byte */
- *utf8++ = uv;
You must convert characters to UVs using the above functions if you're ever in a situation where you have to match UTF-8 and non-UTF-8 characters. You may not skip over UTF-8 characters in this case. If you do this, you'll lose the ability to match hi-bit non-UTF-8 characters; for instance, if your UTF-8 string contains v195.136, and you skip that character, you can never match a chr(200) in a non-UTF-8 string. So don't do that!
Currently, Perl deals with Unicode strings and non-Unicode strings slightly differently. A flag in the SV, SVf_UTF8, indicates that the string is internally encoded as UTF-8. Without it, the byte value is the codepoint number and vice versa (in other words, the string is encoded as iso-8859-1, but use feature 'unicode_strings' is needed to get iso-8859-1 semantics). You can check and manipulate this flag with the following macros:
- SvUTF8(sv)
- SvUTF8_on(sv)
- SvUTF8_off(sv)
This flag has an important effect on Perl's treatment of the string: if
Unicode data is not properly distinguished, regular expressions,
length, substr and other string handling operations will have
undesirable results.
The problem comes when you have, for instance, a string that isn't flagged as UTF-8, and contains a byte sequence that could be UTF-8 - especially when combining non-UTF-8 and UTF-8 strings.
Never forget that the SVf_UTF8 flag is separate from the PV value; you need to be sure you don't accidentally knock it off while you're manipulating SVs. More specifically, you cannot expect to do this:
- SV *sv;
- SV *nsv;
- STRLEN len;
- char *p;
- p = SvPV(sv, len);
- frobnicate(p);
- nsv = newSVpvn(p, len);
The char* string does not tell you the whole story, and you can't copy or reconstruct an SV just by copying the string value. Check if the old SV has the UTF8 flag set, and act accordingly:
- p = SvPV(sv, len);
- frobnicate(p);
- nsv = newSVpvn(p, len);
- if (SvUTF8(sv))
- SvUTF8_on(nsv);
In fact, your frobnicate function should be made aware of whether or not it's dealing with UTF-8 data, so that it can handle the string appropriately.
Since just passing an SV to an XS function and copying the data of the SV is not enough to copy the UTF8 flag, it is even more wrong to just pass a char * to an XS function.
If you're mixing UTF-8 and non-UTF-8 strings, it is necessary to upgrade one of the strings to UTF-8. If you've got an SV, the easiest way to do this is:
- sv_utf8_upgrade(sv);
However, you must not do this, for example:
- if (!SvUTF8(left))
- sv_utf8_upgrade(left);
If you do this in a binary operator, you will actually change one of the strings that came into the operator, and, while it shouldn't be noticeable by the end user, it can cause problems in deficient code.
Instead, bytes_to_utf8 will give you a UTF-8-encoded copy of its string argument. This is useful for having the data available for comparisons and so on, without harming the original SV. There's also utf8_to_bytes to go the other way, but naturally, this will fail if the string contains any characters above 255 that can't be represented in a single byte.
Not really. Just remember these things:
There's no way to tell if a string is UTF-8 or not. You can tell if an SV is UTF-8 by looking at its SvUTF8 flag. Don't forget to set the flag if something should be UTF-8. Treat the flag as part of the PV, even though it's not--if you pass on the PV to somewhere, pass on the flag too.
If a string is UTF-8, always use utf8_to_uvchr_buf to get at the value, unless UTF8_IS_INVARIANT(*s) in which case you can use *s.
When writing a character uv to a UTF-8 string, always use uvchr_to_utf8, unless UTF8_IS_INVARIANT(uv) in which case you can use *s = uv.
Mixing UTF-8 and non-UTF-8 strings is tricky. Use bytes_to_utf8 to get a new string which is UTF-8 encoded, and then combine them.
Custom operator support is an experimental feature that allows you to define your own ops. This is primarily to allow the building of interpreters for other languages in the Perl core, but it also allows optimizations through the creation of "macro-ops" (ops which perform the functions of multiple ops which are usually executed together, such as gvsv, gvsv, add).
This feature is implemented as a new op type, OP_CUSTOM. The Perl core does not "know" anything special about this op type, and so it will not be involved in any optimizations. This also means that you can define your custom ops to be any op structure--unary, binary, list and so on--you like.
It's important to know what custom operators won't do for you. They won't let you add new syntax to Perl, directly. They won't even let you add new keywords, directly. In fact, they won't change the way Perl compiles a program at all. You have to do those changes yourself, after Perl has compiled the program. You do this either by manipulating the op tree using a CHECK block and the B::Generate module, or by adding a custom peephole optimizer with the optimize module.
When you do this, you replace ordinary Perl ops with custom ops by creating ops with the type OP_CUSTOM and the op_ppaddr of your own PP function. This should be defined in XS code, and should look like the PP ops in pp_*.c. You are responsible for ensuring that your op takes the appropriate number of values from the stack, and you are responsible for adding stack marks if necessary.
You should also "register" your op with the Perl interpreter so that it can produce sensible error and warning messages. Since it is possible to have multiple custom ops within the one "logical" op type OP_CUSTOM, Perl uses the value of o->op_ppaddr to determine which custom op it is dealing with. You should create an XOP structure for each ppaddr you use, set the properties of the custom op with XopENTRY_set, and register the structure against the ppaddr using Perl_custom_op_register. A trivial example might look like:
- static XOP my_xop;
- static OP *my_pp(pTHX);
- BOOT:
- XopENTRY_set(&my_xop, xop_name, "myxop");
- XopENTRY_set(&my_xop, xop_desc, "Useless custom op");
- Perl_custom_op_register(aTHX_ my_pp, &my_xop);
The available fields in the structure are:
xop_name -- A short name for your op. This will be included in some error messages, and will also be returned as $op->name by the B module, so it will appear in the output of modules like B::Concise.
xop_desc -- A short description of the function of the op.
xop_class -- Which of the various *OP structures this op uses. This should be one of the OA_* constants from op.h. Note that OA_PVOP_OR_SVOP should be interpreted as 'PVOP' only; the _OR_SVOP is because the only core PVOP, OP_TRANS, can sometimes be an SVOP instead. The other OA_* constants should not be used.
xop_peep -- This member is of type Perl_cpeep_t, which expands to void (*Perl_cpeep_t)(pTHX_ OP *o, OP *oldop). If it is set, this function will be called from Perl_rpeep when ops of this type are encountered by the peephole optimizer. o is the OP that needs optimizing; oldop is the previous OP optimized, whose op_next points to o.
B::Generate directly supports the creation of custom ops by name.
Until May 1997, this document was maintained by Jeff Okamoto <okamoto@corp.hp.com>. It is now maintained as part of Perl itself by the Perl 5 Porters <perl5-porters@perl.org>.
With lots of help and suggestions from Dean Roehrich, Malcolm Beattie, Andreas Koenig, Paul Hudson, Ilya Zakharevich, Paul Marquess, Neil Bowers, Matthew Green, Tim Bunce, Spider Boardman, Ulrich Pfeifer, Stephen McCamant, and Gurusamy Sarathy.
perlapi, perlintern, perlxs, perlembed
perlhack - How to hack on Perl
This document explains how Perl development works. It includes details about the Perl 5 Porters email list, the Perl repository, the Perlbug bug tracker, patch guidelines, and commentary on Perl development philosophy.
If you just want to submit a single small patch like a pod fix, a test for a bug, comment fixes, etc., it's easy! Here's how:
The perl source is in a git repository. You can clone the repository with the following command:
- % git clone git://perl5.git.perl.org/perl.git perl
In case the advice in this guide has been updated recently, read the latest version directly from the perl source:
- % perldoc pod/perlhack.pod
Hack, hack, hack.
You can run all the tests with the following commands:
- % ./Configure -des -Dusedevel
- % make test
Keep hacking until the tests pass.
Committing your work will save the change on your local system:
- % git commit -a -m 'Commit message goes here'
Make sure the commit message describes your change in a single sentence. For example, "Fixed spelling errors in perlhack.pod".
The next step is to submit your patch to the Perl core ticket system via email.
If your changes are in a single git commit, run the following commands to write the file as a MIME attachment and send it with a meaningful subject:
- % git format-patch -1 --attach
- % ./perl -Ilib utils/perlbug -s "[PATCH] $(git log -1 --oneline HEAD)" -f 0001-*.patch
The perlbug program will ask you a few questions about your email address and the patch you're submitting. Once you've answered them it will submit your patch via email.
If your changes are in multiple commits, generate a patch file containing them all, and attach that:
- % git format-patch origin/blead --attach --stdout > patches
- % ./perl -Ilib utils/perlbug -f patches
When prompted, pick a subject that summarizes your changes overall and has "[PATCH]" at the beginning.
The porters appreciate the time you spent helping to make Perl better. Thank you!
The next time you wish to make a patch, you need to start from the latest perl in a pristine state. Check you don't have any local changes or added files in your perl check-out which you wish to keep, then run these commands:
- % git pull
- % git reset --hard origin/blead
- % git clean -dxf
If you want to report a bug in Perl, you must use the perlbug command line tool. This tool will ensure that your bug report includes all the relevant system and configuration information.
To browse existing Perl bugs and patches, you can use the web interface at http://rt.perl.org/.
Please check the archive of the perl5-porters list (see below) and/or the bug tracking system before submitting a bug report. Often, you'll find that the bug has been reported already.
You can log in to the bug tracking system and comment on existing bug reports. If you have additional information regarding an existing bug, please add it. This will help the porters fix the bug.
The perl5-porters (p5p) mailing list is where the Perl standard distribution is maintained and developed. The people who maintain Perl are also referred to as the "Perl 5 Porters", "p5p" or just the "porters".
A searchable archive of the list is available at http://www.xray.mpe.mpg.de/mailing-lists/perl5-porters/. There is also another archive at http://archive.develooper.com/perl5-porters@perl.org/.
The perl5-changes mailing list receives a copy of each patch that gets submitted to the maintenance and development branches of the perl repository. See http://lists.perl.org/list/perl5-changes.html for subscription and archive information.
Many porters are also active on the irc://irc.perl.org/#p5p channel. Feel free to join the channel and ask questions about hacking on the Perl core.
All of Perl's source code is kept centrally in a Git repository at perl5.git.perl.org. The repository contains many Perl revisions from Perl 1 onwards and all the revisions from Perforce, the previous version control system.
For much more detail on using git with the Perl repository, please see perlgit.
You will need a copy of Git for your computer. You can fetch a copy of the repository using the git protocol:
- % git clone git://perl5.git.perl.org/perl.git perl
This clones the repository and makes a local copy in the perl directory.
If you cannot use the git protocol for firewall reasons, you can also clone via http, though this is much slower:
- % git clone http://perl5.git.perl.org/perl.git perl
You may access the repository over the web. This allows you to browse the tree, see recent commits, subscribe to RSS feeds for the changes, search for particular commits and more. You may access it at http://perl5.git.perl.org/perl.git. A mirror of the repository is found at http://github.com/mirrors/perl.
You can also choose to use rsync to get a copy of the current source tree for the bleadperl branch and all maintenance branches:
- % rsync -avz rsync://perl5.git.perl.org/perl-current .
- % rsync -avz rsync://perl5.git.perl.org/perl-5.12.x .
- % rsync -avz rsync://perl5.git.perl.org/perl-5.10.x .
- % rsync -avz rsync://perl5.git.perl.org/perl-5.8.x .
- % rsync -avz rsync://perl5.git.perl.org/perl-5.6.x .
- % rsync -avz rsync://perl5.git.perl.org/perl-5.005xx .
(Add the --delete option to remove leftover files.)
To get a full list of the available sync points:
- % rsync perl5.git.perl.org::
If you have a commit bit, please see perlgit for more details on using git.
If you're planning to do more extensive work than a single small fix, we encourage you to read the documentation below. This will help you focus your work and make your patches easier to incorporate into the Perl source.
If you have a small patch to submit, please submit it via perlbug. You can also send email directly to perlbug@perl.org. Please note that messages sent to perlbug may be held in a moderation queue, so you won't receive a response immediately.
You'll know your submission has been processed when you receive an email from our ticket tracking system. This email will give you a ticket number. Once your patch has made it to the ticket tracking system, it will also be sent to the perl5-porters@perl.org list.
Patches are reviewed and discussed on the p5p list. Simple, uncontroversial patches will usually be applied without any discussion. When the patch is applied, the ticket will be updated and you will receive email. In addition, an email will be sent to the p5p list.
In other cases, the patch will need more work or discussion. That will happen on the p5p list.
You are encouraged to participate in the discussion and advocate for your patch. Sometimes your patch may get lost in the shuffle. It's appropriate to send a reminder email to p5p if no action has been taken in a month. Please remember that the Perl 5 developers are all volunteers, and be polite.
Changes are always applied directly to the main development branch, called "blead". Some patches may be backported to a maintenance branch. If you think your patch is appropriate for the maintenance branch (see MAINTENANCE BRANCHES in perlpolicy), please explain why when you submit it.
If you are submitting a code patch there are several things that you can do to help the Perl 5 Porters accept your patch.
If you used git to check out the Perl source, then using git format-patch will produce a patch in a style suitable for Perl. The format-patch command produces one patch file for each commit you made. If you prefer to send a single patch for all commits, you can use git diff.
- % git checkout blead
- % git pull
- % git diff blead my-branch-name
This produces a patch based on the difference between blead and your current branch. It's important to make sure that blead is up to date before producing the diff; that's why we call git pull first.
We strongly recommend that you use git if possible. It will make your life easier, and ours as well.
However, if you're not using git, you can still produce a suitable patch. You'll need a pristine copy of the Perl source to diff against. The porters prefer unified diffs. Using GNU diff, you can produce a diff like this:
- % diff -Npurd perl.pristine perl.mine
Make sure that you run make realclean in your copy of Perl to remove any build artifacts, or you may get a confusing result.
As you craft each patch you intend to submit to the Perl core, it's important to write a good commit message. This is especially important if your submission will consist of a series of commits.
The first line of the commit message should be a short description without a period. It should be no longer than the subject line of an email, 50 characters being a good rule of thumb.
A lot of Git tools (Gitweb, GitHub, git log --pretty=oneline, ...) will only display the first line (cut off at 50 characters) when presenting commit summaries.
The commit message should include a description of the problem that the patch corrects or new functionality that the patch adds.
As a general rule of thumb, your commit message should help a programmer who knows the Perl core quickly understand what you were trying to do, how you were trying to do it, and why the change matters to Perl.
Your commit message should describe why the change you are making is important. When someone looks at your change in six months or six years, your intent should be clear.
If you're deprecating a feature with the intent of later simplifying another bit of code, say so. If you're fixing a performance problem or adding a new feature to support some other bit of the core, mention that.
Your commit message should describe what part of the Perl core you're changing and what you expect your patch to do.
While it's not necessary for documentation changes, new tests or trivial patches, it's often worth explaining how your change works. Even if it's clear to you today, it may not be clear to a porter next month or next year.
A commit message isn't intended to take the place of comments in your code. Commit messages should describe the change you made, while code comments should describe the current state of the code.
If you've just implemented a new feature, complete with doc, tests and well-commented code, a brief commit message will often suffice. If, however, you've just changed a single character deep in the parser or lexer, you might need to write a small novel to ensure that future readers understand what you did and why you did it.
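Putting the advice above together, a commit message might look like this sketch (the message content is hypothetical):

```shell
# Hypothetical commit message following the guidelines above: a short
# subject with no period, a blank line, then the what and the why.
msg='Fix spelling errors in perlhack.pod

Several paragraphs misspelled "receive"; correct them so the document
reads cleanly.  No code changes.'

subject=$(printf '%s\n' "$msg" | head -n 1)
echo "subject: $subject (${#subject} chars)"
```

The subject line stays under 50 characters so tools like git log --pretty=oneline display it without truncation.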
Be sure to adequately comment your code. While commenting every line is unnecessary, anything that takes advantage of side effects of operators, that creates changes that will be felt outside of the function being patched, or that others may find confusing should be documented. If you are going to err, it is better to err on the side of adding too many comments than too few.
The best comments explain why the code does what it does, not what it does.
In general, please follow the particular style of the code you are patching.
In particular, follow these general guidelines for patching Perl sources:
8-wide tabs (no exceptions!)
4-wide indents for code, 2-wide indents for nested CPP #defines
Try hard not to exceed 79-columns
ANSI C prototypes
Uncuddled elses and "K&R" style for indenting control constructs
No C++ style (//) comments
Mark places that need to be revisited with XXX (and revisit often!)
Opening brace lines up with "if" when conditional spans multiple lines; should be at end-of-line otherwise
In function definitions, name starts in column 0 (return value is on previous line)
Single space after keywords that are followed by parens, no space between function name and following paren
Avoid assignments in conditionals, but if they're unavoidable, use extra paren, e.g. "if (a && (b = c)) ..."
"return foo;" rather than "return(foo);"
"if (!foo) ..." rather than "if (foo == FALSE) ..." etc.
Do not declare variables using "register". It may be counterproductive with modern compilers, and is deprecated in C++, under which the Perl source is regularly compiled.
In-line functions that are in headers that are accessible to XS code need to be able to compile without warnings with commonly used extra compilation flags, such as gcc's -Wswitch-default, which warns whenever a switch statement does not have a "default" case. The use of these extra flags is to catch potential problems in legal C code, and is often used by Perl aggregators, such as Linux distributors.
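As a sketch, a hypothetical function (not from the Perl source) laid out according to these guidelines:

```c
#include <assert.h>

/* Hypothetical example following the layout rules above: 4-wide indents,
 * return type on the line before the function name, uncuddled else,
 * braces at end-of-line for single-line conditionals, and no // comments. */
static int
clamp_to_byte(int value)
{
    if (value < 0) {
        value = 0;
    }
    else if (value > 255) {
        value = 255;
    }
    return value;    /* "return foo;" rather than "return(foo);" */
}
```

Note the function name starting in column 0 with the return type on the previous line, which makes definitions easy to find with grep ^name.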
If your patch changes code (rather than just changing documentation), you should also include one or more test cases which illustrate the bug you're fixing or validate the new functionality you're adding. In general, you should update an existing test file rather than create a new one.
Your test suite additions should generally follow these guidelines (courtesy of Gurusamy Sarathy <gsar@activestate.com>):
Know what you're testing. Read the docs, and the source.
Tend to fail, not succeed.
Interpret results strictly.
Use unrelated features (this will flush out bizarre interactions).
Use non-standard idioms (otherwise you are not testing TIMTOWTDI).
Avoid using hardcoded test numbers whenever possible (the EXPECTED/GOT found in t/op/tie.t is much more maintainable, and gives better failure reports).
Give meaningful error messages when a test fails.
Avoid using qx// and system() unless you are testing for them. If you do use them, make sure that you cover _all_ perl platforms.
Unlink any temporary files you create.
Promote unforeseen warnings to errors with $SIG{__WARN__}.
Be sure to use the libraries and modules shipped with the version being tested, not those that were already installed.
Add comments to the code explaining what you are testing for.
Make updating the '1..42' string unnecessary. Or make sure that you update it.
Test _all_ behaviors of a given operator, library, or function.
Test all optional arguments.
Test return values in various contexts (boolean, scalar, list, lvalue).
Use both global and lexical variables.
Don't forget the exceptional, pathological cases.
This works just like patching anything else, with one extra consideration.
Modules in the cpan/ directory of the source tree are maintained outside of the Perl core. When the author updates the module, the updates are simply copied into the core. See that module's documentation or its listing on http://search.cpan.org/ for more information on reporting bugs and submitting patches.
In most cases, patches to modules in cpan/ should be sent upstream and should not be applied to the Perl core individually. If a patch to a file in cpan/ absolutely cannot wait for the fix to be made upstream, released to CPAN and copied to blead, you must add (or update) a CUSTOMIZED entry in the "Porting/Maintainers.pl" file to flag that a local modification has been made. See "Porting/Maintainers.pl" for more details.
In contrast, modules in the dist/ directory are maintained in the core.
For changes significant enough to warrant a pod/perldelta.pod entry, the porters will greatly appreciate it if you submit a delta entry along with your actual change. Significant changes include, but are not limited to:
Adding, deprecating, or removing core features
Adding, deprecating, removing, or upgrading core or dual-life modules
Adding new core tests
Fixing security issues and user-visible bugs in the core
Changes that might break existing code, either on the perl or C level
Significant performance improvements
Adding, removing, or significantly changing documentation in the pod/ directory
Important platform-specific changes
Please make sure you add the perldelta entry to the right section within pod/perldelta.pod. More information on how to write good perldelta entries is available in the Style section of Porting/how_to_write_a_perldelta.pod.
New features and extensions to the language can be contentious. There is no specific set of criteria which determine what features get added, but here are some questions to consider when developing a patch:
Our goals include, but are not limited to:
Keep it fast, simple, and useful.
Keep features/concepts as orthogonal as possible.
No arbitrary limits (platforms, data sizes, cultures).
Keep it open and exciting to use/patch/advocate Perl everywhere.
Either assimilate new technologies, or build bridges to them.
All the talk in the world is useless without an implementation. In almost every case, the person or people who argue for a new feature will be expected to be the ones who implement it. Porters capable of coding new features have their own agendas, and are not available to implement your (possibly good) idea.
It's a cardinal sin to break existing Perl programs. New warnings can be contentious--some say that a program that emits warnings is not broken, while others say it is. Adding keywords has the potential to break programs, and changing the meaning of existing token sequences or functions might break programs.
The Perl 5 core includes mechanisms to help porters make backwards incompatible changes more compatible, such as the feature and deprecate modules. Please use them when appropriate.
Perl 5 has extension mechanisms, modules and XS, specifically to avoid the need to keep changing the Perl interpreter. You can write modules that export functions, you can give those functions prototypes so they can be called like built-in functions, you can even write XS code to mess with the runtime data structures of the Perl interpreter if you want to implement really complicated things.
Whenever possible, new features should be prototyped in a CPAN module before being considered for the core.
Is this something that only the submitter wants added to the language, or is it broadly useful? Sometimes, instead of adding a feature with a tight focus, the porters might decide to wait until someone implements the more generalized feature.
Radical rewrites of large chunks of the Perl interpreter have the potential to introduce new bugs.
The smaller and more localized the change, the better. Similarly, a series of small patches is greatly preferred over a single large patch.
A patch is likely to be rejected if it closes off future avenues of development. For instance, a patch that placed a true and final interpretation on prototypes is likely to be rejected because there are still options for the future of prototypes that haven't been addressed.
Good patches (tight code, complete, correct) stand more chance of going in. Sloppy or incorrect patches might be placed on the back burner until the pumpking has time to fix them, or might be discarded altogether without further notice.
The worst patches make use of system-specific features. It's highly unlikely that non-portable additions to the Perl language will be accepted.
Patches which change behaviour (fixing bugs or introducing new features) must include regression tests to verify that everything works as expected.
Without tests provided by the original author, how can anyone else changing perl in the future be sure that they haven't unwittingly broken the behaviour the patch implements? And without tests, how can the patch's author be confident that the hard work put into the patch won't be accidentally thrown away by someone in the future?
Patches without documentation are probably ill-thought out or incomplete. No features can be added or changed without documentation, so submitting a patch for the appropriate pod docs as well as the source code is important.
Larry said "Although the Perl Slogan is There's More Than One Way to Do It, I hesitate to make 10 ways to do something". This is a tricky heuristic to navigate, though--one man's essential addition is another man's pointless cruft.
Work for the pumpking, work for Perl programmers, work for module authors, ... Perl is supposed to be easy.
Working code is always preferred to pie-in-the-sky ideas. A patch to add a feature stands a much higher chance of making it to the language than does a random feature request, no matter how fervently argued the request might be. This ties into "Will it be useful?", as the fact that someone took the time to make the patch demonstrates a strong desire for the feature.
The core uses the same testing style as the rest of Perl, a simple "ok/not ok" run through Test::Harness, but there are a few special considerations.
There are three ways to write a test in the core: Test::More, t/test.pl, and ad hoc print $test ? "ok 42\n" : "not ok 42\n". The decision of which to use depends on what part of the test suite you're working on. This is a measure to prevent a high-level failure (such as Config.pm breaking) from causing basic functionality tests to fail.
The t/test.pl library provides some of the features of Test::More, but avoids loading most modules and uses as few core features as possible.
If you write your own test, use the Test Anything Protocol.
Since we don't know if require works, or even subroutines, use ad hoc tests for these three. Step carefully to avoid using the feature being tested. Tests in t/opbasic, for instance, have been placed there rather than in t/op because they test functionality which t/test.pl presumes has already been demonstrated to work.
Now that basic require() and subroutines are tested, you can use the t/test.pl library.
You can also use certain libraries like Config conditionally, but be sure to skip the test gracefully if it's not there.
Now that the core of Perl is tested, Test::More can and should be used. You can also use the full suite of core modules in the tests.
When you say "make test", Perl uses the t/TEST program to run the test suite (except under Win32 where it uses t/harness instead). All tests are run from the t/ directory, not the directory which contains the test. This causes some problems with the tests in lib/, so here's some opportunity for some patching.
You must be triply conscious of cross-platform concerns. This usually boils down to using File::Spec and avoiding things like fork() and system() unless absolutely necessary.
Special make test targets
There are various special make targets that can be used to test Perl slightly differently than the standard "test" target. Not all of them are expected to give a 100% success rate. Many of them have several aliases, and many of them are not available on certain operating systems.
test_porting: Runs some basic sanity tests on the source tree and helps catch basic errors before you submit a patch.
minitest: Run miniperl on the t/base, t/comp, t/cmd, t/run, t/io, t/op, t/uni and t/mro tests.
test.valgrind (Linux only): Run all the tests using the memory-leak and naughty-memory-access tool "valgrind". The log files will be named testname.valgrind.
Run the test suite with the t/harness controlling program, instead of t/TEST. t/harness is more sophisticated, and uses the Test::Harness module, thus using this test target supposes that perl mostly works. The main advantage for our purposes is that it prints a detailed summary of failed tests at the end. Also, unlike t/TEST, it doesn't redirect stderr to stdout.
Note that under Win32 t/harness is always used instead of t/TEST, so there is no special "test_harness" target.
Under Win32's "test" target you may use the TEST_SWITCHES and TEST_FILES environment variables to control the behaviour of t/harness. This means you can say
- nmake test TEST_FILES="op/*.t"
- nmake test TEST_SWITCHES="-torture" TEST_FILES="op/*.t"
test_notty: Sets PERL_SKIP_TTY_TEST to true before running the normal test.
The core distribution can now run its regression tests in parallel on Unix-like platforms. Instead of running make test, set TEST_JOBS in your environment to the number of tests to run in parallel, and run make test_harness. On a Bourne-like shell, this can be done as
- TEST_JOBS=3 make test_harness # Run 3 tests in parallel
An environment variable is used, rather than parallel make itself, because TAP::Harness needs to be able to schedule individual non-conflicting test scripts itself, and there is no standard interface to make utilities to interact with their job schedulers.
Note that currently some test scripts may fail when run in parallel (most notably ext/IO/t/io_dir.t). If necessary, run just the failing scripts again sequentially and see if the failures go away.
You can run part of the test suite by hand by using one of the following commands from the t/ directory:
- ./perl -I../lib TEST list-of-.t-files
or
- ./perl -I../lib harness list-of-.t-files
(If you don't specify test scripts, the whole test suite will be run.)
If you use harness for testing, you have several command line options available to you. The arguments are as follows, and are in the order that they must appear if used together.
- harness -v -torture -re=pattern LIST OF FILES TO TEST
- harness -v -torture -re LIST OF PATTERNS TO MATCH
If LIST OF FILES TO TEST is omitted, the file list is obtained from the manifest. The file list may include shell wildcards which will be expanded out.
-v: Run the tests under verbose mode so you can see what tests were run, and debug output.
-torture: Run the torture tests as well as the normal set.
-re=PATTERN: Filter the file list so that all the test files run match PATTERN. Note that this form is distinct from the -re LIST OF PATTERNS form below in that it allows the file list to be provided as well.
-re LIST OF PATTERNS: Filter the file list so that all the test files run match /(LIST|OF|PATTERNS)/. Note that with this form the patterns are joined by '|' and you cannot supply a list of files; instead the test files are obtained from the MANIFEST.
You can run an individual test by a command similar to
- ./perl -I../lib path/to/foo.t
except that the harnesses set up some environment variables that may affect the execution of the test:
PERL_CORE=1 indicates that we're running this test as part of the perl core test suite. This is useful for modules that have a dual life on CPAN.
PERL_DESTRUCT_LEVEL is set to 2 if it isn't set already (see PERL_DESTRUCT_LEVEL in perlhacktips).
(used only by t/TEST) if set, overrides the path to the perl executable that should be used to run the tests (the default being ./perl).
PERL_SKIP_TTY_TEST, if set, tells the test suite to skip the tests that need a terminal. It's actually set automatically by the Makefile, but can also be forced artificially by running 'make test_notty'.
Setting this variable runs all the Net::Ping modules tests, otherwise some tests that interact with the outside world are skipped. See perl58delta.
Setting this variable skips the vrexx.t tests for OS2::REXX.
This sets a variable in op/numconvert.t.
Setting this variable includes the tests in t/bigmem/. This should be set to the number of gigabytes of memory available for testing; e.g. PERL_TEST_MEMORY=4 indicates that tests that require 4GiB of available memory can be run safely.
See also the documentation for the Test and Test::Harness modules, for more environment variables that affect testing.
To hack on the Perl guts, you'll need to read the following things:
An overview of the Perl source tree. This will help you find the files you're looking for.
An overview of the Perl interpreter source code and some details on how Perl does what it does.
This document walks through the creation of a small patch to Perl's C code. If you're just getting started with Perl core hacking, this will help you understand how it works.
More details on hacking the Perl core. This document focuses on lower level details such as how to write tests, compilation issues, portability, debugging, etc.
If you plan on doing serious C hacking, make sure to read this.
This is of paramount importance, since it's the documentation of what goes where in the Perl source. Read it over a couple of times and it might start to make sense - don't worry if it doesn't yet, because the best way to study it is to read it in conjunction with poking at Perl source, and we'll do that later on.
Gisle Aas's "illustrated perlguts", also known as illguts, has very helpful pictures:
A working knowledge of XSUB programming is incredibly useful for core hacking; XSUBs use techniques drawn from the PP code, the portion of the guts that actually executes a Perl program. It's a lot gentler to learn those techniques from simple examples and explanation than from the core itself.
The documentation for the Perl API explains what some of the internal functions do, as well as the many macros used in the source.
This is a collection of words of wisdom for a Perl porter; some of it is only useful to the pumpkin holder, but most of it applies to anyone wanting to go about Perl development.
The CPAN testers ( http://testers.cpan.org/ ) are a group of volunteers who test CPAN modules on a variety of platforms.
Perl Smokers ( http://www.nntp.perl.org/group/perl.daily-build/ and http://www.nntp.perl.org/group/perl.daily-build.reports/ ) automatically test Perl source releases on platforms with various configurations.
Both efforts welcome volunteers. In order to get involved in smoke testing of the perl itself visit http://search.cpan.org/dist/Test-Smoke/. In order to start smoke testing CPAN modules visit http://search.cpan.org/dist/CPANPLUS-YACSmoke/ or http://search.cpan.org/dist/minismokebox/ or http://search.cpan.org/dist/CPAN-Reporter/.
If you've read this document and all the documentation listed above, you're more than ready to hack on Perl.
Here are some more recommendations:
Subscribe to perl5-porters, follow the patches and try and understand them; don't be afraid to ask if there's a portion you're not clear on - who knows, you may unearth a bug in the patch...
Do read the README associated with your operating system, e.g. README.aix on the IBM AIX OS. Don't hesitate to supply patches to that README if you find anything missing or changed over a new OS release.
Find an area of Perl that seems interesting to you, and see if you can work out how it works. Scan through the source, and step over it in the debugger. Play, poke, investigate, fiddle! You'll probably get to understand not just your chosen area but a much wider range of perl's activity as well, and probably sooner than you'd think.
If you can do these things, you've started on the long road to Perl porting. Thanks for wanting to help make Perl better - and happy hacking!
If you recognized the quote about the Road above, you're in luck.
Most software projects begin each file with a literal description of each file's purpose. Perl instead begins each with a literary allusion to that file's purpose.
Like chapters in many books, all top-level Perl source files (along with a few others here and there) begin with an epigrammatic inscription that alludes, indirectly and metaphorically, to the material you're about to read.
Quotations are taken from writings of J.R.R. Tolkien pertaining to his Legendarium, almost always from The Lord of the Rings. Chapters and page numbers are given using the following editions:
The Hobbit, by J.R.R. Tolkien. The hardcover, 70th-anniversary edition of 2007 was used, published in the UK by Harper Collins Publishers and in the US by the Houghton Mifflin Company.
The Lord of the Rings, by J.R.R. Tolkien. The hardcover, 50th-anniversary edition of 2004 was used, published in the UK by Harper Collins Publishers and in the US by the Houghton Mifflin Company.
The Lays of Beleriand, by J.R.R. Tolkien and published posthumously by his son and literary executor, C.J.R. Tolkien, being the 3rd of the 12 volumes in Christopher's mammoth History of Middle Earth. Page numbers derive from the hardcover edition, first published in 1983 by George Allen & Unwin; no page numbers changed for the special 3-volume omnibus edition of 2002 or the various trade-paper editions, all again now by Harper Collins or Houghton Mifflin.
Other JRRT books fair game for quotes would thus include The Adventures of Tom Bombadil, The Silmarillion, Unfinished Tales, and The Tale of the Children of Hurin, all but the first posthumously assembled by CJRT. But The Lord of the Rings itself is perfectly fine and probably best to quote from, provided you can find a suitable quote there.
So if you were to supply a new, complete, top-level source file to add to Perl, you should conform to this peculiar practice by yourself selecting an appropriate quotation from Tolkien, retaining the original spelling and punctuation and using the same format the rest of the quotes are in. Indirect and oblique is just fine; remember, it's a metaphor, so being meta is, after all, what it's for.
This document was originally written by Nathan Torkington, and is maintained by the perl5-porters mailing list.
perlhacktips - Tips for Perl core C code hacking
This document will help you learn the best way to go about hacking on the Perl core C code. It covers common problems, debugging, profiling, and more.
If you haven't read perlhack and perlhacktut yet, you might want to do that first.
Perl source plays by ANSI C89 rules: no C99 (or C++) extensions. In some cases we have to take pre-ANSI requirements into consideration. You don't care about some particular platform having broken Perl? I hear there is still a strong demand for J2EE programmers.
Not compiling with threading
Compiling with threading (-Duseithreads) completely rewrites the function prototypes of Perl. You'd better try your changes with that. Related to this is the difference between "Perl_-less" and "Perl_-ly" APIs, for example:
- Perl_sv_setiv(aTHX_ ...);
- sv_setiv(...);
The first one explicitly passes in the context, which is needed for e.g. threaded builds. The second one does that implicitly; do not get them mixed. If you are not passing in an aTHX_, you will need to do a dTHX (or a dVAR) as the first thing in the function.
See How multiple interpreters and concurrency are supported in perlguts for further discussion about context.
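A stand-alone analogy may make the distinction concrete. The names below are hypothetical, not the real Perl macros; they only mirror the explicit-versus-implicit context pattern:

```c
#include <assert.h>

/* Hypothetical analogy for the Perl_-ly vs Perl_-less forms.
 * my_setiv() takes the context explicitly, like Perl_sv_setiv(aTHX_ ...);
 * the setiv() macro supplies it implicitly, the way a non-threaded build
 * resolves the short form against the single interpreter. */
typedef struct { long iv; } my_context;

static my_context the_context;        /* stands in for the interpreter */

static void
my_setiv(my_context *cx, long v)
{
    cx->iv = v;
}

#define setiv(v) my_setiv(&the_context, (v))
```

In a threaded build the implicit form cannot exist without first fetching the context, which is what dTHX does at the top of a function.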
Not compiling with -DDEBUGGING
The DEBUGGING define exposes more code to the compiler, and therefore more ways for things to go wrong. You should try it.
Introducing (non-read-only) globals
Do not introduce any modifiable globals, truly global or file static. They are bad form and complicate multithreading and other forms of concurrency. The right way is to introduce them as new interpreter variables, see intrpvar.h (at the very end for binary compatibility).
Introducing read-only (const) globals is okay, as long as you verify with e.g. nm libperl.a | egrep -v ' [TURtr] ' (if your nm has BSD-style output) that the data you added really is read-only. (If it is, it shouldn't show up in the output of that command.)
If you want to have static strings, make them constant:
- static const char etc[] = "...";
If you want to have arrays of constant strings, note carefully the right combination of consts:
- static const char * const yippee[] =
- {"hi", "ho", "silver"};
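With that combination both the array of pointers and the strings they point to are read-only, so the whole table can live in a read-only data section; drop either const and part of the data becomes writable. A minimal compilable sketch:

```c
#include <assert.h>
#include <string.h>

/* Both consts: neither the pointers nor the characters they point to
 * can be modified, so this table is genuinely read-only data. */
static const char *const yippee[] = {"hi", "ho", "silver"};

static size_t
yippee_count(void)
{
    return sizeof yippee / sizeof yippee[0];
}
```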
There is a way to completely hide any modifiable globals (they are all moved to heap): the compilation setting -DPERL_GLOBAL_STRUCT_PRIVATE. It is not normally used, but can be used for testing; read more about it in Background and PERL_IMPLICIT_CONTEXT in perlguts.
Not exporting your new function
Some platforms (Win32, AIX, VMS, OS/2, to name a few) require any function that is part of the public API (the shared Perl library) to be explicitly marked as exported. See the discussion about embed.pl in perlguts.
Exporting your new function
The new shiny result of either genuine new functionality or your arduous refactoring is now ready and correctly exported. So what could possibly go wrong?
Maybe simply that your function did not need to be exported in the first place. Perl has a long and not so glorious history of exporting functions that it should not have.
If the function is used only inside one source code file, make it static. See the discussion about embed.pl in perlguts.
If the function is used across several files, but intended only for Perl's internal use (and this should be the common case), do not export it to the public API. See the discussion about embed.pl in perlguts.
The following are common causes of compilation and/or execution failures, not common to Perl as such. The C FAQ is good bedtime reading. Please test your changes with as many C compilers and platforms as possible; we will, anyway, and it's nice to save oneself from public embarrassment.
If using gcc, you can add the -std=c89 option, which will hopefully catch most of these unportabilities. (However, it might also catch incompatibilities in your system's header files.)
Use the Configure -Dgccansipedantic flag to enable the gcc -ansi -pedantic flags, which enforce stricter ANSI rules.
If using gcc -Wall, note that not all the possible warnings (like -Wuninitialized) are given unless you also compile with -O.
Note that if using gcc, starting from Perl 5.9.5 the Perl core source code files (the ones at the top level of the source code distribution, but not e.g. the extensions under ext/) are automatically compiled with as many as possible of the -std=c89, -ansi, and -pedantic flags, and a selection of -W flags (see cflags.SH).
Also study perlport carefully to avoid any bad assumptions about the operating system, filesystems, and so forth.
You may once in a while try a "make microperl" to see whether we can still compile Perl with just the bare minimum of interfaces. (See README.micro.)
Do not assume an operating system indicates a certain compiler.
Casting pointers to integers or casting integers to pointers
- void castaway(U8* p)
- {
- IV i = p;
or
- void castaway(U8* p)
- {
- IV i = (IV)p;
Both are bad, and broken, and unportable. Use the PTR2IV() macro that does it right. (Likewise, there are PTR2UV(), PTR2NV(), INT2PTR(), and NUM2PTR().)
Casting between function pointers and data pointers
Technically speaking, casting between function pointers and data pointers is unportable and undefined, but practically speaking it seems to work. Even so, you should use the FPTR2DPTR() and DPTR2FPTR() macros. Sometimes you can also play games with unions.
Assuming sizeof(int) == sizeof(long)
There are platforms where longs are 64 bits, and platforms where ints are 64 bits, and while we are out to shock you, even platforms where shorts are 64 bits. This is all legal according to the C standard. (In other words, "long long" is not a portable way to specify 64 bits, and "long long" is not even guaranteed to be any wider than "long".)
Instead, use the definitions IV, UV, IVSIZE, I32SIZE, and so forth. Avoid things like I32 because they are not guaranteed to be exactly 32 bits: they are at least 32 bits, and they are not guaranteed to be int or long. If you really explicitly need 64-bit variables, use I64 and U64, but only if guarded by HAS_QUAD.
Assuming one can dereference any type of pointer for any type of data
- char *p = ...;
- long pony = *p; /* BAD */
Many platforms, quite rightly so, will give you a core dump instead of a pony if the p happens not to be correctly aligned.
Lvalue casts
- (int)*p = ...; /* BAD */
Simply not portable. Get your lvalue to be of the right type, or maybe use temporary variables, or dirty tricks with unions.
Assume anything about structs (especially the ones you don't control, like the ones coming from the system headers)
That a certain field exists in a struct
That no other fields exist besides the ones you know of
That a field is of certain signedness, sizeof, or type
That the fields are in a certain order
While C guarantees the ordering specified in the struct definition, between different platforms the definitions might differ
That the sizeof(struct) or the alignments are the same everywhere
There might be padding bytes between the fields to align the fields - the bytes can be anything
Structs are required to be aligned to the maximum alignment required by the fields, which for native types is usually equivalent to sizeof() of the field
Assuming the character set is ASCIIish
Perl can compile and run under EBCDIC platforms. See perlebcdic. This is transparent for the most part, but because the character sets differ, you shouldn't use numeric (decimal, octal, nor hex) constants to refer to characters. You can safely say 'A', but not 0x41. You can safely say '\n', but not \012. If a character doesn't have a trivial input form, you should add it to the list in regen/unicode_constants.pl, and have Perl create #defines for you, based on the current platform.
Also, the range 'A' - 'Z' in ASCII is an unbroken sequence of 26 upper case alphabetic characters. That is not true in EBCDIC. Nor for 'a' to 'z'. But '0' - '9' is an unbroken range in both systems. Don't assume anything about other ranges.
Many of the comments in the existing code ignore the possibility of EBCDIC, and may therefore be wrong even if the code works. This is actually a tribute to how successfully EBCDIC handling was inserted transparently, without having to change pre-existing code.
UTF-8 and UTF-EBCDIC are two different encodings used to represent Unicode code points as sequences of bytes. Macros with the same names (but different definitions) in utf8.h and utfebcdic.h are used to allow the calling code to think that there is only one such encoding. This is almost always referred to as utf8, but it means the EBCDIC version as well. Again, comments in the code may well be wrong even if the code itself is right. For example, the concept of invariant characters differs between ASCII and EBCDIC. On ASCII platforms, only characters that do not have the high-order bit set (i.e. whose ordinals are strict ASCII, 0 - 127) are invariant, and the documentation and comments in the code may assume that, often referring to something like, say, hibit. The situation differs and is not so simple on EBCDIC machines, but as long as the code itself uses the NATIVE_IS_INVARIANT() macro appropriately, it works, even if the comments are wrong.
Assuming the character set is just ASCII
ASCII is a 7 bit encoding, but bytes have 8 bits in them. The 128 extra characters have different meanings depending on the locale. Absent a locale, currently these extra characters are generally considered to be unassigned, and this has presented some problems. This is being changed starting in 5.12 so that these characters will be considered to be Latin-1 (ISO-8859-1).
Mixing #define and #ifdef
You cannot portably "stack" cpp directives inside a macro definition. For example, the following is broken:
- #define BURGLE(x) ... \
- #ifdef BURGLE_OLD_STYLE /* BAD */
- burgle_old(x); /* BAD */
- #else /* BAD */
- burgle(x); /* BAD */
- #endif /* BAD */
You need two separate BURGLE() #defines, one for each #ifdef branch.
Adding non-comment stuff after #endif or #else
- #ifdef SNOSH
- ...
- #else !SNOSH /* BAD */
- ...
- #endif SNOSH /* BAD */
The #endif and #else cannot portably have anything non-comment after them. If you want to document what is going (which is a good idea especially if the branches are long), use (C) comments:
- #ifdef SNOSH
- ...
- #else /* !SNOSH */
- ...
- #endif /* SNOSH */
The gcc option -Wendif-labels warns about the bad variant (on by default starting from Perl 5.9.4).
Having a comma after the last element of an enum list
- enum color {
- CERULEAN,
- CHARTREUSE,
- CINNABAR, /* BAD */
- };
is not portable. Leave out the last comma.
Also note that whether enums are implicitly convertible to ints varies between compilers; you might need an explicit (int) cast.
Using //-comments
- // This function bamfoodles the zorklator. /* BAD */
That is C99 or C++. Perl is C89. Using the //-comments is silently allowed by many C compilers but cranking up the ANSI C89 strictness (which we like to do) causes the compilation to fail.
Mixing declarations and code
- void zorklator()
- {
- int n = 3;
- set_zorkmids(n); /* BAD */
- int q = 4;
That is C99 or C++. Some C compilers allow that, but you shouldn't.
The gcc option -Wdeclaration-after-statement scans for such problems (on by default starting from Perl 5.9.4).
Introducing variables inside for()
- for(int i = ...; ...; ...) { /* BAD */
That is C99 or C++. While it would indeed be awfully nice to have that also in C89, to limit the scope of the loop variable, alas, we cannot.
Mixing signed char pointers with unsigned char pointers
- int foo(char *s) { ... }
- ...
- unsigned char *t = ...; /* Or U8* t = ... */
- foo(t); /* BAD */
While this is legal practice, it is certainly dubious, and downright fatal on at least one platform: VMS cc, for example, considers it a fatal error. One cause of people often making this mistake is that a "naked char", and therefore dereferencing a "naked char pointer", has undefined signedness: whether the result is signed or unsigned depends on the compiler, the compiler's flags, and the underlying platform. For this very same reason, using a 'char' as an array index is bad.
Macros that have string constants and their arguments as substrings of the string constants
- #define FOO(n) printf("number = %d\n", n) /* BAD */
- FOO(10);
Pre-ANSI semantics for that was equivalent to
- printf("10umber = %d\10");
which is probably not what you were expecting. Unfortunately, at least one reasonably common and modern C compiler does "real backward compatibility" here: in AIX that is what still happens, even though the rest of the AIX compiler is very happily C89.
Using printf formats for non-basic C types
- IV i = ...;
- printf("i = %d\n", i); /* BAD */
While this might by accident work on some platform (where IV happens to be an int), in general it cannot. IV might be something larger. The situation is even worse with more specific types (defined by Perl's configuration step in config.h):
- Uid_t who = ...;
- printf("who = %d\n", who); /* BAD */
The problem here is that Uid_t might be not only not int-wide but it might also be unsigned, in which case large uids would be printed as negative values.
There is no simple solution to this because of printf()'s limited intelligence, but for many types the right format is available with either an 'f' or '_f' suffix, for example:
- IVdf /* IV in decimal */
- UVxf /* UV in hexadecimal */
- printf("i = %"IVdf"\n", i); /* The IVdf is a string constant. */
- Uid_t_f /* Uid_t in decimal */
- printf("who = %"Uid_t_f"\n", who);
Or you can try casting to a "wide enough" type:
- printf("i = %"IVdf"\n", (IV)something_very_small_and_signed);
Also remember that the %p format really does require a void pointer:
- U8* p = ...;
- printf("p = %p\n", (void*)p);
The gcc option -Wformat scans for such problems.
Blindly using variadic macros
gcc has had them for a while with its own syntax, and C99 brought them with a standardized syntax. Don't use the former, and use the latter only if the HAS_C99_VARIADIC_MACROS is defined.
Blindly passing va_list
Not all platforms support passing va_list to further varargs (stdarg) functions. The right thing to do is to copy the va_list using the Perl_va_copy() if the NEED_VA_COPY is defined.
Using gcc statement expressions
- val = ({...;...;...}); /* BAD */
While a nice extension, it's not portable. The Perl code does admittedly use them if available to gain some extra speed (essentially as a funky form of inlining), but you shouldn't.
Binding together several statements in a macro
Use the macros STMT_START and STMT_END.
- STMT_START {
- ...
- } STMT_END
Testing for operating systems or versions when you should be testing for features
- #ifdef __FOONIX__ /* BAD */
- foo = quux();
- #endif
Unless you know with 100% certainty that quux() is only ever available for the "Foonix" operating system and that is available and correctly working for all past, present, and future versions of "Foonix", the above is very wrong. This is more correct (though still not perfect, because the below is a compile-time check):
- #ifdef HAS_QUUX
- foo = quux();
- #endif
How does the HAS_QUUX become defined where it needs to be? Well, if Foonix happens to be Unixy enough to be able to run the Configure script, and Configure has been taught about detecting and testing quux(), the HAS_QUUX will be correctly defined. In other platforms, the corresponding configuration step will hopefully do the same.
In a pinch, if you cannot wait for Configure to be educated, or if you have a good hunch of where quux() might be available, you can temporarily try the following:
- #if (defined(__FOONIX__) || defined(__BARNIX__))
- # define HAS_QUUX
- #endif
- ...
- #ifdef HAS_QUUX
- foo = quux();
- #endif
But in any case, try to keep the features and operating systems separate.
malloc(0), realloc(0), calloc(0, 0) are non-portable. To be portable allocate at least one byte. (In general you should rarely need to work at this low level, but instead use the various malloc wrappers.)
snprintf() - the return type is unportable. Use my_snprintf() instead.
Last but not least, here are various tips for safer coding.
Do not use gets()
Or we will publicly ridicule you. Seriously.
Do not use strcpy() or strcat() or strncpy() or strncat()
Use my_strlcpy() and my_strlcat() instead: they either use the native implementation, or Perl's own implementation (borrowed from the public domain implementation of INN).
Do not use sprintf() or vsprintf()
If you really want just plain byte strings, use my_snprintf() and my_vsnprintf() instead, which will try to use snprintf() and vsnprintf() if those safer APIs are available. If you want something fancier than a plain byte string, use SVs and Perl_sv_catpvf().
You can compile a special debugging version of Perl, which allows you to use the -D option of Perl to tell more about what Perl is doing. But sometimes there is no alternative but to dive in with a debugger, either to see the stack trace of a core dump (very useful in a bug report), to figure out what went wrong before the core dump happened, or to see how we ended up with wrong or unexpected results.
To really poke around with Perl, you'll probably want to build Perl for debugging, like this:
- ./Configure -d -Doptimize=-g
- make
-g is a flag to the C compiler to have it produce debugging information which will allow us to step through a running program, and to see in which C function we are (without the debugging information we might see only the numerical addresses of the functions, which is not very helpful).
Configure will also turn on the DEBUGGING compilation symbol which enables all the internal debugging code in Perl. There are a whole bunch of things you can debug with this: perlrun lists them all, and the best way to find out about them is to play about with them. The most useful options are probably
- l Context (loop) stack processing
- t Trace execution
- o Method and overloading resolution
- c String/numeric conversions
Some of the functionality of the debugging code can be achieved using XS modules.
If the debugging output of -D doesn't help you, it's time to step through perl's execution with a source-level debugger. We'll use gdb for our examples here; the principles will apply to any debugger (many vendors call their debugger dbx), but check the manual of the one you're using.
To fire up the debugger, type
- gdb ./perl
Or if you have a core dump:
- gdb ./perl core
You'll want to do that in your Perl source tree so the debugger can read the source code. You should see the copyright message, followed by the prompt.
- (gdb)
help will get you into the documentation, but here are the most useful commands:
run [args]: Run the program with the given arguments.
break function_name, or break source.c:xxx: Tells the debugger that we'll want to pause execution when we reach either the named function (but see Internal Functions in perlguts!) or the given line in the named source file.
step: Steps through the program a line at a time.
next: Steps through the program a line at a time, without descending into functions.
continue: Run until the next breakpoint.
finish: Run until the end of the current function, then stop again.
'enter': Just pressing Enter will do the most recent operation again - it's a blessing when stepping through miles of source code.
print: Execute the given C code and print its results. WARNING: Perl makes heavy use of macros, and gdb does not necessarily support macros (see later gdb macro support). You'll have to substitute them yourself, or invoke cpp on the source code files (see The .i Targets). So, for instance, you can't say
- print SvPV_nolen(sv)
but you have to say
- print Perl_sv_2pv_nolen(sv)
You may find it helpful to have a "macro dictionary", which you can produce by saying cpp -dM perl.c | sort. Even then, cpp won't recursively apply those macros for you.
Recent versions of gdb have fairly good macro support, but in order to use it you'll need to compile perl with macro definitions included in the debugging information. Using gcc version 3.1, this means configuring with -Doptimize=-g3. Other compilers might use a different switch (if they support debugging macros at all).
One way to get around this macro hell is to use the dumping functions in dump.c; these work a little like an internal Devel::Peek, but they also cover OPs and other structures that you can't get at from Perl. Let's take an example. We'll use the $a = $b + $c we used before, but give it a bit of context: $b = "6XXXX"; $c = 2.3;. Where's a good place to stop and poke around?
What about pp_add, the function we examined earlier to implement the + operator:
- (gdb) break Perl_pp_add
- Breakpoint 1 at 0x46249f: file pp_hot.c, line 309.
Notice we use Perl_pp_add and not pp_add - see Internal Functions in perlguts. With the breakpoint in place, we can run our program:
- (gdb) run -e '$b = "6XXXX"; $c = 2.3; $a = $b + $c'
Lots of junk will go past as gdb reads in the relevant source files and libraries, and then:
- Breakpoint 1, Perl_pp_add () at pp_hot.c:309
- 309 dSP; dATARGET; tryAMAGICbin(add,opASSIGN);
- (gdb) step
- 311 dPOPTOPnnrl_ul;
- (gdb)
We looked at this bit of code before, and we said that dPOPTOPnnrl_ul arranges for two NVs to be placed into left and right - let's slightly expand it:
- #define dPOPTOPnnrl_ul NV right = POPn; \
- SV *leftsv = TOPs; \
- NV left = USE_LEFT(leftsv) ? SvNV(leftsv) : 0.0
POPn takes the SV from the top of the stack and obtains its NV either directly (if SvNOK is set) or by calling the sv_2nv function. TOPs takes the next SV from the top of the stack - yes, POPn uses TOPs - but doesn't remove it. We then use SvNV to get the NV from leftsv in the same way as before - yes, POPn uses SvNV.
Since we don't have an NV for $b, we'll have to use sv_2nv to convert it. If we step again, we'll find ourselves there:
- Perl_sv_2nv (sv=0xa0675d0) at sv.c:1669
- 1669 if (!sv)
- (gdb)
We can now use Perl_sv_dump to investigate the SV:
- (gdb) print Perl_sv_dump(sv)
to investigate the SV:
- SV = PV(0xa057cc0) at 0xa0675d0
- REFCNT = 1
- FLAGS = (POK,pPOK)
- PV = 0xa06a510 "6XXXX"\0
- CUR = 5
- LEN = 6
- $1 = void
We know we're going to get 6 from this, so let's finish the subroutine:
- (gdb) finish
- Run till exit from #0 Perl_sv_2nv (sv=0xa0675d0) at sv.c:1671
- 0x462669 in Perl_pp_add () at pp_hot.c:311
- 311 dPOPTOPnnrl_ul;
We can also dump out this op: the current op is always stored in PL_op, and we can dump it with Perl_op_dump. This'll give us similar output to B::Debug.
- {
- 13 TYPE = add ===> 14
- TARG = 1
- FLAGS = (SCALAR,KIDS)
- {
- TYPE = null ===> (12)
- (was rv2sv)
- FLAGS = (SCALAR,KIDS)
- {
- 11 TYPE = gvsv ===> 12
- FLAGS = (SCALAR)
- GV = main::b
- }
- }
# finish this later #
Various tools exist for analysing C source code statically, as opposed to dynamically, that is, without executing the code. It is possible to detect resource leaks, undefined behaviour, type mismatches, portability problems, code paths that would cause illegal memory accesses, and other similar problems by just parsing the C code and looking at the resulting graph and what it tells about the execution and data flows. As a matter of fact, this is exactly how C compilers know to give warnings about dubious code.
The good old C code quality inspector, lint, is available on several platforms, but please be aware that there are several different implementations of it by different vendors, which means that the flags are not identical across different platforms.
There is a lint variant called splint (Secure Programming Lint) available from http://www.splint.org/ that should compile on any Unix-like platform.
There are lint and splint targets in Makefile, but you may have to diddle with the flags (see above).
Coverity (http://www.coverity.com/) is a product similar to lint and as a testbed for their product they periodically check several open source projects, and they give out accounts to open source developers to the defect databases.
The cpd tool detects cut-and-paste coding. If one instance of the cut-and-pasted code changes, all the other spots should probably be changed, too. Therefore such code should probably be turned into a subroutine or a macro.
cpd (http://pmd.sourceforge.net/cpd.html) is part of the pmd project (http://pmd.sourceforge.net/). pmd was originally written for static analysis of Java code, but later the cpd part of it was extended to parse also C and C++.
Download the pmd-bin-X.Y.zip from the SourceForge site, extract the pmd-X.Y.jar from it, and then run that on source code thusly:
- java -cp pmd-X.Y.jar net.sourceforge.pmd.cpd.CPD \
- --minimum-tokens 100 --files /some/where/src --language c > cpd.txt
You may run into memory limits, in which case you should use the -Xmx option:
- java -Xmx512M ...
Though much can be written about the inconsistency and coverage problems of gcc warnings (like -Wall not meaning "all the warnings", or some common portability problems not being covered by -Wall, or -ansi and -pedantic both being a poorly defined collection of warnings, and so forth), gcc is still a useful tool in keeping our coding nose clean.
The -Wall is on by default.
The -ansi (and its sidekick, -pedantic) would be nice to have on always, but unfortunately they are not safe on all platforms; they can for example cause fatal conflicts with the system headers (Solaris being a prime example). If Configure -Dgccansipedantic is used, the cflags frontend selects -ansi -pedantic for the platforms where they are known to be safe.
Starting from Perl 5.9.4 the following extra flags are added:
-Wendif-labels
-Wextra
-Wdeclaration-after-statement
The following flags would be nice to have but they would first need their own Augean stablemaster:
-Wpointer-arith
-Wshadow
-Wstrict-prototypes
The -Wtraditional is another example of the annoying tendency of gcc to bundle a lot of warnings under one switch (it would be impossible to deploy in practice because it would complain a lot), but it does contain some warnings that would be beneficial to have available on their own, such as the warning about string constants inside macros containing the macro arguments: this behaved differently pre-ANSI than it does in ANSI, and some C compilers are still in transition, AIX being an example.
Other C compilers (yes, there are other C compilers than gcc) often have their "strict ANSI" or "strict ANSI with some portability extensions" modes on, like for example the Sun Workshop has its -Xa mode on (though implicitly), or the DEC (these days, HP...) has its -std1 mode on.
NOTE 1: Running under older memory debuggers such as Purify, valgrind or Third Degree greatly slows down the execution: seconds become minutes, minutes become hours. For example, as of Perl 5.8.1, the ext/Encode/t/Unicode.t takes extraordinarily long to complete under e.g. Purify, Third Degree, and valgrind. Under valgrind it takes more than six hours, even on a snappy computer. The said test must be doing something that is quite unfriendly to memory debuggers. If you don't feel like waiting, you can simply kill the perl process. Roughly, valgrind slows down execution by a factor of 10, AddressSanitizer by a factor of 2.
NOTE 2: To minimize the number of memory leak false alarms (see PERL_DESTRUCT_LEVEL for more information), you have to set the environment variable PERL_DESTRUCT_LEVEL to 2.
For csh-like shells:
- setenv PERL_DESTRUCT_LEVEL 2
For Bourne-type shells:
- PERL_DESTRUCT_LEVEL=2
- export PERL_DESTRUCT_LEVEL
In Unixy environments you can also use the env command:
- env PERL_DESTRUCT_LEVEL=2 valgrind ./perl -Ilib ...
NOTE 3: There are known memory leaks when there are compile-time errors within eval or require; seeing S_doeval in the call stack is a good sign of these. Fixing these leaks is non-trivial, unfortunately, but they must be fixed eventually.
NOTE 4: DynaLoader will not clean up after itself completely unless Perl is built with the Configure option -Accflags=-DDL_UNLOAD_ALL_AT_EXIT.
Purify is a commercial tool that is helpful in identifying memory overruns, wild pointers, memory leaks and other such badness. Perl must be compiled in a specific way for optimal testing with Purify. Purify is available under Windows NT, Solaris, HP-UX, SGI, and Siemens Unix.
On Unix, Purify creates a new Perl binary. To get the most benefit out of Purify, you should create the perl to Purify using:
- sh Configure -Accflags=-DPURIFY -Doptimize='-g' \
- -Uusemymalloc -Dusemultiplicity
where these arguments mean:
-Accflags=-DPURIFY: Disables Perl's arena memory allocation functions, as well as forcing use of memory allocation functions derived from the system malloc.
-Doptimize='-g': Adds debugging information so that you see the exact source statements where the problem occurs. Without this flag, all you will see is the source filename of where the error occurred.
-Uusemymalloc: Disables Perl's malloc so that Purify can more closely monitor allocations and leaks. Using Perl's malloc will make Purify report most leaks in the "potential" leaks category.
-Dusemultiplicity: Enabling the multiplicity option allows perl to clean up thoroughly when the interpreter shuts down, which reduces the number of bogus leak reports from Purify.
Once you've compiled a perl suitable for Purify'ing, then you can just:
- make pureperl
which creates a binary named 'pureperl' that has been Purify'ed. This binary is used in place of the standard 'perl' binary when you want to debug Perl memory problems.
As an example, to show any memory leaks produced during the standard Perl testset you would create and run the Purify'ed perl as:
- make pureperl
- cd t
- ../pureperl -I../lib harness
which would run Perl on test.pl and report any memory problems.
Purify outputs messages in "Viewer" windows by default. If you don't have a windowing environment or if you simply want the Purify output to unobtrusively go to a log file instead of to the interactive window, use these following options to output to the log file "perl.log":
- setenv PURIFYOPTIONS "-chain-length=25 -windows=no \
- -log-file=perl.log -append-logfile=yes"
If you plan to use the "Viewer" windows, then you only need this option:
- setenv PURIFYOPTIONS "-chain-length=25"
In Bourne-type shells:
- PURIFYOPTIONS="..."
- export PURIFYOPTIONS
or if you have the "env" utility:
- env PURIFYOPTIONS="..." ../pureperl ...
Purify on Windows NT instruments the Perl binary 'perl.exe' on the fly. There are several options in the makefile you should change to get the most use out of Purify:
You should add -DPURIFY to the DEFINES line so the DEFINES line looks something like:
- DEFINES = -DWIN32 -D_CONSOLE -DNO_STRICT $(CRYPT_FLAG) -DPURIFY=1
to disable Perl's arena memory allocation functions, as well as to force use of memory allocation functions derived from the system malloc.
USE_MULTI = define: Enabling the multiplicity option allows perl to clean up thoroughly when the interpreter shuts down, which reduces the number of bogus leak reports from Purify.
#PERL_MALLOC = define: Disables Perl's malloc (by leaving this line commented out) so that Purify can more closely monitor allocations and leaks. Using Perl's malloc will make Purify report most leaks in the "potential" leaks category.
CFG = Debug: Adds debugging information so that you see the exact source statements where the problem occurs. Without this flag, all you will see is the source filename of where the error occurred.
As an example, to show any memory leaks produced during the standard Perl testset you would create and run Purify as:
- cd win32
- make
- cd ../t
- purify ../perl -I../lib harness
which would instrument Perl in memory, run Perl on test.pl, then finally report any memory problems.
The valgrind tool can be used to find out both memory leaks and illegal heap memory accesses. As of version 3.3.0, Valgrind only supports Linux on x86, x86-64 and PowerPC, and Darwin (OS X) on x86 and x86-64. The special "test.valgrind" target can be used to run the tests under valgrind. Found errors and memory leaks are logged in files named testfile.valgrind.
Valgrind also provides a cachegrind tool, invoked on perl as:
- VG_OPTS=--tool=cachegrind make test.valgrind
As system libraries (most notably glibc) also trigger errors, valgrind allows such errors to be suppressed using suppression files. The default suppression file that comes with valgrind already catches a lot of them. Some additional suppressions are defined in t/perl.supp.
To get valgrind and for more information see
- http://valgrind.org/
AddressSanitizer is a clang extension, included in clang since v3.1. It checks illegal heap pointers, global pointers, stack pointers and use after free errors, and is fast enough that you can easily compile your debugging or optimized perl with it. It does not check memory leaks though. AddressSanitizer is available for linux, Mac OS X and soon on Windows.
To build perl with AddressSanitizer, your Configure invocation should look like:
- sh Configure -des -Dcc=clang \
- -Accflags=-faddress-sanitizer -Aldflags=-faddress-sanitizer \
- -Alddlflags=-shared\ -faddress-sanitizer
where these arguments mean:
-Dcc=clang: This should be replaced by the full path to your clang executable if it is not in your path.
-Accflags=-faddress-sanitizer: Compile perl and extensions sources with AddressSanitizer.
-Aldflags=-faddress-sanitizer: Link the perl executable with AddressSanitizer.
-Alddlflags=-shared\ -faddress-sanitizer: Link dynamic extensions with AddressSanitizer. You must manually specify -shared because using -Alddlflags=-shared alone would prevent Configure from setting a default value for lddlflags, which usually contains -shared (at least on linux).
See also http://code.google.com/p/address-sanitizer/wiki/AddressSanitizer.
Depending on your platform there are various ways of profiling Perl.
There are two commonly used techniques of profiling executables: statistical time-sampling and basic-block counting.
The first method periodically samples the CPU program counter, and since the program counter can be correlated with the code generated for functions, we get a statistical view of which functions the program is spending its time in. The caveats are that very small or fast functions have a lower probability of showing up in the profile, and that periodically interrupting the program (this is usually done rather frequently, on the scale of milliseconds) imposes an additional overhead that may skew the results. The first problem can be alleviated by running the code for longer (in general this is a good idea for profiling); the second problem is usually kept in check by the profiling tools themselves.
The second method divides up the generated code into basic blocks. Basic blocks are sections of code that are entered only at the beginning and exited only at the end. For example, a conditional jump starts a basic block. Basic block profiling usually works by instrumenting the code: "enter basic block #nnnn" book-keeping code is added to the generated code, and during execution the basic block counters are updated appropriately. The caveat is that the added extra code can skew the results; again, the profiling tools usually try to factor their own effects out of the results.
gprof is a profiling tool available on many Unix platforms; it uses statistical time-sampling.
You can build a profiled version of perl called "perl.gprof" by invoking the make target "perl.gprof" (what is required is that Perl must be compiled using the -pg flag; you may need to re-Configure).
Running the profiled version of Perl will create an output file called gmon.out, which contains the profiling data collected during the execution.
The gprof tool can then display the collected data in various ways. Usually gprof understands the following options:
Suppress statically defined functions from the profile.
Suppress the verbose descriptions in the profile.
Exclude the given routine and its descendants from the profile.
Display only the given routine and its descendants in the profile.
Generate a summary file called gmon.sum which then may be given to subsequent gprof runs to accumulate data over several runs.
Display routines that have zero usage.
For a more detailed explanation of the available commands and output formats, see your local gprof documentation.
quick hint:
- $ sh Configure -des -Dusedevel -Doptimize='-pg' && make perl.gprof
- $ ./perl.gprof someprog # creates gmon.out in current directory
- $ gprof ./perl.gprof > out
- $ view out
Starting from GCC 3.0, basic block profiling is officially available in GNU CC.
You can build a profiled version of perl called perl.gcov by invoking the make target "perl.gcov" (what is required is that Perl must be compiled using gcc with the flags -fprofile-arcs -ftest-coverage; you may need to re-Configure).
Running the profiled version of Perl will cause profile output to be generated. For each source file an accompanying ".da" file will be created.
To display the results you use the "gcov" utility (which should be installed if you have gcc 3.0 or newer installed). gcov is run on source code files, like this
- gcov sv.c
which will cause sv.c.gcov to be created. The .gcov files contain the source code annotated with relative frequencies of execution indicated by "#" markers.
Useful options of gcov include -b, which will summarise the basic block, branch, and function call coverage, and -c, which will use the actual counts instead of relative frequencies. For more information on the use of gcov and basic block profiling with gcc, see the latest GNU CC manual; as of GCC 3.0 see
- http://gcc.gnu.org/onlinedocs/gcc-3.0/gcc.html
and its section titled "8. gcov: a Test Coverage Program"
- http://gcc.gnu.org/onlinedocs/gcc-3.0/gcc_8.html#SEC132
quick hint:
- $ sh Configure -des -Dusedevel -Doptimize='-g' \
- -Accflags='-fprofile-arcs -ftest-coverage' \
- -Aldflags='-fprofile-arcs -ftest-coverage' && make perl.gcov
- $ rm -f regexec.c.gcov regexec.gcda
- $ ./perl.gcov
- $ gcov regexec.c
- $ view regexec.c.gcov
If you want to run any of the tests yourself manually using e.g. valgrind, or the pureperl or perl.third executables, please note that by default perl does not explicitly clean up all the memory it has allocated (such as global memory arenas) but instead lets the exit() of the whole program "take care" of such allocations, also known as "global destruction of objects".
There is a way to tell perl to do complete cleanup: set the environment variable PERL_DESTRUCT_LEVEL to a non-zero value. The t/TEST wrapper does set this to 2, and this is what you need to do too, if you don't want to see the "global leaks": For example, for "third-degreed" Perl:
- env PERL_DESTRUCT_LEVEL=2 ./perl.third -Ilib t/foo/bar.t
(Note: the mod_perl apache module also uses this environment variable for its own purposes and extends its semantics. Refer to the mod_perl documentation for more information. Also, spawned threads do the equivalent of setting this variable to the value 1.)
If, at the end of a run you get the message N scalars leaked, you can recompile with -DDEBUG_LEAKING_SCALARS, which will cause the addresses of all those leaked SVs to be dumped along with details as to where each SV was originally allocated. This information is also displayed by Devel::Peek. Note that the extra details recorded with each SV increase memory usage, so it shouldn't be used in production environments. It also converts new_SV() from a macro into a real function, so you can use your favourite debugger to discover where those pesky SVs were allocated.
If you see that you're leaking memory at runtime, but neither valgrind nor -DDEBUG_LEAKING_SCALARS will find anything, you're probably leaking SVs that are still reachable and will be properly cleaned up during destruction of the interpreter. In such cases, using the -Dm switch can point you to the source of the leak. If the executable was built with -DDEBUG_LEAKING_SCALARS, -Dm will output SV allocations in addition to memory allocations. Each SV allocation has a distinct serial number that will be written on creation and destruction of the SV. So if you're executing the leaking code in a loop, you need to look for SVs that are created, but never destroyed, between each cycle. If such an SV is found, set a conditional breakpoint within new_SV() and make it break only when PL_sv_serial is equal to the serial number of the leaking SV. Then you will catch the interpreter in exactly the state where the leaking SV is allocated, which is sufficient in many cases to find the source of the leak.
As -Dm uses the PerlIO layer for output, it will by itself allocate quite a few SVs, which are hidden to avoid recursion. You can bypass the PerlIO layer by using the SV logging provided by -DPERL_MEM_LOG instead.
If compiled with -DPERL_MEM_LOG, both memory and SV allocations go through logging functions, which is handy for breakpoint setting. Unless -DPERL_MEM_LOG_NOIMPL is also compiled, the logging functions read $ENV{PERL_MEM_LOG} to determine whether to log the event, and if so how:
- $ENV{PERL_MEM_LOG} =~ /m/ Log all memory ops
- $ENV{PERL_MEM_LOG} =~ /s/ Log all SV ops
- $ENV{PERL_MEM_LOG} =~ /t/ include timestamp in Log
- $ENV{PERL_MEM_LOG} =~ /^(\d+)/ write to FD given (default is 2)
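For example (a sketch; someprog.pl is a placeholder, and this only has an effect with a perl actually built with -DPERL_MEM_LOG):

```shell
# Log all memory (m) and SV (s) ops with timestamps (t) to fd 3,
# which the shell redirects to a file.
PERL_MEM_LOG=3mst ./perl -Ilib someprog.pl 3>perlmem.log
```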
Memory logging is somewhat similar to -Dm but is independent of -DDEBUGGING, and at a higher level; all uses of Newx(), Renew(), and Safefree() are logged with the caller's source code file and line number (and C function name, if supported by the C compiler). In contrast, -Dm is directly at the point of malloc(). SV logging is similar.
Since the logging doesn't use PerlIO, all SV allocations are logged and no extra SV allocations are introduced by enabling the logging. If compiled with -DDEBUG_LEAKING_SCALARS, the serial number for each SV allocation is also logged.
Those debugging perl with the DDD frontend over gdb may find the following useful:
You can extend the data conversion shortcuts menu, so for example you can display an SV's IV value with one click, without doing any typing. To do that simply edit the ~/.ddd/init file and add, after:
- ! Display shortcuts.
- Ddd*gdbDisplayShortcuts: \
- /t () // Convert to Bin\n\
- /d () // Convert to Dec\n\
- /x () // Convert to Hex\n\
- /o () // Convert to Oct\n\
the following two lines:
- ((XPV*) (())->sv_any )->xpv_pv // 2pvx\n\
- ((XPVIV*) (())->sv_any )->xiv_iv // 2ivx
so now you can do ivx and pvx lookups, or you can plug in the sv_peek "conversion":
- Perl_sv_peek(my_perl, (SV*)()) // sv_peek
(The my_perl is for threaded builds.) Just remember that every line but the last one should end with \n\.
Alternatively edit the init file interactively via: 3rd mouse button -> New Display -> Edit Menu
Note: you can define up to 20 conversion shortcuts in the gdb section.
If you see in a debugger a memory area mysteriously full of 0xABABABAB or 0xEFEFEFEF, you may be seeing the effect of the Poison() macros, see perlclib.
Under ithreads the optree is read only. If you want to enforce this, to check for write accesses from buggy code, compile with -DPERL_DEBUG_READONLY_OPS to enable code that allocates op memory via mmap, and sets it read-only when it is attached to a subroutine. Any write access to an op results in a SIGBUS and abort.
This code is intended for development only, and may not be portable even to all Unix variants. Also, it is an 80% solution, in that it isn't able to make all ops read only. Specifically it does not apply to op slabs belonging to BEGIN blocks.
However, as an 80% solution it is still effective, as it has caught bugs in the past.
You can expand the macros in a foo.c file by saying
- make foo.i
which will expand the macros using cpp. Don't be scared by the results.
This document was originally written by Nathan Torkington, and is maintained by the perl5-porters mailing list.
perlhacktut - Walk through the creation of a simple C code patch
This document takes you through a simple patch example.
If you haven't read perlhack yet, go do that first! You might also want to read through perlsource too.
Once you're done here, check out perlhacktips next.
Let's take a simple patch from start to finish.
Here's something Larry suggested: if a U is the first active format during a pack (for example, pack "U3C8", @stuff), then the resulting string should be treated as UTF-8 encoded.
If you are working with a git clone of the Perl repository, you will want to create a branch for your changes. This will make creating a proper patch much simpler. See perlgit for details on how to do this.
How do we prepare to fix this up? First we locate the code in question - the pack happens at runtime, so it's going to be in one of the pp files. Sure enough, pp_pack is in pp.c. Since we're going to be altering this file, let's copy it to pp.c~.
[Well, it was in pp.c when this tutorial was written. It has now been split off with pp_unpack to its own file, pp_pack.c]
Now let's look over pp_pack: we take a pattern into pat, and then loop over the pattern, taking each format character in turn into datumtype. Then for each possible format character, we swallow up the other arguments in the pattern (a field width, an asterisk, and so on) and convert the next chunk of input into the specified format, adding it onto the output SV cat.
How do we know if the U is the first format in the pat? Well, if we have a pointer to the start of pat then, if we see a U, we can test whether we're still at the start of the string. So, here's where pat is set up:
- STRLEN fromlen;
- char *pat = SvPVx(*++MARK, fromlen);
- char *patend = pat + fromlen;
- I32 len;
- I32 datumtype;
- SV *fromstr;
We'll have another string pointer in there:
- STRLEN fromlen;
- char *pat = SvPVx(*++MARK, fromlen);
- char *patend = pat + fromlen;
- + char *patcopy;
- I32 len;
- I32 datumtype;
- SV *fromstr;
And just before we start the loop, we'll set patcopy to be the start of pat:
- items = SP - MARK;
- MARK++;
- sv_setpvn(cat, "", 0);
- + patcopy = pat;
- while (pat < patend) {
Now if we see a U which was at the start of the string, we turn on the UTF8 flag for the output SV, cat:
- + if (datumtype == 'U' && pat==patcopy+1)
- + SvUTF8_on(cat);
- if (datumtype == '#') {
- while (pat < patend && *pat != '\n')
- pat++;
Remember that it has to be patcopy+1 because the first character of the string is the U which has been swallowed into datumtype!
Oops, we forgot one thing: what if there are spaces at the start of the pattern? pack(" U*", @stuff) will have U as the first active character, even though it's not the first thing in the pattern. In this case, we have to advance patcopy along with pat when we see spaces:
- if (isSPACE(datumtype))
- continue;
needs to become
- if (isSPACE(datumtype)) {
- patcopy++;
- continue;
- }
OK. That's the C part done. Now we must do two additional things before this patch is ready to go: we've changed the behaviour of Perl, and so we must document that change. We must also provide some more regression tests to make sure our patch works and doesn't create a bug somewhere else along the line.
The regression tests for each operator live in t/op/, and so we make a copy of t/op/pack.t to t/op/pack.t~. Now we can add our tests to the end. First, we'll test that the U does indeed create Unicode strings.
t/op/pack.t has a sensible ok() function, but if it didn't we could use the one from t/test.pl.
- require './test.pl';
- plan( tests => 159 );
so instead of old-style print "not " unless ... checks, we can write the more sensible is()-based tests (see Test::More for a full explanation of is() and other testing functions).
Now we'll test that we got that space-at-the-beginning business right.
And finally we'll test that we don't make Unicode strings if U is not the first active format.
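Put together, the three tests might look something like this (a sketch using Test::More's is()/isnt() rather than pack.t's exact helpers; the test descriptions are illustrative):

```perl
use strict;
use warnings;
use Test::More tests => 3;

# U at the start of the pattern makes the result UTF-8 encoded
is( sprintf("%vd", pack("U*", 1, 20, 300, 4000)),
    "1.20.300.4000", "U* produces Unicode" );

# leading spaces must not defeat the "first active format" check
is( sprintf("%vd", pack("  U*", 1, 20, 300, 4000)),
    "1.20.300.4000", "  with spaces at the beginning" );

# if U is not the first active format, no UTF-8 encoding happens
isnt( sprintf("%vd", pack("C0U*", 1, 20, 300, 4000)),
      "1.20.300.4000", "U* not first isn't Unicode" );
```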
Mustn't forget to change the number of tests which appears at the top, or else the automated tester will get confused. This will either look like this:
- print "1..156\n";
or this:
- plan( tests => 156 );
We now compile up Perl, and run it through the test suite. Our new tests pass, hooray!
Finally, the documentation. The job is never done until the paperwork is over, so let's describe the change we've just made. The relevant place is pod/perlfunc.pod; again, we make a copy, and then we'll insert this text in the description of pack:
- =item *
- If the pattern begins with a C<U>, the resulting string will be treated
- as UTF-8-encoded Unicode. You can force UTF-8 encoding on in a string
- with an initial C<U0>, and the bytes that follow will be interpreted as
- Unicode characters. If you don't want this to happen, you can begin
- your pattern with C<C0> (or anything else) to force Perl not to UTF-8
- encode your string, and then follow this with a C<U*> somewhere in your
- pattern.
See perlhack for details on how to submit this patch.
This document was originally written by Nathan Torkington, and is maintained by the perl5-porters mailing list.
perlhaiku - Perl version 5.10+ on Haiku
This file contains instructions on how to build Perl for Haiku and lists known problems.
The build procedure is completely standard:
- ./Configure -de
- make
- make install
Make the perl binary executable and create a symlink for libperl:
- chmod a+x /boot/common/bin/perl
- cd /boot/common/lib; ln -s perl5/5.18.2/BePC-haiku/CORE/libperl.so .
Replace 5.18.2 with your respective version of Perl.
The following problems are encountered with Haiku revision 28311:
Perl cannot be compiled with threading support at the moment.
The ext/Socket/t/socketpair.t test fails. More precisely: the subtests using datagram sockets fail. Unix datagram sockets aren't implemented in Haiku yet.
A subtest of the ext/Sys/Syslog/t/syslog.t test fails. This is due to Haiku not implementing /dev/log support yet.
The tests lib/Net/Ping/t/450_service.t and lib/Net/Ping/t/510_ping_udp.t fail. This is due to bugs in Haiku's network stack implementation.
For Haiku specific problems contact the HaikuPorts developers: http://ports.haiku-files.org/
The initial Haiku port was done by Ingo Weinhold <ingo_weinhold@gmx.de>.
Last update: 2008-10-29
perlhist - the Perl history records
This document aims to record the Perl source code releases.
Perl history in brief, by Larry Wall:
- Perl 0 introduced Perl to my officemates.
- Perl 1 introduced Perl to the world, and changed /\(...\|...\)/ to
- /(...|...)/. \(Dan Faigin still hasn't forgiven me. :-\)
- Perl 2 introduced Henry Spencer's regular expression package.
- Perl 3 introduced the ability to handle binary data (embedded nulls).
- Perl 4 introduced the first Camel book. Really. We mostly just
- switched version numbers so the book could refer to 4.000.
- Perl 5 introduced everything else, including the ability to
- introduce everything else.
Larry Wall, Andy Dougherty, Tom Christiansen, Charles Bailey, Nick
Ing-Simmons, Chip Salzenberg, Tim Bunce, Malcolm Beattie, Gurusamy
Sarathy, Graham Barr, Jarkko Hietaniemi, Hugo van der Sanden,
Michael Schwern, Rafael Garcia-Suarez, Nicholas Clark, Richard Clamp,
Leon Brocard, Dave Mitchell, Jesse Vincent, Ricardo Signes, Steve Hay,
Matt S Trout, David Golden, Florian Ragwitz, Tatsuhiko Miyagawa,
Chris BinGOs
Williams, Zefram, Ævar Arnfjörð Bjarmason, Stevan
Little, Dave Rolsky, Max Maischein, Abigail, Jesse Luehrs, Tony Cook,
Dominic Hargreaves, Aaron Crane and Aristotle Pagaltzis.
[from Porting/pumpkin.pod in the Perl source code distribution]
Chip Salzenberg gets credit for that, with a nod to his cow orker, David Croy. We had passed around various names (baton, token, hot potato) but none caught on. Then, Chip asked:
[begin quote]
- Who has the patch pumpkin?
To explain: David Croy once told me that at a previous job, there was one tape drive and multiple systems that used it for backups. But instead of some high-tech exclusion software, they used a low-tech method to prevent multiple simultaneous backups: a stuffed pumpkin. No one was allowed to make backups unless they had the "backup pumpkin".
[end quote]
The name has stuck. The holder of the pumpkin is sometimes called the pumpking (keeping the source afloat?) or the pumpkineer (pulling the strings?).
- Pump- Release Date Notes
- king (by no means
- comprehensive,
- see Changes*
- for details)
- ======================================================================
- Larry 0 Classified. Don't ask.
- Larry 1.000 1987-Dec-18
- 1.001..10 1988-Jan-30
- 1.011..14 1988-Feb-02
- Schwern 1.0.15 2002-Dec-18 Modernization
- Richard 1.0_16 2003-Dec-18
- Larry 2.000 1988-Jun-05
- 2.001 1988-Jun-28
- Larry 3.000 1989-Oct-18
- 3.001 1989-Oct-26
- 3.002..4 1989-Nov-11
- 3.005 1989-Nov-18
- 3.006..8 1989-Dec-22
- 3.009..13 1990-Mar-02
- 3.014 1990-Mar-13
- 3.015 1990-Mar-14
- 3.016..18 1990-Mar-28
- 3.019..27 1990-Aug-10 User subs.
- 3.028 1990-Aug-14
- 3.029..36 1990-Oct-17
- 3.037 1990-Oct-20
- 3.040 1990-Nov-10
- 3.041 1990-Nov-13
- 3.042..43 1991-Jan-??
- 3.044 1991-Jan-12
- Larry 4.000 1991-Mar-21
- 4.001..3 1991-Apr-12
- 4.004..9 1991-Jun-07
- 4.010 1991-Jun-10
- 4.011..18 1991-Nov-05
- 4.019 1991-Nov-11 Stable.
- 4.020..33 1992-Jun-08
- 4.034 1992-Jun-11
- 4.035 1992-Jun-23
- Larry 4.036 1993-Feb-05 Very stable.
- 5.000alpha1 1993-Jul-31
- 5.000alpha2 1993-Aug-16
- 5.000alpha3 1993-Oct-10
- 5.000alpha4 1993-???-??
- 5.000alpha5 1993-???-??
- 5.000alpha6 1994-Mar-18
- 5.000alpha7 1994-Mar-25
- Andy 5.000alpha8 1994-Apr-04
- Larry 5.000alpha9 1994-May-05 ext appears.
- 5.000alpha10 1994-Jun-11
- 5.000alpha11 1994-Jul-01
- Andy 5.000a11a 1994-Jul-07 To fit 14.
- 5.000a11b 1994-Jul-14
- 5.000a11c 1994-Jul-19
- 5.000a11d 1994-Jul-22
- Larry 5.000alpha12 1994-Aug-04
- Andy 5.000a12a 1994-Aug-08
- 5.000a12b 1994-Aug-15
- 5.000a12c 1994-Aug-22
- 5.000a12d 1994-Aug-22
- 5.000a12e 1994-Aug-22
- 5.000a12f 1994-Aug-24
- 5.000a12g 1994-Aug-24
- 5.000a12h 1994-Aug-24
- Larry 5.000beta1 1994-Aug-30
- Andy 5.000b1a 1994-Sep-06
- Larry 5.000beta2 1994-Sep-14 Core slushified.
- Andy 5.000b2a 1994-Sep-14
- 5.000b2b 1994-Sep-17
- 5.000b2c 1994-Sep-17
- Larry 5.000beta3 1994-Sep-??
- Andy 5.000b3a 1994-Sep-18
- 5.000b3b 1994-Sep-22
- 5.000b3c 1994-Sep-23
- 5.000b3d 1994-Sep-27
- 5.000b3e 1994-Sep-28
- 5.000b3f 1994-Sep-30
- 5.000b3g 1994-Oct-04
- Andy 5.000b3h 1994-Oct-07
- Larry? 5.000gamma 1994-Oct-13?
- Larry 5.000 1994-Oct-17
- Andy 5.000a 1994-Dec-19
- 5.000b 1995-Jan-18
- 5.000c 1995-Jan-18
- 5.000d 1995-Jan-18
- 5.000e 1995-Jan-18
- 5.000f 1995-Jan-18
- 5.000g 1995-Jan-18
- 5.000h 1995-Jan-18
- 5.000i 1995-Jan-26
- 5.000j 1995-Feb-07
- 5.000k 1995-Feb-11
- 5.000l 1995-Feb-21
- 5.000m 1995-Feb-28
- 5.000n 1995-Mar-07
- 5.000o 1995-Mar-13?
- Larry 5.001 1995-Mar-13
- Andy 5.001a 1995-Mar-15
- 5.001b 1995-Mar-31
- 5.001c 1995-Apr-07
- 5.001d 1995-Apr-14
- 5.001e 1995-Apr-18 Stable.
- 5.001f 1995-May-31
- 5.001g 1995-May-25
- 5.001h 1995-May-25
- 5.001i 1995-May-30
- 5.001j 1995-Jun-05
- 5.001k 1995-Jun-06
- 5.001l 1995-Jun-06 Stable.
- 5.001m 1995-Jul-02 Very stable.
- 5.001n 1995-Oct-31 Very unstable.
- 5.002beta1 1995-Nov-21
- 5.002b1a 1995-Dec-04
- 5.002b1b 1995-Dec-04
- 5.002b1c 1995-Dec-04
- 5.002b1d 1995-Dec-04
- 5.002b1e 1995-Dec-08
- 5.002b1f 1995-Dec-08
- Tom 5.002b1g 1995-Dec-21 Doc release.
- Andy 5.002b1h 1996-Jan-05
- 5.002b2 1996-Jan-14
- Larry 5.002b3 1996-Feb-02
- Andy 5.002gamma 1996-Feb-11
- Larry 5.002delta 1996-Feb-27
- Larry 5.002 1996-Feb-29 Prototypes.
- Charles 5.002_01 1996-Mar-25
- 5.003 1996-Jun-25 Security release.
- 5.003_01 1996-Jul-31
- Nick 5.003_02 1996-Aug-10
- Andy 5.003_03 1996-Aug-28
- 5.003_04 1996-Sep-02
- 5.003_05 1996-Sep-12
- 5.003_06 1996-Oct-07
- 5.003_07 1996-Oct-10
- Chip 5.003_08 1996-Nov-19
- 5.003_09 1996-Nov-26
- 5.003_10 1996-Nov-29
- 5.003_11 1996-Dec-06
- 5.003_12 1996-Dec-19
- 5.003_13 1996-Dec-20
- 5.003_14 1996-Dec-23
- 5.003_15 1996-Dec-23
- 5.003_16 1996-Dec-24
- 5.003_17 1996-Dec-27
- 5.003_18 1996-Dec-31
- 5.003_19 1997-Jan-04
- 5.003_20 1997-Jan-07
- 5.003_21 1997-Jan-15
- 5.003_22 1997-Jan-16
- 5.003_23 1997-Jan-25
- 5.003_24 1997-Jan-29
- 5.003_25 1997-Feb-04
- 5.003_26 1997-Feb-10
- 5.003_27 1997-Feb-18
- 5.003_28 1997-Feb-21
- 5.003_90 1997-Feb-25 Ramping up to the 5.004 release.
- 5.003_91 1997-Mar-01
- 5.003_92 1997-Mar-06
- 5.003_93 1997-Mar-10
- 5.003_94 1997-Mar-22
- 5.003_95 1997-Mar-25
- 5.003_96 1997-Apr-01
- 5.003_97 1997-Apr-03 Fairly widely used.
- 5.003_97a 1997-Apr-05
- 5.003_97b 1997-Apr-08
- 5.003_97c 1997-Apr-10
- 5.003_97d 1997-Apr-13
- 5.003_97e 1997-Apr-15
- 5.003_97f 1997-Apr-17
- 5.003_97g 1997-Apr-18
- 5.003_97h 1997-Apr-24
- 5.003_97i 1997-Apr-25
- 5.003_97j 1997-Apr-28
- 5.003_98 1997-Apr-30
- 5.003_99 1997-May-01
- 5.003_99a 1997-May-09
- p54rc1 1997-May-12 Release Candidates.
- p54rc2 1997-May-14
- Chip 5.004 1997-May-15 A major maintenance release.
- Tim 5.004_01-t1 1997-???-?? The 5.004 maintenance track.
- 5.004_01-t2 1997-Jun-11 aka perl5.004m1t2
- 5.004_01 1997-Jun-13
- 5.004_01_01 1997-Jul-29 aka perl5.004m2t1
- 5.004_01_02 1997-Aug-01 aka perl5.004m2t2
- 5.004_01_03 1997-Aug-05 aka perl5.004m2t3
- 5.004_02 1997-Aug-07
- 5.004_02_01 1997-Aug-12 aka perl5.004m3t1
- 5.004_03-t2 1997-Aug-13 aka perl5.004m3t2
- 5.004_03 1997-Sep-05
- 5.004_04-t1 1997-Sep-19 aka perl5.004m4t1
- 5.004_04-t2 1997-Sep-23 aka perl5.004m4t2
- 5.004_04-t3 1997-Oct-10 aka perl5.004m4t3
- 5.004_04-t4 1997-Oct-14 aka perl5.004m4t4
- 5.004_04 1997-Oct-15
- 5.004_04-m1 1998-Mar-04 (5.004m5t1) Maint. trials for 5.004_05.
- 5.004_04-m2 1998-May-01
- 5.004_04-m3 1998-May-15
- 5.004_04-m4 1998-May-19
- 5.004_05-MT5 1998-Jul-21
- 5.004_05-MT6 1998-Oct-09
- 5.004_05-MT7 1998-Nov-22
- 5.004_05-MT8 1998-Dec-03
- Chip 5.004_05-MT9 1999-Apr-26
- 5.004_05 1999-Apr-29
- Malcolm 5.004_50 1997-Sep-09 The 5.005 development track.
- 5.004_51 1997-Oct-02
- 5.004_52 1997-Oct-15
- 5.004_53 1997-Oct-16
- 5.004_54 1997-Nov-14
- 5.004_55 1997-Nov-25
- 5.004_56 1997-Dec-18
- 5.004_57 1998-Feb-03
- 5.004_58 1998-Feb-06
- 5.004_59 1998-Feb-13
- 5.004_60 1998-Feb-20
- 5.004_61 1998-Feb-27
- 5.004_62 1998-Mar-06
- 5.004_63 1998-Mar-17
- 5.004_64 1998-Apr-03
- 5.004_65 1998-May-15
- 5.004_66 1998-May-29
- Sarathy 5.004_67 1998-Jun-15
- 5.004_68 1998-Jun-23
- 5.004_69 1998-Jun-29
- 5.004_70 1998-Jul-06
- 5.004_71 1998-Jul-09
- 5.004_72 1998-Jul-12
- 5.004_73 1998-Jul-13
- 5.004_74 1998-Jul-14 5.005 beta candidate.
- 5.004_75 1998-Jul-15 5.005 beta1.
- 5.004_76 1998-Jul-21 5.005 beta2.
- Sarathy 5.005 1998-Jul-22 Oneperl.
- Sarathy 5.005_01 1998-Jul-27 The 5.005 maintenance track.
- 5.005_02-T1 1998-Aug-02
- 5.005_02-T2 1998-Aug-05
- 5.005_02 1998-Aug-08
- Graham 5.005_03-MT1 1998-Nov-30
- 5.005_03-MT2 1999-Jan-04
- 5.005_03-MT3 1999-Jan-17
- 5.005_03-MT4 1999-Jan-26
- 5.005_03-MT5 1999-Jan-28
- 5.005_03-MT6 1999-Mar-05
- 5.005_03 1999-Mar-28
- Leon 5.005_04-RC1 2004-Feb-05
- 5.005_04-RC2 2004-Feb-18
- 5.005_04 2004-Feb-23
- 5.005_05-RC1 2009-Feb-16
- Sarathy 5.005_50 1998-Jul-26 The 5.6 development track.
- 5.005_51 1998-Aug-10
- 5.005_52 1998-Sep-25
- 5.005_53 1998-Oct-31
- 5.005_54 1998-Nov-30
- 5.005_55 1999-Feb-16
- 5.005_56 1999-Mar-01
- 5.005_57 1999-May-25
- 5.005_58 1999-Jul-27
- 5.005_59 1999-Aug-02
- 5.005_60 1999-Aug-02
- 5.005_61 1999-Aug-20
- 5.005_62 1999-Oct-15
- 5.005_63 1999-Dec-09
- 5.5.640 2000-Feb-02
- 5.5.650 2000-Feb-08 beta1
- 5.5.660 2000-Feb-22 beta2
- 5.5.670 2000-Feb-29 beta3
- 5.6.0-RC1 2000-Mar-09 Release candidate 1.
- 5.6.0-RC2 2000-Mar-14 Release candidate 2.
- 5.6.0-RC3 2000-Mar-21 Release candidate 3.
- Sarathy 5.6.0 2000-Mar-22
- Sarathy 5.6.1-TRIAL1 2000-Dec-18 The 5.6 maintenance track.
- 5.6.1-TRIAL2 2001-Jan-31
- 5.6.1-TRIAL3 2001-Mar-19
- 5.6.1-foolish 2001-Apr-01 The "fools-gold" release.
- 5.6.1 2001-Apr-08
- Rafael 5.6.2-RC1 2003-Nov-08
- 5.6.2 2003-Nov-15 Fix new build issues
- Jarkko 5.7.0 2000-Sep-02 The 5.7 track: Development.
- 5.7.1 2001-Apr-09
- 5.7.2 2001-Jul-13 Virtual release candidate 0.
- 5.7.3 2002-Mar-05
- 5.8.0-RC1 2002-Jun-01
- 5.8.0-RC2 2002-Jun-21
- 5.8.0-RC3 2002-Jul-13
- Jarkko 5.8.0 2002-Jul-18
- Jarkko 5.8.1-RC1 2003-Jul-10 The 5.8 maintenance track
- 5.8.1-RC2 2003-Jul-11
- 5.8.1-RC3 2003-Jul-30
- 5.8.1-RC4 2003-Aug-01
- 5.8.1-RC5 2003-Sep-22
- 5.8.1 2003-Sep-25
- Nicholas 5.8.2-RC1 2003-Oct-27
- 5.8.2-RC2 2003-Nov-03
- 5.8.2 2003-Nov-05
- 5.8.3-RC1 2004-Jan-07
- 5.8.3 2004-Jan-14
- 5.8.4-RC1 2004-Apr-05
- 5.8.4-RC2 2004-Apr-15
- 5.8.4 2004-Apr-21
- 5.8.5-RC1 2004-Jul-06
- 5.8.5-RC2 2004-Jul-08
- 5.8.5 2004-Jul-19
- 5.8.6-RC1 2004-Nov-11
- 5.8.6 2004-Nov-27
- 5.8.7-RC1 2005-May-18
- 5.8.7 2005-May-30
- 5.8.8-RC1 2006-Jan-20
- 5.8.8 2006-Jan-31
- 5.8.9-RC1 2008-Nov-10
- 5.8.9-RC2 2008-Dec-06
- 5.8.9 2008-Dec-14
- Hugo 5.9.0 2003-Oct-27 The 5.9 development track
- Rafael 5.9.1 2004-Mar-16
- 5.9.2 2005-Apr-01
- 5.9.3 2006-Jan-28
- 5.9.4 2006-Aug-15
- 5.9.5 2007-Jul-07
- 5.10.0-RC1 2007-Nov-17
- 5.10.0-RC2 2007-Nov-25
- Rafael 5.10.0 2007-Dec-18
- David M 5.10.1-RC1 2009-Aug-06 The 5.10 maintenance track
- 5.10.1-RC2 2009-Aug-18
- 5.10.1 2009-Aug-22
- Jesse 5.11.0 2009-Oct-02 The 5.11 development track
- 5.11.1 2009-Oct-20
- Leon 5.11.2 2009-Nov-20
- Jesse 5.11.3 2009-Dec-20
- Ricardo 5.11.4 2010-Jan-20
- Steve 5.11.5 2010-Feb-20
- Jesse 5.12.0-RC0 2010-Mar-21
- 5.12.0-RC1 2010-Mar-29
- 5.12.0-RC2 2010-Apr-01
- 5.12.0-RC3 2010-Apr-02
- 5.12.0-RC4 2010-Apr-06
- 5.12.0-RC5 2010-Apr-09
- Jesse 5.12.0 2010-Apr-12
- Jesse 5.12.1-RC2 2010-May-13 The 5.12 maintenance track
- 5.12.1-RC1 2010-May-09
- 5.12.1 2010-May-16
- 5.12.2-RC2 2010-Aug-31
- 5.12.2 2010-Sep-06
- Ricardo 5.12.3-RC1 2011-Jan-09
- Ricardo 5.12.3-RC2 2011-Jan-14
- Ricardo 5.12.3-RC3 2011-Jan-17
- Ricardo 5.12.3 2011-Jan-21
- Leon 5.12.4-RC1 2011-Jun-08
- Leon 5.12.4 2011-Jun-20
- Dominic 5.12.5 2012-Nov-10
- Leon 5.13.0 2010-Apr-20 The 5.13 development track
- Ricardo 5.13.1 2010-May-20
- Matt 5.13.2 2010-Jun-22
- David G 5.13.3 2010-Jul-20
- Florian 5.13.4 2010-Aug-20
- Steve 5.13.5 2010-Sep-19
- Miyagawa 5.13.6 2010-Oct-20
- BinGOs 5.13.7 2010-Nov-20
- Zefram 5.13.8 2010-Dec-20
- Jesse 5.13.9 2011-Jan-20
- Ævar 5.13.10 2011-Feb-20
- Florian 5.13.11 2011-Mar-20
- Jesse 5.14.0RC1 2011-Apr-20
- Jesse 5.14.0RC2 2011-May-04
- Jesse 5.14.0RC3 2011-May-11
- Jesse 5.14.0 2011-May-14 The 5.14 maintenance track
- Jesse 5.14.1 2011-Jun-16
- Florian 5.14.2-RC1 2011-Sep-19
- 5.14.2 2011-Sep-26
- Dominic 5.14.3 2012-Oct-12
- David M 5.14.4-RC1 2013-Mar-05
- David M 5.14.4-RC2 2013-Mar-07
- David M 5.14.4 2013-Mar-10
- David G 5.15.0 2011-Jun-20 The 5.15 development track
- Zefram 5.15.1 2011-Jul-20
- Ricardo 5.15.2 2011-Aug-20
- Stevan 5.15.3 2011-Sep-20
- Florian 5.15.4 2011-Oct-20
- Steve 5.15.5 2011-Nov-20
- Dave R 5.15.6 2011-Dec-20
- BinGOs 5.15.7 2012-Jan-20
- Max M 5.15.8 2012-Feb-20
- Abigail 5.15.9 2012-Mar-20
- Ricardo 5.16.0-RC0 2012-May-10
- Ricardo 5.16.0-RC1 2012-May-14
- Ricardo 5.16.0-RC2 2012-May-15
- Ricardo 5.16.0 2012-May-20 The 5.16 maintenance track
- Ricardo 5.16.1 2012-Aug-08
- Ricardo 5.16.2 2012-Nov-01
- Ricardo 5.16.3-RC1 2013-Mar-06
- Ricardo 5.16.3 2013-Mar-11
- Zefram 5.17.0 2012-May-26 The 5.17 development track
- Jesse L 5.17.1 2012-Jun-20
- TonyC 5.17.2 2012-Jul-20
- Steve 5.17.3 2012-Aug-20
- Florian 5.17.4 2012-Sep-20
- Florian 5.17.5 2012-Oct-20
- Ricardo 5.17.6 2012-Nov-20
- Dave R 5.17.7 2012-Dec-18
- Aaron 5.17.8 2013-Jan-20
- BinGOs 5.17.9 2013-Feb-20
- Max M 5.17.10 2013-Mar-21
- Ricardo 5.18.0-RC1 2013-May-11 The 5.18 maintenance track
- Ricardo 5.18.0-RC2 2013-May-12
- Ricardo 5.18.0-RC3 2013-May-13
- Ricardo 5.18.0-RC4 2013-May-15
- Ricardo 5.18.0 2013-May-18
- Ricardo 5.18.1-RC1 2013-Aug-01
- Ricardo 5.18.1-RC2 2013-Aug-03
- Ricardo 5.18.1-RC3 2013-Aug-08
- Ricardo 5.18.1 2013-Aug-12
- Ricardo 5.18.2 2014-Jan-06
- Ricardo 5.19.0 2013-May-20 The 5.19 development track
- David G 5.19.1 2013-Jun-21
- Aristotle 5.19.2 2013-Jul-22
For example, the notation "core: 212 29" for release 1.000 means that the core had 212 kilobytes in 29 files. The "core".."doc" columns are explained below.
- release core lib ext t doc
- ======================================================================
- 1.000 212 29 - - - - 38 51 62 3
- 1.014 219 29 - - - - 39 52 68 4
- 2.000 309 31 2 3 - - 55 57 92 4
- 2.001 312 31 2 3 - - 55 57 94 4
- 3.000 508 36 24 11 - - 79 73 156 5
- 3.044 645 37 61 20 - - 90 74 190 6
- 4.000 635 37 59 20 - - 91 75 198 4
- 4.019 680 37 85 29 - - 98 76 199 4
- 4.036 709 37 89 30 - - 98 76 208 5
- 5.000alpha2 785 50 114 32 - - 112 86 209 5
- 5.000alpha3 801 50 117 33 - - 121 87 209 5
- 5.000alpha9 1022 56 149 43 116 29 125 90 217 6
- 5.000a12h 978 49 140 49 205 46 152 97 228 9
- 5.000b3h 1035 53 232 70 216 38 162 94 218 21
- 5.000 1038 53 250 76 216 38 154 92 536 62
- 5.001m 1071 54 388 82 240 38 159 95 544 29
- 5.002 1121 54 661 101 287 43 155 94 847 35
- 5.003 1129 54 680 102 291 43 166 100 853 35
- 5.003_07 1231 60 748 106 396 53 213 137 976 39
- 5.004 1351 60 1230 136 408 51 355 161 1587 55
- 5.004_01 1356 60 1258 138 410 51 358 161 1587 55
- 5.004_04 1375 60 1294 139 413 51 394 162 1629 55
- 5.004_05 1463 60 1435 150 394 50 445 175 1855 59
- 5.004_51 1401 61 1260 140 413 53 358 162 1594 56
- 5.004_53 1422 62 1295 141 438 70 394 162 1637 56
- 5.004_56 1501 66 1301 140 447 74 408 165 1648 57
- 5.004_59 1555 72 1317 142 448 74 424 171 1678 58
- 5.004_62 1602 77 1327 144 629 92 428 173 1674 58
- 5.004_65 1626 77 1358 146 615 92 446 179 1698 60
- 5.004_68 1856 74 1382 152 619 92 463 187 1784 60
- 5.004_70 1863 75 1456 154 675 92 494 194 1809 60
- 5.004_73 1874 76 1467 152 762 102 506 196 1883 61
- 5.004_75 1877 76 1467 152 770 103 508 196 1896 62
- 5.005 1896 76 1469 152 795 103 509 197 1945 63
- 5.005_03 1936 77 1541 153 813 104 551 201 2176 72
- 5.005_50 1969 78 1842 301 795 103 514 198 1948 63
- 5.005_53 1999 79 1885 303 806 104 602 224 2002 67
- 5.005_56 2086 79 1970 307 866 113 672 238 2221 75
- 5.6.0 2820 79 2626 364 1096 129 863 280 2840 93
- 5.6.1 2946 78 2921 430 1171 132 1024 304 3330 102
- 5.6.2 2947 78 3143 451 1247 127 1303 387 3406 102
- 5.7.0 2977 80 2801 425 1250 132 975 307 3206 100
- 5.7.1 3351 84 3442 455 1944 167 1334 357 3698 124
- 5.7.2 3491 87 4858 618 3290 298 1598 449 3910 139
- 5.7.3 3299 85 4295 537 2196 300 2176 626 4171 120
- 5.8.0 3489 87 4533 585 2437 331 2588 726 4368 125
- 5.8.1 3674 90 5104 623 2604 353 2983 836 4625 134
- 5.8.2 3633 90 5111 623 2623 357 3019 848 4634 135
- 5.8.3 3625 90 5141 624 2660 363 3083 869 4669 136
- 5.8.4 3653 90 5170 634 2684 368 3148 885 4689 137
- 5.8.5 3664 90 4260 303 2707 369 3208 898 4689 138
- 5.8.6 3690 90 4271 303 3141 396 3411 925 4709 139
- 5.8.7 3788 90 4322 307 3297 401 3485 964 4744 141
- 5.8.8 3895 90 4357 314 3409 431 3622 1017 4979 144
- 5.8.9 4132 93 5508 330 3826 529 4364 1234 5348 152
- 5.9.0 3657 90 4951 626 2603 354 3011 841 4609 135
- 5.9.1 3580 90 5196 634 2665 367 3186 889 4725 138
- 5.9.2 3863 90 4654 312 3283 403 3551 973 4800 142
- 5.9.3 4096 91 5318 381 4806 597 4272 1214 5139 147
- 5.9.4 4393 94 5718 415 4578 642 4646 1310 5335 153
- 5.9.5 4681 96 6849 479 4827 671 5155 1490 5572 159
- 5.10.0 4710 97 7050 486 4899 673 5275 1503 5673 160
- 5.10.1 4858 98 7440 519 6195 921 6147 1751 5151 163
- 5.12.0 4999 100 1146 121 15227 2176 6400 1843 5342 168
- 5.12.1 5000 100 1146 121 15283 2178 6407 1846 5354 169
- 5.12.2 5003 100 1146 121 15404 2178 6413 1846 5376 170
- 5.12.3 5004 100 1146 121 15529 2180 6417 1848 5391 171
- 5.14.0 5328 104 1100 114 17779 2479 7697 2130 5871 188
- 5.16.0 5562 109 1077 80 20504 2702 8750 2375 4815 152
- 5.18.0 5892 113 1088 79 20077 2760 9365 2439 4943 154
The "core" through "doc" columns refer to the following sets of files from the Perl source code distribution. The glob notation ** means "recursively"; (.) means regular files.
- core *.[hcy]
- lib lib/**/*.p[ml]
- ext ext/**/*.{[hcyt],xs,pm} (for -5.10.1) or
- {dist,ext,cpan}/**/*.{[hcyt],xs,pm} (for 5.12.0-)
- t t/**/*(.) (for 1-5.005_56) or **/*.t (for 5.6.0-5.7.3)
- doc {README*,INSTALL,*[_.]man{,.?},pod/**/*.pod}
Here are some statistics for the other subdirectories and one file in the Perl source distribution, for a selection of releases.
- ======================================================================
- Legend: kB #
- 1.014 2.001 3.044
- Configure 31 1 37 1 62 1
- eg - - 34 28 47 39
- h2pl - - - - 12 12
- msdos - - - - 41 13
- os2 - - - - 63 22
- usub - - - - 21 16
- x2p 103 17 104 17 137 17
- ======================================================================
- 4.000 4.019 4.036
- atarist - - - - 113 31
- Configure 73 1 83 1 86 1
- eg 47 39 47 39 47 39
- emacs 67 4 67 4 67 4
- h2pl 12 12 12 12 12 12
- hints - - 5 42 11 56
- msdos 57 15 58 15 60 15
- os2 81 29 81 29 113 31
- usub 25 7 43 8 43 8
- x2p 147 18 152 19 154 19
- ======================================================================
- 5.000a2 5.000a12h 5.000b3h 5.000 5.001m
- apollo 8 3 8 3 8 3 8 3 8 3
- atarist 113 31 113 31 - - - - - -
- bench - - 0 1 - - - - - -
- Bugs 2 5 26 1 - - - - - -
- dlperl 40 5 - - - - - - - -
- do 127 71 - - - - - - - -
- Configure - - 153 1 159 1 160 1 180 1
- Doc - - 26 1 75 7 11 1 11 1
- eg 79 58 53 44 51 43 54 44 54 44
- emacs 67 4 104 6 104 6 104 1 104 6
- h2pl 12 12 12 12 12 12 12 12 12 12
- hints 11 56 12 46 18 48 18 48 44 56
- msdos 60 15 60 15 - - - - - -
- os2 113 31 113 31 - - - - - -
- U - - 62 8 112 42 - - - -
- usub 43 8 - - - - - - - -
- vms - - 80 7 123 9 184 15 304 20
- x2p 171 22 171 21 162 20 162 20 279 20
- ======================================================================
- 5.002 5.003 5.003_07
- Configure 201 1 201 1 217 1
- eg 54 44 54 44 54 44
- emacs 108 1 108 1 143 1
- h2pl 12 12 12 12 12 12
- hints 73 59 77 60 90 62
- os2 84 17 56 10 117 42
- plan9 - - - - 79 15
- Porting - - - - 51 1
- utils 87 7 88 7 97 7
- vms 500 24 475 26 505 27
- x2p 280 20 280 20 280 19
- ======================================================================
- 5.004 5.004_04 5.004_62 5.004_65 5.004_68
- beos - - - - - - 1 1 1 1
- Configure 225 1 225 1 240 1 248 1 256 1
- cygwin32 23 5 23 5 23 5 24 5 24 5
- djgpp - - - - 14 5 14 5 14 5
- eg 81 62 81 62 81 62 81 62 81 62
- emacs 194 1 204 1 212 2 212 2 212 2
- h2pl 12 12 12 12 12 12 12 12 12 12
- hints 129 69 132 71 144 72 151 74 155 74
- os2 121 42 127 42 127 44 129 44 129 44
- plan9 82 15 82 15 82 15 82 15 82 15
- Porting 94 2 109 4 203 6 234 8 241 9
- qnx 1 2 1 2 1 2 1 2 1 2
- utils 112 8 118 8 124 8 156 9 159 9
- vms 518 34 524 34 538 34 569 34 569 34
- win32 285 33 378 36 470 39 493 39 575 41
- x2p 281 19 281 19 281 19 282 19 281 19
- ======================================================================
- 5.004_70 5.004_73 5.004_75 5.005 5.005_03
- apollo - - - - - - - - 0 1
- beos 1 1 1 1 1 1 1 1 1 1
- Configure 256 1 256 1 264 1 264 1 270 1
- cygwin32 24 5 24 5 24 5 24 5 24 5
- djgpp 14 5 14 5 14 5 14 5 15 5
- eg 86 65 86 65 86 65 86 65 86 65
- emacs 262 2 262 2 262 2 262 2 274 2
- h2pl 12 12 12 12 12 12 12 12 12 12
- hints 157 74 157 74 159 74 160 74 179 77
- mint - - - - - - - - 4 7
- mpeix - - - - 5 3 5 3 5 3
- os2 129 44 139 44 142 44 143 44 148 44
- plan9 82 15 82 15 82 15 82 15 82 15
- Porting 241 9 253 9 259 10 264 12 272 13
- qnx 1 2 1 2 1 2 1 2 1 2
- utils 160 9 160 9 160 9 160 9 164 9
- vms 570 34 572 34 573 34 575 34 583 34
- vos - - - - - - - - 156 10
- win32 577 41 585 41 585 41 587 41 600 42
- x2p 281 19 281 19 281 19 281 19 281 19
- ======================================================================
- 5.6.0 5.6.1 5.6.2 5.7.3
- apollo 8 3 8 3 8 3 8 3
- beos 5 2 5 2 5 2 6 4
- Configure 346 1 361 1 363 1 394 1
- Cross - - - - - - 4 2
- djgpp 19 6 19 6 19 6 21 7
- eg 112 71 112 71 112 71 - -
- emacs 303 4 319 4 319 4 319 4
- epoc 29 8 35 8 35 8 36 8
- h2pl 24 15 24 15 24 15 24 15
- hints 242 83 250 84 321 89 272 87
- mint 11 9 11 9 11 9 11 9
- mpeix 9 4 9 4 9 4 9 4
- NetWare - - - - - - 423 57
- os2 214 59 224 60 224 60 357 66
- plan9 92 17 92 17 92 17 85 15
- Porting 361 15 390 16 390 16 425 21
- qnx 5 3 5 3 5 3 5 3
- utils 228 12 221 11 222 11 267 13
- uts - - - - - - 12 3
- vmesa 25 4 25 4 25 4 25 4
- vms 686 38 627 38 627 38 649 36
- vos 227 12 249 15 248 15 281 17
- win32 755 41 782 42 801 42 1006 50
- x2p 307 20 307 20 307 20 345 20
- ======================================================================
- 5.8.0 5.8.1 5.8.2 5.8.3 5.8.4
- apollo 8 3 8 3 8 3 8 3 8 3
- beos 6 4 6 4 6 4 6 4 6 4
- Configure 472 1 493 1 493 1 493 1 494 1
- Cross 4 2 45 10 45 10 45 10 45 10
- djgpp 21 7 21 7 21 7 21 7 21 7
- emacs 319 4 329 4 329 4 329 4 329 4
- epoc 33 8 33 8 33 8 33 8 33 8
- h2pl 24 15 24 15 24 15 24 15 24 15
- hints 294 88 321 89 321 89 321 89 348 91
- mint 11 9 11 9 11 9 11 9 11 9
- mpeix 24 5 25 5 25 5 25 5 25 5
- NetWare 488 61 490 61 490 61 490 61 488 61
- os2 361 66 445 67 450 67 488 67 488 67
- plan9 85 15 325 17 325 17 325 17 321 17
- Porting 479 22 537 32 538 32 539 32 538 33
- qnx 5 3 5 3 5 3 5 3 5 3
- utils 275 15 258 16 258 16 263 19 263 19
- uts 12 3 12 3 12 3 12 3 12 3
- vmesa 25 4 25 4 25 4 25 4 25 4
- vms 648 36 654 36 654 36 656 36 656 36
- vos 330 20 335 20 335 20 335 20 335 20
- win32 1062 49 1125 49 1127 49 1126 49 1181 56
- x2p 347 20 348 20 348 20 348 20 348 20
- ======================================================================
- 5.8.5 5.8.6 5.8.7 5.8.8 5.8.9
- apollo 8 3 8 3 8 3 8 3 8 3
- beos 6 4 6 4 8 4 8 4 8 4
- Configure 494 1 494 1 495 1 506 1 520 1
- Cross 45 10 45 10 45 10 45 10 46 10
- djgpp 21 7 21 7 21 7 21 7 21 7
- emacs 329 4 329 4 329 4 329 4 406 4
- epoc 33 8 33 8 33 8 34 8 35 8
- h2pl 24 15 24 15 24 15 24 15 24 15
- hints 350 91 352 91 355 94 360 94 387 99
- mint 11 9 11 9 11 9 11 9 11 9
- mpeix 25 5 25 5 25 5 49 6 49 6
- NetWare 488 61 488 61 488 61 490 61 491 61
- os2 488 67 488 67 488 67 488 67 552 70
- plan9 321 17 321 17 321 17 322 17 324 17
- Porting 538 34 548 35 549 35 564 37 625 41
- qnx 5 3 5 3 5 3 5 3 5 3
- utils 265 19 265 19 266 19 267 19 281 21
- uts 12 3 12 3 12 3 12 3 12 3
- vmesa 25 4 25 4 25 4 25 4 25 4
- vms 657 36 658 36 662 36 664 36 716 35
- vos 335 20 335 20 335 20 336 21 345 22
- win32 1183 56 1190 56 1199 56 1219 56 1484 68
- x2p 349 20 349 20 349 20 349 19 350 19
- ======================================================================
- 5.9.0 5.9.1 5.9.2 5.9.3 5.9.4
- apollo 8 3 8 3 8 3 8 3 8 3
- beos 6 4 6 4 8 4 8 4 8 4
- Configure 493 1 493 1 495 1 508 1 512 1
- Cross 45 10 45 10 45 10 45 10 46 10
- djgpp 21 7 21 7 21 7 21 7 21 7
- emacs 329 4 329 4 329 4 329 4 329 4
- epoc 33 8 33 8 33 8 34 8 34 8
- h2pl 24 15 24 15 24 15 24 15 24 15
- hints 321 89 346 91 355 94 359 94 366 96
- mad - - - - - - - - 174 6
- mint 11 9 11 9 11 9 11 9 11 9
- mpeix 25 5 25 5 25 5 49 6 49 6
- NetWare 489 61 487 61 487 61 489 61 489 61
- os2 444 67 488 67 488 67 488 67 488 67
- plan9 325 17 321 17 321 17 322 17 323 17
- Porting 537 32 536 33 549 36 564 38 576 38
- qnx 5 3 5 3 5 3 5 3 5 3
- symbian - - - - - - 293 53 293 53
- utils 258 16 263 19 268 20 273 23 275 24
- uts 12 3 12 3 12 3 12 3 12 3
- vmesa 25 4 25 4 25 4 25 4 25 4
- vms 660 36 547 33 553 33 661 33 696 33
- vos 11 7 11 7 11 7 11 7 11 7
- win32 1120 49 1124 51 1191 56 1209 56 1719 90
- x2p 348 20 348 20 349 20 349 19 349 19
- ======================================================================
- 5.9.5 5.10.0 5.10.1 5.12.0 5.12.1
- apollo 8 3 8 3 0 3 0 3 0 3
- beos 8 4 8 4 4 4 4 4 4 4
- Configure 518 1 518 1 533 1 536 1 536 1
- Cross 122 15 122 15 119 15 118 15 118 15
- djgpp 21 7 21 7 17 7 17 7 17 7
- emacs 329 4 406 4 402 4 402 4 402 4
- epoc 34 8 35 8 31 8 31 8 31 8
- h2pl 24 15 24 15 12 15 12 15 12 15
- hints 377 98 381 98 385 100 368 97 368 97
- mad 182 8 182 8 174 8 174 8 174 8
- mint 11 9 11 9 3 9 - - - -
- mpeix 49 6 49 6 45 6 45 6 45 6
- NetWare 489 61 489 61 465 61 466 61 466 61
- os2 552 70 552 70 507 70 507 70 507 70
- plan9 324 17 324 17 316 17 316 17 316 17
- Porting 627 40 632 40 933 53 749 54 749 54
- qnx 5 3 5 4 1 4 1 4 1 4
- symbian 300 54 300 54 290 54 288 54 288 54
- utils 260 26 264 27 268 27 269 27 269 27
- uts 12 3 12 3 8 3 8 3 8 3
- vmesa 25 4 25 4 21 4 21 4 21 4
- vms 690 32 722 32 693 30 645 18 645 18
- vos 19 8 19 8 16 8 16 8 16 8
- win32 1482 68 1485 68 1497 70 1841 73 1841 73
- x2p 349 19 349 19 345 19 345 19 345 19
- ======================================================================
- 5.12.2 5.12.3 5.14.0 5.16.0 5.18.0
- apollo 0 3 0 3 - - - - - -
- beos 4 4 4 4 5 4 5 4 - -
- Configure 536 1 536 1 539 1 547 1 550 1
- Cross 118 15 118 15 118 15 118 15 118 15
- djgpp 17 7 17 7 18 7 18 7 18 7
- emacs 402 4 402 4 - - - - - -
- epoc 31 8 31 8 32 8 30 8 - -
- h2pl 12 15 12 15 15 15 15 15 13 15
- hints 368 97 368 97 370 96 371 96 354 91
- mad 174 8 174 8 176 8 176 8 174 8
- mpeix 45 6 45 6 46 6 46 6 - -
- NetWare 466 61 466 61 473 61 472 61 469 61
- os2 507 70 507 70 518 70 519 70 510 70
- plan9 316 17 316 17 319 17 319 17 318 17
- Porting 750 54 750 54 855 60 1093 69 1149 70
- qnx 1 4 1 4 2 4 2 4 1 4
- symbian 288 54 288 54 292 54 292 54 290 54
- utils 269 27 269 27 249 29 245 30 246 31
- uts 8 3 8 3 9 3 9 3 - -
- vmesa 21 4 21 4 22 4 22 4 - -
- vms 646 18 644 18 639 17 571 15 564 15
- vos 16 8 16 8 17 8 9 7 8 7
- win32 1841 73 1841 73 1833 72 1655 67 1157 62
- x2p 345 19 345 19 346 19 345 19 344 20
"diff lines kB" means, for example, that the patch 5.003_08, applied on top of 5.003_07 (or whatever came before 5.003_08), added 110 kilobytes of lines, removed 19 kilobytes of lines, and changed 424 kilobytes of lines. Only the lines themselves are counted, not their context. The "+ - !" markers come from the diff(1) context diff output format.
- Pump- Release Date diff lines kB
- king -------------
- + - !
- ======================================================================
- Chip 5.003_08 1996-Nov-19 110 19 424
- 5.003_09 1996-Nov-26 38 9 248
- 5.003_10 1996-Nov-29 29 2 27
- 5.003_11 1996-Dec-06 73 12 165
- 5.003_12 1996-Dec-19 275 6 436
- 5.003_13 1996-Dec-20 95 1 56
- 5.003_14 1996-Dec-23 23 7 333
- 5.003_15 1996-Dec-23 0 0 1
- 5.003_16 1996-Dec-24 12 3 50
- 5.003_17 1996-Dec-27 19 1 14
- 5.003_18 1996-Dec-31 21 1 32
- 5.003_19 1997-Jan-04 80 3 85
- 5.003_20 1997-Jan-07 18 1 146
- 5.003_21 1997-Jan-15 38 10 221
- 5.003_22 1997-Jan-16 4 0 18
- 5.003_23 1997-Jan-25 71 15 119
- 5.003_24 1997-Jan-29 426 1 20
- 5.003_25 1997-Feb-04 21 8 169
- 5.003_26 1997-Feb-10 16 1 15
- 5.003_27 1997-Feb-18 32 10 38
- 5.003_28 1997-Feb-21 58 4 66
- 5.003_90 1997-Feb-25 22 2 34
- 5.003_91 1997-Mar-01 37 1 39
- 5.003_92 1997-Mar-06 16 3 69
- 5.003_93 1997-Mar-10 12 3 15
- 5.003_94 1997-Mar-22 407 7 200
- 5.003_95 1997-Mar-25 41 1 37
- 5.003_96 1997-Apr-01 283 5 261
- 5.003_97 1997-Apr-03 13 2 34
- 5.003_97a 1997-Apr-05 57 1 27
- 5.003_97b 1997-Apr-08 14 1 20
- 5.003_97c 1997-Apr-10 20 1 16
- 5.003_97d 1997-Apr-13 8 0 16
- 5.003_97e 1997-Apr-15 15 4 46
- 5.003_97f 1997-Apr-17 7 1 33
- 5.003_97g 1997-Apr-18 6 1 42
- 5.003_97h 1997-Apr-24 23 3 68
- 5.003_97i 1997-Apr-25 23 1 31
- 5.003_97j 1997-Apr-28 36 1 49
- 5.003_98 1997-Apr-30 171 12 539
- 5.003_99 1997-May-01 6 0 7
- 5.003_99a 1997-May-09 36 2 61
- p54rc1 1997-May-12 8 1 11
- p54rc2 1997-May-14 6 0 40
- 5.004 1997-May-15 4 0 4
- Tim 5.004_01 1997-Jun-13 222 14 57
- 5.004_02 1997-Aug-07 112 16 119
- 5.004_03 1997-Sep-05 109 0 17
- 5.004_04 1997-Oct-15 66 8 173
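As a small, hypothetical illustration (the file names and contents are invented), the "+ - !" counts described above can be reproduced from diff(1) context output like this:

```shell
# Create two tiny versions of a file (made-up example data).
printf 'a\nb\nc\n' > old.txt
printf 'a\nB\nc\nd\n' > new.txt

# Count added (+), deleted (-), and changed (!) lines in the context diff,
# ignoring the context lines themselves (which start with two spaces).
diff -c old.txt new.txt | awk '
    /^\+ / { add++ }
    /^- /  { del++ }
    /^! /  { chg++ }
    END    { printf "+%d -%d !%d\n", add+0, del+0, chg+0 }'
```

Here one line was changed (counted once per side of the hunk, hence !2) and one was added, so this prints "+1 -0 !2".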
In more modern times, named releases don't come as often, and since progress can be followed (nearly) instantly (with rsync, and since late 2008, git), patches between versions are no longer provided. However, that doesn't keep us from calculating how large a patch could have been, as shown in the table below. Unless noted otherwise, the size mentioned is the patch to bring version x.y.z to x.y.z+1.
- Sarathy 5.6.1 2001-Apr-08 531 44 651
- Rafael 5.6.2 2003-Nov-15 20 11 1819
- Jarkko 5.8.0 2002-Jul-18 1205 31 471 From 5.7.3
- 5.8.1 2003-Sep-25 243 102 6162
- Nicholas 5.8.2 2003-Nov-05 10 50 788
- 5.8.3 2004-Jan-14 31 13 360
- 5.8.4 2004-Apr-21 33 8 299
- 5.8.5 2004-Jul-19 11 19 255
- 5.8.6 2004-Nov-27 35 3 192
- 5.8.7 2005-May-30 75 34 778
- 5.8.8 2006-Jan-31 131 42 1251
- 5.8.9 2008-Dec-14 340 132 12988
- Hugo 5.9.0 2003-Oct-27 281 168 7132 From 5.8.0
- Rafael 5.9.1 2004-Mar-16 57 250 2107
- 5.9.2 2005-Apr-01 720 57 858
- 5.9.3 2006-Jan-28 1124 102 1906
- 5.9.4 2006-Aug-15 896 60 862
- 5.9.5 2007-Jul-07 1149 128 1062
- 5.10.0 2007-Dec-18 50 31 13111 From 5.9.5
Jarkko Hietaniemi <jhi@iki.fi>.
Thanks to the collective memory of the Perlfolk. In addition to the Keepers of the Pumpkin also Alan Champion, Mark Dominus, Andreas König, John Macdonald, Matthias Neeracher, Jeff Okamoto, Michael Peppler, Randal Schwartz, and Paul D. Smith sent corrections and additions. Abigail added file and patch size data for the 5.6.0 - 5.10 era.
perlhpux - Perl version 5 on Hewlett-Packard Unix (HP-UX) systems
This document describes various features of HP's Unix operating system (HP-UX) that will affect how Perl version 5 (hereafter just Perl) is compiled and/or runs.
Starting with the September 2001 Application Release, HP-UX 11.00 was the first HP-UX version to ship with Perl; at that time it was perl-5.6.1, installed in /opt/perl. The first occurrence is on CD 5012-7954 and can be installed using
- swinstall -s /cdrom perl
assuming you have mounted that CD on /cdrom.
That build was a portable hppa-1.1 multithread build that supports large files compiled with gcc-2.9-hppa-991112.
If you perform a new installation, then (a newer) Perl will be installed automatically. Pre-installed HP-UX systems now have more recent versions of Perl and the updated modules.
The official (threaded) builds from HP, as shipped on the Application DVDs/CDs, are available at http://www.software.hp.com/portal/swdepot/displayProductInfo.do?productNumber=PERL for both PA-RISC and IPF (Itanium Processor Family). They are built with the HP ANSI-C compiler. Up to 5.8.8, that was done by ActiveState.
To see what version is included on the DVD (assumed here to be mounted on /cdrom), issue this command:
- # swlist -s /cdrom perl
- # perl D.5.8.8.B 5.8.8 Perl Programming Language
- perl.Perl5-32 D.5.8.8.B 32-bit 5.8.8 Perl Programming Language with Extensions
- perl.Perl5-64 D.5.8.8.B 64-bit 5.8.8 Perl Programming Language with Extensions
To see what is installed on your system:
- # swlist -R perl
- # perl E.5.8.8.J Perl Programming Language
- # perl.Perl5-32 E.5.8.8.J 32-bit Perl Programming Language with Extensions
- perl.Perl5-32.PERL-MAN E.5.8.8.J 32-bit Perl Man Pages for IA
- perl.Perl5-32.PERL-RUN E.5.8.8.J 32-bit Perl Binaries for IA
- # perl.Perl5-64 E.5.8.8.J 64-bit Perl Programming Language with Extensions
- perl.Perl5-64.PERL-MAN E.5.8.8.J 64-bit Perl Man Pages for IA
- perl.Perl5-64.PERL-RUN E.5.8.8.J 64-bit Perl Binaries for IA
The HP porting centre tries to keep up with customer demand and releases updates from the Open Source community. Precompiled Perl binaries are available there, though "up-to-date" is relative: at the moment of writing, only perl-5.10.1 was available (while 5.16.3 was the latest stable release from the porters' point of view).
The HP porting centres are limited in what systems they are allowed to port to and they usually choose the two most recent OS versions available.
HP has asked the porting centre to move Open Source binaries from /opt to /usr/local, so binaries produced since the start of July 2002 are located in /usr/local.
One of the HP porting centres' URLs is http://hpux.connect.org.uk/. The port currently available there is built with GNU gcc.
To get even more recent perl depots for the whole range of HP-UX, visit H.Merijn Brand's site at http://mirrors.develooper.com/hpux/#Perl. Carefully read the notes to see if the available versions suit your needs.
When compiling Perl, you must use an ANSI C compiler. The C compiler that ships with all HP-UX systems is a K&R compiler that should only be used to build new kernels.
Perl can be compiled with either HP's ANSI C compiler or with gcc. The former is recommended, as it not only compiles Perl with no difficulty, but can also take advantage of features, listed later, that require HP compiler-specific command-line flags.
If you decide to use gcc, make sure your installation is recent and complete, and be sure to read the Perl INSTALL file for more gcc-specific details.
HP's HP9000 Unix systems run on HP's own Precision Architecture (PA-RISC) chip. HP-UX used to run on the Motorola MC68000 family of chips, but any machine with this chip in it is quite obsolete and this document will not attempt to address issues for compiling Perl on the Motorola chipset.
The version of PA-RISC at the time of this document's last update is 2.0, which is also the last there will be. HP PA-RISC systems are usually referred to with the model description "HP 9000". The last CPU in this series is the PA-8900. Support for machines with the PA-RISC architecture officially ends as shown in the following table:
- PA-RISC End-of-Life Roadmap
- +--------+----------------+----------------+-----------------+
- | HP9000 | Superdome | PA-8700 | Spring 2011 |
- | 4-128 | | PA-8800/sx1000 | Summer 2012 |
- | cores | | PA-8900/sx1000 | 2014 |
- | | | PA-8900/sx2000 | 2015 |
- +--------+----------------+----------------+-----------------+
- | HP9000 | rp7410, rp8400 | PA-8700 | Spring 2011 |
- | 2-32 | rp7420, rp8420 | PA-8800/sx1000 | 2012 |
- | cores | rp7440, rp8440 | PA-8900/sx1000 | Autumn 2013 |
- | | | PA-8900/sx2000 | 2015 |
- +--------+----------------+----------------+-----------------+
- | HP9000 | rp44x0 | PA-8700 | Spring 2011 |
- | 1-8 | | PA-8800/rp44x0 | 2012 |
- | cores | | PA-8900/rp44x0 | 2014 |
- +--------+----------------+----------------+-----------------+
- | HP9000 | rp34x0 | PA-8700 | Spring 2011 |
- | 1-4 | | PA-8800/rp34x0 | 2012 |
- | cores | | PA-8900/rp34x0 | 2014 |
- +--------+----------------+----------------+-----------------+
From http://www.hp.com/products1/evolution/9000/faqs.html
- The last order date for HP 9000 systems was December 31, 2008.
A complete list of models at the time the OS was built is in the file /usr/sam/lib/mo/sched.models. The first column corresponds to the last part of the output of the "model" command. The second column is the PA-RISC version and the third column is the exact chip type used. (Start browsing at the bottom to prevent confusion ;-)
- # model
- 9000/800/L1000-44
- # grep L1000-44 /usr/sam/lib/mo/sched.models
- L1000-44 2.0 PA8500
An executable compiled on a PA-RISC 2.0 platform will not execute on a PA-RISC 1.1 platform, even if they are running the same version of HP-UX. If you are building Perl on a PA-RISC 2.0 platform and want that Perl to also run on a PA-RISC 1.1, the compiler flags +DAportable and +DS32 should be used.
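As a sketch (not a tested recipe, and HP-UX only), such a portable build could pass those compiler flags through Configure's -A append mechanism:

```shell
# Sketch: build a 32-bit perl on a PA-RISC 2.0 host that also runs on
# PA-RISC 1.1. +DAportable and +DS32 are HP ANSI C flags; -Accflags
# appends them to the compiler flags Configure would use anyway.
sh Configure -des -Dcc=cc -Accflags='+DAportable +DS32'
make && make test && make install
```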
It is no longer possible to compile PA-RISC 1.0 executables on either the PA-RISC 1.1 or 2.0 platforms. The command-line flags are accepted, but the resulting executable will not run when transferred to a PA-RISC 1.0 system.
This is the original version of PA-RISC; HP no longer sells any system with this chip.
The following systems contained PA-RISC 1.0 chips:
- 600, 635, 645, 808, 815, 822, 825, 832, 834, 835, 840, 842, 845, 850,
- 852, 855, 860, 865, 870, 890
An upgrade to the PA-RISC design, it shipped for many years in many different systems.
The following systems contain PA-RISC 1.1 chips:
- 705, 710, 712, 715, 720, 722, 725, 728, 730, 735, 742, 743, 744, 745,
- 747, 750, 755, 770, 777, 778, 779, 800, 801, 803, 806, 807, 809, 811,
- 813, 816, 817, 819, 821, 826, 827, 829, 831, 837, 839, 841, 847, 849,
- 851, 856, 857, 859, 867, 869, 877, 887, 891, 892, 897, A180, A180C,
- B115, B120, B132L, B132L+, B160L, B180L, C100, C110, C115, C120,
- C160L, D200, D210, D220, D230, D250, D260, D310, D320, D330, D350,
- D360, D410, DX0, DX5, DXO, E25, E35, E45, E55, F10, F20, F30, G30,
- G40, G50, G60, G70, H20, H30, H40, H50, H60, H70, I30, I40, I50, I60,
- I70, J200, J210, J210XC, K100, K200, K210, K220, K230, K400, K410,
- K420, S700i, S715, S744, S760, T500, T520
The most recent upgrade to the PA-RISC design, it added support for 64-bit integer data.
As of the date of this document's last update, the following systems contain PA-RISC 2.0 chips:
- 700, 780, 781, 782, 783, 785, 802, 804, 810, 820, 861, 871, 879, 889,
- 893, 895, 896, 898, 899, A400, A500, B1000, B2000, C130, C140, C160,
- C180, C180+, C180-XP, C200+, C400+, C3000, C360, C3600, CB260, D270,
- D280, D370, D380, D390, D650, J220, J2240, J280, J282, J400, J410,
- J5000, J5500XM, J5600, J7000, J7600, K250, K260, K260-EG, K270, K360,
- K370, K380, K450, K460, K460-EG, K460-XP, K470, K570, K580, L1000,
- L2000, L3000, N4000, R380, R390, SD16000, SD32000, SD64000, T540,
- T600, V2000, V2200, V2250, V2500, V2600
Just before HP took over Compaq, some systems were renamed. The link that contained the explanation is dead, so here is a short summary:
- HP 9000 A-Class servers, now renamed HP Server rp2400 series.
- HP 9000 L-Class servers, now renamed HP Server rp5400 series.
- HP 9000 N-Class servers, now renamed HP Server rp7400.
- rp2400, rp2405, rp2430, rp2450, rp2470, rp3410, rp3440, rp4410,
- rp4440, rp5400, rp5405, rp5430, rp5450, rp5470, rp7400, rp7405,
- rp7410, rp7420, rp7440, rp8400, rp8420, rp8440, Superdome
The current naming convention is:
- aadddd
- ||||`+- 00 - 99 relative capacity & newness (upgrades, etc.)
- |||`--- unique number for each architecture to ensure different
- ||| systems do not have the same numbering across
- ||| architectures
- ||`---- 1 - 9 identifies family and/or relative positioning
- ||
- |`----- c = ia32 (cisc)
- | p = pa-risc
- | x = ia-64 (Itanium & Itanium 2)
- | h = housing
- `------ t = tower
- r = rack optimized
- s = super scalable
- b = blade
- sa = appliance
HP-UX also runs on the new Itanium processor. This requires the use of a different version of HP-UX (currently 11.23 or 11i v2), and with the exception of a few differences detailed below and in later sections, Perl should compile with no problems.
Although PA-RISC binaries can run on Itanium systems, you should not attempt to use a PA-RISC version of Perl on an Itanium system. This is because shared libraries created on an Itanium system cannot be loaded while running a PA-RISC executable.
HP Itanium 2 systems are usually referred to with the model description "HP Integrity".
HP also ships servers with the 64-bit Itanium processor(s). The cx26x0 is said to have a Madison 6. As of the date of this document's last update, the following systems contain Itanium or Itanium 2 chips (this list is likely to be out of date):
- BL60p, BL860c, BL870c, BL890c, cx2600, cx2620, rx1600, rx1620, rx2600,
- rx2600hptc, rx2620, rx2660, rx2800, rx3600, rx4610, rx4640, rx5670,
- rx6600, rx7420, rx7620, rx7640, rx8420, rx8620, rx8640, rx9610,
- sx1000, sx2000
To see all about your machine, type
- # model
- ia64 hp server rx2600
- # /usr/contrib/bin/machinfo
Not all architectures (PA = PA-RISC, IPF = Itanium Processor Family) support all versions of HP-UX; here is a short list:
- HP-UX version Kernel Architecture End-of-factory support
- ------------- ------ ------------ ----------------------------------
- 10.20 32 bit PA 30-Jun-2003
- 11.00 32/64 PA 31-Dec-2006
- 11.11 11i v1 32/64 PA 31-Dec-2015
- 11.22 11i v2 64 IPF 30-Apr-2004
- 11.23 11i v2 64 PA & IPF 31-Dec-2015
- 11.31 11i v3 64 PA & IPF 31-Dec-2020 (PA) 31-Dec-2022 (IPF)
See for the full list of hardware/OS support and expected end-of-life http://www.hp.com/go/hpuxservermatrix
HP-UX supports dynamically loadable libraries (shared libraries). Shared libraries end with the suffix .sl. On Itanium systems, they end with the suffix .so.
Shared libraries created on a platform using a particular PA-RISC version are not usable on platforms using an earlier PA-RISC version by default. However, this backwards compatibility may be enabled using the same +DAportable compiler flag (with the same PA-RISC 1.0 caveat mentioned above).
Shared libraries created on an Itanium platform cannot be loaded on a PA-RISC platform. Shared libraries created on a PA-RISC platform can only be loaded on an Itanium platform if it is a PA-RISC executable that is attempting to load the PA-RISC library. A PA-RISC shared library cannot be loaded into an Itanium executable nor vice-versa.
To create a shared library, the following steps must be performed:
- 1. Compile source modules with +z or +Z flag to create a .o module
- which contains Position-Independent Code (PIC). The linker will
- tell you in the next step if +Z was needed.
- (For gcc, the appropriate flag is -fpic or -fPIC.)
- 2. Link the shared library using the -b flag. If the code calls
- any functions in other system libraries (e.g., libm), it must
- be included on this line.
(Note that these steps are usually handled automatically by the extension's Makefile).
If these dependent libraries are not listed at shared library creation time, you will get fatal "Unresolved symbol" errors at run time when the library is loaded.
You may create a shared library that refers to another library, which may be either an archive library or a shared library. If this second library is a shared library, this is called a "dependent library". The dependent library's name is recorded in the main shared library, but it is not linked into the shared library. Instead, it is loaded when the main shared library is loaded. This can cause problems if you build an extension on one system and move it to another system where the libraries may not be located in the same place as on the first system.
If the referred library is an archive library, then it is treated as a simple collection of .o modules (all of which must contain PIC). These modules are then linked into the shared library.
Note that it is okay to create a library which contains a dependent library that is already linked into perl.
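As a minimal sketch of steps 1 and 2 above (the source and library names are made up; HP-UX only):

```shell
# 1. Compile to a PIC object (+z or +Z with HP cc; -fpic/-fPIC with gcc).
cc -c +z mycode.c

# 2. Link the shared library with -b, listing dependent libraries
#    (here libm) so they are recorded in the new library.
ld -b -o libmycode.sl mycode.o -lm
```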
Some extensions, like DB_File and Compress::Zlib, use/require prebuilt libraries for the perl extensions/modules to work. If these libraries are built using the default configuration, it might happen that you run into an error like "invalid loader fixup" during the load phase. HP is aware of this problem. Search the HP-UX cxx-dev forums for discussions about the subject. The short answer is that everything (all libraries, everything) must be compiled with +z or +Z to be PIC (position independent code). (For gcc, that would be -fpic or -fPIC.) In HP-UX 11.00 or newer the linker error message should tell the name of the offending object file.
A more general approach is to intervene manually, as with an example for the DB_File module, which requires SleepyCat's libdb.sl:
- # cd .../db-3.2.9/build_unix
- # vi Makefile
- ... add +Z to all cflags to create shared objects
- CFLAGS= -c $(CPPFLAGS) +Z -Ae +O2 +Onolimit \
- -I/usr/local/include -I/usr/include/X11R6
- CXXFLAGS= -c $(CPPFLAGS) +Z -Ae +O2 +Onolimit \
- -I/usr/local/include -I/usr/include/X11R6
- # make clean
- # make
- # mkdir tmp
- # cd tmp
- # ar x ../libdb.a
- # ld -b -o libdb-3.2.sl *.o
- # mv libdb-3.2.sl /usr/local/lib
- # rm *.o
- # cd /usr/local/lib
- # rm -f libdb.sl
- # ln -s libdb-3.2.sl libdb.sl
- # cd .../DB_File-1.76
- # make distclean
- # perl Makefile.PL
- # make
- # make test
- # make install
As of db-4.2.x it is no longer needed to do this by hand. Sleepycat has changed the configuration process to add +z on HP-UX automatically.
- # cd .../db-4.2.25/build_unix
- # env CFLAGS=+DD64 LDFLAGS=+DD64 ../dist/configure
should work to generate 64bit shared libraries for HP-UX 11.00 and 11i.
It is no longer possible to link PA-RISC 1.0 shared libraries (even though the command-line flags are still present).
PA-RISC and Itanium object files are not interchangeable. Although you may be able to use ar to create an archive library of PA-RISC object files on an Itanium system, you cannot link against it using an Itanium link editor.
When using this compiler to build Perl, you should make sure that the flag -Aa is added to the cpprun and cppstdin variables in the config.sh file (though see the section on 64-bit perl below). If you are using a recent version of the Perl distribution, these flags are set automatically.
Even though HP-UX 10.20 and 11.00 are not actively maintained by HP anymore, updates for the HP ANSI C compiler are still available from time to time, and it might be advisable to see if updates are applicable. At the moment of writing, the latest available patches for 11.00 that should be applied are PHSS_35098, PHSS_35175, PHSS_35100, PHSS_33036, and PHSS_33902. If you have a SUM account, you can use it to search for updates/patches. Enter "ANSI" as keyword.
When you are going to use the GNU C compiler (gcc), and you don't have gcc yet, you can either build it yourself from the sources (available from e.g. http://gcc.gnu.org/mirrors.html) or fetch a prebuilt binary from the HP porting center at http://hpux.connect.org.uk/hppd/cgi-bin/search?term=gcc&Search=Search or from the DSPP (you need to be a member) at http://h21007.www2.hp.com/portal/site/dspp/menuitem.863c3e4cbcdc3f3515b49c108973a801?ciid=2a08725cc2f02110725cc2f02110275d6e10RCRD&jumpid=reg_r1002_usen_c-001_title_r0001 (Browse through the list, because there are often multiple versions of the same package available).
Most mentioned distributions are depots. H.Merijn Brand has made prebuilt gcc binaries available on http://mirrors.develooper.com/hpux/ and/or http://www.cmve.net/~merijn/ for HP-UX 10.20 (only 32bit), HP-UX 11.00, HP-UX 11.11 (HP-UX 11i v1), and HP-UX 11.23 (HP-UX 11i v2 PA-RISC) in both 32- and 64-bit versions. For HP-UX 11.23 IPF and HP-UX 11.31 IPF depots are available too. The IPF versions do not need two versions of GNU gcc.
On PA-RISC you need a different compiler for 32-bit applications and for 64-bit applications. On PA-RISC, 32-bit objects and 64-bit objects do not mix. Period. There is no different behaviour for HP C-ANSI-C or GNU gcc. So if you require your perl binary to use 64-bit libraries, like Oracle-64bit, you MUST build a 64-bit perl.
Building a 64-bit capable gcc on PA-RISC from source is possible only when you have the HP C-ANSI C compiler or an already working 64-bit binary of gcc available. Best performance for perl is achieved with HP's native compiler.
Beginning with HP-UX version 10.20, files larger than 2GB (2^31 bytes) may be created and manipulated. Three separate methods of doing this are available. Of these methods, the best method for Perl is to compile using the -Duselargefiles flag to Configure. This causes Perl to be compiled using structures and functions in which these are 64 bits wide, rather than 32 bits wide. (Note that this will only work with HP's ANSI C compiler. If you want to compile Perl using gcc, you will have to get a version of the compiler that supports 64-bit operations. See above for where to find it.)
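With HP's compiler, that could look like the following sketch (HP-UX only, not a tested recipe):

```shell
# Sketch: configure a large-file-aware perl with HP's ANSI C compiler.
# -Duselargefiles widens the file structures/functions to 64 bits.
sh Configure -des -Dcc=cc -Duselargefiles
make && make test && make install
```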
There are some drawbacks to this approach. One is that any extension which calls any file-manipulating C function will need to be recompiled (just follow the usual "perl Makefile.PL; make; make test; make install" procedure).
The list of functions that will need to be recompiled is: creat, fgetpos, fopen, freopen, fsetpos, fstat, fstatvfs, fstatvfsdev, ftruncate, ftw, lockf, lseek, lstat, mmap, nftw, open, prealloc, stat, statvfs, statvfsdev, tmpfile, truncate, getrlimit, setrlimit
Another drawback, valid only for Perl versions before 5.6.0, is that the seek and tell functions (both the builtin versions and the POSIX module versions) will not perform correctly.
It is strongly recommended that you use this flag when you run Configure. If you do not do this, but later answer the question about large files when Configure asks you, you may get a configuration that cannot be compiled, or that does not function as expected.
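As an illustrative sketch (not a complete invocation; the hints file supplies the platform-specific flags), a largefile-enabled build would be started like this from the top of the perl source tree:
- sh Configure -des -Duselargefiles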
It is possible to compile a version of threaded Perl on any version of HP-UX before 10.30, but it is strongly suggested that you be running on HP-UX 11.00 at least.
To compile Perl with threads, add -Dusethreads to the arguments of Configure. Verify that the -D_POSIX_C_SOURCE=199506L compiler flag is automatically added to the list of flags. Also make sure that -lpthread is listed before -lc in the list of libraries to link Perl with. The hints provided for HP-UX during Configure will try very hard to get this right for you.
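A threaded build can thus be sketched as follows (illustrative; the HP-UX hints take care of the -D_POSIX_C_SOURCE and library-ordering details mentioned above):
- sh Configure -des -Dusethreads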
HP-UX versions before 10.30 require a separate installation of a POSIX threads library package. Two examples are the HP DCE package, available on "HP-UX Hardware Extensions 3.0, Install and Core OS, Release 10.20, April 1999 (B3920-13941)" or the Freely available PTH package, available on H.Merijn's site (http://mirrors.develooper.com/hpux/). The use of PTH will be unsupported in perl-5.12 and up and is rather buggy in 5.11.x.
If you are going to use the HP DCE package, the library used for threading is /usr/lib/libcma.sl, but there have been multiple updates of that library over time. Perl will build with the first version, but it will not pass the test suite. Older Oracle versions might be a compelling reason not to update that library, otherwise please find a newer version in one of the following patches: PHSS_19739, PHSS_20608, or PHSS_23672
Reformatted output:
- d3:/usr/lib 106 > what libcma-*.1
- libcma-00000.1:
- HP DCE/9000 1.5 Module: libcma.sl (Export)
- Date: Apr 29 1996 22:11:24
- libcma-19739.1:
- HP DCE/9000 1.5 PHSS_19739-40 Module: libcma.sl (Export)
- Date: Sep 4 1999 01:59:07
- libcma-20608.1:
- HP DCE/9000 1.5 PHSS_20608 Module: libcma.1 (Export)
- Date: Dec 8 1999 18:41:23
- libcma-23672.1:
- HP DCE/9000 1.5 PHSS_23672 Module: libcma.1 (Export)
- Date: Apr 9 2001 10:01:06
- d3:/usr/lib 107 >
If you choose the PTH package, use swinstall to install pth in the default location (/opt/pth), and then make symbolic links to the libraries from /usr/lib:
- # cd /usr/lib
- # ln -s /opt/pth/lib/libpth* .
To build perl with Oracle support, it needs to be linked with libcl and libpthread. So even if your perl is an unthreaded build, these libraries might still be required. See "Oracle on HP-UX" below.
Beginning with HP-UX 11.00, programs compiled under HP-UX can take advantage of the LP64 programming environment (LP64 means Longs and Pointers are 64 bits wide), in which scalar variables will be able to hold numbers larger than 2^32 with complete precision. Perl has proven to be consistent and reliable in 64bit mode since 5.8.1 on all HP-UX 11.xx.
As of the date of this document, Perl is fully 64-bit compliant on HP-UX 11.00 and up for both cc- and gcc builds. If you are about to build a 64-bit perl with GNU gcc, please read the gcc section carefully.
Should a user have the need for compiling Perl in the LP64 environment, use the -Duse64bitall flag to Configure. This will force Perl to be compiled in a pure LP64 environment (with the +DD64 flag for HP C-ANSI-C, with no additional options for GNU gcc 64-bit on PA-RISC, and with -mlp64 for GNU gcc on Itanium). If you want to compile Perl using gcc, you will have to get a version of the compiler that supports 64-bit operations.
You can also use the -Duse64bitint flag to Configure. Although there are some minor differences between compiling Perl with this flag versus the -Duse64bitall flag, they should not be noticeable from a Perl user's perspective. When configuring -Duse64bitint using a 64bit gcc on a pa-risc architecture, -Duse64bitint is silently promoted to -Duse64bitall.
In both cases, it is strongly recommended that you use these flags when you run Configure. If you do not do this, but later answer the questions about 64-bit numbers when Configure asks you, you may get a configuration that cannot be compiled, or that does not function as expected.
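As with the largefile and threads flags above, a pure LP64 build is therefore best started as (illustrative sketch; Configure and the hints file fill in the compiler flags discussed above):
- sh Configure -des -Duse64bitall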
Using perl to connect to Oracle databases through DBI and DBD::Oracle has caused a lot of people many headaches. Read README.hpux in the DBD::Oracle distribution for much more information. The reason to mention it here is that Oracle requires a perl built with libcl and libpthread, the latter even when perl is built without threads. Building perl using all defaults, but still enabling DBD::Oracle to be built later on, can be achieved using
- Configure -A prepend:libswanted='cl pthread ' ...
Do not forget the space before the trailing quote.
Also note that this does not (yet) work with all configurations; it is known to fail with 64-bit versions of GCC.
If you attempt to compile Perl with (POSIX) threads on an 11.X system and also link in the GDBM library, then Perl will immediately core dump when it starts up. The only workaround at this point is to relink the GDBM library under 11.X, then relink it into Perl.
The error might show something like:
- Pthread internal error: message: __libc_reinit() failed, file: ../pthreads/pthread.c, line: 1096
- Return Pointer is 0xc082bf33
- sh: 5345 Quit(coredump)
and Configure will give up.
If you are compiling Perl on a remotely-mounted NFS filesystem, the test io/fs.t may fail on test #18. This appears to be a bug in HP-UX and no fix is currently available.
By default, HP-UX comes configured with a maximum data segment size of 64MB. This is too small to correctly compile Perl with the maximum optimization levels. You can increase the size of the maxdsiz kernel parameter through the use of SAM.
When using the GUI version of SAM, click on the Kernel Configuration icon, then the Configurable Parameters icon. Scroll down and select the maxdsiz line. From the Actions menu, select the Modify Configurable Parameter item. Insert the new formula into the Formula/Value box. Then follow the instructions to rebuild your kernel and reboot your system.
In general, a value of 256MB (or "256*1024*1024") is sufficient for Perl to compile at maximum optimization.
You may get a bus error core dump from the op/pwent or op/grent tests. If compiled with -g you will see a stack trace much like the following:
- #0 0xc004216c in () from /usr/lib/libc.2
- #1 0xc00d7550 in __nss_src_state_destr () from /usr/lib/libc.2
- #2 0xc00d7768 in __nss_src_state_destr () from /usr/lib/libc.2
- #3 0xc00d78a8 in nss_delete () from /usr/lib/libc.2
- #4 0xc01126d8 in endpwent () from /usr/lib/libc.2
- #5 0xd1950 in Perl_pp_epwent () from ./perl
- #6 0x94d3c in Perl_runops_standard () from ./perl
- #7 0x23728 in S_run_body () from ./perl
- #8 0x23428 in perl_run () from ./perl
- #9 0x2005c in main () from ./perl
The key here is the nss_delete call. One workaround for this bug seems to be to add (at least) the following lines to the file /etc/nsswitch.conf:
- group: files
- passwd: files
Whether you are using NIS does not matter. Amazingly enough, the same bug also affects Solaris.
There seems to be a broken system header file in HP-UX 11.00 that breaks building perl in 32-bit mode with GNU gcc-4.x. The same file for HP-UX 11.11 (even though the file is older) does not show this failure, and has the correct definition, so the best fix is to patch the header to match:
- --- /usr/include/inttypes.h 2001-04-20 18:42:14 +0200
- +++ /usr/include/inttypes.h 2000-11-14 09:00:00 +0200
- @@ -72,7 +72,7 @@
- #define UINT32_C(__c) __CONCAT_U__(__c)
- #else /* __LP64 */
- #define INT32_C(__c) __CONCAT__(__c,l)
- -#define UINT32_C(__c) __CONCAT__(__CONCAT_U__(__c),l)
- +#define UINT32_C(__c) __CONCAT__(__c,ul)
- #endif /* __LP64 */
- #define INT64_C(__c) __CONCAT_L__(__c,l)
HP-UX 11 Y2K patch "Y2K-1100 B.11.00.B0125 HP-UX Core OS Year 2000 Patch Bundle" has been reported to break the io/fs test #18 which tests whether utime() can change timestamps. The Y2K patch seems to break utime() so that over NFS the timestamps do not get changed (on local filesystems utime() still works). This has probably been fixed on your system by now.
H.Merijn Brand <h.m.brand@xs4all.nl> and Jeff Okamoto <okamoto@corp.hp.com>
With much assistance regarding shared libraries from Marc Sabatella.
perlhurd - Perl version 5 on Hurd
If you want to use Perl on the Hurd, I recommend using the Debian GNU/Hurd distribution ( see http://www.debian.org/ ), even if an official, stable release has not yet been made. The old "gnu-0.2" binary distribution will most certainly have additional problems.
The Perl test suite may still report some errors on the Hurd. The "lib/anydbm" and "pragma/warnings" tests will almost certainly fail. Neither failure is really specific to the Hurd, as indicated by the test suite output.
The socket tests may fail if the network is not configured. You have to make "/hurd/pfinet" the translator for "/servers/socket/2", giving it the right arguments. Try "/hurd/pfinet --help" for more information.
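As a sketch of such a setup (the interface name and addresses below are placeholders, not taken from this document; consult "/hurd/pfinet --help" for the actual arguments on your system), the translator is set with settrans:
- settrans -fgap /servers/socket/2 /hurd/pfinet -i eth0 -a 192.168.1.2 -g 192.168.1.1 -m 255.255.255.0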
Here are the statistics for Perl 5.005_62 on my system:
- Failed Test Status Wstat Total Fail Failed List of failed
- -------------------------------------------------------------------------
- lib/anydbm.t 12 1 8.33% 12
- pragma/warnings 333 1 0.30% 215
- 8 tests and 24 subtests skipped.
- Failed 2/229 test scripts, 99.13% okay. 2/10850 subtests failed, 99.98% okay.
There are quite a few systems out there that do worse!
However, since I am running a very recent Hurd snapshot, in which a lot of bugs that were exposed by the Perl test suite have been fixed, you may encounter more failures. Likely candidates are: "op/stat", "lib/io_pipe", "lib/io_sock", "lib/io_udp" and "lib/time".
In any case, if you're seeing failures beyond those mentioned in this document, please consider upgrading to the latest Hurd before reporting the failure as a bug.
Mark Kettenis <kettenis@gnu.org>
Last Updated: Fri, 29 Oct 1999 22:50:30 +0200
perlintern - autogenerated documentation of purely internal Perl functions
This file is the autogenerated documentation of functions in the Perl interpreter that are documented using Perl's internal documentation format but are not marked as part of the Perl API. In other words, they are not for use in extensions!
Return an entry from the BHK structure. which is a preprocessor token indicating which entry to return. If the appropriate flag is not set this will return NULL. The type of the return value depends on which entry you ask for.
NOTE: this function is experimental and may change or be removed without notice.
- void * BhkENTRY(BHK *hk, which)
Return the BHK's flags.
NOTE: this function is experimental and may change or be removed without notice.
- U32 BhkFLAGS(BHK *hk)
Call all the registered block hooks for type which. which is a preprocessing token; the type of arg depends on which.
NOTE: this function is experimental and may change or be removed without notice.
- void CALL_BLOCK_HOOKS(which, arg)
Each CV has a pointer, CvOUTSIDE(), to its lexically enclosing CV (if any). Because pointers to anonymous sub prototypes are stored in & pad slots, it is possible to get a circular reference, with the parent pointing to the child and vice-versa. To avoid the ensuing memory leak, we do not increment the reference count of the CV pointed to by CvOUTSIDE in the one specific instance that the parent has a & pad slot pointing back to us. In this case, we set the CvWEAKOUTSIDE flag in the child. This allows us to determine under what circumstances we should decrement the refcount of the parent when freeing the child.
There is a further complication with non-closure anonymous subs (i.e. those that do not refer to any lexicals outside that sub). In this case, the anonymous prototype is shared rather than being cloned. This has the consequence that the parent may be freed while there are still active children, eg
- BEGIN { $a = sub { eval '$x' } }
In this case, the BEGIN is freed immediately after execution since there are no active references to it: the anon sub prototype has CvWEAKOUTSIDE set since it's not a closure, and $a points to the same CV, so it doesn't contribute to BEGIN's refcount either. When $a is executed, the eval '$x' causes the chain of CvOUTSIDEs to be followed, and the freed BEGIN is accessed.
To avoid this, whenever a CV and its associated pad is freed, any & entries in the pad are explicitly removed from the pad, and if the refcount of the pointed-to anon sub is still positive, then that child's CvOUTSIDE is set to point to its grandparent. This will only occur in the single specific case of a non-closure anon prototype having one or more active references (such as $a above).
One other thing to consider is that a CV may be merely undefined rather than freed, eg undef &foo. In this case, its refcount may not have reached zero, but we still delete its pad and its CvROOT etc. Since various children may still have their CvOUTSIDE pointing at this undefined CV, we keep its own CvOUTSIDE for the time being, so that the chain of lexical scopes is unbroken. For example, the following should print 123:
- my $x = 123;
- sub tmp { sub { eval '$x' } }
- undef &tmp;
- print tmp()->();
Dump the contents of a CV.
- void cv_dump(CV *cv, const char *title)
When a CV has a reference count on its slab (CvSLABBED), it is responsible for making sure it is freed. (Hence, no two CVs should ever have a reference count on the same slab.) The CV only needs to reference the slab during compilation. Once it is compiled and CvROOT attached, it has finished its job, so it can forget the slab.
- void cv_forget_slab(CV *cv)
Dump the contents of a padlist
- void do_dump_pad(I32 level, PerlIO *file,
- PADLIST *padlist, int full)
"Introduce" my variables to visible status. This is called during parsing at the end of each statement to make lexical variables visible to subsequent statements.
- U32 intro_my()
Duplicates a pad.
- PADLIST * padlist_dup(PADLIST *srcpad,
- CLONE_PARAMS *param)
Allocates a place in the currently-compiling pad (via pad_alloc in perlapi) and then stores a name for that entry. namesv is adopted and becomes the name entry; it must already contain the name string and be sufficiently upgraded. typestash and ourstash and the padadd_STATE flag get added to namesv. None of the other processing of pad_add_name_pvn in perlapi is done. Returns the offset of the allocated pad slot.
- PADOFFSET pad_alloc_name(SV *namesv, U32 flags,
- HV *typestash, HV *ourstash)
Update the pad compilation state variables on entry to a new block.
- void pad_block_start(int full)
Check for duplicate declarations: report any of:
is_our indicates that the name to check is an 'our' declaration.
- void pad_check_dup(SV *name, U32 flags,
- const HV *ourstash)
Find a named lexical anywhere in a chain of nested pads. Add fake entries in the inner pads if it's found in an outer one.
Returns the offset in the bottom pad of the lex or the fake lex. cv is the CV in which to start the search, and seq is the current cop_seq to match against. If warn is true, print appropriate warnings. The out_* vars return values, and so are pointers to where the returned values should be stored. out_capture, if non-null, requests that the innermost instance of the lexical is captured; out_name_sv is set to the innermost matched namesv or fake namesv; out_flags returns the flags normally associated with the IVX field of a fake namesv.
Note that pad_findlex() is recursive; it recurses up the chain of CVs, then comes back down, adding fake entries as it goes. It has to be this way because fake namesvs in anon prototypes have to store in xlow the index into the parent pad.
For any anon CVs in the pad, change CvOUTSIDE of that CV from old_cv to new_cv if necessary. Needed when a newly-compiled CV has to be moved to a pre-existing CV struct.
- void pad_fixup_inner_anons(PADLIST *padlist,
- CV *old_cv, CV *new_cv)
Free the SV at offset po in the current pad.
- void pad_free(PADOFFSET po)
Cleanup at end of scope during compilation: set the max seq number for lexicals in this scope and warn of any lexicals that never got introduced.
- void pad_leavemy()
Push a new pad frame onto the padlist, unless there's already a pad at this depth, in which case don't bother creating a new one. Then give the new pad an @_ in slot zero.
- void pad_push(PADLIST *padlist, int depth)
Mark all the current temporaries for reuse
- void pad_reset()
Abandon the tmp in the current pad at offset po and replace with a new one.
- void pad_swipe(PADOFFSET po, bool refadjust)
This function assigns the prototype of the named core function to sv, or to a new mortal SV if sv is NULL. It returns the modified sv, or NULL if the core function has no prototype. code is a code as returned by keyword(). It must not be equal to 0 or -KEY_CORE.
Check for the cases 0 or 3 of cur_env.je_ret, only used inside an eval context: 0 is used as continue inside eval; 3 is used for a die caught by an inner eval, to continue the inner loop.
See cop.h: je_mustcatch, when set at any runlevel to TRUE, means eval ops must establish a local jmpenv to handle exception traps.
- OP* docatch(OP *o)
If the typeglob gv can be expressed more succinctly, by having something other than a real GV in its place in the stash, replace it with the optimised form. Basic requirements for this are that gv is a real typeglob, is sufficiently ordinary, and is only referenced from its package. This function is meant to be used when a GV has been looked up in part to see what was there, causing upgrading, but based on what was found it turns out that the real GV isn't required after all.
If gv is a completely empty typeglob, it is deleted from the stash.
If gv is a typeglob containing only a sufficiently-ordinary constant sub, the typeglob is replaced with a scalar-reference placeholder that more compactly represents the same thing.
NOTE: this function is experimental and may change or be removed without notice.
- void gv_try_downgrade(GV* gv)
Adds a name to a stash's internal list of effective names. See hv_ename_delete.
This is called when a stash is assigned to a new location in the symbol table.
- void hv_ename_add(HV *hv, const char *name, U32 len,
- U32 flags)
Removes a name from a stash's internal list of effective names. If this is the name returned by HvENAME, then another name in the list will take its place (HvENAME will use it).
This is called when a stash is deleted from the symbol table.
- void hv_ename_delete(HV *hv, const char *name,
- U32 len, U32 flags)
Generates and returns a HV * representing the content of a refcounted_he chain.
flags is currently unused and must be zero.
- HV * refcounted_he_chain_2hv(
- const struct refcounted_he *c, U32 flags
- )
Like refcounted_he_fetch_pvn, but takes a nul-terminated string instead of a string/length pair.
- SV * refcounted_he_fetch_pv(
- const struct refcounted_he *chain,
- const char *key, U32 hash, U32 flags
- )
Search along a refcounted_he chain for an entry with the key specified by keypv and keylen. If flags has the REFCOUNTED_HE_KEY_UTF8 bit set, the key octets are interpreted as UTF-8, otherwise they are interpreted as Latin-1. hash is a precomputed hash of the key string, or zero if it has not been precomputed. Returns a mortal scalar representing the value associated with the key, or &PL_sv_placeholder if there is no value associated with the key.
- SV * refcounted_he_fetch_pvn(
- const struct refcounted_he *chain,
- const char *keypv, STRLEN keylen, U32 hash,
- U32 flags
- )
Like refcounted_he_fetch_pvn, but takes a literal string instead of a string/length pair, and no precomputed hash.
- SV * refcounted_he_fetch_pvs(
- const struct refcounted_he *chain,
- const char *key, U32 flags
- )
Like refcounted_he_fetch_pvn, but takes a Perl scalar instead of a string/length pair.
- SV * refcounted_he_fetch_sv(
- const struct refcounted_he *chain, SV *key,
- U32 hash, U32 flags
- )
Decrements the reference count of a refcounted_he by one. If the reference count reaches zero the structure's memory is freed, which (recursively) causes a reduction of its parent refcounted_he's reference count. It is safe to pass a null pointer to this function: no action occurs in this case.
- void refcounted_he_free(struct refcounted_he *he)
Increment the reference count of a refcounted_he. The pointer to the refcounted_he is also returned. It is safe to pass a null pointer to this function: no action occurs and a null pointer is returned.
- struct refcounted_he * refcounted_he_inc(
- struct refcounted_he *he
- )
Like refcounted_he_new_pvn, but takes a nul-terminated string instead of a string/length pair.
- struct refcounted_he * refcounted_he_new_pv(
- struct refcounted_he *parent,
- const char *key, U32 hash,
- SV *value, U32 flags
- )
Creates a new refcounted_he. This consists of a single key/value pair and a reference to an existing refcounted_he chain (which may be empty), and thus forms a longer chain. When using the longer chain, the new key/value pair takes precedence over any entry for the same key further along the chain.
The new key is specified by keypv and keylen. If flags has the REFCOUNTED_HE_KEY_UTF8 bit set, the key octets are interpreted as UTF-8, otherwise they are interpreted as Latin-1. hash is a precomputed hash of the key string, or zero if it has not been precomputed.
value is the scalar value to store for this key. value is copied by this function, which thus does not take ownership of any reference to it, and later changes to the scalar will not be reflected in the value visible in the refcounted_he. Complex types of scalar will not be stored with referential integrity, but will be coerced to strings. value may be either null or &PL_sv_placeholder to indicate that no value is to be associated with the key; this, as with any non-null value, takes precedence over the existence of a value for the key further along the chain.
parent points to the rest of the refcounted_he chain to be attached to the new refcounted_he. This function takes ownership of one reference to parent, and returns one reference to the new refcounted_he.
- struct refcounted_he * refcounted_he_new_pvn(
- struct refcounted_he *parent,
- const char *keypv,
- STRLEN keylen, U32 hash,
- SV *value, U32 flags
- )
Like refcounted_he_new_pvn, but takes a literal string instead of a string/length pair, and no precomputed hash.
- struct refcounted_he * refcounted_he_new_pvs(
- struct refcounted_he *parent,
- const char *key, SV *value,
- U32 flags
- )
Like refcounted_he_new_pvn, but takes a Perl scalar instead of a string/length pair.
- struct refcounted_he * refcounted_he_new_sv(
- struct refcounted_he *parent,
- SV *key, U32 hash, SV *value,
- U32 flags
- )
Function called by do_readline to spawn a glob (or do the glob inside perl on VMS). This code used to be inline, but now that perl uses File::Glob this glob starter is only used by miniperl during the build process. Moving it away shrinks pp_hot.c; shrinking pp_hot.c helps speed perl up.
NOTE: this function is experimental and may change or be removed without notice.
- PerlIO* start_glob(SV *tmpglob, IO *io)
Triggered by a delete from %^H, records the key to PL_compiling.cop_hints_hash.
- int magic_clearhint(SV* sv, MAGIC* mg)
Triggered by clearing %^H, resets PL_compiling.cop_hints_hash.
- int magic_clearhints(SV* sv, MAGIC* mg)
Invoke a magic method (like FETCH). sv and mg are the tied thingy and the tie magic. meth is the name of the method to call. argc is the number of args (in addition to $self) to pass to the method. The flags can be:
The arguments themselves are any values following the flags argument. Returns the SV (if any) returned by the method, or NULL on failure.
- SV* magic_methcall(SV *sv, const MAGIC *mg,
- const char *meth, U32 flags,
- U32 argc, ...)
Triggered by a store to %^H, records the key/value pair to PL_compiling.cop_hints_hash. It is assumed that hints aren't storing anything that would need a deep copy. Maybe we should warn if we find a reference.
- int magic_sethint(SV* sv, MAGIC* mg)
Copy some of the magic from an existing SV to new localized version of that SV. Container magic (eg %ENV, $1, tie) gets copied, value magic doesn't (eg taint, pos).
If setmagic is false then no set magic will be called on the new (empty) SV. This typically means that assignment will soon follow (e.g. 'local $x = $y'), and that will handle the magic.
- void mg_localize(SV* sv, SV* nsv, bool setmagic)
Returns the Depth-First Search linearization of @ISA of the given stash. The return value is a read-only AV*. level should be 0 (it is used internally in this function's recursion).
You are responsible for SvREFCNT_inc() on the return value if you plan to store it anywhere semi-permanently (otherwise it might be deleted out from under you the next time the cache is invalidated).
- AV* mro_get_linear_isa_dfs(HV* stash, U32 level)
Takes the necessary steps (cache invalidations, mostly) when the @ISA of the given package has changed. Invoked by the setisa magic; it should not need to be invoked directly.
- void mro_isa_changed_in(HV* stash)
Call this function to signal to a stash that it has been assigned to another spot in the stash hierarchy. stash is the stash that has been assigned. oldstash is the stash it replaces, if any. gv is the glob that is actually being assigned to.
This can also be called with a null first argument to indicate that oldstash has been deleted.
This function invalidates isa caches on the old stash, on all subpackages nested inside it, and on the subclasses of all those, including non-existent packages that have corresponding entries in stash.
It also sets the effective names (HvENAME) on all the stashes as appropriate.
If the gv is present and is not in the symbol table, then this function simply returns. This check will be skipped if flags & 1.
.
- void mro_package_moved(HV * const stash,
- HV * const oldstash,
- const GV * const gv,
- U32 flags)
This function finalizes the optree. Should be called directly after the complete optree is built. It does some additional checking which can't be done in the normal ck_xxx functions and makes the tree thread-safe.
- void finalize_optree(OP* o)
Save the current pad in the given context block structure.
- void CX_CURPAD_SAVE(struct context)
Access the SV at offset po in the saved current pad in the given context block structure (can be used as an lvalue).
- SV * CX_CURPAD_SV(struct context, PADOFFSET po)
Whether this is an "our" variable.
- bool PadnameIsOUR(PADNAME pn)
Whether this is a "state" variable.
- bool PadnameIsSTATE(PADNAME pn)
The stash in which this "our" variable was declared.
- HV * PadnameOURSTASH()
Whether this entry belongs to an outer pad.
- bool PadnameOUTER(PADNAME pn)
The stash associated with a typed lexical. This returns the %Foo:: hash for my Foo $bar.
- HV * PadnameTYPE(PADNAME pn)
Get the value from slot po in the base (DEPTH=1) pad of a padlist.
- SV * PAD_BASE_SV(PADLIST padlist, PADOFFSET po)
Clone the state variables associated with running and compiling pads.
- void PAD_CLONE_VARS(PerlInterpreter *proto_perl,
- CLONE_PARAMS* param)
Return the flags for the current compiling pad name at offset po. Assumes a valid slot entry.
- U32 PAD_COMPNAME_FLAGS(PADOFFSET po)
The generation number of the name at offset po in the current compiling pad (lvalue). Note that SvUVX is hijacked for this purpose.
- STRLEN PAD_COMPNAME_GEN(PADOFFSET po)
Sets the generation number of the name at offset po in the current compiling pad (lvalue) to gen. Note that SvUV_set is hijacked for this purpose.
- STRLEN PAD_COMPNAME_GEN_set(PADOFFSET po, int gen)
Return the stash associated with an our variable. Assumes the slot entry is a valid our lexical.
- HV * PAD_COMPNAME_OURSTASH(PADOFFSET po)
Return the name of the current compiling pad name at offset po. Assumes a valid slot entry.
- char * PAD_COMPNAME_PV(PADOFFSET po)
Return the type (stash) of the current compiling pad name at offset po. Must be a valid name. Returns null if not typed.
- HV * PAD_COMPNAME_TYPE(PADOFFSET po)
When PERL_MAD is enabled, this is a small no-op function that gets called at the start of each pad-related function. It can be breakpointed to track all pad operations. The parameter is a string indicating the type of pad operation being performed.
NOTE: this function is experimental and may change or be removed without notice.
- void pad_peg(const char *s)
Restore the old pad saved into the local variable opad by PAD_SAVE_LOCAL().
- void PAD_RESTORE_LOCAL(PAD *opad)
Save the current pad to the local variable opad, then make the current pad equal to npad.
- void PAD_SAVE_LOCAL(PAD *opad, PAD *npad)
Save the current pad then set it to null.
- void PAD_SAVE_SETNULLPAD()
Set the slot at offset po in the current pad to sv.
- SV * PAD_SETSV(PADOFFSET po, SV* sv)
Set the current pad to be pad n in the padlist, saving the previous current pad. NB currently this macro expands to a string too long for some compilers, so it's best to replace it with
- SAVECOMPPAD();
- PAD_SET_CUR_NOSAVE(padlist,n);
- void PAD_SET_CUR(PADLIST padlist, I32 n)
Like PAD_SET_CUR, but without the save.
- void PAD_SET_CUR_NOSAVE(PADLIST padlist, I32 n)
Get the value at offset po in the current pad.
- void PAD_SV(PADOFFSET po)
Lightweight and lvalue version of PAD_SV. Get or set the value at offset po in the current pad. Unlike PAD_SV, does not print diagnostics with -DX. For internal use only.
- SV * PAD_SVl(PADOFFSET po)
Clear the pointed to pad value on scope exit. (i.e. the runtime action of 'my')
- void SAVECLEARSV(SV **svp)
Save PL_comppad and PL_curpad.
- void SAVECOMPPAD()
Save a pad slot (used to restore after an iteration).
XXX DAPM: it would make more sense to make the arg a PADOFFSET.
- void SAVEPADSV(PADOFFSET po)
When Perl is run in debugging mode, with the -d switch, this SV is a boolean which indicates whether subs are being single-stepped. Single-stepping is automatically turned on after every step. This is the C variable which corresponds to Perl's $DB::single variable. See PL_DBsub.
- SV * PL_DBsingle
When Perl is run in debugging mode, with the -d switch, this GV contains the SV which holds the name of the sub being debugged. This is the C variable which corresponds to Perl's $DB::sub variable. See PL_DBsingle.
- GV * PL_DBsub
Trace variable used when Perl is run in debugging mode, with the -d switch. This is the C variable which corresponds to Perl's $DB::trace variable. See PL_DBsingle.
- SV * PL_DBtrace
The C variable which corresponds to Perl's $^W warning variable.
- bool PL_dowarn
The GV which was last used for a filehandle input operation. (<FH>)
- GV* PL_last_in_gv
The glob containing the output field separator - *, in Perl space.
- GV* PL_ofsgv
The input record separator - $/ in Perl space.
- SV* PL_rs
Declare Just SP. This is actually identical to dSP, and declares a local copy of perl's stack pointer, available via the SP macro. See SP. (Available for backward source code compatibility with the old (Perl 5.005) thread model.)
- djSP;
True if this op will be the return value of an lvalue subroutine
A quick flag check to see whether an sv should be passed to sv_force_normal to be "downgraded" before SvIVX or SvPVX can be modified directly.
For example, if your scalar is a reference and you want to modify the SvIVX slot, you can't just do SvROK_off, as that will leak the referent.
This is used internally by various sv-modifying functions, such as sv_setsv, sv_setiv and sv_pvn_force.
One case that this does not handle is a gv without SvFAKE set. After
- if (SvTHINKFIRST(gv)) sv_force_normal(gv);
it will still be a gv.
SvTHINKFIRST sometimes produces false positives. In those cases sv_force_normal does nothing.
- U32 SvTHINKFIRST(SV *sv)
Given a chunk of memory, link it to the head of the list of arenas, and split it into a list of free SVs.
- void sv_add_arena(char *const ptr, const U32 size,
- const U32 flags)
Decrement the refcnt of each remaining SV, possibly triggering a cleanup. This function may have to be called multiple times to free SVs which are in complex self-referential hierarchies.
- I32 sv_clean_all()
Attempt to destroy all objects not yet freed.
- void sv_clean_objs()
Deallocate the memory used by all arenas. Note that all the individual SV heads and bodies within the arenas must already have been freed.
- void sv_free_arenas()
Return an SV with the numeric value of the source SV, doing any necessary
reference or overload conversion. You must use the SvNUM(sv)
macro to
access this function.
NOTE: this function is experimental and may change or be removed without notice.
- SV* sv_2num(SV *const sv)
Copies a stringified representation of the source SV into the destination SV. Automatically performs any necessary mg_get and coercion of numeric values into strings. Guaranteed to preserve UTF8 flag even from overloaded objects. Similar in nature to sv_2pv[_flags] but operates directly on an SV instead of just the string. Mostly uses sv_2pv_flags to do its work, except when that would lose the UTF-8'ness of the PV.
- void sv_copypv(SV *const dsv, SV *const ssv)
Returns a SV describing what the SV passed in is a reference to.
- SV* sv_ref(SV *dst, const SV *const sv,
- const int ob)
Find the name of the undefined variable (if any) that caused the operator to issue a "Use of uninitialized value" warning. If match is true, only return a name if its value matches uninit_sv. So roughly speaking, if a unary operator (such as OP_COS) generates a warning, then following the direct child of the op may yield an OP_PADSV or OP_GV that gives the name of the undefined variable. On the other hand, with OP_ADD there are two branches to follow, so we only print the variable name if we get an exact match.
The name is returned as a mortal SV.
Assumes that PL_op is the op that originally triggered the error, and that PL_comppad/PL_curpad points to the currently executing pad.
NOTE: this function is experimental and may change or be removed without notice.
- SV* find_uninit_var(const OP *const obase,
- const SV *const uninit_sv,
- bool top)
Print appropriate "Use of uninitialized variable" warning.
- void report_uninit(const SV *uninit_sv)
The following functions are currently undocumented. If you use one of them, you may wish to consider creating and submitting documentation for it.
The autodocumentation system was originally added to the Perl core by Benjamin Stuhl. Documentation is by whoever was kind enough to document their functions.
perlinterp - An overview of the Perl interpreter
This document provides an overview of how the Perl interpreter works at the level of C code, along with pointers to the relevant C source code files.
The work of the interpreter has two main stages: compiling the code into the internal representation, or bytecode, and then executing it. Compiled code in perlguts explains exactly how the compilation stage happens.
Here is a short breakdown of perl's operation:
The action begins in perlmain.c (or miniperlmain.c for miniperl). This is very high-level code, enough to fit on a single screen, and it resembles the code found in perlembed; most of the real action takes place in perl.c.
perlmain.c is generated by ExtUtils::Miniperl
from
miniperlmain.c at make time, so you should make perl to follow this
along.
First, perlmain.c allocates some memory and constructs a Perl interpreter, along these lines:
- 1 PERL_SYS_INIT3(&argc,&argv,&env);
- 2
- 3 if (!PL_do_undump) {
- 4 my_perl = perl_alloc();
- 5 if (!my_perl)
- 6 exit(1);
- 7 perl_construct(my_perl);
- 8 PL_perl_destruct_level = 0;
- 9 }
Line 1 is a macro, and its definition is dependent on your operating
system. Line 3 references PL_do_undump
, a global variable - all
global variables in Perl start with PL_
. This tells you whether the
current running program was created with the -u
flag to perl and
then undump, which means it's going to be false in any sane context.
Line 4 calls a function in perl.c to allocate memory for a Perl interpreter. It's quite a simple function, and the guts of it looks like this:
- my_perl = (PerlInterpreter*)PerlMem_malloc(sizeof(PerlInterpreter));
Here you see an example of Perl's system abstraction, which we'll see
later: PerlMem_malloc
is either your system's malloc
, or Perl's
own malloc
as defined in malloc.c if you selected that option at
configure time.
Next, in line 7, we construct the interpreter using perl_construct, also in perl.c; this sets up all the special variables that Perl needs, the stacks, and so on.
Now we pass Perl the command line options, and tell it to go:
- exitstatus = perl_parse(my_perl, xs_init, argc, argv, (char **)NULL);
- if (!exitstatus)
- perl_run(my_perl);
- exitstatus = perl_destruct(my_perl);
- perl_free(my_perl);
perl_parse
is actually a wrapper around S_parse_body
, as defined
in perl.c, which processes the command line options, sets up any
statically linked XS modules, opens the program and calls yyparse
to
parse it.
The aim of this stage is to take the Perl source, and turn it into an op tree. We'll see what one of those looks like later. Strictly speaking, there are three things going on here.
yyparse
, the parser, lives in perly.c, although you're better off
reading the original YACC input in perly.y. (Yes, Virginia, there
is a YACC grammar for Perl!) The job of the parser is to take your
code and "understand" it, splitting it into sentences, deciding which
operands go with which operators and so on.
The parser is nobly assisted by the lexer, which chunks up your input
into tokens, and decides what type of thing each token is: a variable
name, an operator, a bareword, a subroutine, a core function, and so
on. The main point of entry to the lexer is yylex
, and that and its
associated routines can be found in toke.c. Perl isn't much like
other computer languages; it's highly context sensitive at times, it
can be tricky to work out what sort of token something is, or where a
token ends. As such, there's a lot of interplay between the tokeniser
and the parser, which can get pretty frightening if you're not used to
it.
As the parser understands a Perl program, it builds up a tree of operations for the interpreter to perform during execution. The routines which construct and link together the various operations are to be found in op.c, and will be examined later.
Now the parsing stage is complete, and the finished tree represents the
operations that the Perl interpreter needs to perform to execute our
program. Next, Perl does a dry run over the tree looking for
optimisations: constant expressions such as 3 + 4
will be computed
now, and the optimizer will also see if any multiple operations can be
replaced with a single one. For instance, to fetch the variable
$foo
, instead of grabbing the glob *foo
and looking at the scalar
component, the optimizer fiddles the op tree to use a function which
directly looks up the scalar in question. The main optimizer is peep
in op.c, and many ops have their own optimizing functions.
Now we're finally ready to go: we have compiled Perl byte code, and all
that's left to do is run it. The actual execution is done by the
runops_standard
function in run.c; more specifically, it's done
by these three innocent looking lines:
- while ((PL_op = PL_op->op_ppaddr(aTHX))) {
- PERL_ASYNC_CHECK();
- }
You may be more comfortable with the Perl version of that:
- PERL_ASYNC_CHECK() while $Perl::op = &{$Perl::op->{function}};
Well, maybe not. Anyway, each op contains a function pointer, which
stipulates the function which will actually carry out the operation.
This function will return the next op in the sequence - this allows for
things like if
which choose the next op dynamically at run time. The
PERL_ASYNC_CHECK
makes sure that things like signals interrupt
execution if required.
The actual functions called are known as PP code, and they're spread
between four files: pp_hot.c contains the "hot" code, which is most
often used and highly optimized, pp_sys.c contains all the
system-specific functions, pp_ctl.c contains the functions which
implement control structures (if
, while
and the like) and pp.c
contains everything else. These are, if you like, the C code for Perl's
built-in functions and operators.
Note that each pp_
function is expected to return a pointer to the
next op. Calls to perl subs (and eval blocks) are handled within the
same runops loop, and do not consume extra space on the C stack. For
example, pp_entersub
and pp_entertry
just push a CxSUB
or
CxEVAL
block struct onto the context stack which contain the address
of the op following the sub call or eval. They then return the first op
of that sub or eval block, and so execution continues of that sub or
block. Later, a pp_leavesub
or pp_leavetry
op pops the CxSUB
or CxEVAL
, retrieves the return op from it, and returns it.
Perl's exception handling (i.e. die etc.) is built on top of the
low-level setjmp()
/longjmp()
C-library functions. These basically
provide a way to capture the current PC and SP registers and later
restore them; i.e. a longjmp()
continues at the point in code where
a previous setjmp()
was done, with anything further up on the C
stack being lost. This is why code should always save values using
SAVE_FOO
rather than in auto variables.
The perl core wraps setjmp()
etc in the macros JMPENV_PUSH
and
JMPENV_JUMP
. The basic rule of perl exceptions is that exit, and
die (in the absence of eval) perform a JMPENV_JUMP(2)
, while
die within eval does a JMPENV_JUMP(3)
.
At entry points to perl, such as perl_parse()
, perl_run()
and
call_sv(cv, G_EVAL)
each does a JMPENV_PUSH
, then enter a runops
loop or whatever, and handle possible exception returns. For a 2
return, final cleanup is performed, such as popping stacks and calling
CHECK
or END
blocks. Amongst other things, this is how scope
cleanup still occurs during an exit.
If a die can find a CxEVAL
block on the context stack, then the
stack is popped to that level and the return op in that block is
assigned to PL_restartop
; then a JMPENV_JUMP(3)
is performed.
This normally passes control back to the guard. In the case of
perl_run
and call_sv
, a non-null PL_restartop
triggers
re-entry to the runops loop. This is the normal way that die or
croak
is handled within an eval.
Sometimes ops are executed within an inner runops loop, such as tie, sort or overload code. In this case, something like
- sub FETCH { eval { die } }
would cause a longjmp right back to the guard in perl_run
, popping
both runops loops, which is clearly incorrect. One way to avoid this is
for the tie code to do a JMPENV_PUSH
before executing FETCH
in
the inner runops loop, but for efficiency reasons, perl in fact just
sets a flag, using CATCH_SET(TRUE)
. The pp_require
,
pp_entereval
and pp_entertry
ops check this flag, and if true,
they call docatch
, which does a JMPENV_PUSH
and starts a new
runops level to execute the code, rather than doing it on the current
loop.
As a further optimisation, on exit from the eval block in the FETCH
,
execution of the code following the block is still carried on in the
inner loop. When an exception is raised, docatch
compares the
JMPENV
level of the CxEVAL
with PL_top_env
and if they differ,
just re-throws the exception. In this way any inner loops get popped.
Here's an example.
- 1: eval { tie @a, 'A' };
- 2: sub A::TIEARRAY {
- 3: eval { die };
- 4: die;
- 5: }
To run this code, perl_run
is called, which does a JMPENV_PUSH
then enters a runops loop. This loop executes the eval and tie ops on
line 1, with the eval pushing a CxEVAL
onto the context stack.
The pp_tie
does a CATCH_SET(TRUE)
, then starts a second runops
loop to execute the body of TIEARRAY
. When it executes the entertry
op on line 3, CATCH_GET
is true, so pp_entertry
calls docatch
which does a JMPENV_PUSH
and starts a third runops loop, which then
executes the die op. At this point the C call stack looks like this:
- Perl_pp_die
- Perl_runops # third loop
- S_docatch_body
- S_docatch
- Perl_pp_entertry
- Perl_runops # second loop
- S_call_body
- Perl_call_sv
- Perl_pp_tie
- Perl_runops # first loop
- S_run_body
- perl_run
- main
and the context and data stacks, as shown by -Dstv
, look like:
- STACK 0: MAIN
- CX 0: BLOCK =>
- CX 1: EVAL => AV() PV("A"\0)
- retop=leave
- STACK 1: MAGIC
- CX 0: SUB =>
- retop=(null)
- CX 1: EVAL => *
- retop=nextstate
The die pops the first CxEVAL
off the context stack, sets
PL_restartop
from it, does a JMPENV_JUMP(3)
, and control returns
to the top docatch
. This then starts another third-level runops
level, which executes the nextstate, pushmark and die ops on line 4. At
the point that the second pp_die
is called, the C call stack looks
exactly like that above, even though we are no longer within an inner
eval; this is because of the optimization mentioned earlier. However,
the context stack now looks like this, i.e. with the top CxEVAL popped:
- STACK 0: MAIN
- CX 0: BLOCK =>
- CX 1: EVAL => AV() PV("A"\0)
- retop=leave
- STACK 1: MAGIC
- CX 0: SUB =>
- retop=(null)
The die on line 4 pops the context stack back down to the CxEVAL, leaving it as:
- STACK 0: MAIN
- CX 0: BLOCK =>
As usual, PL_restartop
is extracted from the CxEVAL
, and a
JMPENV_JUMP(3)
done, which pops the C stack back to the docatch:
- S_docatch
- Perl_pp_entertry
- Perl_runops # second loop
- S_call_body
- Perl_call_sv
- Perl_pp_tie
- Perl_runops # first loop
- S_run_body
- perl_run
- main
In this case, because the JMPENV
level recorded in the CxEVAL
differs from the current one, docatch
just does a JMPENV_JUMP(3)
and the C stack unwinds to:
- perl_run
- main
Because PL_restartop
is non-null, run_body
starts a new runops
loop and execution continues.
You should by now have had a look at perlguts, which tells you about Perl's internal variable types: SVs, HVs, AVs and the rest. If not, do that now.
These variables are used not only to represent Perl-space variables, but also any constants in the code, as well as some structures completely internal to Perl. The symbol table, for instance, is an ordinary Perl hash. Your code is represented by an SV as it's read into the parser; any program files you call are opened via ordinary Perl filehandles, and so on.
The core Devel::Peek module lets us examine SVs from a
Perl program. Let's see, for instance, how Perl treats the constant
"hello"
.
- % perl -MDevel::Peek -e 'Dump("hello")'
- 1 SV = PV(0xa041450) at 0xa04ecbc
- 2 REFCNT = 1
- 3 FLAGS = (POK,READONLY,pPOK)
- 4 PV = 0xa0484e0 "hello"\0
- 5 CUR = 5
- 6 LEN = 6
Reading Devel::Peek
output takes a bit of practise, so let's go
through it line by line.
Line 1 tells us we're looking at an SV which lives at 0xa04ecbc
in
memory. SVs themselves are very simple structures, but they contain a
pointer to a more complex structure. In this case, it's a PV, a
structure which holds a string value, at location 0xa041450
. Line 2
is the reference count; there are no other references to this data, so
it's 1.
Line 3 are the flags for this SV - it's OK to use it as a PV, it's a
read-only SV (because it's a constant) and the data is a PV internally.
Next we've got the contents of the string, starting at location
0xa0484e0
.
Line 5 gives us the current length of the string - note that this does
not include the null terminator. Line 6 is not the length of the
string, but the length of the currently allocated buffer; as the string
grows, Perl automatically extends the available storage via a routine
called SvGROW
.
You can get at any of these quantities from C very easily; just add
Sv
to the name of the field shown in the snippet, and you've got a
macro which will return the value: SvCUR(sv)
returns the current
length of the string, SvREFCNT(sv)
returns the reference count,
SvPV(sv, len)
returns the string itself with its length, and so on.
More macros to manipulate these properties can be found in perlguts.
Let's take an example of manipulating a PV, from sv_catpvn
, in
sv.c:
- 1 void
- 2 Perl_sv_catpvn(pTHX_ SV *sv, const char *ptr, STRLEN len)
- 3 {
- 4 STRLEN tlen;
- 5 char *junk;
- 6 junk = SvPV_force(sv, tlen);
- 7 SvGROW(sv, tlen + len + 1);
- 8 if (ptr == junk)
- 9 ptr = SvPVX(sv);
- 10 Move(ptr,SvPVX(sv)+tlen,len,char);
- 11 SvCUR(sv) += len;
- 12 *SvEND(sv) = '\0';
- 13 (void)SvPOK_only_UTF8(sv); /* validate pointer */
- 14 SvTAINT(sv);
- 15 }
This is a function which adds a string, ptr
, of length len
onto
the end of the PV stored in sv
. The first thing we do in line 6 is
make sure that the SV has a valid PV, by calling the SvPV_force
macro to force a PV. As a side effect, tlen
gets set to the current
value of the PV, and the PV itself is returned to junk
.
In line 7, we make sure that the SV will have enough room to
accommodate the old string, the new string and the null terminator. If
LEN
isn't big enough, SvGROW
will reallocate space for us.
Now, if junk
is the same as the string we're trying to add, we can
grab the string directly from the SV; SvPVX
is the address of the PV
in the SV.
Line 10 does the actual catenation: the Move
macro moves a chunk of
memory around: we move the string ptr
to the end of the PV - that's
the start of the PV plus its current length. We're moving len
bytes
of type char
. After doing so, we need to tell Perl we've extended
the string, by altering CUR
to reflect the new length. SvEND
is a
macro which gives us the end of the string, so that needs to be a
"\0"
.
Line 13 manipulates the flags; since we've changed the PV, any IV or NV
values will no longer be valid: if we have $a=10; $a.="6";
we don't
want to use the old IV of 10. SvPOK_only_UTF8
is a special
UTF-8-aware version of SvPOK_only
, a macro which turns off the IOK
and NOK flags and turns on POK. The final SvTAINT
is a macro which
launders tainted data if taint mode is turned on.
AVs and HVs are more complicated, but SVs are by far the most common variable type being thrown around. Having seen something of how we manipulate these, let's go on and look at how the op tree is constructed.
First, what is the op tree, anyway? The op tree is the parsed representation of your program, as we saw in our section on parsing, and it's the sequence of operations that Perl goes through to execute your program, as we saw in Running.
An op is a fundamental operation that Perl can perform: all the built-in functions and operators are ops, and there are a series of ops which deal with concepts the interpreter needs internally - entering and leaving a block, ending a statement, fetching a variable, and so on.
The op tree is connected in two ways: you can imagine that there are two "routes" through it, two orders in which you can traverse the tree. First, parse order reflects how the parser understood the code, and secondly, execution order tells perl what order to perform the operations in.
The easiest way to examine the op tree is to stop Perl after it has finished parsing, and get it to dump out the tree. This is exactly what the compiler backends B::Terse, B::Concise and B::Debug do.
Let's have a look at how Perl sees $a = $b + $c
:
- % perl -MO=Terse -e '$a=$b+$c'
- 1 LISTOP (0x8179888) leave
- 2 OP (0x81798b0) enter
- 3 COP (0x8179850) nextstate
- 4 BINOP (0x8179828) sassign
- 5 BINOP (0x8179800) add [1]
- 6 UNOP (0x81796e0) null [15]
- 7 SVOP (0x80fafe0) gvsv GV (0x80fa4cc) *b
- 8 UNOP (0x81797e0) null [15]
- 9 SVOP (0x8179700) gvsv GV (0x80efeb0) *c
- 10 UNOP (0x816b4f0) null [15]
- 11 SVOP (0x816dcf0) gvsv GV (0x80fa460) *a
Let's start in the middle, at line 4. This is a BINOP, a binary
operator, which is at location 0x8179828
. The specific operator in
question is sassign
- scalar assignment - and you can find the code
which implements it in the function pp_sassign
in pp_hot.c. As a
binary operator, it has two children: the add operator, providing the
result of $b+$c
, is uppermost on line 5, and the left hand side is
on line 10.
Line 10 is the null op: this does exactly nothing. What is that doing there? If you see the null op, it's a sign that something has been optimized away after parsing. As we mentioned in Optimization, the optimization stage sometimes converts two operations into one, for example when fetching a scalar variable. When this happens, instead of rewriting the op tree and cleaning up the dangling pointers, it's easier just to replace the redundant operation with the null op. Originally, the tree would have looked like this:
- 10 SVOP (0x816b4f0) rv2sv [15]
- 11 SVOP (0x816dcf0) gv GV (0x80fa460) *a
That is, fetch the a
entry from the main symbol table, and then look
at the scalar component of it: gvsv
(pp_gvsv
in pp_hot.c)
happens to do both these things.
The right hand side, starting at line 5 is similar to what we've just
seen: we have the add
op (pp_add
also in pp_hot.c) add
together two gvsv
s.
Now, what's this about?
- 1 LISTOP (0x8179888) leave
- 2 OP (0x81798b0) enter
- 3 COP (0x8179850) nextstate
enter
and leave
are scoping ops, and their job is to perform any
housekeeping every time you enter and leave a block: lexical variables
are tidied up, unreferenced variables are destroyed, and so on. Every
program will have those first three lines: leave
is a list, and its
children are all the statements in the block. Statements are delimited
by nextstate
, so a block is a collection of nextstate
ops, with
the ops to be performed for each statement being the children of
nextstate
. enter
is a single op which functions as a marker.
That's how Perl parsed the program, from top to bottom:
- Program
- |
- Statement
- |
- =
- / \
- / \
- $a +
- / \
- $b $c
However, it's impossible to perform the operations in this order:
you have to find the values of $b
and $c
before you add them
together, for instance. So, the other thread that runs through the op
tree is the execution order: each op has a field op_next
which
points to the next op to be run, so following these pointers tells us
how perl executes the code. We can traverse the tree in this order
using the exec option to B::Terse
:
- % perl -MO=Terse,exec -e '$a=$b+$c'
- 1 OP (0x8179928) enter
- 2 COP (0x81798c8) nextstate
- 3 SVOP (0x81796c8) gvsv GV (0x80fa4d4) *b
- 4 SVOP (0x8179798) gvsv GV (0x80efeb0) *c
- 5 BINOP (0x8179878) add [1]
- 6 SVOP (0x816dd38) gvsv GV (0x80fa468) *a
- 7 BINOP (0x81798a0) sassign
- 8 LISTOP (0x8179900) leave
This probably makes more sense for a human: enter a block, start a
statement. Get the values of $b
and $c
, and add them together.
Find $a
, and assign one to the other. Then leave.
The way Perl builds up these op trees in the parsing process can be
unravelled by examining perly.y, the YACC grammar. Let's take the
piece we need to construct the tree for $a = $b + $c
- 1 term : term ASSIGNOP term
- 2 { $$ = newASSIGNOP(OPf_STACKED, $1, $2, $3); }
- 3 | term ADDOP term
- 4 { $$ = newBINOP($2, 0, scalar($1), scalar($3)); }
If you're not used to reading BNF grammars, this is how it works:
You're fed certain things by the tokeniser, which generally end up in
upper case. Here, ADDOP
is provided when the tokeniser sees +
in
your code. ASSIGNOP
is provided when =
is used for assigning.
These are "terminal symbols", because you can't get any simpler than
them.
The grammar, lines one and three of the snippet above, tells you how to
build up more complex forms. These complex forms, "non-terminal
symbols" are generally placed in lower case. term
here is a
non-terminal symbol, representing a single expression.
The grammar gives you the following rule: you can make the thing on the
left of the colon if you see all the things on the right in sequence.
This is called a "reduction", and the aim of parsing is to completely
reduce the input. There are several different ways you can perform a
reduction, separated by vertical bars: so, term
followed by =
followed by term
makes a term
, and term
followed by +
followed by term
can also make a term
.
So, if you see two terms with an =
or +
between them, you can
turn them into a single expression. When you do this, you execute the
code in the block on the next line: if you see =
, you'll do the code
in line 2. If you see +
, you'll do the code in line 4. It's this
code which contributes to the op tree.
- | term ADDOP term
- { $$ = newBINOP($2, 0, scalar($1), scalar($3)); }
What this does is creates a new binary op, and feeds it a number of
variables. The variables refer to the tokens: $1
is the first token
in the input, $2
the second, and so on - think regular expression
backreferences. $$
is the op returned from this reduction. So, we
call newBINOP
to create a new binary operator. The first parameter
to newBINOP
, a function in op.c, is the op type. It's an addition
operator, so we want the type to be ADDOP
. We could specify this
directly, but it's right there as the second token in the input, so we
use $2
. The second parameter is the op's flags: 0 means "nothing
special". Then the things to add: the left and right hand side of our
expression, in scalar context.
When perl executes something like addop
, how does it pass on its
results to the next op? The answer is, through the use of stacks. Perl
has a number of stacks to store things it's currently working on, and
we'll look at the three most important ones here.
Arguments are passed to PP code and returned from PP code using the
argument stack, ST
. The typical way to handle arguments is to pop
them off the stack, deal with them how you wish, and then push the
result back onto the stack. This is how, for instance, the cosine
operator works:
- NV value;
- value = POPn;
- value = Perl_cos(value);
- XPUSHn(value);
We'll see a more tricky example of this when we consider Perl's macros
below. POPn
gives you the NV (floating point value) of the top SV on
the stack: the $x
in cos($x). Then we compute the cosine, and
push the result back as an NV. The X
in XPUSHn
means that the
stack should be extended if necessary - it can't be necessary here,
because we know there's room for one more item on the stack, since
we've just removed one! The XPUSH*
macros at least guarantee safety.
Alternatively, you can fiddle with the stack directly: SP
gives you
the first element in your portion of the stack, and TOP*
gives you
the top SV/IV/NV/etc. on the stack. So, for instance, to do unary
negation of an integer:
- SETi(-TOPi);
Just set the integer value of the top stack entry to its negation.
Argument stack manipulation in the core is exactly the same as it is in XSUBs - see perlxstut, perlxs and perlguts for a longer description of the macros used in stack manipulation.
I say "your portion of the stack" above because PP code doesn't
necessarily get the whole stack to itself: if your function calls
another function, you'll only want to expose the arguments aimed for
the called function, and not (necessarily) let it get at your own data.
The way we do this is to have a "virtual" bottom-of-stack, exposed to
each function. The mark stack keeps bookmarks to locations in the
argument stack usable by each function. For instance, when dealing with
a tied variable, (internally, something with "P" magic) Perl has to
call methods for accesses to the tied variables. However, we need to
separate the arguments exposed to the method from the arguments exposed to
the original function - the store or fetch or whatever it may be.
Here's roughly how the tied push is implemented; see av_push
in
av.c:
- 1 PUSHMARK(SP);
- 2 EXTEND(SP,2);
- 3 PUSHs(SvTIED_obj((SV*)av, mg));
- 4 PUSHs(val);
- 5 PUTBACK;
- 6 ENTER;
- 7 call_method("PUSH", G_SCALAR|G_DISCARD);
- 8 LEAVE;
Let's examine the whole implementation, for practice:
- 1 PUSHMARK(SP);
Push the current state of the stack pointer onto the mark stack. This is so that when we've finished adding items to the argument stack, Perl knows how many things we've added recently.
- 2 EXTEND(SP,2);
- 3 PUSHs(SvTIED_obj((SV*)av, mg));
- 4 PUSHs(val);
We're going to add two more items onto the argument stack: when you
have a tied array, the PUSH
subroutine receives the object and the
value to be pushed, and that's exactly what we have here - the tied
object, retrieved with SvTIED_obj
, and the value, the SV val
.
- 5 PUTBACK;
Next we tell Perl to update the global stack pointer from our internal
variable: dSP
only gave us a local copy, not a reference to the
global.
- 6 ENTER;
- 7 call_method("PUSH", G_SCALAR|G_DISCARD);
- 8 LEAVE;
ENTER
and LEAVE
localise a block of code - they make sure that
all variables are tidied up, everything that has been localised gets
its previous value returned, and so on. Think of them as the { and
} of a Perl block.
To actually do the magic method call, we have to call a subroutine in
Perl space: call_method
takes care of that, and it's described in
perlcall. We call the PUSH
method in scalar context, and we're
going to discard its return value. The call_method() function removes
the top element of the mark stack, so there is nothing for the caller
to clean up.
C doesn't have a concept of local scope, so perl provides one. We've
seen that ENTER
and LEAVE
are used as scoping braces; the save
stack implements the C equivalent of, for example:
- {
- local $foo = 42;
- ...
- }
See Localizing changes in perlguts for how to use the save stack.
One thing you'll notice about the Perl source is that it's full of macros. Some have called the pervasive use of macros the hardest thing to understand, others find it adds to clarity. Let's take an example, the code which implements the addition operator:
- 1 PP(pp_add)
- 2 {
- 3 dSP; dATARGET; tryAMAGICbin(add,opASSIGN);
- 4 {
- 5 dPOPTOPnnrl_ul;
- 6 SETn( left + right );
- 7 RETURN;
- 8 }
- 9 }
Every line here (apart from the braces, of course) contains a macro. The first line sets up the function declaration as Perl expects for PP code; line 3 sets up variable declarations for the argument stack and the target, the return value of the operation. Finally, it tries to see if the addition operation is overloaded; if so, the appropriate subroutine is called.
Line 5 is another variable declaration - all variable declarations
start with d
- which pops from the top of the argument stack two NVs
(hence nn
) and puts them into the variables right
and left
,
hence the rl
. These are the two operands to the addition operator.
Next, we call SETn
to set the NV of the return value to the result
of adding the two values. This done, we return - the RETURN
macro
makes sure that our return value is properly handled, and we pass the
next operator to run back to the main run loop.
Most of these macros are explained in perlapi, and some of the more
important ones are explained in perlxs as well. Pay special
attention to Background and PERL_IMPLICIT_CONTEXT in perlguts for
information on the [pad]THX_? macros.
For more information on the Perl internals, please see the documents listed at Internals and C Language Interface in perl.
perlintro -- a brief introduction and overview of Perl
This document is intended to give you a quick overview of the Perl programming language, along with pointers to further documentation. It is intended as a "bootstrap" guide for those who are new to the language, and provides just enough information for you to be able to read other peoples' Perl and understand roughly what it's doing, or write your own simple scripts.
This introductory document does not aim to be complete. It does not even aim to be entirely accurate. In some cases perfection has been sacrificed in the goal of getting the general idea across. You are strongly advised to follow this introduction with more information from the full Perl manual, the table of contents to which can be found in perltoc.
Throughout this document you'll see references to other parts of the
Perl documentation. You can read that documentation using the perldoc
command or whatever method you're using to read this document.
Throughout Perl's documentation, you'll find numerous examples intended to help explain the discussed features. Please keep in mind that many of them are code fragments rather than complete programs.
These examples often reflect the style and preference of the author of
that piece of the documentation, and may be briefer than a corresponding
line of code in a real program. Except where otherwise noted, you
should assume that use strict
and use warnings
statements
appear earlier in the "program", and that any variables used have
already been declared, even if those declarations have been omitted
to make the example easier to read.
Do note that the examples have been written by many different authors over a period of several decades. Styles and techniques will therefore differ, although some effort has been made to not vary styles too widely in the same sections. Do not consider one style to be better than others - "There's More Than One Way To Do It" is one of Perl's mottos. After all, in your journey as a programmer, you are likely to encounter different styles.
Perl is a general-purpose programming language originally developed for text manipulation and now used for a wide range of tasks including system administration, web development, network programming, GUI development, and more.
The language is intended to be practical (easy to use, efficient, complete) rather than beautiful (tiny, elegant, minimal). Its major features are that it's easy to use, supports both procedural and object-oriented (OO) programming, has powerful built-in support for text processing, and has one of the world's most impressive collections of third-party modules.
Different definitions of Perl are given in perl, perlfaq1 and no doubt other places. From this we can determine that Perl is different things to different people, but that lots of people think it's at least worth writing about.
To run a Perl program from the Unix command line:
- perl progname.pl
Alternatively, put this as the first line of your script:
- #!/usr/bin/env perl
... and run the script as /path/to/script.pl. Of course, it'll need
to be executable first, so chmod 755 script.pl (under Unix).
(This start line assumes you have the env program. You can also put
directly the path to your perl executable, like in #!/usr/bin/perl
).
For more information, including instructions for other platforms such as Windows and Mac OS, read perlrun.
Perl by default is very forgiving. In order to make it more robust it is recommended to start every program with the following lines:
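The recommended opening lines are:

```perl
#!/usr/bin/perl
use strict;
use warnings;
```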
These two additional lines ask perl to catch various common
problems in your code. They check different things, so you need both. A
potential problem caught by use strict;
will cause your code to stop
immediately when it is encountered, while use warnings;
will merely
give a warning (like the command-line switch -w) and let your code run.
To read more about them check their respective manual pages at strict
and warnings.
A Perl script or program consists of one or more statements. These
statements are simply written in the script in a straightforward
fashion. There is no need to have a main()
function or anything of
that kind.
Perl statements end in a semi-colon:
- print "Hello, world";
Comments start with a hash symbol and run to the end of the line:
- # This is a comment
Whitespace is irrelevant:
- "Hello, world"
- ;
... except inside quoted strings:
- # this would print with a linebreak in the middle
- print "Hello
- world";
Double quotes or single quotes may be used around literal strings:
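For example:

```perl
print "Hello, world";   # double quotes
print 'Hello, world';   # single quotes
```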
However, only double quotes "interpolate" variables and special
characters such as newlines (\n
):
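A small sketch of the difference (the variable name is illustrative):

```perl
my $name = "Alice";
print "Hello, $name\n";   # interpolates: Hello, Alice
print 'Hello, $name\n';   # prints the literal text: Hello, $name\n
```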
Numbers don't need quotes around them:
- print 42;
You can use parentheses for functions' arguments or omit them according to your personal taste. They are only required occasionally to clarify issues of precedence.
More detailed information about Perl syntax can be found in perlsyn.
Perl has three main variable types: scalars, arrays, and hashes.
A scalar represents a single value:
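For example:

```perl
my $animal = "camel";
my $answer = 42;
```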
Scalar values can be strings, integers or floating point numbers, and Perl
will automatically convert between them as required. There is no need
to pre-declare your variable types, but you have to declare them using
the my keyword the first time you use them. (This is one of the
requirements of use strict;
.)
Scalar values can be used in various ways:
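For example:

```perl
my $animal = "camel";
my $answer = 42;
print $animal;
print "The animal is $animal\n";
print "The square of $answer is ", $answer * $answer, "\n";
```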
There are a number of "magic" scalars with names that look like
punctuation or line noise. These special variables are used for all
kinds of purposes, and are documented in perlvar. The only one you
need to know about for now is $_
which is the "default variable".
It's used as the default argument to a number of functions in Perl, and
it's set implicitly by certain looping constructs.
- print; # prints contents of $_ by default
An array represents a list of values:
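For example:

```perl
my @animals = ("camel", "llama", "owl");
my @numbers = (23, 42, 69);
my @mixed   = ("camel", 42, 1.23);
```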
Arrays are zero-indexed. Here's how you get at elements in an array:
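For example:

```perl
my @animals = ("camel", "llama", "owl");
print $animals[0];    # prints "camel"
print $animals[1];    # prints "llama"
```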
The special variable $#array
tells you the index of the last element
of an array:
- print $mixed[$#mixed]; # last element, prints 1.23
You might be tempted to use $#array + 1
to tell you how many items there
are in an array. Don't bother. As it happens, using @array
where Perl
expects to find a scalar value ("in scalar context") will give you the number
of elements in the array:
- if (@animals < 5) { ... }
The elements we're getting from the array start with a $
because
we're getting just a single value out of the array; you ask for a scalar,
you get a scalar.
To get multiple values from an array:
- @animals[0,1]; # gives ("camel", "llama");
- @animals[0..2]; # gives ("camel", "llama", "owl");
- @animals[1..$#animals]; # gives all except the first element
This is called an "array slice".
You can do various useful things to lists:
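For example, sorting and reversing:

```perl
my @animals   = ("llama", "owl", "camel");
my @numbers   = (23, 42, 69);
my @sorted    = sort @animals;     # ("camel", "llama", "owl")
my @backwards = reverse @numbers;  # (69, 42, 23)
```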
There are a couple of special arrays too, such as @ARGV
(the command
line arguments to your script) and @_
(the arguments passed to a
subroutine). These are documented in perlvar.
A hash represents a set of key/value pairs:
- my %fruit_color = ("apple", "red", "banana", "yellow");
You can use whitespace and the =>
operator to lay them out more
nicely:
- my %fruit_color = (
- apple => "red",
- banana => "yellow",
- );
To get at hash elements:
- $fruit_color{"apple"}; # gives "red"
You can get at lists of keys and values with keys() and
values().
Hashes have no particular internal order, though you can sort the keys and loop through them.
Just like special scalars and arrays, there are also special hashes.
The most well known of these is %ENV
which contains environment
variables. Read all about it (and other special variables) in
perlvar.
Scalars, arrays and hashes are documented more fully in perldata.
More complex data types can be constructed using references, which allow you to build lists and hashes within lists and hashes.
A reference is a scalar value and can refer to any other Perl data type. So by storing a reference as the value of an array or hash element, you can easily create lists and hashes within lists and hashes. The following example shows a 2 level hash of hash structure using anonymous hash references.
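A minimal sketch of such a structure (the keys and descriptions are illustrative):

```perl
my $variables = {
    scalar => { description => "single item",           sigil => '$' },
    array  => { description => "ordered list of items", sigil => '@' },
    hash   => { description => "key/value pairs",       sigil => '%' },
};

print "Scalars begin with a $variables->{'scalar'}->{'sigil'}\n";
```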
Exhaustive information on the topic of references can be found in perlreftut, perllol, perlref and perldsc.
Throughout the previous section all the examples have used the syntax:
- my $var = "value";
The my is actually not required; you could just use:
- $var = "value";
However, the above usage will create global variables throughout your
program, which is bad programming practice. my creates lexically
scoped variables instead. The variables are scoped to the block
(i.e. a bunch of statements surrounded by curly-braces) in which they
are defined.
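A sketch of lexical scoping (run without strict, so the final line merely prints nothing):

```perl
my $x = "foo";
my $some_condition = 1;
if ($some_condition) {
    my $y = "bar";
    print $x;    # prints "foo"
    print $y;    # prints "bar"
}
print $x;        # prints "foo"
print $y;        # prints nothing; $y has fallen out of scope
```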
Using my in combination with a use strict;
at the top of
your Perl scripts means that the interpreter will pick up certain common
programming errors. For instance, in the example above, the final
print $y
would cause a compile-time error and prevent you from
running the program. Using strict
is highly recommended.
Perl has most of the usual conditional and looping constructs. As of Perl
5.10, it even has a case/switch statement (spelled given
/when
). See
Switch Statements in perlsyn for more details.
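The basic if/elsif/else form, shown with an illustrative condition:

```perl
my $temperature = 15;
if ($temperature > 25) {
    print "Hot\n";
} elsif ($temperature > 10) {
    print "Mild\n";      # this branch runs
} else {
    print "Cold\n";
}
```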
The conditions can be any Perl expression. See the list of operators in the next section for information on comparison and boolean logic operators, which are commonly used in conditional statements.
There's also a negated version of it:
- unless ( condition ) {
- ...
- }
This is provided as a more readable version of if (!condition).
Note that the braces are required in Perl, even if you've only got one line in the block. However, there is a clever way of making your one-line conditional blocks more English like:
- while ( condition ) {
- ...
- }
There's also a negated version, for the same reason we have unless
:
- until ( condition ) {
- ...
- }
You can also use while
in a post-condition:
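For example (a bounded variant; `print "LA LA LA\n" while 1;` would loop forever):

```perl
my $i = 0;
$i++ while $i < 5;   # postfix while: repeat the statement until the condition fails
print "$i\n";        # prints 5
```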
Exactly like C:
- for ($i = 0; $i <= $max; $i++) {
- ...
- }
The C style for loop is rarely needed in Perl since Perl provides
the more friendly list scanning foreach
loop.
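For example:

```perl
my @animals = ("camel", "llama", "owl");
foreach my $animal (@animals) {
    print "This animal is $animal\n";
}
```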
The foreach
keyword is actually a synonym for the for
keyword. See Foreach Loops in perlsyn.
For more detail on looping constructs (and some that weren't mentioned in this overview) see perlsyn.
Perl comes with a wide selection of builtin functions. Some of the ones
we've already seen include print, sort and reverse. A list of
them is given at the start of perlfunc and you can easily read
about any given function by using perldoc -f functionname.
Perl operators are documented in full in perlop, but here are a few of the most common ones:
- + addition
- - subtraction
- * multiplication
- / division
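Comparison comes in two flavours, numeric and string, plus the boolean logic operators:

```perl
# Numeric comparison:  ==  !=  <  >  <=  >=
# String  comparison:  eq  ne  lt gt le ge
# Boolean logic:       &&  ||  !   (also spelled and, or, not)

print "99 is numerically less than 100\n" if 99 < 100;
print "but \"100\" sorts before \"99\" as a string\n" if "100" lt "99";
```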
(Why do we have separate numeric and string comparisons? Because we don't have special variable types, and Perl needs to know whether to sort numerically (where 99 is less than 100) or alphabetically (where 100 comes before 99).)
(and
, or
and not
aren't just in the above table as descriptions
of the operators. They're also supported as operators in their own
right. They're more readable than the C-style operators, but have
different precedence to && and friends. Check perlop for more
detail.)
- = assignment
- . string concatenation
- x string multiplication
- .. range operator (creates a list of numbers)
Many operators can be combined with a =
as follows:
- $a += 1; # same as $a = $a + 1
- $a -= 1; # same as $a = $a - 1
- $a .= "\n"; # same as $a = $a . "\n";
You can open a file for input or output using the open() function.
It's documented in extravagant detail in perlfunc and perlopentut,
but in short:
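A sketch of the three common modes, using illustrative filenames:

```perl
# Write a file first so the read below has something to open.
open(my $out, ">", "output.txt") or die "Can't write output.txt: $!";
print $out "Hello, file!\n";
close $out;

open(my $in,  "<",  "output.txt") or die "Can't read output.txt: $!";   # read
open(my $log, ">>", "my.log")     or die "Can't append my.log: $!";     # append
```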
You can read from an open filehandle using the <>
operator. In
scalar context it reads a single line from the filehandle, and in list
context it reads the whole file in, assigning each line to an element of
the list:
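For example (assuming an `input.txt` exists):

```perl
open(my $in, "<", "input.txt") or die "Can't open input.txt: $!";
my $line  = <$in>;    # scalar context: reads a single line
my @lines = <$in>;    # list context: reads all remaining lines
```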
Reading in the whole file at one time is called slurping. It can be useful but it may be a memory hog. Most text file processing can be done a line at a time with Perl's looping constructs.
The <>
operator is most often seen in a while
loop:
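For example (again assuming an `input.txt`):

```perl
open(my $in, "<", "input.txt") or die "Can't open input.txt: $!";
while (<$in>) {                 # each line lands in $_
    print "Just read in this line: $_";
}
```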
We've already seen how to print to standard output using print().
However, print() can also take an optional first argument specifying
which filehandle to print to:
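For example (`my.log` and the message are illustrative):

```perl
my $logmessage = "something happened\n";
open(my $log, ">>", "my.log") or die "Can't open my.log: $!";
print STDERR "This is your final warning.\n";
print $log $logmessage;   # note: no comma between the filehandle and the list
```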
When you're done with your filehandles, you should close() them
(though to be honest, Perl will clean up after you if you forget):
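For example:

```perl
open(my $log, ">>", "my.log") or die "Can't open my.log: $!";
print $log "done\n";
close $log or die "my.log: $!";
```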
Perl's regular expression support is both broad and deep, and is the subject of lengthy documentation in perlrequick, perlretut, and elsewhere. However, in short:
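Simple matching looks like this:

```perl
$_ = "football";
print "matched \$_\n" if /foo/;         # // matches against $_ by default
my $a = "foosball";
print "matched \$a\n" if $a =~ /foo/;   # =~ binds the match to $a instead
```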
The //
matching operator is documented in perlop. It operates on
$_
by default, or can be bound to another variable using the =~
binding operator (also documented in perlop).
- s/foo/bar/; # replaces foo with bar in $_
- $a =~ s/foo/bar/; # replaces foo with bar in $a
- $a =~ s/foo/bar/g; # replaces ALL INSTANCES of foo with bar
- # in $a
You don't just have to match on fixed strings. In fact, you can match on just about anything you could dream of by using more complex regular expressions. These are documented at great length in perlre, but for the meantime, here's a quick cheat sheet:
- . a single character
- \s a whitespace character (space, tab, newline,
- ...)
- \S non-whitespace character
- \d a digit (0-9)
- \D a non-digit
- \w a word character (a-z, A-Z, 0-9, _)
- \W a non-word character
- [aeiou] matches a single character in the given set
- [^aeiou] matches a single character outside the given
- set
- (foo|bar|baz) matches any of the alternatives specified
- ^ start of string
- $ end of string
Quantifiers can be used to specify how many of the previous thing you want to match on, where "thing" means either a literal character, one of the metacharacters listed above, or a group of characters or metacharacters in parentheses.
Some brief examples:
- /^\d+/ string starts with one or more digits
- /^$/ nothing in the string (start and end are
- adjacent)
- /(\d\s){3}/ three digits, each followed by a whitespace
- character (eg "3 4 5 ")
- /(a.)+/ matches a string in which every odd-numbered
- letter is a (eg "abacadaf")
- # This loop reads from STDIN, and prints non-blank lines:
- while (<>) {
- next if /^$/;
- print;
- }
As well as grouping, parentheses serve a second purpose. They can be
used to capture the results of parts of the regexp match for later use.
The results end up in $1
, $2
and so on.
Perl regexps also support backreferences, lookaheads, and all kinds of other complex details. Read all about them in perlrequick, perlretut, and perlre.
Writing subroutines is easy:
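For example (the name `logger` and file `my.log` are illustrative):

```perl
sub logger {
    my $logmessage = shift;
    open my $logfile, ">>", "my.log" or die "Could not open my.log: $!";
    print $logfile $logmessage;
}
```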
Now we can use the subroutine just as any other built-in function:
- logger("We have a logger subroutine!");
What's that shift? Well, the arguments to a subroutine are available
to us as a special array called @_
(see perlvar for more on that).
The default argument to the shift function just happens to be @_
.
So my $logmessage = shift;
shifts the first item off the list of
arguments and assigns it to $logmessage
.
We can manipulate @_
in other ways too:
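For example, several arguments can be unpacked at once (the subroutine name is illustrative):

```perl
sub log_with_priority {
    my ($logmessage, $priority) = @_;   # assigns both arguments in one go
    warn "$priority: $logmessage\n";
}
```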
Subroutines can also return values:
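For example:

```perl
sub square {
    my $num = shift;
    my $result = $num * $num;
    return $result;
}
```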
Then use it like:
- $sq = square(8);
For more information on writing subroutines, see perlsub.
OO Perl is relatively simple and is implemented using references which know what sort of object they are based on Perl's concept of packages. However, OO Perl is largely beyond the scope of this document. Read perlootut and perlobj.
As a beginning Perl programmer, your most common use of OO Perl will be in using third-party modules, which are documented below.
Perl modules provide a range of features to help you avoid reinventing the wheel, and can be downloaded from CPAN ( http://www.cpan.org/ ). A number of popular modules are included with the Perl distribution itself.
Categories of modules range from text manipulation to network protocols to database integration to graphics. A categorized list of modules is also available from CPAN.
To learn how to install modules you download from CPAN, read perlmodinstall.
To learn how to use a particular module, use perldoc Module::Name.
Typically you will want to use Module::Name, which will then give
you access to exported functions or an OO interface to the module.
perlfaq contains questions and answers related to many common tasks, and often provides suggestions for good CPAN modules to use.
perlmod describes Perl modules in general. perlmodlib lists the modules which came with your Perl installation.
If you feel the urge to write Perl modules, perlnewmod will give you good advice.
Kirrily "Skud" Robert <skud@cpan.org>
perliol - C API for Perl's implementation of IO in Layers.
- /* Defining a layer ... */
- #include <perliol.h>
This document describes the behavior and implementation of the PerlIO
abstraction described in perlapio when USE_PERLIO
is defined (and
USE_SFIO
is not).
The PerlIO abstraction was introduced in perl5.003_02 but languished as just an abstraction until perl5.7.0. However during that time a number of perl extensions switched to using it, so the API is mostly fixed to maintain (source) compatibility.
The aim of the implementation is to provide the PerlIO API in a flexible and platform neutral manner. It is also a trial of an "Object Oriented C, with vtables" approach which may be applied to Perl 6.
PerlIO is a stack of layers.
The low levels of the stack work with the low-level operating system calls (file descriptors in C) getting bytes in and out, the higher layers of the stack buffer, filter, and otherwise manipulate the I/O, and return characters (or bytes) to Perl. Terms above and below are used to refer to the relative positioning of the stack layers.
A layer contains a "vtable", the table of I/O operations (at C level a table of function pointers), and status flags. The functions in the vtable implement operations like "open", "read", and "write".
When I/O, for example "read", is requested, the request goes from Perl first down the stack using "read" functions of each layer, then at the bottom the input is requested from the operating system services, then the result is returned up the stack, finally being interpreted as Perl data.
The requests do not necessarily go always all the way down to the operating system: that's where PerlIO buffering comes into play.
When you do an open() and specify extra PerlIO layers to be deployed, the layers you specify are "pushed" on top of the already existing default stack. One way to see it is that "operating system is on the left" and "Perl is on the right".
What exact layers are in this default stack depends on a lot of things: your operating system, Perl version, Perl compile time configuration, and Perl runtime configuration. See PerlIO, PERLIO in perlrun, and open for more information.
binmode() operates similarly to open(): by default the specified layers are pushed on top of the existing stack.
However, note that even as the specified layers are "pushed on top" for open() and binmode(), this doesn't mean that the effects are limited to the "top": PerlIO layers can be very 'active' and inspect and affect layers also deeper in the stack. As an example there is a layer called "raw" which repeatedly "pops" layers until it reaches the first layer that has declared itself capable of handling binary data. The "pushed" layers are processed in left-to-right order.
sysopen() operates (unsurprisingly) at a lower level in the stack than open(). For example in Unix or Unix-like systems sysopen() operates directly at the level of file descriptors: in the terms of PerlIO layers, it uses only the "unix" layer, which is a rather thin wrapper on top of the Unix file descriptors.
Initial discussion of the ability to modify IO streams behaviour used the term "discipline" for the entities which were added. This came (I believe) from the use of the term in "sfio", which in turn borrowed it from "line disciplines" on Unix terminals. However, this document (and the C code) uses the term "layer".
This is, I hope, a natural term given the implementation, and should avoid connotations that are inherent in earlier uses of "discipline" for things which are rather different.
The basic data structure is a PerlIOl:
- typedef struct _PerlIO PerlIOl;
- typedef struct _PerlIO_funcs PerlIO_funcs;
- typedef PerlIOl *PerlIO;
- struct _PerlIO
- {
- PerlIOl * next; /* Lower layer */
- PerlIO_funcs * tab; /* Functions for this layer */
- IV flags; /* Various flags for state */
- };
A PerlIOl *
is a pointer to the struct, and the application
level PerlIO *
is a pointer to a PerlIOl *
- i.e. a pointer
to a pointer to the struct. This allows the application level PerlIO *
to remain constant while the actual PerlIOl *
underneath
changes. (Compare perl's SV *
which remains constant while its
sv_any
field changes as the scalar's type changes.) An IO stream is
then in general represented as a pointer to this linked-list of
"layers".
It should be noted that because of the double indirection in a PerlIO *
,
a &(perlio->next) "is" a PerlIO *
, and so to some degree
at least one layer can use the "standard" API on the next layer down.
A "layer" is composed of two parts:
The functions and attributes of the "layer class".
The per-instance data for a particular handle.
The functions and attributes are accessed via the "tab" (for table)
member of PerlIOl
. The functions (methods of the layer "class") are
fixed, and are defined by the PerlIO_funcs
type. They are broadly the
same as the public PerlIO_xxxxx
functions:
- struct _PerlIO_funcs
- {
- Size_t fsize;
- char * name;
- Size_t size;
- IV kind;
- IV (*Pushed)(pTHX_ PerlIO *f,const char *mode,SV *arg, PerlIO_funcs *tab);
- IV (*Popped)(pTHX_ PerlIO *f);
- PerlIO * (*Open)(pTHX_ PerlIO_funcs *tab,
- PerlIO_list_t *layers, IV n,
- const char *mode,
- int fd, int imode, int perm,
- PerlIO *old,
- int narg, SV **args);
- IV (*Binmode)(pTHX_ PerlIO *f);
- SV * (*Getarg)(pTHX_ PerlIO *f, CLONE_PARAMS *param, int flags);
- IV (*Fileno)(pTHX_ PerlIO *f);
- PerlIO * (*Dup)(pTHX_ PerlIO *f, PerlIO *o, CLONE_PARAMS *param, int flags);
- /* Unix-like functions - cf sfio line disciplines */
- SSize_t (*Read)(pTHX_ PerlIO *f, void *vbuf, Size_t count);
- SSize_t (*Unread)(pTHX_ PerlIO *f, const void *vbuf, Size_t count);
- SSize_t (*Write)(pTHX_ PerlIO *f, const void *vbuf, Size_t count);
- IV (*Seek)(pTHX_ PerlIO *f, Off_t offset, int whence);
- Off_t (*Tell)(pTHX_ PerlIO *f);
- IV (*Close)(pTHX_ PerlIO *f);
- /* Stdio-like buffered IO functions */
- IV (*Flush)(pTHX_ PerlIO *f);
- IV (*Fill)(pTHX_ PerlIO *f);
- IV (*Eof)(pTHX_ PerlIO *f);
- IV (*Error)(pTHX_ PerlIO *f);
- void (*Clearerr)(pTHX_ PerlIO *f);
- void (*Setlinebuf)(pTHX_ PerlIO *f);
- /* Perl's snooping functions */
- STDCHAR * (*Get_base)(pTHX_ PerlIO *f);
- Size_t (*Get_bufsiz)(pTHX_ PerlIO *f);
- STDCHAR * (*Get_ptr)(pTHX_ PerlIO *f);
- SSize_t (*Get_cnt)(pTHX_ PerlIO *f);
- void (*Set_ptrcnt)(pTHX_ PerlIO *f,STDCHAR *ptr,SSize_t cnt);
- };
The first few members of the struct give the size of the function table
(used for a compatibility check), the name of the layer, the size to
malloc for the per-instance data, and some flags which are attributes of
the class as a whole (such as whether it is a buffering layer). Then
follow the functions, which fall into four basic groups:
Opening and setup functions
Basic IO operations
Stdio class buffering options.
Functions to support Perl's traditional "fast" access to the buffer.
A layer does not have to implement all the functions, but the whole table has to be present. Unimplemented slots can be NULL (which will result in an error when called) or can be filled in with stubs to "inherit" behaviour from a "base class". This "inheritance" is fixed for all instances of the layer, but as the layer chooses which stubs to populate the table, limited "multiple inheritance" is possible.
The per-instance data are held in memory beyond the basic PerlIOl struct, by making a PerlIOl the first member of the layer's struct thus:
- typedef struct
- {
- struct _PerlIO base; /* Base "class" info */
- STDCHAR * buf; /* Start of buffer */
- STDCHAR * end; /* End of valid part of buffer */
- STDCHAR * ptr; /* Current position in buffer */
- Off_t posn; /* Offset of buf into the file */
- Size_t bufsiz; /* Real size of buffer */
- IV oneword; /* Emergency buffer */
- } PerlIOBuf;
In this way (as for perl's scalars) a pointer to a PerlIOBuf can be treated as a pointer to a PerlIOl.
- table perlio unix
- | |
- +-----------+ +----------+ +--------+
- PerlIO ->| |--->| next |--->| NULL |
- +-----------+ +----------+ +--------+
- | | | buffer | | fd |
- +-----------+ | | +--------+
- | | +----------+
The above attempts to show how the layer scheme works in a simple case.
The application's PerlIO *
points to an entry in the table(s)
representing open (allocated) handles. For example the first three slots
in the table correspond to stdin
,stdout
and stderr
. The table
in turn points to the current "top" layer for the handle - in this case
an instance of the generic buffering layer "perlio". That layer in turn
points to the next layer down - in this case the low-level "unix" layer.
The above is roughly equivalent to a "stdio" buffered stream, but with much more flexibility:
If Unix level read/write/lseek
is not appropriate for (say)
sockets then the "unix" layer can be replaced (at open time or even
dynamically) with a "socket" layer.
Different handles can have different buffering schemes. The "top"
layer could be the "mmap" layer if reading disk files was quicker
using mmap
than read. An "unbuffered" stream can be implemented
simply by not having a buffer layer.
Extra layers can be inserted to process the data as it flows through. This was the driving need for including the scheme in perl 5.7.0+ - we needed a mechanism to allow data to be translated between perl's internal encoding (conceptually at least Unicode as UTF-8), and the "native" format used by the system. This is provided by the ":encoding(xxxx)" layer which typically sits above the buffering layer.
A layer can be added that does "\n" to CRLF translation. This layer can be used on any platform, not just those that normally do such things.
The generic flag bits are a hybrid of O_XXXXX
style flags deduced
from the mode string passed to PerlIO_open()
, and state bits for
typical buffer layers.
End of file.
Writes are permitted, i.e. opened as "w" or "r+" or "a", etc.
Reads are permitted i.e. opened "r" or "w+" (or even "a+" - ick).
An error has occurred (for PerlIO_error()
).
Truncate file suggested by open mode.
All writes should be appends.
Layer is performing Win32-like "\n" mapped to CR,LF for output and CR,LF
mapped to "\n" for input. Normally the provided "crlf" layer is the only
layer that need bother about this. PerlIO_binmode()
will mess with this
flag rather than add/remove layers if the PERLIO_K_CANCRLF
bit is set
for the layers class.
Data written to this layer should be UTF-8 encoded; data provided by this layer should be considered UTF-8 encoded. Can be set on any layer by ":utf8" dummy layer. Also set on ":encoding" layer.
Layer is unbuffered - i.e. write to next layer down should occur for each write to this layer.
The buffer for this layer currently holds data written to it but not sent to next layer.
The buffer for this layer currently holds unconsumed data read from layer below.
Layer is line buffered. Write data should be passed to next layer down whenever a "\n" is seen. Any data beyond the "\n" should then be processed.
Handle is open.
This instance of this layer supports the "fast gets
" interface.
Normally set based on PERLIO_K_FASTGETS
for the class and by the
existence of the function(s) in the table. However a class that
normally provides that interface may need to avoid it on a
particular instance. The "pending" layer needs to do this when
it is pushed above a layer which does not support the interface.
(Perl's sv_gets()
does not expect the streams fast gets
behaviour
to change during one "get".)
- Size_t fsize;
Size of the function table. This is compared against the value PerlIO code "knows" as a compatibility check. Future versions may be able to tolerate layers compiled against an old version of the headers.
- char * name;
The name of the layer whose open() method Perl should invoke on open(). For example if the layer is called APR, you will call:
- open $fh, ">:APR", ...
and Perl knows that it has to invoke the PerlIOAPR_open() method implemented by the APR layer.
- Size_t size;
The size of the per-instance data structure, e.g.:
- sizeof(PerlIOAPR)
If this field is zero then PerlIO_pushed
does not malloc anything
and assumes layer's Pushed function will do any required layer stack
manipulation - used to avoid malloc/free overhead for dummy layers.
If the field is non-zero it must be at least the size of PerlIOl
,
PerlIO_pushed
will allocate memory for the layer's data structures
and link new layer onto the stream's stack. (If the layer's Pushed
method returns an error indication the layer is popped again.)
- IV kind;
The layer is buffered.
The layer is acceptable to have in a binmode(FH) stack - i.e. it does not (or will configure itself not to) transform bytes passing through it.
Layer can translate between "\n" and CRLF line ends.
Layer allows buffer snooping.
Used when the layer's open() accepts more arguments than usual. The
extra arguments should come after, not before, the MODE
argument. When this
flag is used it's up to the layer to validate the args.
- IV (*Pushed)(pTHX_ PerlIO *f,const char *mode, SV *arg);
The only absolutely mandatory method. Called when the layer is pushed
onto the stack. The mode
argument may be NULL if this occurs
post-open. The arg
will be non-NULL
if an argument string was
passed. In most cases this should call PerlIOBase_pushed()
to
convert mode
into the appropriate PERLIO_F_XXXXX
flags in
addition to any actions the layer itself takes. If a layer is not
expecting an argument it need neither save the one passed to it, nor
provide Getarg()
(it could perhaps Perl_warn
that the argument
was un-expected).
Returns 0 on success. On failure returns -1 and should set errno.
- IV (*Popped)(pTHX_ PerlIO *f);
Called when the layer is popped from the stack. A layer will normally
be popped after Close()
is called. But a layer can be popped
without being closed if the program is dynamically managing layers on
the stream. In such cases Popped()
should free any resources
(buffers, translation tables, ...) not held directly in the layer's
struct. It should also Unread()
any unconsumed data that has been
read and buffered from the layer below back to that layer, so that it
can be re-provided to what ever is now above.
Returns 0 on success and failure. If Popped()
returns true then
perlio.c assumes that either the layer has popped itself, or the
layer is super special and needs to be retained for other reasons.
In most cases it should return false.
- PerlIO * (*Open)(...);
The Open()
method has lots of arguments because it combines the
functions of perl's open, PerlIO_open
, perl's sysopen,
PerlIO_fdopen
and PerlIO_reopen
. The full prototype is as
follows:
- PerlIO * (*Open)(pTHX_ PerlIO_funcs *tab,
- PerlIO_list_t *layers, IV n,
- const char *mode,
- int fd, int imode, int perm,
- PerlIO *old,
- int narg, SV **args);
Open should (perhaps indirectly) call PerlIO_allocate()
to allocate
a slot in the table and associate it with the layers information for
the opened file, by calling PerlIO_push
. The layers is an
array of all the layers destined for the PerlIO *
, and any
arguments passed to them, n is the index into that array of the
layer being called. The macro PerlIOArg
will return a (possibly
NULL
) SV * for the argument passed to the layer.
The mode string is an "fopen()
-like" string which would match
the regular expression /^[I#]?[rwa]\+?[bt]?$/
.
The 'I'
prefix is used during creation of stdin
..stderr
via
special PerlIO_fdopen
calls; the '#'
prefix means that this is
sysopen and that imode and perm should be passed to
PerlLIO_open3
; 'r'
means read, 'w'
means write and
'a'
means append. The '+'
suffix means that both reading and
writing/appending are permitted. The 'b'
suffix means file should
be binary, and 't'
means it is text. (Almost all layers should do
the IO in binary mode, and ignore the b/t bits. The :crlf
layer
should be pushed to handle the distinction.)
If old is not NULL
then this is a PerlIO_reopen
. Perl itself
does not use this (yet?) and semantics are a little vague.
If fd is not negative then it is the numeric file descriptor fd,
which will already be open in a manner compatible with the supplied mode
string; the call is thus equivalent to PerlIO_fdopen
. In this case
nargs will be zero.
If nargs is greater than zero then it gives the number of arguments
passed to open, otherwise it will be 1 if for example
PerlIO_open
was called. In simple cases SvPV_nolen(*args) is the
pathname to open.
If a layer provides Open()
it should normally call the Open()
method of next layer down (if any) and then push itself on top if that
succeeds. PerlIOBase_open
is provided to do exactly that, so in
most cases you don't have to write your own Open()
method. If this
method is not defined, other layers may have difficulty pushing
themselves on top of it during open.
If PerlIO_push
was performed and the open has failed, it must
PerlIO_pop
itself: otherwise the layer won't be removed
and may cause serious problems.
Returns NULL
on failure.
- IV (*Binmode)(pTHX_ PerlIO *f);
Optional. Used when the :raw layer is pushed (explicitly or as a result of binmode(FH)). If not present, the layer will be popped. If present, it should configure the layer as binary (or pop itself) and return 0. If it returns -1 for error, binmode will fail with the layer still on the stack.
- SV * (*Getarg)(pTHX_ PerlIO *f,
- CLONE_PARAMS *param, int flags);
Optional. If present should return an SV * representing the string argument passed to the layer when it was pushed. e.g. ":encoding(ascii)" would return an SvPV with value "ascii". (param and flags arguments can be ignored in most cases)
Dup uses Getarg to retrieve the argument originally passed to Pushed, so you must implement this function if your layer has an extra argument to Pushed and will ever be Duped.
- IV (*Fileno)(pTHX_ PerlIO *f);
Returns the Unix/POSIX numeric file descriptor for the handle. Normally PerlIOBase_fileno() (which just asks the next layer down) will suffice for this.
Returns -1 on error, which is considered to include the case where the layer cannot provide such a file descriptor.
- PerlIO * (*Dup)(pTHX_ PerlIO *f, PerlIO *o,
- CLONE_PARAMS *param, int flags);
XXX: Needs more docs.
Used as part of the "clone" process when a thread is spawned (in which
case param will be non-NULL) and when a stream is being duplicated via
'&' in the open.
Similar to Open, returns a PerlIO* on success, NULL on failure.
- SSize_t (*Read)(pTHX_ PerlIO *f, void *vbuf, Size_t count);
Basic read operation.
Typically will call Fill and manipulate pointers (possibly via the API). PerlIOBuf_read() may be suitable for derived classes which provide "fast gets" methods.
Returns actual bytes read, or -1 on an error.
- SSize_t (*Unread)(pTHX_ PerlIO *f,
- const void *vbuf, Size_t count);
A superset of stdio's ungetc(). Should arrange for future reads to see the bytes in vbuf. If there is no obviously better implementation then PerlIOBase_unread() provides the function by pushing a "fake" "pending" layer above the calling layer.
Returns the number of unread chars.
- SSize_t (*Write)(PerlIO *f, const void *vbuf, Size_t count);
Basic write operation.
Returns bytes written or -1 on an error.
- IV (*Seek)(pTHX_ PerlIO *f, Off_t offset, int whence);
Position the file pointer. Should normally call its own Flush method and then the Seek method of the next layer down.
Returns 0 on success, -1 on failure.
- Off_t (*Tell)(pTHX_ PerlIO *f);
Return the file pointer. May be based on the layer's cached concept of position to avoid overhead.
Returns -1 on failure to get the file pointer.
- IV (*Close)(pTHX_ PerlIO *f);
Close the stream. Should normally call PerlIOBase_close() to flush itself and close layers below, and then deallocate any data structures (buffers, translation tables, ...) not held directly in the data structure.
Returns 0 on success, -1 on failure.
- IV (*Flush)(pTHX_ PerlIO *f);
Should make the stream's state consistent with the layers below. That is, any buffered write data should be written, and the file position of lower layers adjusted for data read from below but not actually consumed. (Should perhaps Unread() such data to the lower layer.)
Returns 0 on success, -1 on failure.
- IV (*Fill)(pTHX_ PerlIO *f);
The buffer for this layer should be filled (for read) from the layer below. When you "subclass" the PerlIOBuf layer, you want to use its _read method and supply your own Fill method, which fills PerlIOBuf's buffer.
Returns 0 on success, -1 on failure.
- IV (*Eof)(pTHX_ PerlIO *f);
Return the end-of-file indicator. PerlIOBase_eof() is normally sufficient.
Returns 0 on end-of-file, 1 if not end-of-file, -1 on error.
- IV (*Error)(pTHX_ PerlIO *f);
Return the error indicator. PerlIOBase_error() is normally sufficient.
Returns 1 if there is an error (usually when PERLIO_F_ERROR is set), 0 otherwise.
- void (*Clearerr)(pTHX_ PerlIO *f);
Clear the end-of-file and error indicators. Should call PerlIOBase_clearerr() to set the PERLIO_F_XXXXX flags, which may suffice.
- void (*Setlinebuf)(pTHX_ PerlIO *f);
Mark the stream as line buffered. PerlIOBase_setlinebuf() sets the PERLIO_F_LINEBUF flag and is normally sufficient.
- STDCHAR * (*Get_base)(pTHX_ PerlIO *f);
Allocate (if it has not been done already) the read buffer for this layer and return a pointer to it. Return NULL on failure.
- Size_t (*Get_bufsiz)(pTHX_ PerlIO *f);
Return the number of bytes that the last Fill() put in the buffer.
- STDCHAR * (*Get_ptr)(pTHX_ PerlIO *f);
Return the current read pointer relative to this layer's buffer.
- SSize_t (*Get_cnt)(pTHX_ PerlIO *f);
Return the number of bytes left to be read in the current buffer.
- void (*Set_ptrcnt)(pTHX_ PerlIO *f,
- STDCHAR *ptr, SSize_t cnt);
Adjust the read pointer and count of bytes to match ptr and/or cnt. The application (or layer above) must ensure they are consistent. (Checking is allowed by the paranoid.)
To ask for the next layer down use PerlIONext(PerlIO *f).
To check that a PerlIO* is valid use PerlIOValid(PerlIO *f). (All this does is really just to check that the pointer is non-NULL and that the pointer behind that is non-NULL.)
PerlIOBase(PerlIO *f) returns the "Base" pointer, or in other words, the PerlIOl* pointer.
PerlIOSelf(PerlIO* f, type) returns the PerlIOBase cast to a type.
Perl_PerlIO_or_Base(PerlIO* f, callback, base, failure, args) either calls the callback from the functions of the layer f (just by the name of the IO function, like "Read") with the args; or, if there is no such callback, calls the base version of the callback with the same args; or, if f is invalid, sets errno to EBADF and returns failure.
Perl_PerlIO_or_fail(PerlIO* f, callback, failure, args) either calls the callback of the functions of the layer f with the args; or, if there is no such callback, sets errno to EINVAL; or, if f is invalid, sets errno to EBADF and returns failure.
Perl_PerlIO_or_Base_void(PerlIO* f, callback, base, args) either calls the callback of the functions of the layer f with the args; or, if there is no such callback, calls the base version of the callback with the same args; or, if f is invalid, sets errno to EBADF.
Perl_PerlIO_or_fail_void(PerlIO* f, callback, args) either calls the callback of the functions of the layer f with the args; or, if there is no such callback, sets errno to EINVAL; or, if f is invalid, sets errno to EBADF.
If you find the implementation document unclear or not sufficient, look at the existing PerlIO layer implementations, which include:
The perlio.c and perliol.h in the Perl core implement the "unix", "perlio", "stdio", "crlf", "utf8", "byte", "raw", and "pending" layers, and also the "mmap" and "win32" layers if applicable. (The "win32" layer is currently unfinished and unused; to see what is used instead on Win32, see "Querying the layers of filehandles" in PerlIO.)
PerlIO::encoding, PerlIO::scalar, PerlIO::via in the Perl core.
PerlIO::gzip and APR::PerlIO (mod_perl 2.0) on CPAN.
PerlIO::via::QuotedPrint in the Perl core and PerlIO::via::* on CPAN.
If you are creating a PerlIO layer, you may want to be lazy, in other words, implement only the methods that interest you. The other methods you can either replace with the "blank" methods
- PerlIOBase_noop_ok
- PerlIOBase_noop_fail
(which do nothing, and return zero and -1, respectively) or for certain methods you may assume a default behaviour by using a NULL method. The Open method looks for help in the 'parent' layer. The following table summarizes the behaviour:
- method behaviour with NULL
- Clearerr PerlIOBase_clearerr
- Close PerlIOBase_close
- Dup PerlIOBase_dup
- Eof PerlIOBase_eof
- Error PerlIOBase_error
- Fileno PerlIOBase_fileno
- Fill FAILURE
- Flush SUCCESS
- Getarg SUCCESS
- Get_base FAILURE
- Get_bufsiz FAILURE
- Get_cnt FAILURE
- Get_ptr FAILURE
- Open INHERITED
- Popped SUCCESS
- Pushed SUCCESS
- Read PerlIOBase_read
- Seek FAILURE
- Set_cnt FAILURE
- Set_ptrcnt FAILURE
- Setlinebuf PerlIOBase_setlinebuf
- Tell FAILURE
- Unread PerlIOBase_unread
- Write FAILURE
- FAILURE Set errno (to EINVAL in Unixish, to LIB$_INVARG in VMS) and
- return -1 (for numeric return values) or NULL (for pointers)
- INHERITED Inherited from the layer below
- SUCCESS Return 0 (for numeric return values) or a pointer
The file perlio.c
provides the following layers:
The "unix" layer is a basic non-buffered layer which calls Unix/POSIX read(), write(), lseek(), close(). No buffering. Even on platforms that distinguish between O_TEXT and O_BINARY this layer is always O_BINARY.
The "perlio" layer is a very complete generic buffering layer which provides the whole of the PerlIO API. It is also intended to be used as a "base class" for other layers. (For example its Read() method is implemented in terms of the Get_cnt()/Get_ptr()/Set_ptrcnt() methods.)
"perlio" over "unix" provides a complete replacement for stdio as seen via the PerlIO API. This is the default for USE_PERLIO when the system's stdio does not permit perl's "fast gets" access and the system does not distinguish between O_TEXT and O_BINARY.
The "stdio" layer provides the PerlIO API via the layer scheme, but implements it by calling the system's stdio. This is (currently) the default if the system's stdio provides sufficient access to allow perl's "fast gets" access and the system does not distinguish between O_TEXT and O_BINARY.
The "crlf" layer is derived using "perlio" as a base class. It provides Win32-like "\n" to CR,LF translation. It can either be applied above "perlio" or serve as the buffer layer itself. "crlf" over "unix" is the default if the system distinguishes between O_TEXT and O_BINARY opens. (At some point "unix" will be replaced by a "native" Win32 IO layer on that platform, as Win32's read/write layer has various drawbacks.) The "crlf" layer is a reasonable model for a layer which transforms data in some way.
If Configure detects mmap() functions, the "mmap" layer is provided (with "perlio" as a "base") which does "read" operations by mmap()ing the file. The performance improvement is marginal on modern systems, so it is mainly there as a proof of concept. It is likely to be unbundled from the core at some point. The "mmap" layer is a reasonable model for a minimalist "derived" layer.
The "pending" layer is an "internal" derivative of "perlio" which can be used to provide an Unread() function for layers which have no buffer or cannot be bothered. (Basically this layer's Fill() pops itself off the stack and so resumes reading from the layer below.)
The "raw" layer is a dummy layer which never exists on the layer stack. Instead, when "pushed" it actually pops the stack, removing itself; it then calls the Binmode function table entry on all the layers in the stack. Normally this (via PerlIOBase_binmode) removes any layers which do not have the PERLIO_K_RAW bit set. Layers can modify that behaviour by defining their own Binmode entry.
The "utf8" layer is another dummy layer. When pushed it pops itself and sets the PERLIO_F_UTF8 flag on the layer which was (and now is once more) the top of the stack.
In addition perlio.c also provides a number of PerlIOBase_xxxx()
functions which are intended to be used in the table slots of classes
which do not need to do anything special for a particular method.
Layers can be made available by extension modules. When an unknown layer is encountered the PerlIO code will perform the equivalent of:
- use PerlIO 'layer';
where layer is the unknown layer. PerlIO.pm will then attempt to:
- require PerlIO::layer;
If after that process the layer is still not defined then the open
will fail.
The following extension layers are bundled with perl:
- use PerlIO::encoding;
makes the ":encoding" layer available, although PerlIO.pm "knows" where to find it. It is an example of a layer which takes an argument; it is called thus:
- open( $fh, "<:encoding(iso-8859-7)", $pathname );
The ":scalar" layer provides support for reading data from and writing data to a scalar.
- open( $fh, "+<:scalar", \$scalar );
When a handle is so opened, then reads get bytes from the string value
of $scalar, and writes change the value. In both cases the position
in $scalar starts as zero but can be altered via seek, and
determined via tell.
Please note that this layer is implied when calling open() thus:
- open( $fh, "+<", \$scalar );
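As a quick illustration of the behaviour just described (the string contents here are invented), reads, writes, seek and tell all operate directly on the scalar:

```perl
my $scalar = "line one\nline two\n";
open(my $fh, "+<", \$scalar) || die "can't open in-memory handle: $!";
my $first = <$fh>;          # reads "line one\n" from the string
my $pos   = tell($fh);      # 9: position just after the first line
print $fh "LINE TWO\n";     # overwrites "line two\n" inside $scalar
seek($fh, 0, 0);            # rewind to the beginning of the string
close($fh);
```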
The ":via" layer is provided to allow layers to be implemented as Perl code. For instance:
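The ":via" example did not survive in this rendering; a minimal sketch using PerlIO::via::QuotedPrint, which ships with Perl (the output.txt filename is invented for the demo):

```perl
use PerlIO::via::QuotedPrint;   # bundled with the Perl core

open(my $fh, ">:via(QuotedPrint)", "output.txt")
    || die "can't open output.txt: $!";
print $fh "caf\xe9\n";          # stored quoted-printable encoded (=E9)
close($fh) || die "can't close: $!";
```

Reading "output.txt" back without the layer shows the quoted-printable form on disk.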
See PerlIO::via for details.
Things that need to be done to improve this document.
Explain how to make a valid fh without going through open() (i.e. how to apply a layer): for example, if the file is not opened through perl, but we want to get back a fh, as if it had been opened by Perl.
How does PerlIO_apply_layers fit in? Where are its docs? Was it made public?
Currently the example could be something like this:
- PerlIO *foo_to_PerlIO(pTHX_ char *mode, ...)
- {
- /* mode is "w", "r", etc */
- const char *layers = ":APR"; /* the layer name */
- PerlIO *f = PerlIO_allocate(aTHX);
- if (!f) {
- return NULL;
- }
- PerlIO_apply_layers(aTHX_ f, mode, layers);
- if (f) {
- PerlIOAPR *st = PerlIOSelf(f, PerlIOAPR);
- /* fill in the st struct, as in _open() */
- st->file = file;
- PerlIOBase(f)->flags |= PERLIO_F_OPEN;
- return f;
- }
- return NULL;
- }
Fix/add the documentation in places marked as XXX.
The handling of errors by the layer is not specified; e.g. when should $! be set explicitly, and when should the error handling be delegated to the top layer?
Probably give some hints on using SETERRNO() or pointers to where they can be found.
I think it would help to give some concrete examples to make it easier to understand the API. Of course I agree that the API has to be concise, but since there is no second document that is more of a guide, it would be easier to start with a document that is an API reference but has examples in the places where things are unclear, for a person who is not (yet) a PerlIO guru.
perlipc - Perl interprocess communication (signals, fifos, pipes, safe subprocesses, sockets, and semaphores)
The basic IPC facilities of Perl are built out of the good old Unix signals, named pipes, pipe opens, the Berkeley socket routines, and SysV IPC calls. Each is used in slightly different situations.
Perl uses a simple signal handling model: the %SIG hash contains names or references of user-installed signal handlers. These handlers will be called with an argument which is the name of the signal that triggered it. A signal may be generated intentionally from a particular keyboard sequence like control-C or control-Z, sent to you from another process, or triggered automatically by the kernel when special events transpire, like a child process exiting, your own process running out of stack space, or hitting a process file-size limit.
For example, to trap an interrupt signal, set up a handler like this:
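The code block itself is missing from this rendering; the handler perlipc has in mind looks essentially like this (catch_zap is the traditional name from that document):

```perl
our $shucks;

sub catch_zap {
    my $signame = shift;       # the handler receives the signal name
    $shucks++;                 # just set a global variable...
    die "Somebody sent me a SIG$signame";   # ...and raise an exception
}
$SIG{INT} = \&catch_zap;       # a code reference is the best strategy
```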
Prior to Perl 5.8.0 it was necessary to do as little as you possibly could in your handler; notice how all we do is set a global variable and then raise an exception. That's because on most systems, libraries are not re-entrant; particularly, memory allocation and I/O routines are not. That meant that doing nearly anything in your handler could in theory trigger a memory fault and subsequent core dump - see Deferred Signals (Safe Signals) below.
The names of the signals are the ones listed out by kill -l on your system, or you can retrieve them using the CPAN module IPC::Signal.
You may also choose to assign the strings "IGNORE" or "DEFAULT" as the handler, in which case Perl will try to discard the signal or do the default thing.
On most Unix platforms, the CHLD (sometimes also known as CLD) signal has special behavior with respect to a value of "IGNORE". Setting $SIG{CHLD} to "IGNORE" on such a platform has the effect of not creating zombie processes when the parent process fails to wait() on its child processes (i.e., child processes are automatically reaped). Calling wait() with $SIG{CHLD} set to "IGNORE" usually returns -1 on such platforms.
Some signals can be neither trapped nor ignored, such as the KILL and STOP (but not the TSTP) signals. Note that ignoring signals makes them disappear. If you only want them blocked temporarily without them getting lost you'll have to use POSIX' sigprocmask.
Sending a signal to a negative process ID means that you send the signal to the entire Unix process group. This code sends a hang-up signal to all processes in the current process group, and also sets $SIG{HUP} to "IGNORE" so it doesn't kill itself:
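The code block is missing from this rendering; the usual idiom is the kill with a negative PID inside a block that locally ignores HUP. (The setpgrp() call here is only so the self-contained demo signals a group containing just this process; in a real program you would signal your existing group.)

```perl
# Demo-only: become our own process group leader so the group we
# signal below contains only this process.
setpgrp(0, 0);

{
    local $SIG{HUP} = "IGNORE";   # so we don't kill ourselves
    kill(HUP => -$$);             # negative PID: the whole process group
}
print "still alive\n";
```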
Another interesting signal to send is signal number zero. This doesn't actually affect a child process, but instead checks whether it's alive or has changed its UIDs.
Signal number zero may fail because you lack permission to send the signal when directed at a process whose real or saved UID is not identical to the real or effective UID of the sending process, even though the process is alive. You may be able to determine the cause of failure using $! or %!.
You might also want to employ anonymous functions for simple signal handlers:
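The anonymous-function examples did not survive extraction; perlipc's version is simply:

```perl
$SIG{INT}  = sub { die "\nOutta here!\n" };
$SIG{ALRM} = sub { die "Your alarm clock went off" };
```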
SIGCHLD handlers require some special care. If a second child dies while we are in the signal handler caused by the first death, we won't get another signal. So we must loop here, or else we will leave the unreaped child as a zombie; the next time two children die, we get another zombie, and so on.
Be careful: qx(), system(), and some modules for calling external commands do a fork(), then wait() for the result. Thus, your signal handler will be called. Because wait() was already called by system() or qx(), the wait() in the signal handler will see no more zombies and will therefore block.
The best way to prevent this issue is to use waitpid(), as in the following example:
- use POSIX ":sys_wait_h"; # for nonblocking read
- my %children;
- $SIG{CHLD} = sub {
- # don't change $! and $? outside handler
- local ($!, $?);
- # loop: several children may have exited while only
- # one CHLD signal was delivered
- while ((my $pid = waitpid(-1, WNOHANG)) > 0) {
- next unless delete $children{$pid};
- cleanup_child($pid, $?);
- }
- };
- while (1) {
- my $pid = fork();
- die "cannot fork" unless defined $pid;
- if ($pid == 0) {
- # ...
- exit 0;
- } else {
- $children{$pid}=1;
- # ...
- system($command);
- # ...
- }
- }
Signal handling is also used for timeouts in Unix. While safely
protected within an eval{} block, you set a signal handler to trap
alarm signals and then schedule to have one delivered to you in some
number of seconds. Then try your blocking operation, clearing the alarm
when it's done but not before you've exited your eval{} block. If it
goes off, you'll use die() to jump out of the block.
Here's an example:
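The example itself is missing from this rendering; a self-contained version of the classic eval/alarm pattern (blocking_operation() is an invented stand-in for e.g. a blocking flock):

```perl
my $TIMEOUT_MSG = "alarm clock restart\n";

sub blocking_operation { sleep 1; return "done" }  # stand-in for flock etc.

my $result;
eval {
    local $SIG{ALRM} = sub { die $TIMEOUT_MSG };
    alarm 10;                        # schedule an alarm
    $result = blocking_operation();  # the operation we are timing out
    alarm 0;                         # clear the alarm, still inside the eval
};
if ($@) {
    die $@ unless $@ eq $TIMEOUT_MSG;  # propagate unexpected exceptions
    # otherwise we timed out; handle it here
}
```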
If the operation being timed out is system() or qx(), this technique is liable to generate zombies. If this matters to you, you'll need to do your own fork() and exec(), and kill the errant child process.
For more complex signal handling, you might see the standard POSIX module. Lamentably, this is almost entirely undocumented, but the t/lib/posix.t file from the Perl source distribution has some examples in it.
A process that usually starts when the system boots and shuts down
when the system is shut down is called a daemon (Disk And Execution
MONitor). If a daemon process has a configuration file which is
modified after the process has been started, there should be a way to
tell that process to reread its configuration file without stopping
the process. Many daemons provide this mechanism using a SIGHUP
signal handler. When you want to tell the daemon to reread the file,
simply send it the SIGHUP
signal.
The following example implements a simple daemon, which restarts itself every time the SIGHUP signal is received. The actual code is located in the subroutine code(), which just prints some debugging info to show that it works; it should be replaced with the real code.
- #!/usr/bin/perl -w
- use POSIX ();
- use FindBin ();
- use File::Basename ();
- use File::Spec::Functions;
- $| = 1;
- # make the daemon cross-platform, so exec always calls the script
- # itself with the right path, no matter how the script was invoked.
- my $script = File::Basename::basename($0);
- my $SELF = catfile($FindBin::Bin, $script);
- # POSIX unmasks the sigprocmask properly
- $SIG{HUP} = sub {
- print "got SIGHUP\n";
- exec($SELF, @ARGV) || die "$0: couldn't restart: $!";
- };
- code();
- sub code {
- print "PID: $$\n";
- print "ARGV: @ARGV\n";
- my $count = 0;
- while (++$count) {
- sleep 2;
- print "$count\n";
- }
- }
Before Perl 5.8.0, installing Perl code to deal with signals exposed you to danger from two things. First, few system library functions are re-entrant. If the signal interrupts while Perl is executing one function (like malloc(3) or printf(3)), and your signal handler then calls the same function again, you could get unpredictable behavior--often, a core dump. Second, Perl isn't itself re-entrant at the lowest levels. If the signal interrupts Perl while Perl is changing its own internal data structures, similarly unpredictable behavior may result.
There were two things you could do, knowing this: be paranoid or be
pragmatic. The paranoid approach was to do as little as possible in your
signal handler. Set an existing integer variable that already has a
value, and return. This doesn't help you if you're in a slow system call,
which will just restart. That means you have to die to longjmp(3) out
of the handler. Even this is a little cavalier for the true paranoiac,
who avoids die in a handler because the system is out to get you.
The pragmatic approach was to say "I know the risks, but prefer the
convenience", and to do anything you wanted in your signal handler,
and be prepared to clean up core dumps now and again.
Perl 5.8.0 and later avoid these problems by "deferring" signals. That is, when the signal is delivered to the process by the system (to the C code that implements Perl) a flag is set, and the handler returns immediately. Then at strategic "safe" points in the Perl interpreter (e.g. when it is about to execute a new opcode) the flags are checked and the Perl level handler from %SIG is executed. The "deferred" scheme allows much more flexibility in the coding of signal handlers as we know the Perl interpreter is in a safe state, and that we are not in a system library function when the handler is called. However the implementation does differ from previous Perls in the following ways:
As the Perl interpreter looks at signal flags only when it is about to execute a new opcode, a signal that arrives during a long-running opcode (e.g. a regular expression operation on a very large string) will not be seen until the current opcode completes.
If a signal of any given type fires multiple times during an opcode
(such as from a fine-grained timer), the handler for that signal will
be called only once, after the opcode completes; all other
instances will be discarded. Furthermore, if your system's signal queue
gets flooded to the point that there are signals that have been raised
but not yet caught (and thus not deferred) at the time an opcode
completes, those signals may well be caught and deferred during
subsequent opcodes, with sometimes surprising results. For example, you
may see alarms delivered even after calling alarm(0) as the latter
stops the raising of alarms but does not cancel the delivery of alarms
raised but not yet caught. Do not depend on the behaviors described in
this paragraph as they are side effects of the current implementation and
may change in future versions of Perl.
When a signal is delivered (e.g., SIGINT from a control-C) the operating system breaks into IO operations like read(2), which is used to implement Perl's readline() function, the <> operator. On older Perls the handler was called immediately (and as read is not "unsafe", this worked well). With the "deferred" scheme the handler is not called immediately, and if Perl is using the system's stdio library that library may restart the read without returning to Perl to give it a chance to call the %SIG handler. If this happens on your system the solution is to use the :perlio layer to do IO--at least on those handles that you want to be able to break into with signals. (The :perlio layer checks the signal flags and calls %SIG handlers before resuming IO operation.)
The default in Perl 5.8.0 and later is to automatically use the :perlio layer.
Note that it is not advisable to access a file handle within a signal handler where that signal has interrupted an I/O operation on that same handle. While perl will at least try hard not to crash, there are no guarantees of data integrity; for example, some data might get dropped or written twice.
Some networking library functions like gethostbyname() are known to have their own implementations of timeouts which may conflict with your timeouts. If you have problems with such functions, try using the POSIX sigaction() function, which bypasses Perl safe signals. Be warned that this does subject you to possible memory corruption, as described above.
Instead of setting $SIG{ALRM}:
try something like the following:
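Both snippets are absent from this rendering. The $SIG{ALRM} form is simply local $SIG{ALRM} = sub { die "alarm\n" }; the POSIX::sigaction() alternative, which bypasses Perl's safe signals, looks like this:

```perl
use POSIX qw(SIGALRM);

# install an "unsafe", immediately-delivered SIGALRM handler
POSIX::sigaction(SIGALRM,
                 POSIX::SigAction->new(sub { die "alarm\n" }))
    || die "Error setting SIGALRM handler: $!\n";
```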
Another way to disable the safe signal behavior locally is to use the Perl::Unsafe::Signals module from CPAN, which affects all signals.
On systems that supported it, older versions of Perl used the
SA_RESTART flag when installing %SIG handlers. This meant that
restartable system calls would continue rather than returning when
a signal arrived. In order to deliver deferred signals promptly,
Perl 5.8.0 and later do not use SA_RESTART. Consequently,
restartable system calls can fail (with $! set to EINTR) in places where they previously would have succeeded.
The default :perlio layer retries read, write and close as described above; interrupted wait and waitpid calls will always be retried.
Certain signals like SEGV, ILL, and BUS are generated by virtual memory addressing errors and similar "faults". These are normally fatal: there is little a Perl-level handler can do with them. So Perl delivers them immediately rather than attempting to defer them.
On some operating systems certain signal handlers are supposed to "do
something" before returning. One example can be CHLD or CLD, which
indicates a child process has completed. On some operating systems the
signal handler is expected to wait for the completed child
process. On such systems the deferred signal scheme will not work for
those signals: it does not do the wait. Again the failure will
look like a loop as the operating system will reissue the signal because
there are completed child processes that have not yet been waited for.
If you want the old signal behavior back despite possible memory corruption, set the environment variable PERL_SIGNALS to "unsafe". This feature first appeared in Perl 5.8.1.
A named pipe (often referred to as a FIFO) is an old Unix IPC mechanism for processes communicating on the same machine. It works just like regular anonymous pipes, except that the processes rendezvous using a filename and need not be related.
To create a named pipe, use the POSIX::mkfifo() function.
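The call itself was elided in this rendering; a minimal sketch (the fifo lives in a throwaway temp directory purely for the demo):

```perl
use POSIX qw(mkfifo);
use File::Temp qw(tempdir);

my $dir  = tempdir(CLEANUP => 1);    # demo-only location
my $fifo = "$dir/demo.fifo";
mkfifo($fifo, 0700) || die "can't mkfifo $fifo: $!";
print((-p $fifo) ? "made a fifo\n" : "not a fifo?\n");
```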
You can also use the Unix command mknod(1), or on some systems, mkfifo(1). These may not be in your normal path, though.
A fifo is convenient when you want to connect a process to an unrelated one. When you open a fifo, the program will block until there's something on the other end.
For example, let's say you'd like to have your .signature file be a named pipe that has a Perl program on the other end. Now every time any program (like a mailer, news reader, finger program, etc.) tries to read from that file, the reading program will read the new signature from your program. We'll use the pipe-checking file-test operator, -p, to find out whether anyone (or anything) has accidentally removed our fifo.
- chdir(); # go home
- my $FIFO = ".signature";
- while (1) {
- unless (-p $FIFO) {
- unlink $FIFO; # discard any failure, will catch later
- require POSIX; # delayed loading of heavy module
- POSIX::mkfifo($FIFO, 0700)
- || die "can't mkfifo $FIFO: $!";
- }
- # next line blocks till there's a reader
- open (FIFO, "> $FIFO") || die "can't open $FIFO: $!";
- print FIFO "John Smith (smith\@host.org)\n", `fortune -s`;
- close(FIFO) || die "can't close $FIFO: $!";
- sleep 2; # to avoid dup signals
- }
Perl's basic open() statement can also be used for unidirectional interprocess communication by either appending or prepending a pipe symbol to the second argument to open(). Here's how to start something up in a child process you intend to write to:
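The code block is missing from this rendering. perlipc's version pipes to an lpr spooler; in this sketch cat, redirected to a temp file so the demo is self-contained, stands in for the spooler:

```perl
use File::Temp qw(tempdir);

my $dir = tempdir(CLEANUP => 1);
# a child process we intend to write to ("cat" stands in for lpr here)
open(SPOOLER, "| cat > $dir/spool.txt")
    || die "can't fork: $!";
local $SIG{PIPE} = sub { die "spooler pipe broke" };
print SPOOLER "stuff\n";
close SPOOLER || die "bad spool: $! $?";
```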
And here's how to start up a child process you intend to read from:
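Also missing here; perlipc reads from netstat -an. A portable sketch with echo as the producer:

```perl
# a child process we intend to read from ("echo" stands in for netstat)
open(STATUS, "echo hello from the child |")
    || die "can't fork: $!";
my @lines = <STATUS>;
close STATUS || die "bad status: $! $?";
print @lines;
```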
If one can be sure that a particular program is a Perl script expecting filenames in @ARGV, the clever programmer can write something like this:
- % program f1 "cmd1|" - f2 "cmd2|" f3 < tmpfile
and no matter which sort of shell it's called from, the Perl program will read from the file f1, the process cmd1, standard input (tmpfile in this case), the f2 file, the cmd2 command, and finally the f3 file. Pretty nifty, eh?
You might notice that you could use backticks for much the same effect as opening a pipe for reading:
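For instance (echo standing in for a real command such as netstat):

```perl
# backticks slurp the entire output into memory at once,
# in contrast with the line-at-a-time pipe open above
my @out = `echo one; echo two`;
print scalar(@out), " lines\n";
```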
While this is true on the surface, it's much more efficient to process the file one line or record at a time because then you don't have to read the whole thing into memory at once. It also gives you finer control of the whole process, letting you kill off the child process early if you'd like.
Be careful to check the return values from both open() and close(). If you're writing to a pipe, you should also trap SIGPIPE. Otherwise, think of what happens when you start up a pipe to a command that doesn't exist: the open() will in all likelihood succeed (it only reflects the fork()'s success), but then your output will fail--spectacularly. Perl can't know whether the command worked, because your command is actually running in a separate process whose exec() might have failed. Therefore, while readers of bogus commands return just a quick EOF, writers to bogus commands will get hit with a signal, which they'd best be prepared to handle. Consider:
The reason for not checking the return value from print() is because of pipe buffering; physical writes are delayed. That won't blow up until the close, and it will blow up with a SIGPIPE. To catch it, you could use this:
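Both snippets are absent from this rendering. A sketch of the catch-at-close variant (bogus_command_xyz is a deliberately nonexistent command):

```perl
local $SIG{PIPE} = "IGNORE";   # failed writes return false instead of killing us
open(FH, "| bogus_command_xyz 2>/dev/null")   # the shell finds no such command
    || die "can't fork: $!";
print FH "bang\n";             # may appear to succeed: pipe output is buffered
my $closed_ok = close FH;      # the failure finally shows up here, in $?
warn "bogus pipe failed: status=$?\n" unless $closed_ok;
```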
Both the main process and any child processes it forks share the same STDIN, STDOUT, and STDERR filehandles. If both processes try to access them at once, strange things can happen. You may also want to close or reopen the filehandles for the child. You can get around this by opening your pipe with open(), but on some systems this means that the child process cannot outlive the parent.
You can run a command in the background with:
- system("cmd &");
The command's STDOUT and STDERR (and possibly STDIN, depending on your shell) will be the same as the parent's. You won't need to catch SIGCHLD because of the double-fork taking place; see below for details.
In some cases (starting server processes, for instance) you'll want to completely dissociate the child process from the parent. This is often called daemonization. A well-behaved daemon will also chdir() to the root directory so it doesn't prevent unmounting the filesystem containing the directory from which it was launched, and redirect its standard file descriptors from and to /dev/null so that random output doesn't wind up on the user's terminal.
- use POSIX "setsid";
- sub daemonize {
- chdir("/") || die "can't chdir to /: $!";
- open(STDIN, "< /dev/null") || die "can't read /dev/null: $!";
- open(STDOUT, "> /dev/null") || die "can't write to /dev/null: $!";
- defined(my $pid = fork()) || die "can't fork: $!";
- exit if $pid; # non-zero now means I am the parent
- (setsid() != -1) || die "Can't start a new session: $!";
- open(STDERR, ">&STDOUT") || die "can't dup stdout: $!";
- }
The fork() has to come before the setsid() to ensure you aren't a
process group leader; the setsid() will fail if you are. If your
system doesn't have the setsid() function, open /dev/tty and use the TIOCNOTTY ioctl() on it instead. See tty(4) for details.
Non-Unix users should check their Your_OS::Process module for
other possible solutions.
Another interesting approach to IPC is making your single program go
multiprocess and communicate between--or even amongst--yourselves. The
open() function will accept a file argument of either "-|" or "|-" to do a very interesting thing: it forks a child connected to the
filehandle you've opened. The child is running the same program as the
parent. This is useful for safely opening a file when running under an
assumed UID or GID, for example. If you open a pipe to minus, you can
write to the filehandle you opened and your kid will find it in his
STDIN. If you open a pipe from minus, you can read from the filehandle
you opened whatever your kid writes to his STDOUT.
- use English qw[ -no_match_vars ];
- my $PRECIOUS = "/path/to/some/safe/file";
- my $sleep_count;
- my $pid;
- do {
- $pid = open(KID_TO_WRITE, "|-");
- unless (defined $pid) {
- warn "cannot fork: $!";
- die "bailing out" if $sleep_count++ > 6;
- sleep 10;
- }
- } until defined $pid;
- if ($pid) { # I am the parent
- print KID_TO_WRITE @some_data;
- close(KID_TO_WRITE) || warn "kid exited $?";
- } else { # I am the child
- # drop permissions in setuid and/or setgid programs:
- ($EUID, $EGID) = ($UID, $GID);
- open (OUTFILE, "> $PRECIOUS")
- || die "can't open $PRECIOUS: $!";
- while (<STDIN>) {
- print OUTFILE; # child's STDIN is parent's KID_TO_WRITE
- }
- close(OUTFILE) || die "can't close $PRECIOUS: $!";
- exit(0); # don't forget this!!
- }
Another common use for this construct is when you need to execute something without the shell's interference. With system(), it's straightforward, but you can't use a pipe open or backticks safely. That's because there's no way to stop the shell from getting its hands on your arguments. Instead, use lower-level control to call exec() directly.
Here's a safe backtick or pipe open for read:
- my $pid = open(KID_TO_READ, "-|");
- defined($pid) || die "can't fork: $!";
- if ($pid) { # parent
- while (<KID_TO_READ>) {
- # do something interesting
- }
- close(KID_TO_READ) || warn "kid exited $?";
- } else { # child
- ($EUID, $EGID) = ($UID, $GID); # suid only
- exec($program, @options, @args)
- || die "can't exec program: $!";
- # NOTREACHED
- }
And here's a safe pipe open for writing:
- my $pid = open(KID_TO_WRITE, "|-");
- defined($pid) || die "can't fork: $!";
- $SIG{PIPE} = sub { die "whoops, $program pipe broke" };
- if ($pid) { # parent
- print KID_TO_WRITE @data;
- close(KID_TO_WRITE) || warn "kid exited $?";
- } else { # child
- ($EUID, $EGID) = ($UID, $GID);
- exec($program, @options, @args)
- || die "can't exec program: $!";
- # NOTREACHED
- }
It is very easy to dead-lock a process using this form of open(), or indeed with any use of pipe() with multiple subprocesses. The example above is "safe" because it is simple and calls exec(). See Avoiding Pipe Deadlocks for general safety principles, but there are extra gotchas with Safe Pipe Opens.
In particular, if you opened the pipe using open FH, "|-", then you
cannot simply use close() in the parent process to close an unwanted
writer. Consider this code:
- my $pid = open(WRITER, "|-"); # fork open a kid
- defined($pid) || die "first fork failed: $!";
- if ($pid) {
- if (my $sub_pid = fork()) {
- defined($sub_pid) || die "second fork failed: $!";
- close(WRITER) || die "couldn't close WRITER: $!";
- # now do something else...
- }
- else {
- # first write to WRITER
- # ...
- # then when finished
- close(WRITER) || die "couldn't close WRITER: $!";
- exit(0);
- }
- }
- else {
- # first do something with STDIN, then
- exit(0);
- }
In the example above, the true parent does not want to write to the WRITER
filehandle, so it closes it. However, because WRITER was opened using
open FH, "|-", it has a special behavior: closing it calls
waitpid() (see waitpid), which waits for the subprocess
to exit. If the child process ends up waiting for something happening
in the section marked "do something else", you have deadlock.
This can also be a problem with intermediate subprocesses in more complicated code, which will call waitpid() on all open filehandles during global destruction--in no predictable order.
To solve this, you must manually use pipe(), fork(), and the form of open() which sets one file descriptor to another, as shown below:
- pipe(READER, WRITER) || die "pipe failed: $!";
- $pid = fork();
- defined($pid) || die "first fork failed: $!";
- if ($pid) {
- close READER;
- if (my $sub_pid = fork()) {
- defined($sub_pid) || die "second fork failed: $!";
- close(WRITER) || die "can't close WRITER: $!";
- }
- else {
- # write to WRITER...
- # ...
- # then when finished
- close(WRITER) || die "can't close WRITER: $!";
- exit(0);
- }
- # write to WRITER...
- }
- else {
- open(STDIN, "<&READER") || die "can't reopen STDIN: $!";
- close(WRITER) || die "can't close WRITER: $!";
- # do something...
- exit(0);
- }
Since Perl 5.8.0, you can also use the list form of open for pipes.
This is preferred when you wish to avoid having the shell interpret
metacharacters that may be in your command string.
So for example, instead of using:
One would use either of these:
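The two code samples did not survive this rendering. As a sketch (the ps(1) invocation is illustrative), the shell-mediated form looks like this, followed by the list form that bypasses the shell:

```perl
# Shell form: the command string is handed to /bin/sh for parsing
open(PS_PIPE, "ps aux |")        || die "can't run ps: $!";
my @shell_lines = <PS_PIPE>;
close(PS_PIPE)                   || warn "ps exited $?";

# List form: with more than three arguments, Perl fork()s and
# exec()s ps directly, so no shell ever touches the arguments
open(PS_PIPE, "-|", "ps", "aux") || die "can't run ps: $!";
my @list_lines = <PS_PIPE>;
close(PS_PIPE)                   || warn "ps exited $?";
```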
Because there are more than three arguments to open(), Perl forks the ps(1)
command without spawning a shell, and reads its standard output via the
PS_PIPE filehandle. The corresponding syntax for writing to command
pipes is to use "|-" in place of "-|".
This was admittedly a rather silly example, because you're using string literals whose content is perfectly safe. There is therefore no cause to resort to the harder-to-read, multi-argument form of pipe open(). However, whenever you cannot be assured that the program arguments are free of shell metacharacters, the fancier form of open() should be used. For example:
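The example did not survive this rendering; here is a sketch in the same spirit, where $pattern and @files stand in for user-supplied values that might contain shell metacharacters:

```perl
# $pattern and @files are hypothetical user-supplied values; because
# grep is invoked via the list form of open(), the shell never gets a
# chance to interpret any metacharacters they contain
my ($pattern, @files) = @ARGV;
open(GREP_PIPE, "-|", "grep", "-i", $pattern, @files)
    || die "can't run grep: $!";
while (<GREP_PIPE>) {
    print;
}
close(GREP_PIPE);
```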
Here the multi-argument form of pipe open() is preferred because the pattern and indeed even the filenames themselves might hold metacharacters.
Be aware that these operations are full Unix forks, which means they may not be correctly implemented on all alien systems. Additionally, these are not true multithreading. To learn more about threading, see the modules file mentioned below in the SEE ALSO section.
Whenever you have more than one subprocess, you must be careful that each closes whichever half of any pipes created for interprocess communication it is not using. This is because any child process reading from the pipe and expecting an EOF will never receive it, and therefore never exit. A single process closing a pipe is not enough; the last process with the pipe open must close it before the reader sees EOF.
Certain built-in Unix features help prevent this most of the time. For
instance, filehandles have a "close on exec" flag, which is set en masse
under control of the $^F
variable. This is so any filehandles you
didn't explicitly route to the STDIN, STDOUT or STDERR of a child
program will be automatically closed.
Always explicitly and immediately call close() on the writable end of any pipe, unless that process is actually writing to it. Even if you don't explicitly call close(), Perl will still close() all filehandles during global destruction. As previously discussed, if those filehandles have been opened with Safe Pipe Open, this will result in calling waitpid(), which may again deadlock.
While this works reasonably well for unidirectional communication, what about bidirectional communication? The most obvious approach doesn't work:
- # THIS DOES NOT WORK!!
- open(PROG_FOR_READING_AND_WRITING, "| some program |")
If you forget to use warnings, you'll miss out entirely on the
helpful diagnostic message:
If you really want to, you can use the standard open2() from the
IPC::Open2
module to catch both ends. There's also an open3() in
IPC::Open3
for tridirectional I/O so you can also catch your child's
STDERR, but doing so would then require an awkward select() loop and
wouldn't allow you to use normal Perl input operations.
If you look at its source, you'll see that open2() uses low-level primitives like the pipe() and exec() syscalls to create all the connections. Although it might have been more efficient by using socketpair(), this would have been even less portable than it already is. The open2() and open3() functions are unlikely to work anywhere except on a Unix system, or at least one purporting POSIX compliance.
Here's an example of using open2():
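The open2() example did not survive this rendering; a minimal sketch, using cat -u (unbuffered) as the program on the far end, as the following discussion assumes:

```perl
use IPC::Open2;

# open2() forks "cat -u" with its STDIN and STDOUT attached to
# our Writer and Reader handles; the Writer side is autoflushed
my $pid = open2(*Reader, *Writer, "cat", "-u");
print Writer "stuff\n";
my $got = <Reader>;        # cat echoes the line straight back
close(Writer);
close(Reader);
waitpid($pid, 0);
```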
The problem with this is that buffering is really going to ruin your
day. Even though your Writer
filehandle is auto-flushed so the process
on the other end gets your data in a timely manner, you can't usually do
anything to force that process to give its data to you in a similarly quick
fashion. In this special case, we could actually do so, because we gave
cat a -u flag to make it unbuffered. But very few commands are
designed to operate over pipes, so this seldom works unless you yourself
wrote the program on the other end of the double-ended pipe.
A solution to this is to use a library which uses pseudottys to make your
program behave more reasonably. This way you don't have to have control
over the source code of the program you're using. The Expect
module
from CPAN also addresses this kind of thing. This module requires two
other modules from CPAN, IO::Pty and IO::Stty. It sets up a pseudo
terminal to interact with programs that insist on talking to the terminal
device driver. If your system is supported, this may be your best bet.
If you want, you may make low-level pipe() and fork() syscalls to stitch this together by hand. This example only talks to itself, but you could reopen the appropriate handles to STDIN and STDOUT and call other processes. (The following example lacks proper error checking.)
- #!/usr/bin/perl -w
- # pipe1 - bidirectional communication using two pipe pairs
- # designed for the socketpair-challenged
- use IO::Handle; # thousands of lines just for autoflush :-(
- pipe(PARENT_RDR, CHILD_WTR); # XXX: check failure?
- pipe(CHILD_RDR, PARENT_WTR); # XXX: check failure?
- CHILD_WTR->autoflush(1);
- PARENT_WTR->autoflush(1);
- if ($pid = fork()) {
- close PARENT_RDR;
- close PARENT_WTR;
- print CHILD_WTR "Parent Pid $$ is sending this\n";
- chomp($line = <CHILD_RDR>);
- print "Parent Pid $$ just read this: '$line'\n";
- close CHILD_RDR; close CHILD_WTR;
- waitpid($pid, 0);
- } else {
- die "cannot fork: $!" unless defined $pid;
- close CHILD_RDR;
- close CHILD_WTR;
- chomp($line = <PARENT_RDR>);
- print "Child Pid $$ just read this: '$line'\n";
- print PARENT_WTR "Child Pid $$ is sending this\n";
- close PARENT_RDR;
- close PARENT_WTR;
- exit(0);
- }
But you don't actually have to make two pipe calls. If you have the socketpair() system call, it will do this all for you.
- #!/usr/bin/perl -w
- # pipe2 - bidirectional communication using socketpair
- # "the best ones always go both ways"
- use Socket;
- use IO::Handle; # thousands of lines just for autoflush :-(
- # We say AF_UNIX because although *_LOCAL is the
- # POSIX 1003.1g form of the constant, many machines
- # still don't have it.
- socketpair(CHILD, PARENT, AF_UNIX, SOCK_STREAM, PF_UNSPEC)
- || die "socketpair: $!";
- CHILD->autoflush(1);
- PARENT->autoflush(1);
- if ($pid = fork()) {
- close PARENT;
- print CHILD "Parent Pid $$ is sending this\n";
- chomp($line = <CHILD>);
- print "Parent Pid $$ just read this: '$line'\n";
- close CHILD;
- waitpid($pid, 0);
- } else {
- die "cannot fork: $!" unless defined $pid;
- close CHILD;
- chomp($line = <PARENT>);
- print "Child Pid $$ just read this: '$line'\n";
- print PARENT "Child Pid $$ is sending this\n";
- close PARENT;
- exit(0);
- }
While not entirely limited to Unix-derived operating systems (e.g., WinSock on PCs provides socket support, as do some VMS libraries), you might not have sockets on your system, in which case this section probably isn't going to do you much good. With sockets, you can do both virtual circuits like TCP streams and datagrams like UDP packets. You may be able to do even more depending on your system.
The Perl functions for dealing with sockets have the same names as the corresponding system calls in C, but their arguments tend to differ for two reasons. First, Perl filehandles work differently than C file descriptors. Second, Perl already knows the length of its strings, so you don't need to pass that information.
One of the major problems with ancient, antemillennial socket code in Perl
was that it used hard-coded values for some of the constants, which
severely hurt portability. If you ever see code that does anything like
explicitly setting $AF_INET = 2, you know you're in for big trouble.
An immeasurably superior approach is to use the Socket
module, which more
reliably grants access to the various constants and functions you'll need.
If you're not writing a server/client for an existing protocol like NNTP or SMTP, you should give some thought to how your server will know when the client has finished talking, and vice-versa. Most protocols are based on one-line messages and responses (so one party knows the other has finished when a "\n" is received) or multi-line messages and responses that end with a period on an empty line ("\n.\n" terminates a message/response).
The Internet line terminator is "\015\012". Under ASCII variants of Unix, that could usually be written as "\r\n", but under other systems, "\r\n" might at times be "\015\015\012", "\012\012\015", or something completely different. The standards specify writing "\015\012" to be conformant (be strict in what you provide), but they also recommend accepting a lone "\012" on input (be lenient in what you require). We haven't always been very good about that in the code in this manpage, but unless you're on a Mac from way back in its pre-Unix dark ages, you'll probably be ok.
Use Internet-domain sockets when you want to do client-server communication that might extend to machines outside of your own system.
Here's a sample TCP client using Internet-domain sockets:
- #!/usr/bin/perl -w
- use strict;
- use Socket;
- my ($remote, $port, $iaddr, $paddr, $proto, $line);
- $remote = shift || "localhost";
- $port = shift || 2345; # random port
- if ($port =~ /\D/) { $port = getservbyname($port, "tcp") }
- die "No port" unless $port;
- $iaddr = inet_aton($remote) || die "no host: $remote";
- $paddr = sockaddr_in($port, $iaddr);
- $proto = getprotobyname("tcp");
- socket(SOCK, PF_INET, SOCK_STREAM, $proto) || die "socket: $!";
- connect(SOCK, $paddr) || die "connect: $!";
- while ($line = <SOCK>) {
- print $line;
- }
- close (SOCK) || die "close: $!";
- exit(0);
And here's a corresponding server to go along with it. We'll
leave the address as INADDR_ANY
so that the kernel can choose
the appropriate interface on multihomed hosts. If you want to sit
on a particular interface (like the external side of a gateway
or firewall machine), fill this in with your real address instead.
- #!/usr/bin/perl -Tw
- use strict;
- BEGIN { $ENV{PATH} = "/usr/bin:/bin" }
- use Socket;
- use Carp;
- my $EOL = "\015\012";
- sub logmsg { print "$0 $$: @_ at ", scalar localtime(), "\n" }
- my $port = shift || 2345;
- die "invalid port" unless $port =~ /^ \d+ $/x;
- my $proto = getprotobyname("tcp");
- socket(Server, PF_INET, SOCK_STREAM, $proto) || die "socket: $!";
- setsockopt(Server, SOL_SOCKET, SO_REUSEADDR, pack("l", 1))
- || die "setsockopt: $!";
- bind(Server, sockaddr_in($port, INADDR_ANY)) || die "bind: $!";
- listen(Server, SOMAXCONN) || die "listen: $!";
- logmsg "server started on port $port";
- my $paddr;
- $SIG{CHLD} = \&REAPER;
- for ( ; $paddr = accept(Client, Server); close Client) {
- my($port, $iaddr) = sockaddr_in($paddr);
- my $name = gethostbyaddr($iaddr, AF_INET);
- logmsg "connection from $name [",
- inet_ntoa($iaddr), "] at port $port";
- print Client "Hello there, $name, it's now ",
- scalar localtime(), $EOL;
- }
And here's a multithreaded version. It's multithreaded in that like most typical servers, it spawns (fork()s) a slave server to handle the client request so that the master server can quickly go back to service a new client.
- #!/usr/bin/perl -Tw
- use strict;
- BEGIN { $ENV{PATH} = "/usr/bin:/bin" }
- use Socket;
- use Carp;
- my $EOL = "\015\012";
- sub spawn; # forward declaration
- sub logmsg { print "$0 $$: @_ at ", scalar localtime(), "\n" }
- my $port = shift || 2345;
- die "invalid port" unless $port =~ /^ \d+ $/x;
- my $proto = getprotobyname("tcp");
- socket(Server, PF_INET, SOCK_STREAM, $proto) || die "socket: $!";
- setsockopt(Server, SOL_SOCKET, SO_REUSEADDR, pack("l", 1))
- || die "setsockopt: $!";
- bind(Server, sockaddr_in($port, INADDR_ANY)) || die "bind: $!";
- listen(Server, SOMAXCONN) || die "listen: $!";
- logmsg "server started on port $port";
- my $waitedpid = 0;
- my $paddr;
- use POSIX ":sys_wait_h";
- use Errno;
- sub REAPER {
- local $!; # don't let waitpid() overwrite current error
- while ((my $pid = waitpid(-1, WNOHANG)) > 0 && WIFEXITED($?)) {
- logmsg "reaped $pid" . ($? ? " with exit $?" : "");
- }
- $SIG{CHLD} = \&REAPER; # loathe SysV
- }
- $SIG{CHLD} = \&REAPER;
- while (1) {
- $paddr = accept(Client, Server) || do {
- # try again if accept() returned because got a signal
- next if $!{EINTR};
- die "accept: $!";
- };
- my ($port, $iaddr) = sockaddr_in($paddr);
- my $name = gethostbyaddr($iaddr, AF_INET);
- logmsg "connection from $name [",
- inet_ntoa($iaddr),
- "] at port $port";
- spawn sub {
- $| = 1;
- print "Hello there, $name, it's now ", scalar localtime(), $EOL;
- exec "/usr/games/fortune" # XXX: "wrong" line terminators
- or confess "can't exec fortune: $!";
- };
- close Client;
- }
- sub spawn {
- my $coderef = shift;
- unless (@_ == 0 && $coderef && ref($coderef) eq "CODE") {
- confess "usage: spawn CODEREF";
- }
- my $pid;
- unless (defined($pid = fork())) {
- logmsg "cannot fork: $!";
- return;
- }
- elsif ($pid) {
- logmsg "begat $pid";
- return; # I'm the parent
- }
- # else I'm the child -- go spawn
- open(STDIN, "<&Client") || die "can't dup client to stdin";
- open(STDOUT, ">&Client") || die "can't dup client to stdout";
- ## open(STDERR, ">&STDOUT") || die "can't dup stdout to stderr";
- exit($coderef->());
- }
This server takes the trouble to clone off a child version via fork() for each incoming request. That way it can handle many requests at once, which you might not always want. Even if you don't fork(), the listen() will allow that many pending connections. Forking servers have to be particularly careful about cleaning up their dead children (called "zombies" in Unix parlance), because otherwise you'll quickly fill up your process table. The REAPER subroutine is used here to call waitpid() for any child processes that have finished, thereby ensuring that they terminate cleanly and don't join the ranks of the living dead.
Within the while loop we call accept() and check to see if it returns a false value. This would normally indicate a system error needs to be reported. However, the introduction of safe signals (see Deferred Signals (Safe Signals) above) in Perl 5.8.0 means that accept() might also be interrupted when the process receives a signal. This typically happens when one of the forked subprocesses exits and notifies the parent process with a CHLD signal.
If accept() is interrupted by a signal, $! will be set to EINTR. If this happens, we can safely continue to the next iteration of the loop and another call to accept(). It is important that your signal handling code not modify the value of $!, or else this test will likely fail. In the REAPER subroutine we create a local version of $! before calling waitpid(). When waitpid() sets $! to ECHILD as it inevitably does when it has no more children waiting, it updates the local copy and leaves the original unchanged.
You should use the -T flag to enable taint checking (see perlsec) even if we aren't running setuid or setgid. This is always a good idea for servers or any program run on behalf of someone else (like CGI scripts), because it lessens the chances that people from the outside will be able to compromise your system.
Let's look at another TCP client. This one connects to the TCP "time" service on a number of different machines and shows how far their clocks differ from the system on which it's being run:
- #!/usr/bin/perl -w
- use strict;
- use Socket;
- my $SECS_OF_70_YEARS = 2208988800;
- sub ctime { scalar localtime(shift() || time()) }
- my $iaddr = gethostbyname("localhost");
- my $proto = getprotobyname("tcp");
- my $port = getservbyname("time", "tcp");
- my $paddr = sockaddr_in(0, $iaddr);
- my($host);
- $| = 1;
- printf "%-24s %8s %s\n", "localhost", 0, ctime();
- foreach $host (@ARGV) {
- printf "%-24s ", $host;
- my $hisiaddr = inet_aton($host) || die "unknown host";
- my $hispaddr = sockaddr_in($port, $hisiaddr);
- socket(SOCKET, PF_INET, SOCK_STREAM, $proto)
- || die "socket: $!";
- connect(SOCKET, $hispaddr) || die "connect: $!";
- my $rtime = pack("C4", ());
- read(SOCKET, $rtime, 4);
- close(SOCKET);
- my $histime = unpack("N", $rtime) - $SECS_OF_70_YEARS;
- printf "%8d %s\n", $histime - time(), ctime($histime);
- }
That's fine for Internet-domain clients and servers, but what about local communications? While you can use the same setup, sometimes you don't want to. Unix-domain sockets are local to the current host, and are often used internally to implement pipes. Unlike Internet domain sockets, Unix domain sockets can show up in the file system with an ls(1) listing.
- % ls -l /dev/log
- srw-rw-rw- 1 root 0 Oct 31 07:23 /dev/log
You can test for these with Perl's -S file test:
- unless (-S "/dev/log") {
- die "something's wicked with the log system";
- }
Here's a sample Unix-domain client:
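The client code did not survive this rendering; a minimal sketch, assuming the "catsock" rendezvous name that the server below binds to:

```perl
#!/usr/bin/perl -w
use strict;
use Socket;

my $rendezvous = shift || "catsock";    # must match the server's $NAME
socket(SOCK, PF_UNIX, SOCK_STREAM, 0)   || die "socket: $!";
connect(SOCK, sockaddr_un($rendezvous)) || die "connect: $!";
while (defined(my $line = <SOCK>)) {
    print $line;
}
exit(0);
```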
And here's a corresponding server. You don't have to worry about silly network terminators here because Unix domain sockets are guaranteed to be on the localhost, and thus everything works right.
- #!/usr/bin/perl -Tw
- use strict;
- use Socket;
- use Carp;
- BEGIN { $ENV{PATH} = "/usr/bin:/bin" }
- sub spawn; # forward declaration
- sub logmsg { print "$0 $$: @_ at ", scalar localtime(), "\n" }
- my $NAME = "catsock";
- my $uaddr = sockaddr_un($NAME);
- my $proto = getprotobyname("tcp");
- socket(Server, PF_UNIX, SOCK_STREAM, 0) || die "socket: $!";
- unlink($NAME);
- bind (Server, $uaddr) || die "bind: $!";
- listen(Server, SOMAXCONN) || die "listen: $!";
- logmsg "server started on $NAME";
- my $waitedpid;
- use POSIX ":sys_wait_h";
- sub REAPER {
- my $child;
- while (($waitedpid = waitpid(-1, WNOHANG)) > 0) {
- logmsg "reaped $waitedpid" . ($? ? " with exit $?" : "");
- }
- $SIG{CHLD} = \&REAPER; # loathe SysV
- }
- $SIG{CHLD} = \&REAPER;
- for ( $waitedpid = 0;
- accept(Client, Server) || $waitedpid;
- $waitedpid = 0, close Client)
- {
- next if $waitedpid;
- logmsg "connection on $NAME";
- spawn sub {
- print "Hello there, it's now ", scalar localtime(), "\n";
- exec("/usr/games/fortune") || die "can't exec fortune: $!";
- };
- }
- sub spawn {
- my $coderef = shift();
- unless (@_ == 0 && $coderef && ref($coderef) eq "CODE") {
- confess "usage: spawn CODEREF";
- }
- my $pid;
- unless (defined($pid = fork())) {
- logmsg "cannot fork: $!";
- return;
- }
- elsif ($pid) {
- logmsg "begat $pid";
- return; # I'm the parent
- }
- else {
- # I'm the child -- go spawn
- }
- open(STDIN, "<&Client") || die "can't dup client to stdin";
- open(STDOUT, ">&Client") || die "can't dup client to stdout";
- ## open(STDERR, ">&STDOUT") || die "can't dup stdout to stderr";
- exit($coderef->());
- }
As you see, it's remarkably similar to the Internet domain TCP server, so much so, in fact, that we've omitted several duplicate functions--spawn(), logmsg(), ctime(), and REAPER()--which are the same as in the other server.
So why would you ever want to use a Unix domain socket instead of a simpler named pipe? Because a named pipe doesn't give you sessions. You can't tell one process's data from another's. With socket programming, you get a separate session for each client; that's why accept() takes two arguments.
For example, let's say that you have a long-running database server daemon that you want folks to be able to access from the Web, but only if they go through a CGI interface. You'd have a small, simple CGI program that does whatever checks and logging you feel like, and then acts as a Unix-domain client and connects to your private server.
For those preferring a higher-level interface to socket programming, the IO::Socket module provides an object-oriented approach. If for some reason you lack this module, you can just fetch IO::Socket from CPAN, where you'll also find modules providing easy interfaces to the following systems: DNS, FTP, Ident (RFC 931), NIS and NISPlus, NNTP, Ping, POP3, SMTP, SNMP, SSLeay, Telnet, and Time--to name just a few.
Here's a client that creates a TCP connection to the "daytime" service at port 13 of the host name "localhost" and prints out everything that the server there cares to provide.
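The client code did not survive this rendering; a minimal sketch matching the parameters discussed below (it requires a daytime service listening on port 13):

```perl
#!/usr/bin/perl -w
use strict;
use IO::Socket;

my $remote = IO::Socket::INET->new(
    Proto    => "tcp",
    PeerAddr => "localhost",
    PeerPort => "daytime(13)",
) || die "can't connect to daytime service on localhost";
while (<$remote>) { print }
```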
When you run this program, you should get something back that looks like this:
- Wed May 14 08:40:46 MDT 1997
Here are what those parameters to the new() constructor mean:
Proto
This is which protocol to use. In this case, the socket handle returned will be connected to a TCP socket, because we want a stream-oriented connection, that is, one that acts pretty much like a plain old file. Not all sockets are of this type. For example, the UDP protocol can be used to make a datagram socket, used for message-passing.
PeerAddr
This is the name or Internet address of the remote host the server is
running on. We could have specified a longer name like "www.perl.com",
or an address like "207.171.7.72". For demonstration purposes, we've
used the special hostname "localhost", which should always mean the
current machine you're running on. The corresponding Internet address
for localhost is "127.0.0.1", if you'd rather use that.
PeerPort
This is the service name or port number we'd like to connect to.
We could have gotten away with using just "daytime"
on systems with a
well-configured system services file,[FOOTNOTE: The system services file
is found in /etc/services under Unixy systems.] but here we've specified the
port number (13) in parentheses. Using just the number would have also
worked, but numeric literals make careful programmers nervous.
Notice how the return value from the new
constructor is used as
a filehandle in the while
loop? That's what's called an indirect
filehandle, a scalar variable containing a filehandle. You can use
it the same way you would a normal filehandle. For example, you
can read one line from it this way:
- $line = <$handle>;
all remaining lines from it this way:
- @lines = <$handle>;
and send a line of data to it this way:
- print $handle "some data\n";
Here's a simple client that takes a remote host to fetch a document from, and then a list of files to get from that host. This is a more interesting client than the previous one because it first sends something to the server before fetching the server's response.
- #!/usr/bin/perl -w
- use IO::Socket;
- unless (@ARGV > 1) { die "usage: $0 host url ..." }
- $host = shift(@ARGV);
- $EOL = "\015\012";
- $BLANK = $EOL x 2;
- for my $document (@ARGV) {
- $remote = IO::Socket::INET->new( Proto => "tcp",
- PeerAddr => $host,
- PeerPort => "http(80)",
- ) || die "cannot connect to httpd on $host";
- $remote->autoflush(1);
- print $remote "GET $document HTTP/1.0" . $BLANK;
- while ( <$remote> ) { print }
- close $remote;
- }
The web server handling the HTTP service is assumed to be at
its standard port, number 80. If the server you're trying to
connect to is at a different port, like 1080 or 8080, you should specify it
as the named-parameter pair, PeerPort => 8080. The autoflush
method is used on the socket because otherwise the system would buffer
up the output we sent it. (If you're on a prehistoric Mac, you'll also
need to change every "\n" in your code that sends data over the network
to be a "\015\012" instead.)
Connecting to the server is only the first part of the process: once you have the connection, you have to use the server's language. Each server on the network has its own little command language that it expects as input. The string that we send to the server starting with "GET" is in HTTP syntax. In this case, we simply request each specified document. Yes, we really are making a new connection for each document, even though it's the same host. That's the way you always used to have to speak HTTP. Recent versions of web browsers may request that the remote server leave the connection open a little while, but the server doesn't have to honor such a request.
Here's an example of running that program, which we'll call webget:
- % webget www.perl.com /guanaco.html
- HTTP/1.1 404 File Not Found
- Date: Thu, 08 May 1997 18:02:32 GMT
- Server: Apache/1.2b6
- Connection: close
- Content-type: text/html
- <HEAD><TITLE>404 File Not Found</TITLE></HEAD>
- <BODY><H1>File Not Found</H1>
- The requested URL /guanaco.html was not found on this server.<P>
- </BODY>
Ok, so that's not very interesting, because it didn't find that particular document. But a long response wouldn't have fit on this page.
For a more featureful version of this program, you should look to the lwp-request program included with the LWP modules from CPAN.
Well, that's all fine if you want to send one command and get one answer, but what about setting up something fully interactive, somewhat like the way telnet works? That way you can type a line, get the answer, type a line, get the answer, etc.
This client is more complicated than the two we've done so far, but if
you're on a system that supports the powerful fork call, the solution
isn't that rough. Once you've made the connection to whatever service
you'd like to chat with, call fork to clone your process. Each of
these two identical process has a very simple job to do: the parent
copies everything from the socket to standard output, while the child
simultaneously copies everything from standard input to the socket.
To accomplish the same thing using just one process would be much
harder, because it's easier to code two processes to do one thing than it
is to code one process to do two things. (This keep-it-simple principle
is one of the cornerstones of the Unix philosophy, and of good software
engineering as
well, which is probably why it's spread to other systems.)
Here's the code:
- #!/usr/bin/perl -w
- use strict;
- use IO::Socket;
- my ($host, $port, $kidpid, $handle, $line);
- unless (@ARGV == 2) { die "usage: $0 host port" }
- ($host, $port) = @ARGV;
- # create a tcp connection to the specified host and port
- $handle = IO::Socket::INET->new(Proto => "tcp",
- PeerAddr => $host,
- PeerPort => $port)
- || die "can't connect to port $port on $host: $!";
- $handle->autoflush(1); # so output gets there right away
- print STDERR "[Connected to $host:$port]\n";
- # split the program into two processes, identical twins
- die "can't fork: $!" unless defined($kidpid = fork());
- # the if{} block runs only in the parent process
- if ($kidpid) {
- # copy the socket to standard output
- while (defined ($line = <$handle>)) {
- print STDOUT $line;
- }
- kill("TERM", $kidpid); # send SIGTERM to child
- }
- # the else{} block runs only in the child process
- else {
- # copy standard input to the socket
- while (defined ($line = <STDIN>)) {
- print $handle $line;
- }
- exit(0); # just in case
- }
The kill function in the parent's if
block is there to send a
signal to our child process, currently running in the else
block,
as soon as the remote server has closed its end of the connection.
If the remote server sends data a byte at time, and you need that
data immediately without waiting for a newline (which might not happen),
you may wish to replace the while
loop in the parent with the
following:
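The replacement loop did not survive this rendering; the usual approach is a sysread() of one byte at a time, using the same $handle as the client above (a sketch, not a drop-in tested fragment):

```perl
my $byte;
# read one byte at a time so data is seen as soon as it arrives,
# rather than waiting for a newline to complete a line
while (sysread($handle, $byte, 1) == 1) {
    print STDOUT $byte;
}
```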
Making a system call for each byte you want to read is not very efficient (to put it mildly) but is the simplest to explain and works reasonably well.
As always, setting up a server is little bit more involved than running a client.
The model is that the server creates a special kind of socket that
does nothing but listen on a particular port for incoming connections.
It does this by calling the IO::Socket::INET->new()
method with
slightly different arguments than the client did.
This is which protocol to use. As with our client, we'll
still specify "tcp"
here.
We specify a local
port in the LocalPort
argument, which we didn't do for the client.
This is the service name or port number for which you want to be the
server. (Under Unix, ports under 1024 are restricted to the
superuser.) In our sample, we'll use port 9000, but you can use
any port that's not currently in use on your system. If you try
to use one already in use, you'll get an "Address already in use"
message. Under Unix, the netstat -a
command will show
which services currently have servers.
The Listen
parameter is set to the maximum number of
pending connections we can accept until we turn away incoming clients.
Think of it as a call-waiting queue for your telephone.
The low-level Socket module has a special symbol for the system maximum, which
is SOMAXCONN.
The Reuse
parameter is needed so that we can restart our server
manually without waiting a few minutes to allow system buffers to
clear out.
Once the generic server socket has been created using the parameters
listed above, the server then waits for a new client to connect
to it. The server blocks in the accept method, which eventually accepts a
bidirectional connection from the remote client. (Make sure to autoflush
this handle to circumvent buffering.)
To add to user-friendliness, our server prompts the user for commands.
Most servers don't do this. Because of the prompt without a newline,
you'll have to use the sysread variant of the interactive client above.
This server accepts one of five different commands, sending output back to the client. Unlike most network servers, this one handles only one incoming client at a time. Multithreaded servers are covered in Chapter 16 of the Camel.
Here's the code:
- #!/usr/bin/perl -w
- use IO::Socket;
- use Net::hostent; # for OOish version of gethostbyaddr
- $PORT = 9000; # pick something not in use
- $server = IO::Socket::INET->new( Proto => "tcp",
- LocalPort => $PORT,
- Listen => SOMAXCONN,
- Reuse => 1);
- die "can't setup server" unless $server;
- print "[Server $0 accepting clients]\n";
- while ($client = $server->accept()) {
- $client->autoflush(1);
- print $client "Welcome to $0; type help for command list.\n";
- $hostinfo = gethostbyaddr($client->peeraddr);
- printf "[Connect from %s]\n", $hostinfo ? $hostinfo->name : $client->peerhost;
- print $client "Command? ";
- while ( <$client>) {
- next unless /\S/; # blank line
- if (/quit|exit/i) { last }
- elsif (/date|time/i) { printf $client "%s\n", scalar localtime() }
- elsif (/who/i ) { print $client `who 2>&1` }
- elsif (/cookie/i ) { print $client `/usr/games/fortune 2>&1` }
- elsif (/motd/i ) { print $client `cat /etc/motd 2>&1` }
- else {
- print $client "Commands: quit date who cookie motd\n";
- }
- } continue {
- print $client "Command? ";
- }
- close $client;
- }
Another kind of client-server setup is one that uses not connections, but messages. UDP communications involve much lower overhead but also provide less reliability, as there are no promises that messages will arrive at all, let alone in order and unmangled. Still, UDP offers some advantages over TCP, including being able to "broadcast" or "multicast" to a whole bunch of destination hosts at once (usually on your local subnet). If you find yourself overly concerned about reliability and start building checks into your message system, then you probably should use just TCP to start with.
UDP datagrams are not a bytestream and should not be treated as such. This makes using I/O mechanisms with internal buffering like stdio (i.e. print() and friends) especially cumbersome. Use syswrite(), or better, send(), as in the example below.
Here's a UDP program similar to the sample Internet TCP client given earlier. However, instead of checking one host at a time, the UDP version will check many of them asynchronously by simulating a multicast and then using select() to do a timed-out wait for I/O. To do something similar with TCP, you'd have to use a different socket handle for each host.
- #!/usr/bin/perl -w
- use strict;
- use Socket;
- use Sys::Hostname;
- my ( $count, $hisiaddr, $hispaddr, $histime,
- $host, $iaddr, $paddr, $port, $proto,
- $rin, $rout, $rtime, $SECS_OF_70_YEARS);
- $SECS_OF_70_YEARS = 2_208_988_800;
- $iaddr = gethostbyname(hostname());
- $proto = getprotobyname("udp");
- $port = getservbyname("time", "udp");
- $paddr = sockaddr_in(0, $iaddr); # 0 means let kernel pick
- socket(SOCKET, PF_INET, SOCK_DGRAM, $proto) || die "socket: $!";
- bind(SOCKET, $paddr) || die "bind: $!";
- $| = 1;
- printf "%-12s %8s %s\n", "localhost", 0, scalar localtime();
- $count = 0;
- for $host (@ARGV) {
- $count++;
- $hisiaddr = inet_aton($host) || die "unknown host";
- $hispaddr = sockaddr_in($port, $hisiaddr);
- defined(send(SOCKET, 0, 0, $hispaddr)) || die "send $host: $!";
- }
- $rin = "";
- vec($rin, fileno(SOCKET), 1) = 1;
- # timeout after 10.0 seconds
- while ($count && select($rout = $rin, undef, undef, 10.0)) {
- $rtime = "";
- $hispaddr = recv(SOCKET, $rtime, 4, 0) || die "recv: $!";
- ($port, $hisiaddr) = sockaddr_in($hispaddr);
- $host = gethostbyaddr($hisiaddr, AF_INET);
- $histime = unpack("N", $rtime) - $SECS_OF_70_YEARS;
- printf "%-12s ", $host;
- printf "%8d %s\n", $histime - time(), scalar localtime($histime);
- $count--;
- }
This example does not include any retries and may consequently fail to contact a reachable host. The most prominent reason for this is congestion of the queues on the sending host if the number of hosts to contact is sufficiently large.
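A note on the epoch arithmetic above: the time service (RFC 868) returns a 32-bit big-endian count of seconds since 1900-01-01, while Unix time counts from 1970-01-01, which is why the unpack("N", ...) result has $SECS_OF_70_YEARS subtracted from it. A self-contained sketch of the conversion:

```perl
#!/usr/bin/perl -w
use strict;

my $SECS_OF_70_YEARS = 2_208_988_800;

# Simulate a 4-byte time-server reply for the Unix time 1_000_000_000
# (an unsigned 32-bit big-endian count of seconds since 1900):
my $reply = pack("N", 1_000_000_000 + $SECS_OF_70_YEARS);

# The receiver converts back to the Unix epoch:
my $histime = unpack("N", $reply) - $SECS_OF_70_YEARS;
print "$histime\n";    # prints 1000000000
```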
While System V IPC isn't as widely used as sockets, it still has some
interesting uses. However, you cannot use SysV IPC or Berkeley mmap() to
have a variable shared amongst several processes. That's because Perl
would reallocate your string when you didn't want it to. You might
look into the IPC::Shareable
or threads::shared
modules for that.
Here's a small example showing shared memory usage.
- use IPC::SysV qw(IPC_PRIVATE IPC_RMID S_IRUSR S_IWUSR);
- $size = 2000;
- $id = shmget(IPC_PRIVATE, $size, S_IRUSR | S_IWUSR);
- defined($id) || die "shmget: $!";
- print "shm key $id\n";
- $message = "Message #1";
- shmwrite($id, $message, 0, 60) || die "shmwrite: $!";
- print "wrote: '$message'\n";
- shmread($id, $buff, 0, 60) || die "shmread: $!";
- print "read : '$buff'\n";
- # the buffer of shmread is zero-character end-padded.
- substr($buff, index($buff, "\0")) = "";
- print "un" unless $buff eq $message;
- print "swell\n";
- print "deleting shm $id\n";
- shmctl($id, IPC_RMID, 0) || die "shmctl: $!";
Here's an example of a semaphore:
Put this code in a separate file to be run in more than one process. Call the file take:
- # create a semaphore
- $IPC_KEY = 1234;
- $id = semget($IPC_KEY, 0, 0);
- defined($id) || die "semget: $!";
- $semnum = 0;
- $semflag = 0;
- # "take" semaphore
- # wait for semaphore to be zero
- $semop = 0;
- $opstring1 = pack("s!s!s!", $semnum, $semop, $semflag);
- # Increment the semaphore count
- $semop = 1;
- $opstring2 = pack("s!s!s!", $semnum, $semop, $semflag);
- $opstring = $opstring1 . $opstring2;
- semop($id, $opstring) || die "semop: $!";
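The two pack("s!s!s!", ...) calls above each build one struct sembuf: three native shorts holding the semaphore number, the operation, and the flags. Concatenating them hands semop() an array of two operations ("wait for zero", then "increment"). A sketch of just the packing:

```perl
#!/usr/bin/perl -w
use strict;

# Each semop(2) operation is a struct of three native shorts:
#   (sem_num, sem_op, sem_flg)
my $wait_for_zero = pack("s!s!s!", 0, 0, 0);   # block until sem #0 is zero
my $increment     = pack("s!s!s!", 0, 1, 0);   # then add 1 to sem #0

# semop() takes the structs concatenated back to back:
my $opstring = $wait_for_zero . $increment;
my @fields   = unpack("s!*", $opstring);
print "@fields\n";    # prints 0 0 0 0 1 0
```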
Put this code in a separate file to be run in more than one process. Call this file give:
- # "give" the semaphore
- # run this in the original process and you will see
- # that the second process continues
- $IPC_KEY = 1234;
- $id = semget($IPC_KEY, 0, 0);
- die unless defined($id);
- $semnum = 0;
- $semflag = 0;
- # Decrement the semaphore count
- $semop = -1;
- $opstring = pack("s!s!s!", $semnum, $semop, $semflag);
- semop($id, $opstring) || die "semop: $!";
The SysV IPC code above was written long ago, and it's definitely clunky looking. For a more modern look, see the IPC::SysV module.
A small example demonstrating SysV message queues:
- use IPC::SysV qw(IPC_PRIVATE IPC_RMID IPC_CREAT S_IRUSR S_IWUSR);
- my $id = msgget(IPC_PRIVATE, IPC_CREAT | S_IRUSR | S_IWUSR);
- defined($id) || die "msgget failed: $!";
- my $sent = "message";
- my $type_sent = 1234;
- msgsnd($id, pack("l! a*", $type_sent, $sent), 0)
- || die "msgsnd failed: $!";
- msgrcv($id, my $rcvd_buf, 60, 0, 0)
- || die "msgrcv failed: $!";
- my($type_rcvd, $rcvd) = unpack("l! a*", $rcvd_buf);
- if ($rcvd eq $sent) {
- print "okay\n";
- } else {
- print "not okay\n";
- }
- msgctl($id, IPC_RMID, 0) || die "msgctl failed: $!\n";
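The pack("l! a*", ...) template above mirrors the kernel's message layout: a native long "type" followed by the payload bytes. A round trip through the template, independent of any actual message queue:

```perl
#!/usr/bin/perl -w
use strict;

# A SysV message is a native long "type" followed by the payload bytes,
# which is exactly what the "l! a*" template packs and unpacks:
my $packed = pack("l! a*", 1234, "message");
my ($type, $body) = unpack("l! a*", $packed);
print "$type $body\n";    # prints 1234 message
```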
Most of these routines quietly but politely return undef when they
fail instead of causing your program to die right then and there due to
an uncaught exception. (Actually, some of the new Socket conversion
functions do croak() on bad arguments.) It is therefore essential to
check return values from these functions. Always begin your socket
programs this way for optimal success, and don't forget to add the -T
taint-checking flag to the #!
line for servers:
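The preamble itself seems to have been dropped in conversion; it presumably looked like this (with -T added to the #! line for servers, as the text says):

```perl
#!/usr/bin/perl -w     # for servers: #!/usr/bin/perl -Tw
use strict;
use sigtrap;
use Socket;
```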
These routines all create system-specific portability problems. As noted elsewhere, Perl is at the mercy of your C libraries for much of its system behavior. It's probably safest to assume broken SysV semantics for signals and to stick with simple TCP and UDP socket operations; e.g., don't try to pass open file descriptors over a local UDP datagram socket if you want your code to stand a chance of being portable.
Tom Christiansen, with occasional vestiges of Larry Wall's original version and suggestions from the Perl Porters.
There's a lot more to networking than this, but this should get you started.
For intrepid programmers, the indispensable textbook is Unix Network Programming, 2nd Edition, Volume 1 by W. Richard Stevens (published by Prentice-Hall). Most books on networking address the subject from the perspective of a C programmer; translation to Perl is left as an exercise for the reader.
The IO::Socket(3) manpage describes the object library, and the Socket(3) manpage describes the low-level interface to sockets. Besides the obvious functions in perlfunc, you should also check out the modules file at your nearest CPAN site, especially http://www.cpan.org/modules/00modlist.long.html#ID5_Networking_. See perlmodlib or best yet, the Perl FAQ for a description of what CPAN is and where to get it if the previous link doesn't work for you.
Section 5 of CPAN's modules file is devoted to "Networking, Device Control (modems), and Interprocess Communication", and contains numerous unbundled networking modules, plus modules for Chat and Expect operations, CGI programming, DCE, FTP, IPC, NNTP, Proxy, Ptty, RPC, SNMP, SMTP, Telnet, Threads, and ToolTalk--to name just a few.
perlirix - Perl version 5 on Irix systems
This document describes various features of Irix that will affect how Perl version 5 (hereafter just Perl) is compiled and/or runs.
Use
- sh Configure -Dcc='cc -n32'
to compile Perl 32-bit. Don't bother with -n32 unless you have 7.1 or later compilers (use cc -version to check).
(Building 'cc -n32' is the default.)
Use
- sh Configure -Dcc='cc -64' -Duse64bitint
This requires a 64-bit MIPS CPU (R8000, R10000, ...)
You can also use
- sh Configure -Dcc='cc -64' -Duse64bitall
but that makes no difference compared with the -Duse64bitint because
of the cc -64.
You can also do
- sh Configure -Dcc='cc -n32' -Duse64bitint
to use long longs for the 64-bit integer type, in case you don't have a 64-bit CPU.
If you are using gcc, just
- sh Configure -Dcc=gcc -Duse64bitint
should be enough; Configure should automatically probe for the correct 64-bit settings.
Some Irix cc versions, e.g. 7.3.1.1m (try cc -version) have been known to have issues (coredumps) when compiling perl.c. If you've used -OPT:fast_io=ON and this happens, try removing it. If that fails, or you didn't use that, then try adjusting other optimization options (-LNO, -INLINE, -O3 to -O2, etcetera). The compiler bug has been reported to SGI. (Allen Smith <easmith@beatrice.rutgers.edu>)
If you get complaints about so_locations then search in the file hints/irix_6.sh for "lddflags" and do the suggested adjustments. (David Billinghurst <David.Billinghurst@riotinto.com.au>)
Do not try to use Perl's malloc; this will lead to very mysterious errors (especially with -Duse64bitall).
Run Configure with -Duseithreads which will configure Perl with the Perl 5.8.0 "interpreter threads", see threads.
For Irix 6.2 with perl threads, you have to have the following patches installed:
- 1404 Irix 6.2 Posix 1003.1b man pages
- 1645 Irix 6.2 & 6.3 POSIX header file updates
- 2000 Irix 6.2 Posix 1003.1b support modules
- 2254 Pthread library fixes
- 2401 6.2 all platform kernel rollup
IMPORTANT: Without patch 2401, a kernel bug in Irix 6.2 will cause your machine to panic and crash when running threaded perl. Irix 6.3 and later are okay.
Thanks to Hannu Napari <Hannu.Napari@hut.fi> for the IRIX pthreads patches information.
While running Configure and when building, you are likely to get quite a few of these warnings:
- ld:
- The shared object /usr/lib/libm.so did not resolve any symbols.
- You may want to remove it from your link line.
Ignore them: in IRIX 5.3 there is no way to quieten ld about this.
During compilation you will see this warning from toke.c:
- uopt: Warning: Perl_yylex: this procedure not optimized because it
- exceeds size threshold; to optimize this procedure, use -Olimit option
- with value >= 4252.
Ignore the warning.
In IRIX 5.3 and with Perl 5.8.1 (Perl 5.8.0 didn't compile in IRIX 5.3) the following failures are known.
- Failed Test Stat Wstat Total Fail Failed List of Failed
- --------------------------------------------------------------------------
- ../ext/List/Util/t/shuffle.t 0 139 ?? ?? % ??
- ../lib/Math/Trig.t 255 65280 29 12 41.38% 24-29
- ../lib/sort.t 0 138 119 72 60.50% 48-119
- 56 tests and 474 subtests skipped.
- Failed 3/811 test scripts, 99.63% okay. 78/75813 subtests failed, 99.90% okay.
They are suspected to be compiler errors (at least the shuffle.t failure is known from some IRIX 6 setups) and math library errors (the Trig.t failure), but since IRIX 5 is long since end-of-lifed, further fixes for IRIX 5 are unlikely. If you can get gcc for 5.3, you could try that, too, since gcc in IRIX 6 is a known workaround for at least the shuffle.t and sort.t failures.
Jarkko Hietaniemi <jhi@iki.fi>
Please report any errors, updates, or suggestions to perlbug@perl.org.
perlivp - Perl Installation Verification Procedure
perlivp [-p] [-v] [-h]
The perlivp program is set up at Perl source code build time to test the Perl version it was built under. It can be used after running:
- make install
(or your platform's equivalent procedure) to verify that perl and its libraries have been installed correctly. A correct installation is verified by output that looks like:
- ok 1
- ok 2
etc.
Prints out a brief help message.
Gives a description of each test prior to performing it.
Gives more detailed information about each test, after it has been performed. Note that any failed tests ought to print out some extra information whether or not -v is thrown.
Likely to occur for a perl binary that was not properly installed. Correct by conducting a proper installation.
Likely to occur for a perl that was not properly installed. Correct by conducting a proper installation.
Likely to occur for a perl library tree that was not properly installed. Correct by conducting a proper installation.
One of the two modules that is used by perlivp was not present in the installation. This is a serious error since it adversely affects perlivp's ability to function. You may be able to correct this by performing a proper perl installation.
An attempt to eval "require $module"
failed, even though the list of
extensions indicated that it should succeed. Correct by conducting a proper
installation.
This test not coming out ok could indicate that you have in fact installed
a bLuRfle.pm module or that the eval " require \"$module_name.pm\"; "
test may give misleading results with your installation of perl. If yours
is the latter case then please let the author know.
One or more files turned up missing according to a run of
ExtUtils::Installed -> validate()
over your installation.
Correct by conducting a proper installation.
For further information on how to conduct a proper installation consult the INSTALL file that comes with the perl source and the README file for your platform.
Peter Prymmer
perllexwarn - Perl Lexical Warnings
The use warnings
pragma enables you to control precisely what warnings are
to be enabled in which parts of a Perl program. It's a more flexible
alternative to both the command line flag -w and the equivalent Perl
variable, $^W.
This pragma works just like the strict
pragma.
This means that the scope of the warning pragma is limited to the
enclosing block. It also means that the pragma setting will not
leak across files (via use, require or do). This allows
authors to independently define the degree of warning checks that will
be applied to their module.
By default, optional warnings are disabled, so any legacy code that doesn't attempt to control the warnings will work unchanged.
All warnings are enabled in a block by either of these:
Similarly all warnings are disabled in a block by either of these:
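The example lines were evidently lost in conversion; the standard forms are:

```perl
use warnings;          # enables all warnings in the enclosing block
use warnings 'all';    # identical, spelled out explicitly

no warnings;           # disables all warnings in the enclosing block
no warnings 'all';     # identical
```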
For example, consider the code below:
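Reconstructed (the code block was lost in conversion), the example reads:

```perl
use warnings;

my @a;
{
    no warnings;
    my $b = @a[0];    # no warning here
}
my $c = @a[0];        # warns: Scalar value @a[0] better written as $a[0]
```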
The code in the enclosing block has warnings enabled, but the inner
block has them disabled. In this case that means the assignment to the
scalar $c
will trip the "Scalar value @a[0] better written as $a[0]"
warning, but the assignment to the scalar $b
will not.
Before the introduction of lexical warnings, Perl had two classes of warnings: mandatory and optional.
As its name suggests, if your code tripped a mandatory warning, you
would get a warning whether you wanted it or not.
For example, the code below would always produce an "isn't numeric"
warning about the "2:".
- my $a = "2:" + 3;
With the introduction of lexical warnings, mandatory warnings now become
default warnings. The difference is that although the previously
mandatory warnings are still enabled by default, they can then be
subsequently enabled or disabled with the lexical warning pragma. For
example, in the code below, an "isn't numeric"
warning will only
be reported for the $a
variable.
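Reconstructed, the example is:

```perl
my $a = "2:" + 3;    # triggers the default "isn't numeric" warning

no warnings;         # from here on, even default warnings are off

my $b = "2:" + 3;    # no warning
```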
Note that neither the -w flag nor the $^W
variable can be used to
disable/enable default warnings. They are still mandatory in this case.
Although very useful, the big problem with using -w on the command line to enable warnings is that it is all or nothing. Take the typical scenario when you are writing a Perl program. Parts of the code you will write yourself, but it's very likely that you will make use of pre-written Perl modules. If you use the -w flag in this case, you end up enabling warnings in pieces of code that you haven't written.
Similarly, using $^W
to either disable or enable blocks of code is
fundamentally flawed. For a start, say you want to disable warnings in
a block of code. You might expect this to be enough to do the trick:
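The stripped example, reconstructed:

```perl
{
    local ($^W) = 0;
    my $a =+ 2;        # compile-time warning: NOT suppressed under -w
    my $b; chop $b;    # runtime warning: suppressed
}
```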
When this code is run with the -w flag, a warning will be produced
for the $a
line: "Reversed += operator"
.
The problem is that Perl has both compile-time and run-time warnings. To disable compile-time warnings you need to rewrite the code like this:
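And the rewritten version, reconstructed (BEGIN runs as soon as it is compiled, so $^W is already 0 when the following lines compile):

```perl
{
    BEGIN { $^W = 0 }
    my $a =+ 2;        # compile-time warning now suppressed as well
    my $b; chop $b;    # runtime warning suppressed
}
```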
The other big problem with $^W
is the way you can inadvertently
change the warning setting in unexpected places in your code. For example,
when the code below is run (without the -w flag), the second call
to doit
will trip a "Use of uninitialized value"
warning, whereas
the first will not.
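The reconstructed example:

```perl
sub doit
{
    my $b; chop $b;
}

doit();              # first call: no warning

{
    local ($^W) = 1;
    doit();          # second call: Use of uninitialized value
}
```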
This is a side-effect of $^W
being dynamically scoped.
Lexical warnings get around these limitations by allowing finer control over where warnings can or can't be tripped.
There are three Command Line flags that can be used to control when warnings are (or aren't) produced:
This is the existing flag. If the lexical warnings pragma is not used in any of your code, or any of the modules that you use, this flag will enable warnings everywhere. See Backward Compatibility for details of how this flag interacts with lexical warnings.
If the -W flag is used on the command line, it will enable all warnings
throughout the program regardless of whether warnings were disabled
locally using no warnings
or $^W = 0.
This includes all files that get
included via use, require or do.
Think of it as the Perl equivalent of the "lint" command.
Does the exact opposite to the -W flag, i.e. it disables all warnings.
If you are used to working with a version of Perl prior to the
introduction of lexically scoped warnings, or have code that uses both
lexical warnings and $^W
, this section will describe how they interact.
How Lexical Warnings interact with -w/$^W
:
If none of the three command line flags (-w, -W or -X) that
control warnings is used and neither $^W
nor the warnings
pragma
are used, then default warnings will be enabled and optional warnings
disabled.
This means that legacy code that doesn't attempt to control the warnings
will work unchanged.
The -w flag just sets the global $^W
variable as in 5.005. This
means that any legacy code that currently relies on manipulating $^W
to control warning behavior will still work as is.
Apart from now being a boolean, the $^W
variable operates in exactly
the same horrible uncontrolled global way, except that it cannot
disable/enable default warnings.
If a piece of code is under the control of the warnings
pragma,
both the $^W
variable and the -w flag will be ignored for the
scope of the lexical warning.
The only way to override a lexical warnings setting is with the -W or -X command line flags.
The combined effect of 3 & 4 is that it will allow code which uses
the warnings
pragma to control the warning behavior of $^W-type
code (using a local $^W=0
) if it really wants to, but not vice-versa.
A hierarchy of "categories" have been defined to allow groups of warnings to be enabled/disabled in isolation.
The current hierarchy is:
- all -+
- |
- +- closure
- |
- +- deprecated
- |
- +- exiting
- |
- +- experimental --+
- | |
- | +- experimental::lexical_subs
- |
- +- glob
- |
- +- imprecision
- |
- +- io ------------+
- | |
- | +- closed
- | |
- | +- exec
- | |
- | +- layer
- | |
- | +- newline
- | |
- | +- pipe
- | |
- | +- unopened
- |
- +- misc
- |
- +- numeric
- |
- +- once
- |
- +- overflow
- |
- +- pack
- |
- +- portable
- |
- +- recursion
- |
- +- redefine
- |
- +- regexp
- |
- +- severe --------+
- | |
- | +- debugging
- | |
- | +- inplace
- | |
- | +- internal
- | |
- | +- malloc
- |
- +- signal
- |
- +- substr
- |
- +- syntax --------+
- | |
- | +- ambiguous
- | |
- | +- bareword
- | |
- | +- digit
- | |
- | +- illegalproto
- | |
- | +- parenthesis
- | |
- | +- precedence
- | |
- | +- printf
- | |
- | +- prototype
- | |
- | +- qw
- | |
- | +- reserved
- | |
- | +- semicolon
- |
- +- taint
- |
- +- threads
- |
- +- uninitialized
- |
- +- unpack
- |
- +- untie
- |
- +- utf8 ----------+
- | |
- | +- non_unicode
- | |
- | +- nonchar
- | |
- | +- surrogate
- |
- +- void
Just like the "strict" pragma, any of these categories can be combined.
Also like the "strict" pragma, if there is more than one instance of the
warnings
pragma in a given scope the cumulative effect is additive.
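For instance (the original examples were lost in conversion; the category names here are illustrative):

```perl
use warnings qw(void redefine);   # enable just these two categories

use warnings qw(void);   # only "void" warnings enabled
use warnings qw(io);     # now both "void" and "io" are enabled
no warnings  qw(void);   # additive in effect: only "io" remains enabled
```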
To determine which category a specific warning has been assigned to, see perldiag.
Note: In Perl 5.6.1, the lexical warnings category "deprecated" was a sub-category of the "syntax" category. It is now a top-level category in its own right.
The presence of the word "FATAL" in the category list will escalate any
warnings detected from the categories specified in the lexical scope
into fatal errors. In the code below, the use of time, length
and join can all produce a "Useless use of xxx in void context"
warning.
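The program under discussion was lost in conversion. Reconstructed (and wrapped in a string eval here so the snippet can run to completion and report the fatal warning itself; in the original it was a standalone file called fatal):

```perl
#!/usr/bin/perl -w
use strict;

my $program = <<'END';
use warnings;

time;

{
    use warnings FATAL => qw(void);
    length "abc";
}

join "", 1, 2, 3;

print "done\n";
END

eval $program;
print "program died with: $@" if $@;
```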
When run, it produces this output:
- Useless use of time in void context at fatal line 3.
- Useless use of length in void context at fatal line 7.
The scope where length is used has escalated the void
warnings
category into a fatal error, so the program terminates as soon as it
encounters the warning.
To explicitly turn off a "FATAL" warning you just disable the warning it is associated with. So, for example, to disable the "void" warning in the example above, either of these will do the trick:
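Presumably along these lines (the original snippets were lost in conversion):

```perl
no warnings "void";                 # turn the category off entirely

use warnings NONFATAL => "void";    # keep the warning, but make it non-fatal
```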
If you want to downgrade a warning that has been escalated into a fatal error back to a normal warning, you can use the "NONFATAL" keyword. For example, the code below will promote all warnings into fatal errors, except for those in the "syntax" category.
- use warnings FATAL => 'all', NONFATAL => 'syntax';
The warnings
pragma provides a number of functions that are useful for
module authors. These are used when you want to report a module-specific
warning when a calling module has enabled warnings via the warnings
pragma.
Consider the module MyMod::Abc
below.
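The module's code was lost in conversion; it presumably resembled the classic example, along these lines (the /var/abc path and the message text are illustrative):

```perl
package MyMod::Abc;

use warnings::register;

sub open {
    my $path = shift;
    if ($path !~ m#^/#) {
        warnings::warn("changing relative path to /var/abc")
            if warnings::enabled();
        $path = "/var/abc/$path";
    }
    return $path;
}

1;
```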
The call to warnings::register
will create a new warnings category
called "MyMod::Abc", i.e. the new category name matches the current
package name. The open function in the module will display a warning
message if it gets given a relative path as a parameter. This warning
will only be displayed if the code that uses MyMod::Abc
has actually
enabled them with the warnings
pragma like below.
It is also possible to test whether the pre-defined warnings categories are
set in the calling module with the warnings::enabled
function. Consider
this snippet of code:
- package MyMod::Abc;
- sub open {
- warnings::warnif("deprecated",
- "open is deprecated, use new instead");
- new(@_);
- }
- sub new
- ...
- 1;
The function open has been deprecated, so code has been included to
display a warning message whenever the calling module has (at least) the
"deprecated" warnings category enabled. Something like this, say.
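Reconstructed, with a stand-in for the module inlined so the snippet is self-contained (in practice the package would live in its own file and be pulled in with use MyMod::Abc):

```perl
# Stand-in for the MyMod::Abc module above:
package MyMod::Abc;

sub new  { bless {}, shift }
sub open {
    warnings::warnif("deprecated",
                     "open is deprecated, use new instead");
    return MyMod::Abc->new();
}

package main;
use warnings 'deprecated';    # "deprecated" is a built-in category

my $obj = MyMod::Abc::open("/some/file");
```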
Either the warnings::warn
or warnings::warnif
function should be
used to actually display the warnings message. This is because they can
make use of the feature that allows warnings to be escalated into fatal
errors. So in this case, if the calling module has escalated this
warnings category into a fatal error,
the warnings::warnif
function will detect this and die after
displaying the warning message.
The three warnings functions, warnings::warn
, warnings::warnif
and warnings::enabled
can optionally take an object reference in place
of a category name. In this case the functions will use the class name
of the object as the warnings category.
Consider this example:
- package Original;
- no warnings;
- use warnings::register;
- sub new
- {
- my $class = shift;
- bless [], $class;
- }
- sub check
- {
- my $self = shift;
- my $value = shift;
- if ($value % 2 && warnings::enabled($self))
- { warnings::warn($self, "Odd numbers are unsafe") }
- }
- sub doit
- {
- my $self = shift;
- my $value = shift;
- $self->check($value);
- # ...
- }
- 1;
- package Derived;
- use warnings::register;
- use Original;
- our @ISA = qw( Original );
- sub new
- {
- my $class = shift;
- bless [], $class;
- }
- 1;
The code below makes use of both modules, but it only enables warnings from
Derived
.
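The driver code was lost in conversion. Reconstructed, with the two packages above inlined so it runs as a single file (normally they would live in Original.pm and Derived.pm and be loaded with use):

```perl
# The Original and Derived packages from above, inlined:
package Original;
no warnings;
use warnings::register;

sub new   { my $class = shift; bless [], $class }
sub check {
    my ($self, $value) = @_;
    if ($value % 2 && warnings::enabled($self)) {
        warnings::warn($self, "Odd numbers are unsafe");
    }
}
sub doit {
    my ($self, $value) = @_;
    $self->check($value);
}

package Derived;
use warnings::register;
our @ISA = qw( Original );
sub new { my $class = shift; bless [], $class }

package main;
use warnings 'Derived';    # enable warnings only from Derived

my $a = Original->new();
$a->doit(1);               # no warning: "Original" is not enabled

my $b = Derived->new();
$b->doit(1);               # warns: Odd numbers are unsafe
```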
When this code is run only the Derived
object, $b
, will generate
a warning.
- Odd numbers are unsafe at main.pl line 7
Notice also that the warning is reported at the line where the object is first used.
When registering new categories of warning, you can supply more names to warnings::register like this:
- package MyModule;
- use warnings::register qw(format precision);
- ...
- warnings::warnif('MyModule::format', '...');
Paul Marquess
perllinux - Perl version 5 on Linux systems
This document describes various features of Linux that will affect how Perl version 5 (hereafter just Perl) is compiled and/or runs.
Sun Microsystems has released a port of their Sun Studio compilers for Linux. As of November 2005, only an alpha version has been released. Until a full release of these compilers is made, support for compiling Perl with them should be considered experimental.
Also, there are some special instructions for building Perl with Sun Studio on Linux.
Following the normal Configure
run, you have to run make as follows:
- LDLOADLIBS=-lc make
LDLOADLIBS
is an environment variable used by the linker to link modules in
ext/ to glibc. Currently, that environment variable is not getting
populated by a combination of Config
entries and ExtUtils::MakeMaker.
While there may be a bug somewhere in Perl's configuration or
ExtUtils::MakeMaker
causing the problem, the most likely cause is an
incomplete understanding of Sun Studio by this author. Further investigation
is needed to get this working better.
Steve Peters <steve@fisharerojo.org>
Please report any errors, updates, or suggestions to perlbug@perl.org.
perllocale - Perl locale handling (internationalization and localization)
In the beginning there was ASCII, the "American Standard Code for Information Interchange", which works quite well for Americans with their English alphabet and dollar-denominated currency. But it doesn't work so well even for other English speakers, who may use different currencies, such as the pound sterling (as the symbol for that currency is not in ASCII); and it's hopelessly inadequate for many of the thousands of the world's other languages.
To address these deficiencies, the concept of locales was invented (formally the ISO C, XPG4, POSIX 1.c "locale system"). And applications were and are being written that use the locale mechanism. The process of making such an application take account of its users' preferences in these kinds of matters is called internationalization (often abbreviated as i18n); telling such an application about a particular set of preferences is known as localization (l10n).
Perl was extended to support the locale system. This is controlled per application by using one pragma, one function call, and several environment variables.
Unfortunately, there are quite a few deficiencies with the design (and often, the implementations) of locales, and their use for character sets has mostly been supplanted by Unicode (see perlunitut for an introduction to that, and keep on reading here for how Unicode interacts with locales in Perl).
Perl continues to support the old locale system, and starting in v5.16,
provides a hybrid way to use the Unicode character set, along with the
other portions of locales that may not be so problematic.
(Unicode is also creating CLDR
, the "Common Locale Data Repository",
http://cldr.unicode.org/ which includes more types of information than
are available in the POSIX locale system. At the time of this writing,
there was no CPAN module that provides access to this XML-encoded data.
However, many of its locales have the POSIX-only data extracted, and are
available at http://unicode.org/Public/cldr/latest/.)
A locale is a set of data that describes various aspects of how various communities in the world categorize their world. These categories are broken down into the following types (some of which include a brief note here):
This indicates how numbers should be formatted for human readability, for example the character used as the decimal point.
This, for the most part, is beyond the scope of Perl.
This indicates the ordering of letters for comparison and sorting. In Latin alphabets, for example, "b" generally follows "a".
This indicates, for example, whether a character is an uppercase letter.
More details on the categories are given below in LOCALE CATEGORIES.
Together, these categories go a long way towards being able to customize a single program to run in many different locations. But there are deficiencies, so keep reading.
Perl will not use locales unless specifically requested to (see NOTES below
for the partial exception of write()). But even if there is such a
request, all of the following must be true for it to work properly:
Your operating system must support the locale system. If it does, you should find that the setlocale() function is a documented part of its C library.
Definitions for locales that you use must be installed. You, or your system administrator, must make sure that this is the case. The available locales, the location in which they are kept, and the manner in which they are installed all vary from system to system. Some systems provide only a few, hard-wired locales and do not allow more to be added. Others allow you to add "canned" locales provided by the system supplier. Still others allow you or the system administrator to define and add arbitrary locales. (You may have to ask your supplier to provide canned locales that are not delivered with your operating system.) Read your system documentation for further illumination.
Perl must believe that the locale system is supported. If it does, perl -V:d_setlocale will say that the value for d_setlocale is define.
If you want a Perl application to process and present your data according to a particular locale, the application code should include the use locale pragma (see The use locale pragma) where appropriate, and at least one of the following must be true:
The locale-determining environment variables (see ENVIRONMENT) must be correctly set up at the time the application is started, either by yourself or by whomever set up your system account; or
The application must set its own locale using the method described in The setlocale function.
By default, Perl ignores the current locale. The use locale pragma tells Perl to use the current locale for some operations.
Starting in v5.16, there is an optional parameter to this pragma:
- use locale ':not_characters';
This parameter allows better mixing of locales and Unicode, and is described fully in Unicode and UTF-8, but briefly, it tells Perl to not use the character portions of the locale definition, that is, the LC_CTYPE and LC_COLLATE categories. Instead it will use the native (extended by Unicode) character set. When using this parameter, you are responsible for getting the external character set translated into the native/Unicode one (which it already will be if it is one of the increasingly popular UTF-8 locales). There are convenient ways of doing this, as described in Unicode and UTF-8.
The current locale is set at execution time by
setlocale() described below. If that function
hasn't yet been called in the course of the program's execution, the
current locale is that which was determined by the ENVIRONMENT in
effect at the start of the program, except that
LC_NUMERIC is always
initialized to the C locale (mentioned under Finding locales).
If there is no valid environment, the current locale is undefined. It
is likely, but not necessarily, the "C" locale.
The operations that are affected by locale are:
use locale ':not_characters';
Format declarations (format()) use LC_NUMERIC.
The POSIX date formatting function (strftime()) uses LC_TIME.
use locale;
The above operations are affected, as well as the following:
The comparison operators (lt, le, cmp, ge, and gt) and the POSIX string collation functions strcoll() and strxfrm() use LC_COLLATE. sort() is also affected if used without an explicit comparison function, because it uses cmp by default.
Note: eq and ne are unaffected by locale: they always perform a char-by-char comparison of their scalar operands. What's more, if cmp finds that its operands are equal according to the collation sequence specified by the current locale, it goes on to perform a char-by-char comparison, and only returns 0 (equal) if the operands are char-for-char identical. If you really want to know whether two strings--which eq and cmp may consider different--are equal as far as collation in the locale is concerned, see the discussion in Category LC_COLLATE: Collation.
Regular expressions and case-modification functions (uc(), lc(), ucfirst(), and lcfirst()) use LC_CTYPE.
The default behavior is restored with the no locale pragma, or upon reaching the end of the block enclosing use locale.
Note that use locale and use locale ':not_characters' may be nested, and that what is in effect within an inner scope will revert to the outer scope's rules at the end of the inner scope.
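A minimal sketch of this lexical scoping (nothing here is specific to any particular locale):

```perl
use locale;        # locale rules apply from here on in this scope

{
    no locale;     # inner block: machine-native behavior
    # cmp, sort(), \w, lc(), etc. ignore the locale in this block
}

# back outside the inner block, the locale rules are in effect again
```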
The string result of any operation that uses locale information is tainted, as it is possible for a locale to be untrustworthy. See SECURITY.
You can switch locales as often as you wish at run time with the POSIX::setlocale() function:
- # Import locale-handling tool set from POSIX module.
- # This example uses: setlocale -- the function call
- # LC_CTYPE -- explained below
- use POSIX qw(locale_h);
- # query and save the old locale
- $old_locale = setlocale(LC_CTYPE);
- setlocale(LC_CTYPE, "fr_CA.ISO8859-1");
- # LC_CTYPE now in locale "French, Canada, codeset ISO 8859-1"
- setlocale(LC_CTYPE, "");
- # LC_CTYPE now reset to default defined by LC_ALL/LC_CTYPE/LANG
- # environment variables. See below for documentation.
- # restore the old locale
- setlocale(LC_CTYPE, $old_locale);
The first argument of setlocale() gives the category, the second the locale. The category tells in what aspect of data processing you want to apply locale-specific rules. Category names are discussed in LOCALE CATEGORIES and ENVIRONMENT. The locale is the name of a collection of customization information corresponding to a particular combination of language, country or territory, and codeset. Read on for hints on the naming of locales: not all systems name locales as in the example.
If no second argument is provided and the category is something else than LC_ALL, the function returns a string naming the current locale for the category. You can use this value as the second argument in a subsequent call to setlocale().
If no second argument is provided and the category is LC_ALL, the result is implementation-dependent. It may be a string of concatenated locale names (separator also implementation-dependent) or a single locale name. Please consult your setlocale(3) man page for details.
If a second argument is given and it corresponds to a valid locale, the locale for the category is set to that value, and the function returns the now-current locale value. You can then use this in yet another call to setlocale(). (In some implementations, the return value may sometimes differ from the value you gave as the second argument--think of it as an alias for the value you gave.)
As the example shows, if the second argument is an empty string, the category's locale is returned to the default specified by the corresponding environment variables. Generally, this results in a return to the default that was in force when Perl started up: changes to the environment made by the application after startup may or may not be noticed, depending on your system's C library.
If the second argument does not correspond to a valid locale, the locale for the category is not changed, and the function returns undef.
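A sketch of checking for that failure; the locale name below is deliberately bogus, so setlocale() should return undef and leave the previous locale in force:

```perl
use POSIX qw(setlocale LC_ALL);

# Attempt to switch to a locale that is (presumably) not installed.
my $result = setlocale(LC_ALL, "xx_XX.NO-SUCH-CODESET");
warn "locale not available; previous locale still in effect\n"
    unless defined $result;
```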
Note that Perl ignores the current LC_CTYPE and LC_COLLATE locales within the scope of a use locale ':not_characters'.
For further information about the categories, consult setlocale(3).
For locales available in your system, consult also setlocale(3) to see whether it leads to the list of available locales (search for the SEE ALSO section). If that fails, try the following command lines:
- locale -a
- nlsinfo
- ls /usr/lib/nls/loc
- ls /usr/lib/locale
- ls /usr/lib/nls
- ls /usr/share/locale
and see whether they list something resembling these:
- en_US.ISO8859-1 de_DE.ISO8859-1 ru_RU.ISO8859-5
- en_US.iso88591 de_DE.iso88591 ru_RU.iso88595
- en_US de_DE ru_RU
- en de ru
- english german russian
- english.iso88591 german.iso88591 russian.iso88595
- english.roman8 russian.koi8r
Sadly, even though the calling interface for setlocale() has been
standardized, names of locales and the directories where the
configuration resides have not been. The basic form of the name is
language_territory.codeset, but the latter parts after
language are not always present. The language and country
are usually from the standards ISO 3166 and ISO 639, the
two-letter abbreviations for the countries and the languages of the
world, respectively. The codeset part often mentions some ISO
8859 character set, the Latin codesets. For example, ISO 8859-1
is the so-called "Western European codeset" that can be used to encode
most Western European languages adequately. Again, there are several
ways to write even the name of that one standard. Lamentably.
Two special locales are worth particular mention: "C" and "POSIX". Currently these are effectively the same locale: the difference is mainly that the first one is defined by the C standard, the second by the POSIX standard. They define the default locale in which every program starts in the absence of locale information in its environment. (The default default locale, if you will.) Its language is (American) English and its character codeset ASCII. Warning. The C locale delivered by some vendors may not actually exactly match what the C standard calls for. So beware.
NOTE: Not all systems have the "POSIX" locale (not all systems are POSIX-conformant), so use "C" when you need explicitly to specify this default locale.
You may encounter the following warning message at Perl startup:
- perl: warning: Setting locale failed.
- perl: warning: Please check that your locale settings:
- LC_ALL = "En_US",
- LANG = (unset)
- are supported and installed on your system.
- perl: warning: Falling back to the standard locale ("C").
This means that your locale settings had LC_ALL set to "En_US" and LANG exists but has no value. Perl tried to believe you but could not. Instead, Perl gave up and fell back to the "C" locale, the default locale that is supposed to work no matter what. This usually means your locale settings were wrong, they mention locales your system has never heard of, or the locale installation in your system has problems (for example, some system files are broken or missing). There are quick and temporary fixes to these problems, as well as more thorough and lasting fixes.
The two quickest fixes are either to render Perl silent about any locale inconsistencies or to run Perl under the default locale "C".
Perl's moaning about locale problems can be silenced by setting the environment variable PERL_BADLANG to a zero value, for example "0". This method really just sweeps the problem under the carpet: you tell Perl to shut up even when Perl sees that something is wrong. Do not be surprised if later something locale-dependent misbehaves.
Perl can be run under the "C" locale by setting the environment variable LC_ALL to "C". This method is perhaps a bit more civilized than the PERL_BADLANG approach, but setting LC_ALL (or other locale variables) may affect other programs as well, not just Perl. In particular, external programs run from within Perl will see these changes. If you make the new settings permanent (read on), all programs you run see the changes. See ENVIRONMENT for the full list of relevant environment variables and USING LOCALES for their effects in Perl. Effects in other programs are easily deducible. For example, the variable LC_COLLATE may well affect your sort program (or whatever the program that arranges "records" alphabetically in your system is called).
You can test out changing these variables temporarily, and if the new settings seem to help, put those settings into your shell startup files. Consult your local documentation for the exact details. For example, in Bourne-like shells (sh, ksh, bash, zsh):
- LC_ALL=en_US.ISO8859-1
- export LC_ALL
This assumes that we saw the locale "en_US.ISO8859-1" using the commands discussed above, and decided to try that instead of the faulty locale "En_US". In Cshish shells (csh, tcsh):
- setenv LC_ALL en_US.ISO8859-1
or if you have the "env" application you can do in any shell
- env LC_ALL=en_US.ISO8859-1 perl ...
If you do not know what shell you have, consult your local helpdesk or the equivalent.
The slower but superior fixes require you to correct the misconfiguration of your own environment variables yourself. Fixing the mis(sing)configuration of the whole system's locales usually requires the help of your friendly system administrator.
First, see earlier in this document about Finding locales. That tells how to find which locales are really supported--and more importantly, installed--on your system. In our example error message, environment variables affecting the locale are listed in the order of decreasing importance (and unset variables do not matter). Therefore, having LC_ALL set to "En_US" must have been the bad choice, as shown by the error message. First try fixing locale settings listed first.
Second, if using the listed commands you see something exactly (prefix matches do not count and case usually counts) like "En_US" without the quotes, then you should be okay because you are using a locale name that should be installed and available in your system. In this case, see Permanently fixing your system's locale configuration.
This is when you see something like:
- perl: warning: Please check that your locale settings:
- LC_ALL = "En_US",
- LANG = (unset)
- are supported and installed on your system.
but then cannot see that "En_US" listed by the above-mentioned commands. You may see things like "en_US.ISO8859-1", but that isn't the same. In this case, try running under a locale that you can list and which somehow matches what you tried. The rules for matching locale names are a bit vague because standardization is weak in this area. See again the Finding locales about general rules.
Contact a system administrator (preferably your own) and report the exact error message you get, and ask them to read this same documentation you are now reading. They should be able to check whether there is something wrong with the locale configuration of the system. The Finding locales section is unfortunately a bit vague about the exact commands and places because these things are not that standardized.
The POSIX::localeconv() function allows you to get particulars of the locale-dependent numeric formatting information specified by the current LC_NUMERIC and LC_MONETARY locales. (If you just want the name of the current locale for a particular category, use POSIX::setlocale() with a single parameter--see The setlocale function.)
localeconv() takes no arguments, and returns a reference to a hash. The keys of this hash are variable names for formatting, such as decimal_point and thousands_sep. The values are the corresponding, er, values. See localeconv in POSIX for a longer example listing the categories an implementation might be expected to provide; some provide more and others fewer. You don't need an explicit use locale, because localeconv() always observes the current locale.
Here's a simple-minded example program that rewrites its command-line parameters as integers correctly formatted in the current locale:
- use POSIX qw(locale_h);
- # Get some of locale's numeric formatting parameters
- my ($thousands_sep, $grouping) =
- @{localeconv()}{'thousands_sep', 'grouping'};
- # Apply defaults if values are missing
- $thousands_sep = ',' unless $thousands_sep;
- # grouping and mon_grouping are packed lists
- # of small integers (characters) telling the
- # grouping (thousand_seps and mon_thousand_seps
- # being the group dividers) of numbers and
- # monetary quantities. The integers' meanings:
- # 255 means no more grouping, 0 means repeat
- # the previous grouping, 1-254 means use that
- # as the current grouping. Grouping goes from
- # right to left (low to high digits). In the
- # below we cheat slightly by never using anything
- # else than the first grouping (whatever that is).
- if ($grouping) {
- @grouping = unpack("C*", $grouping);
- } else {
- @grouping = (3);
- }
- # Format command line params for current locale
- for (@ARGV) {
- $_ = int; # Chop non-integer part
- 1 while
- s/(\d)(\d{$grouping[0]}($|$thousands_sep))/$1$thousands_sep$2/;
- print "$_";
- }
- print "\n";
Another interface for querying locale-dependent information is the I18N::Langinfo::langinfo() function, available at least in Unix-like systems and VMS.
The following example will import the langinfo() function itself and three constants to be used as arguments to langinfo(): a constant for the abbreviated first day of the week (the numbering starts from Sunday = 1) and two more constants for the affirmative and negative answers for a yes/no question in the current locale.
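A sketch of such a program (the exact set of constants exported by I18N::Langinfo can vary by platform):

```perl
use I18N::Langinfo qw(langinfo ABDAY_1 YESSTR NOSTR);

# ABDAY_1 is the abbreviated name of the first day of the week
# (Sunday); YESSTR and NOSTR are the locale's yes/no answers.
my ($abday_1, $yesstr, $nostr) = map { langinfo($_) } (ABDAY_1, YESSTR, NOSTR);

print "$abday_1? [$yesstr/$nostr] ";
```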
In other words, in the "C" (or English) locale the above will probably print something like:
- Sun? [yes/no]
See I18N::Langinfo for more information.
The following subsections describe basic locale categories. Beyond these, some combination categories allow manipulation of more than one basic category at a time. See ENVIRONMENT for a discussion of these.
In the scope of use locale (but not a use locale ':not_characters'), Perl looks to the LC_COLLATE environment variable to determine the application's notions on collation (ordering) of characters. For example, "b" follows "a" in Latin alphabets, but where do "á" and "å" belong? And while "color" follows "chocolate" in English, what about in traditional Spanish?
The following collations all make sense and you may meet any of them if you "use locale".
- A B C D E a b c d e
- A a B b C c D d E e
- a A b B c C d D e E
- a b c d e A B C D E
Here is a code snippet to tell what "word" characters are in the current locale, in that locale's order:
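A sketch of such a snippet, assuming a single-byte (non-UTF-8) locale:

```perl
use locale;
use POSIX qw(setlocale LC_ALL);

setlocale(LC_ALL, "");   # adopt the locale from the environment

# Print every byte-sized character the locale considers a "word"
# character, in the locale's collation order.
print +(sort grep /\w/, map { chr } 0..255), "\n";
```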
Compare this with the characters that you see and their order if you state explicitly that the locale should be ignored:
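The locale-ignoring counterpart might look like:

```perl
no locale;   # make sure any enclosing "use locale" is switched off

# The same range of characters, but classified by the native \w
# and ordered by the native (byte-value) collation.
print +(sort grep /\w/, map { chr } 0..255), "\n";
```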
This machine-native collation (which is what you get unless use locale has appeared earlier in the same block) must be used for sorting raw binary data, whereas the locale-dependent collation of the first example is useful for natural text.
As noted in USING LOCALES, cmp compares according to the current collation locale when use locale is in effect, but falls back to a char-by-char comparison for strings that the locale says are equal. You can use POSIX::strcoll() if you don't want this fall-back:
- use POSIX qw(strcoll);
- $equal_in_locale =
- !strcoll("space and case ignored", "SpaceAndCaseIgnored");
$equal_in_locale will be true if the collation locale specifies a dictionary-like ordering that ignores space characters completely and which folds case.
If you have a single string that you want to check for "equality in locale" against several others, you might think you could gain a little efficiency by using POSIX::strxfrm() in conjunction with eq:
- use POSIX qw(strxfrm);
- $xfrm_string = strxfrm("Mixed-case string");
- print "locale collation ignores spaces\n"
- if $xfrm_string eq strxfrm("Mixed-casestring");
- print "locale collation ignores hyphens\n"
- if $xfrm_string eq strxfrm("Mixedcase string");
- print "locale collation ignores case\n"
- if $xfrm_string eq strxfrm("mixed-case string");
strxfrm() takes a string and maps it into a transformed string for use in char-by-char comparisons against other transformed strings during collation. "Under the hood", locale-affected Perl comparison operators call strxfrm() for both operands, then do a char-by-char comparison of the transformed strings. By calling strxfrm() explicitly and using a non locale-affected comparison, the example attempts to save a couple of transformations. But in fact, it doesn't save anything: Perl magic (see Magic Variables in perlguts) creates the transformed version of a string the first time it's needed in a comparison, then keeps this version around in case it's needed again. An example rewritten the easy way with cmp runs just about as fast. It also copes with null characters embedded in strings; if you call strxfrm() directly, it treats the first null it finds as a terminator. And don't expect the transformed strings it produces to be portable across systems--or even from one revision of your operating system to the next. In short, don't call strxfrm() directly: let Perl do it for you.
Note: use locale isn't shown in some of these examples because it isn't needed: strcoll() and strxfrm() exist only to generate locale-dependent results, and so always obey the current LC_COLLATE locale.
In the scope of use locale (but not a use locale ':not_characters'), Perl obeys the LC_CTYPE locale setting. This controls the application's notion of which characters are alphabetic. This affects Perl's \w regular expression metanotation, which stands for alphanumeric characters--that is, alphabetic, numeric, and including other special characters such as the underscore or hyphen. (Consult perlre for more information about regular expressions.) Thanks to LC_CTYPE, depending on your locale setting, characters like "æ", "ð", "ß", and "ø" may be understood as \w characters.
The LC_CTYPE locale also provides the map used in transliterating characters between lower and uppercase. This affects the case-mapping functions--lc(), lcfirst(), uc(), and ucfirst(); case-mapping interpolation with \l, \L, \u, or \U in double-quoted strings and s/// substitutions; and case-independent regular expression pattern matching using the i modifier.
Finally, LC_CTYPE affects the POSIX character-class test functions--isalpha(), islower(), and so on. For example, if you move from the "C" locale to a 7-bit Scandinavian one, you may find--possibly to your surprise--that "|" moves from the ispunct() class to isalpha(). Unfortunately, this creates big problems for regular expressions. "|" still means alternation even though it matches \w.
Note that there are quite a few things that are unaffected by the current locale. All the escape sequences for particular characters, \n for example, always mean the platform's native one. This means, for example, that \N in regular expressions (every character but new-line) works on the platform character set.
Note: A broken or malicious LC_CTYPE locale definition may result in clearly ineligible characters being considered to be alphanumeric by your application. For strict matching of (mundane) ASCII letters and digits--for example, in command strings--locale-aware applications should use \w with the /a regular expression modifier. See SECURITY.
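A sketch of that defensive pattern (the command name here is made up; /a requires Perl 5.14 or later):

```perl
use locale;

my $command = "list_files";

# /a restricts \w to ASCII [A-Za-z0-9_], regardless of what the
# current LC_CTYPE locale claims is alphanumeric.
if ($command =~ /\A\w+\z/a) {
    print "command name looks safe\n";
}
```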
After a proper POSIX::setlocale() call, Perl obeys the LC_NUMERIC
locale information, which controls an application's idea of how numbers
should be formatted for human readability by the printf(), sprintf(), and
write() functions. String-to-numeric conversion by the POSIX::strtod()
function is also affected. In most implementations the only effect is to
change the character used for the decimal point--perhaps from "." to ",".
These functions aren't aware of such niceties as thousands separation and
so on. (See The localeconv function if you care about these things.)
Output produced by print() is also affected by the current locale: it corresponds to what you'd get from printf() in the "C" locale. The same is true for Perl's internal conversions between numeric and string formats:
- use POSIX qw(strtod setlocale LC_NUMERIC);
- setlocale LC_NUMERIC, "";
- $n = 5/2; # Assign numeric 2.5 to $n
- $a = " $n"; # Locale-dependent conversion to string
- print "half five is $n\n"; # Locale-dependent output
- printf "half five is %g\n", $n; # Locale-dependent output
- print "DECIMAL POINT IS COMMA\n"
- if $n == (strtod("2,5"))[0]; # Locale-dependent conversion
See also I18N::Langinfo and RADIXCHAR.
The C standard defines the LC_MONETARY category, but not a function that is affected by its contents. (Those with experience of standards committees will recognize that the working group decided to punt on the issue.) Consequently, Perl takes no notice of it. If you really want to use LC_MONETARY, you can query its contents--see The localeconv function--and use the information that it returns in your application's own formatting of currency amounts. However, you may well find that the information, voluminous and complex though it may be, still does not quite meet your requirements: currency formatting is a hard nut to crack.
See also I18N::Langinfo and CRNCYSTR.
Output produced by POSIX::strftime(), which builds a formatted
human-readable date/time string, is affected by the current LC_TIME
locale. Thus, in a French locale, the output produced by the %B
format element (full month name) for the first month of the year would
be "janvier". Here's how to get a list of long month names in the
current locale:
- use POSIX qw(strftime);
- for (0..11) {
- $long_month_name[$_] =
- strftime("%B", 0, 0, 0, 1, $_, 96);
- }
Note: use locale isn't needed in this example: as a function that exists only to generate locale-dependent results, strftime() always obeys the current LC_TIME locale.
See also I18N::Langinfo and ABDAY_1..ABDAY_7, DAY_1..DAY_7, ABMON_1..ABMON_12, and MON_1..MON_12.
The remaining locale category, LC_MESSAGES (possibly supplemented by others in particular implementations) is not currently used by Perl--except possibly to affect the behavior of library functions called by extensions outside the standard Perl distribution and by the operating system and its utilities. Note especially that the string value of $! and the error messages given by external utilities may be changed by LC_MESSAGES. If you want to have portable error codes, use %!. See Errno.
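For example, a sketch contrasting the locale-dependent message with the portable check (the file name is made up):

```perl
use Errno;   # makes the %! hash available

open my $fh, "<", "/no/such/file" or do {
    print "message (may be localized): $!\n";   # text depends on LC_MESSAGES
    print "portable check: ENOENT\n" if $!{ENOENT};
};
```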
Although the main discussion of Perl security issues can be found in perlsec, a discussion of Perl's locale handling would be incomplete if it did not draw your attention to locale-dependent security issues. Locales--particularly on systems that allow unprivileged users to build their own locales--are untrustworthy. A malicious (or just plain broken) locale can make a locale-aware application give unexpected results. Here are a few possibilities:
Regular expression checks for safe file names or mail addresses using \w may be spoofed by an LC_CTYPE locale that claims that characters such as ">" and "|" are alphanumeric.
String interpolation with case-mapping, as in, say, $dest = "C:\U$name.$ext", may produce dangerous results if a bogus LC_CTYPE case-mapping table is in effect.
A sneaky LC_COLLATE locale could result in the names of students with "D" grades appearing ahead of those with "A"s.
An application that takes the trouble to use information in LC_MONETARY may format debits as if they were credits and vice versa if that locale has been subverted. Or it might make payments in US dollars instead of Hong Kong dollars.
The date and day names in dates formatted by strftime() could be manipulated to advantage by a malicious user able to subvert the LC_TIME locale. ("Look--it says I wasn't in the building on Sunday.")
Such dangers are not peculiar to the locale system: any aspect of an application's environment which may be modified maliciously presents similar challenges. Similarly, they are not specific to Perl: any programming language that allows you to write programs that take account of their environment exposes you to these issues.
Perl cannot protect you from all possibilities shown in the examples--there is no substitute for your own vigilance--but, when use locale is in effect, Perl uses the tainting mechanism (see perlsec) to mark string results that become locale-dependent, and which may be untrustworthy in consequence. Here is a summary of the tainting behavior of operators and functions that may be affected by the locale:
Comparison operators (lt, le, ge, gt and cmp):
Scalar true/false (or less/equal/greater) result is never tainted.
Case-mapping interpolation (with \l, \L, \u or \U):
Result string containing interpolated material is tainted if use locale (but not use locale ':not_characters') is in effect.
Matching operator (m//):
Scalar true/false result never tainted.
Subpatterns, either delivered as a list-context result or as $1 etc., are tainted if use locale (but not use locale ':not_characters') is in effect, and the subpattern regular expression contains \w (to match an alphanumeric character), \W (non-alphanumeric character), \s (whitespace character), or \S (non-whitespace character). The matched-pattern variables $&, $` (pre-match), $' (post-match), and $+ (last match) are also tainted if use locale is in effect and the regular expression contains \w, \W, \s, or \S.
Substitution operator (s///):
Has the same behavior as the match operator. Also, the left operand of =~ becomes tainted when use locale (but not use locale ':not_characters') is in effect, if modified as a result of a substitution based on a regular expression match involving \w, \W, \s, or \S; or of case-mapping with \l, \L, \u or \U.
Output formatting functions (printf() and write()):
Results are never tainted because otherwise even output from print, for example print(1/7), would have to be tainted whenever use locale is in effect.
Case-mapping functions (lc(), lcfirst(), uc(), ucfirst()):
Results are tainted if use locale (but not use locale ':not_characters') is in effect.
POSIX locale-dependent functions (localeconv(), strcoll(), strftime(), strxfrm()):
Results are never tainted.
POSIX character class tests (isalnum(), isalpha(), isdigit(), isgraph(), islower(), isprint(), ispunct(), isspace(), isupper(), isxdigit()):
True/false results are never tainted.
Three examples illustrate locale-dependent tainting. The first program, which ignores its locale, won't run: a value taken directly from the command line may not be used to name an output file when taint checks are enabled.
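A sketch of that first program (run it with perl -T; the variable names are illustrative):

```perl
#!/usr/bin/perl -T
# With taint checks enabled, this open() is fatal: the file name
# comes straight from the command line and is therefore tainted.
my $tainted_output_file = shift;
open my $fh, ">", $tainted_output_file
    or warn "Open of $tainted_output_file failed: $!\n";
```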
The program can be made to run by "laundering" the tainted value through a regular expression: the second example--which still ignores locale information--runs, creating the file named on its command line if it can.
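A sketch of the laundered version:

```perl
#!/usr/bin/perl -T
my $tainted_output_file = shift;

# Laundering: a value captured by a regular expression match is
# considered untainted. No "use locale" is in scope, so \w here
# is the native [A-Za-z0-9_].
my ($untainted_output_file) = $tainted_output_file =~ m{([\w/]+)};
open my $fh, ">", $untainted_output_file
    or warn "Open of $untainted_output_file failed: $!\n";
```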
Compare this with a similar but locale-aware program:
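A sketch of the locale-aware variant:

```perl
#!/usr/bin/perl -T
use locale;   # the only difference from the previous program

my $tainted_output_file = shift;

# Under "use locale", \w depends on the (untrustworthy) LC_CTYPE
# locale, so the captured value remains tainted and the open()
# is again fatal under -T.
my ($untainted_output_file) = $tainted_output_file =~ m{([\w/]+)};
open my $fh, ">", $untainted_output_file
    or warn "Open of $untainted_output_file failed: $!\n";
```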
This third program fails to run because $& is tainted: it is the result of a match involving \w while use locale is in effect.
A string that can suppress Perl's warning about failed locale settings at startup. Failure can occur if the locale support in the operating system is lacking (broken) in some way--or if you mistyped the name of a locale when you set up your environment. If this environment variable is absent, or has a value that does not evaluate to integer zero (that is, anything other than "0" or ""), Perl will complain about locale setting failures.
NOTE: PERL_BADLANG only gives you a way to hide the warning message. The message tells about some problem in your system's locale support, and you should investigate what the problem is.
The following environment variables are not specific to Perl: They are part of the standardized (ISO C, XPG4, POSIX 1.c) setlocale() method for controlling an application's opinion on data.
LC_ALL is the "override-all" locale environment variable. If set, it overrides all the rest of the locale environment variables.
NOTE: LANGUAGE is a GNU extension; it affects you only if you are using the GNU libc. This is the case if you are using e.g. Linux. If you are using "commercial" Unixes you are most probably not using GNU libc and you can ignore LANGUAGE. However, if you are using LANGUAGE: it affects the language of informational, warning, and error messages output by commands (in other words, it's like LC_MESSAGES) but it has higher priority than LC_ALL. Moreover, it's not a single value but instead a "path" (":"-separated list) of languages (not locales). See the GNU gettext library documentation for more information.
In the absence of LC_ALL, LC_CTYPE chooses the character type locale. In the absence of both LC_ALL and LC_CTYPE, LANG chooses the character type locale.
In the absence of LC_ALL, LC_COLLATE chooses the collation (sorting) locale. In the absence of both LC_ALL and LC_COLLATE, LANG chooses the collation locale.
In the absence of LC_ALL, LC_MONETARY chooses the monetary formatting locale. In the absence of both LC_ALL and LC_MONETARY, LANG chooses the monetary formatting locale.
In the absence of LC_ALL, LC_NUMERIC chooses the numeric format locale. In the absence of both LC_ALL and LC_NUMERIC, LANG chooses the numeric format locale.
In the absence of LC_ALL, LC_TIME chooses the date and time formatting locale. In the absence of both LC_ALL and LC_TIME, LANG chooses the date and time formatting locale.
LANG is the "catch-all" locale environment variable. If it is set, it is used as the last resort after the overall LC_ALL and the category-specific LC_....
The LC_NUMERIC locale controls the numeric output:
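A sketch (whether the decimal point actually changes depends on the locale picked up from your environment):

```perl
use POSIX qw(setlocale LC_NUMERIC);
use locale;

setlocale(LC_NUMERIC, "");          # adopt the environment's numeric locale
my $n = 5/2;
printf "half of five is %g\n", $n;  # e.g. "2.5", or "2,5" in a comma-decimal locale
```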
and also how strings are parsed by POSIX::strtod() as numbers:
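And the parsing direction, sketched with a hard-coded comma-decimal string (the comma is parsed as a decimal point only under a locale whose decimal point is ","):

```perl
use POSIX qw(setlocale strtod LC_NUMERIC);
use locale;

setlocale(LC_NUMERIC, "");
# In list context strtod() returns the number and the count of
# unparsed trailing characters.
my ($value, $n_unparsed) = strtod("2,5");
print "parsed as $value ($n_unparsed chars left over)\n";
```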
Versions of Perl prior to 5.004 mostly ignored locale information, generally behaving as if something similar to the "C" locale were always in force, even if the program environment suggested otherwise (see The setlocale function). By default, Perl still behaves this way for backward compatibility. If you want a Perl application to pay attention to locale information, you must use the use locale pragma (see The use locale pragma) or, in the unlikely event that you want to do so for just pattern matching, the /l regular expression modifier (see Character set modifiers in perlre) to instruct it to do so.
Versions of Perl from 5.002 to 5.003 did use the LC_CTYPE information if available; that is, \w did understand what the letters were according to the locale environment variables. The problem was that the user had no control over the feature: if the C library supported locales, Perl used them.
In versions of Perl prior to 5.004, per-locale collation was possible using the I18N::Collate library module. This module is now mildly obsolete and should be avoided in new applications. The LC_COLLATE functionality is now integrated into the Perl core language: one can use locale-specific scalar data completely normally with use locale, so there is no longer any need to juggle with the scalar references of I18N::Collate.
Comparing and sorting by locale is usually slower than the default sorting; slow-downs of two to four times have been observed. It will also consume more memory: once a Perl scalar variable has participated in any string comparison or sorting operation obeying the locale collation rules, it will take 3-15 times more memory than before. (The exact multiplier depends on the string's contents, the operating system and the locale.) These downsides are dictated more by the operating system's implementation of the locale system than by Perl.
If a program's environment specifies an LC_NUMERIC locale and use locale is in effect when the format is declared, the locale is used to specify the decimal point character in formatted output. Formatted output cannot be controlled by use locale at the time when write() is called.
The Unicode CLDR project extracts the POSIX portion of many of its locales, available at
- http://unicode.org/Public/cldr/latest/
There is a large collection of locale definitions at:
- http://std.dkuug.dk/i18n/WG15-collection/locales/
You should be aware that it is unsupported, and is not claimed to be fit for any purpose. If your system allows installation of arbitrary locales, you may find the definitions useful as they are, or as a basis for the development of your own locales.
"Internationalization" is often abbreviated as i18n because its first and last letters are separated by eighteen others. (You may guess why the internalin ... internaliti ... i18n tends to get abbreviated.) In the same way, "localization" is often abbreviated to l10n.
Internationalization, as defined in the C and POSIX standards, can be criticized as incomplete, ungainly, and having too large a granularity. (Locales apply to a whole process, when it would arguably be more useful to have them apply to a single thread, window group, or whatever.) They also have a tendency, like standards groups, to divide the world into nations, when we all know that the world can equally well be divided into bankers, bikers, gamers, and so on.
Support for Unicode is new starting from Perl v5.6, and more fully implemented in v5.8 and later. See perluniintro. It is strongly recommended that when combining Unicode and locale (starting in v5.16), you use
- use locale ':not_characters';
When this form of the pragma is used, only the non-character portions of locales are used by Perl, for example LC_NUMERIC. Perl assumes that you have translated all the characters it is to operate on into Unicode (actually the platform's native character set (ASCII or EBCDIC) plus Unicode). For data in files, this can conveniently be done by also specifying
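The pragma being referred to here is the open pragma with its :locale setting:

```perl
use open ':locale';
```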
This pragma arranges for all inputs from files to be translated into Unicode from the current locale as specified in the environment (see ENVIRONMENT), and all outputs to files to be translated back into the locale. (See open.) On a per-filehandle basis, you can instead use the PerlIO::locale module, or the Encode::Locale module, both available from CPAN. The latter module also has methods to ease the handling of ARGV and environment variables, and can be used on individual strings. Also, if you know that all your locales will be UTF-8, as many are these days, you can use the -C command line switch.
This form of the pragma allows essentially seamless handling of locales with Unicode. The collation order will be Unicode's. It is strongly recommended that when you need to order and sort strings that you use the standard module Unicode::Collate which gives much better results in many instances than you can get with the old-style locale handling.
For pre-v5.16 Perls, or if you use the locale pragma without the :not_characters parameter, Perl tries to work with both Unicode and locales--but there are problems.
Perl does not handle multi-byte locales in this case, such as those used for various Asian languages, such as Big5 or Shift JIS. However, the increasingly common multi-byte UTF-8 locales, if properly implemented, may work reasonably well (depending on your C library implementation) in this form of the locale pragma, simply because both they and Perl store characters that take up multiple bytes the same way.
However, some, if not most, C library implementations may not process the characters in the upper half of the Latin-1 range (128-255) properly under LC_CTYPE. To see whether a character is of a particular type under a locale, Perl uses functions like isalnum(). Your C library may not work for UTF-8 locales with those functions, instead only working under the newer wide library functions like iswalnum().
Perl generally takes the tack of using locale rules on code points that can fit in a single byte, and Unicode rules for those that can't (though this isn't uniformly applied; see the note at the end of this section). This prevents many problems in locales that aren't UTF-8. Suppose the locale is ISO8859-7, Greek. The character at 0xD7 there is a capital Chi. But in the ISO8859-1 locale, Latin1, it is a multiplication sign. The POSIX regular expression character class [[:alpha:]] will magically match 0xD7 in the Greek locale but not in the Latin one.
However, there are places where this breaks down. Certain constructs are for Unicode only, such as \p{Alpha}. They assume that 0xD7 always has its Unicode meaning (or the equivalent on EBCDIC platforms). Since Latin1 is a subset of Unicode and 0xD7 is the multiplication sign in both Latin1 and Unicode, \p{Alpha} will never match it, regardless of locale. A similar issue occurs with \N{...}. It is therefore a bad idea to use \p{} or \N{} under plain use locale--unless you can guarantee that the locale will be ISO8859-1. Use POSIX character classes instead.
Another problem with this approach is that operations that cross the single byte/multiple byte boundary are not well-defined, and so are disallowed. (This boundary is between the code points 255 and 256.) For example, lowercasing LATIN CAPITAL LETTER Y WITH DIAERESIS (U+0178) should return LATIN SMALL LETTER Y WITH DIAERESIS (U+00FF). But in the Greek locale, for example, there is no character at 0xFF, and Perl has no way of knowing what the character at 0xFF is really supposed to represent. Thus it disallows the operation. In this mode, the lowercase of U+0178 is itself.
The same problems ensue if you enable automatic UTF-8-ification of your standard file handles, default open() layer, and @ARGV on non-ISO8859-1, non-UTF-8 locales (by using either the -C command line switch or the PERL_UNICODE environment variable; see perlrun).
Things are read in as UTF-8, which would normally imply a Unicode
interpretation, but the presence of a locale causes them to be interpreted
in that locale instead. For example, a 0xD7 code point in the Unicode
input, which should mean the multiplication sign, won't be interpreted by
Perl that way under the Greek locale. This is not a problem
provided you make certain that all locales will always and only be either
an ISO8859-1, or, if you don't have a deficient C library, a UTF-8 locale.
Vendor locales are notoriously buggy, and it is difficult for Perl to test its locale-handling code because this interacts with code that Perl has no control over; therefore the locale-handling code in Perl may be buggy as well. (However, the Unicode-supplied locales should be better, and there is a feedback mechanism to correct any problems. See Freely available locale definitions.)
If you have Perl v5.16, the problems mentioned above go away if you use the :not_characters parameter to the locale pragma (except for vendor bugs in the non-character portions). If you don't have v5.16, and you do have locales that work, using them may be worthwhile for certain specific purposes, as long as you keep in mind the gotchas already mentioned. For example, if the collation for your locales works, it runs faster under locales than under Unicode::Collate; and you gain access to such things as the local currency symbol and the names of the months and days of the week. (But to hammer home the point, in v5.16, you get this access without the downsides of locales by using the :not_characters form of the pragma.)
Note: The policy of using locale rules for code points that can fit in a byte, and Unicode rules for those that can't, is not uniformly applied. Pre-v5.12, it was somewhat haphazard; in v5.12 it was applied fairly consistently to regular expression matching except for bracketed character classes; in v5.14 it was extended to all regex matches; and in v5.16 to the casing operations such as "\L" and uc(). For collation, in all releases, the system's strxfrm() function is called, and whatever it does is what you get.
In certain systems, the operating system's locale support
is broken and cannot be fixed or used by Perl. Such deficiencies can
and will result in mysterious hangs and/or Perl core dumps when
use locale
is in effect. When confronted with such a system,
please report in excruciating detail to <perlbug@perl.org>, and
also contact your vendor: bug fixes may exist for these problems
in your operating system. Sometimes such bug fixes are called an
operating system upgrade.
I18N::Langinfo, perluniintro, perlunicode, open, isalnum in POSIX, isalpha in POSIX, isdigit in POSIX, isgraph in POSIX, islower in POSIX, isprint in POSIX, ispunct in POSIX, isspace in POSIX, isupper in POSIX, isxdigit in POSIX, localeconv in POSIX, setlocale in POSIX, strcoll in POSIX, strftime in POSIX, strtod in POSIX, strxfrm in POSIX.
Jarkko Hietaniemi's original perli18n.pod heavily hacked by Dominic Dunlop, assisted by the perl5-porters. Prose worked over a bit by Tom Christiansen, and updated by Perl 5 porters.
perllol - Manipulating Arrays of Arrays in Perl
The simplest two-level data structure to build in Perl is an array of arrays, sometimes casually called a list of lists. It's reasonably easy to understand, and almost everything that applies here will also be applicable later on with the fancier data structures.
An array of an array is just a regular old array @AoA that you can get at with two subscripts, like $AoA[3][2]. Here's a declaration of the array:
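A declaration along these lines (the names are sample data, matching the reference version shown just below):

```perl
use v5.10;   # for say()

# Assign a list of array references to an array.
my @AoA = (
    [ "fred", "barney", "pebbles", "bambam", "dino", ],
    [ "george", "jane", "elroy", "judy", ],
    [ "homer", "bart", "marge", "maggie", ],
);
say $AoA[2][1];   # prints "bart"
```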
Now you should be very careful that the outer bracket type is a round one, that is, a parenthesis. That's because you're assigning to an @array, so you need parentheses. If you wanted there not to be an @AoA, but rather just a reference to it, you could do something more like this:
- # assign a reference to array of array references
- $ref_to_AoA = [
- [ "fred", "barney", "pebbles", "bambam", "dino", ],
- [ "george", "jane", "elroy", "judy", ],
- [ "homer", "bart", "marge", "maggie", ],
- ];
- say $ref_to_AoA->[2][1];
- bart
Notice that the outer bracket type has changed, and so our access syntax has also changed. That's because unlike C, in Perl you can't freely interchange arrays and references thereto. $ref_to_AoA is a reference to an array, whereas @AoA is an array proper. Likewise, $AoA[2] is not an array, but an array ref. So how come you can write these:
- $AoA[2][2]
- $ref_to_AoA->[2][2]
instead of having to write these:
- $AoA[2]->[2]
- $ref_to_AoA->[2]->[2]
Well, that's because the rule is that on adjacent brackets only (whether square or curly), you are free to omit the pointer dereferencing arrow. But you cannot do so for the very first one if it's a scalar containing a reference, which means that $ref_to_AoA always needs it.
That's all well and good for declaration of a fixed data structure, but what if you wanted to add new elements on the fly, or build it up entirely from scratch?
First, let's look at reading it in from a file. This is something like adding a row at a time. We'll assume that there's a flat file in which each line is a row and each word an element. If you're trying to develop an @AoA array containing all these, here's the right way to do that:
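A sketch of that loop, using a temporary array (input is whitespace-separated words, one row per line):

```perl
my @AoA;
my @tmp;
while (<>) {
    @tmp = split;            # one row's worth of words
    push @AoA, [ @tmp ];     # copy @tmp into a fresh anonymous array
}
```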
You might also have loaded that from a function:
- for $i ( 1 .. 10 ) {
- $AoA[$i] = [ somefunc($i) ];
- }
Or you might have had a temporary variable sitting around with the array in it.
- for $i ( 1 .. 10 ) {
- @tmp = somefunc($i);
- $AoA[$i] = [ @tmp ];
- }
It's important you make sure to use the [ ]
array reference
constructor. That's because this wouldn't work:
- $AoA[$i] = @tmp; # WRONG!
The reason that doesn't do what you want is that assigning a named array like that to a scalar takes the array in scalar context, which just counts the number of elements in @tmp.
If you are running under use strict
(and if you aren't, why in
the world aren't you?), you'll have to add some declarations to
make it happy:
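Under use strict the loop becomes, roughly:

```perl
use strict;

my(@AoA, @tmp);
while (<>) {
    @tmp = split;
    push @AoA, [ @tmp ];
}
```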
Of course, you don't need the temporary array to have a name at all:
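Without a named temporary, the loop body shrinks to a single push of an anonymous array:

```perl
use strict;

my @AoA;
while (<>) {
    push @AoA, [ split ];
}
```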
You also don't have to use push(). You could just make a direct assignment if you knew where you wanted to put it:
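For example, reading exactly eleven rows by direct assignment (a sketch):

```perl
use strict;

my (@AoA, $i, $line);
for $i ( 0 .. 10 ) {
    $line = <>;
    $AoA[$i] = [ split " ", $line ];
}
```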
or even just
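folding the readline straight into the split:

```perl
use strict;

my (@AoA, $i);
for $i ( 0 .. 10 ) {
    $AoA[$i] = [ split " ", <> ];
}
```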
You should in general be leery of using functions that could potentially return lists in scalar context without explicitly stating such. This would be clearer to the casual reader:
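Making the scalar context explicit, as a sketch:

```perl
use strict;

my (@AoA, $i);
for $i ( 0 .. 10 ) {
    $AoA[$i] = [ split " ", scalar(<>) ];
}
```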
If you wanted to have a $ref_to_AoA variable as a reference to an array, you'd have to do something like this:
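A sketch of the reference-based version:

```perl
use strict;

my $ref_to_AoA = [];
while (<>) {
    push @$ref_to_AoA, [ split ];
}
```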
Now you can add new rows. What about adding new columns? If you're dealing with just matrices, it's often easiest to use simple assignment:
- for $x (1 .. 10) {
- for $y (1 .. 10) {
- $AoA[$x][$y] = func($x, $y);
- }
- }
- for $x ( 3, 7, 9 ) {
- $AoA[$x][20] += func2($x);
- }
It doesn't matter whether those elements are already
there or not: it'll gladly create them for you, setting
intervening elements to undef as need be.
If you wanted just to append to a row, you'd have to do something a bit funnier looking:
- # add new columns to an existing row
- push @{ $AoA[0] }, "wilma", "betty"; # explicit deref
Prior to Perl 5.14, this wouldn't even compile:
- push $AoA[0], "wilma", "betty"; # implicit deref
How come? Because once upon a time, the argument to push() had to be a real array, not just a reference to one. That's no longer true. In fact, the line marked "implicit deref" above works just fine--in this instance--to do what the one that says explicit deref did.
The reason I said "in this instance" is because that only works because $AoA[0] already held an array reference. If you try that on an undefined variable, you'll take an exception. That's because the implicit dereference will never autovivify an undefined variable the way @{ } always will:
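A sketch of the difference (the commented-out line is left as a comment precisely because it dies):

```perl
use v5.14;

my $aref = undef;
# push $aref, qw(some values);   # WRONG: dies -- the implicit deref
#                                # will not autovivify an undef variable
push @$aref, qw(some values);    # ok: @{ } autovivifies $aref
say scalar @$aref;               # prints 2
```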
If you want to take advantage of this new implicit dereferencing behavior, go right ahead: it makes code easier on the eye and wrist. Just understand that older releases will choke on it during compilation. Whenever you make use of something that works only in some given release of Perl and later, but not earlier, you should place a prominent
- use v5.14; # needed for implicit deref of array refs by array ops
directive at the top of the file that needs it. That way when somebody tries to run the new code under an old perl, rather than getting an error like
- Type of arg 1 to push must be array (not array element) at /tmp/a line 8, near ""betty";"
- Execution of /tmp/a aborted due to compilation errors.
they'll be politely informed that
- Perl v5.14.0 required--this is only v5.12.3, stopped at /tmp/a line 1.
- BEGIN failed--compilation aborted at /tmp/a line 1.
Now it's time to print your data structure out. How are you going to do that? Well, if you want only one of the elements, it's trivial:
- print $AoA[0][0];
If you want to print the whole thing, though, you can't say
- print @AoA; # WRONG
because you'll get just references listed, and perl will never automatically dereference things for you. Instead, you have to roll yourself a loop or two. This prints the whole structure, using the shell-style for() construct to loop across the outer set of subscripts.
- for $aref ( @AoA ) {
- say "\t [ @$aref ],";
- }
If you wanted to keep track of subscripts, you might do this:
- for $i ( 0 .. $#AoA ) {
- say "\t elt $i is [ @{$AoA[$i]} ],";
- }
or maybe even this. Notice the inner loop.
- for $i ( 0 .. $#AoA ) {
- for $j ( 0 .. $#{$AoA[$i]} ) {
- say "elt $i $j is $AoA[$i][$j]";
- }
- }
As you can see, it's getting a bit complicated. That's why it's sometimes easier to take a temporary on your way through:
- for $i ( 0 .. $#AoA ) {
- $aref = $AoA[$i];
- for $j ( 0 .. $#{$aref} ) {
- say "elt $i $j is $AoA[$i][$j]";
- }
- }
Hmm... that's still a bit ugly. How about this:
- for $i ( 0 .. $#AoA ) {
- $aref = $AoA[$i];
- $n = @$aref - 1;
- for $j ( 0 .. $n ) {
- say "elt $i $j is $AoA[$i][$j]";
- }
- }
When you get tired of writing a custom print for your data structures, you might look at the standard Dumpvalue or Data::Dumper modules. The former is what the Perl debugger uses, while the latter generates parsable Perl code. For example:
- use v5.14; # using the + prototype, new to v5.14
- sub show(+) {
- require Dumpvalue;
- state $prettily = new Dumpvalue::
- tick => q("),
- compactDump => 1, # comment these two lines out
- veryCompact => 1, # if you want a bigger dump
- ;
- dumpValue $prettily @_;
- }
- # Assign a list of array references to an array.
- my @AoA = (
- [ "fred", "barney" ],
- [ "george", "jane", "elroy" ],
- [ "homer", "marge", "bart" ],
- );
- push $AoA[0], "wilma", "betty";
- show @AoA;
will print out:
- 0 0..3 "fred" "barney" "wilma" "betty"
- 1 0..2 "george" "jane" "elroy"
- 2 0..2 "homer" "marge" "bart"
Whereas if you comment out the two lines I said you might wish to, then it shows it to you this way instead:
- 0 ARRAY(0x8031d0)
- 0 "fred"
- 1 "barney"
- 2 "wilma"
- 3 "betty"
- 1 ARRAY(0x803d40)
- 0 "george"
- 1 "jane"
- 2 "elroy"
- 2 ARRAY(0x803e10)
- 0 "homer"
- 1 "marge"
- 2 "bart"
If you want to get at a slice (part of a row) in a multidimensional array, you're going to have to do some fancy subscripting. That's because while we have a nice synonym for single elements via the pointer arrow for dereferencing, no such convenience exists for slices.
Here's how to do one operation using a loop. We'll assume an @AoA variable as before.
- @part = ();
- $x = 4;
- for ($y = 7; $y < 13; $y++) {
- push @part, $AoA[$x][$y];
- }
That same loop could be replaced with a slice operation:
- @part = @{$AoA[4]}[7..12];
or spaced out a bit:
- @part = @{ $AoA[4] } [ 7..12 ];
But as you might well imagine, this can get pretty rough on the reader.
Ah, but what if you wanted a two-dimensional slice, such as having $x run from 4..8 and $y run from 7 to 12? Hmm... here's the simple way:
- @newAoA = ();
- for ($startx = $x = 4; $x <= 8; $x++) {
- for ($starty = $y = 7; $y <= 12; $y++) {
- $newAoA[$x - $startx][$y - $starty] = $AoA[$x][$y];
- }
- }
We can reduce some of the looping through slices
- for ($x = 4; $x <= 8; $x++) {
- push @newAoA, [ @{ $AoA[$x] } [ 7..12 ] ];
- }
If you were into Schwartzian Transforms, you would probably have selected map for that
- @newAoA = map { [ @{ $AoA[$_] } [ 7..12 ] ] } 4 .. 8;
Although if your manager accused you of seeking job security (or rapid insecurity) through inscrutable code, it would be hard to argue. :-) If I were you, I'd put that in a function:
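One way to wrap it up (the name splice_2D and the sample data are illustrative; the function takes a reference to the AoA plus the inclusive x and y bounds):

```perl
use strict;

# Build a sample 10x15 @AoA so the call below actually runs.
my @AoA = map { my $x = $_; [ map { $x * 100 + $_ } 0 .. 14 ] } 0 .. 9;

sub splice_2D {
    my $lrr = shift;    # ref to an array of array refs!
    my ($x_lo, $x_hi, $y_lo, $y_hi) = @_;
    return map {
        [ @{ $lrr->[$_] }[ $y_lo .. $y_hi ] ]
    } $x_lo .. $x_hi;
}

my @newAoA = splice_2D( \@AoA, 4 => 8, 7 => 12 );
# $newAoA[0][0] is $AoA[4][7], i.e. 407
```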
Tom Christiansen <tchrist@perl.com>
Last update: Tue Apr 26 18:30:55 MDT 2011
perlmacos - Perl under Mac OS (Classic)
For Mac OS X see README.macosx
Perl under Mac OS Classic has not been supported since before Perl 5.10 (April 2004).
When we say "Mac OS" below, we mean Mac OS 7, 8, and 9, and not Mac OS X.
The port of Perl to Mac OS was officially removed as of Perl 5.12, though the last official production release of MacPerl corresponded to Perl 5.6. While Perl 5.10 included the port to Mac OS, ExtUtils::MakeMaker, a core part of Perl's module installation infrastructure, officially dropped support for Mac OS in April 2004.
Perl was ported to Mac OS by Matthias Neeracher <neeracher@mac.com>. Chris Nandor <pudge@pobox.com> continued development and maintenance for the duration of the port's life.
perlmacosx - Perl under Mac OS X
This document briefly describes Perl under Mac OS X.
- curl http://www.cpan.org/src/perl-5.18.2.tar.gz > perl-5.18.2.tar.gz
- tar -xzf perl-5.18.2.tar.gz
- cd perl-5.18.2
- ./Configure -des -Dprefix=/usr/local/
- make
- make test
- sudo make install
The latest Perl release (5.18.2 as of this writing) builds without changes under all versions of Mac OS X from 10.3 "Panther" onwards.
In order to build your own version of Perl you will need 'make', which is part of Apple's developer tools - also known as Xcode. From Mac OS X 10.7 "Lion" onwards, it can be downloaded separately as the 'Command Line Tools' bundle directly from https://developer.apple.com/downloads/ (you will need a free account to log in), or as a part of the Xcode suite, freely available at the App Store. Xcode is a pretty big app, so unless you already have it or really want it, you are advised to get the 'Command Line Tools' bundle separately from the link above. If you want to do it from within Xcode, go to Xcode -> Preferences -> Downloads and select the 'Command Line Tools' option.
Between Mac OS X 10.3 "Panther" and 10.6 "Snow Leopard", the 'Command Line Tools' bundle was called 'unix tools', and was usually supplied with Mac OS install DVDs.
Earlier Mac OS X releases (10.2 "Jaguar" and older) did not include a completely thread-safe libc, so threading is not fully supported. Also, earlier releases included a buggy libdb, so some of the DB_File tests are known to fail on those releases.
The default installation location for this release uses the traditional UNIX directory layout under /usr/local. This is the recommended location for most users, and will leave the Apple-supplied Perl and its modules undisturbed.
Using an installation prefix of '/usr' will result in a directory layout that mirrors that of Apple's default Perl, with core modules stored in '/System/Library/Perl/${version}', CPAN modules stored in '/Library/Perl/${version}', and the addition of '/Network/Library/Perl/${version}' to @INC for modules that are stored on a file server and used by many Macs.
First, export the path to the SDK into the build environment:
- export SDK=/Applications/Xcode.app/Contents/Developer/Platforms/MacOSX.platform/Developer/SDKs/MacOSX10.8.sdk
Please make sure the SDK version (i.e. the numbers right before '.sdk')
matches your system's (in this case, Mac OS X 10.8 "Mountain Lion"), as it is
possible to have more than one SDK installed. Also make sure the path exists
in your system, and if it doesn't please make sure the SDK is properly
installed, as it should come with the 'Command Line Tools' bundle mentioned
above. Finally, if you have an older Mac OS X (10.6 "Snow Leopard" and below)
running Xcode 4.2 or lower, the SDK path might be something like
'/Developer/SDKs/MacOSX10.3.9.sdk'
.
You can use the SDK by exporting some additions to Perl's 'ccflags' and 'ldflags' config variables:
- ./Configure -Accflags="-nostdinc -B$SDK/usr/include/gcc \
- -B$SDK/usr/lib/gcc -isystem$SDK/usr/include \
- -F$SDK/System/Library/Frameworks" \
- -Aldflags="-Wl,-syslibroot,$SDK" \
- -de
Note: From Mac OS X 10.6 "Snow Leopard" onwards, Apple only supports Intel-based hardware. This means you can safely skip this section unless you have an older Apple computer running on ppc or wish to create a perl binary with backwards compatibility.
You can compile perl as a universal binary (built for both ppc and intel). In Mac OS X 10.4 "Tiger", you must export the 'u' variant of the SDK:
- export SDK=/Developer/SDKs/MacOSX10.4u.sdk
Mac OS X 10.5 "Leopard" and above do not require the 'u' variant.
In addition to the compiler flags used to select the SDK, also add the flags for creating a universal binary:
- ./Configure -Accflags="-arch i686 -arch ppc -nostdinc -B$SDK/usr/include/gcc \
- -B$SDK/usr/lib/gcc -isystem$SDK/usr/include \
- -F$SDK/System/Library/Frameworks" \
- -Aldflags="-arch i686 -arch ppc -Wl,-syslibroot,$SDK" \
- -de
Keep in mind that these compiler and linker settings will also be used when building CPAN modules. For XS modules to be compiled as a universal binary, any libraries it links to must also be universal binaries. The system libraries that Apple includes with the 10.4u SDK are all universal, but user-installed libraries may need to be re-installed as universal binaries.
Follow the instructions in INSTALL to build perl with support for 64-bit integers (use64bitint) or both 64-bit integers and 64-bit addressing (use64bitall). In the latter case, the resulting binary will run only on G5-based hosts.
Support for 64-bit addressing is experimental: some aspects of Perl may be
omitted or buggy. Note the messages output by Configure for further
information. Please use perlbug
to submit a problem report in the
event that you encounter difficulties.
When building 64-bit modules, it is your responsibility to ensure that linked
external libraries and frameworks provide 64-bit support: if they do not,
module building may appear to succeed, but attempts to use the module will
result in run-time dynamic linking errors, and subsequent test failures.
You can use file
to discover the architectures supported by a library:
- $ file libgdbm.3.0.0.dylib
- libgdbm.3.0.0.dylib: Mach-O fat file with 2 architectures
- libgdbm.3.0.0.dylib (for architecture ppc): Mach-O dynamically linked shared library ppc
- libgdbm.3.0.0.dylib (for architecture ppc64): Mach-O 64-bit dynamically linked shared library ppc64
Note that this issue precludes the building of many Macintosh-specific CPAN modules (Mac::*), as the required Apple frameworks do not provide PPC64 support. Similarly, downloads from Fink or Darwinports are unlikely to provide 64-bit support; the libraries must be rebuilt from source with the appropriate compiler and linker flags. For further information, see Apple's 64-Bit Transition Guide at http://developer.apple.com/documentation/Darwin/Conceptual/64bitPorting/index.html.
Mac OS X ships with a dynamically-loaded libperl, but the default for this release is to compile a static libperl. The reason for this is pre-binding. Dynamic libraries can be pre-bound to a specific address in memory in order to decrease load time. To do this, one needs to be aware of the location and size of all previously-loaded libraries. Apple collects this information as part of their overall OS build process, and thus has easy access to it when building Perl, but ordinary users would need to go to a great deal of effort to obtain the information needed for pre-binding.
You can override the default and build a shared libperl if you wish (Configure ... -Duseshrplib).
With Mac OS X 10.4 "Tiger" and newer, there is almost no performance penalty for non-prebound libraries. Earlier releases will suffer a greater load time than either the static library, or Apple's pre-bound dynamic library.
In a word - don't, at least not without a *very* good reason. Your scripts can just as easily begin with "#!/usr/local/bin/perl" as with "#!/usr/bin/perl". Scripts supplied by Apple and other third parties as part of installation packages and such have generally only been tested with the /usr/bin/perl that's installed by Apple.
If you find that you do need to update the system Perl, one issue worth keeping in mind is the question of static vs. dynamic libraries. If you upgrade using the default static libperl, you will find that the dynamic libperl supplied by Apple will not be deleted. If both libraries are present when an application that links against libperl is built, ld will link against the dynamic library by default. So, if you need to replace Apple's dynamic libperl with a static libperl, you need to be sure to delete the older dynamic library after you've installed the update.
If you have installed extra libraries such as GDBM through Fink (in other words, you have libraries under /sw/lib), or libdlcompat to /usr/local/lib, you may need to be extra careful when running Configure not to confuse Configure and Perl about which libraries to use. The confusion will show up, for example, as "dyld" errors about symbol problems, for instance during "make test". The safest bet is to run Configure as
- Configure ... -Uloclibpth -Dlibpth=/usr/lib
to make Configure look only into the system libraries. If you have some extra library directories that you really want to use (such as newer Berkeley DB libraries in pre-Panther systems), add those to the libpth:
- Configure ... -Uloclibpth -Dlibpth='/usr/lib /opt/lib'
The default of building Perl statically may cause problems with complex applications like Tk: in that case consider building shared Perl
- Configure ... -Duseshrplib
but remember that there's a startup cost to pay in that case (see above "libperl and Prebinding").
Starting with Tiger (Mac OS X 10.4), Apple shipped broken locale files for the eu_ES locale (Basque-Spain). In previous releases of Perl, this resulted in failures in the lib/locale test. These failures have been suppressed in the current release of Perl by making the test ignore the broken locale. If you need to use the eu_ES locale, you should contact Apple support.
There are two ways to use Cocoa from Perl. Apple's PerlObjCBridge module, included with Mac OS X, can be used by standalone scripts to access Foundation (i.e. non-GUI) classes and objects.
An alternative is CamelBones, a framework that allows access to both Foundation and AppKit classes and objects, so that full GUI applications can be built in Perl. CamelBones can be found on SourceForge, at http://www.sourceforge.net/projects/camelbones/.
Unfortunately it is not that difficult to somehow manage to break one's Mac OS X Perl rather severely. If all else fails and you want to really, REALLY, start from scratch and remove even your Apple Perl installation (which has become corrupted somehow), the following instructions should do it. Please think twice before following these instructions: they are much like conducting brain surgery on yourself. Without anesthesia. We will not come to fix your system if you do this.
First, get rid of the libperl.dylib:
- # cd /System/Library/Perl/darwin/CORE
- # rm libperl.dylib
Then delete every .bundle file found anywhere in the folders:
- /System/Library/Perl
- /Library/Perl
You can find them for example by
- # find /System/Library/Perl /Library/Perl -name '*.bundle' -print
After this you can either copy Perl from your operating system media (you will need at least /System/Library/Perl and /usr/bin/perl), or rebuild Perl from the source code with Configure -Dprefix=/usr -Duseshrplib. NOTE: the -Dprefix=/usr to replace the system Perl works much better with Perl 5.8.1 and later; in Perl 5.8.0 the settings were not quite right.
"Pacifist" from CharlesSoft (http://www.charlessoft.com/) is a nice way to extract the Perl binaries from the OS media, without having to reinstall the entire OS.
This README was written by Sherm Pendley <sherm@dot-app.org>, and subsequently updated by Dominic Dunlop <domo@computer.org> and Breno G. de Oliveira <garu@cpan.org>. The "Starting From Scratch" recipe was contributed by John Montbriand <montbriand@apple.com>.
Last modified 2013-04-29.
perlmod - Perl modules (packages and symbol tables)
Perl provides a mechanism for alternative namespaces to protect
packages from stomping on each other's variables. In fact, there's
really no such thing as a global variable in Perl. The package
statement declares the compilation unit as being in the given
namespace. The scope of the package declaration is from the
declaration itself through the end of the enclosing block, eval,
or file, whichever comes first (the same scope as the my() and
local() operators). Unqualified dynamic identifiers will be in
this namespace, except for those few identifiers that if unqualified,
default to the main package instead of the current one as described
below. A package statement affects only dynamic variables--including
those you've used local() on--but not lexical variables created
with my(). Typically it would be the first declaration in a file
included by the do, require, or use operators. You can
switch into a package in more than one place; it merely influences
which symbol table is used by the compiler for the rest of that
block. You can refer to variables and filehandles in other packages by prefixing the identifier with the package name and a double colon: $Package::Variable. If the package name is null, the main package is assumed. That is, $::sail is equivalent to $main::sail.
The old package delimiter was a single quote, but double colon is now the
preferred delimiter, in part because it's more readable to humans, and
in part because it's more readable to emacs macros. It also makes C++
programmers feel like they know what's going on--as opposed to using the
single quote as separator, which was there to make Ada programmers feel
like they knew what was going on. Because the old-fashioned syntax is still supported for backwards compatibility, if you try to use a string like "This is $owner's house", you'll be accessing $owner::s; that is, the $s variable in package owner, which is probably not what you meant. Use braces to disambiguate, as in "This is ${owner}'s house".
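As a small illustration (Camp and $sail are hypothetical names):
- package Camp;
- our $sail = "mainsail";            # this is $Camp::sail
- package main;
- our $sail = "jib";                 # this is $main::sail, a.k.a. $::sail
- print "$Camp::sail and $::sail\n"; # prints "mainsail and jib"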
Packages may themselves contain package separators, as in $OUTER::INNER::var. This implies nothing about the order of name lookups, however. There are no relative packages: all symbols are either local to the current package, or must be fully qualified from the outer package name down. For instance, there is nowhere within package OUTER that $INNER::var refers to $OUTER::INNER::var. INNER refers to a totally separate global package.
Only identifiers starting with letters (or underscore) are stored in a package's symbol table. All other symbols are kept in package main, including all punctuation variables, like $_. In addition, when unqualified, the identifiers STDIN, STDOUT, STDERR, ARGV, ARGVOUT, ENV, INC, and SIG are forced to be in package main, even when used for other purposes than their built-in ones. If you have a package called m, s, or y, then you can't use the qualified form of an identifier because it would instead be interpreted as a pattern match, a substitution, or a transliteration.
Variables beginning with underscore used to be forced into package main, but we decided it was more useful for package writers to be able to use a leading underscore to indicate private variables and method names. However, variables and functions named with a single _, such as $_ and sub _, are still forced into the package main. See also The Syntax of Variable Names in perlvar.
Eval'ed strings are compiled in the package in which the eval() was compiled. (Assignments to $SIG{}, however, assume the signal handler specified is in the main package. Qualify the signal handler name if you wish to have a signal handler in a package.) For an example, examine perldb.pl in the Perl library. It initially switches to the DB package so that the debugger doesn't interfere with variables in the program you are trying to debug. At various points, however, it temporarily switches back to the main package to evaluate various expressions in the context of the main package (or wherever you came from). See perldebug.
The special symbol __PACKAGE__ contains the current package, but cannot (easily) be used to construct variable names.
See perlsub for other scoping issues related to my() and local(), and perlref regarding closures.
The symbol table for a package happens to be stored in the hash of that name with two colons appended. The main symbol table's name is thus %main::, or %:: for short. Likewise the symbol table for the nested package mentioned earlier is named %OUTER::INNER::.
The value in each entry of the hash is what you are referring to when you use the *name typeglob notation.
- local *main::foo = *main::bar;
You can use this to print out all the variables in a package, for instance. The standard but antiquated dumpvar.pl library and the CPAN module Devel::Symdump make use of this.
The results of creating new symbol table entries directly or modifying any entries that are not already typeglobs are undefined and subject to change between releases of perl.
Assignment to a typeglob performs an aliasing operation, i.e.,
- *dick = *richard;
causes variables, subroutines, formats, and file and directory handles accessible via the identifier richard also to be accessible via the identifier dick. If you want to alias only a particular variable or subroutine, assign a reference instead:
- *dick = \$richard;
This makes $richard and $dick the same variable, but leaves @richard and @dick as separate arrays. Tricky, eh?
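For instance (hypothetical values; strict mode omitted for brevity):
- $richard = "lionheart";
- @richard = (1, 2, 3);
- *dick = \$richard;          # alias only the scalar slot
- print "$dick\n";            # prints "lionheart"
- print scalar(@dick), "\n";  # @dick is still empty: prints 0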
There is one subtle difference between the following statements:
- *foo = *bar;
- *foo = \$bar;
*foo = *bar makes the typeglobs themselves synonymous while *foo = \$bar makes the SCALAR portions of two distinct typeglobs refer to the same scalar value. This means that the following code:
- $bar = 1;
- *foo = \$bar;        # Make $foo an alias for $bar
- {
-     local $bar = 2;  # Restore $bar after the block
-     print $foo;
- }
would print '1', because $foo holds a reference to the original $bar, the one that was stuffed away by local() and which will be restored when the block ends. Because variables are accessed through the typeglob, you can use *foo = *bar to create an alias which can be localized. (But be aware that this means you can't have a separate @foo and @bar, etc.)
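To see the difference when the whole glob is aliased, compare this sketch with the reference-assignment example above:
- $bar = 1;
- *foo = *bar;        # alias the entire typeglob
- {
-     local $bar = 2;
-     print $foo;     # prints '2': $foo and $bar are now the same variable
- }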
What makes all of this important is that the Exporter module uses glob aliasing as the import/export mechanism. Whether or not you can properly localize a variable that has been exported from a module depends on how it was exported:
- @EXPORT = qw($FOO); # Usual form, can't be localized
- @EXPORT = qw(*FOO); # Can be localized
You can work around the first case by using the fully qualified name ($Package::FOO) where you need a local value, or by overriding it by saying *FOO = *Package::FOO in your script.
The *x = \$y mechanism may be used to pass and return cheap references into or from subroutines if you don't want to copy the whole thing. It only works when assigning to dynamic variables, not lexicals.
- %some_hash = ();                   # can't be my()
- *some_hash = fn( \%another_hash );
- sub fn {
-     local *hashsym = shift;
-     # now use %hashsym normally, and you
-     # will affect the caller's %another_hash
-     my %nhash = (); # do what you want
-     return \%nhash;
- }
On return, the reference will overwrite the hash slot in the symbol table specified by the *some_hash typeglob. This is a somewhat tricky way of passing around references cheaply when you don't want to have to remember to dereference variables explicitly.
Another use of symbol tables is for making "constant" scalars.
- *PI = \3.14159265358979;
Now you cannot alter $PI, which is probably a good thing all in all. This isn't the same as a constant subroutine, which is subject to optimization at compile-time. A constant subroutine is one prototyped to take no arguments and to return a constant expression. See perlsub for details on these. The use constant pragma is a convenient shorthand for these.
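For comparison, a minimal sketch of the pragma form:
- use constant PI => 3.14159265358979;
- print PI * 2, "\n";   # PI is a constant subroutine, inlined at compile time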
You can say *foo{PACKAGE} and *foo{NAME} to find out what name and package the *foo symbol table entry comes from. This may be useful in a subroutine that gets passed typeglobs as arguments:
- sub identify_typeglob {
-     my $glob = shift;
-     print 'You gave me ', *{$glob}{PACKAGE}, '::', *{$glob}{NAME}, "\n";
- }
- identify_typeglob *foo;
- identify_typeglob *bar::baz;
This prints
- You gave me main::foo
- You gave me bar::baz
The *foo{THING} notation can also be used to obtain references to the individual elements of *foo. See perlref.
Subroutine definitions (and declarations, for that matter) need not necessarily be situated in the package whose symbol table they occupy. You can define a subroutine outside its package by explicitly qualifying the name of the subroutine:
- package main;
- sub Some_package::foo { ... }  # &foo defined in Some_package
This is just a shorthand for a typeglob assignment at compile time:
- BEGIN { *Some_package::foo = sub { ... } }
and is not the same as writing:
- {
-     package Some_package;
-     sub foo { ... }
- }
In the first two versions, the body of the subroutine is lexically in the main package, not in Some_package. So something like this:
- package main;
- $Some_package::name = "fred";
- $main::name = "barney";
- sub Some_package::foo {
- print "in ", __PACKAGE__, ": \$name is '$name'\n";
- }
- Some_package::foo();
prints:
- in main: $name is 'barney'
rather than:
- in Some_package: $name is 'fred'
This also has implications for the use of the SUPER:: qualifier (see perlobj).
Five specially named code blocks are executed at the beginning and at the end of a running Perl program. These are the BEGIN, UNITCHECK, CHECK, INIT, and END blocks.
These code blocks can be prefixed with sub to give the appearance of a subroutine (although this is not considered good style). One should note that these code blocks don't really exist as named subroutines (despite their appearance). The thing that gives this away is the fact that you can have more than one of these code blocks in a program, and they will all get executed at the appropriate moment. So you can't execute any of these code blocks by name.
A BEGIN code block is executed as soon as possible, that is, the moment it is completely defined, even before the rest of the containing file (or string) is parsed. You may have multiple BEGIN blocks within a file (or eval'ed string); they will execute in order of definition. Because a BEGIN code block executes immediately, it can pull in definitions of subroutines and such from other files in time to be visible to the rest of the compile and run time. Once a BEGIN has run, it is immediately undefined and any code it used is returned to Perl's memory pool.
An END code block is executed as late as possible, that is, after perl has finished running the program and just before the interpreter is being exited, even if it is exiting as a result of a die() function. (But not if it's morphing into another program via exec, or being blown out of the water by a signal--you have to trap that yourself (if you can).) You may have multiple END blocks within a file--they will execute in reverse order of definition; that is: last in, first out (LIFO). END blocks are not executed when you run perl with the -c switch, or if compilation fails.
Note that END code blocks are not executed at the end of a string eval(): if any END code blocks are created in a string eval(), they will be executed just as any other END code block of that package, in LIFO order, just before the interpreter is being exited.
Inside an END code block, $? contains the value that the program is going to pass to exit(). You can modify $? to change the exit value of the program. Beware of changing $? by accident (e.g. by running something via system).
Inside an END block, the value of ${^GLOBAL_PHASE} will be "END".
UNITCHECK, CHECK and INIT code blocks are useful to catch the transition between the compilation phase and the execution phase of the main program.
UNITCHECK blocks are run just after the unit which defined them has been compiled. The main program file and each module it loads are compilation units, as are string evals, run-time code compiled using the (?{ }) construct in a regex, calls to do FILE, require FILE, and code after the -e switch on the command line.
BEGIN and UNITCHECK blocks are not directly related to the phase of the interpreter. They can be created and executed during any phase.
CHECK code blocks are run just after the initial Perl compile phase ends and before the run time begins, in LIFO order. CHECK code blocks are used in the Perl compiler suite to save the compiled state of the program.
Inside of a CHECK block, the value of ${^GLOBAL_PHASE} will be "CHECK".
INIT blocks are run just before the Perl runtime begins execution, in "first in, first out" (FIFO) order.
Inside of an INIT block, the value of ${^GLOBAL_PHASE} will be "INIT".
The CHECK and INIT blocks in code compiled by require, string do, or string eval will not be executed if they occur after the end of the main compilation phase; that can be a problem in mod_perl and other persistent environments which use those functions to load code at runtime.
When you use the -n and -p switches to Perl, BEGIN and END work just as they do in awk, as a degenerate case. Both BEGIN and CHECK blocks are run when you use the -c switch for a compile-only syntax check, although your main code is not.
The begincheck program makes it all clear, eventually:
- #!/usr/bin/perl
- # begincheck
- print "10. Ordinary code runs at runtime.\n";
- END { print "16. So this is the end of the tale.\n" }
- INIT { print " 7. INIT blocks run FIFO just before runtime.\n" }
- UNITCHECK {
- print " 4. And therefore before any CHECK blocks.\n"
- }
- CHECK { print " 6. So this is the sixth line.\n" }
- print "11. It runs in order, of course.\n";
- BEGIN { print " 1. BEGIN blocks run FIFO during compilation.\n" }
- END { print "15. Read perlmod for the rest of the story.\n" }
- CHECK { print " 5. CHECK blocks run LIFO after all compilation.\n" }
- INIT { print " 8. Run this again, using Perl's -c switch.\n" }
- print "12. This is anti-obfuscated code.\n";
- END { print "14. END blocks run LIFO at quitting time.\n" }
- BEGIN { print " 2. So this line comes out second.\n" }
- UNITCHECK {
- print " 3. UNITCHECK blocks run LIFO after each file is compiled.\n"
- }
- INIT { print " 9. You'll see the difference right away.\n" }
- print "13. It merely _looks_ like it should be confusing.\n";
- __END__
There is no special class syntax in Perl, but a package may act as a class if it provides subroutines to act as methods. Such a package may also derive some of its methods from another class (package) by listing the other package name(s) in its global @ISA array (which must be a package global, not a lexical).
For more on this, see perlootut and perlobj.
A module is just a set of related functions in a library file, i.e., a Perl package with the same name as the file. It is specifically designed to be reusable by other modules or programs. It may do this by providing a mechanism for exporting some of its symbols into the symbol table of any package using it, or it may function as a class definition and make its semantics available implicitly through method calls on the class and its objects, without explicitly exporting anything. Or it can do a little of both.
For example, to start a traditional, non-OO module called Some::Module, create a file called Some/Module.pm and start with this template:
- package Some::Module; # assumes Some/Module.pm
- use strict;
- use warnings;
- BEGIN {
- require Exporter;
- # set the version for version checking
- our $VERSION = 1.00;
- # Inherit from Exporter to export functions and variables
- our @ISA = qw(Exporter);
- # Functions and variables which are exported by default
- our @EXPORT = qw(func1 func2);
- # Functions and variables which can be optionally exported
- our @EXPORT_OK = qw($Var1 %Hashit func3);
- }
- # exported package globals go here
- our $Var1 = '';
- our %Hashit = ();
- # non-exported package globals go here
- # (they are still accessible as $Some::Module::stuff)
- our @more = ();
- our $stuff = '';
- # file-private lexicals go here, before any functions which use them
- my $priv_var = '';
- my %secret_hash = ();
- # here's a file-private function as a closure,
- # callable as $priv_func->();
- my $priv_func = sub {
- ...
- };
- # make all your functions, whether exported or not;
- # remember to put something interesting in the {} stubs
- sub func1 { ... }
- sub func2 { ... }
- # this one isn't exported, but could be called directly
- # as Some::Module::func3()
- sub func3 { ... }
- END { ... } # module clean-up code here (global destructor)
- 1; # don't forget to return a true value from the file
Then go on to declare and use your variables in functions without any qualifications. See Exporter and the perlmodlib for details on mechanics and style issues in module creation.
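A script using the template above might look like this (Some::Module, func1, func3, and $Var1 are the hypothetical names from the template):
- use Some::Module;                  # imports func1 and func2 by default
- use Some::Module qw($Var1 func3);  # also request the optional exports
- func1();                           # call an imported function
- Some::Module::func3();             # a fully qualified call works regardless
- print $Var1;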
Perl modules are included into your program by saying
- use Module;
or
- use Module LIST;
This is exactly equivalent to
- BEGIN { require Module; Module->import(); }
or
- BEGIN { require Module; Module->import( LIST ); }
As a special case
- use Module ();
is exactly equivalent to
- BEGIN { require Module; }
All Perl module files have the extension .pm. The use operator
assumes this so you don't have to spell out "Module.pm" in quotes.
This also helps to differentiate new modules from old .pl and
.ph files. Module names are also capitalized unless they're
functioning as pragmas; pragmas are in effect compiler directives,
and are sometimes called "pragmatic modules" (or even "pragmata"
if you're a classicist).
The two statements:
- require SomeModule;
- require "SomeModule.pm";
differ from each other in two ways. In the first case, any double colons in the module name, such as Some::Module, are translated into your system's directory separator, usually "/". The second case does not, and would have to be specified literally. The other difference is that seeing the first require clues in the compiler that uses of indirect object notation involving "SomeModule", as in $ob = purge SomeModule, are method calls, not function calls. (Yes, this really can make a difference.)
Because the use statement implies a BEGIN block, the importing of semantics happens as soon as the use statement is compiled, before the rest of the file is compiled. This is how it is able to function as a pragma mechanism, and also how modules are able to declare subroutines that are then visible as list or unary operators for the rest of the current file. This will not work if you use require instead of use. With require you can get into this problem:
- require Cwd;        # make Cwd:: accessible
- $here = Cwd::getcwd();
- use Cwd;            # import names from Cwd::
- $here = getcwd();
- require Cwd;        # make Cwd:: accessible
- $here = getcwd();   # oops! no main::getcwd()
In general, use Module () is recommended over require Module, because it determines module availability at compile time, not in the middle of your program's execution. An exception would be if two modules each tried to use each other, and each also called a function from that other module. In that case, it's easy to use require instead.
Perl packages may be nested inside other package names, so we can have package names containing ::. But if we used that package name directly as a filename it would make for unwieldy or impossible filenames on some systems. Therefore, if a module's name is, say, Text::Soundex, then its definition is actually found in the library file Text/Soundex.pm.
Perl modules always have a .pm file, but there may also be dynamically linked executables (often ending in .so) or autoloaded subroutine definitions (often ending in .al) associated with the module. If so, these will be entirely transparent to the user of the module. It is the responsibility of the .pm file to load (or arrange to autoload) any additional functionality. For example, although the POSIX module happens to do both dynamic loading and autoloading, the user can say just use POSIX to get it all.
Perl supports a type of threads called interpreter threads (ithreads). These threads can be used explicitly and implicitly.
Ithreads work by cloning the data tree so that no data is shared between different threads. These threads can be used via the threads module or by doing fork() on win32 (fake fork() support). When a thread is cloned, all Perl data is cloned; however, non-Perl data cannot be cloned automatically. Perl after 5.8.0 has support for the CLONE special subroutine. In CLONE you can do whatever you need to do, such as handle the cloning of non-Perl data, if necessary.
CLONE will be called once as a class method for every package that has it defined (or inherits it). It will be called in the context of the new thread, so all modifications are made in the new area. Currently CLONE is called with no parameters other than the invocant package name, but code should not assume that this will remain unchanged, as it is likely that in future extra parameters will be passed in to give more information about the state of cloning.
If you want to CLONE all objects you will need to keep track of them per package. This is simply done using a hash and Scalar::Util::weaken().
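One way to sketch that bookkeeping (My::Class and %registry are hypothetical names):
- package My::Class;
- use Scalar::Util qw(weaken refaddr);
- my %registry;   # refaddr => weak reference to each live object
- sub new {
-     my $class = shift;
-     my $self  = bless {}, $class;
-     weaken( $registry{ refaddr $self } = $self );
-     return $self;
- }
- sub CLONE {
-     my $class = shift;
-     for my $obj ( grep { defined } values %registry ) {
-         # rebuild any non-Perl data held by $obj in the new thread here
-     }
- }
(A fuller version would also re-key %registry inside CLONE, since the cloned objects have new addresses in the new thread.)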
Perl after 5.8.7 has support for the CLONE_SKIP special subroutine. Like CLONE, CLONE_SKIP is called once per package; however, it is called just before cloning starts, and in the context of the parent thread. If it returns a true value, then no objects of that class will be cloned; or rather, they will be copied as unblessed, undef values. For example: if in the parent there are two references to a single blessed hash, then in the child there will be two references to a single undefined scalar value instead.
This provides a simple mechanism for making a module threadsafe; just add sub CLONE_SKIP { 1 } at the top of the class, and DESTROY() will now only be called once per object. Of course, if the child thread needs to make use of the objects, then a more sophisticated approach is needed.
Like CLONE, CLONE_SKIP is currently called with no parameters other than the invocant package name, although that may change. Similarly, to allow for future expansion, the return value should be a single 0 or 1 value.
See perlmodlib for general style issues related to building Perl modules and classes, as well as descriptions of the standard library and CPAN, Exporter for how Perl's standard import/export mechanism works, perlootut and perlobj for in-depth information on creating classes, perlobj for a hard-core reference document on objects, perlsub for an explanation of functions and scoping, and perlxstut and perlguts for more information on writing extension modules.
perlmodinstall - Installing CPAN Modules
You can think of a module as the fundamental unit of reusable Perl code; see perlmod for details. Whenever anyone creates a chunk of Perl code that they think will be useful to the world, they register as a Perl developer at http://www.cpan.org/modules/04pause.html so that they can then upload their code to the CPAN. The CPAN is the Comprehensive Perl Archive Network and can be accessed at http://www.cpan.org/ , and searched at http://search.cpan.org/ .
This documentation is for people who want to download CPAN modules and install them on their own computer.
First, are you sure that the module isn't already on your system? Try perl -MFoo -e 1. (Replace "Foo" with the name of the module; for instance, perl -MCGI::Carp -e 1.)
If you don't see an error message, you have the module. (If you do see an error message, it's still possible you have the module, but that it's not in your path, which you can display with perl -e "print qq(@INC)".) For the remainder of this document, we'll assume that you really honestly truly lack an installed module, but have found it on the CPAN.
So now you have a file ending in .tar.gz (or, less often, .zip). You know there's a tasty module inside. There are four steps you must now take:
Here's how to perform each step for each operating system. This is not a substitute for reading the README and INSTALL files that might have come with your module!
Also note that these instructions are tailored for installing the module into your system's repository of Perl modules, but you can install modules into any directory you wish. For instance, where I say perl Makefile.PL, you can substitute perl Makefile.PL PREFIX=/my/perl_directory to install the modules into /my/perl_directory. Then you can use the modules from your Perl programs with use lib "/my/perl_directory/lib/site_perl"; or sometimes just use "/my/perl_directory";. If you're on a system that requires superuser/root access to install modules into the directories you see when you type perl -e "print qq(@INC)", you'll want to install them into a local directory (such as your home directory) and use this approach.
If you're on a Unix or Unix-like system,
You can use Andreas Koenig's CPAN module ( http://www.cpan.org/modules/by-module/CPAN ) to automate the following steps, from DECOMPRESS through INSTALL.
A. DECOMPRESS
Decompress the file with gzip -d yourmodule.tar.gz
You can get gzip from ftp://prep.ai.mit.edu/pub/gnu/
Or, you can combine this step with the next to save disk space:
- gzip -dc yourmodule.tar.gz | tar -xof -
B. UNPACK
Unpack the result with tar -xof yourmodule.tar
C. BUILD
Go into the newly-created directory and type:
- perl Makefile.PL
- make test
or
- perl Makefile.PL PREFIX=/my/perl_directory
to install it locally. (Remember that if you do this, you'll have to put use lib "/my/perl_directory"; near the top of the program that is to use this module.)
D. INSTALL
While still in that directory, type:
- make install
Make sure you have the appropriate permissions to install the module in your Perl 5 library directory. Often, you'll need to be root.
That's all you need to do on Unix systems with dynamic linking. Most Unix systems have dynamic linking. If yours doesn't, or if for another reason you have a statically-linked perl, and the module requires compilation, you'll need to build a new Perl binary that includes the module. Again, you'll probably need to be root.
If you're running ActivePerl (Win95/98/2K/NT/XP, Linux, Solaris),
First, type ppm from a shell and see whether ActiveState's PPM repository has your module. If so, you can install it with ppm and you won't have to bother with any of the other steps here. You might be able to use the CPAN instructions from the "Unix or Linux" section above as well; give it a try. Otherwise, you'll have to follow the steps below.
- A. DECOMPRESS
You can use the shareware Winzip ( http://www.winzip.com ) to decompress and unpack modules.
- B. UNPACK
If you used WinZip, this was already done for you.
- C. BUILD
You'll need the nmake utility, available at http://download.microsoft.com/download/vc15/Patch/1.52/W95/EN-US/nmake15.exe or dmake, available on CPAN at http://search.cpan.org/dist/dmake/
Does the module require compilation (i.e. does it have files that end in .xs, .c, .h, .y, .cc, .cxx, or .C)? If it does, life is now officially tough for you, because you have to compile the module yourself (no easy feat on Windows). You'll need a compiler such as Visual C++. Alternatively, you can download a pre-built PPM package from ActiveState. http://aspn.activestate.com/ASPN/Downloads/ActivePerl/PPM/
Go into the newly-created directory and type:
- perl Makefile.PL
- nmake test
- D. INSTALL
While still in that directory, type:
- nmake install
If you're using a Macintosh with "Classic" MacOS and MacPerl,
A. DECOMPRESS
First, make sure you have the latest cpan-mac distribution ( http://www.cpan.org/authors/id/CNANDOR/ ), which has utilities for doing all of the steps. Read the cpan-mac directions carefully and install it. If you choose not to use cpan-mac for some reason, there are alternatives listed here.
After installing cpan-mac, drop the module archive on the untarzipme droplet, which will decompress and unpack for you.
Or, you can either use the shareware StuffIt Expander program ( http://my.smithmicro.com/mac/stuffit/ ) or the freeware MacGzip program ( http://persephone.cps.unizar.es/general/gente/spd/gzip/gzip.html ).
B. UNPACK
If you're using untarzipme or StuffIt, the archive should be extracted now. Or, you can use the freeware suntar or Tar ( http://hyperarchive.lcs.mit.edu/HyperArchive/Archive/cmp/ ).
C. BUILD
Check the contents of the distribution. Read the module's documentation, looking for reasons why you might have trouble using it with MacPerl. Look for .xs and .c files, which normally denote that the distribution must be compiled, and you cannot install it "out of the box." (See PORTABILITY.)
D. INSTALL
If you are using cpan-mac, just drop the folder on the installme droplet, and use the module.
Or, if you aren't using cpan-mac, do some manual labor.
Make sure the newlines for the modules are in Mac format, not Unix format. If they are not then you might have decompressed them incorrectly. Check your decompression and unpacking utilities settings to make sure they are translating text files properly.
As a last resort, you can use the perl one-liner:
- perl -i.bak -pe 's/(?:\015)?\012/\015/g' <filenames>
on the source files.
Then move the files (probably just the .pm files, though there may be some additional ones, too; check the module documentation) to their final destination: this will most likely be in $ENV{MACPERL}site_lib: (i.e., HD:MacPerl folder:site_lib:). You can add new paths to the default @INC in the Preferences menu item in the MacPerl application ($ENV{MACPERL}site_lib: is added automagically). Create whatever directory structures are required (i.e., for Some::Module, create $ENV{MACPERL}site_lib:Some: and put Module.pm in that directory).
Then run the following script (or something like it):
If you're on the DJGPP port of DOS,
- A. DECOMPRESS
djtarx ( ftp://ftp.delorie.com/pub/djgpp/current/v2/ ) will both uncompress and unpack.
- B. UNPACK
See above.
- C. BUILD
Go into the newly-created directory and type:
- perl Makefile.PL
- make test
You will need the packages mentioned in README.dos in the Perl distribution.
- D. INSTALL
While still in that directory, type:
- make install
You will need the packages mentioned in README.dos in the Perl distribution.
If you're on OS/2,
Get the EMX development suite and gzip/tar, from either Hobbes ( http://hobbes.nmsu.edu ) or Leo ( http://www.leo.org ), and then follow the instructions for Unix.
If you're on VMS,
When downloading from CPAN, save your file with a .tgz extension instead of .tar.gz. All other periods in the filename should be replaced with underscores. For example, Your-Module-1.33.tar.gz should be downloaded as Your-Module-1_33.tgz.
A. DECOMPRESS
Type
- gzip -d Your-Module.tgz
or, for zipped modules, type
- unzip Your-Module.zip
Executables for gzip, zip, and VMStar:
- http://www.hp.com/go/openvms/freeware/
and their source code:
- http://www.fsf.org/order/ftp.html
Note that GNU's gzip/gunzip is not the same as Info-ZIP's zip/unzip package. The former is a simple compression tool; the latter permits creation of multi-file archives.
B. UNPACK
If you're using VMStar:
- VMStar xf Your-Module.tar
Or, if you're fond of VMS command syntax:
- tar/extract/verbose Your_Module.tar
C. BUILD
Make sure you have MMS (from Digital) or the freeware MMK (available from MadGoat at http://www.madgoat.com). Then type this to create the DESCRIP.MMS for the module:
- perl Makefile.PL
Now you're ready to build:
- mms test
Substitute mmk for mms above if you're using MMK.
D. INSTALL
Type
- mms install
Substitute mmk for mms above if you're using MMK.
If you're on MVS,
Introduce the .tar.gz file into an HFS as binary; don't translate from ASCII to EBCDIC.
A. DECOMPRESS
Decompress the file with gzip -d yourmodule.tar.gz
You can get gzip from http://www.s390.ibm.com/products/oe/bpxqp1.html
B. UNPACK
Unpack the result with
- pax -o to=IBM-1047,from=ISO8859-1 -r < yourmodule.tar
The BUILD and INSTALL steps are identical to those for Unix. Some modules generate Makefiles that work better with GNU make, which is available from http://www.mks.com/s390/gnu/
Note that not all modules will work on all platforms. See perlport for more information on portability issues. Read the documentation to see if the module will work on your system. There are basically three categories of modules that will not work "out of the box" with all platforms (with some possibility of overlap):
Those that should, but don't. These need to be fixed; consider contacting the author and possibly writing a patch.
Those that need to be compiled, where the target platform doesn't have compilers readily available. (These modules contain .xs or .c files, usually.) You might be able to find existing binaries on the CPAN or elsewhere, or you might want to try getting compilers and building it yourself, and then release the binary for other poor souls to use.
Those that are targeted at a specific platform. (Such as the Win32:: modules.) If the module is targeted specifically at a platform other than yours, you're out of luck, most likely.
Check the CPAN Testers if a module should work with your platform but it doesn't behave as you'd expect, or you aren't sure whether or not a module will work under your platform. If the module you want isn't listed there, you can test it yourself and let CPAN Testers know, you can join CPAN Testers, or you can request it be tested.
- http://testers.cpan.org/
If you have any suggested changes for this page, let me know. Please don't send me mail asking for help on how to install your modules. There are too many modules, and too few Orwants, for me to be able to answer or even acknowledge all your questions. Contact the module author instead, or post to comp.lang.perl.modules, or ask someone familiar with Perl on your operating system.
Jon Orwant
orwant@medita.mit.edu
with invaluable help from Chris Nandor, and valuable help from Brandon Allbery, Charles Bailey, Graham Barr, Dominic Dunlop, Jarkko Hietaniemi, Ben Holzman, Tom Horsley, Nick Ing-Simmons, Tuomas J. Lukka, Laszlo Molnar, Alan Olsen, Peter Prymmer, Gurusamy Sarathy, Christoph Spalinger, Dan Sugalski, Larry Virden, and Ilya Zakharevich.
First version July 22, 1998; last revised November 21, 2001.
Copyright (C) 1998, 2002, 2003 Jon Orwant. All Rights Reserved.
This document may be distributed under the same terms as Perl itself.
perlmodlib - constructing new Perl modules and finding existing ones
Many modules are included in the Perl distribution. These are described below, and all end in .pm. You may discover compiled library files (usually ending in .so) or small pieces of modules to be autoloaded (ending in .al); these were automatically generated by the installation process. You may also discover files in the library directory that end in either .pl or .ph. These are old libraries supplied so that old programs that use them still run. The .pl files will all eventually be converted into standard modules, and the .ph files made by h2ph will probably end up as extension modules made by h2xs. (Some .ph values may already be available through the POSIX, Errno, or Fcntl modules.) The pl2pm file in the distribution may help in your conversion, but it's just a mechanical process and therefore far from bulletproof.
They work somewhat like compiler directives (pragmata) in that they tend to affect the compilation of your program, and thus will usually work well only when used within a use or no statement. Most of these are lexically scoped, so an inner BLOCK may countermand them by saying, for example:
- no integer;
- no strict 'refs';
- no warnings;
which lasts until the end of that BLOCK.
Some pragmas are lexically scoped--typically those that affect the $^H hints variable. Others affect the current package instead, like use vars and use subs, which allow you to predeclare variables or subroutines within a particular file rather than just a block. Such declarations are effective for the entire file in which they were declared. You cannot rescind them with no vars or no subs.
The following pragmas are defined (and have their own documentation).
Set indexing base via $[
Get/set subroutine or variable attributes
Replace functions with ones that succeed or die with lexical scope
Exceptions from autodying functions.
Exceptions from autodying system().
Provide hints about user subroutines to autodie
Postpone load of modules until a function is used
Establish an ISA relationship with base classes at compile time
Transparent BigInteger support for Perl
Transparent BigNumber support for Perl
Transparent BigNumber/BigRational support for Perl
Use MakeMaker's uninstalled version of a package
Force byte semantics rather than character semantics
Access to Unicode character names and named character sequences; also define character names
Declare constants
Perl pragma for deprecating the core version of a module
Produce verbose warning diagnostics
Allows you to write your script in non-ascii or non-utf8
Warn on implicit encoding conversions
Enable new features
Compile-time class fields
Control the filetest permission operators
use a Perl module if a condition holds
Use modules bundled in inc/ if they are newer than installed ones
Use integer arithmetic instead of floating point
Request less of something
Manipulate @INC at compile time
Use or avoid POSIX locales for built-in operations
Method Resolution Order
Set default PerlIO layers for input and output
Restrict unsafe operations when compiling
Package for overloading Perl operations
Lexically control overloading
Establish an ISA relationship with base classes at compile time
Look up Perl documentation in Pod format.
Frequently asked questions about Perl
General Questions About Perl
Obtaining and Learning about Perl
Programming Tools
Data Manipulation
Files and Formats
Regular Expressions
General Perl Language Issues
System Interaction
Web, Email and Networking
Perl builtin functions
Perl Glossary
Plain Old Documentation: format specification and notes
Perl predefined variables
XS language reference manual
Tutorial for writing XSUBs
Perl XS C/Perl type mapping
Alter regular expression behaviour
Enable simple signal handling
Control sort() behaviour
Restrict unsafe constructs
Predeclare sub names
Perl interpreter-based threads
Perl extension for sharing data structures between threads
Enable/disable UTF-8 (or UTF-EBCDIC) in source code
Predeclare global variable names
Perl extension for Version Objects
Control VMS-specific language features
Control optional warnings
Warnings import function
Standard, bundled modules are all expected to behave in a well-defined manner with respect to namespace pollution because they use the Exporter module. See their own documentation for details.
It's possible that not all modules listed below are installed on your system. For example, the GDBM_File module will not be installed if you don't have the gdbm library.
Provide framework for multiple DBMs
Easily interact with CPAN from the command line
Implements the prove command.
State storage for the prove command.
Individual test suite results.
Individual test results.
A generic archive extracting mechanism
Module for manipulations of tar archives
A subclass for in-memory extracted file from Archive::Tar
Simpler definition of attribute handlers
Load subroutines only on demand
Split a package for autoloading
The Perl Compiler Backend
Walk Perl syntax tree, printing concise info about ops
Walk Perl syntax tree, printing debug info about ops
Perl compiler backend to produce perl code
Perl lint
Adds debugging stringification to B::
Show lexical variables used in functions or files
Walk Perl syntax tree, printing terse info about ops
Generates cross reference reports for Perl programs
Benchmark running times of Perl code
Networking constants and support functions
Handle Common Gateway Interface requests and responses
Backward compatibility module for CGI.pm
CGI routines for writing to the HTTPD (or other) error log
Interface to HTTP Cookies
CGI Interface for Fast CGI
Module to produce nicely formatted HTML code
Simple Interface to Server Push
Backward compatibility module for defunct CGI::Switch
Internal utilities used by CGI module
Namespace for Perl's core routines
Query, download and build perl modules from CPAN sites
A recipe book for programming with CPAN.pm
Internal debugging for CPAN.pm
Read and match distroprefs
Utility for CPAN::Config file Initialization
Internal configuration handling for CPAN.pm
Interface between CPAN.pm and Kwalify.pm
The distribution metadata for a CPAN dist
Convert CPAN distribution metadata structures
An optional feature provided by a CPAN distribution
History of CPAN Meta Spec changes
A set of distribution prerequisites by phase and type
A set of version requirements for a CPAN dist
Specification for CPAN distribution metadata
Validate CPAN distribution metadata structures
Read and write a subset of YAML for CPAN Meta files
Wrapper around CPAN.pm without using any XS module
Internal queue support for CPAN.pm
Internal handling of tar archives for CPAN.pm
Utility functions to compare CPAN versions
API & CLI access to the CPAN mirrors
Programmer's interface to CPANPLUS
Return value objects
Configuration defaults and heuristics for CPANPLUS
Set the environment for the CPANPLUS base dir
Configuration for CPANPLUS
Base class for plugins
Distribution class for installation snapshots
Base class for custom distribution classes
CPANPLUS plugin to install packages that use Build.PL
Constants for CPANPLUS::Dist::Build
Distribution class for MakeMaker related modules
Sample code to create your own Dist::* plugin
Error handling for CPANPLUS
CPANPLUS Frequently Asked Questions
Developing CPANPLUS
CPANPLUS internals
Internals for archive extraction
Internals for fetching files
Internals for sending test reports
Internals for searching for modules
Internals for updating source files
In memory implementation
SQLite implementation
Convenience functions for CPANPLUS
CPAN module objects for CPANPLUS
CPAN author object for CPANPLUS
Dummy author object for CPANPLUS
Checking the checksum of a distribution
Fake module object for internal use
Self-updating for CPANPLUS
Base class for CPANPLUS shells
CPAN.pm emulation for CPANPLUS
The default CPANPLUS shell
Add custom sources to CPANPLUS
Documentation on how to write your own plugins
Connect to a remote CPANPLUS
Read in CPANPLUS commands
Alternative warn and die for modules
Declare struct-like datatypes as Perl classes
Low-Level Interface to bzip2 compression library
Low-Level Interface to zlib compression library
Interface to zlib compression library
Access Perl configuration information
Structured data retrieval of perl -V output
Get pathname of current working directory
Programmatic interface to the Perl debugging API
Filter DBM keys/values
Filter for DBM_Filter
Filter for DBM_Filter
Filter for DBM_Filter
Filter for DBM_Filter
Filter for DBM_Filter
Perl5 access to Berkeley DB version 1.x
Stringified perl data structures, suitable for both printing and eval
Find all the inner packages of a package
Perl/Pollution/Portability
A data debugging tool for the XS programmer
Generate stubs for a SelfLoading module
Modules that calculate message digests
Perl interface to the MD5 Algorithm
Perl extension for SHA-1/224/256/384/512
Digest base class
Calculate digests of files
Supply object methods for directory handles
Provides screen dump of Perl data.
Dynamically load C libraries into Perl code
Character encodings in Perl
Alias definitions to encodings
Single Byte Encodings
Internally used by Encode::??::ISO_2022_*
China-based Chinese Encodings
Internally used by Encode::CN
Internally used by Encode
EBCDIC Encodings
Object Oriented Encoder
Encode Implementation Base Class
ETSI GSM 03.38 Encoding
Guesses encoding from data
Japanese Encodings
Internally used by Encode::JP::2022_JP*
Internally used by Encode::JP
Korean Encodings
Internally used by Encode::KR
MIME 'B' and 'Q' header encoding
Internally used by Encode
A detailed document on Encode and PerlIO
Encodings supported by Encode
Symbol Encodings
Taiwan-based Chinese Encodings
Various Unicode Transformation Formats
UTF-7 encoding
Use nice English (or awk) names for ugly punctuation variables
Perl module that imports environment variables as scalars or arrays
System errno constants
Implements default import method for modules
Exporter guts
Compile and link C code for Perl modules
Builder class for Windows platforms
Utilities to replace common UNIX commands in Makefiles etc.
Commands for the MM's to use in Makefiles
Generate XS code to import C header constants
Base class for ExtUtils::Constant objects
Helper functions for ExtUtils::Constant
Generate C code for XS modules' constants.
Utilities for embedding Perl in C/C++ applications
Install files from here to there
Inventory management of installed modules
Determine libraries to use and how to use them
OS adjusted ExtUtils::MakeMaker subclass
AIX specific subclass of ExtUtils::MM_Unix
Platform-agnostic MM methods
Methods to override UN*X behaviour in ExtUtils::MakeMaker
Methods to override UN*X behaviour in ExtUtils::MakeMaker
DOS specific subclass of ExtUtils::MM_Unix
Special behaviors for OS X
Once produced Makefiles for MacOS Classic
Methods to override UN*X behaviour in ExtUtils::MakeMaker
Methods to override UN*X behaviour in ExtUtils::MakeMaker
QNX specific subclass of ExtUtils::MM_Unix
U/WIN specific subclass of ExtUtils::MM_Unix
Methods used by ExtUtils::MakeMaker
Methods to override UN*X behaviour in ExtUtils::MakeMaker
VOS specific subclass of ExtUtils::MM_Unix
Methods to override UN*X behaviour in ExtUtils::MakeMaker
Method to customize MakeMaker for Win9X
ExtUtils::MakeMaker subclass for customization
Create a module Makefile
Wrapper around Config.pm
Frequently Asked Questions About MakeMaker
Writing a module with MakeMaker
Utilities to write and check a MANIFEST file
Make a bootstrap file for use by DynaLoader
Write linker options files for dynamic extension
Manage .packlist files
Converts Perl XS code into C code
Initialization values for some globals
Subroutines used with ExtUtils::ParseXS
Read/Write/Modify Perl/XS typemap files
Quick commands for handling typemaps
Entry in the INPUT section of a typemap
Entry in the OUTPUT section of a typemap
Entry in the TYPEMAP section of a typemap
Keep sets of symbol names palatable to the VMS linker
Add blib/* directories to @INC
Replace functions with equivalents which succeed or die
Load the C Fcntl.h defines
Parse file paths into directory, filename and suffix.
Run many filetest checks on a tree
Compare files or filehandles
Copy files or filehandles
DOS like globbing and then some
A generic file fetching mechanism
Traverse a directory tree.
Perl extension for BSD glob routine
Extend File Glob to Allow Input and Output Files
Create or remove directory trees
Portably perform operations on file names
Methods for Cygwin file specs
Methods for Epoc file specs
Portably perform operations on file names
File::Spec for Mac OS (Classic)
Methods for OS/2 file specs
File::Spec for Unix, base for other File::Spec modules
Methods for VMS file specs
Methods for Win32 file specs
Return name and handle of a temporary file safely
By-name interface to Perl's built-in stat() functions
Keep more files open than the system permits
Supply object methods for filehandles
Simplified source filtering
Perl Source Filter Utility Module
Locate directory of original perl script
Perl5 access to the gdbm library.
Extended processing of command line options
Process single-character switches with switch clustering
A small, simple, correct HTTP/1.1 client
A selection of general-utility hash subroutines
Support for Inside-Out Classes
Compare 8-bit scalar data according to the current locale
Functions for dealing with RFC3066-style language tags
Detect the user's language preferences
Tags and names for human languages
Query locale information
Load various IO modules
Base Class for IO::Compress modules
Write bzip2 files/buffers
Write RFC 1950 files/buffers
Frequently Asked Questions about IO::Compress
Write RFC 1952 files/buffers
Write RFC 1951 files/buffers
Write zip files/buffers
Supply object methods for directory handles
Supply object methods for filehandles
Supply object methods for I/O handles
Supply object methods for pipes
Object interface to system poll call
Supply seek based methods for I/O objects
OO interface to the select system call
Object interface to socket communications
Object interface for AF_INET domain sockets
Object interface for AF_UNIX domain sockets
Uncompress zlib-based (zip, gzip) file/buffer
Uncompress gzip, zip, bzip2 or lzop file/buffer
Base Class for IO::Uncompress modules
Read bzip2 files/buffers
Read RFC 1952 files/buffers
Read RFC 1950 files/buffers
Read RFC 1951 files/buffers
Read zip files/buffers
IO:: style interface to Compress::Zlib
Finding and running system commands made easy
SysV Msg IPC object class
Open a process for both reading and writing using open2()
Open a process for reading, writing, and error handling using open3()
SysV Semaphore IPC object class
SysV Shared Memory IPC object class
System V IPC constants and system calls
JSON::XS compatible pure-Perl module.
Dummy module providing JSON::PP::Boolean
A selection of general-utility list subroutines
Indicate if List::Util was compiled with a C compiler
A distribution of modules to handle locale codes
A description of the callable function in each module
Details changes to Locale::Codes
Constants for Locale codes
Standard codes for country identification
Country codes for the Locale::Codes::Country module
Retired country codes for the Locale::Codes::Country module
Standard codes for currency identification
Currency codes for the Locale::Codes::Currency module
Retired currency codes for the Locale::Codes::Currency module
Standard codes for language extension identification
Langext codes for the Locale::Codes::LangExt module
Retired langext codes for the Locale::Codes::LangExt module
Standard codes for language extension identification
Langfam codes for the Locale::Codes::LangFam module
Retired langfam codes for the Locale::Codes::LangFam module
Standard codes for language variation identification
Langvar codes for the Locale::Codes::LangVar module
Retired langvar codes for the Locale::Codes::LangVar module
Standard codes for language identification
Language codes for the Locale::Codes::Language module
Retired language codes for the Locale::Codes::Language module
Standard codes for script identification
Script codes for the Locale::Codes::Script module
Retired script codes for the Locale::Codes::Script module
Standard codes for country identification
Standard codes for currency identification
Standard codes for language identification
Framework for localization
Recipes for using Locale::Maketext
Deprecated module to load Locale::Maketext utf8 code
Deprecated module to load Locale::Maketext utf8 code
Simple interface to Locale::Maketext::Lexicon
Article about software localization
Standard codes for script identification
A generic message storing mechanism
Configuration options for Log::Message
Message handlers for Log::Message
Message objects for Log::Message
Simplified interface to Log::Message
Encoding and decoding of base64 strings
Encoding and decoding of quoted-printable strings
Arbitrary size floating point math package
Arbitrary size integer/float math package
Pure Perl module to support Math::BigInt
Emulate low-level math with BigInt code
Math::BigInt::Calc with some XS for more speed
Arbitrary big rational numbers
Complex numbers and associated mathematical functions
Trigonometric functions
Make functions faster by trading space for time
Glue to provide EXISTS for AnyDBM_File for Storable use
Plug-in module for automatic expiration of memoized values
Test for Memoize expiration semantics
Test for Memoize expiration semantics
Glue to provide EXISTS for NDBM_File for Storable use
Glue to provide EXISTS for SDBM_File for Storable use
Store Memoized data in Storable database
Build and install Perl modules
API Reference for Module Authors
Authoring Module::Build modules
Default methods for Module::Build
How to bundle Module::Build with a distribution
Compatibility with ExtUtils::MakeMaker
Configuration for Module::Build
Examples of Module::Build Usage
DEPRECATED
Create persistent distribution configuration modules
Perl Package Manager file creation
Builder class for Amiga platforms
Stub class for unknown platforms
Builder class for EBCDIC platforms
Builder class for MPEiX platforms
Builder class for MacOS platforms
Builder class for RiscOS platforms
Builder class for Unix platforms
Builder class for VMS platforms
Builder class for VOS platforms
Builder class for Windows platforms
Builder class for AIX platform
Builder class for Cygwin platform
Builder class for Mac OS X platform
Builder class for OS/2 platform
DEPRECATED
DEPRECATED
What modules shipped with versions of perl
What utilities shipped with versions of perl
Runtime require of both modules and files
Looking up module information / loading at runtime
Mark modules as loaded or unloaded
Gather package and POD information from perl module files
Automatically give your module the ability to have plugins
Automatically give your module the ability to have plugins
Tied access to ndbm files
Provide a pseudo-class NEXT (et al) that allows method redispatch
Network Command class (as used by FTP, SMTP etc)
Local configuration data for libnet
Attempt to evaluate the current host's internet name and domain
FTP Client class
NNTP Client class
OO interface to users netrc file
Post Office Protocol 3 Client class (RFC1939)
Check a remote host for reachability
Simple Mail Transfer Protocol Client
Time and daytime network client interface
By-name interface to Perl's built-in gethost*() functions
Libnet Frequently Asked Questions
By-name interface to Perl's built-in getnet*() functions
By-name interface to Perl's built-in getproto*() functions
By-name interface to Perl's built-in getserv*() functions
Generic interface to Perl Compiler backends
Tied access to odbm files
Interface to create per object accessors
Disable named opcodes when compiling perl code
Perl interface to IEEE Std 1003.1
List all constants declared in a package
A generic input parsing/checking mechanism.
Parse META.yml and META.json CPAN metadata files
Map Perl operating system names to generic types
On demand loader for PerlIO layers and root of PerlIO::* name space
Encoding layer
Memory mapped IO
In-memory IO, scalar IO
Helper class for PerlIO layers implemented in perl
PerlIO layer for quoted-printable strings
For resolving Pod E<...> sequences
Group Perl's functions a la perlfunc.pod
Module to convert pod files to HTML
Convert Pod data to formatted Latex
Convert POD data to formatted *roff input
Parse an L<> formatting code in POD text
Look up Perl documentation in Pod format.
Base for Pod::Perldoc formatters
Customized option parser for Pod::Perldoc
Render Pod with ANSI color escapes
Let Perldoc check Pod for errors
Let Perldoc render Pod as man pages
Let Perldoc convert Pod to nroff
Let Perldoc render Pod as ... Pod!
Let Perldoc render Pod as RTF
Render Pod with terminal escapes
Let Perldoc render Pod as plaintext
Let Perldoc use Tk::Pod to render Pod
Let Perldoc render Pod as XML
Framework for parsing Pod
Check the Pod syntax of a document
Put Pod::Simple into trace/debug mode
Dump Pod-parsing events as text
Turn Pod into XML
Convert Pod to HTML
Convert several Pod files to several HTML files
Represent "section" attributes of L codes
Turn Pod::Simple events into method calls
A pull-parser interface to parsing Pod
End-tokens from Pod::Simple::PullParser
Start-tokens from Pod::Simple::PullParser
Text-tokens from Pod::Simple::PullParser
Tokens from Pod::Simple::PullParser
Format Pod as RTF
Find POD documents in directory trees
Parse Pod into a simple parse tree
Write a formatter as a Pod::Simple subclass
Format Pod as plaintext
Get the text content of Pod
Format Pod as validating XHTML
Turn Pod into XML
Convert POD data to formatted ASCII text
Convert POD data to formatted color ASCII text
Convert POD data to ASCII text with format escapes
Tied access to sdbm files
Compile and execute code in restricted compartments
A selection of general-utility scalar subroutines
Look - search for key in dictionary file
Save and restore selected file handle
Load functions only on demand
Persistence for Perl data structures
Manipulate Perl symbols and their names
Try every conceivable way to get hostname
Perl interface to the UNIX syslog(3) calls
Win32 support for Sys::Syslog
Base class that provides common functionality to TAP::Parser
Base class for harness output delegates
Run Perl test scripts with color
Harness output delegate for default console output
Harness output delegate for parallel console output
Harness output delegate for default console output
Harness output delegate for file output
Harness output delegate for file output
Abstract base class for harness output delegate
Run test scripts with statistics
Base class that provides common functionality to all TAP::* modules
Parse TAP output
Aggregate TAP::Parser results
A grammar for the Test Anything Protocol.
Base class for TAP source iterators
Iterator for array-based TAP sources
Iterator for process-based TAP sources
Iterator for filehandle-based TAP sources
Figures out which SourceHandler objects to use for a given Source
Multiplex multiple TAP::Parsers
Base class for TAP::Parser output objects
Bailout result token.
Comment result token.
Plan result token.
TAP pragma token.
Test result token.
Unknown result token.
TAP syntax version token.
YAML result token.
Factory for creating TAP::Parser output objects
Schedule tests during parallel testing
A single testing job.
A no-op job.
A TAP source & meta data about it
Base class for different TAP source handlers
Stream output from an executable TAP source
Stream TAP from a text file.
Stream TAP from an IO::Handle or a GLOB.
Stream TAP from a Perl executable
Stream output from raw TAP in a scalar/array ref.
Internal TAP::Parser utilities
Read YAMLish data from iterator
Write YAMLish data
Color screen output using ANSI escape sequences
Perl termcap interface
Perl word completion module
Perl interface to various readline packages.
Term::ReadLine UI made easy
History function
Provides a simple framework for writing test scripts
Backend for building test libraries
Base class for test modules
Test testsuites that have been built with
Turn on colour in Test::Builder::Tester
Run Perl standard test scripts with statistics
Yet another framework for writing test scripts
Basic utilities for writing tests.
A tutorial about writing really basic tests
Abbrev - create an abbreviation table from a list
Extract delimited text sequences from strings.
Parse text into an array of tokens or array of arrays
Implementation of the soundex algorithm.
Expand and unexpand tabs like unix expand(1) and unexpand(1)
Line wrapping to form simple paragraphs
Manipulate threads in Perl (for old code only)
Thread-safe queues
Thread-safe semaphores
Base class for tied arrays
Access the lines of a disk file via a Perl array
Base class definitions for tied handles
Base class definitions for tied hashes
Named regexp capture buffers
Add data to hash when needed
Use references as hash keys
Base class definitions for tied scalars
Base class definitions for tied handles
Fixed-table-size, fixed-key-length hashing
High resolution alarm, sleep, gettimeofday, interval timers
Efficiently compute time from local and GMT time
Object Oriented time objects
A simple API to convert seconds to other date values
By-name interface to Perl's built-in gmtime() function
By-name interface to Perl's built-in localtime() function
Internal object used by Time::gmtime and Time::localtime
Base class for ALL classes (blessed references)
Unicode Collation Algorithm
Weighting CJK Unified Ideographs
Weighting CJK Unified Ideographs
Weighting JIS KANJI for Unicode::Collate
Weighting CJK Unified Ideographs
Weighting CJK Unified Ideographs
Weighting CJK Unified Ideographs
Weighting CJK Unified Ideographs
Linguistic tailoring for DUCET via Unicode::Collate
Unicode Normalization Forms
Unicode character database
By-name interface to Perl's built-in getgr*() functions
By-name interface to Perl's built-in getpw*() functions
Perl extension to manipulate DCL symbols
Standard I/O functions via VMS extensions
Interfaces to some Win32 API Functions
Low-level access to Win32 system API calls for files/dirs.
Win32 CORE function stubs
Test the perl C API
Module to test the XS typemaps distributed with perl
Dynamically load C libraries into Perl code
Perl extension for Version Objects
To find out all modules installed on your system, including those without documentation or outside the standard release, just use the following command (under the default win32 shell, double quotes should be used instead of single quotes).
- % perl -MFile::Find=find -MFile::Spec::Functions -Tlwe \
- 'find { wanted => sub { print canonpath $_ if /\.pm\z/ },
- no_chdir => 1 }, @INC'
(The -T is here to prevent '.' from being listed in @INC.) They should all have their own documentation installed and accessible via your system man(1) command. If you do not have a find program, you can use the Perl find2perl program instead, which generates Perl code as output you can run through perl. If you have a man program but it doesn't find your modules, you'll have to fix your manpath. See perl for details. If you have no system man command, you might try the perldoc program.
Note also that the command perldoc perllocal gives you a (possibly incomplete) list of the modules that have been further installed on your system. (The perllocal.pod file is updated by the standard MakeMaker install process.)
Extension modules are written in C (or a mix of Perl and C). They are usually dynamically loaded into Perl if and when you need them, but may also be linked in statically. Supported extension modules include Socket, Fcntl, and POSIX.
Many popular C extension modules do not come bundled (at least, not completely) due to their sizes, volatility, or simply lack of time for adequate testing and configuration across the multitude of platforms on which Perl was beta-tested. You are encouraged to look for them on CPAN (described below), or using web search engines like Alta Vista or Google.
CPAN stands for Comprehensive Perl Archive Network; it's a globally replicated trove of Perl materials, including documentation, style guides, tricks and traps, alternate ports to non-Unix systems and occasional binary distributions for these. Search engines for CPAN can be found at http://www.cpan.org/
Most importantly, CPAN includes around a thousand unbundled modules, some of which require a C compiler to build. Major categories of modules are:
Language Extensions and Documentation Tools
Development Support
Operating System Interfaces
Networking, Device Control (modems) and InterProcess Communication
Data Types and Data Type Utilities
Database Interfaces
User Interfaces
Interfaces to / Emulations of Other Programming Languages
File Names, File Systems and File Locking (see also File Handles)
String Processing, Language Text Processing, Parsing, and Searching
Option, Argument, Parameter, and Configuration File Processing
Internationalization and Locale
Authentication, Security, and Encryption
World Wide Web, HTML, HTTP, CGI, MIME
Server and Daemon Utilities
Archiving and Compression
Images, Pixmap and Bitmap Manipulation, Drawing, and Graphing
Mail and Usenet News
Control Flow Utilities (callbacks and exceptions etc)
File Handle and Input/Output Stream Utilities
Miscellaneous Modules
The list of the registered CPAN sites follows. Please note that the sorting order is alphabetical on fields:
- Continent
-    |
-    |-->Country
-          |
-          |-->[state/province]
-                |
-                |-->ftp
-                |
-                |-->[http]
and thus the North American servers happen to be listed between the European and the South American sites.
Registered CPAN sites
- http://cpan.mirror.ac.za/
- ftp://cpan.mirror.ac.za/
- http://mirror.is.co.za/pub/cpan/
- ftp://ftp.is.co.za/pub/cpan/
- ftp://ftp.saix.net/pub/CPAN/
- http://cpan.wenzk.com/
- http://ftp.cuhk.edu.hk/pub/packages/perl/CPAN/
- ftp://ftp.cuhk.edu.hk/pub/packages/perl/CPAN/
- http://mirrors.geoexpat.com/cpan/
- http://perlmirror.indialinks.com/
- http://cpan.biz.net.id/
- http://komo.vlsm.org/CPAN/
- ftp://komo.vlsm.org/CPAN/
- http://cpan.cermin.lipi.go.id/
- ftp://cermin.lipi.go.id/pub/CPAN/
- http://cpan.pesat.net.id/
- ftp://ftp.u-aizu.ac.jp/pub/CPAN
- ftp://ftp.kddilabs.jp/CPAN/
- http://ftp.nara.wide.ad.jp/pub/CPAN/
- ftp://ftp.nara.wide.ad.jp/pub/CPAN/
- http://ftp.jaist.ac.jp/pub/CPAN/
- ftp://ftp.jaist.ac.jp/pub/CPAN/
- ftp://ftp.dti.ad.jp/pub/lang/CPAN/
- ftp://ftp.ring.gr.jp/pub/lang/perl/CPAN/
- http://ftp.riken.jp/lang/CPAN/
- ftp://ftp.riken.jp/lang/CPAN/
- http://ftp.yz.yamagata-u.ac.jp/pub/lang/cpan/
- ftp://ftp.yz.yamagata-u.ac.jp/pub/lang/cpan/
- http://ftp.kaist.ac.kr/pub/CPAN
- ftp://ftp.kaist.ac.kr/pub/CPAN
- http://cpan.mirror.cdnetworks.com/
- ftp://cpan.mirror.cdnetworks.com/CPAN/
- http://cpan.sarang.net/
- ftp://cpan.sarang.net/CPAN/
- http://cpan.tomsk.ru/
- ftp://cpan.tomsk.ru/
- http://mirror.averse.net/pub/CPAN
- ftp://mirror.averse.net/pub/CPAN
- http://cpan.mirror.choon.net/
- http://cpan.oss.eznetsols.org
- ftp://ftp.oss.eznetsols.org/cpan
- http://ftp.cse.yzu.edu.tw/pub/CPAN/
- ftp://ftp.cse.yzu.edu.tw/pub/CPAN/
- http://cpan.nctu.edu.tw/
- ftp://cpan.nctu.edu.tw/
- ftp://ftp.ncu.edu.tw/CPAN/
- http://cpan.cdpa.nsysu.edu.tw/
- ftp://cpan.cdpa.nsysu.edu.tw/Unix/Lang/CPAN/
- http://cpan.stu.edu.tw
- ftp://ftp.stu.edu.tw/CPAN
- http://ftp.stu.edu.tw/CPAN
- ftp://ftp.stu.edu.tw/pub/CPAN
- http://cpan.cs.pu.edu.tw/
- ftp://cpan.cs.pu.edu.tw/pub/CPAN
- http://mirrors.issp.co.th/cpan/
- ftp://mirrors.issp.co.th/cpan/
- http://mirror.yourconnect.com/CPAN/
- ftp://mirror.yourconnect.com/CPAN/
- http://cpan.gazi.edu.tr/
- http://cpan.inode.at/
- ftp://cpan.inode.at
- http://gd.tuwien.ac.at/languages/perl/CPAN/
- ftp://gd.tuwien.ac.at/pub/CPAN/
- http://ftp.belnet.be/mirror/ftp.cpan.org/
- ftp://ftp.belnet.be/mirror/ftp.cpan.org/
- http://ftp.easynet.be/pub/CPAN/
- http://cpan.weepee.org/
- http://cpan.blic.net/
- http://cpan.cbox.biz/
- ftp://cpan.cbox.biz/cpan/
- http://cpan.digsys.bg/
- ftp://ftp.digsys.bg/pub/CPAN
- http://ftp.carnet.hr/pub/CPAN/
- ftp://ftp.carnet.hr/pub/CPAN/
- ftp://ftp.fi.muni.cz/pub/CPAN/
- http://archive.cpan.cz/
- http://mirrors.dotsrc.org/cpan
- ftp://mirrors.dotsrc.org/cpan/
- http://www.cpan.dk/
- http://mirror.uni-c.dk/pub/CPAN/
- ftp://ftp.funet.fi/pub/languages/perl/CPAN/
- http://mirror.eunet.fi/CPAN
- http://cpan.enstimac.fr/
- ftp://ftp.inria.fr/pub/CPAN/
- http://distrib-coffee.ipsl.jussieu.fr/pub/mirrors/cpan/
- ftp://distrib-coffee.ipsl.jussieu.fr/pub/mirrors/cpan/
- ftp://ftp.lip6.fr/pub/perl/CPAN/
- http://mir2.ovh.net/ftp.cpan.org
- ftp://mir1.ovh.net/ftp.cpan.org
- ftp://ftp.oleane.net/pub/CPAN/
- http://ftp.crihan.fr/mirrors/ftp.cpan.org/
- ftp://ftp.crihan.fr/mirrors/ftp.cpan.org/
- http://ftp.u-strasbg.fr/CPAN
- ftp://ftp.u-strasbg.fr/CPAN
- http://cpan.cict.fr/
- ftp://cpan.cict.fr/pub/CPAN/
- ftp://ftp.fu-berlin.de/unix/languages/perl/
- http://mirrors.softliste.de/cpan/
- ftp://ftp.rub.de/pub/CPAN/
- http://www.planet-elektronik.de/CPAN/
- http://ftp.hosteurope.de/pub/CPAN/
- ftp://ftp.hosteurope.de/pub/CPAN/
- http://www.mirrorspace.org/cpan/
- http://mirror.netcologne.de/cpan/
- ftp://mirror.netcologne.de/cpan/
- ftp://ftp.freenet.de/pub/ftp.cpan.org/pub/CPAN/
- http://ftp-stud.hs-esslingen.de/pub/Mirrors/CPAN/
- ftp://ftp-stud.hs-esslingen.de/pub/Mirrors/CPAN/
- http://mirrors.zerg.biz/cpan/
- http://ftp.gwdg.de/pub/languages/perl/CPAN/
- ftp://ftp.gwdg.de/pub/languages/perl/CPAN/
- http://dl.ambiweb.de/mirrors/ftp.cpan.org/
- http://cpan.mirror.clusters.kg/
- http://cpan.mirror.iphh.net/
- ftp://cpan.mirror.iphh.net/pub/CPAN/
- http://cpan.mirroring.de/
- http://mirror.informatik.uni-mannheim.de/pub/mirrors/CPAN/
- ftp://mirror.informatik.uni-mannheim.de/pub/mirrors/CPAN/
- http://www.chemmedia.de/mirrors/CPAN/
- http://ftp.cw.net/pub/CPAN/
- ftp://ftp.cw.net/pub/CPAN/
- http://cpan.cpantesters.org/
- ftp://cpan.cpantesters.org/CPAN/
- http://cpan.mirrored.de/
- ftp://mirror.petamem.com/CPAN/
- http://cpan.noris.de/
- ftp://cpan.noris.de/pub/CPAN/
- ftp://ftp.mpi-sb.mpg.de/pub/perl/CPAN/
- ftp://ftp.gmd.de/mirrors/CPAN/
- ftp://ftp.forthnet.gr/pub/languages/perl/CPAN
- ftp://ftp.ntua.gr/pub/lang/perl/
- http://cpan.cc.uoc.gr/
- ftp://ftp.cc.uoc.gr/mirrors/CPAN/
- http://cpan.mirrors.enexis.hu/
- ftp://cpan.mirrors.enexis.hu/mirrors/cpan/
- http://cpan.hu/
- http://ftp.rhnet.is/pub/CPAN/
- ftp://ftp.rhnet.is/pub/CPAN/
- http://ftp.esat.net/pub/languages/perl/CPAN/
- ftp://ftp.esat.net/pub/languages/perl/CPAN/
- http://ftp.heanet.ie/mirrors/ftp.perl.org/pub/CPAN
- ftp://ftp.heanet.ie/mirrors/ftp.perl.org/pub/CPAN
- http://bo.mirror.garr.it/mirrors/CPAN/
- http://cpan.panu.it/
- ftp://ftp.panu.it/pub/mirrors/perl/CPAN/
- http://kvin.lv/pub/CPAN/
- http://cpan.waldonet.net.mt/
- ftp://ftp.quicknet.nl/pub/CPAN/
- http://mirror.hostfuss.com/CPAN/
- ftp://mirror.hostfuss.com/CPAN/
- http://mirrors3.kernel.org/cpan/
- ftp://mirrors3.kernel.org/pub/CPAN/
- http://cpan.mirror.versatel.nl/
- ftp://ftp.mirror.versatel.nl/cpan/
- ftp://download.xs4all.nl/pub/mirror/CPAN/
- http://mirror.leaseweb.com/CPAN/
- ftp://mirror.leaseweb.com/CPAN/
- ftp://ftp.cpan.nl/pub/CPAN/
- http://archive.cs.uu.nl/mirror/CPAN/
- ftp://ftp.cs.uu.nl/mirror/CPAN/
- http://luxitude.net/cpan/
- http://piotrkosoft.net/pub/mirrors/CPAN/
- ftp://ftp.piotrkosoft.net/pub/mirrors/CPAN/
- http://ftp.man.poznan.pl/pub/CPAN
- ftp://ftp.man.poznan.pl/pub/CPAN
- ftp://ftp.ps.pl/pub/CPAN/
- ftp://sunsite.icm.edu.pl/pub/CPAN/
- ftp://ftp.tpnet.pl/d4/CPAN/
- http://ftp.astral.ro/pub/CPAN/
- ftp://ftp.astral.ro/pub/CPAN/
- ftp://ftp.lug.ro/CPAN
- http://mirrors.xservers.ro/CPAN/
- http://mirrors.hostingromania.ro/ftp.cpan.org/
- ftp://ftp.hostingromania.ro/mirrors/ftp.cpan.org/
- ftp://ftp.iasi.roedu.net/pub/mirrors/ftp.cpan.org/
- ftp://ftp.aha.ru/CPAN/
- http://cpan.rinet.ru/
- ftp://cpan.rinet.ru/pub/mirror/CPAN/
- ftp://ftp.SpringDaemons.com/pub/CPAN/
- http://mirror.rol.ru/CPAN/
- http://ftp.silvernet.ru/CPAN/
- http://ftp.spbu.ru/CPAN/
- ftp://ftp.spbu.ru/CPAN/
- http://cpan.fyxm.net/
- http://www.klevze.si/cpan
- http://osl.ugr.es/CPAN/
- ftp://ftp.rediris.es/mirror/CPAN/
- http://ftp.gui.uva.es/sites/cpan.org/
- ftp://ftp.gui.uva.es/sites/cpan.org/
- http://mirrors4.kernel.org/cpan/
- ftp://mirrors4.kernel.org/pub/CPAN/
- http://cpan.mirror.solnet.ch/
- ftp://ftp.solnet.ch/mirror/CPAN/
- ftp://ftp.adwired.ch/CPAN/
- http://mirror.switch.ch/ftp/mirror/CPAN/
- ftp://mirror.switch.ch/mirror/CPAN/
- http://cpan.makeperl.org/
- ftp://cpan.makeperl.org/pub/CPAN
- http://cpan.org.ua/
- http://cpan.gafol.net/
- ftp://ftp.gafol.net/pub/cpan/
- http://www.mirrorservice.org/sites/ftp.funet.fi/pub/languages/perl/CPAN/
- ftp://ftp.mirrorservice.org/sites/ftp.funet.fi/pub/languages/perl/CPAN/
- http://mirror.tje.me.uk/pub/mirrors/ftp.cpan.org/
- ftp://mirror.tje.me.uk/pub/mirrors/ftp.cpan.org/
- http://www.mirror.8086.net/sites/CPAN/
- ftp://ftp.mirror.8086.net/sites/CPAN/
- http://cpan.mirror.anlx.net/
- ftp://ftp.mirror.anlx.net/CPAN/
- http://mirror.bytemark.co.uk/CPAN/
- ftp://mirror.bytemark.co.uk/CPAN/
- http://cpan.etla.org/
- ftp://cpan.etla.org/pub/CPAN
- ftp://ftp.demon.co.uk/pub/CPAN/
- http://mirror.sov.uk.goscomb.net/CPAN/
- ftp://mirror.sov.uk.goscomb.net/pub/CPAN/
- http://ftp.plig.net/pub/CPAN/
- ftp://ftp.plig.net/pub/CPAN/
- http://ftp.ticklers.org/pub/CPAN/
- ftp://ftp.ticklers.org/pub/CPAN/
- http://cpan.mirrors.uk2.net/
- ftp://mirrors.uk2.net/pub/CPAN/
- http://mirror.ox.ac.uk/sites/www.cpan.org/
- ftp://mirror.ox.ac.uk/sites/www.cpan.org/
- http://www.securehost.com/mirror/CPAN/
- http://cpan.arcticnetwork.ca
- ftp://mirror.arcticnetwork.ca/pub/CPAN
- http://cpan.sunsite.ualberta.ca/
- ftp://cpan.sunsite.ualberta.ca/pub/CPAN/
- http://theoryx5.uwinnipeg.ca/pub/CPAN/
- ftp://theoryx5.uwinnipeg.ca/pub/CPAN/
- http://arwen.cs.dal.ca/mirror/CPAN/
- ftp://arwen.cs.dal.ca/pub/mirror/CPAN/
- http://CPAN.mirror.rafal.ca/
- ftp://CPAN.mirror.rafal.ca/pub/CPAN/
- ftp://ftp.nrc.ca/pub/CPAN/
- http://mirror.csclub.uwaterloo.ca/pub/CPAN/
- ftp://mirror.csclub.uwaterloo.ca/pub/CPAN/
- http://www.msg.com.mx/CPAN/
- ftp://ftp.msg.com.mx/pub/CPAN/
- http://mirror.hiwaay.net/CPAN/
- ftp://mirror.hiwaay.net/CPAN/
- http://cpan.ezarticleinformation.com/
- http://cpan.knowledgematters.net/
- http://cpan.binkerton.com/
- http://cpan.develooper.com/
- http://mirrors.gossamer-threads.com/CPAN
- http://cpan.schatt.com/
- http://mirrors.kernel.org/cpan/
- ftp://mirrors.kernel.org/pub/CPAN
- http://mirrors2.kernel.org/cpan/
- ftp://mirrors2.kernel.org/pub/CPAN/
- http://cpan.mirror.facebook.net/
- http://mirrors1.kernel.org/cpan/
- ftp://mirrors1.kernel.org/pub/CPAN/
- http://cpan-sj.viaverio.com/
- ftp://cpan-sj.viaverio.com/pub/CPAN/
- http://www.perl.com/CPAN/
- ftp://ftp.cise.ufl.edu/pub/mirrors/CPAN/
- http://mirror.atlantic.net/pub/CPAN/
- ftp://mirror.atlantic.net/pub/CPAN/
- http://mirror.its.uidaho.edu/pub/cpan/
- ftp://mirror.its.uidaho.edu/cpan/
- http://cpan.mirrors.hoobly.com/
- http://cpan.uchicago.edu/pub/CPAN/
- ftp://cpan.uchicago.edu/pub/CPAN/
- http://mirrors.servercentral.net/CPAN/
- http://www.stathy.com/CPAN/
- ftp://www.stathy.com/CPAN/
- ftp://ftp.uwsg.iu.edu/pub/perl/CPAN/
- http://cpan.netnitco.net/
- ftp://cpan.netnitco.net/pub/mirrors/CPAN/
- http://ftp.ndlug.nd.edu/pub/perl/
- ftp://ftp.ndlug.nd.edu/pub/perl/
- http://mirrors.ccs.neu.edu/CPAN/
- http://ftp.wayne.edu/cpan/
- ftp://ftp.wayne.edu/cpan/
- http://cpan.msi.umn.edu/
- http://mirror.datapipe.net/CPAN/
- ftp://mirror.datapipe.net/pub/CPAN/
- http://mirrors.24-7-solutions.net/pub/CPAN/
- ftp://mirrors.24-7-solutions.net/pub/CPAN/
- http://mirror.cc.columbia.edu/pub/software/cpan/
- ftp://mirror.cc.columbia.edu/pub/software/cpan/
- http://cpan.belfry.net/
- http://cpan.erlbaum.net/
- ftp://cpan.erlbaum.net/CPAN/
- http://cpan.hexten.net/
- ftp://cpan.hexten.net/
- ftp://mirror.nyi.net/CPAN/
- http://mirror.rit.edu/CPAN/
- ftp://mirror.rit.edu/CPAN/
- http://www.ibiblio.org/pub/mirrors/CPAN
- ftp://ftp.ncsu.edu/pub/mirror/CPAN/
- http://ftp.osuosl.org/pub/CPAN/
- ftp://ftp.osuosl.org/pub/CPAN/
- http://ftp.epix.net/CPAN/
- ftp://ftp.epix.net/pub/languages/perl/
- http://cpan.pair.com/
- ftp://cpan.pair.com/pub/CPAN/
- http://cpan.mirror.clemson.edu/
- http://mira.sunsite.utk.edu/CPAN/
- http://mirror.uta.edu/CPAN
- ftp://mirror.xmission.com/CPAN/
- http://cpan-du.viaverio.com/
- ftp://cpan-du.viaverio.com/pub/CPAN/
- http://perl.secsup.org/
- ftp://perl.secsup.org/pub/perl/
- ftp://mirror.cogentco.com/pub/CPAN/
- http://cpan.llarian.net/
- ftp://cpan.llarian.net/pub/CPAN/
- ftp://ftp-mirror.internap.com/pub/CPAN/
- http://cpan.mirrors.tds.net
- ftp://cpan.mirrors.tds.net/pub/CPAN
- http://mirror.sit.wisc.edu/pub/CPAN/
- ftp://mirror.sit.wisc.edu/pub/CPAN/
- http://mirror.internode.on.net/pub/cpan/
- ftp://mirror.internode.on.net/pub/cpan/
- http://cpan.mirror.aussiehq.net.au/
- http://mirror.as24220.net/cpan/
- ftp://mirror.as24220.net/cpan/
- ftp://ftp.auckland.ac.nz/pub/perl/CPAN/
- http://cpan.inspire.net.nz
- ftp://cpan.inspire.net.nz/cpan
- http://cpan.catalyst.net.nz/CPAN/
- ftp://cpan.catalyst.net.nz/pub/CPAN/
- http://cpan.patan.com.ar/
- http://cpan.localhost.net.ar
- ftp://mirrors.localhost.net.ar/pub/mirrors/CPAN
- ftp://cpan.pop-mg.com.br/pub/CPAN/
- http://ftp.pucpr.br/CPAN
- ftp://ftp.pucpr.br/CPAN
- http://cpan.kinghost.net/
- http://cpan.dcc.uchile.cl/
- ftp://cpan.dcc.uchile.cl/pub/lang/cpan/
- http://www.laqee.unal.edu.co/CPAN/
- mirror.as24220.net::cpan
- cpan.inode.at::CPAN
- gd.tuwien.ac.at::CPAN
- ftp.belnet.be::packages/cpan
- rsync.linorg.usp.br::CPAN
- rsync.arcticnetwork.ca::CPAN
- CPAN.mirror.rafal.ca::CPAN
- mirror.csclub.uwaterloo.ca::CPAN
- theoryx5.uwinnipeg.ca::CPAN
- www.laqee.unal.edu.co::CPAN
- mirror.uni-c.dk::CPAN
- rsync.nic.funet.fi::CPAN
- rsync://distrib-coffee.ipsl.jussieu.fr/pub/mirrors/cpan/
- mir1.ovh.net::CPAN
- miroir-francais.fr::cpan
- ftp.crihan.fr::CPAN
- rsync://mirror.cict.fr/cpan/
- rsync://mirror.netcologne.de/cpan/
- ftp-stud.hs-esslingen.de::CPAN/
- ftp.gwdg.de::FTP/languages/perl/CPAN/
- cpan.mirror.iphh.net::CPAN
- cpan.cpantesters.org::cpan
- cpan.hu::CPAN
- komo.vlsm.org::CPAN
- mirror.unej.ac.id::cpan
- ftp.esat.net::/pub/languages/perl/CPAN
- ftp.heanet.ie::mirrors/ftp.perl.org/pub/CPAN
- rsync.panu.it::CPAN
- cpan.fastbull.org::CPAN
- ftp.kddilabs.jp::cpan
- ftp.nara.wide.ad.jp::cpan/
- rsync://ftp.jaist.ac.jp/pub/CPAN/
- rsync://ftp.riken.jp/cpan/
- mirror.linuxiso.kz::CPAN
- rsync://mirrors3.kernel.org/mirrors/CPAN/
- rsync://rsync.osmirror.nl/cpan/
- mirror.leaseweb.com::CPAN
- cpan.nautile.nc::CPAN
- mirror.icis.pcz.pl::CPAN
- piotrkosoft.net::mirrors/CPAN
- rsync://cpan.perl.pt/
- ftp.kaist.ac.kr::cpan
- cpan.sarang.net::CPAN
- mirror.averse.net::cpan
- rsync.oss.eznetsols.org
- mirror.ac.za::cpan
- ftp.is.co.za::IS-Mirror/ftp.cpan.org/
- rsync://ftp.gui.uva.es/cpan/
- rsync://mirrors4.kernel.org/mirrors/CPAN/
- ftp.solnet.ch::CPAN
- ftp.ulak.net.tr::CPAN
- gafol.net::cpan
- rsync.mirrorservice.org::ftp.funet.fi/pub/
- rsync://rsync.mirror.8086.net/CPAN/
- rsync.mirror.anlx.net::CPAN
- mirror.bytemark.co.uk::CPAN
- ftp.plig.net::CPAN
- rsync://ftp.ticklers.org:CPAN/
- mirrors.ibiblio.org::CPAN
- cpan-du.viaverio.com::CPAN
- mirror.hiwaay.net::CPAN
- rsync://mira.sunsite.utk.edu/CPAN/
- cpan.mirrors.tds.net::CPAN
- mirror.its.uidaho.edu::cpan
- rsync://mirror.cc.columbia.edu::cpan/
- ftp.fxcorporate.com::CPAN
- rsync.atlantic.net::CPAN
- mirrors.kernel.org::mirrors/CPAN
- rsync://mirrors2.kernel.org/mirrors/CPAN/
- cpan.pair.com::CPAN
- rsync://mirror.rit.edu/CPAN/
- rsync://mirror.facebook.net/cpan/
- rsync://mirrors1.kernel.org/mirrors/CPAN/
- cpan-sj.viaverio.com::CPAN
For an up-to-date listing of CPAN sites, see http://www.cpan.org/SITES or ftp://www.cpan.org/SITES .
(The following section is borrowed directly from Tim Bunce's modules file, available at your nearest CPAN site.)
Perl implements a class using a package, but the presence of a package doesn't imply the presence of a class. A package is just a namespace. A class is a package that provides subroutines that can be used as methods. A method is just a subroutine that expects, as its first argument, either the name of a package (for "static" methods), or a reference to something (for "virtual" methods).
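The distinction between "static" and "virtual" methods can be seen in a minimal sketch (the Counter package and its methods are hypothetical names, not from the original text):

```perl
use strict;
use warnings;

package Counter;

# A "static" method: Perl passes the package name as the first argument.
sub new {
    my $class = shift;                   # "Counter"
    return bless { count => 0 }, $class;
}

# A "virtual" method: Perl passes the object reference as the first argument.
sub increment {
    my $self = shift;
    return ++$self->{count};
}

package main;

my $c = Counter->new;        # calls Counter::new("Counter")
print $c->increment, "\n";   # calls Counter::increment($c); prints 1
```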
A module is a file that (by convention) provides a class of the same name (sans the .pm), plus an import method in that class that can be called to fetch exported symbols. This module may implement some of its methods by loading dynamic C or C++ objects, but that should be totally transparent to the user of the module. Likewise, the module might set up an AUTOLOAD function to slurp in subroutine definitions on demand, but this is also transparent. Only the .pm file is required to exist. See perlsub, perlobj, and AutoLoader for details about the AUTOLOAD mechanism.
Do similar modules already exist in some form?
If so, please try to reuse the existing modules either in whole or by inheriting useful features into a new class. If this is not practical try to get together with the module authors to work on extending or enhancing the functionality of the existing modules. A perfect example is the plethora of packages in perl4 for dealing with command line options.
If you are writing a module to expand an already existing set of modules, please coordinate with the author of the package. It helps if you follow the same naming scheme and module interaction scheme as the original author.
Try to design the new module to be easy to extend and reuse.
Try to use warnings; (or use warnings qw(...);). Remember that you can add no warnings qw(...); to individual blocks of code that need fewer warnings.
Use blessed references. Use the two-argument form of bless to bless into the class name given as the first parameter of the constructor; a slight variation of the same idiom lets the constructor be used as either a static or a virtual method.
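A constructor along these lines might look like the following sketch (MyClass and the method names are hypothetical):

```perl
package MyClass;
use strict;
use warnings;

# Two-argument bless: the object is blessed into whatever class was
# requested, so a subclass inherits this constructor unchanged.
sub new {
    my $class = shift;
    my $self  = {};
    return bless $self, $class;
}

# Variant usable as either a static or a virtual method:
# both MyClass->fresh and $existing_object->fresh return a new object.
sub fresh {
    my $proto = shift;
    my $class = ref($proto) || $proto;
    return bless {}, $class;
}

1;
```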
Pass arrays as references so more parameters can be added later (it's also faster). Convert functions into methods where appropriate. Split large methods into smaller more flexible ones. Inherit methods from other modules if appropriate.
Avoid class name tests like die "Invalid" unless ref $ref eq 'FOO'. Generally you can delete the eq 'FOO' part with no harm at all. Let the objects look after themselves! Generally, avoid hard-wired class names as far as possible.
Avoid $r->Class::func() where using @ISA=qw(... Class ...) and $r->func() would work.
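A sketch of the preferred inheritance-based style (Foo, Bar, and greet are hypothetical names):

```perl
use strict;
use warnings;

package Foo;
sub greet { return "hello from " . ref($_[0]) }

package Bar;
our @ISA = qw(Foo);      # inherit greet() rather than hard-wiring Foo::greet
sub new { return bless {}, shift }

package main;

my $r = Bar->new;
print $r->greet, "\n";                      # method lookup finds Foo::greet
print $r->isa('Foo') ? "ok" : "no", "\n";   # isa() instead of ref($r) eq 'Foo'
```

Note that $r->isa('Foo') stays true for any subclass, where ref($r) eq 'Foo' would fail.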
Use autosplit so little-used or newly added functions won't be a burden to programs that don't use them. Add test functions to the module after __END__, either using AutoSplit or by setting up AutoLoader directly.
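A minimal sketch of the AutoLoader arrangement (the Heavy package and its subs are hypothetical; the on-demand loading only works after the file has been processed by AutoSplit):

```perl
package Heavy;
use Exporter ();
use AutoLoader;                 # supplies an AUTOLOAD method via @ISA
our @ISA = qw(Exporter AutoLoader);

sub cheap { return "loaded up front" }   # compiled when the module loads

1;
__END__

# Subroutines below __END__ are skipped at compile time; once the file
# has been run through AutoSplit, AutoLoader pulls each one in from its
# auto/Heavy/*.al file the first time it is called.
sub expensive {
    # ... some rarely used, costly-to-compile routine ...
    return "loaded on demand";
}
```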
Does your module pass the 'empty subclass' test? If you say @SUBCLASS::ISA = qw(YOURCLASS); your applications should be able to use SUBCLASS in exactly the same way as YOURCLASS. For example, does your application still work if you change $obj = YOURCLASS->new(); into $obj = SUBCLASS->new(); ?
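The empty subclass test can be sketched as follows (YourClass and SubClass are the placeholder names from the text; the methods are hypothetical):

```perl
use strict;
use warnings;

package YourClass;
sub new  { return bless { n => 1 }, shift }   # two-arg bless: inheritable
sub incr { return ++$_[0]->{n} }

package SubClass;
our @ISA = qw(YourClass);    # an empty subclass: no methods of its own

package main;

# SubClass must behave exactly like YourClass.
my $obj = SubClass->new;
print ref($obj), "\n";       # SubClass
print $obj->incr, "\n";      # 2
```

If YourClass had used the one-argument bless (always blessing into YourClass), this test would fail: ref($obj) would be YourClass, not SubClass.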
Avoid keeping any state information in your packages. It makes it difficult for multiple other packages to use yours. Keep state information in objects.
Always use -w.
Try to use strict; (or use strict qw(...);). Remember that you can add no strict qw(...); to individual blocks of code that need less strictness.
Always use -w.
Follow the guidelines in perlstyle.
Always use -w.
Some simple style guidelines
The perlstyle manual supplied with Perl has many helpful points.
Coding style is a matter of personal taste. Many people evolve their style over several years as they learn what helps them write and maintain good code. Here's one set of assorted suggestions that seem to be widely used by experienced developers:
Use underscores to separate words. It is generally easier to read $var_names_like_this than $VarNamesLikeThis, especially for non-native speakers of English. It's also a simple rule that works consistently with VAR_NAMES_LIKE_THIS.
Package/Module names are an exception to this rule. Perl informally reserves lowercase module names for 'pragma' modules like integer and strict. Other modules normally begin with a capital letter and use mixed case with no underscores (need to be short and portable).
You may find it helpful to use letter case to indicate the scope or nature of a variable. For example:
- $ALL_CAPS_HERE constants only (beware clashes with Perl vars)
- $Some_Caps_Here package-wide global/static
- $no_caps_here function scope my() or local() variables
Function and method names seem to work best as all lowercase, e.g., $obj->as_string().
You can use a leading underscore to indicate that a variable or function should not be used outside the package that defined it.
Select what to export.
Do NOT export method names!
Do NOT export anything else by default without a good reason!
Exports pollute the namespace of the module user. If you must export try to use @EXPORT_OK in preference to @EXPORT and avoid short or common names to reduce the risk of name clashes.
Generally anything not exported is still accessible from outside the module using the ModuleName::item_name (or $blessed_ref->method) syntax. By convention you can use a leading underscore on names to indicate informally that they are 'internal' and not for public use.
(It is actually possible to get private functions by saying: my $subref = sub { ... }; &$subref;. But there's no way to call that directly as a method, because a method must have a name in the symbol table.)
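The private-function trick can be sketched like this (Widget and its subs are hypothetical names):

```perl
package Widget;
use strict;
use warnings;

# A lexical code ref never enters the symbol table, so there is no way
# to call it from outside as Widget::_scale or as a method on an object.
my $scale = sub {
    my ($n) = @_;
    return $n * 2;
};

sub new { return bless {}, shift }

# The public method uses the private helper internally.
sub doubled {
    my ($self, $n) = @_;
    return $scale->($n);
}

1;
```

From outside the package only Widget->new and $w->doubled(...) are reachable; the helper has no name to call.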
As a general rule, if the module is trying to be object oriented then export nothing. If it's just a collection of functions then @EXPORT_OK anything but use @EXPORT with caution.
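For a function-collection module, that rule might be set up like this sketch (Text::Tools and trim are hypothetical names):

```perl
package Text::Tools;
use strict;
use warnings;
use Exporter ();

our @ISA       = qw(Exporter);
our @EXPORT    = ();           # nothing is forced on the caller
our @EXPORT_OK = qw(trim);     # but trim() may be imported by request

sub trim {
    my ($s) = @_;
    $s =~ s/^\s+|\s+$//g;      # strip leading and trailing whitespace
    return $s;
}

1;
```

A caller would then write use Text::Tools qw(trim); to pull in only what it needs, or call Text::Tools::trim() with no import at all.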
Select a name for the module.
This name should be as descriptive, accurate, and complete as possible. Avoid any risk of ambiguity. Always try to use two or more whole words. Generally the name should reflect what is special about what the module does rather than how it does it. Please use nested module names to group informally or categorize a module. There should be a very good reason for a module not to have a nested name. Module names should begin with a capital letter.
Having 57 modules all called Sort will not make life easy for anyone (though having 23 called Sort::Quick is only marginally better :-). Imagine someone trying to install your module alongside many others. If in any doubt ask for suggestions in comp.lang.perl.misc.
If you are developing a suite of related modules/classes it's good practice to use nested classes with a common prefix as this will avoid namespace clashes. For example: Xyz::Control, Xyz::View, Xyz::Model etc. Use the modules in this list as a naming guide.
If adding a new module to a set, follow the original author's standards for naming modules and the interface to methods in those modules.
If developing modules for private internal or project specific use, that will never be released to the public, then you should ensure that their names will not clash with any future public module. You can do this either by using the reserved Local::* category or by using a category name that includes an underscore like Foo_Corp::*.
To be portable each component of a module name should be limited to 11 characters. If it might be used on MS-DOS then try to ensure each is unique in the first 8 characters. Nested modules make this easier.
Have you got it right?
How do you know that you've made the right decisions? Have you picked an interface design that will cause problems later? Have you picked the most appropriate name? Do you have any questions?
The best way to know for sure, and pick up many helpful suggestions, is to ask someone who knows. Comp.lang.perl.misc is read by just about all the people who develop modules and it's the best place to ask.
All you need to do is post a short summary of the module, its purpose and interfaces. A few lines on each of the main methods is probably enough. (If you post the whole module it might be ignored by busy people - generally the very people you want to read it!)
Don't worry about posting if you can't say when the module will be ready - just say so in the message. It might be worth inviting others to help you, they may be able to complete it for you!
README and other Additional Files.
It's well known that software developers usually fully document the software they write. If, however, the world is in urgent need of your software and there is not enough time to write the full documentation please at least provide a README file containing:
A description of the module/package/extension etc.
A copyright notice - see below.
Prerequisites - what else you may need to have.
How to build it - possible changes to Makefile.PL etc.
How to install it.
Recent changes in this release, especially incompatibilities
Changes / enhancements you plan to make in the future.
If the README file seems to be getting too large you may wish to split out some of the sections into separate files: INSTALL, Copying, ToDo etc.
Adding a Copyright Notice.
How you choose to license your work is a personal decision. The general mechanism is to assert your Copyright and then make a declaration of how others may copy/use/modify your work.
Perl, for example, is supplied with two types of licence: The GNU GPL and The Artistic Licence (see the files README, Copying, and Artistic, or perlgpl and perlartistic). Larry has good reasons for NOT just using the GNU GPL.
My personal recommendation, out of respect for Larry, Perl, and the Perl community at large is to state something simply like:
- Copyright (c) 1995 Your Name. All rights reserved.
- This program is free software; you can redistribute it and/or
- modify it under the same terms as Perl itself.
This statement should at least appear in the README file. You may also wish to include it in a Copying file and your source files. Remember to include the other words in addition to the Copyright.
Give the module a version/issue/release number.
To be fully compatible with the Exporter and MakeMaker modules you should store your module's version number in a non-my package variable called $VERSION. This should be a positive floating point number with at least two digits after the decimal (i.e., hundredths, e.g., $VERSION = "0.01"). Don't use a "1.3.2" style version.
See Exporter for details.
It may be handy to add a function or method to retrieve the number. Use the number in announcements and archive file names when releasing the module (ModuleName-1.02.tar.Z). See perldoc ExtUtils::MakeMaker.pm for details.
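Such an arrangement might look like this sketch (My::Module and the version() accessor are hypothetical):

```perl
package My::Module;
use strict;
use warnings;

our $VERSION = "0.01";    # a package variable, not a my() lexical

# Optional convenience method so callers can ask for the version.
sub version { return $VERSION }

1;
```

Once $VERSION is set, Perl's built-in UNIVERSAL::VERSION also makes My::Module->VERSION work automatically.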
How to release and distribute a module.
It's a good idea to post an announcement of the availability of your module (or the module itself if small) to the comp.lang.perl.announce Usenet newsgroup. This will at least ensure very wide once-off distribution.
If possible, register the module with CPAN. You should include details of its location in your announcement.
Some notes about ftp archives: Please use a long descriptive file name that includes the version number. Most incoming directories will not be readable/listable, i.e., you won't be able to see your file after uploading it. Remember to send your email notification message as soon as possible after uploading else your file may get deleted automatically. Allow time for the file to be processed and/or check the file has been processed before announcing its location.
FTP Archives for Perl Modules:
Follow the instructions and links on:
- http://www.cpan.org/modules/00modlist.long.html
- http://www.cpan.org/modules/04pause.html
or upload to one of these sites:
- https://pause.kbx.de/pause/
- http://pause.perl.org/
and notify <modules@perl.org>.
By using the WWW interface you can ask the Upload Server to mirror your modules from your ftp or WWW site into your own directory on CPAN!
Please remember to send me an updated entry for the Module list!
Take care when changing a released module.
Always strive to remain compatible with previous released versions. Otherwise try to add a mechanism to revert to the old behavior if people rely on it. Document incompatible changes.
There is no requirement to convert anything.
If it ain't broke, don't fix it! Perl 4 library scripts should continue to work with no problems. You may need to make some minor changes (like escaping non-array @'s in double quoted strings) but there is no need to convert a .pl file into a Module for just that.
Consider the implications.
All Perl applications that make use of the script will need to be changed (slightly) if the script is converted into a module. Is it worth it unless you plan to make other changes at the same time?
Make the most of the opportunity.
If you are going to convert the script to a module you can use the opportunity to redesign the interface. The guidelines for module creation above include many of the issues you should consider.
The pl2pm utility will get you started.
This utility will read *.pl files (given as parameters) and write corresponding *.pm files. The pl2pm utility does the following:
Adds the standard Module prologue lines
Converts package specifiers from ' to ::
Converts die(...) to croak(...)
Several other minor changes
Being a mechanical process, pl2pm is not bulletproof. The converted code will need careful checking, especially any package statements. Don't delete the original .pl file till the new .pm one works!
Complete applications rarely belong in the Perl Module Library.
Many applications contain some Perl code that could be reused.
Help save the world! Share your code in a form that makes it easy to reuse.
Break-out the reusable code into one or more separate module files.
Take the opportunity to reconsider and redesign the interfaces.
In some cases the 'application' can then be reduced to a small fragment of code built on top of the reusable modules. In these cases the application could be invoked as:
- % perl -e 'use Module::Name; method(@ARGV)' ...
- or
- % perl -mModule::Name ... (in perl5.002 or higher)
Perl does not enforce private and public parts of its modules as you may have been used to in other languages like C++, Ada, or Modula-17. Perl doesn't have an infatuation with enforced privacy. It would prefer that you stayed out of its living room because you weren't invited, not because it has a shotgun.
The module and its user have a contract, part of which is common law, and part of which is "written". Part of the common law contract is that a module doesn't pollute any namespace it wasn't asked to. The written contract for the module (A.K.A. documentation) may make other provisions. But then you know when you use RedefineTheWorld that you're redefining the world and willing to take the consequences.
perlmodstyle - Perl module style guide
This document attempts to describe the Perl community's "best practice" for writing Perl modules. It extends the recommendations found in perlstyle, which should be considered required reading before reading this document.
While this document is intended to be useful to all module authors, it is particularly aimed at authors who wish to publish their modules on CPAN.
The focus is on elements of style which are visible to the users of a module, rather than those parts which are only seen by the module's developers. However, many of the guidelines presented in this document can be extrapolated and applied successfully to a module's internals.
This document differs from perlnewmod in that it is a style guide rather than a tutorial on creating CPAN modules. It provides a checklist against which modules can be compared to determine whether they conform to best practice, without necessarily describing in detail how to achieve this.
All the advice contained in this document has been gleaned from extensive conversations with experienced CPAN authors and users. Every piece of advice given here is the result of previous mistakes. This information is here to help you avoid the same mistakes and the extra work that would inevitably be required to fix them.
The first section of this document provides an itemized checklist; subsequent sections provide a more detailed discussion of the items on the list. The final section, "Common Pitfalls", describes some of the most popular mistakes made by CPAN authors.
For more detail on each item in this checklist, see below.
Don't re-invent the wheel
Patch, extend or subclass an existing module where possible
Do one thing and do it well
Choose an appropriate name
API should be understandable by the average programmer
Simple methods for simple tasks
Separate functionality from output
Consistent naming of subroutines or methods
Use named parameters (a hash or hashref) when there are more than two parameters
Ensure your module works under use strict and -w
Stable modules should maintain backwards compatibility
Write documentation in POD
Document purpose, scope and target applications
Document each publicly accessible method or subroutine, including params and return values
Give examples of use in your documentation
Provide a README file and perhaps also release notes, changelog, etc
Provide links to further information (URL, email)
Specify pre-requisites in Makefile.PL or Build.PL
Specify Perl version requirements with use
Include tests with your module
Choose a sensible and consistent version numbering scheme (X.YY is the common Perl module numbering scheme)
Increment the version number for every change, no matter how small
Package the module using "make dist"
Choose an appropriate license (GPL/Artistic is a good default)
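The named-parameter item in the checklist above can be sketched as follows (Mailer, send, and the parameter names are hypothetical):

```perl
package Mailer;
use strict;
use warnings;

# More than two parameters: accept a hash of named arguments, so calls
# stay readable and new parameters can be added without breaking callers.
sub send {
    my ($class, %args) = @_;
    my $to      = $args{to} or die "send: 'to' is required";
    my $subject = defined $args{subject} ? $args{subject} : "(no subject)";
    return "To: $to, Subject: $subject";
}

package main;

print Mailer->send(
    to      => 'someone@example.com',
    subject => 'Hello',
), "\n";
```

Because the arguments are named, the caller can pass them in any order and omit optional ones entirely.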
Try not to launch headlong into developing your module without spending some time thinking first. A little forethought may save you a vast amount of effort later on.
You may not even need to write the module. Check whether it's already been done in Perl, and avoid re-inventing the wheel unless you have a good reason.
Good places to look for pre-existing modules include http://search.cpan.org/ and asking on modules@perl.org.
If an existing module almost does what you want, consider writing a patch, writing a subclass, or otherwise extending the existing module rather than rewriting it.
At the risk of stating the obvious, modules are intended to be modular. A Perl developer should be able to use modules to put together the building blocks of their application. However, it's important that the blocks are the right shape, and that the developer shouldn't have to use a big block when all they need is a small one.
Your module should have a clearly defined scope which is no longer than a single sentence. Can your module be broken down into a family of related modules?
Bad example:
"FooBar.pm provides an implementation of the FOO protocol and the related BAR standard."
Good example:
"Foo.pm provides an implementation of the FOO protocol. Bar.pm implements the related BAR protocol."
This means that if a developer only needs a module for the BAR standard, they should not be forced to install libraries for FOO as well.
Make sure you choose an appropriate name for your module early on. This will help people find and remember your module, and make programming with your module more intuitive.
When naming your module, consider the following:
Be descriptive (i.e. accurately describes the purpose of the module).
Be consistent with existing modules.
Reflect the functionality of the module, not the implementation.
Avoid starting a new top-level hierarchy, especially if a suitable hierarchy already exists under which you could place your module.
You should contact modules@perl.org to ask them about your module name before publishing your module. You should also try to ask people who are already familiar with the module's application domain and the CPAN naming system. Authors of similar modules, or modules with similar names, may be a good place to start.
Considerations for module design and coding:
Your module may be object oriented (OO) or not, or it may have both kinds of interfaces available. There are pros and cons of each technique, which should be considered when you design your API.
In Perl Best Practices (copyright 2005, published by O'Reilly Media, Inc.), Damian Conway provides a list of criteria to use when deciding if OO is the right fit for your problem:
The system being designed is large, or is likely to become large.
The data can be aggregated into obvious structures, especially if there's a large amount of data in each aggregate.
The various types of data aggregate form a natural hierarchy that facilitates the use of inheritance and polymorphism.
You have a piece of data on which many different operations are applied.
You need to perform the same general operations on related types of data, but with slight variations depending on the specific type of data the operations are applied to.
It's likely you'll have to add new data types later.
The typical interactions between pieces of data are best represented by operators.
The implementation of individual components of the system is likely to change over time.
The system design is already object-oriented.
Large numbers of other programmers will be using your code modules.
Think carefully about whether OO is appropriate for your module. Gratuitous object orientation results in complex APIs which are difficult for the average module user to understand or use.
Your interfaces should be understandable by an average Perl programmer. The following guidelines may help you judge whether your API is sufficiently straightforward:
It's better to have numerous simple routines than a few monolithic ones. If your routine changes its behaviour significantly based on its arguments, it's a sign that you should have two (or more) separate routines.
Return your results in the most generic form possible and allow the user to choose how to use them. The most generic form possible is usually a Perl data structure which can then be used to generate a text report, HTML, XML, a database query, or whatever else your users require.
If your routine iterates through some kind of list (such as a list of files, or records in a database) you may consider providing a callback so that users can manipulate each element of the list in turn. File::Find provides an example of this with its find(\&wanted, $dir) syntax.
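The callback style can be sketched with File::Find itself. The following builds a small throwaway directory tree (for illustration only) and lets find() hand each entry to our callback:

```perl
use strict;
use warnings;
use File::Find;
use File::Temp qw(tempdir);
use File::Path qw(make_path);

# Build a tiny directory tree to search (illustration only).
my $dir = tempdir(CLEANUP => 1);
make_path("$dir/lib/Acme");
open my $fh, '>', "$dir/lib/Acme/Widget.pm" or die "open: $!";
print {$fh} "package Acme::Widget;\n1;\n";
close $fh;

# find() invokes the "wanted" callback once per directory entry,
# so the caller decides what happens to each file in turn.
my @modules;
find(sub { push @modules, $File::Find::name if /\.pm\z/ }, $dir);

print scalar(@modules), "\n";
```

The module does the walking; the caller supplies only the per-element behaviour.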
Don't require every module user to jump through the same hoops to achieve a simple result. You can always include optional parameters or routines for more complex or non-standard behaviour. If most of your users have to type a few almost identical lines of code when they start using your module, it's a sign that you should have made that behaviour a default. Another good indicator that you should use defaults is if most of your users call your routines with the same arguments.
Your naming should be consistent. For instance, it's better to have:
- display_day();
- display_week();
- display_year();
than
- display_day();
- week_display();
- show_year();
This applies equally to method names, parameter names, and anything else which is visible to the user (and most things that aren't!)
Use named parameters. It's easier to use a hash like this:
- $obj->do_something(
- name => "wibble",
- type => "text",
- size => 1024,
- );
... than to have a long list of unnamed parameters like this:
- $obj->do_something("wibble", "text", 1024);
While the list of arguments might work fine for one, two or even three arguments, any more become hard for the module user to remember and hard for the module author to manage. If you want to add a new parameter you will have to add it to the end of the list for backward compatibility, and this will probably make your list order unintuitive. Also, if many elements may be undefined you may see unattractive method calls like this:
- $obj->do_something(undef, undef, undef, undef, undef, 1024);
Provide sensible defaults for parameters which have them. Don't make your users specify parameters which will almost always be the same.
The issue of whether to pass the arguments in a hash or a hashref is largely a matter of personal style.
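A named-parameter interface is usually implemented by assigning @_ to a hash, with the defaults listed first so callers can omit them (a minimal sketch; do_something and its keys are hypothetical):

```perl
use strict;
use warnings;

# Key/value arguments land in %args; the earlier pairs act as
# defaults and are overridden by whatever the caller passes.
sub do_something {
    my %args = (type => "text", size => 1024, @_);
    return "$args{name} ($args{type}, $args{size})";
}

print do_something(name => "wibble"), "\n";
print do_something(name => "wobble", size => 2048), "\n";
```

Callers name only the parameters they care about, in any order.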
The use of hash keys starting with a hyphen (-name) or entirely in upper case (NAME) is a relic of older versions of Perl in which ordinary lower case strings were not handled correctly by the => operator. While some modules retain uppercase or hyphenated argument keys for historical reasons or as a matter of personal style, most new modules should use simple lower case keys. Whatever you choose, be consistent!
Your module should run successfully under the strict pragma and should run without generating any warnings. Your module should also handle taint-checking where appropriate, though this can cause difficulties in many cases.
Modules which are "stable" should not break backwards compatibility without at least a long transition phase and a major change in version number.
When your module encounters an error it should do one or more of:
Return an undefined value.
Set $Module::errstr or similar (errstr is a common name used by DBI and other popular modules; if you choose something else, be sure to document it clearly).
warn() or carp() a message to STDERR.
croak() only when your module absolutely cannot figure out what to do. (croak() is a better version of die() for use within modules, which reports its errors from the perspective of the caller. See Carp for details of croak(), carp() and other useful routines.)
As an alternative to the above, you may prefer to throw exceptions using the Error module.
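The first three conventions can be combined in one routine; a minimal sketch (My::Module and its names are hypothetical):

```perl
package My::Module;
use strict;
use warnings;
use Carp;

our $errstr;   # last error message, in the style of DBI's errstr

# On failure: set the package error string, warn from the caller's
# perspective with carp(), and return undef.
sub frobnicate {
    my ($input) = @_;
    unless (defined $input) {
        $errstr = "frobnicate: no input given";
        carp $errstr;
        return undef;
    }
    return uc $input;
}

package main;
print My::Module::frobnicate("abc"), "\n";   # prints ABC
```

The caller can then test the return value and consult $My::Module::errstr for the reason.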
Configurable error handling can be very useful to your users. Consider offering a choice of levels for warning and debug messages, an option to send messages to a separate file, a way to specify an error-handling routine, or other such features. Be sure to default all these options to the commonest use.
Your module should include documentation aimed at Perl developers. You should use Perl's "plain old documentation" (POD) for your general technical documentation, though you may wish to write additional documentation (white papers, tutorials, etc) in some other format. You need to cover the following subjects:
A synopsis of the common uses of the module
The purpose, scope and target applications of your module
Use of each publically accessible method or subroutine, including parameters and return values
Examples of use
Sources of further information
A contact email address for the author/maintainer
The level of detail in Perl module documentation generally goes from less detailed to more detailed. Your SYNOPSIS section should contain a minimal example of use (perhaps as little as one line of code; skip the unusual use cases or anything not needed by most users); the DESCRIPTION should describe your module in broad terms, generally in just a few paragraphs; more detail of the module's routines or methods, lengthy code examples, or other in-depth material should be given in subsequent sections.
Ideally, someone who's slightly familiar with your module should be able to refresh their memory without hitting "page down". As your reader continues through the document, they should receive a progressively greater amount of knowledge.
The recommended order of sections in Perl module documentation is:
NAME
SYNOPSIS
DESCRIPTION
One or more sections or subsections giving greater detail of available methods and routines and any other relevant information.
BUGS/CAVEATS/etc
AUTHOR
SEE ALSO
COPYRIGHT and LICENSE
Keep your documentation near the code it documents ("inline" documentation). Include POD for a given method right above that method's subroutine. This makes it easier to keep the documentation up to date, and avoids having to document each piece of code twice (once in POD and once in comments).
Your module should also include a README file describing the module and giving pointers to further information (website, author email).
An INSTALL file should be included, and should contain simple installation instructions. When using ExtUtils::MakeMaker this will usually be:
- perl Makefile.PL
- make
- make test
- make install
When using Module::Build, this will usually be:
- perl Build.PL
- perl Build
- perl Build test
- perl Build install
Release notes or changelogs should be produced for each release of your software describing user-visible changes to your module, in terms relevant to the user.
Version numbers should indicate at least major and minor releases, and possibly sub-minor releases. A major release is one in which most of the functionality has changed, or in which major new functionality is added. A minor release is one in which a small amount of functionality has been added or changed. Sub-minor version numbers are usually used for changes which do not affect functionality, such as documentation patches.
The most common CPAN version numbering scheme looks like this:
- 1.00, 1.10, 1.11, 1.20, 1.30, 1.31, 1.32
A correct CPAN version number is a floating point number with at least 2 digits after the decimal. You can test whether it conforms to CPAN by using
- perl -MExtUtils::MakeMaker -le 'print MM->parse_version(shift)' 'Foo.pm'
If you want to release a 'beta' or 'alpha' version of a module but don't want CPAN.pm to list it as most recent, use an '_' after the regular version number, followed by at least 2 digits, e.g. 1.20_01. If you do this, the following idiom is recommended:
- $VERSION = "1.12_01";
- $XS_VERSION = $VERSION; # only needed if you have XS code
- $VERSION = eval $VERSION;
With this trick MakeMaker reads only the first line, and thus sees the underscore, while the Perl interpreter evaluates $VERSION and converts the string into a number. Later operations that treat $VERSION as a number can then do so without provoking a warning about $VERSION not being a number.
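The effect of the eval can be seen directly: the string form keeps the underscore (what MakeMaker's version parser sees), while the evaluated form is the plain number that numeric comparisons see.

```perl
use strict;
use warnings;

our $VERSION = "1.12_01";    # string with underscore, for MakeMaker
my $string_form = $VERSION;
$VERSION = eval $VERSION;    # now the number 1.1201, for numeric use

print "$string_form becomes $VERSION\n";  # prints 1.12_01 becomes 1.1201
```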
Never release anything without incrementing the version number; even a one-word documentation patch should result in a change at the sub-minor level.
Module authors should carefully consider whether to rely on other modules, and which modules to rely on.
Most importantly, choose modules which are as stable as possible. In order of preference:
Core Perl modules
Stable CPAN modules
Unstable CPAN modules
Modules not available from CPAN
Specify version requirements for other Perl modules in the pre-requisites in your Makefile.PL or Build.PL.
Be sure to specify Perl version requirements both in Makefile.PL or Build.PL and with require 5.6.1 or similar. See the discussion of use VERSION under require in perlfunc for details.
All modules should be tested before distribution (using "make disttest"), and the tests should also be available to people installing the modules (using "make test"). For Module::Build you would use the make test equivalent perl Build test.
The importance of these tests is proportional to the alleged stability of a module. A module which purports to be stable or which hopes to achieve wide use should adhere to as strict a testing regime as possible.
Useful modules to help you write tests (with minimum impact on your development process or your time) include Test::Simple, Carp::Assert and Test::Inline. For more sophisticated test suites there are Test::More and Test::MockObject.
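A minimal test script of the kind that lives in a distribution's t/ directory might look like the following sketch; the add() function stands in for a routine your module would export:

```perl
use strict;
use warnings;
use Test::More tests => 3;

# Stand-in for a routine imported from the module under test.
sub add { return $_[0] + $_[1] }

is( add(2, 2), 4,  'small integers add' );
is( add(-1, 1), 0, 'negatives cancel' );
ok( add(0.1, 0.2) > 0.29, 'floats are close enough' );
```

Running it under "make test" (via Test::Harness) reports each assertion as a separate pass or fail.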
Modules should be packaged using one of the standard packaging tools. Currently you have the choice between ExtUtils::MakeMaker and the more platform independent Module::Build, allowing modules to be installed in a consistent manner. When using ExtUtils::MakeMaker, you can use "make dist" to create your package. Tools exist to help you to build your module in a MakeMaker-friendly style. These include ExtUtils::ModuleMaker and h2xs. See also perlnewmod.
Make sure that your module has a license, and that the full text of it is included in the distribution (unless it's a common one and the terms of the license don't require you to include it).
If you don't know what license to use, dual licensing under the GPL and Artistic licenses (the same as Perl itself) is a good idea. See perlgpl and perlartistic.
There are certain application spaces which are already very, very well served by CPAN. One example is templating systems, another is date and time modules, and there are many more. While it is a rite of passage to write your own version of these things, please consider carefully whether the Perl world really needs you to publish it.
Your module will be part of a developer's toolkit. It will not, in itself, form the entire toolkit. It's tempting to add extra features until your code is a monolithic system rather than a set of modular building blocks.
Don't fall into the trap of writing for the wrong audience. Your primary audience is a reasonably experienced developer with at least a moderate understanding of your module's application domain, who's just downloaded your module and wants to start using it as quickly as possible.
Tutorials, end-user documentation, research papers, FAQs etc are not appropriate in a module's main documentation. If you really want to write these, include them as sub-documents such as My::Module::Tutorial or My::Module::FAQ and provide a link in the SEE ALSO section of the main documentation.
General Perl style guide
How to create a new module
POD documentation
Verifies your POD's correctness
Test::Simple, Test::Inline, Carp::Assert, Test::More, Test::MockObject
Perl Authors Upload Server. Contains links to information for module authors.
Kirrily "Skud" Robert <skud@cpan.org>
perlmroapi - Perl method resolution plugin interface
As of Perl 5.10.1 there is a new interface for plugging and using method resolution orders other than the default (linear depth first search). The C3 method resolution order added in 5.10.0 has been re-implemented as a plugin, without changing its Perl-space interface.
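The Perl-space side of this can be seen with the built-in C3 plugin: the mro pragma selects a registered resolution order for a class, and mro::get_linear_isa() shows the resulting linearisation. (The diamond hierarchy below is an illustrative example, not from this document.)

```perl
use strict;
use warnings;
use mro;

# A diamond hierarchy: under the default DFS order, D's method
# search runs D, B, A, C; under C3 it runs D, B, C, A.
package A;  sub hello { "A" }
package B;  our @ISA = ('A');
package C;  our @ISA = ('A');  sub hello { "C" }
package D;  our @ISA = ('B', 'C');
use mro 'c3';                  # select C3 for package D

package main;
print join(",", @{ mro::get_linear_isa('D') }), "\n";  # D,B,C,A
print D->hello, "\n";          # C's hello wins under C3
```

With the default depth-first order, D->hello would find A's method via B instead.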
Each plugin should register itself by providing the following structure
- struct mro_alg {
- AV *(*resolve)(pTHX_ HV *stash, U32 level);
- const char *name;
- U16 length;
- U16 kflags;
- U32 hash;
- };
and calling Perl_mro_register:
- Perl_mro_register(aTHX_ &my_mro_alg);
Pointer to the linearisation function, described below.
Name of the MRO, either in ISO-8859-1 or UTF-8.
Length of the name.
If the name is given in UTF-8, set this to HVhek_UTF8. The value is passed directly as the kflags parameter to hv_common().
A precomputed hash value for the MRO's name, or 0.
The resolve function is called to generate a linearised ISA for the given stash, using this MRO. It is called with a pointer to the stash, and a level of 0. The core always sets level to 0 when it calls your function - the parameter is provided to allow your implementation to track depth if it needs to recurse.
The function should return a reference to an array containing the parent classes in order. The names of the classes should be the result of calling HvENAME() on the stash. In those cases where HvENAME() returns null, HvNAME() should be used instead.
The caller is responsible for incrementing the reference count of the array returned if it wants to keep the structure. Hence, if you have created a temporary value that you keep no pointer to, call sv_2mortal() on it to ensure that it is disposed of correctly. If you have cached your return value, then return a pointer to it without changing the reference count.
Computing MROs can be expensive. The implementation provides a cache, in which you can store a single SV *, or anything that can be cast to SV *, such as AV *. To read your private value, use the macro MRO_GET_PRIVATE_DATA(), passing it the mro_meta structure from the stash, and a pointer to your mro_alg structure:
- meta = HvMROMETA(stash);
- private_sv = MRO_GET_PRIVATE_DATA(meta, &my_mro_alg);
To set your private value, call Perl_mro_set_private_data():
- Perl_mro_set_private_data(aTHX_ meta, &c3_alg, private_sv);
The private data cache will take ownership of a reference to private_sv, much the same way that hv_store() takes ownership of a reference to the value that you pass it.
For examples of MRO implementations, see S_mro_get_linear_isa_c3() and the BOOT: section of mro/mro.xs, and S_mro_get_linear_isa_dfs() in mro.c.
The implementation of the C3 MRO and switchable MROs within the perl core was written by Brandon L Black. Nicholas Clark created the pluggable interface, refactored Brandon's implementation to work with it, and wrote this document.
perlnetware - Perl for NetWare
This file gives instructions for building Perl 5.7 and above, and also Perl modules for NetWare. Before you start, you may want to read the README file found in the top level directory into which the Perl source code distribution was extracted. Make sure you read and understand the terms under which the software is being distributed.
This section describes the steps to be performed to build a Perl NLM and other associated NLMs.
The build requires the CodeWarrior compiler and linker. In addition, the "NetWare SDK", "NLM & NetWare Libraries for C" and "NetWare Server Protocol Libraries for C", all available at http://developer.novell.com/wiki/index.php/Category:Novell_Developer_Kit, are required. Microsoft Visual C++ version 4.2 or later is also required.
The build process depends on the location of the NetWare SDK. Once the Tools & SDK are installed, the build environment has to be set up. The following batch files set up the environment.
Execution of this file takes two parameters as input: the first is the NetWare SDK path, the second the path to the CodeWarrior compiler and tools. Running it sets these paths and also sets the build type to Release by default.
This is used to set the build type to debug or release. Change the build type only after executing SetNWBld.bat.
Example:
Typing "buildtype d on" at the command prompt causes the buildtype to be set to Debug type with D2 flag set.
Typing "buildtype d off" or "buildtype d" at the command prompt causes the buildtype to be set to Debug type with D1 flag set.
Typing "buildtype r" at the command prompt sets it to Release Build type.
The make process runs only under the WinNT shell. The NetWare makefile is located under the NetWare folder; it makes use of miniperl.exe to run some of the Perl scripts. To create miniperl.exe, first set the required paths for the Visual C++ compiler (specify the vcvars32 location) at the command prompt, then run nmake from the win32 folder in a WinNT command prompt. The build process can be stopped after miniperl.exe is created. Then run nmake from the NetWare folder in a WinNT command prompt.
Currently the following two build types are tested on NetWare:
USE_MULTI, USE_ITHREADS & USE_IMP_SYS defined
USE_MULTI & USE_IMP_SYS defined and USE_ITHREADS not defined
Once miniperl.exe creation is over, run nmake from the NetWare folder. This will build the Perl interpreter for NetWare as perl.nlm, which is copied into the Release folder for a release build, or into the Debug folder for a debug build.
The make process also creates the Perl extensions as <Extension.nlm>.
To install NetWare Perl onto a NetWare server, first map the Sys volume of a NetWare server to i:. This is because the makefile by default sets the drive letter to i:. Type nmake nwinstall from NetWare folder on a WinNT command prompt. This will copy the binaries and module files onto the NetWare server under sys:\Perl folder. The Perl interpreter, perl.nlm, is copied under sys:\perl\system folder. Copy this to sys:\system folder.
Example: At the command prompt Type "nmake nwinstall". This will install NetWare Perl on the NetWare Server. Similarly, if you type "nmake install", this will cause the binaries to be installed on the local machine. (Typically under the c:\perl folder)
To build extensions other than standard extensions, NetWare Perl has to be installed on Windows along with Windows Perl. The Perl for Windows can either be downloaded from the CPAN site and built from source, or the binaries can be downloaded directly from the ActiveState site. Installation is done by invoking nmake install from the NetWare folder in a WinNT command prompt after building NetWare Perl by following the steps given above. This will copy all the *.pm files and other required files. Documentation files are not copied. Thus one must first install Windows Perl, then install NetWare Perl.
Once this is done, do the following to build any extension:
Change to the extension directory where its source files are present.
Run the following command at the command prompt:
- perl -I<path to NetWare lib dir> -I<path to lib> Makefile.pl
Example:
- perl -Ic:/perl/5.6.1/lib/NetWare-x86-multi-thread -Ic:\perl\5.6.1\lib MakeFile.pl
or
- perl -Ic:/perl/5.8.0/lib/NetWare-x86-multi-thread -Ic:\perl\5.8.0\lib MakeFile.pl
- nmake
- nmake install
Install will copy the files onto the Windows machine where NetWare Perl is installed; these files may have to be copied to the NetWare server manually. Alternatively, pass INSTALLSITELIB=i:\perl\lib as an input to makefile.pl above, where i: is the drive mapped to the sys: volume of the server on which Perl for NetWare is installed. Typing nmake install will then copy the files onto the NetWare server.
Example: You can execute the following on the command prompt.
- perl -Ic:/perl/5.6.1/lib/NetWare-x86-multi-thread -Ic:\perl\5.6.1\lib MakeFile.pl
- INSTALLSITELIB=i:\perl\lib
or
- perl -Ic:/perl/5.8.0/lib/NetWare-x86-multi-thread -Ic:\perl\5.8.0\lib MakeFile.pl
- INSTALLSITELIB=i:\perl\lib
Note: Some modules downloaded from CPAN may require NetWare related API in order to build on NetWare. Other modules may however build smoothly with or without minor changes depending on the type of module.
The makefile for Win32 is used as a reference to create the makefile for NetWare. Also, the make process for NetWare port uses miniperl.exe to run scripts during the make and installation process.
Anantha Kesari H Y (hyanantha@novell.com)
Aditya C (caditya@novell.com)
Created - 18 Jan 2001
Modified - 25 June 2001
Modified - 13 July 2001
Modified - 28 May 2002
perlnewmod - preparing a new module for distribution
This document gives you some suggestions about how to go about writing Perl modules, preparing them for distribution, and making them available via CPAN.
One of the things that makes Perl really powerful is the fact that Perl hackers tend to want to share the solutions to problems they've faced, so you and I don't have to battle with the same problem again.
The main way they do this is by abstracting the solution into a Perl module. If you don't know what one of these is, the rest of this document isn't going to be much use to you. You're also missing out on an awful lot of useful code; consider having a look at perlmod, perlmodlib and perlmodinstall before coming back here.
When you've found that there isn't a module available for what you're trying to do, and you've had to write the code yourself, consider packaging up the solution into a module and uploading it to CPAN so that others can benefit.
We're going to primarily concentrate on Perl-only modules here, rather than XS modules. XS modules serve a rather different purpose, and you should consider different things before distributing them - the popularity of the library you are gluing, the portability to other operating systems, and so on. However, the notes on preparing the Perl side of the module and packaging and distributing it will apply equally well to an XS module as a pure-Perl one.
You should make a module out of any code that you think is going to be useful to others. Anything that's likely to fill a hole in the communal library and which someone else can slot directly into their program. Any part of your code which you can isolate and extract and plug into something else is a likely candidate.
Let's take an example. Suppose you're reading in data from a local format into a hash-of-hashes in Perl, turning that into a tree, walking the tree and then piping each node to an Acme Transmogrifier Server.
Now, quite a few people have the Acme Transmogrifier, and you've had to write something to talk the protocol from scratch - you'd almost certainly want to make that into a module. The level at which you pitch it is up to you: you might want protocol-level modules analogous to Net::SMTP which then talk to higher level modules analogous to Mail::Send. The choice is yours, but you do want to get a module out for that server protocol.
Nobody else on the planet is going to talk your local data format, so we can ignore that. But what about the thing in the middle? Building tree structures from Perl variables and then traversing them is a nice, general problem, and if nobody's already written a module that does that, you might want to modularise that code too.
So hopefully you've now got a few ideas about what's good to modularise. Let's now see how it's done.
Before we even start scraping out the code, there are a few things we'll want to do in advance.
Dig into a bunch of modules to see how they're written. I'd suggest starting with Text::Tabs, since it's in the standard library and is nice and simple, and then looking at something a little more complex like File::Copy. For object oriented code, WWW::Mechanize or the Email::* modules provide some good examples.
These should give you an overall feel for how modules are laid out and written.
There are a lot of modules on CPAN, and it's easy to miss one that's similar to what you're planning on contributing. Have a good plough through http://search.cpan.org/ and make sure you're not the one reinventing the wheel!
You might love it. You might feel that everyone else needs it. But there might not actually be any real demand for it out there. If you're unsure about the demand your module will have, consider sending out feelers on the comp.lang.perl.modules newsgroup, or as a last resort, ask the modules list at modules@perl.org. Remember that this is a closed list with a very long turn-around time - be prepared to wait a good while for a response from them.
Perl modules included on CPAN have a naming hierarchy you should try to fit in with. See perlmodlib for more details on how this works, and browse around CPAN and the modules list to get a feel for it. At the very least, remember this: modules should be title capitalised (This::Thing), fit in with a category, and explain their purpose succinctly.
While you're doing that, make really sure you haven't missed a module similar to the one you're about to write.
When you've got your name sorted out and you're sure that your module is wanted and not currently available, it's time to start coding.
The module-starter utility is distributed as part of the Module::Starter CPAN package. It creates a directory with stubs of all the necessary files to start a new module, according to recent "best practice" for module development, and is invoked from the command line, thus:
- module-starter --module=Foo::Bar \
- --author="Your Name" --email=yourname@cpan.org
If you do not wish to install the Module::Starter package from CPAN, h2xs is an older tool, originally intended for the development of XS modules, which comes packaged with the Perl distribution.
A typical invocation of h2xs for a pure Perl module is:
- h2xs -AX --skip-exporter --use-new-tests -n Foo::Bar
The -A omits the Autoloader code, -X omits XS elements, --skip-exporter omits the Exporter code, --use-new-tests sets up a modern testing environment, and -n specifies the name of the module.
A module's code has to be warning and strict-clean, since you can't guarantee the conditions that it'll be used under. Besides, you wouldn't want to distribute code that wasn't warning or strict-clean anyway, right?
The Carp module allows you to present your error messages from the caller's perspective; this gives you a way to signal a problem with the caller and not your module. For instance, if you say this:
- warn "No hostname given";
the user will see something like this:
- No hostname given at /usr/local/lib/perl5/site_perl/5.6.0/Net/Acme.pm
- line 123.
which looks like your module is doing something wrong. Instead, you want to put the blame on the user, and say this:
- No hostname given at bad_code, line 10.
You do this by using Carp and replacing your warns with carps. If you need to die, say croak instead. However, keep warn and die in place for your sanity checks - where it really is your module at fault.
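A minimal sketch of the idiom, using the document's hypothetical Net::Acme module (the new() constructor here is invented for illustration):

```perl
package Net::Acme;
use strict;
use warnings;
use Carp;

# carp() reports the caller's file and line, so the warning points
# at the code that forgot the hostname, not at this module.
sub new {
    my ($class, %args) = @_;
    carp "No hostname given" unless defined $args{hostname};
    return bless { hostname => $args{hostname} }, $class;
}

package main;
my $acme = Net::Acme->new(hostname => "acme.example.com");
print $acme->{hostname}, "\n";
```

Calling Net::Acme->new() with no hostname produces a warning located at the caller's line, as in the example above.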
Exporter gives you a standard way of exporting symbols and subroutines from your module into the caller's namespace. For instance, saying use Net::Acme qw(&frob) would import the frob subroutine.
The package variable @EXPORT will determine which symbols will get exported when the caller simply says use Net::Acme - you will hardly ever want to put anything in there. @EXPORT_OK, on the other hand, specifies which symbols you're willing to export. If you do want to export a bunch of symbols, use %EXPORT_TAGS and define a standard export set - look at Exporter for more details.
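The pieces fit together like this (Net::Acme and frob are again the document's hypothetical names):

```perl
package Net::Acme;
use strict;
use warnings;
use Exporter 'import';       # borrow Exporter's import() method

our @EXPORT    = ();         # export nothing by default
our @EXPORT_OK = qw(frob);   # but allow callers to request frob

sub frob { return "frobbed: $_[0]" }

package main;
Net::Acme->import('frob');   # what "use Net::Acme qw(frob)" does
print frob("widget"), "\n";  # prints frobbed: widget
```

In a real distribution the package would live in its own .pm file and the caller would simply say use Net::Acme qw(frob).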
The work isn't over until the paperwork is done, and you're going to need to put in some time writing documentation for your module. module-starter or h2xs will provide a stub for you to fill in; if you're not sure about the format, look at perlpod for an introduction. Provide a good synopsis of how your module is used in code, a description, and then notes on the syntax and function of the individual subroutines or methods. Use Perl comments for developer notes and POD for end-user notes.
You're encouraged to create self-tests for your module to ensure it's working as intended on the myriad platforms Perl supports; if you upload your module to CPAN, a host of testers will build your module and send you the results of the tests. Again, module-starter and h2xs provide a test framework which you can extend - you should do something more than just checking your module will compile.
Test::Simple and Test::More are good places to start when writing a test suite.
If you're uploading to CPAN, the automated gremlins will extract the README file and place that in your CPAN directory. It'll also appear in the main by-module and by-category directories if you make it onto the modules list. It's a good idea to put here what the module actually does in detail, and the user-visible changes since the last release.
Every developer publishing modules on CPAN needs a CPAN ID. Visit http://pause.perl.org/, select "Request PAUSE Account", and wait for your request to be approved by the PAUSE administrators.
perl Makefile.PL; make test; make dist
Once again, module-starter or h2xs has done all the work for you. They produce the standard Makefile.PL you see when you download and install modules, and this produces a Makefile with a dist target.
Once you've ensured that your module passes its own tests - always a good thing to make sure - you can make dist, and the Makefile will hopefully produce you a nice tarball of your module, ready for upload.
The email you got when you received your CPAN ID will tell you how to log in to PAUSE, the Perl Authors Upload SErver. From the menus there, you can upload your module to CPAN.
Once uploaded, it'll sit unnoticed in your author directory. If you want it connected to the rest of the CPAN, you'll need to go to "Register Namespace" on PAUSE. Once registered, your module will appear in the by-module and by-category listings on CPAN.
If you have a burning desire to tell the world about your release, post
an announcement to the moderated comp.lang.perl.announce
newsgroup.
Once you start accumulating users, they'll send you bug reports. If you're lucky, they'll even send you patches. Welcome to the joys of maintaining a software project...
Simon Cozens, simon@cpan.org
Updated by Kirrily "Skud" Robert, skud@cpan.org
perlmod, perlmodlib, perlmodinstall, h2xs, strict, Carp, Exporter, perlpod, Test::Simple, Test::More, ExtUtils::MakeMaker, Module::Build, Module::Starter, http://www.cpan.org/ , Ken Williams's tutorial on building your own module at http://mathforum.org/~ken/perl_modules.html
perlnumber - semantics of numbers and numeric operations in Perl
- $n = 1234; # decimal integer
- $n = 0b1110011; # binary integer
- $n = 01234; # octal integer
- $n = 0x1234; # hexadecimal integer
- $n = 12.34e-56; # exponential notation
- $n = "-12.34e56"; # number specified as a string
- $n = "1234"; # number specified as a string
This document describes how Perl internally handles numeric values.
Perl's operator overloading facility is completely ignored here. Operator overloading allows user-defined behaviors for numbers, such as operations over arbitrarily large integers, floating point numbers with arbitrary precision, operations over "exotic" numbers such as modular arithmetic or p-adic arithmetic, and so on. See overload for details.
Perl can internally represent numbers in 3 different ways: as native
integers, as native floating point numbers, and as decimal strings.
Decimal strings may have an exponential notation part, as in "12.34e-56"
.
Native here means "a format supported by the C compiler which was used
to build perl".
The term "native" does not mean quite as much when we talk about native integers, as it does when native floating point numbers are involved. The only implication of the term "native" on integers is that the limits for the maximal and the minimal supported true integral quantities are close to powers of 2. However, "native" floats have a most fundamental restriction: they may represent only those numbers which have a relatively "short" representation when converted to a binary fraction. For example, 0.9 cannot be represented by a native float, since the binary fraction for 0.9 is infinite:
- binary 0.1110011001100...
with the sequence 1100
repeating again and again. In addition to this
limitation, the exponent of the binary number is also restricted when it
is represented as a floating point number. On typical hardware, floating
point values can store numbers with up to 53 binary digits, and with binary
exponents between -1024 and 1024. In decimal representation this is close
to 16 decimal digits and decimal exponents in the range of -304..304.
The upshot of all this is that Perl cannot store a number like
12345678901234567 as a floating point number on such architectures without
loss of information.
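A sketch of what this loss looks like, assuming a perl built with 64-bit native integers and IEEE double floats (both assumptions, not guarantees of any particular build):

```perl
# Assumes a perl with 64-bit native integers and IEEE double floats.
my $int   = 12345678901234567;
my $float = $int + 0.0;            # force conversion to native floating point

printf "%d\n",   $int;             # exact: 12345678901234567
printf "%.0f\n", $float;           # the nearest double, off by one on such builds
```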
Similarly, decimal strings can represent only those numbers which have a finite decimal expansion. Being strings, and thus of arbitrary length, there is no practical limit for the exponent or number of decimal digits for these numbers. (But realize that we are discussing here only the rules for the storage of these numbers. The fact that you can store such "large" numbers does not mean that the operations over these numbers will use all of the significant digits. See Numeric operators and numeric conversions for details.)
In fact numbers stored in the native integer format may be stored either in the signed native form, or in the unsigned native form. Thus the limits for Perl numbers stored as native integers would typically be -2**31..2**32-1, with appropriate modifications in the case of 64-bit integers. Again, this does not mean that Perl can do operations only over integers in this range: it is possible to store many more integers in floating point format.
Summing up, Perl numeric values can store only those numbers which have a finite decimal expansion or a "short" binary expansion.
As mentioned earlier, Perl can store a number in any one of three formats, but most operators typically understand only one of those formats. When a numeric value is passed as an argument to such an operator, it will be converted to the format understood by the operator.
Six such conversions are possible:
- native integer --> native floating point (*)
- native integer --> decimal string
- native floating_point --> native integer (*)
- native floating_point --> decimal string (*)
- decimal string --> native integer
- decimal string --> native floating point (*)
These conversions are governed by the following general rules:
If the source number can be represented in the target form, that representation is used.
If the source number is outside of the limits representable in the target form, a representation of the closest limit is used. (Loss of information)
If the source number is between two numbers representable in the target form, a representation of one of these numbers is used. (Loss of information)
In native floating point --> native integer conversions, the magnitude of the result is less than or equal to the magnitude of the source. ("Rounding to zero".)
If the decimal string --> native integer conversion cannot be done without loss of information, the result is compatible with the conversion sequence decimal_string --> native_floating_point --> native_integer. In particular, rounding is strongly biased toward 0, though a number like "0.99999999999999999999" has a chance of being rounded to 1.
RESTRICTION: The conversions marked with (*) above involve steps
performed by the C compiler. In particular, bugs/features of the compiler
used may lead to breakage of some of the above rules.
Perl operations which take a numeric argument treat that argument in one of four different ways: they may force it to one of the integer/floating/ string formats, or they may behave differently depending on the format of the operand. Forcing a numeric value to a particular format does not change the number stored in the value.
All the operators which need an argument in the integer format treat the argument as in modular arithmetic, e.g., mod 2**32 on a 32-bit architecture. sprintf "%u", -1 therefore provides the same result as sprintf "%u", ~0.
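For example, both of these print the same all-ones value (2**32-1 on a 32-bit perl, 2**64-1 on a 64-bit one):

```perl
# -1 and ~0 have the same bit pattern in the native integer format,
# so "%u" renders both as the largest unsigned native integer.
printf "%u\n", -1;
printf "%u\n", ~0;
```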
The binary operators +, -, *, /, %, ==, !=, >, <, >=, <= and the unary operators -, abs and -- will attempt to convert arguments to integers. If both conversions are possible without loss of precision, and the operation can be performed without loss of precision, then the integer result is used. Otherwise arguments are converted to floating point format and the floating point result is used.
The caching of conversions (as described above) means that the integer
conversion does not throw away fractional parts on floating point numbers.
++ behaves as the other operators above, except that if its argument is a string matching the format /^[a-zA-Z]*[0-9]*\z/, the string increment described in perlop is used.
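A short illustration of both behaviors:

```perl
my $id = "aa9";
$id++;             # matches /^[a-zA-Z]*[0-9]*\z/, so magic string increment
print "$id\n";     # prints "ab0"

my $num = "10e2";  # does not match, so ++ numifies the string first
$num++;
print "$num\n";    # prints "1001" (10e2 == 1000, plus one)
```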
use integer
In scopes where use integer; is in force, nearly all the operators listed above will force their argument(s) into integer format, and return an integer result. The exceptions, abs, ++ and --, do not change their behavior with use integer.
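A small demonstration of the pragma's lexical scoping:

```perl
{
    use integer;
    print 10 / 3, "\n";   # prints "3": division is done in integer arithmetic
}
print 10 / 3, "\n";       # back to floating point outside the block
```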
Operators such as **, sin and exp force arguments to floating point format.
Arguments are forced into the integer format if not strings.
use integer
forces arguments to integer format. Also shift operations internally use signed integers rather than the default unsigned.
force the argument into the integer format. This is applicable
to the third and fourth arguments of sysread, for example.
force the argument into the string format. For example, this is
applicable to printf "%s", $value
.
Though forcing an argument into a particular form does not change the stored number, Perl remembers the result of such conversions. In particular, though the first such conversion may be time-consuming, repeated operations will not need to redo the conversion.
Ilya Zakharevich ilya@math.ohio-state.edu
Editorial adjustments by Gurusamy Sarathy <gsar@ActiveState.com>
Updates for 5.8.0 by Nicholas Clark <nick@ccl4.org>
perlobj - Perl object reference
This document provides a reference for Perl's object orientation features. If you're looking for an introduction to object-oriented programming in Perl, please see perlootut.
In order to understand Perl objects, you first need to understand references in Perl. See perlref for details.
This document describes all of Perl's object-oriented (OO) features from the ground up. If you're just looking to write some object-oriented code of your own, you are probably better served by using one of the object systems from CPAN described in perlootut.
If you're looking to write your own object system, or you need to maintain code which implements objects from scratch then this document will help you understand exactly how Perl does object orientation.
There are a few basic principles which define object oriented Perl:
An object is simply a data structure that knows to which class it belongs.
A class is simply a package. A class provides methods that expect to operate on objects.
A method is simply a subroutine that expects a reference to an object (or a package name, for class methods) as the first argument.
Let's look at each of these principles in depth.
Unlike many other languages which support object orientation, Perl does not provide any special syntax for constructing an object. Objects are merely Perl data structures (hashes, arrays, scalars, filehandles, etc.) that have been explicitly associated with a particular class.
That explicit association is created by the built-in bless function,
which is typically used within the constructor subroutine of the
class.
Here is a simple constructor:
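A minimal sketch, assuming a hash-based File class:

```perl
package File;

sub new {
    my $class = shift;

    # Create an empty anonymous hash and associate it with the class.
    return bless {}, $class;
}
```

A call such as my $file = File->new(); then returns a hash reference blessed into the File class.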
The name new
isn't special. We could name our constructor something
else:
The modern convention for OO modules is to always use new
as the
name for the constructor, but there is no requirement to do so. Any
subroutine that blesses a data structure into a class is a valid
constructor in Perl.
In the previous examples, the {} creates a reference to an
empty anonymous hash. The bless function then takes that reference
and associates the hash with the class in $class
. In the simplest
case, the $class
variable will end up containing the string "File".
We can also use a variable to store a reference to the data structure that is being blessed as our object:
Once we've blessed the hash referred to by $self
we can start
calling methods on it. This is useful if you want to put object
initialization in its own separate method:
Since the object is also a hash, you can treat it as one, using it to store data associated with the object. Typically, code inside the class can treat the hash as an accessible data structure, while code outside the class should always treat the object as opaque. This is called encapsulation. Encapsulation means that the user of an object does not have to know how it is implemented. The user simply calls documented methods on the object.
Note, however, that (unlike most other OO languages) Perl does not ensure or enforce encapsulation in any way. If you want objects to actually be opaque you need to arrange for that yourself. This can be done in a variety of ways, including using Inside-Out objects or modules from CPAN.
When we bless something, we are not blessing the variable which contains a reference to that thing, nor are we blessing the reference that the variable stores; we are blessing the thing that the variable refers to (sometimes known as the referent). This is best demonstrated with this code:
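A sketch of such a demonstration, using the blessed function from the core Scalar::Util module:

```perl
use Scalar::Util 'blessed';

my $foo = {};
my $bar = $foo;                          # both refer to the same hash

bless $foo, 'Class';
print blessed($bar) // 'undef', "\n";    # prints "Class"

$bar = "some other value";
print blessed($bar) // 'undef', "\n";    # prints "undef"
```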
When we call bless on a variable, we are actually blessing the
underlying data structure that the variable refers to. We are not
blessing the reference itself, nor the variable that contains that
reference. That's why the second call to blessed( $bar )
returns
false. At that point $bar
is no longer storing a reference to an
object.
You will sometimes see older books or documentation mention "blessing a reference" or describe an object as a "blessed reference", but this is incorrect. It isn't the reference that is blessed as an object; it's the thing the reference refers to (i.e. the referent).
Perl does not provide any special syntax for class definitions. A package is simply a namespace containing variables and subroutines. The only difference is that in a class, the subroutines may expect a reference to an object or the name of a class as the first argument. This is purely a matter of convention, so a class may contain both methods and subroutines which don't operate on an object or class.
Each package contains a special array called @ISA
. The @ISA
array
contains a list of that class's parent classes, if any. This array is
examined when Perl does method resolution, which we will cover later.
It is possible to manually set @ISA
, and you may see this in older
Perl code. Much older code also uses the base pragma. For new code,
we recommend that you use the parent pragma to declare your parents.
This pragma will take care of setting @ISA
. It will also load the
parent classes and make sure that the package doesn't inherit from
itself.
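For example, a hypothetical File::MP3 class could declare a File parent like this (the -norequire flag tells the pragma not to load File from disk, assuming it is already defined):

```perl
package File;
sub dummy { 1 }          # stand-in: pretend File is already loaded

package File::MP3;
use parent -norequire, 'File';

# The pragma has populated @ISA for us:
print "@File::MP3::ISA\n";   # prints "File"
```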
However the parent classes are set, the package's @ISA
variable will
contain a list of those parents. This is simply a list of scalars, each
of which is a string that corresponds to a package name.
All classes inherit from the UNIVERSAL class implicitly. The
UNIVERSAL class is implemented by the Perl core, and provides
several default methods, such as isa()
, can()
, and VERSION()
.
The UNIVERSAL
class will never appear in a package's @ISA
variable.
Perl only provides method inheritance as a built-in feature. Attribute inheritance is left up to the class to implement. See the Writing Accessors section for details.
Perl does not provide any special syntax for defining a method. A
method is simply a regular subroutine, and is declared with sub.
What makes a method special is that it expects to receive either an
object or a class name as its first argument.
Perl does provide special syntax for method invocation, the ->
operator. We will cover this in more detail later.
Most methods you write will expect to operate on objects:
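A sketch of such a method, assuming a hash-based object carrying hypothetical path and data keys:

```perl
package File;

sub new {
    my ( $class, $path, $data ) = @_;
    return bless { path => $path, data => $data }, $class;
}

sub save {
    my $self = shift;

    # Write the object's data out to its path.
    open my $fh, '>', $self->{path} or die $!;
    print {$fh} $self->{data}       or die $!;
    close $fh                       or die $!;
}
```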
Calling a method on an object is written as $object->method
.
The left hand side of the method invocation (or arrow) operator is the object (or class name), and the right hand side is the method name.
- my $pod = File->new( 'perlobj.pod', $data );
- $pod->save();
The ->
syntax is also used when dereferencing a reference. It
looks like the same operator, but these are two different operations.
When you call a method, the thing on the left side of the arrow is
passed as the first argument to the method. That means when we call Critter->new()
, the new()
method receives the string "Critter"
as its first argument. When we call $fred->speak()
, the $fred
variable is passed as the first argument to speak()
.
Just as with any Perl subroutine, all of the arguments passed in @_
are aliases to the original argument. This includes the object itself.
If you assign directly to $_[0]
you will change the contents of the
variable that holds the reference to the object. We recommend that you
don't do this unless you know exactly what you're doing.
Perl knows what package the method is in by looking at the left side of the arrow. If the left hand side is a package name, it looks for the method in that package. If the left hand side is an object, then Perl looks for the method in the package that the object has been blessed into.
If the left hand side is neither a package name nor an object, then the method call will cause an error, but see the section on Method Call Variations for more nuances.
We already talked about the special @ISA
array and the parent
pragma.
When a class inherits from another class, any methods defined in the parent class are available to the child class. If you attempt to call a method on an object that isn't defined in its own class, Perl will also look for that method in any parent classes it may have.
Since we didn't define a save()
method in the File::MP3
class,
Perl will look at the File::MP3
class's parent classes to find the
save()
method. If Perl cannot find a save()
method anywhere in
the inheritance hierarchy, it will die.
In this case, it finds a save()
method in the File
class. Note
that the object passed to save()
in this case is still a
File::MP3
object, even though the method is found in the File
class.
We can override a parent's method in a child class. When we do so, we
can still call the parent class's method with the SUPER
pseudo-class.
The SUPER
modifier can only be used for method calls. You can't
use it for regular subroutine calls or class methods:
- SUPER::save($thing); # FAIL: looks for save() sub in package SUPER
- SUPER->save($thing); # FAIL: looks for save() method in class
- # SUPER
- $thing->SUPER::save(); # Okay: looks for save() method in parent
- # classes
The SUPER
pseudo-class is resolved from the package where the call
is made. It is not resolved based on the object's class. This is
important, because it lets methods at different levels within a deep
inheritance hierarchy each correctly call their respective parent
methods.
- package A;
- sub new {
- return bless {}, shift;
- }
- sub speak {
- my $self = shift;
- say 'A';
- }
- package B;
- use parent -norequire, 'A';
- sub speak {
- my $self = shift;
- $self->SUPER::speak();
- say 'B';
- }
- package C;
- use parent -norequire, 'B';
- sub speak {
- my $self = shift;
- $self->SUPER::speak();
- say 'C';
- }
- my $c = C->new();
- $c->speak();
In this example, we will get the following output:
- A
- B
- C
This demonstrates how SUPER
is resolved. Even though the object is
blessed into the C
class, the speak()
method in the B
class
can still call SUPER::speak()
and expect it to correctly look in the
parent class of B (i.e. the class the method call is in), not in the parent class of C (i.e. the class the object belongs to).
There are rare cases where this package-based resolution can be a
problem. If you copy a subroutine from one package to another, SUPER
resolution will be done based on the original package.
Multiple inheritance often indicates a design problem, but Perl always gives you enough rope to hang yourself with if you ask for it.
To declare multiple parents, you simply need to pass multiple class
names to use parent
:
- package MultiChild;
- use parent 'Parent1', 'Parent2';
Method resolution order only matters in the case of multiple inheritance. In the case of single inheritance, Perl simply looks up the inheritance chain to find a method:
- Grandparent
- |
- Parent
- |
- Child
If we call a method on a Child
object and that method is not defined
in the Child
class, Perl will look for that method in the Parent
class and then, if necessary, in the Grandparent
class.
If Perl cannot find the method in any of these classes, it will die with an error message.
When a class has multiple parents, the method lookup order becomes more complicated.
By default, Perl does a depth-first left-to-right search for a method.
That means it starts with the first parent in the @ISA
array, and
then searches all of its parents, grandparents, etc. If it fails to
find the method, it then goes to the next parent in the original
class's @ISA
array and searches from there.
- SharedGreatGrandParent
- / \
- PaternalGrandparent MaternalGrandparent
- \ /
- Father Mother
- \ /
- Child
So given the diagram above, Perl will search Child
, Father
,
PaternalGrandparent
, SharedGreatGrandParent
, Mother
, and
finally MaternalGrandparent
. This may be a problem because now we're
looking in SharedGreatGrandParent
before we've checked all its
derived classes (i.e. before we tried Mother
and
MaternalGrandparent
).
It is possible to ask for a different method resolution order with the mro pragma.
This pragma lets you switch to the "C3" resolution order. In simple
terms, "C3" order ensures that shared parent classes are never searched
before child classes, so Perl will now search: Child
, Father
,
PaternalGrandparent
, Mother,
MaternalGrandparent
, and finally
SharedGreatGrandParent
. Note however that this is not
"breadth-first" searching: All the Father
ancestors (except the
common ancestor) are searched before any of the Mother
ancestors are
considered.
The C3 order also lets you call methods in sibling classes with the
next pseudo-class. See the mro documentation for more details on
this feature.
When Perl searches for a method, it caches the lookup so that future calls to the method do not need to search for it again. Changing a class's parent class or adding subroutines to a class will invalidate the cache for that class.
The mro pragma provides some functions for manipulating the method cache directly.
As we mentioned earlier, Perl provides no special constructor syntax. This means that a class must implement its own constructor. A constructor is simply a class method that returns a reference to a new object.
The constructor can also accept additional parameters that define the
object. Let's write a real constructor for the File
class we used
earlier:
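A sketch of such a constructor, taking hypothetical $path and $data parameters:

```perl
package File;

sub new {
    my $class = shift;
    my ( $path, $data ) = @_;

    # Store the parameters in the hash that becomes the object.
    my $self = bless {
        path => $path,
        data => $data,
    }, $class;

    return $self;
}
```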
As you can see, we've stored the path and file data in the object itself. Remember, under the hood, this object is still just a hash. Later, we'll write accessors to manipulate this data.
For our File::MP3 class, we can check to make sure that the path we're given ends with ".mp3":
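A sketch, assuming a parent File class whose constructor takes the path and data and does the actual blessing:

```perl
package File;
sub new {
    my ( $class, $path, $data ) = @_;
    return bless { path => $path, data => $data }, $class;
}

package File::MP3;
use parent -norequire, 'File';

sub new {
    my $class = shift;
    my ( $path, $data ) = @_;

    die "You cannot create a File::MP3 without an mp3 extension\n"
        unless $path =~ /\.mp3\z/;

    # Let the parent class do the actual object construction.
    return $class->SUPER::new(@_);
}
```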
This constructor lets its parent class do the actual object construction.
An attribute is a piece of data belonging to a particular object. Unlike most object-oriented languages, Perl provides no special syntax or support for declaring and manipulating attributes.
Attributes are often stored in the object itself. For example, if the object is an anonymous hash, we can store the attribute values in the hash using the attribute name as the key.
While it's possible to refer directly to these hash keys outside of the class, it's considered a best practice to wrap all access to the attribute with accessor methods.
This has several advantages. Accessors make it easier to change the implementation of an object later while still preserving the original API.
An accessor lets you add additional code around attribute access. For example, you could apply a default to an attribute that wasn't set in the constructor, or you could validate that a new value for the attribute is acceptable.
Finally, using accessors makes inheritance much simpler. Subclasses can use the accessors rather than having to know how a parent class is implemented internally.
As with constructors, Perl provides no special accessor declaration syntax, so classes must provide explicitly written accessor methods. There are two common types of accessors, read-only and read-write.
A simple read-only accessor simply gets the value of a single attribute:
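A sketch, assuming a hash-based object with a path attribute:

```perl
package File;

sub new { return bless { path => $_[1] }, $_[0] }

# Read-only accessor: simply returns the stored value.
sub path {
    my $self = shift;
    return $self->{path};
}
```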
A read-write accessor will allow the caller to set the value as well as get it:
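The read-write version checks whether a new value was passed before returning the attribute (again assuming a hash-based object):

```perl
package File;

sub new { return bless { path => $_[1] }, $_[0] }

# Read-write accessor: sets the attribute first if a new value was passed.
sub path {
    my $self = shift;

    if (@_) {
        $self->{path} = shift;
    }

    return $self->{path};
}
```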
Our constructor and accessors are not very smart. They don't check that
a $path
is defined, nor do they check that a $path
is a valid
filesystem path.
Doing these checks by hand can quickly become tedious. Writing a bunch of accessors by hand is also incredibly tedious. There are a lot of modules on CPAN that can help you write safer and more concise code, including the modules we recommend in perlootut.
Perl supports several other ways to call methods besides the $object->method()
usage we've seen so far.
Perl lets you use a scalar variable containing a string as a method name:
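For example (File and its save method here are stand-ins for any class and method):

```perl
package File;
sub new  { return bless {}, shift }
sub save { print "saving...\n" }

package main;

my $file = File->new();

my $method = 'save';
$file->$method();      # exactly equivalent to $file->save()
```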
This works exactly like calling $file->save()
. This can be very
useful for writing dynamic code. For example, it allows you to pass a
method name to be called as a parameter to another method.
Perl also lets you use a scalar containing a string as a class name:
Again, this allows for very dynamic code.
You can also use a subroutine reference as a method:
This is exactly equivalent to writing $sub->($file)
. You may see
this idiom in the wild combined with a call to can
:
Perl also lets you use a dereferenced scalar reference in a method call. That's a mouthful, so let's look at some code:
- $file->${ \'save' };
- $file->${ returns_scalar_ref() };
- $file->${ \( returns_scalar() ) };
- $file->${ returns_ref_to_sub_ref() };
This works if the dereference produces a string or a subroutine reference.
Under the hood, Perl filehandles are instances of the IO::Handle
or
IO::File
class. Once you have an open filehandle, you can call
methods on it. Additionally, you can call methods on the STDIN
,
STDOUT
, and STDERR
filehandles.
Because Perl allows you to use barewords for package names and
subroutine names, it sometimes interprets a bareword's meaning
incorrectly. For example, the construct Class->new()
can be
interpreted as either 'Class'->new()
or Class()->new()
.
In English, that second interpretation reads as "call a subroutine
named Class(), then call new() as a method on the return value of
Class()". If there is a subroutine named Class()
in the current
namespace, Perl will always interpret Class->new()
as the second
alternative: a call to new()
on the object returned by a call to
Class().
You can force Perl to use the first interpretation (i.e. as a method
call on the class named "Class") in two ways. First, you can append a
::
to the class name:
- Class::->new()
Perl will always interpret this as a method call.
Alternatively, you can quote the class name:
- 'Class'->new()
Of course, if the class name is in a scalar Perl will do the right thing as well:
- my $class = 'Class';
- $class->new();
Outside of the file handle case, use of this syntax is discouraged as it can confuse the Perl interpreter. See below for more details.
Perl supports another method invocation syntax called "indirect object" notation. This syntax is called "indirect" because the method comes before the object it is being invoked on.
This syntax can be used with any class or object method:
- my $file = new File $path, $data;
- save $file;
We recommend that you avoid this syntax, for several reasons.
First, it can be confusing to read. In the above example, it's not
clear if save
is a method provided by the File
class or simply a
subroutine that expects a file object as its first argument.
When used with class methods, the problem is even worse. Because Perl
allows subroutine names to be written as barewords, Perl has to guess
whether the bareword after the method is a class name or subroutine
name. In other words, Perl can resolve the syntax as either File->new( $path, $data )
or new( File( $path, $data ) )
.
To parse this code, Perl uses a heuristic based on what package names it has seen, what subroutines exist in the current package, what barewords it has previously seen, and other input. Needless to say, heuristics can produce very surprising results!
Older documentation (and some CPAN modules) encouraged this syntax, particularly for constructors, so you may still find it in the wild. However, we encourage you to avoid using it in new code.
You can force Perl to interpret the bareword as a class name by appending "::" to it, like we saw earlier:
- my $file = new File:: $path, $data;
bless, blessed, and ref
As we saw earlier, an object is simply a data structure that has been
blessed into a class via the bless function. The bless function
can take either one or two arguments:
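A sketch of the two forms:

```perl
my $class = 'File';

my $object1 = bless {}, $class;   # two-argument form: blessed into "File"
my $object2 = bless {};           # one-argument form: blessed into the
                                  # current package (here, "main")
```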
In the first form, the anonymous hash is being blessed into the class
in $class
. In the second form, the anonymous hash is blessed into
the current package.
The second form is strongly discouraged, because it breaks the ability of a subclass to reuse the parent's constructor, but you may still run across it in existing code.
If you want to know whether a particular scalar refers to an object,
you can use the blessed
function exported by Scalar::Util, which
is shipped with the Perl core.
If $thing
refers to an object, then this function returns the name
of the package the object has been blessed into. If $thing
doesn't
contain a reference to a blessed object, the blessed
function
returns undef.
Note that blessed($thing)
will also return false if $thing
has
been blessed into a class named "0". This is possible, but quite
pathological. Don't create a class named "0" unless you know what
you're doing.
Similarly, Perl's built-in ref function treats a reference to a
blessed object specially. If you call ref($thing) and $thing
holds a reference to an object, it will return the name of the class
that the object has been blessed into.
If you simply want to check that a variable contains an object
reference, we recommend that you use defined blessed($object)
, since
ref returns true values for all references, not just objects.
All classes automatically inherit from the UNIVERSAL class, which is built-in to the Perl core. This class provides a number of methods, all of which can be called on either a class or an object. You can also choose to override some of these methods in your class. If you do so, we recommend that you follow the built-in semantics described below.
The isa
method returns true if the object is a member of the
class in $class
, or a member of a subclass of $class
.
If you override this method, it should never throw an exception.
The DOES
method returns true if its object claims to perform the
role $role
. By default, this is equivalent to isa
. This method is
provided for use by object system extensions that implement roles, like
Moose
and Role::Tiny
.
You can also override DOES
directly in your own classes. If you
override this method, it should never throw an exception.
The can
method checks to see if the class or object it was called on
has a method named $method
. This checks for the method in the class
and all of its parents. If the method exists, then a reference to the
subroutine is returned. If it does not then undef is returned.
If your class responds to method calls via AUTOLOAD
, you may want to
overload can
to return a subroutine reference for methods which your
AUTOLOAD
method handles.
If you override this method, it should never throw an exception.
The VERSION
method returns the version number of the class
(package).
If the $need
argument is given then it will check that the current
version (as defined by the $VERSION variable in the package) is greater
than or equal to $need
; it will die if this is not the case. This
method is called automatically by the VERSION
form of use.
- use Package 1.2 qw(some imported subs);
- # implies:
- Package->VERSION(1.2);
We recommend that you use this method to access another package's
version, rather than looking directly at $Package::VERSION
. The
package you are looking at could have overridden the VERSION
method.
We also recommend using this method to check whether a module has a sufficient version. The internal implementation uses the version module to make sure that different types of version numbers are compared correctly.
If you call a method that doesn't exist in a class, Perl will throw an
error. However, if that class or any of its parent classes defines an
AUTOLOAD
method, that AUTOLOAD
method is called instead.
AUTOLOAD
is called as a regular method, and the caller will not know
the difference. Whatever value your AUTOLOAD
method returns is
returned to the caller.
The fully qualified method name that was called is available in the
$AUTOLOAD
package global for your class. Since this is a global, if
you want to refer to it without a package name prefix under strict
'vars'
, you need to declare it.
- # XXX - this is a terrible way to implement accessors, but it makes
- # for a simple example.
- our $AUTOLOAD;
- sub AUTOLOAD {
- my $self = shift;
- # Remove qualifier from original method name...
- my $called = $AUTOLOAD =~ s/.*:://r;
- # Is there an attribute of that name?
- die "No such attribute: $called"
- unless exists $self->{$called};
- # If so, return it...
- return $self->{$called};
- }
- sub DESTROY { } # see below
Without the our $AUTOLOAD
declaration, this code will not compile
under the strict pragma.
As the comment says, this is not a good way to implement accessors. It's slow and too clever by far. However, you may see this as a way to provide accessors in older Perl code. See perlootut for recommendations on OO coding in Perl.
If your class does have an AUTOLOAD
method, we strongly recommend
that you override can
in your class as well. Your overridden can
method should return a subroutine reference for any method that your
AUTOLOAD
responds to.
When the last reference to an object goes away, the object is destroyed. If you only have one reference to an object stored in a lexical scalar, the object is destroyed when that scalar goes out of scope. If you store the object in a package global, that object may not go out of scope until the program exits.
If you want to do something when the object is destroyed, you can
define a DESTROY
method in your class. This method will always be
called by Perl at the appropriate time, unless the method is empty.
This is called just like any other method, with the object as the first
argument. It does not receive any additional arguments. However, the
$_[0]
variable will be read-only in the destructor, so you cannot
assign a value to it.
If your DESTROY
method throws an error, this error will be ignored.
It will not be sent to STDERR
and it will not cause the program to
die. However, if your destructor is running inside an eval {}
block,
then the error will change the value of $@
.
Because DESTROY
methods can be called at any time, you should
localize any global variables you might update in your DESTROY
. In
particular, if you use eval {}
you should localize $@
, and if you
use system or backticks you should localize $?
.
If you define an AUTOLOAD
in your class, then Perl will call your
AUTOLOAD
to handle the DESTROY
method. You can prevent this by
defining an empty DESTROY
, like we did in the autoloading example.
You can also check the value of $AUTOLOAD
and return without doing
anything when called to handle DESTROY
.
The order in which objects are destroyed during global destruction before the program exits is unpredictable. This means that any objects contained by your object may already have been destroyed. You should check that a contained object is defined before calling a method on it:
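A minimal sketch of such a check (the handle attribute here is a hypothetical illustration, not from any example above):
- sub DESTROY {
- my $self = shift;
- # The contained object may already have been destroyed, so
- # only call a method on it if it is still defined
- $self->{handle}->close() if $self->{handle};
- }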
You can use the ${^GLOBAL_PHASE}
variable to detect if you are
currently in the global destruction phase:
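A sketch of a destructor that skips its cleanup during global destruction (the handle attribute is hypothetical; substitute whatever cleanup your class needs):
- sub DESTROY {
- my $self = shift;
- # Do nothing during global destruction
- return if ${^GLOBAL_PHASE} eq 'DESTRUCT';
- $self->{handle}->close() if $self->{handle};
- }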
Note that this variable was added in Perl 5.14.0. If you want to detect
the global destruction phase on older versions of Perl, you can use the
Devel::GlobalDestruction
module on CPAN.
If your DESTROY
method issues a warning during global destruction,
the Perl interpreter will append the string " during global
destruction" to the warning.
During global destruction, Perl will always garbage collect objects before unblessed references. See PERL_DESTRUCT_LEVEL in perlhacktips for more information about global destruction.
All the examples so far have shown objects based on a blessed hash. However, it's possible to bless any type of data structure or referent, including scalars, globs, and subroutines. You may see this sort of thing when looking at code in the wild.
Here's an example of a module as a blessed scalar:
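A sketch of such a class, storing a single epoch time in a blessed scalar (compare it with the Hash::Util::FieldHash version of Time later in this document):
- package Time;
- use strict;
- use warnings;
- sub new {
- my $class = shift;
- my $time = time;
- # Bless a reference to a scalar, rather than to a hash
- return bless \$time, $class;
- }
- sub epoch {
- my $self = shift;
- # Dereference the blessed scalar to get the stored value
- return ${$self};
- }
- my $time = Time->new;
- print $time->epoch;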
In the past, the Perl community experimented with a technique called "inside-out objects". An inside-out object stores its data outside of the object's reference, indexed on a unique property of the object, such as its memory address, rather than in the object itself. This has the advantage of enforcing the encapsulation of object attributes, since their data is not stored in the object itself.
This technique was popular for a while (and was recommended in Damian Conway's Perl Best Practices), but never achieved universal adoption. The Object::InsideOut module on CPAN provides a comprehensive implementation of this technique, and you may see it or other inside-out modules in the wild.
Here is a simple example of the technique, using the Hash::Util::FieldHash core module. This module was added to the core to support inside-out object implementations.
- package Time;
- use strict;
- use warnings;
- use Hash::Util::FieldHash 'fieldhash';
- fieldhash my %time_for;
- sub new {
- my $class = shift;
- my $self = bless \( my $object ), $class;
- $time_for{$self} = time;
- return $self;
- }
- sub epoch {
- my $self = shift;
- return $time_for{$self};
- }
- my $time = Time->new;
- print $time->epoch;
The pseudo-hash feature was an experimental feature introduced in earlier versions of Perl and removed in 5.10.0. A pseudo-hash is an array reference which can be accessed using named keys like a hash. You may run into some code in the wild which uses it. See the fields pragma for more information.
A kinder, gentler tutorial on object-oriented programming in Perl can be found in perlootut. You should also check out perlmodlib for some style guides on constructing both modules and classes.
perlootut - Object-Oriented Programming in Perl Tutorial
This document was created in February, 2011, and the last major revision was in February, 2013.
If you are reading this in the future then it's possible that the state of the art has changed. We recommend you start by reading the perlootut document in the latest stable release of Perl, rather than this version.
This document provides an introduction to object-oriented programming in Perl. It begins with a brief overview of the concepts behind object oriented design. Then it introduces several different OO systems from CPAN which build on top of what Perl provides.
By default, Perl's built-in OO system is very minimal, leaving you to do most of the work. This minimalism made a lot of sense in 1994, but in the years since Perl 5.0 we've seen a number of common patterns emerge in Perl OO. Fortunately, Perl's flexibility has allowed a rich ecosystem of Perl OO systems to flourish.
If you want to know how Perl OO works under the hood, the perlobj document explains the nitty gritty details.
This document assumes that you already understand the basics of Perl syntax, variable types, operators, and subroutine calls. If you don't understand these concepts yet, please read perlintro first. You should also read the perlsyn, perlop, and perlsub documents.
Most object systems share a number of common concepts. You've probably heard terms like "class", "object", "method", and "attribute" before. Understanding the concepts will make it much easier to read and write object-oriented code. If you're already familiar with these terms, you should still skim this section, since it explains each concept in terms of Perl's OO implementation.
Perl's OO system is class-based. Class-based OO is fairly common. It's used by Java, C++, C#, Python, Ruby, and many other languages. There are other object orientation paradigms as well. JavaScript is the most popular language to use another paradigm. JavaScript's OO system is prototype-based.
An object is a data structure that bundles together data and subroutines which operate on that data. An object's data is called attributes, and its subroutines are called methods. An object can be thought of as a noun (a person, a web service, a computer).
An object represents a single discrete thing. For example, an object might represent a file. The attributes for a file object might include its path, content, and last modification time. If we created an object to represent /etc/hostname on a machine named "foo.example.com", that object's path would be "/etc/hostname", its content would be "foo\n", and its last modification time would be 1304974868 seconds since the beginning of the epoch.
The methods associated with a file might include rename() and
write().
In Perl most objects are hashes, but the OO systems we recommend keep you from having to worry about this. In practice, it's best to consider an object's internal data structure opaque.
A class defines the behavior of a category of objects. A class is a name for a category (like "File"), and a class also defines the behavior of objects in that category.
All objects belong to a specific class. For example, our
/etc/hostname object belongs to the File
class. When we want to
create a specific object, we start with its class, and construct or
instantiate an object. A specific object is often referred to as an
instance of a class.
In Perl, any package can be a class. The difference between a package
which is a class and one which isn't is based on how the package is
used. Here's our "class declaration" for the File
class:
- package File;
In Perl, there is no special keyword for constructing an object.
However, most OO modules on CPAN use a method named new()
to
construct a new object:
- my $hostname = File->new(
- path => '/etc/hostname',
- content => "foo\n",
- last_mod_time => 1304974868,
- );
(Don't worry about that ->
operator, it will be explained
later.)
As we said earlier, most Perl objects are hashes, but an object can be
an instance of any Perl data type (scalar, array, etc.). Turning a
plain data structure into an object is done by blessing that data
structure using Perl's bless function.
While we strongly suggest you don't build your objects from scratch, you should know the term bless. A blessed data structure (aka "a referent") is an object. We sometimes say that an object has been "blessed into a class".
Once a referent has been blessed, the blessed
function from the
Scalar::Util core module can tell us its class name. This subroutine
returns an object's class when passed an object, and false otherwise.
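For example (assuming a File class with a new() constructor, as above; the argument list is abbreviated):
- use Scalar::Util 'blessed';
- my $file = File->new( path => '/etc/hostname' );
- print blessed($file), "\n"; # "File"
- # blessed() returns undef (false) for anything that is
- # not a blessed referent:
- print 'not an object' unless blessed('/etc/hostname');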
A constructor creates a new object. In Perl, a class's constructor
is just another method, unlike some other languages, which provide
syntax for constructors. Most Perl classes use new
as the name for
their constructor:
- my $file = File->new(...);
You already learned that a method is a subroutine that operates on an object. You can think of a method as the things that an object can do. If an object is a noun, then methods are its verbs (save, print, open).
In Perl, methods are simply subroutines that live in a class's package. Methods are always written to receive the object as their first argument:
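A sketch of the print_info() method that other parts of this document refer to (it assumes a path accessor exists on the object):
- sub print_info {
- my $self = shift; # the object is the first argument
- print "The file is at ", $self->path, "\n";
- }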
What makes a method special is how it's called. The arrow operator
(->
) tells Perl that we are calling a method.
When we make a method call, Perl arranges for the method's invocant to be passed as the first argument. Invocant is a fancy name for the thing on the left side of the arrow. The invocant can either be a class name or an object. We can also pass additional arguments to the method:
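For example (File->new and rename() were mentioned earlier; the specific arguments are illustrative):
- # Class name as invocant:
- my $file = File->new( path => '/etc/hostname' );
- # Object as invocant, with an additional argument:
- $file->rename('/etc/hostname.bak');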
Each class can define its attributes. When we instantiate an object,
we assign values to those attributes. For example, every File
object
has a path. Attributes are sometimes called properties.
Perl has no special syntax for attributes. Under the hood, attributes are often stored as keys in the object's underlying hash, but don't worry about this.
We recommend that you only access attributes via accessor methods.
These are methods that can get or set the value of each attribute. We
saw this earlier in the print_info()
example, which calls $self->path
.
You might also see the terms getter and setter. These are two types of accessors. A getter gets the attribute's value, while a setter sets it. Another term for a setter is mutator.
Attributes are typically defined as read-only or read-write. Read-only attributes can only be set when the object is first created, while read-write attributes can be altered at any time.
The value of an attribute may itself be another object. For example,
instead of returning its last mod time as a number, the File
class
could return a DateTime object representing that value.
It's possible to have a class that does not expose any publicly settable attributes. Not every class has attributes and methods.
Polymorphism is a fancy way of saying that objects from two
different classes share an API. For example, we could have File
and
WebPage
classes which both have a print_content()
method. This
method might produce different output for each class, but they share a
common interface.
While the two classes may differ in many ways, when it comes to the
print_content()
method, they are the same. This means that we can
try to call the print_content()
method on an object of either class,
and we don't have to know what class the object belongs to!
Polymorphism is one of the key concepts of object-oriented design.
Inheritance lets you create a specialized version of an existing class. Inheritance lets the new class reuse the methods and attributes of another class.
For example, we could create a File::MP3 class which inherits from File. A File::MP3 is-a more specific type of File
.
All mp3 files are files, but not all files are mp3 files.
We often refer to inheritance relationships as parent-child or
superclass/subclass
relationships. Sometimes we say that the child
has an is-a relationship with its parent class.
File
is a superclass of File::MP3
, and File::MP3
is a
subclass of File
.
- package File::MP3;
- use parent 'File';
The parent module is one of several ways that Perl lets you define inheritance relationships.
Perl allows multiple inheritance, which means that a class can inherit from multiple parents. While this is possible, we strongly recommend against it. Generally, you can use roles to do everything you can do with multiple inheritance, but in a cleaner way.
Note that there's nothing wrong with defining multiple subclasses of a
given class. This is both common and safe. For example, we might define
File::MP3::FixedBitrate
and File::MP3::VariableBitrate
classes to
distinguish between different types of mp3 file.
Inheritance allows two classes to share code. By default, every method
in the parent class is also available in the child. The child can
explicitly override a parent's method to provide its own
implementation. For example, if we have a File::MP3
object, it has
the print_info()
method from File
:
- my $cage = File::MP3->new(
- path => 'mp3s/My-Body-Is-a-Cage.mp3',
- content => $mp3_data,
- last_mod_time => 1304974868,
- title => 'My Body Is a Cage',
- );
- $cage->print_info;
- # The file is at mp3s/My-Body-Is-a-Cage.mp3
If we wanted to include the mp3's title in the greeting, we could override the method:
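A sketch of such an override (it assumes the parent's print_info() prints the path, and a title accessor matching the constructor call above):
- package File::MP3;
- use parent 'File';
- sub print_info {
- my $self = shift;
- print "The file is at ", $self->path, "\n";
- print "Its title is ", $self->title, "\n";
- }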
The process of determining what method should be used is called
method resolution. What Perl does is look at the object's class
first (File::MP3
in this case). If that class defines the method,
then that class's version of the method is called. If not, Perl looks
at each parent class in turn. For File::MP3
, its only parent is
File
. If File::MP3
does not define the method, but File
does,
then Perl calls the method in File
.
If File
inherited from DataSource
, which inherited from Thing
,
then Perl would keep looking "up the chain" if necessary.
It is possible to explicitly call a parent method from a child:
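A sketch using the SUPER:: pseudo-class (the method body is illustrative):
- package File::MP3;
- use parent 'File';
- sub print_info {
- my $self = shift;
- # Call the version inherited from File first...
- $self->SUPER::print_info();
- # ...then add the subclass-specific output
- print "Its title is ", $self->title, "\n";
- }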
The SUPER::
bit tells Perl to look for the print_info()
in the
File::MP3
class's inheritance chain. When it finds the parent class
that implements this method, the method is called.
We mentioned multiple inheritance earlier. The main problem with multiple inheritance is that it greatly complicates method resolution. See perlobj for more details.
Encapsulation is the idea that an object is opaque. When another developer uses your class, they don't need to know how it is implemented, they just need to know what it does.
Encapsulation is important for several reasons. First, it allows you to separate the public API from the private implementation. This means you can change that implementation without breaking the API.
Second, when classes are well encapsulated, they become easier to subclass. Ideally, a subclass uses the same APIs to access object data that its parent class uses. In reality, subclassing sometimes involves violating encapsulation, but a good API can minimize the need to do this.
We mentioned earlier that most Perl objects are implemented as hashes under the hood. The principle of encapsulation tells us that we should not rely on this. Instead, we should use accessor methods to access the data in that hash. The object systems that we recommend below all automate the generation of accessor methods. If you use one of them, you should never have to access the object as a hash directly.
In object-oriented code, we often find that one object references another object. This is called composition, or a has-a relationship.
Earlier, we mentioned that the File
class's last_mod_time
accessor could return a DateTime object. This is a perfect example
of composition. We could go even further, and make the path
and
content
accessors return objects as well. The File
class would
then be composed of several other objects.
Roles are something that a class does, rather than something that it is. Roles are relatively new to Perl, but have become rather popular. Roles are applied to classes. Sometimes we say that classes consume roles.
Roles are an alternative to inheritance for providing polymorphism.
Let's assume we have two classes, Radio
and Computer
. Both of
these things have on/off switches. We want to model that in our class
definitions.
We could have both classes inherit from a common parent, like
Machine
, but not all machines have on/off switches. We could create
a parent class called HasOnOffSwitch
, but that is very artificial.
Radios and computers are not specializations of this parent. This
parent is really a rather ridiculous creation.
This is where roles come in. It makes a lot of sense to create a
HasOnOffSwitch
role and apply it to both classes. This role would
define a known API like providing turn_on()
and turn_off()
methods.
Perl does not have any built-in way to express roles. In the past, people just bit the bullet and used multiple inheritance. Nowadays, there are several good choices on CPAN for using roles.
Object Orientation is not the best solution to every problem. In Perl Best Practices (copyright 2004, Published by O'Reilly Media, Inc.), Damian Conway provides a list of criteria to use when deciding if OO is the right fit for your problem:
The system being designed is large, or is likely to become large.
The data can be aggregated into obvious structures, especially if there's a large amount of data in each aggregate.
The various types of data aggregate form a natural hierarchy that facilitates the use of inheritance and polymorphism.
You have a piece of data on which many different operations are applied.
You need to perform the same general operations on related types of data, but with slight variations depending on the specific type of data the operations are applied to.
It's likely you'll have to add new data types later.
The typical interactions between pieces of data are best represented by operators.
The implementation of individual components of the system is likely to change over time.
The system design is already object-oriented.
Large numbers of other programmers will be using your code modules.
As we mentioned before, Perl's built-in OO system is very minimal, but also quite flexible. Over the years, many people have developed systems which build on top of Perl's built-in system to provide more features and convenience.
We strongly recommend that you use one of these systems. Even the most minimal of them eliminates a lot of repetitive boilerplate. There's really no good reason to write your classes from scratch in Perl.
If you are interested in the guts underlying these systems, check out perlobj.
Moose bills itself as a "postmodern object system for Perl 5". Don't be scared, the "postmodern" label is a callback to Larry's description of Perl as "the first postmodern computer language".
Moose
provides a complete, modern OO system. Its biggest influence
is the Common Lisp Object System, but it also borrows ideas from
Smalltalk and several other languages. Moose
was created by Stevan
Little, and draws heavily from his work on the Perl 6 OO design.
Here is our File
class using Moose
:
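A sketch of what this might look like (attribute names are taken from the constructor calls earlier in this document):
- package File;
- use Moose;
- has path => ( is => 'ro' );
- has content => ( is => 'ro' );
- has last_mod_time => ( is => 'ro' );
- sub print_info {
- my $self = shift;
- print "The file is at ", $self->path, "\n";
- }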
Moose
provides a number of features:
Moose
provides a layer of declarative "sugar" for defining classes.
That sugar is just a set of exported functions that make declaring how
your class works simpler and more palatable. This lets you describe
what your class is, rather than having to tell Perl how to
implement your class.
The has()
subroutine declares an attribute, and Moose
automatically creates accessors for these attributes. It also takes
care of creating a new()
method for you. This constructor knows
about the attributes you declared, so you can set them when creating a
new File
.
Moose
lets you define roles the same way you define classes:
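A sketch of the HasOnOffSwitch role discussed earlier, including the is_on attribute:
- package HasOnOffSwitch;
- use Moose::Role;
- has is_on => (
- is => 'rw',
- isa => 'Bool',
- );
- sub turn_on { my $self = shift; $self->is_on(1) }
- sub turn_off { my $self = shift; $self->is_on(0) }
A class would then consume this role with with 'HasOnOffSwitch';.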
In the example above, you can see that we passed isa => 'Bool'
to has()
when creating our is_on
attribute. This tells Moose
that this attribute must be a boolean value. If we try to set it to an
invalid value, our code will throw an error.
Perl's built-in introspection features are fairly minimal. Moose
builds on top of them and creates a full introspection layer for your
classes. This lets you ask questions like "what methods does the File
class implement?" It also lets you modify your classes
programmatically.
Moose
describes itself using its own introspection API. Besides
being a cool trick, this means that you can extend Moose
using
Moose
itself.
There is a rich ecosystem of Moose
extensions on CPAN under the
MooseX
namespace. In addition, many modules on CPAN already use Moose
,
providing you with lots of examples to learn from.
Moose
is a very powerful tool, and we can't cover all of its
features here. We encourage you to learn more by reading the Moose
documentation, starting with
Moose::Manual.
Of course, Moose
isn't perfect.
Moose
can make your code slower to load. Moose
itself is not
small, and it does a lot of code generation when you define your
class. This code generation means that your runtime code is as fast as
it can be, but you pay for this when your modules are first loaded.
This load time hit can be a problem when startup speed is important, such as with a command-line script or a "plain vanilla" CGI script that must be loaded each time it is executed.
Before you panic, know that many people do use Moose
for
command-line tools and other startup-sensitive code. We encourage you
to try Moose
out first before worrying about startup speed.
Moose
also has several dependencies on other modules. Most of these
are small stand-alone modules, a number of which have been spun off
from Moose
. Moose
itself, and some of its dependencies, require a
compiler. If you need to install your software on a system without a
compiler, or if having any dependencies is a problem, then Moose
may not be right for you.
If you try Moose
and find that one of these issues is preventing you
from using Moose
, we encourage you to consider Moo next. Moo
implements a subset of Moose
's functionality in a simpler package.
For most features that it does implement, the end-user API is
identical to Moose
, meaning you can switch from Moo
to
Moose
quite easily.
Moo
does not implement most of Moose
's introspection API, so it's
often faster when loading your modules. Additionally, none of its
dependencies require XS, so it can be installed on machines without a
compiler.
One of Moo
's most compelling features is its interoperability with
Moose
. When someone tries to use Moose
's introspection API on a
Moo
class or role, it is transparently inflated into a Moose
class or role. This makes it easier to incorporate Moo
-using code
into a Moose
code base and vice versa.
For example, a Moose
class can subclass a Moo
class using
extends
or consume a Moo
role using with
.
The Moose
authors hope that one day Moo
can be made obsolete by
improving Moose
enough, but for now it provides a worthwhile
alternative to Moose
.
Class::Accessor is the polar opposite of Moose
. It provides very
few features and is not self-hosting.
It is, however, very simple, pure Perl, and it has no non-core dependencies. It also provides a "Moose-like" API on demand for the features it supports.
Even though it doesn't do much, it is still preferable to writing your own classes from scratch.
Here's our File
class with Class::Accessor
:
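A sketch using the Moose-like antlers mode (attribute names as in the earlier examples):
- package File;
- use Class::Accessor 'antlers';
- has path => ( is => 'ro' );
- has content => ( is => 'ro' );
- has last_mod_time => ( is => 'ro' );
- sub print_info {
- my $self = shift;
- print "The file is at ", $self->path, "\n";
- }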
The antlers
import flag tells Class::Accessor
that you want to
define your attributes using Moose
-like syntax. The only parameter
that you can pass to has
is is
. We recommend that you use this
Moose-like syntax if you choose Class::Accessor
since it means you
will have a smoother upgrade path if you later decide to move to
Moose
.
Like Moose
, Class::Accessor
generates accessor methods and a
constructor for your class.
Finally, we have Object::Tiny. This module truly lives up to its name. It has an incredibly minimal API and absolutely no dependencies (core or not). Still, we think it's a lot easier to use than writing your own OO code from scratch.
Here's our File
class once more:
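A sketch (attribute names as in the earlier examples; Object::Tiny takes the list of accessors to generate as import arguments):
- package File;
- use Object::Tiny qw( path content last_mod_time );
- sub print_info {
- my $self = shift;
- print "The file is at ", $self->path, "\n";
- }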
That's it!
With Object::Tiny
, all accessors are read-only. It generates a
constructor for you, as well as the accessors you define.
As we mentioned before, roles provide an alternative to inheritance, but Perl does not have any built-in role support. If you choose to use Moose, it comes with a full-fledged role implementation. However, if you use one of our other recommended OO modules, you can still use roles with Role::Tiny.
Role::Tiny
provides some of the same features as Moose's role
system, but in a much smaller package. Most notably, it doesn't support
any sort of attribute declaration, so you have to do that by hand.
Still, it's useful, and works well with Class::Accessor
and
Object::Tiny.
Here's a brief recap of the options we covered:
Moose
is the maximal option. It has a lot of features, a big
ecosystem, and a thriving user base. We also covered Moo briefly.
Moo
is Moose
lite, and a reasonable alternative when Moose
doesn't work for your application.
Class::Accessor
does a lot less than Moose
, and is a nice
alternative if you find Moose
overwhelming. It's been around a long
time and is well battle-tested. It also has a minimal Moose
compatibility mode which makes moving from Class::Accessor
to
Moose
easy.
Object::Tiny
is the absolute minimal option. It has no dependencies,
and almost no syntax to learn. It's a good option for a super minimal
environment and for throwing something together quickly without having
to worry about details.
Use Role::Tiny
with Class::Accessor
or Object::Tiny
if you
find yourself considering multiple inheritance. If you go with
Moose
, it comes with its own role implementation.
There are literally dozens of other OO-related modules on CPAN besides those covered here, and you're likely to run across one or more of them if you work with other people's code.
In addition, plenty of code in the wild does all of its OO "by hand", using just the Perl built-in OO features. If you need to maintain such code, you should read perlobj to understand exactly how Perl's built-in OO works.
As we said before, Perl's minimal OO system has led to a profusion of OO systems on CPAN. While you can still drop down to the bare metal and write your classes by hand, there's really no reason to do that with modern Perl.
For small systems, Object::Tiny and Class::Accessor both provide minimal object systems that take care of basic boilerplate for you.
For bigger projects, Moose provides a rich set of features that will let you focus on implementing your business logic.
We encourage you to play with and evaluate Moose, Class::Accessor, and Object::Tiny to see which OO system is right for you.
perlop - Perl operators and precedence
Operator precedence and associativity work in Perl more or less like they do in mathematics.
Operator precedence means some operators are evaluated before
others. For example, in 2 + 4 * 5
, the multiplication has higher
precedence so 4 * 5
is evaluated first yielding 2 + 20 ==
22
and not 6 * 5 == 30
.
Operator associativity defines what happens if a sequence of the
same operators is used one after another: whether the evaluator will
evaluate the left operations first or the right. For example, in 8
- 4 - 2
, subtraction is left associative so Perl evaluates the
expression left to right. 8 - 4
is evaluated first making the
expression 4 - 2 == 2
and not 8 - 2 == 6
.
Perl operators have the following associativity and precedence, listed from highest precedence to lowest. Operators borrowed from C keep the same precedence relationship with each other, even where C's precedence is slightly screwy. (This makes learning Perl easier for C folks.) With very few exceptions, these all operate on scalar values only, not array values.
- left terms and list operators (leftward)
- left ->
- nonassoc ++ --
- right **
- right ! ~ \ and unary + and -
- left =~ !~
- left * / % x
- left + - .
- left << >>
- nonassoc named unary operators
- nonassoc < > <= >= lt gt le ge
- nonassoc == != <=> eq ne cmp ~~
- left &
- left | ^
- left &&
- left || //
- nonassoc .. ...
- right ?:
- right = += -= *= etc. goto last next redo dump
- left , =>
- nonassoc list operators (rightward)
- right not
- left and
- left or xor
In the following sections, these operators are covered in precedence order.
Many operators can be overloaded for objects. See overload.
A TERM has the highest precedence in Perl. They include variables, quote and quote-like operators, any expression in parentheses, and any function whose arguments are parenthesized. Actually, there aren't really functions in this sense, just list operators and unary operators behaving as functions because you put parentheses around the arguments. These are all documented in perlfunc.
If any list operator (print(), etc.) or any unary operator (chdir(), etc.) is followed by a left parenthesis as the next token, the operator and arguments within parentheses are taken to be of highest precedence, just like a normal function call.
In the absence of parentheses, the precedence of list operators such as
print, sort, or chmod is either very high or very low depending on
whether you are looking at the left side or the right side of the operator.
For example, in
- @ary = (1, 3, sort 4, 2);
- print @ary; # prints 1324
the commas on the right of the sort are evaluated before the sort, but the commas on the left are evaluated after. In other words, list operators tend to gobble up all arguments that follow, and then act like a simple TERM with regard to the preceding expression. Be careful with parentheses:
- # These evaluate exit before doing the print:
- print($foo, exit); # Obviously not what you want.
- print $foo, exit; # Nor this.
- # These do the print before evaluating exit:
- (print $foo), exit; # This is what you want.
- print($foo), exit; # Or this.
- print ($foo), exit; # Or even this.
Also note that
- print ($foo & 255) + 1, "\n";
probably doesn't do what you expect at first glance. The parentheses
enclose the argument list for print which is evaluated (printing
the result of $foo & 255
). Then one is added to the return value
of print (usually 1). The result is something like this:
- 1 + 1, "\n"; # Obviously not what you meant.
To do what you meant properly, you must write:
- print(($foo & 255) + 1, "\n");
See Named Unary Operators for more discussion of this.
Also parsed as terms are the do {}
and eval {}
constructs, as
well as subroutine and method calls, and the anonymous
constructors []
and {}
.
See also Quote and Quote-like Operators toward the end of this section, as well as I/O Operators.
"->
" is an infix dereference operator, just as it is in C
and C++. If the right side is either a [...]
, {...}
, or a
(...)
subscript, then the left side must be either a hard or
symbolic reference to an array, a hash, or a subroutine respectively.
(Or technically speaking, a location capable of holding a hard
reference, if it's an array or hash reference being used for
assignment.) See perlreftut and perlref.
Otherwise, the right side is a method name or a simple scalar variable containing either the method name or a subroutine reference, and the left side must be either an object (a blessed reference) or a class name (that is, a package name). See perlobj.
"++" and "--" work as in C. That is, if placed before a variable, they increment or decrement the variable by one before returning the value, and if placed after, increment or decrement after returning the value.
Note that just as in C, Perl doesn't define when the variable is incremented or decremented. You just know it will be done sometime before or after the value is returned. This also means that modifying a variable twice in the same statement will lead to undefined behavior. Avoid statements like:
- $i = $i ++;
- print ++ $i + $i ++;
Perl will not guarantee what the result of the above statements is.
The auto-increment operator has a little extra builtin magic to it. If you increment a variable that is numeric, or that has ever been used in a numeric context, you get a normal increment. If, however, the variable has been used in only string contexts since it was set, has a value that is not the empty string, and matches the pattern /^[a-zA-Z]*[0-9]*\z/, the increment is done as a string, preserving each character within its range, with carry:
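- print ++($foo = "99");      # prints "100"
- print ++($foo = "a0");      # prints "a1"
- print ++($foo = "Aa");      # prints "Ab"
- print ++($foo = "zz");      # prints "aaa"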
undef is always treated as numeric, and in particular is changed to 0 before incrementing (so that a post-increment of an undef value will return 0 rather than undef).
The auto-decrement operator is not magical.
Binary "**" is the exponentiation operator. It binds even more tightly than unary minus, so -2**4 is -(2**4), not (-2)**4. (This is implemented using C's pow(3) function, which actually works on doubles internally.)
Unary "!" performs logical negation, that is, "not". See also not
for a lower
precedence version of this.
Unary "-" performs arithmetic negation if the operand is numeric, including any string that looks like a number. If the operand is an identifier, a string consisting of a minus sign concatenated with the identifier is returned. Otherwise, if the string starts with a plus or minus, a string starting with the opposite sign is returned. One effect of these rules is that -bareword is equivalent to the string "-bareword". If, however, the string begins with a non-alphabetic character (excluding "+" or "-"), Perl will attempt to convert the string to a numeric and the arithmetic negation is performed. If the string cannot be cleanly converted to a numeric, Perl will give the warning Argument "the string" isn't numeric in negation (-) at ....
Unary "~" performs bitwise negation, that is, 1's complement. For
example, 0666 & ~027
is 0640. (See also Integer Arithmetic and
Bitwise String Operators.) Note that the width of the result is
platform-dependent: ~0 is 32 bits wide on a 32-bit platform, but 64
bits wide on a 64-bit platform, so if you are expecting a certain bit
width, remember to use the "&" operator to mask off the excess bits.
When complementing strings, if all characters have ordinal values under
256, then their complements will, also. But if they do not, all
characters will be in either 32- or 64-bit complements, depending on your
architecture. So for example, ~"\x{3B1}"
is "\x{FFFF_FC4E}"
on
32-bit machines and "\x{FFFF_FFFF_FFFF_FC4E}"
on 64-bit machines.
Unary "+" has no effect whatsoever, even on strings. It is useful syntactically for separating a function name from a parenthesized expression that would otherwise be interpreted as the complete list of function arguments. (See examples above under Terms and List Operators (Leftward).)
Unary "\" creates a reference to whatever follows it. See perlreftut and perlref. Do not confuse this behavior with the behavior of backslash within a string, although both forms do convey the notion of protecting the next thing from interpolation.
Binary "=~" binds a scalar expression to a pattern match. Certain operations
search or modify the string $_ by default. This operator makes that kind
of operation work on some other string. The right argument is a search
pattern, substitution, or transliteration. The left argument is what is
supposed to be searched, substituted, or transliterated instead of the default
$_. When used in scalar context, the return value generally indicates the
success of the operation. The exceptions are substitution (s///)
and transliteration (y///) with the /r
(non-destructive) option,
which cause the return value to be the result of the substitution.
Behavior in list context depends on the particular operator.
See Regexp Quote-Like Operators for details and perlretut for
examples using these operators.
If the right argument is an expression rather than a search pattern, substitution, or transliteration, it is interpreted as a search pattern at run time. Note that this means that its contents will be interpolated twice, so
- '\\' =~ q'\\';
is not ok, as the regex engine will end up trying to compile the pattern \, which it will consider a syntax error.
Binary "!~" is just like "=~" except the return value is negated in the logical sense.
Binary "!~" with a non-destructive substitution (s///r) or transliteration (y///r) is a syntax error.
Binary "*" multiplies two numbers.
Binary "/" divides two numbers.
Binary "%" is the modulo operator, which computes the division remainder of its first argument with respect to its second argument. Given integer operands $a and $b: if $b is positive, then $a % $b is $a minus the largest multiple of $b less than or equal to $a. If $b is negative, then $a % $b is $a minus the smallest multiple of $b that is not less than $a (that is, the result will be less than or equal to zero). If the operands $a and $b are floating point values and the absolute value of $b (that is, abs($b)) is less than (UV_MAX + 1), only the integer portion of $a and $b will be used in the operation (note: here UV_MAX means the maximum of the unsigned integer type). If the absolute value of the right operand (abs($b)) is greater than or equal to (UV_MAX + 1), "%" computes the floating-point remainder $r in the equation ($r = $a - $i*$b), where $i is a certain integer that makes $r have the same sign as the right operand $b (not as the left operand $a, as the C function fmod() does) and an absolute value less than that of $b.
Note that when use integer
is in scope, "%" gives you direct access
to the modulo operator as implemented by your C compiler. This
operator is not as well defined for negative operands, but it will
execute faster.
Binary "x" is the repetition operator. In scalar context or if the left
operand is not enclosed in parentheses, it returns a string consisting
of the left operand repeated the number of times specified by the right
operand. In list context, if the left operand is enclosed in
parentheses or is a list formed by qw/STRING/, it repeats the list.
If the right operand is zero or negative, it returns an empty string
or an empty list, depending on the context.
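- print '-' x 80;             # print row of dashes
- print "\t" x ($tab/8), ' ' x ($tab%8);  # tab over
- @ones = (1) x 80;           # a list of 80 1's
- @ones = (5) x @ones;        # set all elements to 5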
Binary "+" returns the sum of two numbers.
Binary "-" returns the difference of two numbers.
Binary "." concatenates two strings.
Binary "<<" returns the value of its left argument shifted left by the number of bits specified by the right argument. Arguments should be integers. (See also Integer Arithmetic.)
Binary ">>" returns the value of its left argument shifted right by the number of bits specified by the right argument. Arguments should be integers. (See also Integer Arithmetic.)
Note that both "<<" and ">>" in Perl are implemented directly using "<<" and ">>" in C. If use integer (see Integer Arithmetic) is in force then signed C integers are used, else unsigned C integers are used. Either way, the implementation isn't going to generate results larger than the size of the integer type Perl was built with (32 bits or 64 bits).
The result of overflowing the range of the integers is undefined
because it is undefined also in C. In other words, using 32-bit
integers, 1 << 32
is undefined. Shifting by a negative number
of bits is also undefined.
If you get tired of being subject to your platform's native integers,
the use bigint
pragma neatly sidesteps the issue altogether:
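- print 20 << 20;  # 20971520
- print 20 << 40;  # 5120 on 32-bit machines,
-                  # 21990232555520 on 64-bit machines
- use bigint;
- print 20 << 100; # 25353012004564588029934064107520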
The various named unary operators are treated as functions with one argument, with optional parentheses.
If any list operator (print(), etc.) or any unary operator (chdir(), etc.)
is followed by a left parenthesis as the next token, the operator and
arguments within parentheses are taken to be of highest precedence,
just like a normal function call. For example,
because named unary operators are higher precedence than ||:
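- chdir $foo    || die;       # (chdir $foo) || die
- chdir($foo)   || die;       # (chdir $foo) || die
- chdir ($foo)  || die;       # (chdir $foo) || die
- chdir +($foo) || die;       # (chdir $foo) || die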
but, because * is higher precedence than named operators:
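- chdir $foo * 20;    # chdir ($foo * 20)
- chdir($foo) * 20;   # (chdir $foo) * 20
- chdir ($foo) * 20;  # (chdir $foo) * 20
- chdir +($foo) * 20; # chdir ($foo * 20)
- rand 10 * 20;       # rand (10 * 20)
- rand(10) * 20;      # (rand 10) * 20
- rand (10) * 20;     # (rand 10) * 20
- rand +(10) * 20;    # rand (10 * 20)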
Regarding precedence, the filetest operators, like -f, -M, etc. are treated like named unary operators, but they don't follow this functional parenthesis rule. That means, for example, that -f($file).".bak" is equivalent to -f "$file.bak".
See also Terms and List Operators (Leftward).
Perl operators that return true or false generally return values that can be safely used as numbers. For example, the relational operators in this section and the equality operators in the next one return 1 for true and a special version of the defined empty string, "", which counts as a zero but is exempt from warnings about improper numeric conversions, just as "0 but true" is.
Binary "<" returns true if the left argument is numerically less than the right argument.
Binary ">" returns true if the left argument is numerically greater than the right argument.
Binary "<=" returns true if the left argument is numerically less than or equal to the right argument.
Binary ">=" returns true if the left argument is numerically greater than or equal to the right argument.
Binary "lt" returns true if the left argument is stringwise less than the right argument.
Binary "gt" returns true if the left argument is stringwise greater than the right argument.
Binary "le" returns true if the left argument is stringwise less than or equal to the right argument.
Binary "ge" returns true if the left argument is stringwise greater than or equal to the right argument.
Binary "==" returns true if the left argument is numerically equal to the right argument.
Binary "!=" returns true if the left argument is numerically not equal to the right argument.
Binary "<=>" returns -1, 0, or 1 depending on whether the left argument is numerically less than, equal to, or greater than the right argument. If your platform supports NaNs (not-a-numbers) as numeric values, using them with "<=>" returns undef. NaN is not "<", "==", ">", "<=" or ">=" anything (even NaN), so those 5 return false. NaN != NaN returns true, as does NaN != anything else. If your platform doesn't support NaNs then NaN is just a string with numeric value 0.
(Note that the bigint, bigrat, and bignum pragmas all support "NaN".)
Binary "eq" returns true if the left argument is stringwise equal to the right argument.
Binary "ne" returns true if the left argument is stringwise not equal to the right argument.
Binary "cmp" returns -1, 0, or 1 depending on whether the left argument is stringwise less than, equal to, or greater than the right argument.
Binary "~~" does a smartmatch between its arguments. Smart matching is described in the next section.
"lt", "le", "ge", "gt" and "cmp" use the collation (sort) order specified by the current locale if a legacy use locale (but not use locale ':not_characters') is in effect. See perllocale. Do not mix these with Unicode, only with legacy binary encodings. The standard Unicode::Collate and Unicode::Collate::Locale modules offer much more powerful solutions to collation issues.
First available in Perl 5.10.1 (the 5.10.0 version behaved differently), binary ~~ does a "smartmatch" between its arguments. This is mostly used implicitly in the when construct described in perlsyn, although not all when clauses call the smartmatch operator. Unique among all of Perl's operators, the smartmatch operator can recurse.
It is also unique in that all other Perl operators impose a context (usually string or numeric context) on their operands, autoconverting those operands to those imposed contexts. In contrast, smartmatch infers contexts from the actual types of its operands and uses that type information to select a suitable comparison mechanism.
The ~~ operator compares its operands "polymorphically", determining how to compare them according to their actual types (numeric, string, array, hash, etc.). Like the equality operators with which it shares the same precedence, ~~ returns 1 for true and "" for false. It is often best read aloud as "in", "inside of", or "is contained in", because the left operand is often looked for inside the right operand. That makes the order of the operands to the smartmatch operator often opposite that of the regular match operator. In other words, the "smaller" thing is usually placed in the left operand and the larger one in the right.
The behavior of a smartmatch depends on what type of things its arguments are, as determined by the following table. The first row of the table whose types apply determines the smartmatch behavior. Because what actually happens is mostly determined by the type of the second operand, the table is sorted on the right operand instead of on the left.
- Left Right Description and pseudocode
- ===============================================================
- Any undef check whether Any is undefined
- like: !defined Any
- Any Object invoke ~~ overloading on Object, or die
- Right operand is an ARRAY:
- Left Right Description and pseudocode
- ===============================================================
- ARRAY1 ARRAY2 recurse on paired elements of ARRAY1 and ARRAY2[2]
- like: (ARRAY1[0] ~~ ARRAY2[0])
- && (ARRAY1[1] ~~ ARRAY2[1]) && ...
- HASH ARRAY any ARRAY elements exist as HASH keys
- like: grep { exists HASH->{$_} } ARRAY
- Regexp ARRAY any ARRAY elements pattern match Regexp
- like: grep { /Regexp/ } ARRAY
- undef ARRAY undef in ARRAY
- like: grep { !defined } ARRAY
- Any ARRAY smartmatch each ARRAY element[3]
- like: grep { Any ~~ $_ } ARRAY
- Right operand is a HASH:
- Left Right Description and pseudocode
- ===============================================================
- HASH1 HASH2 all same keys in both HASHes
- like: keys HASH1 ==
- grep { exists HASH2->{$_} } keys HASH1
- ARRAY HASH any ARRAY elements exist as HASH keys
- like: grep { exists HASH->{$_} } ARRAY
- Regexp HASH any HASH keys pattern match Regexp
- like: grep { /Regexp/ } keys HASH
- undef HASH always false (undef can't be a key)
- like: 0 == 1
- Any HASH HASH key existence
- like: exists HASH->{Any}
- Right operand is CODE:
- Left Right Description and pseudocode
- ===============================================================
- ARRAY CODE sub returns true on all ARRAY elements[1]
- like: !grep { !CODE->($_) } ARRAY
- HASH CODE sub returns true on all HASH keys[1]
- like: !grep { !CODE->($_) } keys HASH
- Any CODE sub passed Any returns true
- like: CODE->(Any)
- Right operand is a Regexp:
- Left Right Description and pseudocode
- ===============================================================
- ARRAY Regexp any ARRAY elements match Regexp
- like: grep { /Regexp/ } ARRAY
- HASH Regexp any HASH keys match Regexp
- like: grep { /Regexp/ } keys HASH
- Any Regexp pattern match
- like: Any =~ /Regexp/
- Other:
- Left Right Description and pseudocode
- ===============================================================
- Object Any invoke ~~ overloading on Object,
- or fall back to...
- Any Num numeric equality
- like: Any == Num
- Num nummy[4] numeric equality
- like: Num == nummy
- undef Any check whether undefined
- like: !defined(Any)
- Any Any string equality
- like: Any eq Any
Notes:
The smartmatch implicitly dereferences any non-blessed hash or array
reference, so the HASH and ARRAY entries apply in those cases.
For blessed references, the Object entries apply. Smartmatches
involving hashes only consider hash keys, never hash values.
The "like" code entry is not always an exact rendition. For example, the smartmatch operator short-circuits whenever possible, but grep does not. Also, grep in scalar context returns the number of matches, but ~~ returns only true or false.
Unlike most operators, the smartmatch operator knows to treat undef
specially:
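- use v5.10.1;
- say "both undefined" if undef ~~ undef;  # true
- say "never printed"  if 1 ~~ undef;      # false: 1 is defined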
Each operand is considered in a modified scalar context, the modification being that array and hash variables are passed by reference to the operator, which implicitly dereferences them. Both elements of each pair are the same:
- use v5.10.1;
- my %hash = (red => 1, blue => 2, green => 3,
- orange => 4, yellow => 5, purple => 6,
- black => 7, grey => 8, white => 9);
- my @array = qw(red blue green);
- say "some array elements in hash keys" if @array ~~ %hash;
- say "some array elements in hash keys" if \@array ~~ \%hash;
- say "red in array" if "red" ~~ @array;
- say "red in array" if "red" ~~ \@array;
- say "some keys end in e" if /e$/ ~~ %hash;
- say "some keys end in e" if /e$/ ~~ \%hash;
Two arrays smartmatch if each element in the first array smartmatches (that is, is "in") the corresponding element in the second array, recursively.
Because the smartmatch operator recurses on nested arrays, this will still report that "red" is in the array.
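A sketch with an illustrative nested array:
- use v5.10.1;
- my @array = (1, [2, [3, "red"]]);
- say "red in array" if "red" ~~ @array;   # true, found by recursion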
If two arrays smartmatch each other, then they are deep copies of each other's values, as this example reports:
- use v5.12.0;
- my @a = (0, 1, 2, [3, [4, 5], 6], 7);
- my @b = (0, 1, 2, [3, [4, 5], 6], 7);
- if (@a ~~ @b && @b ~~ @a) {
- say "a and b are deep copies of each other";
- }
- elsif (@a ~~ @b) {
- say "a smartmatches in b";
- }
- elsif (@b ~~ @a) {
- say "b smartmatches in a";
- }
- else {
- say "a and b don't smartmatch each other at all";
- }
If you were to set $b[3] = 4, then instead of reporting that "a and b are deep copies of each other", it now reports that "b smartmatches in a". That's because the corresponding position in @a contains an array that (eventually) has a 4 in it.
Smartmatching one hash against another reports whether both contain the same keys, no more and no less. This could be used to see whether two records have the same field names, without caring what values those fields might have. For example:
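- use v5.10.1;
- # two illustrative records with identical field names
- my %rec1 = (name => "Ada", rank => "Capt", serial_num => 1234);
- my %rec2 = (name => "Bob", rank => "Lt",   serial_num => 5678);
- say "same field names" if %rec1 ~~ %rec2;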
or, if other non-required fields are allowed, use ARRAY ~~ HASH:
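- # illustrative record; ARRAY ~~ HASH is true if any
- # element of @required exists as a key of %rec
- my %rec = (name => "Ada", rank => "Capt", nickname => "Countess");
- my @required = qw(name rank serial_num);
- say "record has a required field" if @required ~~ %rec;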
The smartmatch operator is most often used as the implicit operator of a when clause. See the section on "Switch Statements" in perlsyn.
To avoid relying on an object's underlying representation, if the smartmatch's right operand is an object that doesn't overload ~~, it raises the exception "Smartmatching a non-overloaded object breaks encapsulation". That's because one has no business digging around to see whether something is "in" an object. These are all illegal on objects without a ~~ overload:
- %hash ~~ $object
- 42 ~~ $object
- "fred" ~~ $object
However, you can change the way an object is smartmatched by overloading the ~~ operator. This is allowed to extend the usual smartmatch semantics. For objects that do have a ~~ overload, see overload.
Using an object as the left operand is allowed, although not very useful.
Smartmatching rules take precedence over overloading, so even if the
object in the left operand has smartmatch overloading, this will be
ignored. A left operand that is a non-overloaded object falls back on a
string or numeric comparison of whatever the ref operator returns. That
means that
- $object ~~ X
does not invoke the overload method with X as an argument. Instead the above table is consulted as normal, and based on the type of X, overloading may or may not be invoked. For simple strings or numbers, "in" becomes equivalent to this:
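- $object ~~ "dave"    # like: "dave" eq $object
- $object ~~ 42        # like: 42 == $object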
For example, this reports that the handle smells IOish (but please don't really do this!):
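- use IO::Handle;
- my $fh = IO::Handle->new();
- if ($fh ~~ /\bIO\b/) {
-     say "handle smells IOish";
- }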
That's because it treats $fh as a string like "IO::Handle=GLOB(0x8039e0)", then pattern matches against that.
Binary "&" returns its operands ANDed together bit by bit. (See also Integer Arithmetic and Bitwise String Operators.)
Note that "&" has lower priority than relational operators, so for example the parentheses are essential in a test like
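- print "Even\n" if ($x & 1) == 0;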
Binary "|" returns its operands ORed together bit by bit. (See also Integer Arithmetic and Bitwise String Operators.)
Binary "^" returns its operands XORed together bit by bit. (See also Integer Arithmetic and Bitwise String Operators.)
Note that "|" and "^" have lower priority than relational operators, so for example the brackets are essential in a test like
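- print "false\n" if (8 | 2) != 10;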
Binary "&&" performs a short-circuit logical AND operation. That is, if the left operand is false, the right operand is not even evaluated. Scalar or list context propagates down to the right operand if it is evaluated.
Binary "||" performs a short-circuit logical OR operation. That is, if the left operand is true, the right operand is not even evaluated. Scalar or list context propagates down to the right operand if it is evaluated.
Although it has no direct equivalent in C, Perl's // operator is related to its C-style "or". In fact, it's exactly the same as ||, except that it tests the left hand side's definedness instead of its truth. Thus, EXPR1 // EXPR2 returns the value of EXPR1 if it's defined, otherwise, the value of EXPR2 is returned. (EXPR1 is evaluated in scalar context, EXPR2 in the context of // itself.) Usually, this is the same result as defined(EXPR1) ? EXPR1 : EXPR2 (except that the ternary-operator form can be used as an lvalue, while EXPR1 // EXPR2 cannot). This is very useful for providing default values for variables. If you actually want to test whether at least one of $a and $b is defined, use defined($a // $b).
The ||, //, and && operators return the last value evaluated (unlike C's || and &&, which return 0 or 1). Thus, a reasonably portable way to find out the home directory might be:
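- $home =  $ENV{HOME}
-       // $ENV{LOGDIR}
-       // (getpwuid($<))[7]
-       // die "You're homeless!\n";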
In particular, this means that you shouldn't use this for selecting between two aggregates for assignment:
- @a = @b || @c; # this is wrong
- @a = scalar(@b) || @c; # really meant this
- @a = @b ? @b : @c; # this works fine, though
As alternatives to && and || when used for control flow, Perl provides the and and or operators (see below). The short-circuit behavior is identical. The precedence of "and" and "or" is much lower, however, so that you can safely use them after a list operator without the need for parentheses:
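- unlink "alpha", "beta", "gamma"
-         or gripe(), next LINE;   # gripe() is some error-reporting sub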
With the C-style operators that would have been written like this:
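- unlink("alpha", "beta", "gamma")
-         || (gripe(), next LINE);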
It would be even more readable to write that this way:
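- unless(unlink("alpha", "beta", "gamma")) {
-     gripe();
-     next LINE;
- }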
Using "or" for assignment is unlikely to do what you want; see below.
Binary ".." is the range operator, which is really two different
operators depending on the context. In list context, it returns a
list of values counting (up by ones) from the left value to the right
value. If the left value is greater than the right value then it
returns the empty list. The range operator is useful for writing
foreach (1..10)
loops and for doing slice operations on arrays. In
the current implementation, no temporary array is created when the
range operator is used as the expression in foreach
loops, but older
versions of Perl might burn a lot of memory when you write something
like this:
- for (1 .. 1_000_000) {
- # code
- }
The range operator also works on strings, using the magical auto-increment, see below.
In scalar context, ".." returns a boolean value. The operator is bistable, like a flip-flop, and emulates the line-range (comma) operator of sed, awk, and various editors. Each ".." operator maintains its own boolean state, even across calls to a subroutine that contains it. It is false as long as its left operand is false. Once the left operand is true, the range operator stays true until the right operand is true, AFTER which the range operator becomes false again. It doesn't become false till the next time the range operator is evaluated. It can test the right operand and become false on the same evaluation it became true (as in awk), but it still returns true once. If you don't want it to test the right operand until the next evaluation, as in sed, just use three dots ("...") instead of two. In all other regards, "..." behaves just like ".." does.
The right operand is not evaluated while the operator is in the "false" state, and the left operand is not evaluated while the operator is in the "true" state. The precedence is a little lower than || and &&. The value returned is either the empty string for false, or a sequence number (beginning with 1) for true. The sequence number is reset for each range encountered. The final sequence number in a range has the string "E0" appended to it, which doesn't affect its numeric value, but gives you something to search for if you want to exclude the endpoint. You can exclude the beginning point by waiting for the sequence number to be greater than 1.
If either operand of scalar ".." is a constant expression, that operand is considered true if it is equal (==) to the current input line number (the $. variable).
To be pedantic, the comparison is actually int(EXPR) == int(EXPR), but that is only an issue if you use a floating point expression; when implicitly using $. as described in the previous paragraph, the comparison is int(EXPR) == int($.), which is only an issue when $. is set to a floating point value and you are not reading from a file. Furthermore, "span" .. "spat" or 2.18 .. 3.14 will not do what you want in scalar context because each of the operands is evaluated using its integer representation.
Examples:
As a scalar operator:
- if (101 .. 200) { print; } # print 2nd hundred lines, short for
- # if ($. == 101 .. $. == 200) { print; }
- next LINE if (1 .. /^$/); # skip header lines, short for
- # next LINE if ($. == 1 .. /^$/);
- # (typically in a loop labeled LINE)
- s/^/> / if (/^$/ .. eof()); # quote body
- # parse mail messages
- while (<>) {
- $in_header = 1 .. /^$/;
- $in_body = /^$/ .. eof;
- if ($in_header) {
- # do something
- } else { # in body
- # do something else
- }
- } continue {
- close ARGV if eof; # reset $. each file
- }
Here's a simple example to illustrate the difference between the two range operators:
- @lines = (" - Foo",
- "01 - Bar",
- "1 - Baz",
- " - Quux");
- foreach (@lines) {
- if (/0/ .. /1/) {
- print "$_\n";
- }
- }
This program will print only the line containing "Bar". If the range operator is changed to ..., it will also print the "Baz" line.
And now some examples as a list operator:
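- for (101 .. 200) { print }            # print $_ 100 times
- @foo = @foo[0 .. $#foo];              # an expensive no-op
- @foo = @foo[$#foo-4 .. $#foo];        # slice last 5 items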
The range operator (in list context) makes use of the magical auto-increment algorithm if the operands are strings. You can say
- @alphabet = ("A" .. "Z");
to get all normal letters of the English alphabet, or
- $hexdigit = (0 .. 9, "a" .. "f")[$num & 15];
to get a hexadecimal digit, or
- @z2 = ("01" .. "31");
- print $z2[$mday];
to get dates with leading zeros.
If the final value specified is not in the sequence that the magical increment would produce, the sequence goes until the next value would be longer than the final value specified.
If the initial value specified isn't part of a magical increment sequence (that is, a non-empty string matching /^[a-zA-Z]*[0-9]*\z/), only the initial value will be returned. So the following will only return an alpha:
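- use charnames "greek";
- my @greek_small = ("\N{alpha}" .. "\N{omega}");   # (just alpha)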
To get the 25 traditional lowercase Greek letters, including both sigmas, you could use this instead:
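- use charnames "greek";
- my @greek_small = map { chr } ( ord "\N{alpha}" ..
-                                 ord "\N{omega}" );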
However, because there are many other lowercase Greek characters than just those, to match lowercase Greek characters in a regular expression, you would use the pattern /(?:(?=\p{Greek})\p{Lower})+/.
Because each operand is evaluated in integer form, 2.18 .. 3.14 will return two elements in list context.
- @list = (2.18 .. 3.14); # same as @list = (2 .. 3);
Ternary "?:" is the conditional operator, just as in C. It works much like an if-then-else. If the argument before the ? is true, the argument before the : is returned, otherwise the argument after the : is returned. For example:
- printf "I have %d dog%s.\n", $n,
- ($n == 1) ? "" : "s";
Scalar or list context propagates downward into the 2nd or 3rd argument, whichever is selected.
- $a = $ok ? $b : $c; # get a scalar
- @a = $ok ? @b : @c; # get an array
- $a = $ok ? @b : @c; # oops, that's just a count!
The operator may be assigned to if both the 2nd and 3rd arguments are legal lvalues (meaning that you can assign to them):
- ($a_or_b ? $a : $b) = $c;
Because this operator produces an assignable result, using assignments without parentheses will get you in trouble. For example, this:
- $a % 2 ? $a += 10 : $a += 2
Really means this:
- (($a % 2) ? ($a += 10) : $a) += 2
Rather than this:
- ($a % 2) ? ($a += 10) : ($a += 2)
That should probably be written more simply as:
- $a += ($a % 2) ? 10 : 2;
"=" is the ordinary assignment operator.
Assignment operators work as in C. That is,
- $a += 2;
is equivalent to
- $a = $a + 2;
although without duplicating any side effects that dereferencing the lvalue might trigger, such as from tie(). Other assignment operators work similarly. The following are recognized:
- **= += *= &= <<= &&=
- -= /= |= >>= ||=
- .= %= ^= //=
- x=
Although these are grouped by family, they all have the precedence of assignment.
Unlike in C, the scalar assignment operator produces a valid lvalue. Modifying an assignment is equivalent to doing the assignment and then modifying the variable that was assigned to. This is useful for modifying a copy of something, like this:
- ($tmp = $global) =~ tr/13579/24680/;
Although as of 5.14, that can also be accomplished this way:
- use v5.14;
- $tmp = ($global =~ tr/13579/24680/r);
Likewise,
- ($a += 2) *= 3;
is equivalent to
- $a += 2;
- $a *= 3;
Similarly, a list assignment in list context produces the list of lvalues assigned to, and a list assignment in scalar context returns the number of elements produced by the expression on the right hand side of the assignment.
Binary "," is the comma operator. In scalar context it evaluates its left argument, throws that value away, then evaluates its right argument and returns that value. This is just like C's comma operator.
In list context, it's just the list argument separator, and inserts both its arguments into the list. These arguments are also evaluated from left to right.
The => operator is a synonym for the comma except that it causes a word on its left to be interpreted as a string if it begins with a letter or underscore and is composed only of letters, digits and underscores. This includes operands that might otherwise be interpreted as operators, constants, single number v-strings or function calls. If in doubt about this behavior, the left operand can be quoted explicitly.
Otherwise, the => operator behaves exactly as the comma operator or list argument separator, according to context.
For example:
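- use constant FOO => "something";
- my %h = ( FOO => 23 );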
is equivalent to:
- my %h = ("FOO", 23);
It is NOT:
- my %h = ("something", 23);
The =>
operator is helpful in documenting the correspondence
between keys and values in hashes, and other paired elements in lists.
- %hash = ( $key => $value );
- login( $username => $password );
The special quoting behavior ignores precedence, and hence may apply to part of the left operand:
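- print time.shift => "bbb";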
That example prints something like "1314363215shiftbbb", because the => implicitly quotes the shift immediately on its left, ignoring the fact that time.shift is the entire left operand.
On the right side of a list operator, the comma has very low precedence, such that it controls all comma-separated expressions found there. The only operators with lower precedence are the logical operators "and", "or", and "not", which may be used to evaluate calls to list operators without the need for parentheses:
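- open HANDLE, "<", "filename"
-     or die "Can't open: $!\n";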
However, some people find that code harder to read than writing it with parentheses:
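- open(HANDLE, "<", "filename")
-     or die "Can't open: $!\n";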
in which case you might as well just use the more customary "||" operator:
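- open(HANDLE, "<", "filename")
-     || die "Can't open: $!\n";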
See also discussion of list operators in Terms and List Operators (Leftward).
Unary "not" returns the logical negation of the expression to its right. It's the equivalent of "!" except for the very low precedence.
Binary "and" returns the logical conjunction of the two surrounding
expressions. It's equivalent to && except for the very low
precedence. This means that it short-circuits: the right
expression is evaluated only if the left expression is true.
Binary "or" returns the logical disjunction of the two surrounding
expressions. It's equivalent to || except for the very low precedence.
This makes it useful for control flow:
- print FH $data or die "Can't write to FH: $!";
This means that it short-circuits: the right expression is evaluated
only if the left expression is false. Due to its precedence, you must
be careful to avoid using it as replacement for the || operator.
It usually works out better for flow control than in assignments:
- $a = $b or $c; # bug: this is wrong
- ($a = $b) or $c; # really means this
- $a = $b || $c; # better written this way
However, when it's a list-context assignment and you're trying to use
|| for control flow, you probably need "or" so that the assignment
takes higher precedence.
Then again, you could always use parentheses.
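A minimal sketch of the list-context assignment case described above (the helper get_results is hypothetical):

```perl
use strict;
use warnings;

# Hypothetical helper returning an empty list on "failure".
sub get_results { return () }

# "or" is looser than "=", so the whole list assignment runs first;
# in boolean context its value is the number of elements assigned --
# 0 here, so the right-hand side fires. With "||" the assignment's
# right side would be parsed as get_results() || ..., which is wrong.
my $warned = 0;
my @results = get_results() or $warned = 1;

print "warned=$warned elements=", scalar(@results), "\n";
# prints "warned=1 elements=0"
```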
Binary xor
returns the exclusive-OR of the two surrounding expressions.
It cannot short-circuit (of course).
There is no low precedence operator for defined-OR.
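A short sketch of xor and of the high-precedence defined-or operator // mentioned above:

```perl
use strict;
use warnings;

# "xor" evaluates both operands (it cannot short-circuit) and
# returns the logical exclusive-OR.
my $one_true  = (1 xor 0);   # true
my $both_true = (1 xor 1);   # false

# There is no low-precedence spelling of defined-or; it exists only
# as the high-precedence "//" operator.
my $setting = undef;
my $value   = $setting // "default";

print "value=$value\n";   # prints "value=default"
```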
Here is what C has that Perl doesn't:
Address-of operator. (But see the "\" operator for taking a reference.)
Dereference-address operator. (Perl's prefix dereferencing operators are typed: $, @, %, and &.)
Type-casting operator.
While we usually think of quotes as literal values, in Perl they
function as operators, providing various kinds of interpolating and
pattern matching capabilities. Perl provides customary quote characters
for these behaviors, but also provides a way for you to choose your
quote character for any of them. In the following table, a {}
represents
any pair of delimiters you choose.
- Customary Generic Meaning Interpolates
- '' q{} Literal no
- "" qq{} Literal yes
- `` qx{} Command yes*
- qw{} Word list no
- // m{} Pattern match yes*
- qr{} Pattern yes*
- s{}{} Substitution yes*
- tr{}{} Transliteration no (but see below)
- y{}{} Transliteration no (but see below)
- <<EOF here-doc yes*
- * unless the delimiter is ''.
Non-bracketing delimiters use the same character fore and aft, but the four sorts of ASCII brackets (round, angle, square, curly) all nest, which means that
- q{foo{bar}baz}
is the same as
- 'foo{bar}baz'
Note, however, that this does not always work for quoting Perl code:
- $s = q{ if($a eq "}") ... }; # WRONG
is a syntax error. The Text::Balanced
module (standard as of v5.8,
and from CPAN before then) is able to do this properly.
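The nesting behavior of bracketing delimiters can be checked directly; a small sketch:

```perl
use strict;
use warnings;

# Bracketing delimiters nest, so the inner {bar} needs no escaping.
my $nested = q{foo{bar}baz};

# With a non-bracketing delimiter, braces are ordinary characters.
my $plain = q!foo{bar}baz!;

print "same\n" if $nested eq $plain;   # prints "same"
```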
There can be whitespace between the operator and the quoting
characters, except when #
is being used as the quoting character.
q#foo# is parsed as the string foo,
while q #foo# is the
operator q followed by a comment. Its argument will be taken
from the next line. This allows you to write:
- s {foo} # Replace foo
- {bar} # with bar.
The following escape sequences are available in constructs that interpolate, and in transliterations:
- Sequence Note Description
- \t tab (HT, TAB)
- \n newline (NL)
- \r return (CR)
- \f form feed (FF)
- \b backspace (BS)
- \a alarm (bell) (BEL)
- \e escape (ESC)
- \x{263A} [1,8] hex char (example: SMILEY)
- \x1b [2,8] restricted range hex char (example: ESC)
- \N{name} [3] named Unicode character or character sequence
- \N{U+263D} [4,8] Unicode character (example: FIRST QUARTER MOON)
- \c[ [5] control char (example: chr(27))
- \o{23072} [6,8] octal char (example: SMILEY)
- \033 [7,8] restricted range octal char (example: ESC)
The result is the character specified by the hexadecimal number between the braces. See [8] below for details on which character.
Only hexadecimal digits are valid between the braces. If an invalid character is encountered, a warning will be issued and the invalid character and all subsequent characters (valid or invalid) within the braces will be discarded.
If there are no valid digits between the braces, the generated character is
the NULL character (\x{00}).
However, an explicit empty brace (\x{})
will not cause a warning (currently).
The result is the character specified by the hexadecimal number in the range 0x00 to 0xFF. See [8] below for details on which character.
Only hexadecimal digits are valid following \x.
When \x
is followed
by fewer than two valid digits, any valid digits will be zero-padded. This
means that \x7
will be interpreted as \x07
, and a lone \x will be
interpreted as \x00
. Except at the end of a string, having fewer than
two valid digits will result in a warning. Note that although the warning
says the illegal character is ignored, it is only ignored as part of the
escape and will still be used as the subsequent character in the string.
For example:
- Original Result Warns?
- "\x7" "\x07" no
- "\x" "\x00" no
- "\x7q" "\x07q" yes
- "\xq" "\x00q" yes
The result is the Unicode character or character sequence given by name. See charnames.
\N{U+hexadecimal number} means the Unicode character whose Unicode code
point is hexadecimal number.
The character following \c
is mapped to some other character as shown in the
table:
- Sequence Value
- \c@ chr(0)
- \cA or \ca chr(1)
- \cB or \cb chr(2)
- ... ...
- \cZ or \cz chr(26)
- \c[ chr(27)
- \c] chr(29)
- \c^ chr(30)
- \c_ chr(31)
- \c? chr(127)
In other words, it's the character whose code point has had 64 xor'd with
its uppercase. \c?
is DELETE because ord("?") ^ 64
is 127, and
\c@
is NULL because the ord of "@" is 64, so xor'ing 64 itself produces 0.
Also, \c\X yields chr(28) . "X" for any X, but cannot come at the
end of a string, because the backslash would be parsed as escaping the end
quote.
On ASCII platforms, the resulting characters from the list above are the complete set of ASCII controls. This isn't the case on EBCDIC platforms; see OPERATOR DIFFERENCES in perlebcdic for the complete list of what these sequences mean on both ASCII and EBCDIC platforms.
Use of any other character following the "c" besides those listed above is discouraged, and some are deprecated with the intention of removing those in a later Perl version. What happens for any of these other characters currently though, is that the value is derived by xor'ing with the seventh bit, which is 64.
To get platform independent controls, you can use \N{...}.
The result is the character specified by the octal number between the braces. See [8] below for details on which character.
If a character that isn't an octal digit is encountered, a warning is raised, and the value is based on the octal digits before it, discarding it and all following characters up to the closing brace. It is a fatal error if there are no octal digits at all.
The result is the character specified by the three-digit octal number in the range 000 to 777 (but best to not use above 077, see next paragraph). See [8] below for details on which character.
Some contexts allow 2 or even 1 digit, but any usage without exactly
three digits, the first being a zero, may give unintended results. (For
example, in a regular expression it may be confused with a backreference;
see Octal escapes in perlrebackslash.) Starting in Perl 5.14, you may
use \o{}
instead, which avoids all these problems. Otherwise, it is best to
use this construct only for ordinals \077
and below, remembering to pad to
the left with zeros to make three digits. For larger ordinals, either use
\o{}
, or convert to something else, such as to hex and use \x{}
instead.
Having fewer than 3 digits may lead to a misleading warning message that says
that what follows is ignored. For example, "\128"
in the ASCII character set
is equivalent to the two characters "\n8"
, but the warning Illegal octal
digit '8' ignored
will be thrown. If "\n8"
is what you want, you can
avoid this warning by padding your octal number with 0's: "\0128".
Several constructs above specify a character by a number. That number
gives the character's position in the character set encoding (indexed from 0).
This is called synonymously its ordinal, code position, or code point. Perl
works on platforms that have a native encoding currently of either ASCII/Latin1
or EBCDIC, each of which allow specification of 256 characters. In general, if
the number is 255 (0xFF, 0377) or below, Perl interprets this in the platform's
native encoding. If the number is 256 (0x100, 0400) or above, Perl interprets
it as a Unicode code point and the result is the corresponding Unicode
character. For example \x{50}
and \o{120}
both are the number 80 in
decimal, which is less than 256, so the number is interpreted in the native
character set encoding. In ASCII the character in the 80th position (indexed
from 0) is the letter "P", and in EBCDIC it is the ampersand symbol "&".
\x{100}
and \o{400}
are both 256 in decimal, so the number is interpreted
as a Unicode code point no matter what the native encoding is. The name of the
character in the 256th position (indexed by 0) in Unicode is
LATIN CAPITAL LETTER A WITH MACRON.
There are a couple of exceptions to the above rule. \N{U+hex number} is
always interpreted as a Unicode code point, so that \N{U+0050}
is "P" even
on EBCDIC platforms. And if use encoding is in effect, the
number is considered to be in that encoding, and is translated from that into
the platform's native encoding if there is a corresponding native character;
otherwise to Unicode.
NOTE: Unlike C and other languages, Perl has no \v
escape sequence for
the vertical tab (VT, which is 11 in both ASCII and EBCDIC), but you may
use \ck or \x0b.
(\v does have meaning in regular expression patterns in Perl, see perlre.)
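On an ASCII platform, several of the escapes discussed above spell the same characters; a quick sketch:

```perl
use strict;
use warnings;

# ESC (code point 27) four ways: hex, braced hex, octal, control.
my @esc = ("\x1b", "\x{1b}", "\033", "\c[");
die "mismatch" if grep { $_ ne chr(27) } @esc;

# SMILEY (U+263A) in braced hex and braced octal: octal 23072
# is the same number as hex 263A. (\o{} requires Perl 5.14+.)
die "mismatch" unless "\x{263A}" eq "\o{23072}";

print "all equal\n";   # prints "all equal"
```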
The following escape sequences are available in constructs that interpolate, but not in transliterations.
- \l lowercase next character only
- \u titlecase (not uppercase!) next character only
- \L lowercase all characters till \E or end of string
- \U uppercase all characters till \E or end of string
- \F foldcase all characters till \E or end of string
- \Q quote (disable) pattern metacharacters till \E or
- end of string
- \E end either case modification or quoted section
- (whichever was last seen)
See quotemeta for the exact definition of characters that
are quoted by \Q.
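A sketch of \Q in action: the + in the interpolated variable is matched literally.

```perl
use strict;
use warnings;

my $needle = "1+1";

# Without \Q, the "+" acts as a quantifier: /1+1=2/ also matches
# "111=2" (one-or-more 1s, then "1=2").
die unless "111=2" =~ /$needle=2/;

# With \Q...\E the metacharacters are disabled, so only the literal
# string "1+1=2" matches.
die unless "1+1=2" =~ /\Q$needle\E=2/;
die if     "111=2" =~ /\Q$needle\E=2/;

print "quoted ok\n";
```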
\L
, \U
, \F
, and \Q
can stack, in which case you need one
\E
for each. For example:
- say "This \Qquoting \ubusiness \Uhere isn't quite\E done yet,\E is it?";
- This quoting\ Business\ HERE\ ISN\'T\ QUITE\ done\ yet\, is it?
If use locale
is in effect (but not use locale ':not_characters'),
the case map used by \l
, \L
,
\u
, and \U
is taken from the current locale. See perllocale.
If Unicode (for example, \N{}
or code points of 0x100 or
beyond) is being used, the case map used by \l
, \L
, \u
, and
\U
is as defined by Unicode. That means that case-mapping
a single character can sometimes produce several characters.
Under use locale, \F produces the same results as \L.
All systems use the virtual "\n"
to represent a line terminator,
called a "newline". There is no such thing as an unvarying, physical
newline character. It is only an illusion that the operating system,
device drivers, C libraries, and Perl all conspire to preserve. Not all
systems read "\r"
as ASCII CR and "\n"
as ASCII LF. For example,
on the ancient Macs (pre-MacOS X) of yesteryear, these used to be reversed,
and on systems without a line terminator,
printing "\n"
might emit no actual data. In general, use "\n"
when
you mean a "newline" for your system, but use the literal ASCII when you
need an exact character. For example, most networking protocols expect
and prefer a CR+LF ("\015\012"
or "\cM\cJ"
) for line terminators,
and although they often accept just "\012"
, they seldom tolerate just
"\015"
. If you get in the habit of using "\n"
for networking,
you may be burned some day.
For constructs that do interpolate, variables beginning with "$"
or "@" are interpolated. Subscripted variables such as $a[3]
or
$href->{key}[0]
are also interpolated, as are array and hash slices.
But method calls such as $obj->meth
are not.
Interpolating an array or slice interpolates the elements in order,
separated by the value of $"
, so is equivalent to interpolating
join $", @array
. "Punctuation" arrays such as @*
are usually
interpolated only if the name is enclosed in braces @{*}, but the
arrays @_
, @+
, and @-
are interpolated even without braces.
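The join-with-$" behavior described above can be sketched like this:

```perl
use strict;
use warnings;

my @array = (1, 2, 3);

# Interpolating an array joins its elements with $" (default: space).
die unless "@array" eq "1 2 3";
die unless "@array" eq join($", @array);

# Changing $" changes the separator used in interpolation.
{
    local $" = ",";
    die unless "@array" eq "1,2,3";
}

print "interpolation ok\n";
```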
For double-quoted strings, the quoting from \Q
is applied after
interpolation and escapes are processed.
- "abc\Qfoo\tbar$s\Exyz"
is equivalent to
- "abc" . quotemeta("foo\tbar$s") . "xyz"
For the pattern of regex operators (qr//, m// and s///),
the quoting from \Q
is applied after interpolation is processed,
but before escapes are processed. This allows the pattern to match
literally (except for $
and @
). For example, the following matches:
- '\s\t' =~ /\Q\s\t/
Because $
or @
trigger interpolation, you'll need to use something
like /\Quser\E\@\Qhost/
to match them literally.
Patterns are subject to an additional level of interpretation as a
regular expression. This is done as a second pass, after variables are
interpolated, so that regular expressions may be incorporated into the
pattern from the variables. If this is not what you want, use \Q
to
interpolate a variable literally.
Apart from the behavior described above, Perl does not expand multiple levels of interpolation. In particular, contrary to the expectations of shell programmers, back-quotes do NOT interpolate within double quotes, nor do single quotes impede evaluation of variables when used within double quotes.
Here are the quote-like operators that apply to pattern matching and related activities.
This operator quotes (and possibly compiles) its STRING as a regular
expression. STRING is interpolated the same way as PATTERN
in m/PATTERN/. If "'" is used as the delimiter, no interpolation
is done. Returns a Perl value which may be used instead of the
corresponding /STRING/msixpodual
expression. The returned value is a
normalized version of the original pattern. It magically differs from
a string containing the same characters: ref(qr/x/) returns "Regexp";
however, dereferencing it is not well defined (you currently get the
normalized version of the original pattern, but this may change).
For example,
- $rex = qr/my.STRING/is;
- print $rex; # prints (?si-xm:my.STRING)
- s/$rex/foo/;
is equivalent to
- s/my.STRING/foo/is;
The result may be used as a subpattern in a match:
- $re = qr/$pattern/;
- $string =~ /foo${re}bar/; # can be interpolated in other
- # patterns
- $string =~ $re; # or used standalone
- $string =~ /$re/; # or this way
Since Perl may compile the pattern at the moment of execution of the qr() operator, using qr() may have speed advantages in some situations, notably if the result of qr() is used standalone:
- sub match {
- my $patterns = shift;
- my @compiled = map qr/$_/i, @$patterns;
- grep {
- my $success = 0;
- foreach my $pat (@compiled) {
- $success = 1, last if /$pat/;
- }
- $success;
- } @_;
- }
Precompilation of the pattern into an internal representation at
the moment of qr() avoids a need to recompile the pattern every
time a match /$pat/
is attempted. (Perl has many other internal
optimizations, but none would be triggered in the above example if
we did not use qr() operator.)
Options (specified by the following modifiers) are:
- m Treat string as multiple lines.
- s Treat string as single line. (Make . match a newline)
- i Do case-insensitive pattern matching.
- x Use extended regular expressions.
- p When matching preserve a copy of the matched string so
- that ${^PREMATCH}, ${^MATCH}, ${^POSTMATCH} will be
- defined.
- o Compile pattern only once.
- a ASCII-restrict: Use ASCII for \d, \s, \w; specifying two
- a's further restricts /i matching so that no ASCII
- character will match a non-ASCII one.
- l Use the locale.
- u Use Unicode rules.
- d Use Unicode or native charset, as in 5.12 and earlier.
If a precompiled pattern is embedded in a larger pattern then the effect of "msixpluad" will be propagated appropriately. The effect the "o" modifier has is not propagated, being restricted to those patterns explicitly using it.
The last four modifiers listed above, added in Perl 5.14,
control the character set semantics, but /a
is the only one you are likely
to want to specify explicitly; the other three are selected
automatically by various pragmas.
See perlre for additional information on valid syntax for STRING, and
for a detailed look at the semantics of regular expressions. In
particular, all modifiers except the largely obsolete /o are further
explained in Modifiers in perlre. /o is described in the next section.
Searches a string for a pattern match, and in scalar context returns
true if it succeeds, false if it fails. If no string is specified
via the =~
or !~
operator, the $_ string is searched. (The
string specified with =~
need not be an lvalue--it may be the
result of an expression evaluation, but remember the =~
binds
rather tightly.) See also perlre.
Options are as described in qr// above; in addition, the following match
process modifiers are available:
If "/" is the delimiter then the initial m is optional. With the m
you can use any pair of non-whitespace (ASCII) characters
as delimiters. This is particularly useful for matching path names
that contain "/", to avoid LTS (leaning toothpick syndrome). If "?" is
the delimiter, then a match-only-once rule applies,
described in m?PATTERN? below.
If "'" is the delimiter, no interpolation is performed on the PATTERN.
When using a character valid in an identifier, whitespace is required
after the m.
PATTERN may contain variables, which will be interpolated
every time the pattern search is evaluated, except
for when the delimiter is a single quote. (Note that $(
, $)
, and
$|
are not interpolated because they look like end-of-string tests.)
Perl will not recompile the pattern unless an interpolated
variable that it contains changes. You can force Perl to skip the
test and never recompile by adding a /o (which stands for "once")
after the trailing delimiter.
Once upon a time, Perl would recompile regular expressions
unnecessarily, and this modifier was useful to tell it not to do so, in the
interests of speed. But now, the only reasons to use /o are either:
The variables are thousands of characters long and you know that they
don't change, and you need to wring out the last little bit of speed by
having Perl skip testing for that. (There is a maintenance penalty for
doing this, as mentioning /o constitutes a promise that you won't
change the variables in the pattern. If you do change them, Perl won't
even notice.)
you want the pattern to use the initial values of the variables
regardless of whether they change or not. (But there are saner ways
of accomplishing this than using /o.)
If the pattern contains embedded code, such as
- use re 'eval';
- $code = 'foo(?{ $x })';
- /$code/
then perl will recompile each time, even though the pattern string hasn't
changed, to ensure that the current value of $x
is seen each time.
Use /o if you want to avoid this.
The bottom line is that using /o is almost never a good idea.
If the PATTERN evaluates to the empty string, the last
successfully matched regular expression is used instead. In this
case, only the g
and c
flags on the empty pattern are honored;
the other flags are taken from the original pattern. If no match has
previously succeeded, this will (silently) act instead as a genuine
empty pattern (which will always match).
Note that it's possible to confuse Perl into thinking //
(the empty
regex) is really //
(the defined-or operator). Perl is usually pretty
good about this, but some pathological cases might trigger this, such as
$a/// (is that ($a) / (//)
or $a // /?) and print $fh //
(print $fh(//) or print($fh //)?). In all of these examples, Perl
will assume you meant defined-or. If you meant the empty regex, just
use parentheses or spaces to disambiguate, or even prefix the empty
regex with an m (so //
becomes m//).
If the /g option is not used, m// in list context returns a
list consisting of the subexpressions matched by the parentheses in the
pattern, that is, ($1
, $2
, $3
...) (Note that here $1
etc. are
also set). When there are no parentheses in the pattern, the return
value is the list (1)
for success.
With or without parentheses, an empty list is returned upon failure.
Examples:
- open(TTY, "+</dev/tty")
- || die "can't access /dev/tty: $!";
- <TTY> =~ /^y/i && foo(); # do foo if desired
- if (/Version: *([0-9.]*)/) { $version = $1; }
- next if m#^/usr/spool/uucp#;
- # poor man's grep
- $arg = shift;
- while (<>) {
- print if /$arg/o; # compile only once (no longer needed!)
- }
- if (($F1, $F2, $Etc) = ($foo =~ /^(\S+)\s+(\S+)\s*(.*)/))
This last example splits $foo into the first two words and the remainder of the line, and assigns those three fields to $F1, $F2, and $Etc. The conditional is true if any variables were assigned; that is, if the pattern matched.
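The list-context return values described above can be sketched as:

```perl
use strict;
use warnings;

# With capture groups, a list-context match returns ($1, $2, ...).
my ($y, $m, $d) = "2014-02-11" =~ /^(\d+)-(\d+)-(\d+)$/;
die unless $y == 2014 && $m == 2 && $d == 11;

# Without parentheses, success returns the list (1) ...
my @hit = ("abc" =~ /b/);
die unless @hit == 1 && $hit[0] == 1;

# ... and failure returns an empty list either way.
my @miss = ("abc" =~ /z/);
die unless @miss == 0;

print "list context ok\n";
```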
The /g modifier specifies global pattern matching--that is,
matching as many times as possible within the string. How it behaves
depends on the context. In list context, it returns a list of the
substrings matched by any capturing parentheses in the regular
expression. If there are no parentheses, it returns a list of all
the matched strings, as if there were parentheses around the whole
pattern.
In scalar context, each execution of m//g finds the next match,
returning true if it matches, and false if there is no further match.
The position after the last match can be read or set using the pos()
function; see pos. A failed match normally resets the
search position to the beginning of the string, but you can avoid that
by adding the /c modifier (for example, m//gc). Modifying the target
string also resets the search position.
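A minimal sketch of scalar-context /g with pos():

```perl
use strict;
use warnings;

my $s = "cat bat hat";
my (@stems, @positions);

# Each scalar-context //g match resumes where the last one stopped;
# pos() reports the offset just past the most recent match.
while ($s =~ /(\w)at/g) {
    push @stems,     $1;
    push @positions, pos($s);
}

print "matches: @stems\n";   # prints "matches: c b h"
```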
You can intermix m//g matches with m/\G.../g, where \G
is a
zero-width assertion that matches the exact position where the
previous m//g, if any, left off. Without the /g modifier, the
\G
assertion still anchors at pos() as it was at the start of
the operation (see pos), but the match is of course only
attempted once. Using \G
without /g on a target string that has
not previously had a /g match applied to it is the same as using
the \A
assertion to match the beginning of the string. Note also
that, currently, \G
is only properly supported when anchored at the
very beginning of the pattern.
Examples:
- # list context
- ($one,$five,$fifteen) = (`uptime` =~ /(\d+\.\d+)/g);
Here's another way to check for sentences in a paragraph:
- my $sentence_rx = qr{
- (?: (?<= ^ ) | (?<= \s ) ) # after start-of-string or
- # whitespace
- \p{Lu} # capital letter
- .*? # a bunch of anything
- (?<= \S ) # that ends in non-
- # whitespace
- (?<! \b [DMS]r ) # but isn't a common abbr.
- (?<! \b Mrs )
- (?<! \b Sra )
- (?<! \b St )
- [.?!] # followed by a sentence
- # ender
- (?= $ | \s ) # in front of end-of-string
- # or whitespace
- }sx;
- local $/ = "";
- while (my $paragraph = <>) {
- say "NEW PARAGRAPH";
- my $count = 0;
- while ($paragraph =~ /($sentence_rx)/g) {
- printf "\tgot sentence %d: <%s>\n", ++$count, $1;
- }
- }
Here's how to use m//gc with \G
:
- $_ = "ppooqppqq";
- while ($i++ < 2) {
- print "1: '";
- print $1 while /(o)/gc; print "', pos=", pos, "\n";
- print "2: '";
- print $1 if /\G(q)/gc; print "', pos=", pos, "\n";
- print "3: '";
- print $1 while /(p)/gc; print "', pos=", pos, "\n";
- }
- print "Final: '$1', pos=",pos,"\n" if /\G(.)/;
The last example should print:
- 1: 'oo', pos=4
- 2: 'q', pos=5
- 3: 'pp', pos=7
- 1: '', pos=7
- 2: 'q', pos=8
- 3: '', pos=8
- Final: 'q', pos=8
Notice that the final match matched q instead of p
, which a match
without the \G
anchor would have done. Also note that the final match
did not update pos. pos is only updated on a /g match. If the
final match did indeed match p
, it's a good bet that you're running a
very old (pre-5.6.0) version of Perl.
A useful idiom for lex
-like scanners is /\G.../gc
. You can
combine several regexps like this to process a string part-by-part,
doing different actions depending on which regexp matched. Each
regexp tries to match where the previous one leaves off.
- $_ = <<'EOL';
- $url = URI::URL->new( "http://example.com/" );
- die if $url eq "xXx";
- EOL
- LOOP: {
- print(" digits"), redo LOOP if /\G\d+\b[,.;]?\s*/gc;
- print(" lowercase"), redo LOOP
- if /\G\p{Ll}+\b[,.;]?\s*/gc;
- print(" UPPERCASE"), redo LOOP
- if /\G\p{Lu}+\b[,.;]?\s*/gc;
- print(" Capitalized"), redo LOOP
- if /\G\p{Lu}\p{Ll}+\b[,.;]?\s*/gc;
- print(" MiXeD"), redo LOOP if /\G\pL+\b[,.;]?\s*/gc;
- print(" alphanumeric"), redo LOOP
- if /\G[\p{Alpha}\pN]+\b[,.;]?\s*/gc;
- print(" line-noise"), redo LOOP if /\G\W+/gc;
- print ". That's all!\n";
- }
Here is the output (split into several lines):
- line-noise lowercase line-noise UPPERCASE line-noise UPPERCASE
- line-noise lowercase line-noise lowercase line-noise lowercase
- lowercase line-noise lowercase lowercase line-noise lowercase
- lowercase line-noise MiXeD line-noise. That's all!
This is just like the m/PATTERN/ search, except that it matches
only once between calls to the reset() operator. This is a useful
optimization when you want to see only the first occurrence of
something in each file of a set of files, for instance. Only m??
patterns local to the current package are reset.
Another example switches the first "latin1" encoding it finds to "utf8" in a pod file:
- s//utf8/ if m? ^ =encoding \h+ \K latin1 ?x;
The match-once behavior is controlled by the match delimiter being
?; with any other delimiter this is the normal m// operator.
For historical reasons, the leading m in m?PATTERN? is optional,
but the resulting ?PATTERN?
syntax is deprecated, will warn on
usage and might be removed from a future stable release of Perl (without
further notice!).
Searches a string for a pattern, and if found, replaces that pattern with the replacement text and returns the number of substitutions made. Otherwise it returns false (specifically, the empty string).
If the /r
(non-destructive) option is used then it runs the
substitution on a copy of the string and instead of returning the
number of substitutions, it returns the copy whether or not a
substitution occurred. The original string is never changed when
/r
is used. The copy will always be a plain string, even if the
input is an object or a tied variable.
If no string is specified via the =~
or !~
operator, the $_
variable is searched and modified. Unless the /r
option is used,
the string specified must be a scalar variable, an array element, a
hash element, or an assignment to one of those; that is, some sort of
scalar lvalue.
If the delimiter chosen is a single quote, no interpolation is
done on either the PATTERN or the REPLACEMENT. Otherwise, if the
PATTERN contains a $ that looks like a variable rather than an
end-of-string test, the variable will be interpolated into the pattern
at run-time. If you want the pattern compiled only once the first time
the variable is interpolated, use the /o option. If the pattern
evaluates to the empty string, the last successfully executed regular
expression is used instead. See perlre for further explanation on these.
Options are as with m// with the addition of the following replacement specific options:
Any non-whitespace delimiter may replace the slashes. Add space after
the s when using a character allowed in identifiers. If single quotes
are used, no interpretation is done on the replacement string (the /e
modifier overrides this, however). Note that Perl treats backticks
as normal delimiters; the replacement text is not evaluated as a command.
If the PATTERN is delimited by bracketing quotes, the REPLACEMENT has
its own pair of quotes, which may or may not be bracketing quotes, for example,
s(foo)(bar) or s&lt;foo&gt;/bar/. A /e will cause the
replacement portion to be treated as a full-fledged Perl expression
and evaluated right then and there. It is, however, syntax checked at
compile-time. A second e
modifier will cause the replacement portion
to be evaled before being run as a Perl expression.
Examples:
- s/\bgreen\b/mauve/g; # don't change wintergreen
- $path =~ s|/usr/bin|/usr/local/bin|;
- s/Login: $foo/Login: $bar/; # run-time pattern
- ($foo = $bar) =~ s/this/that/; # copy first, then
- # change
- ($foo = "$bar") =~ s/this/that/; # convert to string,
- # copy, then change
- $foo = $bar =~ s/this/that/r; # Same as above using /r
- $foo = $bar =~ s/this/that/r
- =~ s/that/the other/r; # Chained substitutes
- # using /r
- @foo = map { s/this/that/r } @bar # /r is very useful in
- # maps
- $count = ($paragraph =~ s/Mister\b/Mr./g); # get change-cnt
- $_ = 'abc123xyz';
- s/\d+/$&*2/e; # yields 'abc246xyz'
- s/\d+/sprintf("%5d",$&)/e; # yields 'abc 246xyz'
- s/\w/$& x 2/eg; # yields 'aabbcc 224466xxyyzz'
- s/%(.)/$percent{$1}/g; # change percent escapes; no /e
- s/%(.)/$percent{$1} || $&/ge; # expr now, so /e
- s/^=(\w+)/pod($1)/ge; # use function call
- $_ = 'abc123xyz';
- $a = s/abc/def/r; # $a is 'def123xyz' and
- # $_ remains 'abc123xyz'.
- # expand variables in $_, but dynamics only, using
- # symbolic dereferencing
- s/\$(\w+)/${$1}/g;
- # Add one to the value of any numbers in the string
- s/(\d+)/1 + $1/eg;
- # Titlecase words in the last 30 characters only
- substr($str, -30) =~ s/\b(\p{Alpha}+)\b/\u\L$1/g;
- # This will expand any embedded scalar variable
- # (including lexicals) in $_ : First $1 is interpolated
- # to the variable name, and then evaluated
- s/(\$\w+)/$1/eeg;
- # Delete (most) C comments.
- $program =~ s {
- /\* # Match the opening delimiter.
- .*? # Match a minimal number of characters.
- \*/ # Match the closing delimiter.
- } []gsx;
- s/^\s*(.*?)\s*$/$1/; # trim whitespace in $_,
- # expensively
- for ($variable) { # trim whitespace in $variable,
- # cheap
- s/^\s+//;
- s/\s+$//;
- }
- s/([^ ]*) *([^ ]*)/$2 $1/; # reverse 1st two fields
Note the use of $ instead of \ in the last example. Unlike sed, we use the \<digit> form in only the left hand side. Anywhere else it's $<digit>.
Occasionally, you can't use just a /g to get all the changes
to occur that you might want. Here are two common cases:
- # put commas in the right places in an integer
- 1 while s/(\d)(\d\d\d)(?!\d)/$1,$2/g;
- # expand tabs to 8-column spacing
- 1 while s/\t+/' ' x (length($&)*8 - length($`)%8)/e;
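The comma-insertion loop above can be traced with a small sketch: each pass of the loop adds one comma from the right until no unseparated four-digit run remains.

```perl
use strict;
use warnings;

my $n = "1234567";

# Pass 1: "1234,567"; pass 2: "1,234,567"; pass 3 fails, ending
# the loop, because no digit is followed by exactly three digits.
1 while $n =~ s/(\d)(\d\d\d)(?!\d)/$1,$2/g;

print "$n\n";   # prints "1,234,567"
```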
A single-quoted, literal string. A backslash represents a backslash unless followed by the delimiter or another backslash, in which case the delimiter or backslash is interpolated.
- $foo = q!I said, "You said, 'She said it.'"!;
- $bar = q('This is it.');
- $baz = '\n'; # a two-character string
A double-quoted, interpolated string.
- $_ .= qq
- (*** The previous line contains the naughty word "$1".\n)
- if /\b(tcl|java|python)\b/i; # :-)
- $baz = "\n"; # a one-character string
A string which is (possibly) interpolated and then executed as a system command with /bin/sh or its equivalent. Shell wildcards, pipes, and redirections will be honored. The collected standard output of the command is returned; standard error is unaffected. In scalar context, it comes back as a single (potentially multi-line) string, or undef if the command failed. In list context, returns a list of lines (however you've defined lines with $/ or $INPUT_RECORD_SEPARATOR), or an empty list if the command failed.
Because backticks do not affect standard error, use shell file descriptor syntax (assuming the shell supports this) if you care to address this. To capture a command's STDERR and STDOUT together:
- $output = `cmd 2>&1`;
To capture a command's STDOUT but discard its STDERR:
- $output = `cmd 2>/dev/null`;
To capture a command's STDERR but discard its STDOUT (ordering is important here):
- $output = `cmd 2>&1 1>/dev/null`;
To exchange a command's STDOUT and STDERR in order to capture the STDERR but leave its STDOUT to come out the old STDERR:
- $output = `cmd 3>&1 1>&2 2>&3 3>&-`;
To read both a command's STDOUT and its STDERR separately, it's easiest to redirect them separately to files, and then read from those files when the program is done:
- system("program args 1>program.stdout 2>program.stderr");
The STDIN filehandle used by the command is inherited from Perl's STDIN. For example:
- open(SPLAT, "stuff") || die "can't open stuff: $!";
- open(STDIN, "<&SPLAT") || die "can't dupe SPLAT: $!";
- print STDOUT `sort`;
will print the sorted contents of the file named "stuff".
Using single-quote as a delimiter protects the command from Perl's double-quote interpolation, passing it on to the shell instead:
- $perl_info = qx(ps $$); # that's Perl's $$
- $shell_info = qx'ps $$'; # that's the new shell's $$
How that string gets evaluated is entirely subject to the command interpreter on your system. On most platforms, you will have to protect shell metacharacters if you want them treated literally. This is in practice difficult to do, as it's unclear how to escape which characters. See perlsec for a clean and safe example of a manual fork() and exec() to emulate backticks safely.
On some platforms (notably DOS-like ones), the shell may not be capable of dealing with multiline commands, so putting newlines in the string may not get you what you want. You may be able to evaluate multiple commands in a single line by separating them with the command separator character, if your shell supports that (for example, ; on many Unix shells and & on the Windows NT cmd shell).
Perl will attempt to flush all files opened for output before starting the child process, but this may not be supported on some platforms (see perlport). To be safe, you may need to set $| ($AUTOFLUSH in English) or call the autoflush() method of IO::Handle on any open handles.
Beware that some command shells may place restrictions on the length of the command line. You must ensure your strings don't exceed this limit after any necessary interpolations. See the platform-specific release notes for more details about your particular environment.
Using this operator can lead to programs that are difficult to port, because the shell commands called vary between systems, and may in fact not be present at all. As one example, the type command under the POSIX shell is very different from the type command under DOS.
That doesn't mean you should go out of your way to avoid backticks
when they're the right way to get something done. Perl was made to be
a glue language, and one of the things it glues together is commands.
Just understand what you're getting yourself into.
See I/O Operators for more discussion.
Evaluates to a list of the words extracted out of STRING, using embedded whitespace as the word delimiters. It can be understood as being roughly equivalent to:
- split(" ", q/STRING/);
the differences being that it generates a real list at compile time, and in scalar context it returns the last element in the list. So this expression:
- qw(foo bar baz)
is semantically equivalent to the list:
- "foo", "bar", "baz"
Some frequently seen examples:
- use POSIX qw( setlocale localeconv )
- @EXPORT = qw( foo bar baz );
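A small runnable sketch of both behaviors described above, the real list built at compile time and the last element in scalar context:

```perl
use strict;   # warnings deliberately omitted: a constant list in scalar
              # context would trigger a "useless use of a constant" warning

my @words = qw(foo bar baz);   # a real list, built at compile time
my $last  = qw(foo bar baz);   # scalar context: the last element, "baz"

print "@words\n";
print "$last\n";
```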
A common mistake is to try to separate the words with commas or to put comments into a multi-line qw-string. For this reason, the use warnings pragma and the -w switch (that is, the $^W variable) produce warnings if the STRING contains the "," or the "#" character.
Transliterates all occurrences of the characters found in the search list with the corresponding character in the replacement list. It returns the number of characters replaced or deleted. If no string is specified via the =~ or !~ operator, the $_ string is transliterated.
If the /r (non-destructive) option is present, a new copy of the string is made and its characters transliterated, and this copy is returned no matter whether it was modified or not: the original string is always left unchanged. The new copy is always a plain string, even if the input string is an object or a tied variable.
Unless the /r option is used, the string specified with =~ must be a scalar variable, an array element, a hash element, or an assignment to one of those; in other words, an lvalue.
A character range may be specified with a hyphen, so tr/A-J/0-9/
does the same replacement as tr/ACEGIBDFHJ/0246813579/.
For sed devotees, y is provided as a synonym for tr. If the
SEARCHLIST is delimited by bracketing quotes, the REPLACEMENTLIST has
its own pair of quotes, which may or may not be bracketing quotes;
for example, tr[aeiouy][yuoiea] or tr(+\-*/)/ABCD/.
Note that tr does not do regular expression character classes such as \d or \pL. The tr operator is not equivalent to the tr(1) utility. If you want to map strings between lower/upper cases, see lc and uc, and in general consider using the s operator if you need regular expressions. The \U, \u, \L, and \l string-interpolation escapes on the right side of a substitution operator will perform correct case-mappings, but tr[a-z][A-Z] will not (except sometimes on legacy 7-bit data).
Note also that the whole range idea is rather unportable between character sets--and even within character sets they may cause results you probably didn't expect. A sound principle is to use only ranges that begin from and end at either alphabets of equal case (a-e, A-E), or digits (0-4). Anything else is unsafe. If in doubt, spell out the character sets in full.
Options:
- c Complement the SEARCHLIST.
- d Delete found but unreplaced characters.
- s Squash duplicate replaced characters.
- r Return the modified string and leave the original string untouched.
If the /c modifier is specified, the SEARCHLIST character set
is complemented. If the /d modifier is specified, any characters
specified by SEARCHLIST not found in REPLACEMENTLIST are deleted.
(Note that this is slightly more flexible than the behavior of some
tr programs, which delete anything they find in the SEARCHLIST,
period.) If the /s modifier is specified, sequences of characters
that were transliterated to the same character are squashed down
to a single instance of the character.
If the /d modifier is used, the REPLACEMENTLIST is always interpreted
exactly as specified. Otherwise, if the REPLACEMENTLIST is shorter
than the SEARCHLIST, the final character is replicated till it is long
enough. If the REPLACEMENTLIST is empty, the SEARCHLIST is replicated.
This latter is useful for counting characters in a class or for
squashing character sequences in a class.
Examples:
- $ARGV[1] =~ tr/A-Z/a-z/; # canonicalize to lower case ASCII
- $cnt = tr/*/*/; # count the stars in $_
- $cnt = $sky =~ tr/*/*/; # count the stars in $sky
- $cnt = tr/0-9//; # count the digits in $_
- tr/a-zA-Z//s; # bookkeeper -> bokeper
- ($HOST = $host) =~ tr/a-z/A-Z/;
- $HOST = $host =~ tr/a-z/A-Z/r; # same thing
- $HOST = $host =~ tr/a-z/A-Z/r # chained with s///r
- =~ s/:/ -p/r;
- tr/a-zA-Z/ /cs; # change non-alphas to single space
- @stripped = map tr/a-zA-Z/ /csr, @original;
- # /r with map
- tr [\200-\377]
- [\000-\177]; # wickedly delete 8th bit
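The modifier behaviors above can be checked with a short runnable sketch (the input strings are arbitrary, chosen to exercise /s, /cd, and counting):

```perl
use strict;
use warnings;

my $word = "bookkeeper";
(my $squashed = $word) =~ tr/a-zA-Z//s;   # /s squashes runs of identical output chars
my $mixed = "a1b22c333";
(my $digits = $mixed) =~ tr/0-9//cd;      # /c complements the list, /d deletes the rest
my $count = ($word =~ tr/e//);            # empty REPLACEMENTLIST: just count e's

print "$squashed $digits $count\n";       # bokeper 122333 3
```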
If multiple transliterations are given for a character, only the first one is used:
- tr/AAA/XYZ/
will transliterate any A to X.
Because the transliteration table is built at compile time, neither the SEARCHLIST nor the REPLACEMENTLIST are subjected to double quote interpolation. That means that if you want to use variables, you must use an eval():
- eval "tr/$oldlist/$newlist/";
- die $@ if $@;
- eval "tr/$oldlist/$newlist/, 1" or die $@;
A line-oriented form of quoting is based on the shell "here-document" syntax. Following a << you specify a string to terminate the quoted material, and all lines following the current line down to the terminating string are the value of the item.
The terminating string may be either an identifier (a word), or some quoted text. An unquoted identifier works like double quotes. There may not be a space between the << and the identifier, unless the identifier is explicitly quoted. (If you put a space it will be treated as a null identifier, which is valid, and matches the first empty line.) The terminating string must appear by itself (unquoted and with no surrounding whitespace) on the terminating line.
If the terminating string is quoted, the type of quotes used determine the treatment of the text.
Double quotes indicate that the text will be interpolated using exactly the same rules as normal double quoted strings.
- print <<EOF;
- The price is $Price.
- EOF
- print << "EOF"; # same as above
- The price is $Price.
- EOF
Single quotes indicate the text is to be treated literally with no
interpolation of its content. This is similar to single quoted
strings except that backslashes have no special meaning, with \\
being treated as two backslashes and not one as they would in every
other quoting construct.
Just as in the shell, a backslashed bareword following the << means the same thing as a single-quoted string does:
- $cost = <<'VISTA'; # hasta la ...
- That'll be $10 please, ma'am.
- VISTA
- $cost = <<\VISTA; # Same thing!
- That'll be $10 please, ma'am.
- VISTA
This is the only form of quoting in perl where there is no need to worry about escaping content, something that code generators can and do make good use of.
The content of the here doc is treated just as it would be if the string were embedded in backticks. Thus the content is interpolated as though it were double quoted and then executed via the shell, with the results of the execution returned.
- print << `EOC`; # execute command and get results
- echo hi there
- EOC
It is possible to stack multiple here-docs in a row:
- print <<"foo", <<"bar"; # you can stack them
- I said foo.
- foo
- I said bar.
- bar
- myfunc(<< "THIS", 23, <<'THAT');
- Here's a line
- or two.
- THIS
- and here's another.
- THAT
Just don't forget that you have to put a semicolon on the end to finish the statement, as Perl doesn't know you're not going to try to do this:
- print <<ABC
- 179231
- ABC
- + 20;
If you want to remove the line terminator from your here-docs,
use chomp().
- chomp($string = <<'END');
- This is a string.
- END
If you want your here-docs to be indented with the rest of the code, you'll need to remove leading whitespace from each line manually:
- ($quote = <<'FINIS') =~ s/^\s+//gm;
- The Road goes ever on and on,
- down from the door where it began.
- FINIS
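On Perl 5.26 or later (an assumption about your perl version), the <<~ variant of the here-doc does this stripping for you, keyed to the indentation of the terminator:

```perl
use v5.26;   # <<~ indented here-docs were added in Perl 5.26

my $quote = <<~'FINIS';
    The Road goes ever on and on,
    down from the door where it began.
    FINIS
print $quote;
```

Each line loses the indentation shared with the FINIS terminator, so the resulting string starts at the left margin.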
If you use a here-doc within a delimited construct, such as in s///eg,
the quoted material must still come on the line following the
<<FOO marker, which means it may be inside the delimited
construct:
- s/this/<<E . 'that'
- the other
- E
- . 'more '/eg;
It works this way as of Perl 5.18. Historically, it was inconsistent, and you would have to write
- s/this/<<E . 'that'
- . 'more '/eg;
- the other
- E
outside of string evals.
Additionally, quoting rules for the end-of-string identifier are unrelated to Perl's quoting rules. q(), qq(), and the like are not supported in place of '' and "", and the only interpolation is for backslashing the quoting character:
- print << "abc\"def";
- testing...
- abc"def
Finally, quoted strings cannot span multiple lines. The general rule is that the identifier must be a string literal. Stick with that, and you should be safe.
When presented with something that might have several different interpretations, Perl uses the DWIM (that's "Do What I Mean") principle to pick the most probable interpretation. This strategy is so successful that Perl programmers often do not suspect the ambivalence of what they write. But from time to time, Perl's notions differ substantially from what the author honestly meant.
This section hopes to clarify how Perl handles quoted constructs. Although the most common reason to learn this is to unravel labyrinthine regular expressions, because the initial steps of parsing are the same for all quoting operators, they are all discussed together.
The most important Perl parsing rule is the first one discussed below: when processing a quoted construct, Perl first finds the end of that construct, then interprets its contents. If you understand this rule, you may skip the rest of this section on the first reading. The other rules are likely to contradict the user's expectations much less frequently than this first one.
Some passes discussed below are performed concurrently, but because their results are the same, we consider them individually. For different quoting constructs, Perl performs different numbers of passes, from one to four, but these passes are always performed in the same order.
The first pass is finding the end of the quoted construct, where the information about the delimiters is used in parsing. During this search, text between the starting and ending delimiters is copied to a safe location. The copied text thus becomes independent of the delimiters.
If the construct is a here-doc, the ending delimiter is a line that has the terminating string as its content. Therefore <<EOF is terminated by EOF immediately followed by "\n" and starting from the first column of the terminating line.
When searching for the terminating line of a here-doc, nothing is skipped. In other words, lines after the here-doc syntax are compared with the terminating string line by line.
For the constructs except here-docs, single characters are used as starting and ending delimiters. If the starting delimiter is an opening punctuation (that is (, [, {, or <), the ending delimiter is the corresponding closing punctuation (that is ), ], }, or >). If the starting delimiter is an unpaired character like / or a closing punctuation, the ending delimiter is the same as the starting delimiter. Therefore a / terminates a qq// construct, while a ] terminates both qq[] and qq]] constructs.
When searching for single-character delimiters, escaped delimiters and \\ are skipped. For example, while searching for terminating /, combinations of \\ and \/ are skipped. If the delimiters are bracketing, nested pairs are also skipped. For example, while searching for closing ] paired with the opening [, combinations of \\, \], and \[ are all skipped, and nested [ and ] are skipped as well. However, when backslashes are used as the delimiters (like qq\\ and tr\\\), nothing is skipped.
During the search for the end, backslashes that escape delimiters or
other backslashes are removed (exactly speaking, they are not copied to the
safe location).
For constructs with three-part delimiters (s///, y///, and tr///), the search is repeated once more. If the first delimiter is not an opening punctuation, the three delimiters must be the same, such as s!!! and tr))), in which case the second delimiter terminates the left part and starts the right part at once. If the left part is delimited by bracketing punctuation (that is (), [], {}, or <>), the right part needs another pair of delimiters such as s(){} and tr[]//. In these cases, whitespace and comments are allowed between the two parts, though the comment must follow at least one whitespace character; otherwise a character expected as the start of the comment may be regarded as the starting delimiter of the right part.
During this search no attention is paid to the semantics of the construct. Thus:
- "$hash{"$foo/$bar"}"
or:
- m/
- bar # NOT a comment, this slash / terminated m//!
- /x
do not form legal quoted expressions. The quoted part ends on the first " and /, and the rest happens to be a syntax error. Because the slash that terminated m// was followed by a SPACE, the example above is not m//x, but rather m// with no /x modifier. So the embedded # is interpreted as a literal #.
Also no attention is paid to \c\ (multichar control char syntax) during this search. Thus the second \ in qq/\c\/ is interpreted as a part of \/, and the following / is not recognized as a delimiter. Instead, use \034 or \x1c at the end of quoted constructs.
The next step is interpolation in the text obtained, which is now delimiter-independent. There are multiple cases.
<<'EOF'
No interpolation is performed.
Note that the combination \\ is left intact, since escaped delimiters are not available for here-docs.
m'', the pattern of s'''
No interpolation is performed at this stage. Any backslashed sequences including \\ are treated at the stage of parsing regular expressions.
'', q//, tr''', y''', the replacement of s'''
The only interpolation is removal of \ from pairs of \\. Therefore - in tr''' and y''' is treated literally as a hyphen and no character range is available. \1 in the replacement of s''' does not work as $1.
tr///, y///
No variable interpolation occurs. String modifying combinations for case and quoting such as \Q, \U, and \E are not recognized. The other escape sequences such as \200 and \t and backslashed characters such as \\ and \- are converted to appropriate literals. The character - is treated specially and therefore \- is treated as a literal -.
""
, ``
, qq//, qx//, <file*glob>
, <<"EOF"
\Q
, \U
, \u
, \L
, \l
, \F
(possibly paired with \E
) are
converted to corresponding Perl constructs. Thus, "$foo\Qbaz$bar"
is converted to $foo . (quotemeta("baz" . $bar))
internally.
The other escape sequences such as \200
and \t
and backslashed
characters such as \\
and \-
are replaced with appropriate
expansions.
Let it be stressed that whatever falls between \Q and \E is interpolated in the usual way. Something like "\Q\\E" has no \E inside. Instead, it has \Q, \\, and E, so the result is the same as for "\\\\E". As a general rule, backslashes between \Q and \E may lead to counterintuitive results. So, "\Q\t\E" is converted to quotemeta("\t"), which is the same as "\\\t" (since TAB is not alphanumeric). Note also that:
- $str = '\t';
- return "\Q$str";
may be closer to the conjectural intention of the writer of "\Q\t\E".
Interpolated scalars and arrays are converted internally to the join and . catenation operations. Thus, "$foo XXX '@arr'" becomes:
- $foo . " XXX '" . (join $", @arr) . "'";
All operations above are performed simultaneously, left to right.
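This internal rewriting is easy to observe; the following sketch builds both forms and compares them (the default list separator $" is a single space):

```perl
use strict;
use warnings;

my $foo = "X";
my @arr = (1, 2, 3);

my $interpolated = "$foo XXX '@arr'";
my $spelled_out  = $foo . " XXX '" . (join $", @arr) . "'";

print "$interpolated\n";    # X XXX '1 2 3'
```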
Because the result of "\Q STRING \E" has all metacharacters quoted, there is no way to insert a literal $ or @ inside a \Q\E pair. If protected by \, $ will be quoted to become "\\\$"; if not, it is interpreted as the start of an interpolated scalar.
Note also that the interpolation code needs to make a decision on where the interpolated scalar ends. For instance, whether "a $b -> {c}" really means:
- "a " . $b . " -> {c}";
or:
- "a " . $b -> {c};
Most of the time, Perl takes the longest possible text that does not include spaces between components and which contains matching braces or brackets. Because the outcome may be determined by voting based on heuristic estimators, the result is not strictly predictable. Fortunately, it's usually correct for ambiguous cases.
s///
Processing of \Q, \U, \u, \L, \l, \F and interpolation happens as with qq// constructs.
It is at this step that \1 is begrudgingly converted to $1 in the replacement text of s///, in order to correct the incorrigible sed hackers who haven't picked up the saner idiom yet. A warning is emitted if the use warnings pragma or the -w command-line flag (that is, the $^W variable) was set.
RE in ?RE?, /RE/, m/RE/, s/RE/foo/
Processing of \Q, \U, \u, \L, \l, \F, \E, and interpolation happens (almost) as with qq// constructs.
Processing of \N{...} is also done here, and compiled into an intermediate form for the regex compiler. (This is because, as mentioned below, the regex compilation may be done at execution time, and \N{...} is a compile-time construct.)
However any other combinations of \ followed by a character are not substituted but only skipped, in order to parse them as regular expressions at the following step. As \c is skipped at this step, @ of \c@ in RE is possibly treated as an array symbol (for example @foo), even though the same text in qq// gives interpolation of \c@.
Code blocks such as (?{BLOCK}) are handled by temporarily passing control
back to the perl parser, in a similar way that an interpolated array
subscript expression such as "foo$array[1+f("[xyz")]bar"
would be.
Moreover, inside (?{BLOCK}), (?# comment ), and a #-comment in a //x-regular expression, no processing is performed whatsoever. This is the first step at which the presence of the //x modifier is relevant.
Interpolation in patterns has several quirks: $|, $(, $), @+ and @- are not interpolated, and constructs $var[SOMETHING] are voted (by several different estimators) to be either an array element or $var followed by an RE alternative. This is where the notation ${arr[$bar]} comes handy: /${arr[0-9]}/ is interpreted as array element -9, not as a regular expression from the variable $arr followed by a digit, which would be the interpretation of /$arr[0-9]/. Since voting among different estimators may occur, the result is not predictable.
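A sketch of the ${arr[...]} disambiguation (the array contents here are arbitrary):

```perl
use strict;
use warnings;

my @arr = ('a' .. 'j');    # ten elements
# Inside the braces there is no ambiguity: 0-9 is the subtraction -9,
# so ${arr[0-9]} is the element $arr[-9], here 'b'.
print "matched\n" if "b" =~ /${arr[0-9]}/;
```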
The lack of processing of \\ creates specific restrictions on the post-processed text. If the delimiter is /, one cannot get the combination \/ into the result of this step. / will finish the regular expression, \/ will be stripped to / on the previous step, and \\/ will be left as is. Because / is equivalent to \/ inside a regular expression, this does not matter unless the delimiter happens to be character special to the RE engine, such as in s*foo*bar*, m[foo], or ?foo?; or an alphanumeric char, as in:
- m m ^ a \s* b mmx;
In the RE above, which is intentionally obfuscated for illustration, the delimiter is m, the modifier is mx, and after delimiter-removal the RE is the same as for m/ ^ a \s* b /mx. There's more than one reason you're encouraged to restrict your delimiters to non-alphanumeric, non-whitespace choices.
This step is the last one for all constructs except regular expressions, which are processed further.
Previous steps were performed during the compilation of Perl code, but this one happens at run time, although it may be optimized to be calculated at compile time if appropriate. After preprocessing described above, and possibly after evaluation if concatenation, joining, casing translation, or metaquoting are involved, the resulting string is passed to the RE engine for compilation.
Whatever happens in the RE engine might be better discussed in perlre, but for the sake of continuity, we shall do so here.
This is another step where the presence of the //x modifier is relevant. The RE engine scans the string from left to right and converts it to a finite automaton.
Backslashed characters are either replaced with corresponding literal strings (as with \{), or else they generate special nodes in the finite automaton (as with \b). Characters special to the RE engine (such as |) generate corresponding nodes or groups of nodes. (?#...) comments are ignored. All the rest is either converted to literal strings to match, or else is ignored (as is whitespace and #-style comments if //x is present).
Parsing of the bracketed character class construct, [...], is rather different than the rule used for the rest of the pattern. The terminator of this construct is found using the same rules as for finding the terminator of a {}-delimited construct, the only exception being that ] immediately following [ is treated as though preceded by a backslash.
The terminator of runtime (?{...}) is found by temporarily switching
control to the perl parser, which should stop at the point where the
logically balancing terminating } is found.
It is possible to inspect both the string given to the RE engine and the resulting finite automaton. See the arguments debug/debugcolor in the use re pragma, as well as Perl's -Dr command-line switch documented in Command Switches in perlrun.
This step is listed for completeness only. Since it does not change semantics, details of this step are not documented and are subject to change without notice. This step is performed over the finite automaton that was generated during the previous pass.
It is at this stage that split() silently optimizes /^/ to mean /^/m.
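That optimization can be seen with a quick sketch:

```perl
use strict;
use warnings;

my $text  = "alpha\nbeta\ngamma\n";
my @lines = split /^/, $text;    # treated as /^/m: split before each line
print scalar(@lines), "\n";      # 3
print $lines[1];                 # beta
```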
There are several I/O operators you should know about.
A string enclosed by backticks (grave accents) first undergoes double-quote interpolation. It is then interpreted as an external command, and the output of that command is the value of the backtick string, like in a shell. In scalar context, a single string consisting of all output is returned. In list context, a list of values is returned, one per line of output. (You can set $/ to use a different line terminator.) The command is executed each time the pseudo-literal is evaluated. The status value of the command is returned in $? (see perlvar for the interpretation of $?).
Unlike in csh, no translation is done on the return data--newlines remain newlines. Unlike in any of the shells, single quotes do not hide variable names in the command from interpretation. To pass a literal dollar-sign through to the shell you need to hide it with a backslash. The generalized form of backticks is qx//. (Because backticks always undergo shell expansion as well, see perlsec for security concerns.)
In scalar context, evaluating a filehandle in angle brackets yields the next line from that file (the newline, if any, included), or undef at end-of-file or on error. When $/ is set to undef (sometimes known as file-slurp mode) and the file is empty, it returns '' the first time, followed by undef subsequently.
Ordinarily you must assign the returned value to a variable, but there is one situation where an automatic assignment happens. If and only if the input symbol is the only thing inside the conditional of a while statement (even if disguised as a for(;;) loop), the value is automatically assigned to the global variable $_, destroying whatever was there previously. (This may seem like an odd thing to you, but you'll use the construct in almost every Perl script you write.) The $_ variable is not implicitly localized. You'll have to put a local $_; before the loop if you want that to happen.
The following lines are equivalent:
- while (defined($_ = <STDIN>)) { print; }
- while ($_ = <STDIN>) { print; }
- while (<STDIN>) { print; }
- for (;<STDIN>;) { print; }
- print while defined($_ = <STDIN>);
- print while ($_ = <STDIN>);
- print while <STDIN>;
This also behaves similarly, but assigns to a lexical variable instead of to $_:
- while (my $line = <STDIN>) { print $line }
In these loop constructs, the assigned value (whether assignment is automatic or explicit) is then tested to see whether it is defined. The defined test avoids problems where the line has a string value that would be treated as false by Perl; for example a "" or a "0" with no trailing newline. If you really mean for such values to terminate the loop, they should be tested for explicitly:
- while (($_ = <STDIN>) ne '0') { ... }
- while (<STDIN>) { last unless $_; ... }
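The implicit defined() test is easy to demonstrate with an in-memory filehandle whose only line is a false-but-defined "0":

```perl
use strict;
use warnings;

open my $fh, '<', \"0" or die "open: $!";   # one "line": the string "0", no newline
my $iterations = 0;
while (my $line = <$fh>) {    # Perl wraps this assignment in an implicit defined()
    $iterations++;
}
print "$iterations\n";        # 1: the false-but-defined "0" did not stop the loop
```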
In other boolean contexts, <FILEHANDLE> without an explicit defined test or comparison elicits a warning if the use warnings pragma or the -w command-line switch (the $^W variable) is in effect.
The filehandles STDIN, STDOUT, and STDERR are predefined. (The filehandles stdin, stdout, and stderr will also work except in packages, where they would be interpreted as local identifiers rather than global.) Additional filehandles may be created with the open() function, amongst others. See perlopentut and open for details on this.
If a <FILEHANDLE> is used in a context that is looking for a list, a list comprising all input lines is returned, one line per list element. It's easy to grow to a rather large data space this way, so use with care.
<FILEHANDLE> may also be spelled readline(*FILEHANDLE).
See readline.
The null filehandle <> is special: it can be used to emulate the
behavior of sed and awk, and any other Unix filter program
that takes a list of filenames, doing the same to each line
of input from all of them. Input from <> comes either from
standard input, or from each file listed on the command line. Here's
how it works: the first time <> is evaluated, the @ARGV array is
checked, and if it is empty, $ARGV[0]
is set to "-", which when opened
gives you standard input. The @ARGV array is then processed as a list
of filenames. The loop
- while (<>) {
- ... # code for each line
- }
is equivalent to the following Perl-like pseudo code:
- unshift(@ARGV, '-') unless @ARGV;
- while ($ARGV = shift) {
-     open(ARGV, $ARGV);
-     while (<ARGV>) {
-         ...       # code for each line
-     }
- }
except that it isn't so cumbersome to say, and will actually work. It really does shift the @ARGV array and put the current filename into the $ARGV variable. It also uses filehandle ARGV internally. <> is just a synonym for <ARGV>, which is magical. (The pseudo code above doesn't work because it treats <ARGV> as non-magical.)
Since the null filehandle uses the two argument form of open it interprets special characters, so if you have a script like this:
- while (<>) {
- print;
- }
and call it with perl dangerous.pl 'rm -rfv *|', it actually opens a pipe, executes the rm command and reads rm's output from that pipe.
If you want all items in @ARGV to be interpreted as file names, you can use the module ARGV::readonly from CPAN.
You can modify @ARGV before the first <> as long as the array ends up containing the list of filenames you really want. Line numbers ($.) continue as though the input were one big happy file. See the example in eof for how to reset line numbers on each file.
If you want to set @ARGV to your own list of files, go right ahead. This sets @ARGV to all plain text files if no @ARGV was given:
- @ARGV = grep { -f && -T } glob('*') unless @ARGV;
You can even set them to pipe commands. For example, this automatically filters compressed arguments through gzip:
- @ARGV = map { /\.(gz|Z)$/ ? "gzip -dc < $_ |" : $_ } @ARGV;
If you want to pass switches into your script, you can use one of the Getopts modules or put a loop on the front like this:
- while ($_ = $ARGV[0], /^-/) {
-     shift;
-     last if /^--$/;
-     if (/^-D(.*)/) { $debug = $1 }
-     if (/^-v/)     { $verbose++  }
-     # ...          # other switches
- }
- while (<>) {
-     # ...          # code for each line
- }
The <> symbol will return undef for end-of-file only once.
If you call it again after this, it will assume you are processing another
@ARGV list, and if you haven't set @ARGV, will read input from STDIN.
If what the angle brackets contain is a simple scalar variable (for example, <$foo>), then that variable contains the name of the filehandle to input from, or its typeglob, or a reference to the same. For example:
- $fh = \*STDIN;
- $line = <$fh>;
If what's within the angle brackets is neither a filehandle nor a simple scalar variable containing a filehandle name, typeglob, or typeglob reference, it is interpreted as a filename pattern to be globbed, and either a list of filenames or the next filename in the list is returned, depending on context. This distinction is determined on syntactic grounds alone. That means <$x> is always a readline() from an indirect handle, but <$hash{key}> is always a glob(). That's because $x is a simple scalar variable, but $hash{key} is not--it's a hash element. Even <$x > (note the extra space) is treated as glob("$x "), not readline($x).
One level of double-quote interpretation is done first, but you can't say <$foo> because that's an indirect filehandle as explained in the previous paragraph. (In older versions of Perl, programmers would insert curly brackets to force interpretation as a filename glob: <${foo}>. These days, it's considered cleaner to call the internal function directly as glob($foo), which is probably the right way to have done it in the first place.) For example:
- while (<*.c>) {
- chmod 0644, $_;
- }
is roughly equivalent to:
- open(FOO, "echo *.c | tr -s ' \t\r\f' '\\012\\012\\012\\012'|");
- while (<FOO>) {
-     chomp;
-     chmod 0644, $_;
- }
except that the globbing is actually done internally using the standard File::Glob extension. Of course, the shortest way to do the above is:
- chmod 0644, <*.c>;
A (file)glob evaluates its (embedded) argument only when it is starting a new list. All values must be read before it will start over. In list context, this isn't important because you automatically get them all anyway. However, in scalar context the operator returns the next value each time it's called, or undef when the list has run out. As with filehandle reads, an automatic defined is generated when the glob occurs in the test part of a while, because legal glob returns (for example, a file called 0) would otherwise terminate the loop. Again, undef is returned only once. So if you're expecting a single value from a glob, it is much better to say
- ($file) = <blurch*>;
than
- $file = <blurch*>;
because the latter will alternate between returning a filename and returning false.
If you're trying to do variable interpolation, it's definitely better to use the glob() function, because the older notation can cause people to become confused with the indirect filehandle notation.
Like C, Perl does a certain amount of expression evaluation at compile time whenever it determines that all arguments to an operator are static and have no side effects. In particular, string concatenation happens at compile time between literals that don't do variable substitution. Backslash interpolation also happens at compile time. You can say
- 'Now is the time for all'
- . "\n"
- . 'good men to come to.'
and this all reduces to one string internally. Likewise, if you say
- foreach $file (@filenames) {
- if (-s $file > 5 + 100 * 2**16) { }
- }
the compiler precomputes the number which that expression represents so that the interpreter won't have to.
Perl doesn't officially have a no-op operator, but the bare constants 0 and 1 are special-cased not to produce a warning in void context, so you can for example safely do
- 1 while foo();
Bitstrings of any size may be manipulated by the bitwise operators (~ | & ^).
If the operands to a binary bitwise op are strings of different sizes, | and ^ ops act as though the shorter operand had additional zero bits on the right, while the & op acts as though the longer operand were truncated to the length of the shorter. The granularity for such extension or truncation is one or more bytes.
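A short sketch of the padding and truncation rules (the operand strings are arbitrary):

```perl
use strict;
use warnings;

my $or  = "ab" | "a";   # shorter operand padded with "\0": 'a'|'a', 'b'|"\0" => "ab"
my $and = "ab" & "a";   # longer operand truncated to one byte            => "a"
print "$or $and\n";     # ab a
```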
If you are intending to manipulate bitstrings, be certain that you're supplying bitstrings: if an operand is a number, that will imply a numeric bitwise operation. You may explicitly show which type of operation you intend by using "" or 0+, as in the examples below.
- $foo = 150 | 105; # yields 255 (0x96 | 0x69 is 0xFF)
- $foo = '150' | 105; # yields 255
- $foo = 150 | '105'; # yields 255
- $foo = '150' | '105'; # yields string '155' (under ASCII)
- $baz = 0+$foo & 0+$bar; # both ops explicitly numeric
- $biz = "$foo" ^ "$bar"; # both ops explicitly stringy
See vec for information on how to manipulate individual bits in a bit vector.
By default, Perl assumes that it must do most of its arithmetic in floating point. But by saying
- use integer;
you may tell the compiler to use integer operations (see integer for a detailed explanation) from here to the end of the enclosing BLOCK. An inner BLOCK may countermand this by saying
- no integer;
which lasts until the end of that BLOCK. Note that this doesn't mean everything is an integer, merely that Perl will use integer operations for arithmetic, comparison, and bitwise operators. For example, even under use integer, if you take the sqrt(2), you'll still get 1.4142135623731 or so.
Used on numbers, the bitwise operators ("&", "|", "^", "~", "<<", and ">>") always produce integral results. (But see also Bitwise String Operators.) However, use integer still has meaning for them. By default, their results are interpreted as unsigned integers, but if use integer is in effect, their results are interpreted as signed integers. For example, ~0 usually evaluates to a large integral value. However, use integer; ~0 is -1 on two's-complement machines.
While use integer provides integer-only arithmetic, there is no analogous mechanism to provide automatic rounding or truncation to a certain number of decimal places. For rounding to a certain number of digits, sprintf() or printf() is usually the easiest route. See perlfaq4.
Floating-point numbers are only approximations to what a mathematician would call real numbers. There are infinitely more reals than floats, so some corners must be cut. For example:
- printf "%.20g\n", 123456789123456789;
- # produces 123456789123456784
Testing for exact floating-point equality or inequality is not a good idea. Here's a (relatively expensive) work-around to compare whether two floating-point numbers are equal to a particular number of decimal places. See Knuth, volume II, for a more robust treatment of this topic.
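One such work-around, sketched below, compares the numbers after formatting both with sprintf to the same number of significant digits; the helper name fp_equal is illustrative, not from the original text:

```perl
# Compare two floats for equality after rounding both to $digits
# significant digits; exact == comparison would report them unequal.
sub fp_equal {
    my ($x, $y, $digits) = @_;
    return sprintf("%.${digits}g", $x) eq sprintf("%.${digits}g", $y);
}

printf "%s\n", fp_equal(0.1 + 0.2, 0.3, 10) ? "equal" : "not equal";
```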
The POSIX module (part of the standard Perl distribution) implements ceil(), floor(), and other mathematical and trigonometric functions. The Math::Complex module (also part of the standard Perl distribution) defines mathematical functions that work on both the reals and the imaginary numbers. Math::Complex is not as efficient as POSIX, but POSIX can't work with complex numbers.
Rounding in financial applications can have serious implications, and the rounding method used should be specified precisely. In these cases, it probably pays not to trust whichever system rounding is being used by Perl, but to instead implement the rounding function you need yourself.
The standard Math::BigInt, Math::BigRat, and Math::BigFloat modules, along with the bignum, bigint, and bigrat pragmas, provide variable-precision arithmetic and overloaded operators, although they're currently pretty slow. At the cost of some space and considerable speed, they avoid the normal pitfalls associated with limited-precision representations.
Or with rationals:
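For instance, a minimal sketch using the standard bigrat pragma (the output comment assumes bigrat's default fraction stringification):

```perl
use bigrat;             # numeric constants become Math::BigRat objects

$x = 3/7 + 5/7;         # exact rational arithmetic, no float rounding
print "x = $x\n";       # prints "x = 8/7"
```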
Several modules let you calculate with unlimited precision (bound only by memory and CPU time) or with a fixed precision. There are also some non-standard modules that provide faster implementations via external C libraries.
Here is a short, but incomplete summary:
- Math::String treat string sequences like numbers
- Math::FixedPrecision calculate with a fixed precision
- Math::Currency for currency calculations
- Bit::Vector manipulate bit vectors fast (uses C)
- Math::BigIntFast Bit::Vector wrapper for big numbers
- Math::Pari provides access to the Pari C library
- Math::Cephes uses the external Cephes C library (no big numbers)
- Math::Cephes::Fraction fractions via the Cephes library
- Math::GMP another one using an external C library
- Math::GMPz an alternative interface to libgmp's big ints
- Math::GMPq an interface to libgmp's fraction numbers
- Math::GMPf an interface to libgmp's floating point numbers
Choose wisely.
perlopenbsd - Perl version 5 on OpenBSD systems
This document describes various features of OpenBSD that will affect how Perl version 5 (hereafter just Perl) is compiled and/or runs.
When Perl is configured to use ithreads, it will use re-entrant library calls in preference to non-re-entrant versions. There is an incompatibility in OpenBSD's getprotobyname_r and getservbyname_r functions in versions 3.7 and later that will cause a SEGV when they are called without first doing a bzero on their return structs. Current Perls handle this problem correctly. Older threaded Perls (5.8.6 or earlier) will run into this problem. If you want to run a threaded Perl on OpenBSD 3.7 or higher, you will need to upgrade to at least Perl 5.8.7.
Steve Peters <steve@fisharerojo.org>
Please report any errors, updates, or suggestions to perlbug@perl.org.
perlopentut - tutorial on opening things in Perl
Perl has two simple, built-in ways to open files: the shell way for convenience, and the C way for precision. The shell way also has 2- and 3-argument forms, which have different semantics for handling the filename. The choice is yours.
Perl's open function was designed to mimic the way command-line
redirection in the shell works. Here are some basic examples
from the shell:
- $ myprogram file1 file2 file3
- $ myprogram < inputfile
- $ myprogram > outputfile
- $ myprogram >> outputfile
- $ myprogram | otherprogram
- $ otherprogram | myprogram
And here are some more advanced examples:
- $ otherprogram | myprogram f1 - f2
- $ otherprogram 2>&1 | myprogram -
- $ myprogram <&3
- $ myprogram >&4
Programmers accustomed to constructs like those above can take comfort in learning that Perl directly supports these familiar constructs using virtually the same syntax as the shell.
The open function takes two arguments: the first is a filehandle, and the second is a single string comprising both what to open and how to open it. open returns true when it works; when it fails, it returns a false value and sets the special variable $! to reflect the system error. If the filehandle was previously opened, it will be implicitly closed first.
For example:
If you prefer the low-punctuation version, you could write that this way:
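Both spellings can be sketched like this (the filename datafile is illustrative; the sketch creates it first so the opens succeed):

```perl
open(OUT, "> datafile") || die "can't create datafile: $!";  # sample file
print OUT "hello\n";
close(OUT);

# Punctuation-heavy form:
open(INFO, "< datafile")
    || die "can't open datafile: $!";
close(INFO);

# Low-punctuation form:
open INFO, "< datafile"
    or die "can't open datafile: $!";
$line = <INFO>;
close(INFO);
unlink "datafile";
print $line;
```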
A few things to notice. First, the leading < is optional. If omitted, Perl assumes that you want to open the file for reading. Note also that the first example uses the || logical operator, and the second uses or, which has lower precedence. Using || in the latter example would bind the die to the filename string rather than to the result of open, which is definitely not what you want.
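To make the precedence trap concrete, here is a sketch (the missing filename is made up for illustration); with ||, the die attaches to the filename string, which is always true, so the die can never fire even when the open fails:

```perl
$missing = "no_such_file_$$";    # hypothetical filename that doesn't exist
# This parses as: open(INFO, ("< $missing" || die ...)).
# The non-empty string is always true, so the die is unreachable,
# and the failed open goes completely unnoticed.
open INFO, "< $missing" || die "can't open $missing: $!";
print defined(fileno(INFO)) ? "opened\n" : "open failed, die never fired\n";
```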
The other important thing to notice is that, just as in the shell, any whitespace before or after the filename is ignored. This is good, because you wouldn't want these to do different things:
Ignoring surrounding whitespace also helps for when you read a filename in from a different file, and forget to trim it before opening:
This is not a bug, but a feature. Because open mimics the shell in
its style of using redirection arrows to specify how to open the file, it
also does so with respect to extra whitespace around the filename itself
as well. For accessing files with naughty names, see
Dispelling the Dweomer.
There is also a 3-argument version of open, which lets you put the
special redirection characters into their own argument:
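A sketch of the 3-argument form (the filename records.txt is illustrative; a sample file is created first so the sketch runs):

```perl
$datafile = "records.txt";                       # illustrative name
open(OUT, ">", $datafile)
    || die "can't create $datafile: $!";         # sample file for the sketch
print OUT "line 1\n";
close(OUT);

# 3-argument form: the mode and the filename are separate arguments,
# so nothing inside $datafile can change how the file is opened.
open(INFO, "<", $datafile)
    || die "can't open $datafile: $!";
$rec = <INFO>;
close(INFO);
unlink $datafile;
```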
In this case, the filename to open is the actual string in $datafile, so you don't have to worry about $datafile containing characters that might influence the open mode, or whitespace at the beginning of the filename that would be absorbed in the 2-argument version. Also, any reduction of unnecessary string interpolation is a good thing.
open's first argument can be a reference to a filehandle. As of
perl 5.6.0, if the argument is uninitialized, Perl will automatically
create a filehandle and put a reference to it in the first argument,
like so:
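A sketch of that autovivification (the filename sample.txt is illustrative):

```perl
open(OUT, ">", "sample.txt") || die "can't create sample.txt: $!";
print OUT "hi\n";
close(OUT);

# Passing an undefined lexical as the first argument: open autovivifies
# a fresh filehandle and stores a reference to it in $fh.
open(my $fh, "<", "sample.txt")
    || die "can't open sample.txt: $!";
$first = <$fh>;
close($fh);
unlink "sample.txt";
```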
Indirect filehandles make namespace management easier. Since filehandles are global to the current package, two subroutines trying to open INFILE will clash. With two functions opening indirect filehandles like my $infile, there's no clash and no need to worry about future conflicts.
Another convenient behavior is that an indirect filehandle automatically closes when there are no more references to it:
Indirect filehandles also make it easy to pass filehandles to and return filehandles from subroutines:
In C, when you want to open a file using the standard I/O library, you use the fopen function, but when opening a pipe, you use the popen function. But in the shell, you just use a different redirection character. That's also the case for Perl. The open call remains the same--just its argument differs.
If the leading character is a pipe symbol, open starts up a new
command and opens a write-only filehandle leading into that command.
This lets you write into that handle and have what you write show up on
that command's standard input. For example:
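A testable sketch follows; in the traditional version of this example the command would be a printer spooler such as lpr, but here the child is perl itself so the sketch is self-contained:

```perl
# Open a write-only filehandle into a child command. The child
# (perl -ne 'print uc') uppercases its input; the shell redirection
# captures the result so we can inspect it.
open(PIPE, "| $^X -ne 'print uc' > shout.txt")
    || die "can't start child: $!";
print PIPE "stuff\n";          # appears on the child's standard input
close(PIPE) || die "can't close pipe: $!";   # waits for the child
```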
If the trailing character is a pipe, you start up a new command and open a read-only filehandle leading out of that command. This lets whatever that command writes to its standard output show up on your handle for reading. For example:
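A testable sketch of the read side; the classic example would run something like "netstat -i -n |", but here the child is perl itself so the sketch is self-contained:

```perl
# Open a read-only filehandle out of a child command; whatever the
# child writes to its standard output shows up on NET.
open(NET, "$^X -e 'print qq(one\\ntwo\\n)' |")
    || die "can't run command: $!";
@lines = <NET>;
close(NET) || die "can't close pipe: $!";
print scalar(@lines), " lines read\n";
```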
What happens if you try to open a pipe to or from a non-existent command? If possible, Perl will detect the failure and set $! as usual. But if the command contains special shell characters, such as > or *, called 'metacharacters', Perl does not execute the command directly. Instead, Perl runs the shell, which then tries to run the command. This means that it's the shell that gets the error indication. In such a case, the open call will only indicate failure if Perl can't even run the shell. See How can I capture STDERR from an external command? in perlfaq8 to see how to cope with this. There's also an explanation in perlipc.
If you would like to open a bidirectional pipe, the IPC::Open2 library will handle this for you. Check out Bidirectional Communication with Another Process in perlipc.
perl-5.6.x introduced a version of piped open that executes a process
based on its command line arguments without relying on the shell. (Similar
to the system(@LIST) notation.) This is safer and faster than executing
a single argument pipe-command, but does not allow special shell
constructs. (It is also not supported on Microsoft Windows, Mac OS Classic
or RISC OS.)
Here's an example of open '-|', which prints a random Unix fortune cookie as uppercase:
And this open '|-' pipes into lpr:
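A sketch of both list-form pipe opens; for testability the children are perl one-liners standing in for fortune and lpr (not supported on Windows, per the text above):

```perl
# "-|": read the child's standard output. The list form never
# involves the shell, so no metacharacter worries.
open(CHILD, "-|", $^X, "-e", "print 'hello'")   # stand-in for "fortune"
    || die "can't run child: $!";
$text = uc join "", <CHILD>;
close(CHILD);

# "|-": write to the child's standard input.
open(SINK, "|-", $^X, "-e", "1 while <STDIN>")  # stand-in for "lpr"
    || die "can't run child: $!";
print SINK "$text\n";
close(SINK);
```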
Again following the lead of the standard shell utilities, Perl's
open function treats a file whose name is a single minus, "-", in a
special way. If you open minus for reading, it really means to access
the standard input. If you open minus for writing, it really means to
access the standard output.
If minus can be used as the default input or default output, what happens
if you open a pipe into or out of minus? What's the default command it
would run? The same script as you're currently running! This is actually
a stealth fork hidden inside an open call. See
Safe Pipe Opens in perlipc for details.
It is possible to specify both read and write access. All you do is add a "+" symbol in front of the redirection. But as in the shell, using a less-than on a file never creates a new file; it only opens an existing one. On the other hand, using a greater-than always clobbers (truncates to zero length) an existing file, or creates a brand-new one if there isn't an old one. Adding a "+" for read-write doesn't affect whether it only works on existing files or always clobbers existing ones.
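A sketch of the three read-write variants the next paragraph discusses (the path is illustrative, and a sample file is created so every open succeeds):

```perl
$path = "update.db";                          # illustrative name
open(OUT, "> $path") || die; close(OUT);      # sample file for the sketch

open(FH, "+< $path")  || die "open +<: $!";   # read/write; file must exist
close(FH);
open(FH, "+> $path")  || die "open +>: $!";   # clobber first, then read/write
close(FH);
open(FH, "+>> $path") || die "open +>>: $!";  # read anywhere, writes go to end
close(FH);
unlink $path;
```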
The first one won't create a new file, and the second one will always
clobber an old one. The third one will create a new file if necessary
and not clobber an old one, and it will allow you to read at any point
in the file, but all writes will always go to the end. In short,
the first case is substantially more common than the second and third
cases, which are almost always wrong. (If you know C, the plus in
Perl's open is historically derived from the one in C's fopen(3S),
which it ultimately calls.)
In fact, when it comes to updating a file, unless you're working on a binary file as in the WTMP case above, you probably don't want to use this approach for updating. Instead, Perl's -i flag comes to the rescue. The following command takes all the C, C++, or yacc source or header files and changes all their foo's to bar's, leaving the old version in the original filename with a ".orig" tacked on the end:
- $ perl -i.orig -pe 's/\bfoo\b/bar/g' *.[Cchy]
This is a short cut for some renaming games that are really the best way to update textfiles. See the second question in perlfaq5 for more details.
One of the most common uses for open is one you never even notice. When you process the ARGV filehandle using <ARGV>, Perl actually does an implicit open on each file in @ARGV. Thus a program called like this:
- $ myprogram file1 file2 file3
can have all its files opened and processed one at a time using a construct no more complex than:
- while (<>) {
- # do something with $_
- }
If @ARGV is empty when the loop first begins, Perl pretends you've opened up minus, that is, the standard input. In fact, $ARGV, the currently open file during <ARGV> processing, is even set to "-" in these circumstances.
You are welcome to pre-process your @ARGV before starting the loop to make sure it's to your liking. One reason to do this might be to remove command options beginning with a minus. While you can always roll the simple ones by hand, the Getopts modules are good for this:
- use Getopt::Std;
- # -v, -D, -o ARG, sets $opt_v, $opt_D, $opt_o
- getopts("vDo:");
- # -v, -D, -o ARG, sets $args{v}, $args{D}, $args{o}
- getopts("vDo:", \%args);
Or the standard Getopt::Long module to permit named arguments:
- use Getopt::Long;
- GetOptions( "verbose" => \$verbose, # --verbose
- "Debug" => \$debug, # --Debug
- "output=s" => \$output );
- # --output=somestring or --output somestring
Another reason for preprocessing arguments is to make an empty argument list default to all files:
- @ARGV = glob("*") unless @ARGV;
You could even filter out all but plain, text files. This is a bit silent, of course, and you might prefer to mention them on the way.
- @ARGV = grep { -f && -T } @ARGV;
If you're using the -n or -p command-line options, you should put changes to @ARGV in a BEGIN{} block.
Remember that a normal open has special properties, in that it might call fopen(3S) or it might call popen(3S), depending on what its argument looks like; that's why it's sometimes called "magic open".
Here's an example:
This sort of thing also comes into play in filter processing. Because <ARGV> processing employs the normal, shell-style Perl open, it respects all the special things we've already seen:
- $ myprogram f1 "cmd1|" - f2 "cmd2|" f3 < tmpfile
That program will read from the file f1, the process cmd1, standard input (tmpfile in this case), the f2 file, the cmd2 command, and finally the f3 file.
Yes, this also means that if you have files named "-" (and so on) in
your directory, they won't be processed as literal files by open.
You'll need to pass them as "./-", much as you would for the rm program,
or you could use sysopen as described below.
One of the more interesting applications is to change files of a certain name into pipes. For example, to autoprocess gzipped or compressed files by decompressing them with gzip:
- @ARGV = map { /\.(gz|Z)$/ ? "gzip -dc $_ |" : $_ } @ARGV;
Or, if you have the GET program installed from LWP, you can fetch URLs before processing them:
- @ARGV = map { m#^\w+://# ? "GET $_ |" : $_ } @ARGV;
It's not for nothing that this is called magic <ARGV>. Pretty nifty, eh?
If you want the convenience of the shell, then Perl's open is
definitely the way to go. On the other hand, if you want finer precision
than C's simplistic fopen(3S) provides you should look to Perl's
sysopen, which is a direct hook into the open(2) system call.
That does mean it's a bit more involved, but that's the price of
precision.
sysopen takes 3 (or 4) arguments.
- sysopen HANDLE, PATH, FLAGS, [MASK]
The HANDLE argument is a filehandle just as with open. The PATH is
a literal path, one that doesn't pay attention to any greater-thans or
less-thans or pipes or minuses, nor ignore whitespace. If it's there,
it's part of the path. The FLAGS argument contains one or more values
derived from the Fcntl module that have been or'd together using the
bitwise "|" operator. The final argument, the MASK, is optional; if
present, it is combined with the user's current umask for the creation
mode of the file. You should usually omit this.
Although the traditional values of read-only, write-only, and read-write are 0, 1, and 2 respectively, this is known not to hold true on some systems. Instead, it's best to load in the appropriate constants first from the Fcntl module, which supplies the following standard flags:
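A sketch of loading those constants; the flag list in the comments is a partial summary of Fcntl's standard exports, not an exhaustive one:

```perl
use Fcntl;    # imports the standard O_* flag constants

# Commonly used flags (see the Fcntl documentation for the full set):
#   O_RDONLY    read only
#   O_WRONLY    write only
#   O_RDWR      read and write
#   O_CREAT     create the file if it doesn't exist
#   O_EXCL      fail if the file already exists
#   O_APPEND    append to the file
#   O_TRUNC     truncate the file
#   O_NONBLOCK  non-blocking access
printf "O_RDONLY is %d on this system\n", O_RDONLY;
```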
Less common flags that are sometimes available on some operating systems include O_BINARY, O_TEXT, O_SHLOCK, O_EXLOCK, O_DEFER, O_SYNC, O_ASYNC, O_DSYNC, O_RSYNC, O_NOCTTY, O_NDELAY and O_LARGEFILE. Consult your open(2) manpage or its local equivalent for details. (Note: starting from Perl release 5.6 the O_LARGEFILE flag, if available, is automatically added to the sysopen() flags because large files are the default.)
Here's how to use sysopen to emulate the simple open calls we had
before. We'll omit the || die $! checks for clarity, but make sure
you always check the return values in real code. These aren't quite
the same, since open will trim leading and trailing whitespace,
but you'll get the idea.
To open a file for reading:
To open a file for writing, creating a new file if needed or else truncating an old file:
To open a file for appending, creating one if necessary:
To open a file for update, where the file must already exist:
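The four cases above can be sketched with sysopen like this (the path is illustrative; the flag combinations use the standard Fcntl constants, and the sketch sequences the opens so each one succeeds):

```perl
use Fcntl;
$path = "data.txt";    # illustrative name

sysopen(FH, $path, O_WRONLY | O_TRUNC | O_CREAT)   # write: create or truncate
    || die "write open: $!";
print FH "x\n"; close(FH);

sysopen(FH, $path, O_WRONLY | O_APPEND | O_CREAT)  # append: create if needed
    || die "append open: $!";
print FH "y\n"; close(FH);

sysopen(FH, $path, O_RDONLY)                       # read
    || die "read open: $!";
@got = <FH>; close(FH);

sysopen(FH, $path, O_RDWR)                         # update: must already exist
    || die "update open: $!";
close(FH);
unlink $path;
```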
And here are things you can do with sysopen that you cannot do with
a regular open. As you'll see, it's just a matter of controlling the
flags in the third argument.
To open a file for writing, creating a new file which must not previously exist:
- sysopen(FH, $path, O_WRONLY | O_EXCL | O_CREAT);
To open a file for appending, where that file must already exist:
- sysopen(FH, $path, O_WRONLY | O_APPEND);
To open a file for update, creating a new file if necessary:
- sysopen(FH, $path, O_RDWR | O_CREAT);
To open a file for update, where that file must not already exist:
- sysopen(FH, $path, O_RDWR | O_EXCL | O_CREAT);
To open a file without blocking, creating one if necessary:
- sysopen(FH, $path, O_WRONLY | O_NONBLOCK | O_CREAT);
If you omit the MASK argument to sysopen, Perl uses the octal value
0666. The normal MASK to use for executables and directories should
be 0777, and for anything else, 0666.
Why so permissive? Well, it isn't really. The MASK will be modified
by your process's current umask. A umask is a number representing
disabled permissions bits; that is, bits that will not be turned on
in the created file's permissions field.
For example, if your umask were 027, then the 020 part would
disable the group from writing, and the 007 part would disable others
from reading, writing, or executing. Under these conditions, passing
sysopen 0666 would create a file with mode 0640, since 0666 & ~027
is 0640.
You should seldom use the MASK argument to sysopen(). That takes
away the user's freedom to choose what permission new files will have.
Denying choice is almost always a bad thing. One exception would be for
cases where sensitive or private data is being stored, such as with mail
folders, cookie files, and internal temporary files.
Sometimes you already have a filehandle open, and want to make another handle that's a duplicate of the first one. In the shell, we place an ampersand in front of a file descriptor number when doing redirections. For example, 2>&1 makes descriptor 2 (that's STDERR in Perl) be redirected into descriptor 1 (which is usually Perl's STDOUT). The same is essentially true in Perl: a filename that begins with an ampersand is treated instead as a file descriptor if a number, or as a filehandle if a string.
That means that if a function is expecting a filename, but you don't want to give it a filename because you already have the file open, you can just pass the filehandle with a leading ampersand. It's best to use a fully qualified handle though, just in case the function happens to be in a different package:
- somefunction("&main::LOGFILE");
This way if somefunction() is planning on opening its argument, it can just use the already opened handle. This differs from passing a handle, because with a handle, you don't open the file. Here you have something you can pass to open.
If you have one of those tricky, newfangled I/O objects that the C++ folks are raving about, then this doesn't work because those aren't a proper filehandle in the native Perl sense. You'll have to use fileno() to pull out the proper descriptor number, assuming you can:
- use IO::Socket;
- $handle = IO::Socket::INET->new("www.perl.com:80");
- $fd = $handle->fileno;
- somefunction("&$fd"); # not an indirect function call
It can be easier (and certainly will be faster) just to use real filehandles though:
If the filehandle or descriptor number is preceded not just with a simple "&" but rather with a "&=" combination, then Perl will not create a completely new descriptor opened to the same place using the dup(2) system call. Instead, it will just make something of an alias to the existing one using the fdopen(3S) library call. This is slightly more parsimonious of system resources, although this is less of a concern these days. Here's an example of that:
If you're using magic <ARGV>, you could even pass in as a command line argument in @ARGV something like "<&=$MHCONTEXTFD", but we've never seen anyone actually do this.
Perl is more of a DWIMmer language than something like Java--where DWIM is an acronym for "do what I mean". But this principle sometimes leads to more hidden magic than one knows what to do with. In this way, Perl is also filled with dweomer, an obscure word meaning an enchantment. Sometimes, Perl's DWIMmer is just too much like dweomer for comfort.
If magic open is a bit too magical for you, you don't have to turn to sysopen. To open a file with arbitrary weird characters in it, it's necessary to protect any leading and trailing whitespace. Leading whitespace is protected by inserting a "./" in front of a filename that starts with whitespace. Trailing whitespace is protected by appending an ASCII NUL byte ("\0") at the end of the string. This assumes, of course, that your system considers dot the current working directory, slash the directory separator, and disallows ASCII NULs within a valid filename. Most systems follow these conventions, including all POSIX systems as well as proprietary Microsoft systems. The only vaguely popular system that doesn't work this way is the "Classic" Macintosh system, which uses a colon where the rest of us use a slash. Maybe sysopen isn't such a bad idea after all.
If you want to use <ARGV> processing in a totally boring and non-magical way, you could do this first:
- # "Sam sat on the ground and put his head in his hands.
- # 'I wish I had never come here, and I don't want to see
- # no more magic,' he said, and fell silent."
- for (@ARGV) {
- s#^([^./])#./$1#;
- $_ .= "\0";
- }
- while (<>) {
- # now process $_
- }
But be warned that users will not appreciate being unable to use "-" to mean standard input, per the standard convention.
You've probably noticed how Perl's warn and die functions can
produce messages like:
- Some warning at scriptname line 29, <FH> line 7.
That's because you opened a filehandle FH, and had read in seven records from it. But what was the name of the file, rather than the handle?
If you aren't running with strict refs, or if you've turned them off temporarily, then all you have to do is this:
Since you're using the pathname of the file as its handle, you'll get warnings more like
- Some warning at scriptname line 29, </etc/motd> line 7.
Remember how we said that Perl's open took two arguments? That was a
passive prevarication. You see, it can also take just one argument.
If and only if the variable is a global variable, not a lexical, you
can pass open just one argument, the filehandle, and it will
get the path from the global scalar variable of the same name.
Why is this here? Someone has to cater to the hysterical porpoises. It's something that's been in Perl since the very beginning, if not before.
One clever move with STDOUT is to explicitly close it when you're done with the program.
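A sketch of that idiom; deferring the close to an END block means it runs no matter how the program exits:

```perl
# Without this, a failed final flush (for example, a full disk after
# a command-line redirection) would go unreported in the exit status.
END { close(STDOUT) || die "can't close stdout: $!" }

print "regular output\n";
```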
If you don't do this, and your program fills up the disk partition due to a command line redirection, it won't report the error exit with a failure status.
You don't have to accept the STDIN and STDOUT you were given. You are welcome to reopen them if you'd like.
And then these can be accessed directly or passed on to subprocesses. This makes it look as though the program were initially invoked with those redirections from the command line.
It's probably more interesting to connect these to pipes. For example:
This makes it appear as though your program were called with its stdout already piped into your pager. You can also use this kind of thing in conjunction with an implicit fork to yourself. You might do this if you would rather handle the post processing in your own program, just in a different process:
This technique can be applied to repeatedly push as many filters on your output stream as you wish.
These topics aren't really arguments related to open or sysopen,
but they do affect what you do with your open files.
When is a file not a file? Well, you could say when it exists but isn't a plain file. We'll check whether it's a symbolic link first, just in case.
- if (-l $file || ! -f _) {
- print "$file is not a plain file\n";
- }
What other kinds of files are there than, well, files? Directories, symbolic links, named pipes, Unix-domain sockets, and block and character devices. Those are all files, too--just not plain files. This isn't the same issue as being a text file. Not all text files are plain files. Not all plain files are text files. That's why there are separate -f and -T file tests.
To open a directory, you should use the opendir function, then
process it with readdir, carefully restoring the directory
name if necessary:
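A sketch of the opendir/readdir loop (the directory name is illustrative; note the defined test, since a file named "0" would otherwise end the loop early):

```perl
$dirname = ".";    # illustrative directory
opendir(DIR, $dirname) || die "can't opendir $dirname: $!";
while (defined($file = readdir(DIR))) {
    # readdir returns bare names, so re-attach the directory
    push @entries, "$dirname/$file";
}
closedir(DIR);
print scalar(@entries), " entries\n";
```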
If you want to process directories recursively, it's better to use the File::Find module. For example, this prints out all files recursively and adds a slash to their names if the file is a directory.
This finds all bogus symbolic links beneath a particular directory:
As you see, with symbolic links, you can just pretend that it is
what it points to. Or, if you want to know what it points to, then
readlink is called for:
Named pipes are a different matter. You pretend they're regular files, but their opens will normally block until there is both a reader and a writer. You can read more about them in Named Pipes in perlipc. Unix-domain sockets are rather different beasts as well; they're described in Unix-Domain TCP Clients and Servers in perlipc.
When it comes to opening devices, it can be easy and it can be tricky. We'll assume that if you're opening up a block device, you know what you're doing. The character devices are more interesting. These are typically used for modems, mice, and some kinds of printers. This is described in How do I read and write the serial port? in perlfaq8. It's often enough to open them carefully:
With descriptors that you haven't opened using sysopen, such as
sockets, you can set them to be non-blocking using fcntl:
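A sketch of the fcntl sequence; for self-containedness the handle here is a plain file (where O_NONBLOCK is harmless) rather than a socket:

```perl
use Fcntl qw(F_GETFL F_SETFL O_NONBLOCK);

open(HANDLE, "+>", "nb.tmp") || die "open: $!";   # any open handle will do
$flags = fcntl(HANDLE, F_GETFL, 0)
    || die "can't get flags: $!";                 # fetch current status flags
fcntl(HANDLE, F_SETFL, $flags | O_NONBLOCK)
    || die "can't set flags: $!";                 # add non-blocking mode
```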
Rather than losing yourself in a morass of twisting, turning ioctls,
all dissimilar, if you're going to manipulate ttys, it's best to
make calls out to the stty(1) program if you have it, or else use the
portable POSIX interface. To figure this all out, you'll need to read the
termios(3) manpage, which describes the POSIX interface to tty devices,
and then POSIX, which describes Perl's interface to POSIX. There are
also some high-level modules on CPAN that can help you with these games.
Check out Term::ReadKey and Term::ReadLine.
What else can you open? To open a connection using sockets, you won't use one of Perl's two open functions. See Sockets: Client/Server Communication in perlipc for that. Here's an example. Once you have it, you can use FH as a bidirectional filehandle.
For opening up a URL, the LWP modules from CPAN are just what the doctor ordered. There's no filehandle interface, but it's still easy to get the contents of a document:
- use LWP::Simple;
- $doc = get('http://www.cpan.org/');
On certain legacy systems with what could charitably be called terminally convoluted (some would say broken) I/O models, a file isn't a file--at least, not with respect to the C standard I/O library. On these old systems whose libraries (but not kernels) distinguish between text and binary streams, to get files to behave properly you'll have to bend over backwards to avoid nasty problems. On such infelicitous systems, sockets and pipes are already opened in binary mode, and there is currently no way to turn that off. With files, you have more options.
Another option is to use the binmode function on the appropriate
handles before doing regular I/O on them:
Passing sysopen a non-standard flag option will also open the file in
binary mode on those systems that support it. This is the equivalent of
opening the file normally, then calling binmode on the handle.
Now you can use read and print on that handle without worrying
about the non-standard system I/O library breaking your data. It's not
a pretty picture, but then, legacy systems seldom are. CP/M will be
with us until the end of days, and after.
On systems with exotic I/O systems, it turns out that, astonishingly
enough, even unbuffered I/O using sysread and syswrite might do
sneaky data mutilation behind your back.
Depending on the vicissitudes of your runtime system, even these calls may need binmode or O_BINARY first. Systems known to be free of such difficulties include Unix, the Mac OS, Plan 9, and Inferno.
In a multitasking environment, you may need to be careful not to collide with other processes who want to do I/O on the same files as you are working on. You'll often need shared or exclusive locks on files for reading and writing respectively. You might just pretend that only exclusive locks exist.
Never use the existence of a file -e $file as a locking indication, because there is a race condition between the test for the existence of the file and its creation. It's possible for another process to create a file in the slice of time between your existence check and your attempt to create the file. Atomicity is critical.
Perl's most portable locking interface is via the flock function, whose simplicity is emulated on systems that don't directly support it, such as SysV or Windows. The underlying semantics may affect how it all works, so you should learn how flock is implemented on your system's port of Perl.
File locking does not lock out another process that would like to do I/O. A file lock only locks out others trying to get a lock, not processes trying to do I/O. Because locks are advisory, if one process uses locking and another doesn't, all bets are off.
By default, the flock call will block until a lock is granted.
A request for a shared lock will be granted as soon as there is no
exclusive locker. A request for an exclusive lock will be granted as
soon as there is no locker of any kind. Locks are on file descriptors,
not file names. You can't lock a file until you open it, and you can't
hold on to a lock once the file has been closed.
Here's how to get a blocking shared lock on a file, typically used for reading:
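A minimal sketch of the blocking shared lock (the scratch file created with File::Temp stands in for a real data file):

```perl
use strict;
use warnings;
use Fcntl qw(:flock);            # LOCK_SH, LOCK_EX, LOCK_NB, LOCK_UN
use File::Temp qw(tempfile);

# A scratch file to lock; a real program would open its own data file.
my ($tmp, $path) = tempfile();
print $tmp "some data\n";
close $tmp;

open(my $fh, "<", $path) or die "can't open $path: $!";
flock($fh, LOCK_SH)      or die "can't get shared lock on $path: $!";
my $line = <$fh>;        # safe to read; other readers may also hold LOCK_SH
close($fh);              # closing the handle releases the lock
```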
You can get a non-blocking lock by using LOCK_NB.
This can be useful for producing more user-friendly behaviour by warning if you're going to be blocking:
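A sketch of the non-blocking attempt with a friendly fallback (same pattern as the blocking case; the file name again comes from File::Temp):

```perl
use strict;
use warnings;
use Fcntl qw(:flock);
use File::Temp qw(tempfile);

my ($tmp, $path) = tempfile();
close $tmp;

open(my $fh, "<", $path) or die "can't open $path: $!";

# Try for the lock without blocking; if somebody holds it, say so and wait.
my $got = flock($fh, LOCK_SH | LOCK_NB);
unless ($got) {
    local $| = 1;                 # flush the progress message immediately
    print "Waiting for lock on $path...";
    $got = flock($fh, LOCK_SH);   # now block until the lock is granted
    print "got it.\n";
}
$got or die "can't lock $path: $!";
close($fh);
```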
To get an exclusive lock, typically used for writing, you have to be careful. We sysopen the file so it can be locked before it gets emptied. You can get a nonblocking version using LOCK_EX | LOCK_NB.
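A sketch: sysopen without O_TRUNC lets us take the lock first and empty the file only afterwards, so a reader never sees a half-emptied file (the tempdir path is illustrative):

```perl
use strict;
use warnings;
use Fcntl qw(:DEFAULT :flock);   # O_WRONLY, O_CREAT, LOCK_EX, ...
use File::Temp qw(tempdir);

my $dir  = tempdir(CLEANUP => 1);
my $path = "$dir/data";

# Open without truncating, lock, and only then empty the file.
sysopen(my $fh, $path, O_WRONLY | O_CREAT) or die "can't open $path: $!";
flock($fh, LOCK_EX)                        or die "can't lock $path: $!";
truncate($fh, 0)                           or die "can't truncate $path: $!";
print $fh "fresh contents\n"               or die "can't write $path: $!";
close($fh)                                 or die "can't close $path: $!";
```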
Finally, due to the uncounted millions who cannot be dissuaded from wasting cycles on useless vanity devices called hit counters, here's how to increment a number in a file safely:
- use Fcntl qw(:DEFAULT :flock);
- sysopen(FH, "numfile", O_RDWR | O_CREAT)
- or die "can't open numfile: $!";
- # autoflush FH
- $ofh = select(FH); $| = 1; select ($ofh);
- flock(FH, LOCK_EX)
- or die "can't write-lock numfile: $!";
- $num = <FH> || 0;
- seek(FH, 0, 0)
- or die "can't rewind numfile: $!";
- print FH $num+1, "\n"
- or die "can't write numfile: $!";
- truncate(FH, tell(FH))
- or die "can't truncate numfile: $!";
- close(FH)
- or die "can't close numfile: $!";
In Perl 5.8.0 a new I/O framework called "PerlIO" was introduced. This is new "plumbing" for all the I/O happening in Perl; for the most part everything will work just as it did, but PerlIO also brought in some new features, such as the ability to think of I/O as "layers". In addition to just moving the data, an I/O layer may also transform it. Such transformations may include compression and decompression, encryption and decryption, and transforming between various character encodings.
Full discussion about the features of PerlIO is out of scope for this tutorial, but here is how to recognize the layers being used:
The three-(or more)-argument form of open is being used and the second argument contains something in addition to the usual '<', '>', '>>', '|' and their variants, for example:
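For instance (a sketch; the file lives in a temporary directory, and the :encoding(utf8) layer is just one example of a layer name):

```perl
use strict;
use warnings;
use File::Temp qw(tempdir);

my $dir = tempdir(CLEANUP => 1);

# Write through an :encoding layer: characters are encoded on the way out.
open(my $out, ">:encoding(utf8)", "$dir/out.txt") or die "can't open: $!";
print $out "caf\x{e9}\n";        # U+00E9 becomes two bytes on disk
close($out);

# Read back through the same layer: bytes are decoded to characters again.
open(my $in, "<:encoding(utf8)", "$dir/out.txt") or die "can't open: $!";
my $line = <$in>;
close($in);
```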
The two-argument form of binmode is being used, for example
- binmode($fh, ":encoding(utf16)");
For more detailed discussion about PerlIO see PerlIO; for more detailed discussion about Unicode and I/O see perluniintro.
The open and sysopen functions in perlfunc(1);
the system open(2), dup(2), fopen(3), and fdopen(3) manpages;
the POSIX documentation.
Copyright 1998 Tom Christiansen.
This documentation is free; you can redistribute it and/or modify it under the same terms as Perl itself.
Irrespective of its distribution, all code examples in these files are hereby placed into the public domain. You are permitted and encouraged to use this code in your own programs for fun or for profit as you see fit. A simple comment in the code giving credit would be courteous but is not required.
First release: Sat Jan 9 08:09:11 MST 1999
perlos2 - Perl under OS/2, DOS, Win0.3*, Win0.95 and WinNT.
One can read this document in the following formats:
- man perlos2
- view perl perlos2
- explorer perlos2.html
- info perlos2
to list some (not all may be available simultaneously), or it may be read as is: either as README.os2, or pod/perlos2.pod.
To read the .INF version of the documentation (highly recommended) outside of OS/2, one needs an IBM reader (may be available on IBM ftp sites (?) (URL anyone?), or shipped with PC DOS 7.0 and IBM's Visual Age C++ 3.5).
A copy of a Win* viewer is contained in the "Just add OS/2 Warp" package
- ftp://ftp.software.ibm.com/ps/products/os2/tools/jaow/jaow.zip
in ?:\JUST_ADD\view.exe. This gives one access to EMX's .INF docs as well (a text form is available in /emx/doc in EMX's distribution). There is also a different viewer named xview.
Note that if you have lynx.exe or netscape.exe installed, you can follow WWW links
from this document in .INF format. If you have EMX docs installed
correctly, you can follow library links (you need to have view emxbook working by setting the EMXBOOK environment variable as described in the EMX docs).
The target is to make OS/2 one of the best supported platforms for using/building/developing Perl and Perl applications, as well as to make Perl the best language to use under OS/2. The secondary target is to try to make this work under DOS and Win* as well (but not too hard).
The current state is quite close to this target. Known limitations:
Some *nix programs use fork() a lot; with the most useful flavors of perl for OS/2 (there are several built simultaneously) this is supported; but some flavors do not support this (e.g., when Perl is called from inside REXX). Using fork() after using dynamically loaded extensions would not work with very old versions of EMX.
You need a separate perl executable perl__.exe (see perl__.exe) if you want to use PM code in your application (as Perl/Tk or OpenGL Perl modules do) without having a text-mode window present.
While using the standard perl.exe from a text-mode window is possible too, I have seen cases when this causes degradation of the system stability. Using perl__.exe avoids such a degradation.
There is no simple way to access WPS objects. The only way I know
is via OS2::REXX
and SOM
extensions (see OS2::REXX, SOM).
However, we do not have access to
convenience methods of Object-REXX. (Is it possible at all? I know
of no Object-REXX API.) The SOM extension (currently in alpha test) may eventually remove this shortcoming; however, due to the fact that DII is not supported by the SOM module, using SOM is not as convenient as one would like.
Please keep this list up-to-date by informing me about other items.
Since OS/2 port of perl uses a remarkable EMX environment, it can run (and build extensions, and - possibly - be built itself) under any environment which can run EMX. The current list is DOS, DOS-inside-OS/2, Win0.3*, Win0.95 and WinNT. Out of many perl flavors, only one works, see perl_.exe.
Note that not all features of Perl are available under these environments. This depends on the features the extender - most probably RSX - decided to implement.
Cf. Prerequisites.
EMX runtime is required (may be substituted by RSX). Note that it is possible to make perl_.exe run under DOS without any external support by binding emx.exe/rsx.exe to it, see emxbind. Note that under DOS for best results one should use the RSX runtime, which has many more functions working (like fork, popen and so on). In fact RSX is required if there is no VCPI present. Note that RSX requires DPMI. Many implementations of DPMI are known to be very buggy, beware!
Only the latest runtime is supported, currently 0.9d fix 03. Perl may run
under earlier versions of EMX, but this is not tested.
One can get different parts of EMX from, say
- ftp://crydee.sai.msu.ru/pub/comp/os/os2/leo/gnu/emx+gcc/
- http://hobbes.nmsu.edu/h-browse.php?dir=/pub/os2/dev/emx/v0.9d/
The runtime component should have the name emxrt.zip.
NOTE. When using emx.exe/rsx.exe, it is enough to have them on your path. One does not need to specify them explicitly (though this
- emx perl_.exe -de 0
will work as well.)
To run Perl on DPMI platforms one needs the RSX runtime. This is needed under DOS-inside-OS/2, Win0.3*, Win0.95 and WinNT (see Other OSes). RSX would not work with VCPI only, as EMX would; it requires DPMI.
Having RSX and the latest sh.exe one gets a fully functional
*nix-ish environment under DOS, say, fork, ``
and
pipe-open work. In fact, MakeMaker works (for static build), so one
can have Perl development environment under DOS.
One can get RSX from, say
- http://cd.textfiles.com/hobbesos29804/disk1/EMX09C/
- ftp://crydee.sai.msu.ru/pub/comp/os/os2/leo/gnu/emx+gcc/contrib/
Contact the author at rainer@mathematik.uni-bielefeld.de.
The latest sh.exe with DOS hooks is available in
- http://www.ilyaz.org/software/os2/
as sh_dos.zip, or under similar names starting with sh, pdksh, etc.
Perl does not care about file systems, but the perl library contains many files with long names, so to install it intact one needs a file system which supports long file names.
Note that if you do not plan to build perl itself, it may be possible to fool EMX into truncating file names. This is not supported; read the EMX docs to see how to do it.
To start external programs with complicated command lines (like with pipes in between, and/or quoting of arguments), Perl uses an external shell. With the EMX port such a shell should be named sh.exe, and located either in the wired-in-during-compile locations (usually F:/bin), or in a configurable location (see PERL_SH_DIR).
For best results use EMX pdksh. The standard binary (5.2.14 or later) runs under DOS (with RSX) as well, see
- http://www.ilyaz.org/software/os2/
Start your Perl program foo.pl with arguments arg1 arg2 arg3
the
same way as on any other platform, by
- perl foo.pl arg1 arg2 arg3
If you want to specify perl options -my_opts
to the perl itself (as
opposed to your program), use
- perl -my_opts foo.pl arg1 arg2 arg3
Alternately, if you use OS/2-ish shell, like CMD or 4os2, put the following at the start of your perl script:
- extproc perl -S -my_opts
rename your program to foo.cmd, and start it by typing
- foo arg1 arg2 arg3
Note that because of stupid OS/2 limitations the full path of the perl script is not available when you use extproc, thus you are forced to use the -S perl switch, and your script should be on the PATH. On the plus side, if you know the full path to your script, you may still start it with
- perl ../../blah/foo.cmd arg1 arg2 arg3
(note that the argument -my_opts
is taken care of by the extproc
line
in your script, see extproc on the first line).
To understand what the above magic does, read perl docs about -S
switch - see perlrun, and cmdref about extproc
:
- view perl perlrun
- man perlrun
- view cmdref extproc
- help extproc
or whatever method you prefer.
There are also endless possibilities to use executable extensions of 4os2, associations of WPS and so on... However, if you use *nixish shell (like sh.exe supplied in the binary distribution), you need to follow the syntax specified in Command Switches in perlrun.
Note that -S switch supports scripts with additional extensions .cmd, .btm, .bat, .pl as well.
This is what system() (see system), ``
(see
I/O Operators in perlop), and open pipe (see open)
are for. (Avoid exec() (see exec) unless you know what you are doing.)
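The three forms side by side (a sketch; echo is only a stand-in for a real external program):

```perl
use strict;
use warnings;

# system(): run the command, wait for it, get the exit status back.
my $status = system("echo", "hello");

# Backticks: run the command and capture its standard output.
my $out = `echo hello`;

# Pipe-open: read the command's output incrementally through a filehandle.
open(my $pipe, "echo hello |") or die "can't start echo: $!";
my $line = <$pipe>;
close($pipe);
```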
Note however that to use some of these operators you need to have a sh-syntax shell installed (see Pdksh, Frequently asked questions), and perl should be able to find it (see PERL_SH_DIR).
The cases when the shell is used are:
One-argument system() (see system), exec() (see exec) with redirection or shell meta-characters;
Pipe-open (see open) with the command which contains redirection or shell meta-characters;
Backticks ``
(see I/O Operators in perlop) with the command which contains
redirection or shell meta-characters;
If the executable called by system()/exec()/pipe-open()/``
is a script
with the "magic" #!
line or extproc
line which specifies shell;
If the executable called by system()/exec()/pipe-open()/``
is a script
without "magic" line, and $ENV{EXECSHELL}
is set to shell;
If the executable called by system()/exec()/pipe-open()/``
is not
found (is not this remark obsolete?);
For globbing (see glob, I/O Operators in perlop) (obsolete? Perl uses builtin globbing nowadays...).
For the sake of speed for a common case, in the above algorithms backslashes in the command name are not considered as shell metacharacters.
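The distinction between the two cases can be seen in the two forms of system() (a sketch; the commands assume a Unix-ish sh, such as the one supplied with the EMX port):

```perl
use strict;
use warnings;

# List form: no shell is involved, so the * reaches echo verbatim.
my $list_status = system("echo", "kept * verbatim");

# Single-string form with a metacharacter (redirection): the shell runs it.
my $string_status = system("echo via-shell >&2");

die "echo failed" if $list_status != 0 || $string_status != 0;
```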
Perl starts scripts which begin with cookies
extproc
or #!
directly, without an intervention of shell. Perl uses the
same algorithm to find the executable as pdksh: if the path
on #!
line does not work, and contains /, then the directory
part of the executable is ignored, and the executable
is searched in . and on PATH
. To find arguments for these scripts
Perl uses a different algorithm than pdksh: up to 3 arguments are
recognized, and trailing whitespace is stripped.
If a script
does not contain such a cookie, then to avoid calling sh.exe, Perl uses
the same algorithm as pdksh: if $ENV{EXECSHELL}
is set, the
script is given as the first argument to this command, if not set, then
$ENV{COMSPEC} /c
is used (or a hardwired guess if $ENV{COMSPEC}
is
not set).
When starting scripts directly, Perl uses exactly the same algorithm as for
the search of script given by -S command-line option: it will look in
the current directory, then on components of $ENV{PATH}
using the
following order of appended extensions: no extension, .cmd, .btm,
.bat, .pl.
Note that Perl will start to look for scripts only if OS/2 cannot start the
specified application, thus system 'blah'
will not look for a script if
there is an executable file blah.exe anywhere on PATH
. In
other words, PATH
is essentially searched twice: once by the OS for
an executable, then by Perl for scripts.
Note also that executable files on OS/2 can have an arbitrary extension,
but .exe will be automatically appended if no dot is present in the name.
The workaround is as simple as this: since blah. and blah denote the same file (at least on FAT and HPFS file systems), to start an executable residing in file n:/bin/blah (no extension) give the argument n:/bin/blah. (dot appended) to system().
Perl will start PM programs from VIO (=text-mode) Perl process in a
separate PM session;
the opposite is not true: when you start a non-PM program from a PM
Perl process, Perl would not run it in a separate session. If a separate
session is desired, either ensure
that shell will be used, as in system 'cmd /c myprog'
, or start it using
optional arguments to system() documented in OS2::Process
module. This
is considered to be a feature.
Perl binary distributions come with a testperl.cmd script which tries
to detect common problems with misconfigured installations. There is a
pretty large chance it will discover which step of the installation you
managed to goof. ;-)
Did you run your programs with the -w switch? See Starting OS/2 (and DOS) programs under Perl.
Do you try to run internal shell commands, like `copy a b`
(internal for cmd.exe), or `glob a*b`
(internal for ksh)? You
need to specify your shell explicitly, like `cmd /c copy a b`
,
since Perl cannot deduce which commands are internal to your shell.
Is your program EMX-compiled with -Zmt -Zcrtdll?
Well, nowadays Perl DLL should be usable from a differently compiled program too... If you can run Perl code from REXX scripts (see OS2::REXX), then there are some other aspects of interaction which are overlooked by the current hackish code to support differently-compiled principal programs.
If everything else fails, you need to build a stand-alone DLL for perl. Contact me, I did it once. Sockets would not work, as would a lot of other stuff.
Some time ago I had reports that it does not work. Nowadays it is checked in the Perl test suite, so grep the ./t subdirectory of the build tree (as well as the *.t files in the ./lib subdirectory) to find out how it should be done "correctly".
`` and pipe-open do not work under DOS.
This may be a variant of just I cannot run external programs, or a
deeper problem. Basically: you need RSX (see Prerequisites)
for these commands to work, and you may need a port of sh.exe which
understands command arguments. One of such ports is listed in
Prerequisites under RSX. Do not forget to set variable
PERL_SH_DIR as well.
DPMI is required for RSX.
find.exe "pattern" file
The whole idea of the "standard C API to start applications" is that
the forms foo
and "foo"
of program arguments are completely
interchangeable. find breaks this paradigm;
- find "pattern" file
- find pattern file
are not equivalent; find cannot be started directly using the above API. One needs a way to protect the doublequotes with some other quoting construction, necessarily having an extra non-Unixish shell in between.
Use one of
- system 'cmd', '/c', 'find "pattern" file';
- `cmd /c 'find "pattern" file'`
This would start find.exe via cmd.exe via sh.exe via perl.exe, but this is the price to pay if you want to use a non-conforming program.
The most convenient way of installing a binary distribution of perl is via perl installer install.exe. Just follow the instructions, and 99% of the installation blues would go away.
Note however, that you need to have unzip.exe on your path, and the EMX environment running. The latter means that if you just installed EMX, and made all the needed changes to Config.sys, you may need to reboot in between. Check the EMX runtime by running
- emxrev
Binary installer also creates a folder on your desktop with some useful objects. If you need to change some aspects of the work of the binary installer, feel free to edit the file Perl.pkg. This may be useful e.g., if you need to run the installer many times and do not want to make many interactive changes in the GUI.
Things not taken care of by automatic binary installation:
PERL_BADLANG
may be needed if you change your codepage after perl installation, and the new value is not supported by EMX. See PERL_BADLANG.
PERL_BADFREE
see PERL_BADFREE.
This file resides somewhere deep in the location where you installed your perl library; find it by
- perl -MConfig -le "print $INC{'Config.pm'}"
While most important values in this file are updated by the binary installer, some of them may need to be hand-edited. I know no such data, please keep me informed if you find one. Moreover, manual changes to the installed version may need to be accompanied by an edit of this file.
NOTE. Because of a typo the binary installer of 5.00305
would install a variable PERL_SHPATH
into Config.sys. Please
remove this variable and put PERL_SH_DIR instead.
As of version 5.00305, OS/2 perl binary distribution comes split into 11 components. Unfortunately, to enable configurable binary installation, the file paths in the zip files are not absolute, but relative to some directory.
Note that the extraction with the stored paths is still necessary
(default with unzip, specify -d
to pkunzip). However, you
need to know where to extract the files. You also need to manually change entries in Config.sys to reflect where you put the files. Note that if you have some primitive unzipper (like
pkunzip
), you may get a lot of warnings/errors during
unzipping. Upgrade to (w)unzip.
Below is a sample of what to do to reproduce the configuration on my
machine. In VIEW.EXE you can press Ctrl-Insert
now, and
cut-and-paste from the resulting file - created in the directory you
started VIEW.EXE from.
For each component, we mention environment variables related to each installation directory. Either choose directories to match your values of the variables, or create/append-to variables to take into account the directories.
- unzip perl_exc.zip *.exe *.ico -d f:/emx.add/bin
- unzip perl_exc.zip *.dll -d f:/emx.add/dll
(have the directories with *.exe on PATH, and *.dll on
LIBPATH);
- unzip perl_aou.zip -d f:/emx.add/bin
(have the directory on PATH);
- unzip perl_utl.zip -d f:/emx.add/bin
(have the directory on PATH);
- unzip perl_mlb.zip -d f:/perllib/lib
If this directory is exactly the same as the prefix which was compiled
into perl.exe, you do not need to change
anything. However, for perl to find the library if you use a different
path, you need to
set PERLLIB_PREFIX
in Config.sys, see PERLLIB_PREFIX.
- unzip perl_ste.zip -d f:/perllib/lib/site_perl/5.18.2/
Same remark as above applies. Additionally, if this directory is not
one of directories on @INC (and @INC is influenced by PERLLIB_PREFIX
), you
need to put this
directory and subdirectory ./os2 in PERLLIB
or PERL5LIB
variable. Do not use PERL5LIB
unless you have it set already. See
ENVIRONMENT in perl.
[Check whether this extraction directory is still applicable with the new directory structure layout!]
- unzip perl_blb.zip -d f:/perllib/lib
Same remark as for perl_ste.zip.
- unzip perl_man.zip -d f:/perllib/man
This directory should better be on MANPATH. You need to have a working man to access these files.
- unzip perl_mam.zip -d f:/perllib/man
This directory should better be on MANPATH. You need to have a working man to access these files.
- unzip perl_pod.zip -d f:/perllib/lib
This is used by the perldoc program (see perldoc), and may be used to generate HTML documentation usable by WWW browsers, and documentation in zillions of other formats: info, LaTeX, Acrobat, FrameMaker and so on. [Use programs such as pod2latex etc.]
- unzip perl_inf.zip -d d:/os2/book
This directory should better be on BOOKSHELF.
- unzip perl_sh.zip -d f:/bin
This is used by perl to run external commands which explicitly require shell, like the commands using redirection and shell metacharacters. It is also used instead of explicit /bin/sh.
Set PERL_SH_DIR
(see PERL_SH_DIR) if you move sh.exe from
the above location.
Note. It may be possible to use some other sh-compatible shell (untested).
After you installed the components you needed and updated Config.sys correspondingly, you need to hand-edit Config.pm. This file resides somewhere deep in the location where you installed your perl library; find it by
- perl -MConfig -le "print $INC{'Config.pm'}"
You need to correct all the entries which look like file paths (they
currently start with f:/).
The automatic and manual perl installation leave precompiled paths inside perl executables. While these paths are overwritable (see PERLLIB_PREFIX, PERL_SH_DIR), some people may prefer binary editing of paths inside the executables/DLLs.
Depending on how you built/installed perl you may have (otherwise identical) Perl documentation in the following formats:
Most probably the most convenient form. Under OS/2 view it as
- view perl
- view perl perlfunc
- view perl less
- view perl ExtUtils::MakeMaker
(currently the last two may hit a wrong location, but this may improve soon). Under Win* see SYNOPSIS.
If you want to build the docs yourself, and have OS/2 toolkit, run
- pod2ipf > perl.ipf
in /perllib/lib/pod directory, then
- ipfc /inf perl.ipf
(Expect a lot of errors during both steps.) Now move it onto your BOOKSHELF path.
If you have perl documentation in the source form, perl utilities installed, and GNU groff installed, you may use
- perldoc perlfunc
- perldoc less
- perldoc ExtUtils::MakeMaker
to access the perl documentation in the text form (note that you may get better results using perl manpages).
Alternately, try running pod2text on .pod files.
If you have man installed on your system, and you installed perl manpages, use something like this:
- man perlfunc
- man 3 less
- man ExtUtils.MakeMaker
to access documentation for different components of Perl. Start with
- man perl
Note that dot (.) is used as a package separator for documentation
for packages, and as usual, sometimes you need to give the section - 3
above - to avoid shadowing by the less(1) manpage.
Make sure that the directory above the directory with manpages is on your MANPATH, like this
- set MANPATH=c:/man;f:/perllib/man
for Perl manpages in f:/perllib/man/man1/ etc.
If you have some WWW browser available, installed the Perl documentation in the source form, and Perl utilities, you can build the HTML docs. Cd to the directory with the .pod files, and run
- cd f:/perllib/lib/pod
- pod2html
After this you can point your browser at the file perl.html in this directory, and go ahead with reading the docs, like this:
- explore file:///f:/perllib/lib/pod/perl.html
Alternatively you may be able to get these docs prebuilt from CPAN.
info files
Users of Emacs would appreciate it very much, especially with CPerl mode loaded. You need to get the latest pod2texi from CPAN, or, alternately, the prebuilt info pages.
PDF docs
for Acrobat are available on CPAN (may be for a slightly older version of perl).
LaTeX docs
can be constructed using pod2latex.
Here we discuss how to build Perl under OS/2.
Assume that you are a seasoned porter, so you are sure that all the necessary tools are already present on your system, and you know how to get the Perl source distribution. Untar it, change to the extract directory, and
- gnupatch -p0 < os2\diff.configure
- sh Configure -des -D prefix=f:/perllib
- make
- make test
- make install
- make aout_test
- make aout_install
This puts the executables in f:/perllib/bin. Manually move them to the PATH, manually move the built perl*.dll to LIBPATH (here for the Perl DLL, * is a not-very-meaningful hex checksum), and run
- make installcmd INSTALLCMDDIR=d:/ir/on/path
Assuming that the man-files were put in an appropriate location, this completes the installation of a minimal Perl system. (The binary distribution also contains a lot of additional modules, and the documentation in INF format.)
What follows is a detailed guide through these steps.
You need to have the latest EMX development environment, the full GNU tool suite (gawk renamed to awk, and GNU find.exe earlier on the path than the OS/2 find.exe, same with sort.exe; to check, use
- find --version
- sort --version
). You need the latest version of pdksh installed as sh.exe.
Check that you have BSD libraries and headers installed, and - optionally - Berkeley DB headers and libraries, and crypt.
Possible locations to get the files:
- ftp://ftp.uni-heidelberg.de/pub/os2/unix/
- http://hobbes.nmsu.edu/h-browse.php?dir=/pub/os2
- http://cd.textfiles.com/hobbesos29804/disk1/DEV32/
- http://cd.textfiles.com/hobbesos29804/disk1/EMX09C/
It is reported that the following archives contain enough utils to build perl: gnufutil.zip, gnusutil.zip, gnututil.zip, gnused.zip, gnupatch.zip, gnuawk.zip, gnumake.zip, gnugrep.zip, bsddev.zip and ksh527rt.zip (or a later version). Note that all these utilities are known to be available from LEO:
- ftp://crydee.sai.msu.ru/pub/comp/os/os2/leo/gnu/
Note also that the db.lib and db.a from the EMX distribution are not suitable for multi-threaded compile (even single-threaded flavor of Perl uses multi-threaded C RTL, for compatibility with XFree86-OS/2). Get a corrected one from
- http://www.ilyaz.org/software/os2/db_mt.zip
If you have exactly the same version of Perl installed already, make sure that no copies of perl are currently running. Later steps of the build may fail since an older version of perl.dll loaded into memory may be found. Running make test becomes meaningless, since the tests are checking a previous build of perl (this situation is detected and reported by the lib/os2_base.t test). Do not forget to unset PERL_EMXLOAD_SEC in the environment.
Also make sure that you have a /tmp directory on the current drive, and a . directory in your LIBPATH. One may try to correct the
. One may try to correct the
latter condition by
- set BEGINLIBPATH .\.
if you use something like CMD.EXE or latest versions of
4os2.exe. (Setting BEGINLIBPATH to just . is ignored by the
OS/2 kernel.)
Make sure your gcc is good for -Zomf linking: run the omflibs script in the /emx/lib directory.
Check that you have link386 installed. It comes standard with OS/2, but may not be installed due to customization. If typing
- link386
shows you do not have it, do Selective install, and choose Link
object modules
in Optional system utilities/More. If you get into
link386 prompts, press Ctrl-C
to exit.
You need to fetch the latest perl source (including developers releases). With some probability it is located in
- http://www.cpan.org/src/
- http://www.cpan.org/src/unsupported
If not, you may need to dig in the indices to find it in the directory of the current maintainer.
The quick cycle of developer releases may break the OS/2 build from time to time; looking into
- http://www.cpan.org/ports/os2/
may indicate the latest release which was publicly released by the maintainer. Note that the release may include some additional patches to apply to the current source of perl.
Extract it like this
- tar vzxf perl5.00409.tar.gz
You may see a message about errors while extracting Configure. This is because there is a conflict with a similarly-named file configure.
Change to the directory of extraction.
You need to apply the patches in ./os2/diff.* like this:
- gnupatch -p0 < os2\diff.configure
You may also need to apply the patches supplied with the binary
distribution of perl. It also makes sense to look on the
perl5-porters mailing list for the latest OS/2-related patches (see
http://www.xray.mpe.mpg.de/mailing-lists/perl5-porters/). Such
patches usually contain the strings /os2/ and patch, so it makes sense to look for these strings.
You may look into the file ./hints/os2.sh and correct anything wrong you find there. I do not expect it is needed anywhere.
- sh Configure -des -D prefix=f:/perllib
prefix means: where to install the resulting perl library. Giving the correct prefix you may avoid the need to specify PERLLIB_PREFIX, see PERLLIB_PREFIX.
Ignore the message about a missing ln, and about the -c option to tr. The latter is most probably already fixed; if you see it and can trace where the spurious warning comes from, please inform me.
Now
- make
At some moment the build may die, reporting a version mismatch or being unable to run perl. This means that you do not have . in your LIBPATH, so perl.exe cannot find the needed perl67B2.dll (treat these hex digits as line noise). After this is fixed, the build should finish without a lot of fuss.
Now run
- make test
All tests should succeed (with some of them skipped). If you have the
same version of Perl installed, it is crucial that you have . early
in your LIBPATH (or in BEGINLIBPATH), otherwise your tests will most
probably test the wrong version of Perl.
Some tests may generate extra messages similar to
bad free
in database tests related to Berkeley DB. This should be fixed already. If it persists, you may disable these warnings, see PERL_BADFREE.
This is a standard message issued by OS/2 applications. *nix applications die in silence. It is considered to be a feature. One can easily disable this by appropriate sighandlers.
However the test engine bleeds these messages to the screen at unexpected moments. Two messages of this kind should be present during testing.
To get finer test reports, call
- perl t/harness
The report with io/pipe.t failing may look like this:
- Failed Test Status Wstat Total Fail Failed List of failed
- ------------------------------------------------------------
- io/pipe.t 12 1 8.33% 9
- 7 tests skipped, plus 56 subtests skipped.
- Failed 1/195 test scripts, 99.49% okay. 1/6542 subtests failed, 99.98% okay.
The reasons for most important skipped tests are:
Checks atime and mtime of stat() - unfortunately, HPFS provides only 2sec time granularity (for compatibility with FAT?).
Checks truncate() on a filehandle just opened for write - I do not
know why this should or should not work.
Checks stat(). Tests:
Checks atime and mtime of stat() - unfortunately, HPFS provides only 2sec time granularity (for compatibility with FAT?).
If you haven't yet moved perl*.dll onto LIBPATH, do it now.
Run
- make install
This puts the generated files into the needed locations. Manually put perl.exe, perl__.exe and perl___.exe in a location on your PATH, and perl.dll in a location on your LIBPATH.
Run
- make installcmd INSTALLCMDDIR=d:/ir/on/path
to convert the perl utilities to .cmd files and put them on the PATH. You need to put the .EXE-utilities on the path manually. They are installed in $prefix/bin, where $prefix is what you gave to Configure; see Making.
If you use man, either move the installed */man/ directories to your MANPATH, or modify MANPATH to match the location. (One could have avoided this by providing a correct manpath option to ./Configure, or editing ./config.sh between configuring and making steps.)
a.out-style build
Proceed as above, but make perl_.exe (see perl_.exe) by
- make perl_
test and install by
- make aout_test
- make aout_install
Manually put perl_.exe to a location on your PATH.
Note. The build process for perl_ does not know about all the dependencies, so you should make sure that everything is up-to-date, say, by doing
- make perl_dll
first.
[This section provides a short overview only...]
Building should proceed differently depending on whether the version of perl you install is already present and used on your system, or is a new version not yet used. The description below assumes that the version is new, so installing its DLLs and .pm files will not disrupt the operation of your system even if some intermediate steps are not yet fully working.
The other cases require somewhat more convoluted procedures. Below I assume that the current version of Perl is 5.8.2, so the executables are named accordingly.
Fully build and test the Perl distribution. Make sure that no tests are failing with the test and aout_test targets; fix the bugs in Perl and the Perl test suite detected by these tests. Make sure that the all_test make target runs as cleanly as possible. Check that os2/perlrexx.cmd runs fine.
Fully install Perl, including the installcmd target. Copy the generated DLLs to LIBPATH; copy the numbered Perl executables (as in perl5.8.2.exe) to PATH; copy perl_.exe to PATH as perl_5.8.2.exe. Consider whether you need backward-compatibility DLLs. In most cases you do not need to install them yet; but sometimes this may simplify the following steps.
Make sure that CPAN.pm can download files from CPAN. If not, you may need to manually install Net::FTP.
Install the bundle Bundle::OS2_default
- perl5.8.2 -MCPAN -e "install Bundle::OS2_default" < nul |& tee 00cpan_i_1
This may take a couple of hours on a 1GHz processor (when run the first time), and it is not necessarily a smooth procedure. Some modules may not specify their required dependencies, so one may need to repeat this procedure several times until the results stabilize.
- perl5.8.2 -MCPAN -e "install Bundle::OS2_default" < nul |& tee 00cpan_i_2
- perl5.8.2 -MCPAN -e "install Bundle::OS2_default" < nul |& tee 00cpan_i_3
Even after they stabilize, some tests may fail.
Fix as many discovered bugs as possible. Document all the bugs which are not fixed, and all the failures with unknown reasons. Inspect the produced logs (00cpan_i_1 etc.) to find suspiciously skipped tests and other fishy events.
Keep in mind that installation of some modules may fail too: for example, the DLLs to update may already be loaded by CPAN.pm. Inspect the install logs (in the example above, 00cpan_i_1 etc.) for errors, and install things manually, as in
- cd $CPANHOME/.cpan/build/Digest-MD5-2.31
- make install
Some distributions may fail some tests, but you may want to install them anyway (as above, or via the force install command of the CPAN.pm shell mode).
Since this procedure may take quite a long time to complete, it makes sense to "freeze" your CPAN configuration by disabling periodic updates of the local copy of the CPAN index: set index_expire to some big value (I use 365), then save the settings
- CPAN> o conf index_expire 365
- CPAN> o conf commit
Reset it back to the default value 1 when you are finished.
When satisfied with the results, rerun the installcmd target. Now you can copy perl5.8.2.exe to perl.exe, and install the other OMF-build executables: perl__.exe etc. They are ready to be used.
Change to the ./pod directory of the build tree, download the Perl logo
CamelGrayBig.BMP, and run
- ( perl2ipf > perl.ipf ) |& tee 00ipf
- ipfc /INF perl.ipf |& tee 00inf
This produces the Perl docs online book perl.INF. Install it on the BOOKSHELF path.
Now is the time to build the statically linked executable perl_.exe which includes the modules newly installed via Bundle::OS2_default. Doing the testing via CPAN.pm is going to be painfully slow, since it statically links a new executable per XS extension.
Here is a possible workaround: create a toplevel Makefile.PL in $CPANHOME/.cpan/build/ with the following contents (compare with Making executables with a custom collection of statically loaded extensions)
- use ExtUtils::MakeMaker;
- WriteMakefile NAME => 'dummy';
execute this as
- perl_5.8.2.exe Makefile.PL <nul |& tee 00aout_c1
- make -k all test <nul |& tee 00aout_t1
Again, do not expect this procedure to be absolutely smooth. Some Makefile.PLs in subdirectories may be buggy, and would not run as "child" scripts. The interdependency of modules can bite you; however, since non-XS modules are already installed, the prerequisites of most modules have a very good chance to be present.
If you discover some glitches, move the directories of the problematic modules to a different location; if these are non-XS modules, you may just ignore them - they are already installed; the remaining XS modules you need to install manually, one by one.
After each such removal you need to rerun the Makefile.PL/make process; usually this procedure converges soon. (But be sure to convert all the necessary external C libraries from .lib format to .a format: run one of
- emxaout foo.lib
- emximp -o foo.a foo.lib
whichever is appropriate.) Also, make sure that the DLLs for external libraries are usable with executables compiled without -Zmtd options.
When you are sure that only a few subdirectories lead to failures, you may want to add the -j4 option to make to speed up skipping of subdirectories whose build is already finished.
When you are satisfied with the results of the tests, install the built C libraries for the extensions:
- make install |& tee 00aout_i
Now you can rename the file ./perl.exe generated during the last phase to perl_5.8.2.exe; place it on PATH. If there is an inter-dependency between some XS modules, you may need to repeat the test/install loop with this new executable and some excluded modules - until the procedure converges.
Now you have all the necessary .a libraries for these Perl modules in the places where the Perl builder can find them. Use the perl builder: change to an empty directory, create a "dummy" Makefile.PL again, and run
- perl_5.8.2.exe Makefile.PL |& tee 00c
- make perl |& tee 00p
This should create an executable ./perl.exe with all the statically loaded
extensions built in. Compare the generated perlmain.c files to make sure
that during the iterations the number of loaded extensions only increases.
Rename ./perl.exe to perl_5.8.2.exe on PATH.
When it converges, you have a functional variant of perl_5.8.2.exe; copy it to perl_.exe. You are done with the generation of the local Perl installation.
Make sure that the installed modules are actually installed in the location of the new Perl, and are not inherited from @INC entries given for inheritance from older versions of Perl: set PERLLIB_582_PREFIX to redirect the new version of Perl to a new location, and copy the installed files to this new location. Redo the tests to make sure that the versions of modules inherited from older versions of Perl are not needed.
Actually, the log output of pod2ipf(1) during step 6 gives very detailed information about which modules are loaded from which place, so you may use it as an additional verification tool.
Check that no temporary files made it into the perl install tree. Run something like
- pfind . -f "!(/\.(pm|pl|ix|al|h|a|lib|txt|pod|imp|bs|dll|ld|bs|inc|xbm|yml|cgi|uu|e2x|skip|packlist|eg|cfg|html|pub|enc|all|ini|po|pot)$/i or /^\w+$/") | less
in the install tree (both top one and sitelib one).
Compress all the DLLs with lxlite. The tiny .exes can be compressed with /c:max (the bug only appears when there is a fixup in the last 6 bytes of a page (?); since the tiny executables are much smaller than a page, the bug will not hit). Do not compress perl_.exe - it would not work under DOS.
Now you can generate the binary distribution. This is done by running the test of the CPAN distribution OS2::SoftInstaller. Tune up the file test.pl to suit the layout of the current version of Perl first. Do not forget to pack the necessary external DLLs accordingly. Include the description of the bugs and test suite failures you could not fix. Include the small-stack versions of the Perl executables from the Perl build directory. Include perl5.def so that people can relink the perl DLL preserving binary compatibility, or can create compatibility DLLs. Include the diff files (diff -pu old new) of the fixes you did so that people can rebuild your version. Include perl5.map so that one can use remote debugging.
Share what you did with other people. Relax. Enjoy the fruits of your work.
Brace yourself for thanks, bug reports, hate mail and spam coming as a result of the previous step. No good deed should remain unpunished!
The Perl executables can be easily rebuilt at any moment. Moreover, one can use the embedding interface (see perlembed) to make very customized executables.
It is a little easier to do so when decreasing the list of statically loaded extensions; we discuss only this case here.
Change to an empty directory, and create a placeholder Makefile.PL:
- use ExtUtils::MakeMaker;
- WriteMakefile NAME => 'dummy';
Run it with the flavor of Perl (perl.exe or perl_.exe) you want to rebuild.
- perl_ Makefile.PL
Ask it to create new Perl executable:
- make perl
(you may need to manually add PERLTYPE=-DPERL_CORE to this command line on some versions of Perl; the symptom is that command-line globbing does not work from OS/2 shells with the newly-compiled executable; check with
- .\perl.exe -wle "print for @ARGV" *
).
The previous step created perlmain.c which contains a list of newXS() calls near the end. Removing unnecessary calls, and rerunning
- make perl
will produce a customized executable.
The default perl executable is flexible enough to support most usages. However, one may want something yet more flexible; for example, one may want to find the Perl DLL relative to the location of the EXE file; or one may want to ignore the environment when setting the Perl-library search path, etc.
If you feel comfortable with the embedding interface (see perlembed), such things are easy to do by repeating the steps outlined in Making executables with a custom collection of statically loaded extensions, and doing more comprehensive edits to main() of perlmain.c. People with little desire to understand Perl can just rename main(), and make the necessary modifications in a custom main() which calls the renamed function at the appropriate time.
However, there is a third way: perl DLL exports the main() function and several callbacks to customize the search path. Below is a complete example of a "Perl loader" which
Looks for Perl DLL in the directory $exedir/../dll;
Prepends the above directory to BEGINLIBPATH;
Fails if the Perl DLL found via BEGINLIBPATH is different from what was loaded in step 1; e.g., another process could have loaded it from LIBPATH or from a different value of BEGINLIBPATH. In these cases one needs to modify the setting of the system so that this other process either does not run, or loads the DLL from BEGINLIBPATH with LIBPATHSTRICT=T (available with kernels after September 2000).
Loads Perl library from $exedir/../dll/lib/.
Uses Bourne shell from $exedir/../dll/sh/ksh.exe.
For best results compile the C file below with the same options as the Perl DLL. However, a lot of functionality will work even if the executable is not an EMX application, e.g., if compiled with
- gcc -Wall -DDOSISH -DOS2=1 -O2 -s -Zomf -Zsys perl-starter.c -DPERL_DLL_BASENAME=\"perl312F\" -Zstack 8192 -Zlinker /PM:VIO
Here is the sample C file:
- #define INCL_DOS
- #define INCL_NOPM
- /* These are needed for compile if os2.h includes os2tk.h, not os2emx.h */
- #define INCL_DOSPROCESS
- #include <os2.h>
- #include "EXTERN.h"
- #define PERL_IN_MINIPERLMAIN_C
- #include "perl.h"
- static char *me;
- HMODULE handle;
- static void
- die_with(char *msg1, char *msg2, char *msg3, char *msg4)
- {
- ULONG c;
- char *s = " error: ";
- DosWrite(2, me, strlen(me), &c);
- DosWrite(2, s, strlen(s), &c);
- DosWrite(2, msg1, strlen(msg1), &c);
- DosWrite(2, msg2, strlen(msg2), &c);
- DosWrite(2, msg3, strlen(msg3), &c);
- DosWrite(2, msg4, strlen(msg4), &c);
- DosWrite(2, "\r\n", 2, &c);
- exit(255);
- }
- typedef ULONG (*fill_extLibpath_t)(int type, char *pre, char *post, int replace, char *msg);
- typedef int (*main_t)(int argc, char *argv[], char *env[]);
- typedef int (*handler_t)(void* data, int which);
- #ifndef PERL_DLL_BASENAME
- # define PERL_DLL_BASENAME "perl"
- #endif
- static HMODULE
- load_perl_dll(char *basename)
- {
- char buf[300], fail[260];
- STRLEN l, dirl;
- fill_extLibpath_t f;
- ULONG rc_fullname;
- HMODULE handle, handle1;
- if (_execname(buf, sizeof(buf) - 13) != 0)
- die_with("Can't find full path: ", strerror(errno), "", "");
- /* XXXX Fill 'me' with new value */
- l = strlen(buf);
- while (l && buf[l-1] != '/' && buf[l-1] != '\\')
- l--;
- dirl = l - 1;
- strcpy(buf + l, basename);
- l += strlen(basename);
- strcpy(buf + l, ".dll");
- if ( (rc_fullname = DosLoadModule(fail, sizeof fail, buf, &handle)) != 0
- && DosLoadModule(fail, sizeof fail, basename, &handle) != 0 )
- die_with("Can't load DLL ", buf, "", "");
- if (rc_fullname)
- return handle; /* was loaded with short name; all is fine */
- if (DosQueryProcAddr(handle, 0, "fill_extLibpath", (PFN*)&f))
- die_with(buf, ": DLL exports no symbol ", "fill_extLibpath", "");
- buf[dirl] = 0;
- if (f(0 /*BEGINLIBPATH*/, buf /* prepend */, NULL /* append */,
- 0 /* keep old value */, me))
- die_with(me, ": prepending BEGINLIBPATH", "", "");
- if (DosLoadModule(fail, sizeof fail, basename, &handle1) != 0)
- die_with(me, ": finding perl DLL again via BEGINLIBPATH", "", "");
- buf[dirl] = '\\';
- if (handle1 != handle) {
- if (DosQueryModuleName(handle1, sizeof(fail), fail))
- strcpy(fail, "???");
- die_with(buf, ":\n\tperl DLL via BEGINLIBPATH is different: \n\t",
- fail,
- "\n\tYou may need to manipulate global BEGINLIBPATH and LIBPATHSTRICT"
- "\n\tso that the other copy is loaded via BEGINLIBPATH.");
- }
- return handle;
- }
- int
- main(int argc, char **argv, char **env)
- {
- main_t f;
- handler_t h;
- me = argv[0];
- /**/
- handle = load_perl_dll(PERL_DLL_BASENAME);
- if (DosQueryProcAddr(handle, 0, "Perl_OS2_handler_install", (PFN*)&h))
- die_with(PERL_DLL_BASENAME, ": DLL exports no symbol ", "Perl_OS2_handler_install", "");
- if ( !h((void *)"~installprefix", Perlos2_handler_perllib_from)
- || !h((void *)"~dll", Perlos2_handler_perllib_to)
- || !h((void *)"~dll/sh/ksh.exe", Perlos2_handler_perl_sh) )
- die_with(PERL_DLL_BASENAME, ": Can't install @INC manglers", "", "");
- if (DosQueryProcAddr(handle, 0, "dll_perlmain", (PFN*)&f))
- die_with(PERL_DLL_BASENAME, ": DLL exports no symbol ", "dll_perlmain", "");
- return f(argc, argv, env);
- }
/ became \ in pdksh
You have a very old pdksh. See Prerequisites.
'errno' - unresolved external
You do not have an MT-safe db.lib. See Prerequisites.
Reported with very old versions of tr and sed.
You have an older version of perl.dll on your LIBPATH, which broke the build of extensions.
You did not run omflibs. See Prerequisites.
You use an old version of GNU make. See Prerequisites.
This can result from a bug in emx sprintf which was fixed in 0.9d fix 03.
setpriority, getpriority
Note that these functions are compatible with *nix, not with the older ports of '94-'95. The priorities are absolute and go from 32 to -95; lower is quicker. 0 is the default priority.
WARNING. Calling getpriority on a non-existing process could lock the system before Warp3 fixpak22. Starting with Warp3, Perl uses a workaround: it aborts getpriority() if the process is not present. This is not possible on older (2.*) versions, and has a race condition anyway.
system()
The multi-argument form of system() allows an additional numeric argument. The meaning of this argument is described in OS2::Process.
When finding a program to run, Perl first asks the OS to look for executables on PATH (OS/2 adds the extension .exe if no extension is present). If not found, it looks for a script with possible extensions added in this order: no extension, .cmd, .btm, .bat, .pl. If found, Perl checks the start of the file for the magic strings "#!" and "extproc ". If found, Perl uses the rest of the first line as the beginning of the command line to run this script. The only mangling done to the first line is extraction of arguments (currently up to 3), and ignoring of the path-part of the "interpreter" name if it can't be found using the full path.
E.g., system 'foo', 'bar', 'baz' may lead Perl to finding C:/emx/bin/foo.cmd with the first line being
- extproc /bin/bash -x -c
If /bin/bash.exe is not found, then Perl looks for an executable bash.exe on PATH. If found in C:/emx.add/bin/bash.exe, then the above system() is translated to
- system qw(C:/emx.add/bin/bash.exe -x -c C:/emx/bin/foo.cmd bar baz)
One additional translation is performed: instead of /bin/sh Perl uses the hardwired-or-customized shell (see PERL_SH_DIR).
The above search for an "interpreter" is recursive: if the bash executable is not found, but bash.btm is found, Perl will investigate its first line, etc. The only hardwired limit on the recursion depth is implicit: there is a limit of 4 on the number of additional arguments inserted before the actual arguments given to system(). In particular, if no additional arguments are specified on the "magic" first lines, then the limit on the depth is 4.
If Perl finds that the found executable is of PM type when the current session is not, it will start the new process in a separate session of the necessary type. Call via OS2::Process to disable this magic.
WARNING. Due to the described logic, you need to explicitly specify .com extension if needed. Moreover, if the executable perl5.6.1 is requested, Perl will not look for perl5.6.1.exe. [This may change in the future.]
extproc on the first line
If the first chars of a Perl script are "extproc ", this line is treated as a #!-line, thus all the switches on this line are processed (twice if the script was started via cmd.exe). See DESCRIPTION in perlrun.
OS2::Process, OS2::DLL, OS2::REXX, OS2::PrfDB, OS2::ExtAttr. These modules provide access to the additional numeric argument for system and to information about the running process; to DLLs having functions with a REXX signature and to the REXX runtime; to OS/2 databases in the .INI format; and to Extended Attributes.
Two additional extensions by Andreas Kaiser, OS2::UPM and OS2::FTP, are included in the ILYAZ directory, mirrored on CPAN. Other OS/2-related extensions are available too.
File::Copy::syscopy
used by File::Copy::copy; see File::Copy.
DynaLoader::mod2fname
used by DynaLoader for DLL name mangling.
Cwd::current_drive()
Self explanatory.
Cwd::sys_chdir(name)
leaves drive as it is.
Cwd::change_drive(name)
changes the "current" drive.
Cwd::sys_is_absolute(name)
means has drive letter and is_rooted.
Cwd::sys_is_rooted(name)
means has leading [/\\] (maybe after a drive-letter:).
Cwd::sys_is_relative(name)
means changes with current dir.
Cwd::sys_cwd(name)
Interface to cwd from EMX. Used by Cwd::cwd.
Cwd::sys_abspath(name, dir)
Really, really odious function to implement. Returns the absolute name of the file which would have name if the CWD were dir. dir defaults to the current dir.
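A sketch of the intended semantics, assuming the description above (OS/2-only; the exact separator and drive handling in the result depend on the implementation):

```perl
# Sketch only (OS/2-specific); the file names here are illustrative.
use Cwd;
my $abs = Cwd::sys_abspath('foo.txt', 'd:/tmp');
# $abs is the absolute name foo.txt would have if the CWD were d:/tmp,
# i.e. something like 'd:/tmp/foo.txt'
my $abs2 = Cwd::sys_abspath('foo.txt');   # dir defaults to current dir
```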
Cwd::extLibpath([type])
Get the current value of the extended library search path. If type is present and positive, works with END_LIBPATH; if negative, works with LIBPATHSTRICT; otherwise with BEGIN_LIBPATH.
Cwd::extLibpath_set( path [, type ] )
Set the current value of the extended library search path. If type is present and positive, works with END_LIBPATH; if negative, works with LIBPATHSTRICT; otherwise with BEGIN_LIBPATH.
OS2::Error(do_harderror,do_exception)
Returns undef if it has not been called yet; otherwise bit 1 is set if on the previous call do_harderror was enabled, and bit 2 is set if on the previous call do_exception was enabled.
This function enables/disables error popups associated with hardware errors (Disk not ready etc.) and software exceptions.
I know of no way to find out the state of popups before the first call to this function.
OS2::Errors2Drive(drive)
Returns undef if it has not been called yet; otherwise returns false if errors were not requested to be written to a hard drive, or the drive letter if this was requested.
This function may redirect error popups associated with hardware errors (Disk not ready etc.) and software exceptions to the file POPUPLOG.OS2 in the root directory of the specified drive. It overrides the OS2::Error() settings made by individual programs. An undef argument disables the redirection.
Has global effect, persists after the application exits.
I know of no way to find out the state of redirection of popups to the disk before the first call to this function.
Returns a hash with system information. The keys of the hash are
- MAX_PATH_LENGTH, MAX_TEXT_SESSIONS, MAX_PM_SESSIONS,
- MAX_VDM_SESSIONS, BOOT_DRIVE, DYN_PRI_VARIATION,
- MAX_WAIT, MIN_SLICE, MAX_SLICE, PAGE_SIZE,
- VERSION_MAJOR, VERSION_MINOR, VERSION_REVISION,
- MS_COUNT, TIME_LOW, TIME_HIGH, TOTPHYSMEM, TOTRESMEM,
- TOTAVAILMEM, MAXPRMEM, MAXSHMEM, TIMER_INTERVAL,
- MAX_COMP_LENGTH, FOREGROUND_FS_SESSION,
- FOREGROUND_PROCESS
Returns a letter without colon.
OS2::MorphPM(serve), OS2::UnMorphPM(serve)
Transforms the current application into a PM application and back. The argument true means that a real message loop is going to be served. OS2::MorphPM() returns the PM message queue handle as an integer.
See Centralized management of resources for additional details.
OS2::Serve_Messages(force)
Fake on-demand retrieval of outstanding PM messages. If force is false, it will not dispatch messages if a real message loop is known to be present. Returns the number of messages retrieved. Dies with "QUITing..." if a WM_QUIT message is obtained.
OS2::Process_Messages(force [, cnt])
Retrieval of PM messages until a window creation/destruction. If force is false, it will not dispatch messages if a real message loop is known to be present. Returns the change in the number of windows. If cnt is given, it is incremented by the number of messages retrieved. Dies with "QUITing..." if a WM_QUIT message is obtained.
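A minimal sketch of how these calls fit together, under the assumption that the program only needs to drain outstanding PM messages rather than serve a real event loop (OS/2-only; not runnable elsewhere):

```perl
# Sketch only (OS/2-specific).  Morph into a PM process, drain any
# outstanding PM messages once, then morph back.  serve = 0 because we
# are not going to run a real message loop.
my $hmq = OS2::MorphPM(0);      # returns the PM message queue handle
my $n   = OS2::Serve_Messages(0);   # number of messages retrieved
OS2::UnMorphPM(0);
```

With serve = 1 the process commits to processing messages on an orderly basis; see the WM_QUIT caveats under Centralized management of resources.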
OS2::_control87(new,mask)
The same as _control87(3) of EMX. Takes integers as arguments, returns the previous coprocessor control word as an integer. Only bits in new which are present in mask are changed in the control word.
gets the coprocessor control word as an integer.
OS2::set_control87_em(new=MCW_EM,mask=MCW_EM)
The variant of OS2::_control87() with default values good for handling the exception mask: if no mask, uses only the exception-mask part of new. If no new, disables all the floating point exceptions. See Misfeatures for details.
OS2::DLLname([how [, \&xsub]])
Gives information about the Perl DLL or the DLL containing the C function bound to by &xsub. The meaning of how is: default (2): full name; 0: handle; 1: module name.
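For illustration, a sketch of the three forms described above (OS/2-only; return values depend on the installation):

```perl
# Sketch only (OS/2-specific).  With no &xsub argument the information
# refers to the Perl DLL itself.
my $full   = OS2::DLLname();     # default how = 2: full name of the DLL
my $handle = OS2::DLLname(0);    # how = 0: module handle
my $module = OS2::DLLname(1);    # how = 1: module name
```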
(Note that some of these may be moved to different libraries - eventually).
The numeric value is the same as _emx_rev of EMX, the string value the same as _emx_vprt (similar to 0.9c).
same as _emx_env of EMX, a number similar to 0x8001.
A number OS_MAJOR + 0.001 * OS_MINOR.
true if the Perl library was compiled in AOUT format.
True if the current executable is an AOUT EMX executable, so Perl can fork. Do not use this; use the portable check for $Config::Config{d_fork}.
This variable (default is 1) controls whether to enforce that the contents of $^E start with a SYS0003-like id. If set to 0, then the string value of $^E is what is available from the OS/2 message file. (Some messages in this file have a SYS0003-like id prepended, some do not.)
Since flock(3) is present in EMX but is not functional, it is emulated by perl. To disable the emulation, set the environment variable USE_PERL_FLOCK=0.
Here is the list of things which may be "broken" on EMX (from EMX docs):
The functions recvmsg(3), sendmsg(3), and socketpair(3) are not implemented.
sock_init(3) is not required and not implemented.
flock(3) is not yet implemented (dummy function). (Perl has a workaround.)
kill(3): Special treatment of PID=0, PID=1 and PID=-1 is not implemented.
waitpid(3):
- WUNTRACED
- Not implemented.
- waitpid() is not implemented for negative values of PID.
Note that kill -9 does not work with the current version of EMX.
Unix-domain sockets on OS/2 live in a pseudo-file-system /sockets/.... To avoid a failure to create a socket with a name of a different form, "/socket/" is prepended to the socket name (unless it starts with this already).
This may lead to problems later in case the socket is accessed via the "usual" file-system calls using the "initial" name.
Apparently, IBM used a compiler (for some period of time around '95?) which changes FP mask right and left. This is not that bad for IBM's programs, but the same compiler was used for DLLs which are used with general-purpose applications. When these DLLs are used, the state of floating-point flags in the application is not predictable.
What is much worse, some DLLs change the floating point flags when in _DLLInitTerm() (e.g., TCP32IP). This means that even if you do not call any function in the DLL, just the act of loading this DLL will reset your flags. What is worse, the same compiler was used to compile some HOOK DLLs. Given that HOOK DLLs are executed in the context of all the applications in the system, this means a complete unpredictability of floating point flags on systems using such HOOK DLLs. E.g., GAMESRVR.DLL of DIVE origin changes the floating point flags on each write to the TTY of a VIO (windowed text-mode) application.
Some other (not completely debugged) situations when FP flags change include some video drivers (?), and some operations related to creation of the windows. People who code OpenGL may have more experience on this.
Perl is generally used in situations where all the floating-point exceptions are ignored, as is the default under EMX. If they are not ignored, some benign Perl programs would get a SIGFPE and die a horrible death.
To circumvent this, Perl uses two hacks. They help against one type of damage only: FP flags changed when loading a DLL.
One of the hacks is to disable floating point exceptions on Perl startup (as is the default with EMX). This helps only with compile-time-linked DLLs changing the flags before main() had a chance to be called.
The other hack is to restore FP flags after a call to dlopen(). This helps against similar damage done by DLLs _DLLInitTerm() at runtime. Currently no way to switch these hacks off is provided.
Perl modifies some standard C library calls in the following ways:
popen
my_popen uses sh.exe if a shell is required; cf. PERL_SH_DIR.
tmpnam
The file name is created using the TMP or TEMP environment variable, via tempnam.
tmpfile
If the current directory is not writable, the file is created using a modified tmpnam, so there may be a race condition.
ctermid
A dummy implementation.
stat
os2_stat special-cases /dev/tty and /dev/con.
mkdir, rmdir
These EMX functions do not work if the path contains a trailing /. Perl contains a workaround for this.
flock
Since flock(3) is present in EMX but is not functional, it is emulated by perl. To disable the emulation, set the environment variable USE_PERL_FLOCK=0.
All the DLLs built with the current versions of Perl have ID strings identifying the name of the extension, its version, and the version of Perl required for this DLL. Run bldlevel DLL-name to find this info.
Since calling certain OS/2 APIs requires a correctly initialized Win subsystem, OS/2-specific extensions may need to get HABs and HMQs. If an extension were to do this on its own, another extension could fail to initialize.
Perl provides a centralized management of these resources:
HAB
To get the HAB, the extension should call hab = perl_hab_GET() in C. After this call is performed, hab may be accessed as Perl_hab. There is no need to release the HAB after it is used.
If for some reason perl.h cannot be included, use
- extern int Perl_hab_GET(void);
instead.
HMQ
There are two cases: the extension needs an HMQ only because some API will not work otherwise (use serve = 0 below); or the extension wants to engage in a PM event loop (use serve = 1 below).
To get an HMQ, the extension should call hmq = perl_hmq_GET(serve) in C. After this call is performed, hmq may be accessed as Perl_hmq.
To signal to Perl that the HMQ is not needed any more, call perl_hmq_UNSET(serve). The Perl process will automatically morph/unmorph itself into/from a PM process if the HMQ is needed/not-needed. Perl will automatically enable/disable the WM_QUIT message during shutdown if the message queue is served/not-served.
NOTE. If during a shutdown there is a message queue which did not disable WM_QUIT, and which did not process the received WM_QUIT message, the shutdown will be automatically cancelled. Do not call perl_hmq_GET(1) unless you are going to process messages on an orderly basis.
There are two principal conventions (it is useful to call them Dos* and Win*, though this part of the function signature is not always determined by the name of the API) for reporting the error conditions of OS/2 APIs. Most Dos* APIs report the error code as the result of the call (so 0 means success, and there are many types of errors). Most Win* APIs report success/failure via the result being TRUE/FALSE; to find the reason for the failure one should call the WinGetLastError() API.
Some Win* entry points also overload a "meaningful" return value with the error indicator; a 0 return value indicates an error. Yet other Win* entry points overload things even more, and a 0 return value may mean a successful call returning a valid value 0, as well as an error condition; in the case of a 0 return value one should call the WinGetLastError() API to distinguish a successful call from a failing one.
By convention, all the calls to OS/2 APIs should indicate their failure by setting $^E. All the Perl-accessible functions which call OS/2 APIs may be broken into two classes: some die() when an API error is encountered, the others report the error via a false return value (of course, this does not concern Perl-accessible functions which expect a failure of the OS/2 API call, having some workarounds coded).
Obviously, with the last type of OS/2 API signature, it is much more convenient for the users if the failure is indicated by die()ing: one does not need to check $^E to know that something went wrong. If, however, this solution is not desirable for some reason, the code in question should reset $^E to 0 before making this OS/2 API call, so that the caller of this Perl-accessible function has a chance to distinguish a success-but-0-return value from a failure. (One may return undef as an alternative way of reporting an error.)
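From the caller's side, the reset-$^E-then-check convention looks like this sketch. OS2::SomeCall() is a hypothetical Perl-accessible wrapper of an OS/2 API that reports failure via a false return value; it is named here only for illustration:

```perl
# Sketch of the convention described above (OS2::SomeCall is a
# hypothetical wrapper, not a real function).
$^E = 0;                    # so a successful-but-false return can be
my $rc = OS2::SomeCall();   # told apart from a genuine failure
if (!$rc and $^E) {
    die "OS/2 API call failed: $^E";
}
```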
The macros to simplify this type of error propagation are
CheckOSError(expr)
Returns true on error; sets $^E. Expects expr to be a call of a Dos*-style API.
CheckWinError(expr)
Returns true on error; sets $^E. Expects expr to be a call of a Win*-style API.
SaveWinError(expr)
Returns expr; sets $^E from WinGetLastError() if expr is false.
SaveCroakWinError(expr,die,name1,name2)
Returns expr; sets $^E from WinGetLastError() if expr is false, and die()s if die and $^E are true. The message to die with is the concatenated strings name1 and name2, separated by ": " from the contents of $^E.
WinError_2_Perl_rc
Sets Perl_rc to the return value of WinGetLastError().
FillWinError
Sets Perl_rc to the return value of WinGetLastError(), and sets $^E to the corresponding value.
FillOSError(rc)
Sets Perl_rc to rc, and sets $^E to the corresponding value.
Some DLLs are only present in some versions of OS/2, or in some configurations of OS/2. Some exported entry points are present only in DLLs shipped with some versions of OS/2. If these DLLs and entry points were linked directly into a Perl executable/DLL or a Perl extension, the binary would work only with the specified versions/setups. Even if these entry points were not needed, the load of the executable (or DLL) would fail.
For example, many newer useful APIs are not present in OS/2 v2; many PM-related APIs require DLLs not available on floppy-boot setup.
To make these calls fail only when they are executed, one should call these APIs via a dynamic linking API. There is a subsystem in Perl to simplify this type of call. A large number of entry points available for such linking is provided (see entries_ordinals - and also PMWIN_entries - in os2ish.h). These ordinals can be accessed via the APIs:
- CallORD(), DeclFuncByORD(), DeclVoidFuncByORD(),
- DeclOSFuncByORD(), DeclWinFuncByORD(), AssignFuncPByORD(),
- DeclWinFuncByORD_CACHE(), DeclWinFuncByORD_CACHE_survive(),
- DeclWinFuncByORD_CACHE_resetError_survive(),
- DeclWinFunc_CACHE(), DeclWinFunc_CACHE_resetError(),
- DeclWinFunc_CACHE_survive(), DeclWinFunc_CACHE_resetError_survive()
See the header files and the C code in the supplied OS/2-related modules for the details on usage of these functions.
Some of these functions also combine dynaloading semantic with the error-propagation semantic discussed above.
Because of idiosyncrasies of OS/2 one cannot have all the eggs in the same basket (though the EMX environment tries hard to overcome these limitations, so the situation may improve somewhat). There are 4 executables for Perl provided by the distribution:
The main workhorse, perl.exe. This is a chimera executable: it is compiled as an a.out-style executable, but is linked with the omf-style dynamic library perl.dll, and with a dynamic CRT DLL. This executable is a VIO application.
It can load Perl dynamic extensions, and it can fork().
Note. Keep in mind that fork() is needed to open a pipe to yourself.
perl_.exe is a statically linked a.out-style executable. It cannot load dynamic Perl extensions. The executable supplied in binary distributions has a lot of extensions prebuilt, so the above restriction matters only if you use custom-built extensions. This executable is a VIO application.
This is the only executable which does not require OS/2. The friends locked into the M$ world will appreciate the fact that this executable runs under DOS, Win0.3*, Win0.95 and WinNT with an appropriate extender. See Other OSes.
This is the same executable as perl___.exe, but it is a PM application.
Note. Usually (unless explicitly redirected during startup) STDIN, STDERR, and STDOUT of a PM application are redirected to nul. However, it is possible to see them if you start perl__.exe from a PM program which emulates a console window, such as the shell mode of Emacs or EPM. Thus it is possible to use the Perl debugger (see perldebug) to debug your PM application (but beware of message loop lockups - this will not work if you have a message queue to serve, unless you hook the serving into the getc() function of the debugger).
Another way to see the output of a PM program is to run it as
- pm_prog args 2>&1 | cat -
with a shell other than cmd.exe, so that it does not create a link between a VIO session and the session of pm_prog. (Such a link closes the VIO window.) E.g., this works with sh.exe - or with Perl!
The flavor perl__.exe is required if you want to start your program without a VIO window present, but not detached (run help detach for more info). It is very useful for extensions which use PM, like Perl/Tk or OpenGL.
Note also that the differences between PM and VIO executables lie only in the default behaviour. One can start any executable in any kind of session by using the /fs, /pm or /win switches of the start command (of CMD.EXE or a similar shell). Alternatively, one can use the numeric first argument of Perl's system function (see OS2::Process).
perl___.exe is an omf-style executable which is dynamically linked to perl.dll and the CRT DLL. I know of no advantages of this executable over perl.exe, but it cannot fork() at all. Well, one advantage is that the build process is not as convoluted as with perl.exe. It is a VIO application.
Since Perl processes the #!-line (cf. DESCRIPTION in perlrun, Command Switches in perlrun, No Perl script found in input in perldiag), it should know when a program is a Perl program. There is a naming convention which allows Perl to distinguish correct lines from wrong ones. The above names are almost the only names allowed by this convention which do not contain digits (which have completely different semantics).
Well, having several executables dynamically linked to the same huge library has its advantages, but this alone would not justify the additional work needed to make it compile. The real reason is the complicated-for-developers but very quick and convenient-for-users "hard" dynamic linking used by OS/2.
There are two distinctive features of the dyna-linking model of OS/2: first, all the references to external functions are resolved at compile time; second, there is no runtime fixup of the DLLs after they are loaded into memory. The first feature is an enormous advantage over other models: it avoids conflicts when several DLLs used by an application export entries with the same name. In such cases "other" models of dyna-linking just choose between these entry points using some random criterion - with predictable disasters as the result. But it is the second feature which requires the build of perl.dll.
The address tables of DLLs are patched only once, when they are loaded. The addresses of the entry points into DLLs are guaranteed to be the same for all the programs which use the same DLL, which removes the need for runtime fixup - once a DLL is loaded, its code is read-only.
While this allows some (significant?) performance advantages, it makes life much harder for developers, since the above scheme makes it impossible for a DLL to be "linked" to a symbol in the .EXE file. Indeed, this would require a DLL to have different relocation tables for the (different) executables which use it.
However, a dynamically loaded Perl extension is forced to use some symbols from the perl executable, e.g., to know how to find the arguments to its functions: the arguments live on the perl internal evaluation stack. The solution is to put the main code of the interpreter into a DLL, and to make the .EXE file just load this DLL into memory and supply the command-line arguments. The extension DLL cannot link to symbols in the .EXE, but it has no problem linking to symbols in the .DLL.
This greatly increases the load time for the application (as well as the complexity of the compilation). Since the interpreter is in a DLL, the C RTL is basically forced to reside in a DLL as well (otherwise extensions would not be able to use the CRT). There are some advantages if you use different flavors of perl, such as running perl.exe and perl__.exe simultaneously: they share the memory of perl.dll.
NOTE. There is one additional effect which makes DLLs more wasteful: DLLs are loaded in the shared memory region, which is a scarce resource given the 512M barrier of the "standard" OS/2 virtual memory. The code of .EXE files is also shared by all the processes which use the particular .EXE, but it is "shared in the private address space of the process"; this is possible because the addresses at which the different sections of the .EXE file are loaded are decided at compile time, so all the processes have these sections loaded at the same addresses, and no fixup of internal links inside the .EXE is needed.
Since DLLs may be loaded at run time, to have the same mechanism for DLLs one needs to have the address range of any of the loaded DLLs in the system to be available in all the processes which did not load a particular DLL yet. This is why the DLLs are mapped to the shared memory region.
The current EMX environment does not allow DLLs compiled using the Unixish a.out format to export symbols for data (or at least some types of data). This forces an omf-style compile of perl.dll.
The current EMX environment does not allow .EXE files compiled in omf format to fork(). fork() is needed for exactly three Perl operations:
explicit fork() in the script,
open FH, "|-",
open FH, "-|", in other words, opening pipes to itself.
While these operations are not questions of life and death, they are needed for a lot of useful scripts. This forces an a.out-style compile of perl.exe.
Here we list environment variables which are either OS/2-, DOS- and Win*-specific, or are more important under OS/2 than under other OSes.
PERLLIB_PREFIX
Specific for EMX port. Should have the form
- path1;path2
or
- path1 path2
If the beginning of some prebuilt path matches path1, it is substituted with path2.
Should be used if the perl library is moved from the default
location in preference to PERL(5)LIB, since this would not leave wrong
entries in @INC. For example, if the compiled version of perl looks for @INC
in f:/perllib/lib, and you want to install the library in
h:/opt/gnu, do
- set PERLLIB_PREFIX=f:/perllib/lib;h:/opt/gnu
This will cause Perl with the prebuilt @INC of
- f:/perllib/lib/5.00553/os2
- f:/perllib/lib/5.00553
- f:/perllib/lib/site_perl/5.00553/os2
- f:/perllib/lib/site_perl/5.00553
- .
to use the following @INC:
- h:/opt/gnu/5.00553/os2
- h:/opt/gnu/5.00553
- h:/opt/gnu/site_perl/5.00553/os2
- h:/opt/gnu/site_perl/5.00553
- .
PERL_BADLANG
If 0, perl ignores setlocale() failing. May be useful with some strange locales.
PERL_BADFREE
If 0, perl would not warn in the case of an unwarranted free(). With older perls this might be useful in conjunction with the module DB_File, which was buggy when dynamically linked and OMF-built.
Should not be set with newer Perls, since this may hide some real problems.
PERL_SH_DIR
Specific for EMX port. Gives the directory part of the location for sh.exe.
USE_PERL_FLOCK
Specific for EMX port. Since flock(3) is present in EMX but is not functional, it is emulated by perl. To disable the emulation, set the environment variable USE_PERL_FLOCK=0.
TMP or TEMP
Specific for EMX port. Used as the storage place for temporary files.
Here we list major changes which could take you by surprise.
Starting from version 5.8, Perl uses a builtin translation layer for text-mode files. This replaces the efficient, well-tested EMX layer with some code which is best characterized as a "quick hack".
In addition to possible bugs and an inability to follow changes to the translation policy with off/on switches of TERMIO translation, this introduces a serious incompatible change: before, sysread() on text-mode filehandles would go through the translation layer; now it does not.
setpriority and getpriority are not compatible with earlier ports by Andreas Kaiser. See "setpriority, getpriority".
With release 5.003_01 the dynamically loadable libraries should be rebuilt when a different version of Perl is compiled. In particular, DLLs (including perl.dll) are now created with names which contain a checksum, thus allowing a workaround for the OS/2 scheme of caching DLLs.
It may be possible to code a simple workaround which would:
find the old DLLs by looking through the old @INC;
mangle the names according to the scheme of the new perl and copy the DLLs to these names;
edit the internal LX tables of the DLLs to reflect the change of name (probably not needed for Perl extension DLLs, since the internally coded names are not used for "specific" DLLs; they are used only for "global" DLLs);
edit the internal IMPORT tables and change the name of the "old" perl????.dll to the "new" perl????.dll.
In fact, the mangling of extension DLLs was done due to a misunderstanding of the OS/2 dynaloading model. OS/2 (effectively) maintains two different tables of loaded DLLs:
those loaded by the base name from LIBPATH, including those associated at link time;
those loaded by the full name.
When resolving a request for a global DLL, the table of already-loaded specific DLLs is (effectively) ignored; moreover, specific DLLs are always loaded from the prescribed path.
There is/was a minor twist which makes this scheme fragile: what to do with DLLs loaded from
BEGINLIBPATH and ENDLIBPATH (which depend on the process), or from
. from LIBPATH, which effectively depends on the process (although LIBPATH is the same for all the processes).
Unless LIBPATHSTRICT is set to T (and the kernel is from after 2000/09/01), such DLLs are considered to be global. When loading a global DLL, it is first looked up in the table of already-loaded global DLLs. Because of this, the fact that one executable loaded a DLL from BEGINLIBPATH and ENDLIBPATH, or from . from LIBPATH, may affect which DLL is loaded when another executable requests a DLL with the same name. This is the reason for the version-specific mangling of the name of the perl DLL.
Since the Perl extension DLLs are always loaded with the full path, there is no need to mangle their names in a version-specific way: their directory already reflects the corresponding version of perl, and @INC takes into account binary compatibility with older versions.
Starting from 5.6.2 the name mangling scheme is fixed to be the same as for Perl 5.005_53 (the same as in a popular binary release). Thus new Perls will be able to resolve the names of old extension DLLs if @INC allows finding their directories.
However, this still does not guarantee that these DLLs may be loaded. The reason is the mangling of the name of the Perl DLL. Since the extension DLLs link with the Perl DLL, extension DLLs for older versions would load an older Perl DLL, and would most probably segfault (since the data in this DLL is not properly initialized).
There is a partial workaround (which can be made complete with newer OS/2 kernels): create a forwarder DLL with the same name as the DLL of the older version of Perl, which forwards the entry points to the newer Perl's DLL. Make this DLL accessible on (say) the BEGINLIBPATH of the new Perl executable. When the new executable accesses the old Perl's extension DLLs, they will request the old Perl's DLL by name, get the forwarder instead, and so will effectively link with the currently running (new) Perl DLL.
This may break in two ways:
An old perl executable is started while a new executable which has loaded an extension compiled for the old executable is running (ouch!). In this case the old executable will get the forwarder DLL instead of the old perl DLL, so it will link with the new perl DLL. While not directly fatal, it will behave the same as the new executable. This defeats the whole purpose of explicitly starting an old executable.
A new executable loads an extension compiled for the old executable when an old perl executable is running. In this case the extension will not pick up the forwarder - with fatal results.
With support for LIBPATHSTRICT this may be circumvented - unless one of the DLLs is started from . from LIBPATH (I do not know whether LIBPATHSTRICT affects this case).
REMARK. Unless newer kernels allow . in BEGINLIBPATH (older ones do not), this mess cannot be completely cleaned up. (It turns out that as of the beginning of 2002, . is not allowed, but .\. is - and it has the same effect.)
REMARK. LIBPATHSTRICT, BEGINLIBPATH and ENDLIBPATH are not environment variables, although cmd.exe emulates them on SET ... lines. From Perl they may be accessed by Cwd::extLibpath and Cwd::extLibpath_set.
Assume that the old DLL is named perlE0AC.dll (as is one for 5.005_53), and the new version is 5.6.1. Create a file perl5shim.def-leader with
- LIBRARY 'perlE0AC' INITINSTANCE TERMINSTANCE
- DESCRIPTION '@#perl5-porters@perl.org:5.006001#@ Perl module for 5.00553 -> Perl 5.6.1 forwarder'
- CODE LOADONCALL
- DATA LOADONCALL NONSHARED MULTIPLE
- EXPORTS
modifying the versions/names as needed. Run
- perl -wnle "next if 0../EXPORTS/; print qq( \"$1\") if /\"(\w+)\"/" perl5.def >lst
in the Perl build directory (to make the DLL smaller replace perl5.def with the definition file for the older version of Perl if present).
- cat perl5shim.def-leader lst >perl5shim.def
- gcc -Zomf -Zdll -o perlE0AC.dll perl5shim.def -s -llibperl
(ignore the multiple L4085 warnings).
As of release 5.003_01 perl is linked to the multithreaded C RTL DLL. If perl itself is not compiled multithread-enabled, neither will be perl's malloc(). However, extensions may use multiple threads at their own risk.
This was needed to compile Perl/Tk for XFree86-OS/2 out of the box, and to link with DLLs for other useful libraries, which typically are compiled with -Zmt -Zcrtdll.
Due to popular demand, the way perl calls external programs has been changed with respect to Andreas Kaiser's port. If perl needs to call an external program via a shell, f:/bin/sh.exe will be called, or whatever the override is; see PERL_SH_DIR.
This means that you need to get some copy of sh.exe as well (I use the one from pdksh). The path F:/bin above is set up automatically during the build to a correct value on the builder machine, but is overridable at runtime.
Reasons: the consensus on perl5-porters was that perl should use one non-overridable shell per platform. The obvious choices for OS/2 are cmd.exe and sh.exe. Having perl build itself would be impossible with cmd.exe as the shell, thus I picked sh.exe. This assures almost 100% compatibility with scripts coming from *nix. As an added benefit this also works under DOS if you use a DOS-enabled port of pdksh (see Prerequisites).
Disadvantages: currently sh.exe of pdksh calls external programs via fork()/exec(), and there is no functioning exec() on OS/2. exec() is emulated by EMX by an asynchronous call while the caller waits for child completion (to pretend that the pid did not change). This means that 1 extra copy of sh.exe is made active via fork()/exec(), which may lead to some resources being taken from the system (even if we do not count the extra work needed for fork()ing).
Note that this is a lesser issue now that we do not spawn sh.exe unless needed (metachars found).
One can always start cmd.exe explicitly via
- system 'cmd', '/c', 'mycmd', 'arg1', 'arg2', ...
If you need to use cmd.exe, and do not want to hand-edit thousands of your scripts, the long-term solution proposed on p5-p is to have a directive
- use OS2::Cmd;
which will override system(), exec(), ``, and open(,'...|'). With the current perl you may override only system(), readpipe() - the explicit version of `` - and maybe exec(). The code will substitute the one-argument call to system() with CORE::system('cmd.exe', '/c', shift).
If you have some working code for OS2::Cmd, please send it to me; I will include it in the distribution. I have no need for such a module, so I cannot test it.
For the details of the current situation with calling external programs, see "Starting OS/2 (and DOS) programs under Perl". Let us mention a couple of features:
External scripts may be called by their basename. Perl will try the same extensions as when processing -S command-line switch.
External scripts starting with #! or extproc will be executed directly, without calling the shell, by calling the program specified on the rest of the first line.
Perl uses its own malloc() under OS/2 - interpreters are usually malloc-bound for speed, but perl is not, since its malloc is lightning-fast. Perl-memory-usage-tuned benchmarks show that Perl's malloc is 5 times quicker than the EMX one. I do not have convincing data about memory footprint, but a (pretty random) benchmark showed that Perl's is 5% better.
The combination of perl's malloc() and rigid DLL name resolution creates a special problem with library functions which expect their return value to be free()d by the system's free(). To facilitate extensions which need to call such functions, the system memory-allocation functions are still available with the prefix emx_ added. (Currently only the DLL perl has this; it should propagate to perl_.exe shortly.)
One can build perl with thread support enabled by providing the -D usethreads option to Configure. Currently the OS/2 support of threads is very preliminary.
Most notable problems:
COND_WAIT may have a race condition (but probably does not, due to the edge-triggered nature of OS/2 Event semaphores). (Needs a reimplementation (in terms of chaining waiting threads, with the linked list stored in a per-thread structure?)?)
os2.c has a couple of static variables used in OS/2-specific functions. (These need to be moved to a per-thread structure, or serialized?)
Note that these problems should not discourage experimenting, since they have a low probability of affecting small programs.
This description is not updated often (since 5.6.1?), see ./os2/Changes for more info.
Ilya Zakharevich, cpan@ilyaz.org
perl(1).
perlos390 - building and installing Perl for OS/390 and z/OS
This document will help you Configure, build, test and install Perl on OS/390 (aka z/OS) Unix System Services.
This is a fully ported Perl for OS/390 Version 2 Release 3, 5, 6, 7, 8, and 9. It may work on other versions or releases, but those are the ones we've tested it on.
You may need to carry out some system configuration tasks before running the Configure script for Perl.
The z/OS Unix Tools and Toys list may prove helpful and contains links to ports of much of the software helpful for building Perl. http://www.ibm.com/servers/eserver/zseries/zos/unix/bpxa1toy.html
If using ftp remember to transfer the distribution in binary format.
Gunzip/gzip for OS/390 is discussed at:
- http://www.ibm.com/servers/eserver/zseries/zos/unix/bpxa1ty1.html
To extract an ASCII tar archive on OS/390, try this:
- pax -o to=IBM-1047,from=ISO8859-1 -r < latest.tar
or
- zcat latest.tar.Z | pax -o to=IBM-1047,from=ISO8859-1 -r
If you get lots of errors of the form
- tar: FSUM7171 ...: cannot set uid/gid: EDC5139I Operation not permitted.
then you didn't read the above and tried to use tar instead of pax. You'll first have to remove the (now corrupt) perl directory
- rm -rf perl-...
and then use pax.
Be sure that your yacc installation is in place including any necessary parser template files. If you have not already done so then be sure to:
- cp /samples/yyparse.c /etc
This may also be a good time to ensure that your /etc/protocol file and either your /etc/resolv.conf or /etc/hosts files are in place. The IBM document that described such USS system setup issues was SC28-1890-07 "OS/390 UNIX System Services Planning", in particular Chapter 6 on customizing the OE shell.
GNU make for OS/390, which is recommended for the build of perl (as well as for building CPAN modules and extensions), is available from the Tools and Toys list mentioned above.
Some people have reported encountering "Out of memory!" errors while trying to build Perl using GNU make binaries. If you encounter such trouble then try to download the source code kit and build GNU make from source to eliminate any such trouble. You might also find GNU make (as well as Perl and Apache) in the red-piece/book "Open Source Software for OS/390 UNIX", SG24-5944-00 from IBM.
If instead of the recommended GNU make you would like to use the system supplied make program then be sure to install the default rules file properly via the shell command:
- cp /samples/startup.mk /etc
and be sure to also set the environment variable _C89_CCMODE=1 (exporting _C89_CCMODE=1 is also a good idea for users of GNU make).
You might also want to have GNU groff for OS/390 installed before running the "make install" step for Perl.
There is a syntax error in the /usr/include/sys/socket.h header file that IBM supplies with USS V2R7, V2R8, and possibly V2R9. The problem with the header file is that near the definition of the SO_REUSEPORT constant there is a spurious extra '/' character outside of a comment like so:
- #define SO_REUSEPORT 0x0200 /* allow local address & port
- reuse */ /
You could edit that header yourself to remove that last '/', or you might note that Language Environment (LE) APAR PQ39997 describes the problem and that PTFs UQ46272 and UQ46271 are the fixes (for R8 at least); apply them. If left unattended, that syntax error will turn up as an inability for Perl to build its "Socket" extension.
For successful testing you may need to turn on the sticky bit for your world readable /tmp directory if you have not already done so (see man chmod).
Once you've unpacked the distribution, run "sh Configure" (see INSTALL for a full discussion of the Configure options). There is a "hints" file for os390 that specifies the correct values for most things. Some things to watch out for include:
A message of the form:
- (I see you are using the Korn shell. Some ksh's blow up on Configure,
- mainly on older exotic systems. If yours does, try the Bourne shell instead.)
is nothing to worry about at all.
Some of the parser default template files in /samples are needed in /etc. In particular be sure that you at least copy /samples/yyparse.c to /etc before running Perl's Configure. This step ensures successful extraction of EBCDIC versions of parser files such as perly.c, perly.h, and x2p/a2p.c. This has to be done before running Configure the first time. If you failed to do so then the easiest way to re-Configure Perl is to delete your misconfigured build root and re-extract the source from the tar ball. Then you must ensure that /etc/yyparse.c is properly in place before attempting to re-run Configure.
This port will support dynamic loading, but it is not selected by default. If you would like to experiment with dynamic loading then be sure to specify -Dusedl in the arguments to the Configure script. See the comments in hints/os390.sh for more information on dynamic loading. If you build with dynamic loading then you will need to add the $archlibexp/CORE directory to your LIBPATH environment variable in order for perl to work. See the config.sh file for the value of $archlibexp. If in trying to use Perl you see an error message similar to:
- CEE3501S The module libperl.dll was not found.
- From entry point __dllstaticinit at compile unit offset +00000194 at
then your LIBPATH does not have the location of libperl.x and either libperl.dll or libperl.so in it. Add that directory to your LIBPATH and proceed.
Do not turn on the compiler optimization flag "-O". There is a bug in either the optimizer or perl that causes perl to not work correctly when the optimizer is on.
Some of the configuration files in /etc used by the networking APIs are either missing or have the wrong names. In particular, make sure that there's either an /etc/resolv.conf or an /etc/hosts, so that gethostbyname() works, and make sure that the file /etc/proto has been renamed to /etc/protocol (NOT /etc/protocols, as used by other Unix systems). You may have to look for things like HOSTNAME and DOMAINORIGIN in the "//'SYS1.TCPPARMS(TCPDATA)'" PDS member in order to properly set up your /etc networking files.
Simply put:
- sh Configure
- make
- make test
if everything looks ok (see the next section for test/IVP diagnosis) then:
- make install
this last step may or may not require UID=0 privileges depending on how you answered the questions that Configure asked and whether or not you have write access to the directories you specified.
"Out of memory!" messages during the build of Perl are most often fixed by rebuilding the GNU make utility for OS/390 from a source code kit.
Another memory limiting item to check is your MAXASSIZE parameter in your 'SYS1.PARMLIB(BPXPRMxx)' data set (note too that as of V2R8 address space limits can be set on a per user ID basis in the USS segment of a RACF profile). People have reported successful builds of Perl with MAXASSIZE parameters as small as 503316480 (and it may be possible to build Perl with a MAXASSIZE smaller than that).
Within USS your /etc/profile or $HOME/.profile may limit your ulimit settings. Check that the following command returns reasonable values:
- ulimit -a
To conserve memory you should have your compiler modules loaded into the Link Pack Area (LPA/ELPA) rather than in a link list or step lib.
If the c89 compiler complains of syntax errors during the build of the Socket extension then be sure to fix the syntax error in the system header /usr/include/sys/socket.h.
The "make test" step runs a Perl Verification Procedure, usually before installation. You might encounter STDERR messages even during a successful run of "make test". Here is a guide to some of the more commonly seen anomalies:
A message of the form:
- io/openpid...........CEE5210S The signal SIGHUP was received.
- CEE5210S The signal SIGHUP was received.
- CEE5210S The signal SIGHUP was received.
- ok
indicates that the t/io/openpid.t test of Perl has passed but done so with extraneous messages on stderr from CEE.
A message of the form:
- lib/ftmp-security....File::Temp::_gettemp: Parent directory (/tmp/) is not safe
- (sticky bit not set when world writable?) at lib/ftmp-security.t line 100
- File::Temp::_gettemp: Parent directory (/tmp/) is not safe (sticky bit not
- set when world writable?) at lib/ftmp-security.t line 100
- ok
indicates a problem with the permissions on your /tmp directory within the HFS. To correct that problem issue the command:
- chmod a+t /tmp
from an account with write access to the directory entry for /tmp.
Out of Memory!
The recent perl test suite is quite memory hungry. In addition to the comments above on memory limitations, it is also worth checking for _CEE_RUNOPTS in your environment. Perl now has (in miniperlmain.c) a C #pragma to set CEE run options, but the environment variable wins.
The C code asks for:
- #pragma runopts(HEAP(2M,500K,ANYWHERE,KEEP,8K,4K) STACK(,,ANY,) ALL31(ON))
The important parts of that are the second argument (the increment) to HEAP, and allowing the stack to be "Above the (16M) line". If the heap increment is too small then when perl (for example loading unicode/Name.pl) tries to create a "big" (400K+) string it cannot fit in a single segment and you get "Out of Memory!" - even if there is still plenty of memory available.
A related issue is the use of perl's malloc. Perl's malloc uses sbrk() to get memory, and sbrk() is limited to the first allocation, so in this case something like:
- HEAP(8M,500K,ANYWHERE,KEEP,8K,4K)
is needed to get through the test suite.
The installman script will try to run on OS/390. There will be fewer errors if you have a roff utility installed. You can obtain GNU groff from the Redbook SG24-5944-00 ftp site.
When using perl on OS/390 please keep in mind that the EBCDIC and ASCII character sets are different. See perlebcdic.pod for more on such character set issues. Perl builtin functions that may behave differently under EBCDIC are also mentioned in the perlport.pod document.
Open Edition (UNIX System Services) from V2R8 onward does support #!/path/to/perl script invocation. There is a PTF available from IBM for V2R7 that will allow shell/kernel support for #!. USS releases prior to V2R7 did not support the #! means of script invocation. If you are running V2R6 or earlier then see:
- head `whence perldoc`
for an example of how to use the "eval exec" trick to ask the shell to have Perl run your scripts on those older releases of Unix System Services.
If you are having trouble with square brackets then consider switching your rlogin or telnet client. Try to avoid older 3270 emulators and ISHELL for working with Perl on USS.
There appears to be a bug in the floating point implementation on S/390 systems such that calling int() on the product of a number and a small magnitude number is not the same as calling int() on the quotient of that number and a large magnitude number. For example, in the following Perl code:
- $x = 100000.0;
- $y = int($x * 1e-5) * 1e5;
- $z = int($x / 1e+5) * 1e5;
Although one would expect the quantities $y and $z to be the same and equal to 100000, they will differ: instead they will be 0 and 100000 respectively.
The problem can be further examined in a roughly equivalent C program:
- #include <stdio.h>
- #include <math.h>
- main()
- {
- double r1,r2;
- double x = 100000.0;
- double y = 0.0;
- double z = 0.0;
- x = 100000.0 * 1e-5;
- r1 = modf (x,&y);
- x = 100000.0 / 1e+5;
- r2 = modf (x,&z);
- printf("y is %e and z is %e\n",y*1e5,z*1e5);
- /* y is 0.000000e+00 and z is 1.000000e+05 (with c89) */
- }
Pure Perl (that is, non-XS) modules may be installed via the usual:
- perl Makefile.PL
- make
- make test
- make install
If you built perl with dynamic loading capability, then that would also be the way to build XS-based extensions. However, if you built perl with the default static linking, you can still build XS-based extensions for OS/390, but you will need to follow the instructions in ExtUtils::MakeMaker for building statically linked perl binaries. In the simplest configurations, building a static perl + XS extension boils down to:
- perl Makefile.PL
- make
- make perl
- make test
- make install
- make -f Makefile.aperl inst_perl MAP_TARGET=perl
In most cases people have reported better results with GNU make than with the system's /bin/make program, whether for plain modules or for XS-based extensions.
If the make process encounters trouble with either compilation or linking then try setting the environment variable _C89_CCMODE to 1. Assuming sh is your login shell, run:
- export _C89_CCMODE=1
If tcsh is your login shell then use the setenv command.
David Fiander and Peter Prymmer with thanks to Dennis Longnecker and William Raffloer for valuable reports, LPAR and PTF feedback. Thanks to Mike MacIsaac and Egon Terwedow for SG24-5944-00. Thanks to Ignasi Roca for pointing out the floating point problems. Thanks to John Goodyear for dynamic loading help.
INSTALL, perlport, perlebcdic, ExtUtils::MakeMaker.
- http://www.ibm.com/servers/eserver/zseries/zos/unix/bpxa1toy.html
- http://www.redbooks.ibm.com/redbooks/SG245944.html
- http://www.ibm.com/servers/eserver/zseries/zos/unix/bpxa1ty1.html#opensrc
- http://www.xray.mpe.mpg.de/mailing-lists/perl-mvs/
- http://publibz.boulder.ibm.com:80/cgi-bin/bookmgr_OS390/BOOKS/ceea3030/
- http://publibz.boulder.ibm.com:80/cgi-bin/bookmgr_OS390/BOOKS/CBCUG030/
If you are interested in the z/OS (formerly known as OS/390) and POSIX-BC (BS2000) ports of Perl then see the perl-mvs mailing list. To subscribe, send an empty message to perl-mvs-subscribe@perl.org.
See also:
- http://lists.perl.org/list/perl-mvs.html
There are web archives of the mailing list at:
- http://www.xray.mpe.mpg.de/mailing-lists/perl-mvs/
- http://archive.develooper.com/perl-mvs@perl.org/
This document was originally written by David Fiander for the 5.005 release of Perl.
This document was podified for the 5.005_03 release of Perl 11 March 1999.
Updated 28 November 2001 for broken URLs.
Updated 12 November 2000 for the 5.7.1 release of Perl.
Updated 15 January 2001 for the 5.7.1 release of Perl.
Updated 24 January 2001 to mention dynamic loading.
Updated 12 March 2001 to mention //'SYS1.TCPPARMS(TCPDATA)'.
perlos400 - Perl version 5 on OS/400
This document describes various features of IBM's OS/400 operating system that will affect how Perl version 5 (hereafter just Perl) is compiled and/or runs.
By far the easiest way to build Perl for OS/400 is to use the PASE (Portable Application Solutions Environment); for more information see http://www.iseries.ibm.com/developer/factory/pase/index.html. This environment allows one to use AIX APIs while programming, and it provides a runtime that allows AIX binaries to execute directly on the PowerPC iSeries.
The recommended way to build Perl for the OS/400 PASE is to build the Perl 5 source code (release 5.8.1 or later) under AIX.
The trick is to give a special parameter to the Configure shell script when running it on AIX:
- sh Configure -DPASE ...
The default installation directory of Perl under PASE is /QOpenSys/perl. This can be modified if needed with Configure parameter -Dprefix=/some/dir.
Starting from OS/400 V5R2 the IBM Visual Age compiler is supported on OS/400 PASE, so it is possible to build Perl natively on OS/400. The easier way, however, is to compile in AIX, as just described.
If you don't want to install the compiled Perl in AIX into /QOpenSys (for packaging it before copying it to PASE), you can use a Configure parameter: -Dinstallprefix=/tmp/QOpenSys/perl. This will cause the "make install" to install everything into that directory, while the installed files still think they are (will be) in /QOpenSys/perl.
If building natively on PASE, please do the build under the /QOpenSys directory, since Perl is happier when built on a case sensitive filesystem.
If you are compiling on AIX, simply do a "make install" on the AIX box. Once the install finishes, tar up the /QOpenSys/perl directory. Transfer the tarball to the OS/400 using FTP with the following commands:
- > binary
- > site namefmt 1
- > put perl.tar /QOpenSys
Once you have it on, simply bring up a PASE shell and extract the tarball.
If you are compiling in PASE, then "make install" is the only thing you will need to do.
The default path for perl binary is /QOpenSys/perl/bin/perl. You'll want to symlink /QOpenSys/usr/bin/perl to this file so you don't have to modify your path.
Perl in PASE may be used in the same manner as you would use Perl on AIX.
Scripts starting with #!/usr/bin/perl should work if you have /QOpenSys/usr/bin/perl symlinked to your perl binary. This will not work if you've done a setuid/setgid or have environment variable PASE_EXEC_QOPENSYS="N". If you have V5R1, you'll need to get the latest PTFs to have this feature. Scripts starting with #!/QOpenSys/perl/bin/perl should always work.
When compiling in PASE, there is no "oslevel" command. Therefore, you may want to create a script called "oslevel" that echoes the level of AIX that your version of PASE runtime supports. If you're unsure, consult your documentation or use "4.3.3.0".
If you have test cases that fail, check for the existence of spool files. The test case may be trying to use a syscall that is not implemented in PASE. To avoid the SIGILL, try setting the PASE_SYSCALL_NOSIGILL environment variable or have a handler for the SIGILL. If you can compile programs for PASE, run the config script and edit config.sh when it gives you the option. If you want to remove fchdir(), which isn't implemented in V5R1, simply change the line that says:
d_fchdir='define'
to
d_fchdir='undef'
and then compile Perl. The places where fchdir() is used have alternatives for systems that do not have fchdir() available.
There exists a port of Perl to the ILE environment. This port, however, is based on quite an old release of Perl, Perl 5.00502 (August 1998). (As of July 2002 the latest release of Perl is 5.8.0, and even 5.6.1 has been out since April 2001.) If you need to run Perl on ILE, though, you may need this older port: http://www.cpan.org/ports/#os400 Note that no Perl release later than 5.00502 has been ported to ILE.
If you need to use Perl in the ILE environment, you may want to consider using Qp2RunPase() to call the PASE version of Perl.
Jarkko Hietaniemi <jhi@iki.fi> Bryan Logan <bryanlog@us.ibm.com> David Larson <larson1@us.ibm.com>
perlpacktut - tutorial on pack and unpack
pack and unpack are two functions for transforming data according
to a user-defined template, between the guarded way Perl stores values
and some well-defined representation as might be required in the
environment of a Perl program. Unfortunately, they're also two of
the most misunderstood and most often overlooked functions that Perl
provides. This tutorial will demystify them for you.
Most programming languages don't shelter the memory where variables are
stored. In C, for instance, you can take the address of some variable,
and the sizeof
operator tells you how many bytes are allocated to
the variable. Using the address and the size, you may access the storage
to your heart's content.
In Perl, you just can't access memory at random, but the structural and
representational conversion provided by pack and unpack is an
excellent alternative. The pack function converts values to a byte
sequence containing representations according to a given specification,
the so-called "template" argument. unpack is the reverse process,
deriving some values from the contents of a string of bytes. (Be cautioned,
however, that not all that has been packed together can be neatly unpacked -
a very common experience as seasoned travellers are likely to confirm.)
Why, you may ask, would you need a chunk of memory containing some values
in binary representation? One good reason is input and output accessing
some file, a device, or a network connection, whereby this binary
representation is either forced on you or will give you some benefit
in processing. Another cause is passing data to some system call that
is not available as a Perl function: syscall requires you to provide
parameters stored in the way it happens in a C program. Even text processing
(as shown in the next section) may be simplified with judicious usage
of these two functions.
To see how (un)packing works, we'll start with a simple template
code where the conversion is in low gear: between the contents of a byte
sequence and a string of hexadecimal digits. Let's use unpack, since
this is likely to remind you of a dump program, or some desperate last
message unfortunate programs are wont to throw at you before they expire
into the wild blue yonder. Assuming that the variable $mem
holds a
sequence of bytes that we'd like to inspect without assuming anything
about its meaning, we can write
whereupon we might see something like this, with each pair of hex digits corresponding to a byte:
- 41204d414e204120504c414e20412043414e414c2050414e414d41
What was in this chunk of memory? Numbers, characters, or a mixture of
both? Assuming that we're on a computer where ASCII (or some similar)
encoding is used: hexadecimal values in the range 0x40
- 0x5A
indicate an uppercase letter, and 0x20
encodes a space. So we might
assume it is a piece of text, which some are able to read like a tabloid;
but others will have to get hold of an ASCII table and relive that
first-grader feeling. Not caring too much about which way to read this,
we note that unpack with the template code H
converts the contents
of a sequence of bytes into the customary hexadecimal notation. Since
"a sequence of" is a pretty vague indication of quantity, H
has been
defined to convert just a single hexadecimal digit unless it is followed
by a repeat count. An asterisk for the repeat count means to use whatever
remains.
The inverse operation - packing byte contents from a string of hexadecimal digits - is just as easily written. For instance:
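The packing example was also lost; it was most likely along these lines:

```perl
# ten 2-digit hex strings "30".."39" become the bytes 0x30..0x39,
# which is "0123456789" under ASCII
my $s = pack( 'H2' x 10, map { "3$_" } 0 .. 9 );
print "$s\n";
```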
Since we feed a list of ten 2-digit hexadecimal strings to pack, the
pack template should contain ten pack codes. If this is run on a computer
with ASCII character coding, it will print 0123456789
.
Let's suppose you've got to read in a data file like this:
- Date |Description | Income|Expenditure
- 01/24/2001 Zed's Camel Emporium 1147.99
- 01/28/2001 Flea spray 24.99
- 01/29/2001 Camel rides to tourists 235.00
How do we do it? You might think first to use split; however, since
split collapses blank fields, you'll never know whether a record was
income or expenditure. Oops. Well, you could always use substr:
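The substr example is missing from this copy; a reconstruction might look like the following. Note that the first width is deliberately wrong (11 instead of 10): that is the hand-counting error the next paragraph complains about, and the sample line here is an assumed layout.

```perl
local $_ = sprintf "%-10s %-27s %7s",
    '01/24/2001', "Zed's Camel Emporium", '1147.99';
my $date   = substr($_,  0, 11);   # oops - should be 10
my $desc   = substr($_, 12, 27);   # ... and the error propagates
my $income = substr($_, 40,  7);
my $expend = substr($_, 52);
```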
It's not really a barrel of laughs, is it? In fact, it's worse than it may seem; the eagle-eyed may notice that the first field should only be 10 characters wide, and the error has propagated right through the other numbers - which we've had to count by hand. So it's error-prone as well as horribly unfriendly.
Or maybe we could use regular expressions:
- while (<>) {
- my($date, $desc, $income, $expend) =
- m|(\d\d/\d\d/\d{4}) (.{27}) (.{7})(.*)|;
- ...
- }
Urgh. Well, it's a bit better, but - well, would you want to maintain that?
Hey, isn't Perl supposed to make this sort of thing easy? Well, it does,
if you use the right tools. pack and unpack are designed to help
you out when dealing with fixed-width data like the above. Let's have a
look at a solution with unpack:
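The unpack solution itself was dropped during conversion; it presumably read like this (the sample line is constructed here so the snippet is self-contained):

```perl
# one A-code per field, with x skipping the separator columns
my $line = sprintf "%-10s %-27s %7s %s",
    '01/29/2001', 'Camel rides to tourists', '', '235.00';
my( $date, $desc, $income, $expend ) = unpack( "A10xA27xA7xA*", $line );
```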
That looks a bit nicer; but we've got to take apart that weird template. Where did I pull that out of?
OK, let's have a look at some of our data again; in fact, we'll include the headers, and a handy ruler so we can keep track of where we are.
- 1 2 3 4 5
- 1234567890123456789012345678901234567890123456789012345678
- Date |Description | Income|Expenditure
- 01/28/2001 Flea spray 24.99
- 01/29/2001 Camel rides to tourists 235.00
From this, we can see that the date column stretches from column 1 to
column 10 - ten characters wide. The pack-ese for "character" is
A
, and ten of them are A10
. So if we just wanted to extract the
dates, we could say this:
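The missing snippet was presumably just:

```perl
local $_ = "01/24/2001 Zed's Camel Emporium";
my( $date ) = unpack( "A10", $_ );
```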
OK, what's next? Between the date and the description is a blank column;
we want to skip over that. The x
template means "skip forward", so we
want one of those. Next, we have another batch of characters, from 12 to
38. That's 27 more characters, hence A27
. (Don't make the fencepost
error - there are 27 characters between 12 and 38, not 26. Count 'em!)
Now we skip another character and pick up the next 7 characters:
Now comes the clever bit. Lines in our ledger which are just income and
not expenditure might end at column 46. Hence, we don't want to tell our
unpack pattern that we need to find another 12 characters; we'll
just say "if there's anything left, take it". As you might guess from
regular expressions, that's what the *
means: "use everything
remaining".
Be warned, though, that unlike regular expressions, if the unpack
template doesn't match the incoming data, Perl will scream and die.
Hence, putting it all together:
Now, that's our data parsed. I suppose what we might want to do now is total up our income and expenditure, and add another line to the end of our ledger - in the same format - saying how much we've brought in and how much we've spent:
- while (<>) {
- my($date, $desc, $income, $expend) = unpack("A10xA27xA7xA*", $_);
- $tot_income += $income;
- $tot_expend += $expend;
- }
- $tot_income = sprintf("%.2f", $tot_income); # Get them into
- $tot_expend = sprintf("%.2f", $tot_expend); # "financial" format
- $date = POSIX::strftime("%m/%d/%Y", localtime);
- # OK, let's go:
- print pack("A10xA27xA7xA*", $date, "Totals", $tot_income, $tot_expend);
Oh, hmm. That didn't quite work. Let's see what happened:
- 01/24/2001 Zed's Camel Emporium 1147.99
- 01/28/2001 Flea spray 24.99
- 01/29/2001 Camel rides to tourists 1235.00
- 03/23/2001Totals 1235.001172.98
OK, it's a start, but what happened to the spaces? We put x
, didn't
we? Shouldn't it skip forward? Let's look at what pack says:
- x A null byte.
Urgh. No wonder. There's a big difference between "a null byte", character zero, and "a space", character 32. Perl's put something between the date and the description - but unfortunately, we can't see it!
What we actually need to do is expand the width of the fields. The A
format pads any non-existent characters with spaces, so we can use the
additional spaces to line up our fields, like this:
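The widened template was lost here; judging from the sample output below, it presumably widened each field by one so the pad spaces supply the separating blanks:

```perl
my( $date, $tot_income, $tot_expend ) = ( '03/23/2001', '1235.00', '1172.98' );
my $line = pack( "A11 A28 A8 A*", $date, 'Totals', $tot_income, $tot_expend );
print $line;
```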
(Note that you can put spaces in the template to make it more readable, but they don't translate to spaces in the output.) Here's what we got this time:
- 01/24/2001 Zed's Camel Emporium 1147.99
- 01/28/2001 Flea spray 24.99
- 01/29/2001 Camel rides to tourists 1235.00
- 03/23/2001 Totals 1235.00 1172.98
That's a bit better, but we still have that last column which needs to
be moved further over. There's an easy way to fix this up:
unfortunately, we can't get pack to right-justify our fields, but we
can get sprintf to do it:
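The sprintf step did not survive extraction; a sketch, with an assumed field width of 11 chosen to match the sample ledger:

```perl
my( $date, $tot_income, $tot_expend ) = ( '03/23/2001', '1235.00', '1172.98' );
# right-justify the final column before handing it to pack
$tot_expend = sprintf '%11s', $tot_expend;
print pack( "A11 A28 A8 A*", $date, 'Totals', $tot_income, $tot_expend );
```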
This time we get the right answer:
- 01/28/2001 Flea spray 24.99
- 01/29/2001 Camel rides to tourists 1235.00
- 03/23/2001 Totals 1235.00 1172.98
So that's how we consume and produce fixed-width data. Let's recap what
we've seen of pack and unpack so far:
Use pack to go from several pieces of data to one fixed-width
version; use unpack to turn a fixed-width-format string into several
pieces of data.
The pack format A
means "any character"; if you're packing and
you've run out of things to pack, pack will fill the rest up with
spaces.
x
means "skip a byte" when unpacking; when packing, it means
"introduce a null byte" - that's probably not what you mean if you're
dealing with plain text.
You can follow the formats with numbers to say how many characters
should be affected by that format: A12
means "take 12 characters";
x6
means "skip 6 bytes" or "character 0, 6 times".
Instead of a number, you can use *
to mean "consume everything else
left".
Warning: when packing multiple pieces of data, *
only means
"consume all of the current piece of data". That's to say
- pack("A*A*", $one, $two)
packs all of $one
into the first A*
and then all of $two
into
the second. This is a general principle: each format character
corresponds to one piece of data to be packed.
So much for textual data. Let's get onto the meaty stuff that pack
and unpack are best at: handling binary formats for numbers. There is,
of course, not just one binary format - life would be too simple - but
Perl will do all the finicky labor for you.
Packing and unpacking numbers implies conversion to and from some specific binary representation. Leaving floating point numbers aside for the moment, the salient properties of any such representation are:
the number of bytes used for storing the integer,
whether the contents are interpreted as a signed or unsigned number,
the byte ordering: whether the first byte is the least or most significant byte (or: little-endian or big-endian, respectively).
So, for instance, to pack 20302 to a signed 16 bit integer in your computer's representation you write
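The pack call referred to here was presumably:

```perl
my $ps = pack( 's', 20302 );   # 20302 == 0x4F4E, the bytes "O" and "N"
```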
Again, the result is a string, now containing 2 bytes. If you print
this string (which is, generally, not recommended) you might see
ON
or NO
(depending on your system's byte ordering) - or something
entirely different if your computer doesn't use ASCII character encoding.
Unpacking $ps
with the same template returns the original integer value:
This is true for all numeric template codes. But don't expect miracles:
if the packed value exceeds the allotted byte capacity, high order bits
are silently discarded, and unpack certainly won't be able to pull them
back out of some magic hat. And, when you pack using a signed template
code such as s, an excess value may result in the sign bit
getting set, and unpacking this will smartly return a negative value.
16 bits won't get you too far with integers, but there is l
and L
for signed and unsigned 32-bit integers. And if this is not enough and
your system supports 64 bit integers you can push the limits much closer
to infinity with pack codes q and Q
. A notable exception is provided
by pack codes i
and I
for signed and unsigned integers of the
"local custom" variety: Such an integer will take up as many bytes as
a local C compiler returns for sizeof(int)
, but it'll use at least
32 bits.
Each of the integer pack codes sSlLqQ
results in a fixed number of bytes,
no matter where you execute your program. This may be useful for some
applications, but it does not provide for a portable way to pass data
structures between Perl and C programs (bound to happen when you call
XS extensions or the Perl function syscall), or when you read or
write binary files. What you'll need in this case are template codes that
depend on what your local C compiler compiles when you code short
or
unsigned long
, for instance. These codes and their corresponding
byte lengths are shown in the table below. Since the C standard leaves
much leeway with respect to the relative sizes of these data types, actual
values may vary, and that's why the values are given as expressions in
C and Perl. (If you'd like to use values from %Config
in your program
you have to import it with use Config
.)
- signed unsigned byte length in C byte length in Perl
- s! S! sizeof(short) $Config{shortsize}
- i! I! sizeof(int) $Config{intsize}
- l! L! sizeof(long) $Config{longsize}
- q! Q! sizeof(long long) $Config{longlongsize}
The i!
and I!
codes aren't different from i
and I
; they are
tolerated for completeness' sake.
Requesting a particular byte ordering may be necessary when you work with binary data coming from some specific architecture whereas your program could run on a totally different system. As an example, assume you have 24 bytes containing a stack frame as it happens on an Intel 8086:
- +---------+ +----+----+ +---------+
- TOS: | IP | TOS+4:| FL | FH | FLAGS TOS+14:| SI |
- +---------+ +----+----+ +---------+
- | CS | | AL | AH | AX | DI |
- +---------+ +----+----+ +---------+
- | BL | BH | BX | BP |
- +----+----+ +---------+
- | CL | CH | CX | DS |
- +----+----+ +---------+
- | DL | DH | DX | ES |
- +----+----+ +---------+
First, we note that this time-honored 16-bit CPU uses little-endian order,
and that's why the low order byte is stored at the lower address. To
unpack such a (unsigned) short we'll have to use code v
. A repeat
count unpacks all 12 shorts:
Alternatively, we could have used C
to unpack the individually
accessible byte registers FL, FH, AL, AH, etc.:
It would be nice if we could do this in one fell swoop: unpack a short,
back up a little, and then unpack 2 bytes. Since Perl is nice, it
proffers the template code X
to back up one byte. Putting this all
together, we may now write:
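The clumsy template the next sentence alludes to was presumably built by string repetition, roughly like this (sample frame assumed):

```perl
my $frame = pack( 'v12', 1 .. 12 );
# for each register pair: read a short, back up two bytes with XX,
# then re-read the same two bytes individually with CC
my @vals = unpack( 'v2' . ( 'vXXCC' x 5 ) . 'v5', $frame );
```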
(The clumsy construction of the template can be avoided - just read on!)
We've taken some pains to construct the template so that it matches
the contents of our frame buffer. Otherwise we'd either get undefined values,
or unpack could not unpack it all. If pack runs out of items, it will
supply null strings (which are coerced into zeroes whenever the pack code
says so).
The pack code for big-endian (high order byte at the lowest address) is
n
for 16 bit and N
for 32 bit integers. You use these codes
if you know that your data comes from a compliant architecture, but,
surprisingly enough, you should also use these pack codes if you
exchange binary data, across the network, with some system that you
know next to nothing about. The simple reason is that this
order has been chosen as the network order, and all standard-fearing
programs ought to follow this convention. (This is, of course, a stern
backing for one of the Lilliputian parties and may well influence the
political development there.) So, if the protocol expects you to send
a message by sending the length first, followed by just so many bytes,
you could write:
or even:
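Both variants were lost in conversion; they presumably read (with an assumed payload):

```perl
my $msg = "our message";
# two steps: pack the big-endian length, then append the data ...
my $buf = pack( 'N', length $msg ) . $msg;
# ... or the same thing in a single template
$buf = pack( 'NA*', length $msg, $msg );
```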
and pass $buf
to your send routine. Some protocols demand that the
count should include the length of the count itself: then just add 4
to the data length. (But make sure to read Lengths and Widths before
you really code this!)
In the previous sections we've learned how to use n
, N
, v
and
V
to pack and unpack integers with big- or little-endian byte-order.
While this is nice, it's still rather limited because it leaves out all
kinds of signed integers as well as 64-bit integers. For example, if you
wanted to unpack a sequence of signed big-endian 16-bit integers in a
platform-independent way, you would have to write:
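The "ugly" workaround in question chains three conversions: bytes to unsigned big-endian values, back to native bytes, then to signed native values (sample data assumed):

```perl
my $buf  = pack( 'n*', 0xFFFF, 1 );                  # big-endian input
my @data = unpack 's*', pack 'S*', unpack 'n*', $buf;
```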
This is ugly. As of Perl 5.9.2, there's a much nicer way to express your
desire for a certain byte-order: the > and <
modifiers.
> is the big-endian modifier, while <
is the little-endian
modifier. Using them, we could rewrite the above code as:
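The rewritten one-liner was presumably:

```perl
my $buf  = pack( 'n*', 0xFFFF, 1 );   # sample big-endian data
my @data = unpack 's>*', $buf;        # signed big-endian shorts, Perl 5.9.2+
```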
As you can see, the "big end" of the arrow touches the s, which is a
nice way to remember that > is the big-endian modifier. The same
obviously works for <
, where the "little end" touches the code.
You will probably find these modifiers even more useful if you have to deal with big- or little-endian C structures. Be sure to read Packing and Unpacking C Structures for more on that.
For packing floating point numbers you have the choice between the
pack codes f
, d
, F
and D
. f
and d
pack into (or unpack
from) single-precision or double-precision representation as it is provided
by your system. If your system supports it, D
can be used to pack and
unpack extended-precision floating point values (long double
), which
can offer even more resolution than f
or d
. F
packs an NV
,
which is the floating point type used by Perl internally. (There
is no such thing as a network representation for reals, so if you want
to send your real numbers across computer boundaries, you'd better stick
to ASCII representation, unless you're absolutely sure what's on the other
end of the line. For the even more adventuresome, you can use the byte-order
modifiers from the previous section also on floating point codes.)
Bits are the atoms in the memory world. Access to individual bits may
have to be used either as a last resort or because it is the most
convenient way to handle your data. Bit string (un)packing converts
between strings containing a series of 0
and 1
characters and
a sequence of bytes each containing a group of 8 bits. This is almost
as simple as it sounds, except that there are two ways the contents of
a byte may be written as a bit string. Let's have a look at an annotated
byte:
- 7 6 5 4 3 2 1 0
- +-----------------+
- | 1 0 0 0 1 1 0 0 |
- +-----------------+
- MSB LSB
It's egg-eating all over again: Some think that as a bit string this should be written "10001100" i.e. beginning with the most significant bit, others insist on "00110001". Well, Perl isn't biased, so that's why we have two bit string codes:
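The two codes were presumably demonstrated along these lines; both calls build the same byte, 0x8C:

```perl
my $msb_first = pack( 'B8', '10001100' );   # B: most significant bit first
my $lsb_first = pack( 'b8', '00110001' );   # b: least significant bit first
```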
It is not possible to pack or unpack bit fields - just integral bytes.
pack always starts at the next byte boundary and "rounds up" to the
next multiple of 8 by adding zero bits as required. (If you do want bit
fields, there is vec. Or you could implement bit field
handling at the character string level, using split, substr, and
concatenation on unpacked bit strings.)
To illustrate unpacking for bit strings, we'll decompose a simple status register (a "-" stands for a "reserved" bit):
- +-----------------+-----------------+
- | S Z - A - P - C | - - - - O D I T |
- +-----------------+-----------------+
- MSB LSB MSB LSB
Converting these two bytes to a string can be done with the unpack
template 'b16'
. To obtain the individual bit values from the bit
string we use split with the "empty" separator pattern which dissects
into individual characters. Bit values from the "reserved" positions are
simply assigned to undef, a convenient notation for "I don't care where
this goes".
We could have used an unpack template 'b12'
just as well, since the
last 4 bits can be ignored anyway.
Another odd-man-out in the template alphabet is u
, which packs an
"uuencoded string". ("uu" is short for Unix-to-Unix.) Chances are that
you won't ever need this encoding technique which was invented to overcome
the shortcomings of old-fashioned transmission mediums that do not support
other than simple ASCII data. The essential recipe is simple: Take three
bytes, or 24 bits. Split them into 4 six-packs, adding a space (0x20) to
each. Repeat until all of the data is blended. Fold groups of 4 bytes into
lines no longer than 60 and garnish them in front with the original byte count
(incremented by 0x20) and a "\n"
at the end. - The pack chef will
prepare this for you, a la minute, when you select pack code u
on the menu:
A repeat count after u
sets the number of bytes to put into an
uuencoded line, which is the maximum of 45 by default, but could be
set to some (smaller) integer multiple of three. unpack simply ignores
the repeat count.
An even stranger template code is %
<number>. First, because
it's used as a prefix to some other template code. Second, because it
cannot be used in pack at all, and third, in unpack, doesn't return the
data as defined by the template code it precedes. Instead it'll give you an
integer of number bits that is computed from the data value by
doing sums. For numeric unpack codes, no big feat is achieved:
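For numeric codes the "no big feat" presumably looked like this: the % prefix just sums the unpacked values into a 32-bit accumulator.

```perl
my $buf = pack( 'iii', 100, 20, 3 );
my $sum = unpack( '%32i3', $buf );   # 100 + 20 + 3
```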
For string values, %
returns the sum of the byte values saving
you the trouble of a sum loop with substr and ord:
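The loop-free byte sum was presumably:

```perl
my $sum = unpack( '%32C*', "\x01\x02\x03" );   # 6, sans loop
```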
Although the %
code is documented as returning a "checksum":
don't put your trust in such values! Even when applied to a small number
of bytes, they won't guarantee a noticeable Hamming distance.
In connection with b
or B
, %
simply adds bits, and this can be put
to good use to count set bits efficiently:
And an even parity bit can be determined like this:
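Both lost snippets were presumably the following one-liners (sample mask assumed). Summing bits into a 32-bit accumulator counts them; summing into a 1-bit accumulator leaves only the parity:

```perl
my $mask       = "\xFF\x0F";
my $setbits    = unpack( '%32b*', $mask );   # 12 bits are set
my $evenparity = unpack( '%1b*',  $mask );   # 0: an even number of bits
```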
Unicode is a character set that can represent most characters in most of the world's languages, providing room for over one million different characters. Unicode 3.1 specifies 94,140 characters: The Basic Latin characters are assigned to the numbers 0 - 127. The Latin-1 Supplement with characters that are used in several European languages is in the next range, up to 255. After some more Latin extensions we find the character sets from languages using non-Roman alphabets, interspersed with a variety of symbol sets such as currency symbols, Zapf Dingbats or Braille. (You might want to visit http://www.unicode.org/ for a look at some of them - my personal favourites are Telugu and Kannada.)
The Unicode character set associates characters with integers. Encoding these numbers in an equal number of bytes would more than double the requirements for storing texts written in Latin alphabets. The UTF-8 encoding avoids this by storing the most common (from a western point of view) characters in a single byte while encoding the rarer ones in three or more bytes.
Perl uses UTF-8, internally, for most Unicode strings.
So what has this got to do with pack? Well, if you want to compose a
Unicode string (that is internally encoded as UTF-8), you can do so by
using template code U
. As an example, let's produce the Euro currency
symbol (code number 0x20AC):
- $UTF8{Euro} = pack( 'U', 0x20AC );
- # Equivalent to: $UTF8{Euro} = "\x{20ac}";
Inspecting $UTF8{Euro}
shows that it contains 3 bytes:
"\xe2\x82\xac". However, it contains only 1 character, number 0x20AC.
The round trip can be completed with unpack:
- $Unicode{Euro} = unpack( 'U', $UTF8{Euro} );
Unpacking using the U
template code also works on UTF-8 encoded byte
strings.
Usually you'll want to pack or unpack UTF-8 strings:
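The example here was presumably a whole-alphabet round trip, something like:

```perl
# pack and unpack the Hebrew alphabet (code points U+05D0 .. U+05EA)
my $alefbet = pack( 'U*', 0x05d0 .. 0x05ea );
my @code_points = unpack( 'U*', $alefbet );
```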
Please note: in the general case, you're better off using Encode::decode_utf8 to decode a UTF-8 encoded byte string to a Perl Unicode string, and Encode::encode_utf8 to encode a Perl Unicode string to UTF-8 bytes. These functions provide means of handling invalid byte sequences and generally have a friendlier interface.
The pack code w
has been added to support a portable binary data
encoding scheme that goes way beyond simple integers. (Details can
be found at http://Casbah.org/, the Scarab project.) A BER (Binary Encoded
Representation) compressed unsigned integer stores base 128
digits, most significant digit first, with as few digits as possible.
Bit eight (the high bit) is set on each byte except the last. There
is no size limit to BER encoding, but Perl won't go to extremes.
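The pack call producing $berbuf was lost; reconstructed from the hex dump quoted below, it was presumably:

```perl
my $berbuf = pack( 'w*', 1, 128, 129, 16511 );
```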
A hex dump of $berbuf
, with spaces inserted at the right places,
shows 01 8100 8101 81807F. Since the last byte is always less than
128, unpack knows where to stop.
Prior to Perl 5.8, repetitions of templates had to be made by
x
-multiplication of template strings. Now there is a better way as
we may use the pack codes ( and ) combined with a repeat count.
The unpack template from the Stack Frame example can simply
be written like this:
- unpack( 'v2 (vXXCC)5 v5', $frame )
Let's explore this feature a little more. We'll begin with the equivalent of
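The expression being imitated was presumably the classic map/substr idiom (sample list assumed):

```perl
my @str = qw( apple banana cherry );
my $initials = join '', map { substr $_, 0, 1 } @str;   # first char of each
```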
which returns a string consisting of the first character from each string. Using pack, we can write
- pack( '(A)'.@str, @str )
or, because a repeat count *
means "repeat as often as required",
simply
- pack( '(A)*', @str )
(Note that the template A*
would only have packed $str[0]
in full
length.)
To pack dates stored as triplets ( day, month, year ) in an array @dates
into a sequence of byte, byte, short integer we can write
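The packing call was presumably along these lines (sample triplets assumed):

```perl
my @dates = ( [ 1, 2, 2001 ], [ 24, 12, 2002 ] );   # ( day, month, year )
my $pd = pack( '(CCS)*', map { @$_ } @dates );       # byte, byte, short
```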
To swap pairs of characters in a string (with even length) one could use
several techniques. First, let's use x
and X
to skip forward and back:
We can also use @
to jump to an offset, with 0 being the position where
we were when the last ( was encountered:
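The @-based variant was presumably similar to this sketch: inside a group, @1 and @0 address the two characters relative to the group's start, and @2 moves past the pair.

```perl
my $s = 'came';
$s = pack( '(A)*', unpack( '(@1A @0A @2)*', $s ) );   # pairs swapped again
```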
Finally, there is also an entirely different approach by unpacking big endian shorts and packing them in the reverse byte order:
- $s = pack( '(v)*', unpack( '(n)*', $s ) );
In the previous section we've seen a network message that was constructed by prefixing the binary message length to the actual message. You'll find that packing a length followed by so many bytes of data is a frequently used recipe since appending a null byte won't work if a null byte may be part of the data. Here is an example where both techniques are used: after two null terminated strings with source and destination address, a Short Message (to a mobile phone) is sent after a length byte:
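The packing side of this message was presumably built with the very template shown in the unpack below (addresses and message text assumed):

```perl
my( $src, $dst, $sm ) = ( '16505550123', '16505550199', 'Meet at 8' );
my $msg = pack( 'Z*Z*CA*', $src, $dst, length $sm, $sm );
```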
Unpacking this message can be done with the same template:
- ( $src, $dst, $len, $sm ) = unpack( 'Z*Z*CA*', $msg );
There's a subtle trap lurking in the offing: Adding another field after
the Short Message (in variable $sm
) is all right when packing, but this
cannot be unpacked naively:
The pack code A*
gobbles up all remaining bytes, and $prio
remains
undefined! Before we let disappointment dampen the morale: Perl's got
the trump card to make this trick too, just a little further up the sleeve.
Watch this:
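A sketch of the slash combination at work; adding a priority byte after the counted string now unpacks cleanly:

```perl
my ( $src, $dst, $sm, $prio ) = ( 'alice', 'bob', 'hello there', 3 );

# C/A* packs length($sm) as an unsigned byte, then the string itself
my $msg = pack( 'Z* Z* C/A* C', $src, $dst, $sm, $prio );

# the same template works for unpack: the length byte tells it where
# the string ends, so $prio is no longer swallowed
my @fields = unpack( 'Z* Z* C/A* C', $msg );
```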
Combining two pack codes with a slash (/) associates them with a single
value from the argument list. In pack, the length of the argument is
taken and packed according to the first code while the argument itself
is added after being converted with the template code after the slash.
This saves us the trouble of inserting the length call, but it is
in unpack where we really score: The value of the length byte marks the
end of the string to be taken from the buffer. Since this combination
doesn't make sense unless the second pack code is a*, A* or Z*, Perl won't let you use anything else.
The pack code preceding / may be anything that's fit to represent a
number: All the numeric binary pack codes, and even text codes such as
A4
or Z*
:
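For example (a sketch): with A4, the length travels as space-padded ASCII digits rather than as a binary number:

```perl
# length 13 is packed as the ASCII string '13  ' (A4, space padded),
# followed by the string itself
my $buf = pack( 'A4/A*', 'Humpty-Dumpty' );

my $str = unpack( 'A4/A*', $buf );
```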
/ is not implemented in Perls before 5.6, so if your code is required to
work on older Perls you'll need to unpack( 'Z* Z* C')
to get the length,
then use it to make a new unpack string. For example
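A sketch of that two-step dance (assuming $msg was packed with 'Z* Z* C A*' as above):

```perl
my $msg = pack( 'Z* Z* C A*', 'alice', 'bob', 5, 'hello' );

my ( $src, $dst, $len ) = unpack( 'Z* Z* C', $msg );

# skip both strings (plus their terminating NULs) and the length
# byte, then take exactly $len bytes
my $skip = length( $src ) + 1 + length( $dst ) + 1 + 1;
my $sm   = unpack( "x$skip A$len", $msg );
```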
But that second unpack is rushing ahead. It isn't using a simple literal
string for the template. So maybe we should introduce...
So far, we've seen literals used as templates. If the list of pack
items doesn't have fixed length, an expression constructing the
template is required (whenever, for some reason, ()*
cannot be used).
Here's an example: To store named string values in a way that can be
conveniently parsed by a C program, we create a sequence of names and
null terminated ASCII strings, with =
between the name and the value,
followed by an additional delimiting null byte. Here's how:
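The packing call itself might look like this (a sketch; %env stands in for the hash being serialized):

```perl
my %env = ( PATH => '/bin' );

# name, '=', NUL-terminated value -- once per key -- then a final NUL
my $buf = pack( '(A*A*Z*)' . keys( %env ) . 'C',
                map( { ; $_, '=', $env{$_} } keys %env ), 0 );
```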
Let's examine the cogs of this byte mill, one by one. There's the map
call, creating the items we intend to stuff into the $env
buffer:
to each key (in $_
) it adds the =
separator and the hash entry value.
Each triplet is packed with the template code sequence A*A*Z*
that
is repeated according to the number of keys. (Yes, that's what the keys
function returns in scalar context.) To get the very last null byte,
we add a 0
at the end of the pack list, to be packed with C
.
(Attentive readers may have noticed that we could have omitted the 0.)
For the reverse operation, we'll have to determine the number of items
in the buffer before we can let unpack rip it apart:
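A sketch of the unpacking side, for a buffer built as described (e.g. "PATH=/bin\0\0"):

```perl
my $buf = "PATH=/bin\0\0";

my $n   = $buf =~ tr/\0// - 1;   # one NUL per value, plus the sentinel
my %env = map { split( /=/, $_, 2 ) } unpack( "(Z*)$n", $buf );
```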
The tr counts the null bytes. The unpack call returns a list of
name-value pairs each of which is taken apart in the map block.
Rather than storing a sentinel at the end of a data item (or a list of items), we could precede the data with a count. Again, we pack keys and values of a hash, preceding each with an unsigned short length count, and up front we store the number of pairs:
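A sketch of that layout, with each string preceded by an unsigned-short length (S/A*) and the pair count up front:

```perl
my %env = ( PATH => '/bin' );

# leading pair count, then length-counted key and value strings
my $buf = pack( 'S (S/A* S/A*)*', scalar keys %env, %env );
```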
This simplifies the reverse operation as the number of repetitions can be
unpacked with the / code:
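The reverse, as a sketch; here / supplies the repeat count for the whole group:

```perl
my $buf = pack( 'S (S/A* S/A*)*', 1, 'PATH', '/bin' );

# the leading S is consumed as the number of key/value groups
my %env = unpack( 'S/(S/A* S/A*)', $buf );
```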
Note that this is one of the rare cases where you cannot use the same
template for pack and unpack because pack can't determine
a repeat count for a ()
-group.
Intel HEX is a file format for representing binary data, mostly for
programming various chips, as a text file. (See
http://en.wikipedia.org/wiki/.hex for a detailed description, and
http://en.wikipedia.org/wiki/SREC_(file_format) for the Motorola
S-record format, which can be unravelled using the same technique.)
Each line begins with a colon (':') and is followed by a sequence of
hexadecimal characters, specifying a byte count n (8 bit),
an address (16 bit, big endian), a record type (8 bit), n data bytes
and a checksum (8 bit) computed as the least significant byte of the two's
complement sum of the preceding bytes. Example: :0300300002337A1E.
The first step of processing such a line is the conversion, to binary,
of the hexadecimal data, to obtain the four fields, while checking the
checksum. No surprise here: we'll start with a simple pack call to
convert everything to binary:
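A sketch of that conversion, using the example record from above:

```perl
my $hexrec = ':0300300002337A1E';

# drop the leading colon, then turn pairs of hex digits into bytes
my $binrec = pack( 'H*', substr( $hexrec, 1 ) );
```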
The resulting byte sequence is most convenient for checking the checksum.
Don't slow your program down with a for loop adding the ord values
of this string's bytes - the unpack code %
is the thing to use
for computing the 8-bit sum of all bytes, which must be equal to zero:
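A sketch of the checksum test via the %8 modifier:

```perl
my $binrec = pack( 'H*', '0300300002337A1E' );

# %8C* sums all bytes modulo 256; a valid record sums to zero
my $sum = unpack( '%8C*', $binrec );
die "checksum error\n" if $sum;
```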
Finally, let's get those four fields. By now, you shouldn't have any
problems with the first three fields - but how can we use the byte count
of the data in the first field as a length for the data field? Here
the codes x
and X
come to the rescue, as they permit jumping
back and forth in the string to unpack.
Code x
skips a byte, since we don't need the count yet. Code n
takes
care of the 16-bit big-endian integer address, and C
unpacks the
record type. Being at offset 4, where the data begins, we need the count.
X4
brings us back to square one, which is the byte at offset 0.
Now we pick up the count, and zoom forth to offset 4, where we are
now fully furnished to extract the exact number of data bytes, leaving
the trailing checksum byte alone.
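If jumping to and fro feels dizzying, the same fields can also be pulled out in two plain steps (a sketch, sidestepping the back-and-forth template; $binrec is the converted binary record):

```perl
my $binrec = pack( 'H*', '0300300002337A1E' );

# count, big-endian address, record type -- straight off the front
my ( $count, $addr, $type ) = unpack( 'C n C', $binrec );

# then skip the 4-byte header and take exactly $count data bytes,
# leaving the trailing checksum byte alone
my $data = unpack( "x4 a$count", $binrec );
```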
In previous sections we have seen how to pack numbers and character strings. If it were not for a couple of snags we could conclude this section right away with the terse remark that C structures don't contain anything else, and therefore you already know all there is to it. Sorry, no: read on, please.
If you have to deal with a lot of C structures, and don't want to
hack all your template strings manually, you'll probably want to have
a look at the CPAN module Convert::Binary::C
. Not only can it parse
your C source directly, but it also has built-in support for all the
odds and ends described further on in this section.
In the consideration of speed against memory requirements the balance has been tilted in favor of faster execution. This has influenced the way C compilers allocate memory for structures: On architectures where a 16-bit or 32-bit operand can be moved faster between places in memory, or to or from a CPU register, if it is aligned at an even or multiple-of-four or even at a multiple-of-eight address, a C compiler will give you this speed benefit by stuffing extra bytes into structures. If you don't cross the C shoreline this is not likely to cause you any grief (although you should care when you design large data structures, or if you want your code to be portable between architectures (you do want that, don't you?)).
To see how this affects pack and unpack, we'll compare these two
C structures:
- typedef struct {
- char c1;
- short s;
- char c2;
- long l;
- } gappy_t;
- typedef struct {
- long l;
- short s;
- char c1;
- char c2;
- } dense_t;
Typically, a C compiler allocates 12 bytes to a gappy_t
variable, but
requires only 8 bytes for a dense_t
. After investigating this further,
we can draw memory maps, showing where the extra 4 bytes are hidden:
- 0 +4 +8 +12
- +--+--+--+--+--+--+--+--+--+--+--+--+
- |c1|xx| s |c2|xx|xx|xx| l | xx = fill byte
- +--+--+--+--+--+--+--+--+--+--+--+--+
- gappy_t
- 0 +4 +8
- +--+--+--+--+--+--+--+--+
- | l | s |c1|c2|
- +--+--+--+--+--+--+--+--+
- dense_t
And that's where the first quirk strikes: pack and unpack
templates have to be stuffed with x
codes to get those extra fill bytes.
The natural question: "Why can't Perl compensate for the gaps?" warrants
an answer. One good reason is that C compilers might provide (non-ANSI)
extensions permitting all sorts of fancy control over the way structures
are aligned, even at the level of an individual structure field. And, if
this were not enough, there is an insidious thing called union
where
the amount of fill bytes cannot be derived from the alignment of the next
item alone.
OK, so let's bite the bullet. Here's one way to get the alignment right
by inserting template codes x
, which don't take a corresponding item
from the list:
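A sketch of such a template; the x codes supply the fill bytes shown in the memory map:

```perl
my ( $c1, $s, $c2, $l ) = ( 1, 2, 3, 4 );

# one pad byte after c1, three more before the long
my $gappy = pack( 'c x s c xxx l!', $c1, $s, $c2, $l );
```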
Note the !
after l
: We want to make sure that we pack a long
integer as it is compiled by our C compiler. And even now, it will only
work for the platforms where the compiler aligns things as above.
And somebody somewhere has a platform where it doesn't.
[Probably a Cray, where shorts, ints and longs are all 8 bytes. :-)]
Counting bytes and watching alignments in lengthy structures is bound to be a drag. Isn't there a way we can create the template with a simple program? Here's a C program that does the trick:
- #include <stdio.h>
- #include <stddef.h>
- typedef struct {
- char fc1;
- short fs;
- char fc2;
- long fl;
- } gappy_t;
- #define Pt(struct,field,tchar) \
- printf( "@%d%s ", offsetof(struct,field), # tchar );
- int main() {
- Pt( gappy_t, fc1, c );
- Pt( gappy_t, fs, s! );
- Pt( gappy_t, fc2, c );
- Pt( gappy_t, fl, l! );
- printf( "\n" );
- }
The output line can be used as a template in a pack or unpack call:
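Here, for instance, assuming the common layout above (the generator would print '@0c @2s! @4c @8l!'), the call might read:

```perl
# offsets produced by the C generator above; @N pads (pack) or
# seeks (unpack) to offset N
my $fmt   = '@0c @2s! @4c @8l!';
my $gappy = pack( $fmt, 1, 2, 3, 4 );
my @back  = unpack( $fmt, $gappy );
```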
Gee, yet another template code - as if we hadn't plenty. But
@
saves our day by enabling us to specify the offset from the beginning
of the pack buffer to the next item: This is just the value
the offsetof
macro (defined in <stddef.h>
) returns when
given a struct
type and one of its field names ("member-designator" in
C standardese).
Neither using offsets nor adding x
's to bridge the gaps is satisfactory.
(Just imagine what happens if the structure changes.) What we really need
is a way of saying "skip as many bytes as required to the next multiple of N".
In fluent Templatese, you say this with x!N
where N is replaced by the
appropriate value. Here's the next version of our struct packaging:
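A sketch (the alignment values 2 and 4 are assumed, as in the memory maps above):

```perl
# x!N inserts null bytes until the offset is a multiple of N
my $gappy = pack( 'c x!2 s c x!4 l!', 1, 2, 3, 4 );
```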
That's certainly better, but we still have to know how long all the
integers are, and portability is far away. Rather than 2
,
for instance, we want to say "however long a short is". But this can be
done by enclosing the appropriate pack code in brackets: [s]. So, here's
the very best we can do:
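In Templatese (a sketch):

```perl
# x![s] pads to the alignment of a native short, x![l!] to that of
# a native long -- no hard-coded sizes left in the template
my $gappy = pack( 'c x![s] s c x![l!] l!', 1, 2, 3, 4 );
```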
Now, imagine that we want to pack the data for a machine with a different byte-order. First, we'll have to figure out how big the data types on the target machine really are. Let's assume that the longs are 32 bits wide and the shorts are 16 bits wide. You can then rewrite the template as:
If the target machine is little-endian, we could write:
This forces the short and the long members to be little-endian, and is just fine if you don't have too many struct members. But we could also use the byte-order modifier on a group and write the following:
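The three variants just described might look like this (a sketch, using the fixed-size codes s and l for the target's 16-bit shorts and 32-bit longs):

```perl
my @fields = ( 1, 2, 3, 4 );

# fixed-size types, native byte order
my $native = pack( 'c x![s] s c x![l] l', @fields );

# little-endian, code by code
my $le1 = pack( 'c x![s] s< c x![l] l<', @fields );

# little-endian for the whole group
my $le2 = pack( '( c x![s] s c x![l] l )<', @fields );
```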
This is not as short as before, but it makes it more obvious that we intend to have little-endian byte-order for a whole group, not only for individual template codes. It can also be more readable and easier to maintain.
I'm afraid that we're not quite through with the alignment catch yet. The hydra raises another ugly head when you pack arrays of structures:
- typedef struct {
- short count;
- char glyph;
- } cell_t;
- typedef cell_t buffer_t[BUFLEN];
Where's the catch? Padding is neither required before the first field count
,
nor between this and the next field glyph
, so why can't we simply pack
like this:
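The naive attempt, as a sketch (@buffer is assumed to hold flattened count/glyph pairs):

```perl
my @buffer = ( 1, ord('A'), 2, ord('B') );   # count, glyph, ...

# native short plus one char per cell -- 3 bytes each, no padding
my $buf = pack( '(s! c)*', @buffer );
```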
This packs 3*@buffer
bytes, but it turns out that the size of
buffer_t
is four times BUFLEN
! The moral of the story is that
the required alignment of a structure or array is propagated to the
next higher level where we have to consider padding at the end
of each component as well. Thus the correct template is:
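In Templatese (a sketch): a trailing x![s] rounds each cell up to the structure's own alignment:

```perl
my @buffer = ( 1, ord('A'), 2, ord('B') );

# pad each cell to a multiple of the short's alignment
my $buf = pack( '(s! c x![s])*', @buffer );
```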
And even if you take all the above into account, ANSI still lets this:
- typedef struct {
- char foo[2];
- } foo_t;
vary in size. The alignment constraint of the structure can be greater than
any of its elements. [And if you think that this doesn't affect anything
common, dismember the next cellphone that you see. Many have ARM cores, and
the ARM structure rules make sizeof (foo_t)
== 4]
The title of this section indicates the second problem you may run into
sooner or later when you pack C structures. If the function you intend
to call expects a, say, void *
value, you cannot simply take
a reference to a Perl variable. (Although that value certainly is a
memory address, it's not the address where the variable's contents are
stored.)
Template code P
promises to pack a "pointer to a fixed length string".
Isn't this what we want? Let's try:
But wait: doesn't pack just return a sequence of bytes? How can we pass this
string of bytes to some C code expecting a pointer which is, after all,
nothing but a number? The answer is simple: We have to obtain the numeric
address from the bytes returned by pack.
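Putting the two steps together (a sketch; unpacking with L! assumes a pointer fits into an unsigned long, a caveat the next paragraph spells out):

```perl
my $memory = "\x00" x 16;              # allocate some memory
my $packed = pack( 'P', $memory );     # the bytes of a pointer to it
my $ptr    = unpack( 'L!', $packed );  # ... as a plain number
```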
Obviously this assumes that it is possible to typecast a pointer to an unsigned long and vice versa, which frequently works but should not be taken as a universal law. Now that we have this pointer the next question is: How can we put it to good use? We need a call to some C function where a pointer is expected. The read(2) system call comes to mind:
After reading perlfunc explaining how to use syscall we can write
this Perl function copying a file to standard output:
- require 'syscall.ph';
- sub cat($){
- my $path = shift();
- my $size = -s $path;
- my $memory = "\x00" x $size; # allocate some memory
- my $ptr = unpack( 'L', pack( 'P', $memory ) );
- open( F, $path ) || die( "$path: cannot open ($!)\n" );
- my $fd = fileno(F);
- my $res = syscall( &SYS_read, $fd, $ptr, $size );
- print $memory;
- close( F );
- }
This is neither a specimen of simplicity nor a paragon of portability but
it illustrates the point: We are able to sneak behind the scenes and
access Perl's otherwise well-guarded memory! (Important note: Perl's
syscall does not require you to construct pointers in this roundabout
way. You simply pass a string variable, and Perl forwards the address.)
How does unpack with P
work? Imagine some pointer in the buffer
about to be unpacked: If it isn't the null pointer (which will smartly
produce the undef value) we have a start address - but then what?
Perl has no way of knowing how long this "fixed length string" is, so
it's up to you to specify the actual size as an explicit length after P
.
As a consequence, pack ignores any number or *
after P
.
Now that we have seen P
at work, we might as well give p
a whirl.
Why do we need a second template code for packing pointers at all? The
answer lies behind the simple fact that an unpack with p
promises
a null-terminated string starting at the address taken from the buffer,
and that implies a length for the data item to be returned:
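For example (a sketch; packing a string literal is the safe case, as discussed below):

```perl
# pack a pointer to the (NUL-terminated) string, then let unpack
# follow it: the string's own terminator supplies the length
my $str = unpack( 'p', pack( 'p', 'sausage' ) );
```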
Albeit this is apt to be confusing: As a consequence of the length being
implied by the string's length, a number after pack code p
is a repeat
count, not a length as after P
.
Using pack(..., $x)
with P
or p
to get the address where $x
is
actually stored must be used with circumspection. Perl's internal machinery
considers the relation between a variable and that address as its very own
private matter and doesn't really care that we have obtained a copy. Therefore:
Do not use pack with p
or P
to obtain the address of a variable that's bound to go out of scope (and thereby free its memory) before you are done using the memory at that address.
Be very careful with Perl operations that change the value of the variable. Appending something to the variable, for instance, might require reallocation of its storage, leaving you with a pointer into no-man's land.
Don't think that you can get the address of a Perl variable
when it is stored as an integer or double number! pack('P', $x)
will
force the variable's internal representation to string, just as if you
had written something like $x .= ''
.
It's safe, however, to P- or p-pack a string literal, because Perl simply allocates an anonymous variable.
Here are a collection of (possibly) useful canned recipes for pack
and unpack:
- # Convert IP address for socket functions
- pack( "C4", split /\./, "123.4.5.6" );
- # Count the bits in a chunk of memory (e.g. a select vector)
- unpack( '%32b*', $mask );
- # Determine the endianness of your system
- $is_little_endian = unpack( 'c', pack( 's', 1 ) );
- $is_big_endian = unpack( 'xc', pack( 's', 1 ) );
- # Determine the number of bits in a native integer
- $bits = unpack( '%32b*', pack( 'I!', ~0 ) );
- # Prepare argument for the nanosleep system call
- my $timespec = pack( 'L!L!', $secs, $nanosecs );
For a simple memory dump we unpack some bytes into just as
many pairs of hex digits, and use map to handle the traditional
spacing - 16 bytes to a line:
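A sketch of such a dump routine:

```perl
sub hexdump {
    my ( $mem ) = @_;
    my @hex = unpack( '(H2)*', $mem );   # two hex digits per byte
    my $out = '';
    $out .= join( ' ', splice @hex, 0, 16 ) . "\n" while @hex;
    return $out;
}

print hexdump( "Now is the winter of our discontent\n" );
```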
Simon Cozens and Wolfgang Laun.
perlperf - Perl Performance and Optimization Techniques
This is an introduction to the use of performance and optimization techniques which can be used with particular reference to perl programs. While many perl developers have come from other languages, and can use their prior knowledge where appropriate, there are many other people who might benefit from a few perl specific pointers. If you want the condensed version, perhaps the best advice comes from the renowned Japanese Samurai, Miyamoto Musashi, who said:
- "Do Not Engage in Useless Activity"
in 1645.
Perhaps the most common mistake programmers make is to attempt to optimize their code before a program actually does anything useful - this is a bad idea. There's no point in having an extremely fast program that doesn't work. The first job is to get a program to correctly do something useful, (not to mention ensuring the test suite is fully functional), and only then to consider optimizing it. Having decided to optimize existing working code, there are several simple but essential steps to consider which are intrinsic to any optimization process.
Firstly, you need to establish a baseline time for the existing code; this timing needs to be reliable and repeatable. You'll probably want to use the
Benchmark
or Devel::NYTProf
modules, or something similar, for this step,
or perhaps the Unix system time utility, whichever is appropriate. See the
base of this document for a longer list of benchmarking and profiling modules,
and recommended further reading.
Next, having examined the program for hot spots, (places where the code
seems to run slowly), change the code with the intention of making it run
faster. Using version control software, like subversion
, will ensure no
changes are irreversible. It's too easy to fiddle here and fiddle there -
don't change too much at any one time or you might not discover which piece of
code really was the slow bit.
It's not enough to say: "that will make it run faster", you have to check it. Rerun the code under control of the benchmarking or profiling modules, from the first step above, and check that the new code executed the same task in less time. Save your work and repeat...
The critical thing when considering performance is to remember there is no such
thing as a Golden Bullet
, which is why there are no rules, only guidelines.
It is clear that inline code is going to be faster than subroutine or method calls, because there is less overhead, but this approach has the disadvantage of being less maintainable and comes at the cost of greater memory usage - there is no such thing as a free lunch. If you are searching for an element in a list, it can be more efficient to store the data in a hash structure, and then simply look to see whether the key is defined, rather than to loop through the entire array using grep(), for instance. substr() may be (a lot) faster than grep(), but not as flexible, so you have another trade-off to assess. Your code may contain a line which takes 0.01 of a second to execute; if you call it 1,000 times - quite likely in a program parsing even medium-sized files, for instance - you already have a 10-second delay in just one single code location, and if you call that line 100,000 times, your entire program will slow down to an unbearable crawl.
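The list-versus-hash point can be sketched like so (a toy example):

```perl
my @list = qw( alpha beta gamma delta );

# scanning the list: O(n) on every lookup
my $found_grep = grep { $_ eq 'gamma' } @list;

# build the hash once, then each probe is O(1)
my %index = map { $_ => 1 } @list;
my $found_hash = exists $index{gamma};
```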
Using a subroutine as part of your sort is a powerful way to get exactly what
you want, but will usually be slower than the built-in alphabetic cmp
and
numeric <=>
sort operators. It is possible to make multiple
passes over your data, building indices to make the upcoming sort more
efficient, and to use what is known as the OM
(Orcish Maneuver) to cache the
sort keys in advance. The cache lookup, while a good idea, can itself be a
source of slowdown by enforcing a double pass over the data - once to setup the
cache, and once to sort the data. Using pack() to extract the required sort
key into a consistent string can be an efficient way to build a single string
to compare, instead of using multiple sort keys, which makes it possible to use
the standard, written in c
and fast, perl sort() function on the output,
and is the basis of the GRT
(Guttman Rossler Transform). Some string
combinations can slow the GRT
down, by just being too complex for its own good.
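As an illustration of the GRT idea (a sketch; the records and field names are made up): pack each numeric sort key into a fixed-width big-endian string, sort the plain strings, then strip the prefix:

```perl
my @records = ( { name => 'ed', age => 61 },
                { name => 'al', age => 2  },
                { name => 'bo', age => 17 } );

# 'N' is 4 bytes, big-endian: string order equals numeric order
my @sorted = map  { substr( $_, 4 ) }
             sort
             map  { pack( 'N', $_->{age} ) . $_->{name} } @records;
```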
For applications using database backends, the standard DBIx namespace tries to help with keeping things nippy, not least because it tries not to query the database until the latest possible moment; but always read the docs which come with your choice of libraries. Among the many issues developers dealing with databases should remain aware of: always use SQL placeholders, and consider pre-fetching data sets when this might
prove advantageous. Splitting up a large file by assigning multiple processes
to parsing a single file, using say POE
, threads
or fork can also be a
useful way of optimizing your usage of the available CPU
resources, though
this technique is fraught with concurrency issues and demands high attention to
detail.
Every case has a specific application and one or more exceptions, and there is no replacement for running a few tests and finding out which method works best for your particular environment. This is why writing optimal code is not an exact science, and why we love using Perl so much - TMTOWTDI.
Here are a few examples to demonstrate usage of Perl's benchmarking tools.
I'm sure most of us have seen code which looks like, (or worse than), this:
- if ( $obj->{_ref}->{_myscore} >= $obj->{_ref}->{_yourscore} ) {
- ...
This sort of code can be a real eyesore to read, as well as being very
sensitive to typos, and it's much clearer to dereference the variable
explicitly. We're side-stepping the issue of working with object-oriented
programming techniques to encapsulate variable access via methods, only
accessible through an object. Here we're just discussing the technical
implementation of choice, and whether this has an effect on performance. We
can see whether this dereferencing operation, has any overhead by putting
comparative code in a file and running a Benchmark
test.
# dereference
- #!/usr/bin/perl
- use strict;
- use warnings;
- use Benchmark;
- my $ref = {
- 'ref' => {
- _myscore => '100 + 1',
- _yourscore => '102 - 1',
- },
- };
- timethese(1000000, {
- 'direct' => sub {
- my $x = $ref->{ref}->{_myscore} . $ref->{ref}->{_yourscore} ;
- },
- 'dereference' => sub {
- my $ref = $ref->{ref};
- my $myscore = $ref->{_myscore};
- my $yourscore = $ref->{_yourscore};
- my $x = $myscore . $yourscore;
- },
- });
It's essential to run any timing measurements a sufficient number of times so
the numbers settle on a numerical average, otherwise each run will naturally
fluctuate due to variations in the environment, to reduce the effect of
contention for CPU
resources and network bandwidth for instance. Running
the above code for one million iterations, we can take a look at the report
output by the Benchmark
module, to see which approach is the most effective.
- $> perl dereference
- Benchmark: timing 1000000 iterations of dereference, direct...
- dereference: 2 wallclock secs ( 1.59 usr + 0.00 sys = 1.59 CPU) @ 628930.82/s (n=1000000)
- direct: 1 wallclock secs ( 1.20 usr + 0.00 sys = 1.20 CPU) @ 833333.33/s (n=1000000)
The difference is clear to see and the dereferencing approach is slower. While it managed to execute an average of 628,930 times a second during our test, the direct approach managed to run an additional 204,403 times, unfortunately. Unfortunately, because there are many examples of code written using the multiple layer direct variable access, and it's usually horrible. It is, however, minusculely faster. The question remains whether the minute gain is actually worth the eyestrain, or the loss of maintainability.
If we have a string which needs to be modified, while a regex will almost always be much more flexible, tr, an oft underused tool, can still be useful. One scenario might be replacing all vowels with another character. The regex solution might look like this:
regex solution might look like this:
- $str =~ s/[aeiou]/x/g
The tr alternative might look like this:
- $str =~ tr/aeiou/xxxxx/
We can put that into a test file which we can run to check which approach is
the fastest, using a global $STR
variable to assign to the my $str
variable so as to avoid perl trying to optimize any of the work away by
noticing it's assigned only the once.
# regex-transliterate
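The test file itself was lost in transit; a reconstruction along the lines the text describes might look like this (the sample string in $STR is made up):

```perl
#!/usr/bin/perl
use strict;
use warnings;
use Benchmark;

# global source string, copied into a lexical inside each sub so
# perl can't optimize the work away
my $STR = "Now is the winter of our discontent";

timethese( 1000000, {
    'sr' => sub { my $str = $STR; $str =~ s/[aeiou]/x/g;   },
    'tr' => sub { my $str = $STR; $str =~ tr/aeiou/xxxxx/; },
});
```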
Running the code gives us our results:
- $> perl regex-transliterate
- Benchmark: timing 1000000 iterations of sr, tr...
- sr: 2 wallclock secs ( 1.19 usr + 0.00 sys = 1.19 CPU) @ 840336.13/s (n=1000000)
- tr: 0 wallclock secs ( 0.49 usr + 0.00 sys = 0.49 CPU) @ 2040816.33/s (n=1000000)
The tr version is a clear winner. One solution is flexible, the other is fast - and, as ever, it's the programmer's choice which to use.
Check the Benchmark
docs for further useful techniques.
A slightly larger piece of code will provide something on which a profiler can
produce more extensive reporting statistics. This example uses the simplistic
wordmatch
program which parses a given input file and spews out a short
report on the contents.
# wordmatch
- #!/usr/bin/perl
- use strict;
- use warnings;
- =head1 NAME
- filewords - word analysis of input file
- =head1 SYNOPSIS
- filewords -f inputfilename [-d]
- =head1 DESCRIPTION
- This program parses the given filename, specified with C<-f>, and displays a
- simple analysis of the words found therein. Use the C<-d> switch to enable
- debugging messages.
- =cut
- use FileHandle;
- use Getopt::Long;
- my $debug = 0;
- my $file = '';
- my $result = GetOptions (
- 'debug' => \$debug,
- 'file=s' => \$file,
- );
- die("invalid args") unless $result;
- unless ( -f $file ) {
- die("Usage: $0 -f filename [-d]");
- }
- my $FH = FileHandle->new("< $file") or die("unable to open file($file): $!");
- my $i_LINES = 0;
- my $i_WORDS = 0;
- my %count = ();
- my @lines = <$FH>;
- foreach my $line ( @lines ) {
- $i_LINES++;
- $line =~ s/\n//;
- my @words = split(/ +/, $line);
- my $i_words = scalar(@words);
- $i_WORDS = $i_WORDS + $i_words;
- debug("line: $i_LINES supplying $i_words words: @words");
- my $i_word = 0;
- foreach my $word ( @words ) {
- $i_word++;
- $count{$i_LINES}{spec} += matches($i_word, $word, '[^a-zA-Z0-9]');
- $count{$i_LINES}{only} += matches($i_word, $word, '^[^a-zA-Z0-9]+$');
- $count{$i_LINES}{cons} += matches($i_word, $word, '^[(?i:bcdfghjklmnpqrstvwxyz)]+$');
- $count{$i_LINES}{vows} += matches($i_word, $word, '^[(?i:aeiou)]+$');
- $count{$i_LINES}{caps} += matches($i_word, $word, '^[(A-Z)]+$');
- }
- }
- print report( %count );
- sub matches {
- my $i_wd = shift;
- my $word = shift;
- my $regex = shift;
- my $has = 0;
- if ( $word =~ /($regex)/ ) {
- $has++ if $1;
- }
- debug("word: $i_wd ".($has ? 'matches' : 'does not match')." chars: /$regex/");
- return $has;
- }
- sub report {
- my %report = @_;
- my %rep;
- foreach my $line ( keys %report ) {
- foreach my $key ( keys %{ $report{$line} } ) {
- $rep{$key} += $report{$line}{$key};
- }
- }
- my $report = qq|
- $0 report for $file:
- lines in file: $i_LINES
- words in file: $i_WORDS
- words with special (non-word) characters: $rep{spec}
- words with only special (non-word) characters: $rep{only}
- words with only consonants: $rep{cons}
- words with only capital letters: $rep{caps}
- words with only vowels: $rep{vows}
- |;
- return $report;
- }
- sub debug {
- my $message = shift;
- if ( $debug ) {
- print STDERR "DBG: $message\n";
- }
- }
- exit 0;
This venerable module has been the de-facto standard for Perl code profiling
for more than a decade, but has been replaced by a number of other modules
which have brought us back to the 21st century. Although you're recommended to
evaluate your tool from the several mentioned here and from the CPAN list at
the base of this document, (and currently Devel::NYTProf seems to be the
weapon of choice - see below), we'll take a quick look at the output from
Devel::DProf first, to set a baseline for Perl profiling tools. Run the
above program under the control of Devel::DProf
by using the -d
switch on
the command-line.
- $> perl -d:DProf wordmatch -f perl5db.pl
- <...multiple lines snipped...>
- wordmatch report for perl5db.pl:
- lines in file: 9428
- words in file: 50243
- words with special (non-word) characters: 20480
- words with only special (non-word) characters: 7790
- words with only consonants: 4801
- words with only capital letters: 1316
- words with only vowels: 1701
Devel::DProf
produces a special file, called tmon.out by default, and
this file is read by the dprofpp
program, which is already installed as part
of the Devel::DProf
distribution. If you call dprofpp
with no options,
it will read the tmon.out file in the current directory and produce a human
readable statistics report of the run of your program. Note that this may take
a little time.
- $> dprofpp
- Total Elapsed Time = 2.951677 Seconds
- User+System Time = 2.871677 Seconds
- Exclusive Times
- %Time ExclSec CumulS #Calls sec/call Csec/c Name
- 102. 2.945 3.003 251215 0.0000 0.0000 main::matches
- 2.40 0.069 0.069 260643 0.0000 0.0000 main::debug
- 1.74 0.050 0.050 1 0.0500 0.0500 main::report
- 1.04 0.030 0.049 4 0.0075 0.0123 main::BEGIN
- 0.35 0.010 0.010 3 0.0033 0.0033 Exporter::as_heavy
- 0.35 0.010 0.010 7 0.0014 0.0014 IO::File::BEGIN
- 0.00 - -0.000 1 - - Getopt::Long::FindOption
- 0.00 - -0.000 1 - - Symbol::BEGIN
- 0.00 - -0.000 1 - - Fcntl::BEGIN
- 0.00 - -0.000 1 - - Fcntl::bootstrap
- 0.00 - -0.000 1 - - warnings::BEGIN
- 0.00 - -0.000 1 - - IO::bootstrap
- 0.00 - -0.000 1 - - Getopt::Long::ConfigDefaults
- 0.00 - -0.000 1 - - Getopt::Long::Configure
- 0.00 - -0.000 1 - - Symbol::gensym
dprofpp
will produce some quite detailed reporting on the activity of the
wordmatch
program. The wallclock, user and system times are at the top of the analysis, and after this are the main columns which define the report. Check the dprofpp
docs for details of the many options it supports.
See also Apache::DProf
which hooks Devel::DProf
into mod_perl
.
Let's take a look at the same program using a different profiler:
Devel::Profiler
, a drop-in Perl-only replacement for Devel::DProf
. The
usage is very slightly different in that instead of using the special -d:
flag, you pull Devel::Profiler
in directly as a module using -M
.
- $> perl -MDevel::Profiler wordmatch -f perl5db.pl
- <...multiple lines snipped...>
- wordmatch report for perl5db.pl:
- lines in file: 9428
- words in file: 50243
- words with special (non-word) characters: 20480
- words with only special (non-word) characters: 7790
- words with only consonants: 4801
- words with only capital letters: 1316
- words with only vowels: 1701
Devel::Profiler
generates a tmon.out file which is compatible with the
dprofpp
program, thus saving the construction of a dedicated statistics
reader program. dprofpp
usage is therefore identical to the above example.
- $> dprofpp
- Total Elapsed Time = 20.984 Seconds
- User+System Time = 19.981 Seconds
- Exclusive Times
- %Time ExclSec CumulS #Calls sec/call Csec/c Name
- 49.0 9.792 14.509 251215 0.0000 0.0001 main::matches
- 24.4 4.887 4.887 260643 0.0000 0.0000 main::debug
- 0.25 0.049 0.049 1 0.0490 0.0490 main::report
- 0.00 0.000 0.000 1 0.0000 0.0000 Getopt::Long::GetOptions
- 0.00 0.000 0.000 2 0.0000 0.0000 Getopt::Long::ParseOptionSpec
- 0.00 0.000 0.000 1 0.0000 0.0000 Getopt::Long::FindOption
- 0.00 0.000 0.000 1 0.0000 0.0000 IO::File::new
- 0.00 0.000 0.000 1 0.0000 0.0000 IO::Handle::new
- 0.00 0.000 0.000 1 0.0000 0.0000 Symbol::gensym
- 0.00 0.000 0.000 1 0.0000 0.0000 IO::File::open
Interestingly, we get slightly different results, which is mostly because the
algorithm which generates the report is different, even though the output file
format was allegedly identical. The elapsed, user and system times are clearly
showing the time it took for Devel::Profiler
to execute its own run, but
the column listings feel more accurate somehow than the ones we had earlier
from Devel::DProf
. The 102% figure has disappeared, for example. This is
where we have to use the tools at our disposal, and recognise their pros and
cons, before using them. Interestingly, the numbers of calls for each
subroutine are identical in the two reports; it's the percentages which differ.
As the author of Devel::Profiler
writes:
- ...running HTML::Template's test suite under Devel::DProf shows output()
- taking NO time but Devel::Profiler shows around 10% of the time is in output().
- I don't know which to trust but my gut tells me something is wrong with
- Devel::DProf. HTML::Template::output() is a big routine that's called for
- every test. Either way, something needs fixing.
YMMV.
See also Devel::Apache::Profiler
which hooks Devel::Profiler
into mod_perl
.
The Devel::SmallProf
profiler examines the runtime of your Perl program and
produces a line-by-line listing to show how many times each line was called,
and how long each line took to execute. It is called by supplying the familiar
-d
flag to Perl at runtime.
- $> perl -d:SmallProf wordmatch -f perl5db.pl
- <...multiple lines snipped...>
- wordmatch report for perl5db.pl:
- lines in file: 9428
- words in file: 50243
- words with special (non-word) characters: 20480
- words with only special (non-word) characters: 7790
- words with only consonants: 4801
- words with only capital letters: 1316
- words with only vowels: 1701
Devel::SmallProf
writes its output into a file called smallprof.out, by
default. The format of the file looks like this:
- <num> <time> <ctime> <line>:<text>
When the program has terminated, the output may be examined and sorted using any standard text filtering utilities. Something like the following may be sufficient:
- $> cat smallprof.out | grep \d*: | sort -k3 | tac | head -n20
- 251215 1.65674 7.68000 75: if ( $word =~ /($regex)/ ) {
- 251215 0.03264 4.40000 79: debug("word: $i_wd ".($has ? 'matches' :
- 251215 0.02693 4.10000 81: return $has;
- 260643 0.02841 4.07000 128: if ( $debug ) {
- 260643 0.02601 4.04000 126: my $message = shift;
- 251215 0.02641 3.91000 73: my $has = 0;
- 251215 0.03311 3.71000 70: my $i_wd = shift;
- 251215 0.02699 3.69000 72: my $regex = shift;
- 251215 0.02766 3.68000 71: my $word = shift;
- 50243 0.59726 1.00000 59: $count{$i_LINES}{cons} =
- 50243 0.48175 0.92000 61: $count{$i_LINES}{spec} =
- 50243 0.00644 0.89000 56: my $i_cons = matches($i_word, $word,
- 50243 0.48837 0.88000 63: $count{$i_LINES}{caps} =
- 50243 0.00516 0.88000 58: my $i_caps = matches($i_word, $word, '^[(A-
- 50243 0.00631 0.81000 54: my $i_spec = matches($i_word, $word, '[^a-
- 50243 0.00496 0.80000 57: my $i_vows = matches($i_word, $word,
- 50243 0.00688 0.80000 53: $i_word++;
- 50243 0.48469 0.79000 62: $count{$i_LINES}{only} =
- 50243 0.48928 0.77000 60: $count{$i_LINES}{vows} =
- 50243 0.00683 0.75000 55: my $i_only = matches($i_word, $word, '^[^a-
You can immediately see a slightly different focus to the subroutine profiling modules, and we start to see exactly which line of code is taking the most time. That regex line is looking a bit suspicious, for example. Remember that these tools are supposed to be used together: there is no single best way to profile your code, so you need to use the best tools for the job.
See also Apache::SmallProf
which hooks Devel::SmallProf
into mod_perl
.
Devel::FastProf
is another Perl line profiler. This was written with a view
to getting a faster line profiler than is possible with, for example,
Devel::SmallProf
, because it's written in C
. To use Devel::FastProf
,
supply the -d
argument to Perl:
- $> perl -d:FastProf wordmatch -f perl5db.pl
- <...multiple lines snipped...>
- wordmatch report for perl5db.pl:
- lines in file: 9428
- words in file: 50243
- words with special (non-word) characters: 20480
- words with only special (non-word) characters: 7790
- words with only consonants: 4801
- words with only capital letters: 1316
- words with only vowels: 1701
Devel::FastProf
writes statistics to the file fastprof.out in the current
directory. The output file, which can be specified, can be interpreted by using
the fprofpp
command-line program.
- $> fprofpp | head -n20
- # fprofpp output format is:
- # filename:line time count: source
- wordmatch:75 3.93338 251215: if ( $word =~ /($regex)/ ) {
- wordmatch:79 1.77774 251215: debug("word: $i_wd ".($has ? 'matches' : 'does not match')." chars: /$regex/");
- wordmatch:81 1.47604 251215: return $has;
- wordmatch:126 1.43441 260643: my $message = shift;
- wordmatch:128 1.42156 260643: if ( $debug ) {
- wordmatch:70 1.36824 251215: my $i_wd = shift;
- wordmatch:71 1.36739 251215: my $word = shift;
- wordmatch:72 1.35939 251215: my $regex = shift;
Straightaway we can see that the number of times each line has been called is
identical to the Devel::SmallProf
output, and the sequence is only very
slightly different, based on the amount of time each line took
to execute: if ( $debug ) { and my $message = shift;
, for example, have swapped places. The
differences in the actual times recorded might be in the algorithm used
internally, or it could be due to system resource limitations or contention.
See also DBIx::Profile, which will profile database queries running
under the DBIx::*
namespace.
Devel::NYTProf
is the next generation of Perl code profiler, fixing many
shortcomings in other tools and implementing many cool features. First of all it
can be used as either a line profiler, a block or a subroutine
profiler, all at once. It can also use sub-microsecond (100ns) resolution on
systems which provide clock_gettime()
. It can be started and stopped even
by the program being profiled. It's a one-line entry to profile mod_perl
applications. It's written in C
and is probably the fastest profiler
available for Perl. The list of coolness just goes on. Enough of that, let's
see how it works - just use the familiar -d
switch to plug it in and run
the code.
- $> perl -d:NYTProf wordmatch -f perl5db.pl
- wordmatch report for perl5db.pl:
- lines in file: 9427
- words in file: 50243
- words with special (non-word) characters: 20480
- words with only special (non-word) characters: 7790
- words with only consonants: 4801
- words with only capital letters: 1316
- words with only vowels: 1701
NYTProf
will generate a report database into the file nytprof.out by
default. Human readable reports can be generated from here by using the
supplied nytprofhtml
(HTML output) and nytprofcsv
(CSV output) programs.
We've used the Unix system html2text
utility to convert the
nytprof/index.html file for convenience here.
- $> html2text nytprof/index.html
- Performance Profile Index
- For wordmatch
- Run on Fri Sep 26 13:46:39 2008
- Reported on Fri Sep 26 13:47:23 2008
- Top 15 Subroutines -- ordered by exclusive time
- |Calls |P |F |Inclusive|Exclusive|Subroutine |
- | | | |Time |Time | |
- |251215|5 |1 |13.09263 |10.47692 |main:: |matches |
- |260642|2 |1 |2.71199 |2.71199 |main:: |debug |
- |1 |1 |1 |0.21404 |0.21404 |main:: |report |
- |2 |2 |2 |0.00511 |0.00511 |XSLoader:: |load (xsub) |
- |14 |14|7 |0.00304 |0.00298 |Exporter:: |import |
- |3 |1 |1 |0.00265 |0.00254 |Exporter:: |as_heavy |
- |10 |10|4 |0.00140 |0.00140 |vars:: |import |
- |13 |13|1 |0.00129 |0.00109 |constant:: |import |
- |1 |1 |1 |0.00360 |0.00096 |FileHandle:: |import |
- |3 |3 |3 |0.00086 |0.00074 |warnings::register::|import |
- |9 |3 |1 |0.00036 |0.00036 |strict:: |bits |
- |13 |13|13|0.00032 |0.00029 |strict:: |import |
- |2 |2 |2 |0.00020 |0.00020 |warnings:: |import |
- |2 |1 |1 |0.00020 |0.00020 |Getopt::Long:: |ParseOptionSpec|
- |7 |7 |6 |0.00043 |0.00020 |strict:: |unimport |
- For more information see the full list of 189 subroutines.
The first part of the report already shows the critical information regarding which subroutines are using the most time. The next gives some statistics about the source files profiled.
- Source Code Files -- ordered by exclusive time then name
- |Stmts |Exclusive|Avg. |Reports |Source File |
- | |Time | | | |
- |2699761|15.66654 |6e-06 |line . block . sub|wordmatch |
- |35 |0.02187 |0.00062|line . block . sub|IO/Handle.pm |
- |274 |0.01525 |0.00006|line . block . sub|Getopt/Long.pm |
- |20 |0.00585 |0.00029|line . block . sub|Fcntl.pm |
- |128 |0.00340 |0.00003|line . block . sub|Exporter/Heavy.pm |
- |42 |0.00332 |0.00008|line . block . sub|IO/File.pm |
- |261 |0.00308 |0.00001|line . block . sub|Exporter.pm |
- |323 |0.00248 |8e-06 |line . block . sub|constant.pm |
- |12 |0.00246 |0.00021|line . block . sub|File/Spec/Unix.pm |
- |191 |0.00240 |0.00001|line . block . sub|vars.pm |
- |77 |0.00201 |0.00003|line . block . sub|FileHandle.pm |
- |12 |0.00198 |0.00016|line . block . sub|Carp.pm |
- |14 |0.00175 |0.00013|line . block . sub|Symbol.pm |
- |15 |0.00130 |0.00009|line . block . sub|IO.pm |
- |22 |0.00120 |0.00005|line . block . sub|IO/Seekable.pm |
- |198 |0.00085 |4e-06 |line . block . sub|warnings/register.pm|
- |114 |0.00080 |7e-06 |line . block . sub|strict.pm |
- |47 |0.00068 |0.00001|line . block . sub|warnings.pm |
- |27 |0.00054 |0.00002|line . block . sub|overload.pm |
- |9 |0.00047 |0.00005|line . block . sub|SelectSaver.pm |
- |13 |0.00045 |0.00003|line . block . sub|File/Spec.pm |
- |2701595|15.73869 | |Total |
- |128647 |0.74946 | |Average |
- | |0.00201 |0.00003|Median |
- | |0.00121 |0.00003|Deviation |
- Report produced by the NYTProf 2.03 Perl profiler, developed by Tim Bunce and
- Adam Kaplan.
At this point, if you're using the html report, you can click through the various links to bore down into each subroutine and each line of code. Because we're using the text reporting here, and there's a whole directory full of reports built for each source file, we'll just display a part of the corresponding wordmatch-line.html file, sufficient to give an idea of the sort of output you can expect from this cool tool.
- $> html2text nytprof/wordmatch-line.html
- Performance Profile -- -block view-.-line view-.-sub view-
- For wordmatch
- Run on Fri Sep 26 13:46:39 2008
- Reported on Fri Sep 26 13:47:22 2008
- File wordmatch
- Subroutines -- ordered by exclusive time
- |Calls |P|F|Inclusive|Exclusive|Subroutine |
- | | | |Time |Time | |
- |251215|5|1|13.09263 |10.47692 |main::|matches|
- |260642|2|1|2.71199 |2.71199 |main::|debug |
- |1 |1|1|0.21404 |0.21404 |main::|report |
- |0 |0|0|0 |0 |main::|BEGIN |
- |Line|Stmts.|Exclusive|Avg. |Code |
- | | |Time | | |
- |1 | | | |#!/usr/bin/perl |
- |2 | | | | |
- | | | | |use strict; |
- |3 |3 |0.00086 |0.00029|# spent 0.00003s making 1 calls to strict:: |
- | | | | |import |
- | | | | |use warnings; |
- |4 |3 |0.01563 |0.00521|# spent 0.00012s making 1 calls to warnings:: |
- | | | | |import |
- |5 | | | | |
- |6 | | | |=head1 NAME |
- |7 | | | | |
- |8 | | | |filewords - word analysis of input file |
- <...snip...>
- |62 |1 |0.00445 |0.00445|print report( %count ); |
- | | | | |# spent 0.21404s making 1 calls to main::report|
- |63 | | | | |
- | | | | |# spent 23.56955s (10.47692+2.61571) within |
- | | | | |main::matches which was called 251215 times, |
- | | | | |avg 0.00005s/call: # 50243 times |
- | | | | |(2.12134+0.51939s) at line 57 of wordmatch, avg|
- | | | | |0.00005s/call # 50243 times (2.17735+0.54550s) |
- |64 | | | |at line 56 of wordmatch, avg 0.00005s/call # |
- | | | | |50243 times (2.10992+0.51797s) at line 58 of |
- | | | | |wordmatch, avg 0.00005s/call # 50243 times |
- | | | | |(2.12696+0.51598s) at line 55 of wordmatch, avg|
- | | | | |0.00005s/call # 50243 times (1.94134+0.51687s) |
- | | | | |at line 54 of wordmatch, avg 0.00005s/call |
- | | | | |sub matches { |
- <...snip...>
- |102 | | | | |
- | | | | |# spent 2.71199s within main::debug which was |
- | | | | |called 260642 times, avg 0.00001s/call: # |
- | | | | |251215 times (2.61571+0s) by main::matches at |
- |103 | | | |line 74 of wordmatch, avg 0.00001s/call # 9427 |
- | | | | |times (0.09628+0s) at line 50 of wordmatch, avg|
- | | | | |0.00001s/call |
- | | | | |sub debug { |
- |104 |260642|0.58496 |2e-06 |my $message = shift; |
- |105 | | | | |
- |106 |260642|1.09917 |4e-06 |if ( $debug ) { |
- |107 | | | |print STDERR "DBG: $message\n"; |
- |108 | | | |} |
- |109 | | | |} |
- |110 | | | | |
- |111 |1 |0.01501 |0.01501|exit 0; |
- |112 | | | | |
Oodles of very useful information in there - this seems to be the way forward.
See also Devel::NYTProf::Apache
which hooks Devel::NYTProf
into mod_perl
.
Perl modules are not the only tools a performance analyst has at their
disposal; system tools like time should not be overlooked, as the next
example shows, where we take a quick look at sorting. Many books, theses and
articles have been written about efficient sorting algorithms, and this is not
the place to repeat such work; there are several good sorting modules which
deserve taking a look at too: Sort::Maker
, Sort::Key
spring to mind.
However, it's still possible to make some observations on certain Perl-specific
issues relating to sorting data sets, and to give an example or
two with regard to how sorting large data volumes can affect performance.
Firstly, an often overlooked point: when sorting large amounts of data, one can
attempt to reduce the data set to be dealt with, and in many cases grep() can
be quite useful as a simple filter:
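For instance (a minimal sketch; $filter and @incoming are hypothetical stand-ins for a real pattern and data source), the filter and the sort combine naturally into a single statement:

```perl
use strict;
use warnings;

# Hypothetical input: only some of the lines are of interest.
my $filter   = qr/ERROR/;
my @incoming = ( "WARN disk", "ERROR net", "INFO ok", "ERROR disk" );

# grep() discards the uninteresting lines before sort() ever sees them,
# so the sort operates on a much smaller list.
my @data = sort grep { /$filter/ } @incoming;

print "@data\n";    # ERROR disk ERROR net
```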
A command such as this can vastly reduce the volume of material to actually
sort through in the first place, and should not be too lightly disregarded
purely on the basis of its simplicity. The KISS
principle is too often
overlooked - the next example uses the simple system time utility to
demonstrate. Let's take a look at an actual example of sorting the contents of
a large file, an apache logfile would do. This one has over a quarter of a
million lines, is 50M in size, and a snippet of it looks like this:
# logfile
- 188.209-65-87.adsl-dyn.isp.belgacom.be - - [08/Feb/2007:12:57:16 +0000] "GET /favicon.ico HTTP/1.1" 404 209 "-" "Mozilla/4.0 (compatible; MSIE 6.0; Windows NT 5.1; SV1)"
- 188.209-65-87.adsl-dyn.isp.belgacom.be - - [08/Feb/2007:12:57:16 +0000] "GET /favicon.ico HTTP/1.1" 404 209 "-" "Mozilla/4.0 (compatible; MSIE 6.0; Windows NT 5.1; SV1)"
- 151.56.71.198 - - [08/Feb/2007:12:57:41 +0000] "GET /suse-on-vaio.html HTTP/1.1" 200 2858 "http://www.linux-on-laptops.com/sony.html" "Mozilla/5.0 (Windows; U; Windows NT 5.2; en-US; rv:1.8.1.1) Gecko/20061204 Firefox/2.0.0.1"
- 151.56.71.198 - - [08/Feb/2007:12:57:42 +0000] "GET /data/css HTTP/1.1" 404 206 "http://www.rfi.net/suse-on-vaio.html" "Mozilla/5.0 (Windows; U; Windows NT 5.2; en-US; rv:1.8.1.1) Gecko/20061204 Firefox/2.0.0.1"
- 151.56.71.198 - - [08/Feb/2007:12:57:43 +0000] "GET /favicon.ico HTTP/1.1" 404 209 "-" "Mozilla/5.0 (Windows; U; Windows NT 5.2; en-US; rv:1.8.1.1) Gecko/20061204 Firefox/2.0.0.1"
- 217.113.68.60 - - [08/Feb/2007:13:02:15 +0000] "GET / HTTP/1.1" 304 - "-" "Mozilla/4.0 (compatible; MSIE 6.0; Windows NT 5.1; SV1)"
- 217.113.68.60 - - [08/Feb/2007:13:02:16 +0000] "GET /data/css HTTP/1.1" 404 206 "http://www.rfi.net/" "Mozilla/4.0 (compatible; MSIE 6.0; Windows NT 5.1; SV1)"
- debora.to.isac.cnr.it - - [08/Feb/2007:13:03:58 +0000] "GET /suse-on-vaio.html HTTP/1.1" 200 2858 "http://www.linux-on-laptops.com/sony.html" "Mozilla/5.0 (compatible; Konqueror/3.4; Linux) KHTML/3.4.0 (like Gecko)"
- debora.to.isac.cnr.it - - [08/Feb/2007:13:03:58 +0000] "GET /data/css HTTP/1.1" 404 206 "http://www.rfi.net/suse-on-vaio.html" "Mozilla/5.0 (compatible; Konqueror/3.4; Linux) KHTML/3.4.0 (like Gecko)"
- debora.to.isac.cnr.it - - [08/Feb/2007:13:03:58 +0000] "GET /favicon.ico HTTP/1.1" 404 209 "-" "Mozilla/5.0 (compatible; Konqueror/3.4; Linux) KHTML/3.4.0 (like Gecko)"
- 195.24.196.99 - - [08/Feb/2007:13:26:48 +0000] "GET / HTTP/1.0" 200 3309 "-" "Mozilla/5.0 (Windows; U; Windows NT 5.1; fr; rv:1.8.0.9) Gecko/20061206 Firefox/1.5.0.9"
- 195.24.196.99 - - [08/Feb/2007:13:26:58 +0000] "GET /data/css HTTP/1.0" 404 206 "http://www.rfi.net/" "Mozilla/5.0 (Windows; U; Windows NT 5.1; fr; rv:1.8.0.9) Gecko/20061206 Firefox/1.5.0.9"
- 195.24.196.99 - - [08/Feb/2007:13:26:59 +0000] "GET /favicon.ico HTTP/1.0" 404 209 "-" "Mozilla/5.0 (Windows; U; Windows NT 5.1; fr; rv:1.8.0.9) Gecko/20061206 Firefox/1.5.0.9"
- crawl1.cosmixcorp.com - - [08/Feb/2007:13:27:57 +0000] "GET /robots.txt HTTP/1.0" 200 179 "-" "voyager/1.0"
- crawl1.cosmixcorp.com - - [08/Feb/2007:13:28:25 +0000] "GET /links.html HTTP/1.0" 200 3413 "-" "voyager/1.0"
- fhm226.internetdsl.tpnet.pl - - [08/Feb/2007:13:37:32 +0000] "GET /suse-on-vaio.html HTTP/1.1" 200 2858 "http://www.linux-on-laptops.com/sony.html" "Mozilla/4.0 (compatible; MSIE 6.0; Windows NT 5.1; SV1)"
- fhm226.internetdsl.tpnet.pl - - [08/Feb/2007:13:37:34 +0000] "GET /data/css HTTP/1.1" 404 206 "http://www.rfi.net/suse-on-vaio.html" "Mozilla/4.0 (compatible; MSIE 6.0; Windows NT 5.1; SV1)"
- 80.247.140.134 - - [08/Feb/2007:13:57:35 +0000] "GET / HTTP/1.1" 200 3309 "-" "Mozilla/4.0 (compatible; MSIE 6.0; Windows NT 5.1; .NET CLR 1.1.4322)"
- 80.247.140.134 - - [08/Feb/2007:13:57:37 +0000] "GET /data/css HTTP/1.1" 404 206 "http://www.rfi.net" "Mozilla/4.0 (compatible; MSIE 6.0; Windows NT 5.1; .NET CLR 1.1.4322)"
- pop.compuscan.co.za - - [08/Feb/2007:14:10:43 +0000] "GET / HTTP/1.1" 200 3309 "-" "www.clamav.net"
- livebot-207-46-98-57.search.live.com - - [08/Feb/2007:14:12:04 +0000] "GET /robots.txt HTTP/1.0" 200 179 "-" "msnbot/1.0 (+http://search.msn.com/msnbot.htm)"
- livebot-207-46-98-57.search.live.com - - [08/Feb/2007:14:12:04 +0000] "GET /html/oracle.html HTTP/1.0" 404 214 "-" "msnbot/1.0 (+http://search.msn.com/msnbot.htm)"
- dslb-088-064-005-154.pools.arcor-ip.net - - [08/Feb/2007:14:12:15 +0000] "GET / HTTP/1.1" 200 3309 "-" "www.clamav.net"
- 196.201.92.41 - - [08/Feb/2007:14:15:01 +0000] "GET / HTTP/1.1" 200 3309 "-" "MOT-L7/08.B7.DCR MIB/2.2.1 Profile/MIDP-2.0 Configuration/CLDC-1.1"
The specific task here is to sort the 286,525 lines of this file by Response Code, Query, Browser, Referring Url, and lastly Date. One solution might be to use the following code, which iterates over the files given on the command-line.
# sort-apache-log
- #!/usr/bin/perl -n
- use strict;
- use warnings;
- my @data;
- LINE:
- while ( <> ) {
- my $line = $_;
- if (
- $line =~ m/^(
- ([\w\.\-]+) # client
- \s*-\s*-\s*\[
- ([^]]+) # date
- \]\s*"\w+\s*
- (\S+) # query
- [^"]+"\s*
- (\d+) # status
- \s+\S+\s+"[^"]*"\s+"
- ([^"]*) # browser
- "
- .*
- )$/x
- ) {
- my @chunks = split(/ +/, $line);
- my $ip = $1;
- my $date = $2;
- my $query = $3;
- my $status = $4;
- my $browser = $5;
- push(@data, [$ip, $date, $query, $status, $browser, $line]);
- }
- }
- my @sorted = sort {
- $a->[3] cmp $b->[3]
- ||
- $a->[2] cmp $b->[2]
- ||
- $a->[0] cmp $b->[0]
- ||
- $a->[1] cmp $b->[1]
- ||
- $a->[4] cmp $b->[4]
- } @data;
- foreach my $data ( @sorted ) {
- print $data->[5];
- }
- exit 0;
When running this program, redirect STDOUT
so it is possible to check that the
output is correct across subsequent test runs, and use the system time utility
to check the overall runtime.
- $> time ./sort-apache-log logfile > out-sort
- real 0m17.371s
- user 0m15.757s
- sys 0m0.592s
The program took just over 17 wallclock seconds to run. Note the different
values time outputs: it's important always to compare the same one, and not to
confuse what each one means.
The overall, or wallclock, time is the time between when time was called and
when it terminates. The elapsed time includes both user and system times, and
time spent waiting for other users and processes on the system. Inevitably,
this is the most approximate of the measurements given.
The user time is the amount of time the entire process spent on behalf of the user on this system executing this program.
The system time is the amount of time the kernel itself spent executing routines, or system calls, on behalf of this process.
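The distinction is easy to see with the time utility itself (a minimal sketch, assuming a POSIX-style shell): a process which merely waits accumulates wallclock time but almost no user or system time.

```shell
# 'sleep' spends its life blocked in the kernel waiting on a timer,
# so 'real' is roughly one second while 'user' and 'sys' stay near zero.
time sleep 1
```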
Running this same process as a Schwartzian Transform
it is possible to
eliminate the input and output arrays for storing all the data, and work on the
input directly as it arrives too. Otherwise, the code looks fairly similar:
# sort-apache-log-schwarzian
- #!/usr/bin/perl -n
- use strict;
- use warnings;
- map $_->[0] =>
- sort {
- $a->[4] cmp $b->[4]
- ||
- $a->[3] cmp $b->[3]
- ||
- $a->[1] cmp $b->[1]
- ||
- $a->[2] cmp $b->[2]
- ||
- $a->[5] cmp $b->[5]
- }
- map [ $_, m/^(
- ([\w\.\-]+) # client
- \s*-\s*-\s*\[
- ([^]]+) # date
- \]\s*"\w+\s*
- (\S+) # query
- [^"]+"\s*
- (\d+) # status
- \s+\S+\s+"[^"]*"\s+"
- ([^"]*) # browser
- "
- .*
- )$/xo ]
- => <>;
- exit 0;
Run the new code against the same logfile, as above, to check the new time.
- $> time ./sort-apache-log-schwarzian logfile > out-schwarz
- real 0m9.664s
- user 0m8.873s
- sys 0m0.704s
The time has been cut in half, which is a respectable speed improvement by any
standard. Naturally, it is important to check that the output is consistent with
the first program run; this is where the Unix system cksum
utility comes in.
- $> cksum out-sort out-schwarz
- 3044173777 52029194 out-sort
- 3044173777 52029194 out-schwarz
Beware, too, of pressure from managers who see you speed a program up by 50% of the runtime once, only to get a request one month later to do the same again (true story) - you'll just have to point out that you're only human, even if you are a Perl programmer, and you'll see what you can do...
An essential part of any good development process is appropriate error handling with appropriately informative messages; however, there exists a school of thought which suggests that log files should be chatty, as if the chain of unbroken output somehow ensures the survival of the program. If speed is in any way an issue, this approach is wrong.
A common sight is code which looks something like this:
- logger->debug( "A logging message via process-id: $$ INC: " . Dumper(\%INC) )
The problem is that this code will always be parsed and executed, even when the
debug level set in the logging configuration file is zero. Once the debug()
subroutine has been entered, and the internal $debug
variable confirmed to
be zero, for example, the message which has been sent in will be discarded and
the program will continue. In the example given, though, the \%INC hash will
already have been dumped and the message string constructed, work which
could all be bypassed by a debug variable at the statement level, like this:
- logger->debug( "A logging message via process-id: $$ INC: " . Dumper(\%INC) ) if $DEBUG;
This effect can be demonstrated by setting up a test script with both forms,
including a debug()
subroutine to emulate typical logger()
functionality.
# ifdebug
- #!/usr/bin/perl
- use strict;
- use warnings;
- use Benchmark;
- use Data::Dumper;
- my $DEBUG = 0;
- sub debug {
- my $msg = shift;
- if ( $DEBUG ) {
- print "DEBUG: $msg\n";
- }
- };
- timethese(100000, {
- 'debug' => sub {
- debug( "A $0 logging message via process-id: $$" . Dumper(\%INC) )
- },
- 'ifdebug' => sub {
- debug( "A $0 logging message via process-id: $$" . Dumper(\%INC) ) if $DEBUG
- },
- });
Let's see what Benchmark
makes of this:
- $> perl ifdebug
- Benchmark: timing 100000 iterations of constant, sub...
- ifdebug: 0 wallclock secs ( 0.01 usr + 0.00 sys = 0.01 CPU) @ 10000000.00/s (n=100000)
- (warning: too few iterations for a reliable count)
- debug: 14 wallclock secs (13.18 usr + 0.04 sys = 13.22 CPU) @ 7564.30/s (n=100000)
In the one case the code, which does exactly the same thing as far as
outputting any debugging information is concerned (in other words, nothing),
takes 14 seconds, while in the other case the code takes one hundredth of a
second. Looks fairly definitive. Use a $DEBUG
variable BEFORE you call the
subroutine, rather than relying on the smart functionality inside it.
It's possible to take the previous idea a little further, by using a compile
time DEBUG
constant.
# ifdebug-constant
- #!/usr/bin/perl
- use strict;
- use warnings;
- use Benchmark;
- use Data::Dumper;
- use constant
- DEBUG => 0
- ;
- sub debug {
- if ( DEBUG ) {
- my $msg = shift;
- print "DEBUG: $msg\n";
- }
- };
- timethese(100000, {
- 'debug' => sub {
- debug( "A $0 logging message via process-id: $$" . Dumper(\%INC) )
- },
- 'constant' => sub {
- debug( "A $0 logging message via process-id: $$" . Dumper(\%INC) ) if DEBUG
- },
- });
Running this program produces the following output:
- $> perl ifdebug-constant
- Benchmark: timing 100000 iterations of constant, sub...
- constant: 0 wallclock secs (-0.00 usr + 0.00 sys = -0.00 CPU) @ -7205759403792793600000.00/s (n=100000)
- (warning: too few iterations for a reliable count)
- sub: 14 wallclock secs (13.09 usr + 0.00 sys = 13.09 CPU) @ 7639.42/s (n=100000)
The DEBUG
constant wipes the floor with even the $debug
variable,
clocking in at minus zero seconds, and generates a "warning: too few iterations
for a reliable count" message into the bargain. To see what is really going
on, and why we had too few iterations when we thought we asked for 100000, we
can use the very useful B::Deparse
to inspect the new code:
- $> perl -MO=Deparse ifdebug-constant
- use Benchmark;
- use Data::Dumper;
- use constant ('DEBUG', 0);
- sub debug {
- use warnings;
- use strict 'refs';
- 0;
- }
- use warnings;
- use strict 'refs';
- timethese(100000, {'sub', sub {
- debug "A $0 logging message via process-id: $$" . Dumper(\%INC);
- }
- , 'constant', sub {
- 0;
- }
- });
- ifdebug-constant syntax OK
The output shows the constant() subroutine we're testing being replaced with
the value of the DEBUG
constant: zero. The line to be tested has been
completely optimized away, and you can't get much more efficient than that.
This document has provided several ways to go about identifying hot-spots, and checking whether any modifications have improved the runtime of the code.
As a final thought, remember that it's not (at the time of writing) possible to produce a useful program which will run in zero or negative time, and this basic principle can be written as: useful programs are slow by their very definition. It is of course possible to write a nearly instantaneous program, but it's not going to do very much; here's a very efficient one:
- $> perl -e 0
Optimizing that any further is a job for p5p
.
Further reading can be found using the modules and links below.
For example: perldoc -f sort
.
perlfork, perlfunc, perlretut, perlthrtut.
time.
It's not possible to individually showcase all the performance related code for Perl here, naturally, but here's a short list of modules from the CPAN which deserve further attention.
- Apache::DProf
- Apache::SmallProf
- Benchmark
- DBIx::Profile
- Devel::AutoProfiler
- Devel::DProf
- Devel::DProfLB
- Devel::FastProf
- Devel::GraphVizProf
- Devel::NYTProf
- Devel::NYTProf::Apache
- Devel::Profiler
- Devel::Profile
- Devel::Profit
- Devel::SmallProf
- Devel::WxProf
- POE::Devel::Profiler
- Sort::Key
- Sort::Maker
Very useful online reference material:
- http://www.ccl4.org/~nick/P/Fast_Enough/
- http://www-128.ibm.com/developerworks/library/l-optperl.html
- http://perlbuzz.com/2007/11/bind-output-variables-in-dbi-for-speed-and-safety.html
- http://en.wikipedia.org/wiki/Performance_analysis
- http://apache.perl.org/docs/1.0/guide/performance.html
- http://perlgolf.sourceforge.net/
- http://www.sysarch.com/Perl/sort_paper.html
Richard Foley <richard.foley@rfi.net> Copyright (c) 2008
perlplan9 - Plan 9-specific documentation for Perl
These are a few notes describing features peculiar to Plan 9 Perl. As such, it is not intended to be a replacement for the rest of the Perl 5 documentation (which is both copious and excellent). If you have any questions to which you can't find answers in these man pages, contact Luther Huffman at lutherh@stratcom.com and we'll try to answer them.
Perl is invoked from the command line as described in perl. Most perl scripts, however, do have a first line such as "#!/usr/local/bin/perl". This is known as a shebang (shell-bang) statement and tells the OS shell where to find the perl interpreter. In Plan 9 Perl this statement should be "#!/bin/perl" if you wish to be able to directly invoke the script by its name. Alternatively, you may invoke perl with the command "Perl" instead of "perl". This will produce Acme-friendly error messages of the form "filename:18".
Some scripts, usually identified with a *.PL extension, are self-configuring and are able to correctly create their own shebang path from config information located in Plan 9 Perl. These you won't need to be worried about.
Although Plan 9 Perl currently only provides static loading, it is built with a number of useful extensions. These include Opcode, FileHandle, Fcntl, and POSIX. Expect to see others (and DynaLoading!) in the future.
As mentioned previously, dynamic loading isn't currently available nor is MakeMaker. Both are high-priority items.
Some, such as chown and umask, aren't provided
because the concept does not exist within Plan 9. Others,
such as some of the socket-related functions, simply
haven't been written yet. Many in the latter category
may be supported in the future.
The functions not currently implemented include:
- chown, chroot, dbmclose, dbmopen, getsockopt,
- setsockopt, recvmsg, sendmsg, getnetbyname,
- getnetbyaddr, getnetent, getprotoent, getservent,
- sethostent, setnetent, setprotoent, setservent,
- endservent, endnetent, endprotoent, umask
There may be several other functions that have undefined behavior, so this list shouldn't be considered complete.
For compatibility with perl scripts written for the Unix environment, Plan 9 Perl uses the POSIX signal emulation provided in Plan 9's ANSI POSIX Environment (APE). Signal stacking isn't supported. The signals provided are:
- SIGHUP, SIGINT, SIGQUIT, SIGILL, SIGABRT,
- SIGFPE, SIGKILL, SIGSEGV, SIGPIPE, SIGALRM,
- SIGTERM, SIGUSR1, SIGUSR2, SIGCHLD, SIGCONT,
- SIGSTOP, SIGTSTP, SIGTTIN, SIGTTOU
WELCOME to Plan 9 Perl, brave soul!
- This is a preliminary alpha version of Plan 9 Perl. Still to be
- implemented are MakeMaker and DynaLoader. Many perl commands are
- missing or currently behave in an inscrutable manner. These gaps will,
- with perseverance and a modicum of luck, be remedied in the near
- future. To install this software:
1. Create the source directories and libraries for perl by running the plan9/setup.rc command (i.e., located in the plan9 subdirectory). Note: the setup routine assumes that you haven't dearchived these files into /sys/src/cmd/perl. After running setup.rc you may delete the copy of the source you originally detarred, as source code has now been installed in /sys/src/cmd/perl. If you plan on installing perl binaries for all architectures, run "setup.rc -a".
2. After making sure that you have adequate privileges to build system software, from /sys/src/cmd/perl/5.00301 (adjust version appropriately) run:
- mk install
If you wish to install perl versions for all architectures (68020, mips, sparc and 386) run:
- mk installall
3. Wait. The build process will take a *long* time because perl bootstraps itself. A 75MHz Pentium, 16MB RAM machine takes roughly 30 minutes to build the distribution from scratch.
This perl distribution comes with a tremendous amount of documentation. To add these to the built-in manuals that come with Plan 9, from /sys/src/cmd/perl/5.00301 (adjust version appropriately) run:
- mk man
To begin your reading, start with:
- man perl
This is a good introduction and will direct you towards other man pages that may interest you.
(Note: "mk man" may produce some extraneous noise. Fear not.)
"As many as there are grains of sand on all the beaches of the world . . ." - Carl Sagan
This document was revised 09-October-1996 for Perl 5.003_7.
Direct questions, comments, and the unlikely bug report (ahem) toward:
Luther Huffman, lutherh@stratcom.com, Strategic Computer Solutions, Inc.
perlpod - the Plain Old Documentation format
Pod is a simple-to-use markup language used for writing documentation for Perl, Perl programs, and Perl modules.
Translators are available for converting Pod to various formats like plain text, HTML, man pages, and more.
Pod markup consists of three basic kinds of paragraphs: ordinary, verbatim, and command.
Most paragraphs in your documentation will be ordinary blocks of text, like this one. You can simply type in your text without any markup whatsoever, and with just a blank line before and after. When it gets formatted, it will undergo minimal formatting, like being rewrapped, probably put into a proportionally spaced font, and maybe even justified.
You can use formatting codes in ordinary paragraphs, for bold, italic, code-style, hyperlinks, and more. Such codes are explained in the "Formatting Codes" section, below.
Verbatim paragraphs are usually used for presenting a codeblock or other text which does not require any special parsing or formatting, and which shouldn't be wrapped.
A verbatim paragraph is distinguished by having its first character be a space or a tab. (And commonly, all its lines begin with spaces and/or tabs.) It should be reproduced exactly, with tabs assumed to be on 8-column boundaries. There are no special formatting codes, so you can't italicize or anything like that. A \ means \, and nothing else.
A command paragraph is used for special treatment of whole chunks of text, usually as headings or parts of lists.
All command paragraphs (which are typically only one line long) start with "=", followed by an identifier, followed by arbitrary text that the command can use however it pleases. Currently recognized commands are
- =pod
- =head1 Heading Text
- =head2 Heading Text
- =head3 Heading Text
- =head4 Heading Text
- =over indentlevel
- =item stuff
- =back
- =begin format
- =end format
- =for format text...
- =encoding type
- =cut
To explain them each in detail:
=head1 Heading Text
=head2 Heading Text
=head3 Heading Text
=head4 Heading Text
Head1 through head4 produce headings, head1 being the highest level. The text in the rest of this paragraph is the content of the heading. For example:
- =head2 Object Attributes
The text "Object Attributes" comprises the heading there. The text in these heading commands can use formatting codes, as seen here:
- =head2 Possible Values for C<$/>
Such commands are explained in the "Formatting Codes" section, below.
=over indentlevel
=item stuff...
=back
Item, over, and back require a little more explanation: "=over" starts a region specifically for the generation of a list using "=item" commands, or for indenting (groups of) normal paragraphs. At the end of your list, use "=back" to end it. The indentlevel option to "=over" indicates how far over to indent, generally in ems (where one em is the width of an "M" in the document's base font) or roughly comparable units; if there is no indentlevel option, it defaults to four. (And some formatters may just ignore whatever indentlevel you provide.) In the stuff in "=item stuff...", you may use formatting codes, as seen here:
- =item Using C<$|> to Control Buffering
Such commands are explained in the "Formatting Codes" section, below.
Note also that there are some basic rules to using "=over" ... "=back" regions:
Don't use "=item"s outside of an "=over" ... "=back" region.
The first thing after the "=over" command should be an "=item", unless there aren't going to be any items at all in this "=over" ... "=back" region.
Don't put "=headn" commands inside an "=over" ... "=back" region.
And perhaps most importantly, keep the items consistent: either use "=item *" for all of them, to produce bullets; or use "=item 1.", "=item 2.", etc., to produce numbered lists; or use "=item foo", "=item bar", etc.--namely, things that look nothing like bullets or numbers.
If you start with bullets or numbers, stick with them, as formatters use the first "=item" type to decide how to format the list.
=cut
To end a Pod block, use a blank line, then a line beginning with "=cut", and a blank line after it. This lets Perl (and the Pod formatter) know that this is where Perl code is resuming. (The blank line before the "=cut" is not technically necessary, but many older Pod processors require it.)
=pod
The "=pod" command by itself doesn't do much of anything, but it signals to Perl (and Pod formatters) that a Pod block starts here. A Pod block starts with any command paragraph, so a "=pod" command is usually used just when you want to start a Pod block with an ordinary paragraph or a verbatim paragraph. For example:
- =item stuff()
- This function does stuff.
- =cut
- sub stuff {
- ...
- }
- =pod
- Remember to check its return value, as in:
- stuff() || die "Couldn't do stuff!";
- =cut
=begin formatname
=end formatname
=for formatname text...
For, begin, and end will let you have regions of text/code/data that are not generally interpreted as normal Pod text, but are passed directly to particular formatters, or are otherwise special. A formatter that can use that format will use the region, otherwise it will be completely ignored.
A command "=begin formatname", some paragraphs, and a command "=end formatname", mean that the text/data in between is meant for formatters that understand the special format called formatname. For example,
- =begin html
- <hr> <img src="thang.png">
- <p> This is a raw HTML paragraph </p>
- =end html
The command "=for formatname text..." specifies that the remainder of just this paragraph (starting right after formatname) is in that special format.
- =for html <hr> <img src="thang.png">
- <p> This is a raw HTML paragraph </p>
This means the same thing as the above "=begin html" ... "=end html" region.
That is, with "=for", you can have only one paragraph's worth of text (i.e., the text in "=for targetname text..."), but with "=begin targetname" ... "=end targetname", you can have any amount of stuff in between. (Note that there still must be a blank line after the "=begin" command and a blank line before the "=end" command.)
Here are some examples of how to use these:
- =begin html
- <br>Figure 1.<br><IMG SRC="figure1.png"><br>
- =end html
- =begin text
- ---------------
- | foo |
- | bar |
- ---------------
- ^^^^ Figure 1. ^^^^
- =end text
Some format names that formatters currently are known to accept include "roff", "man", "latex", "tex", "text", and "html". (Some formatters will treat some of these as synonyms.)
A format name of "comment" is common for just making notes (presumably to yourself) that won't appear in any formatted version of the Pod document:
- =for comment
- Make sure that all the available options are documented!
Some formatnames will require a leading colon (as in "=for :formatname", or "=begin :formatname" ... "=end :formatname"), to signal that the text is not raw data, but instead is Pod text (i.e., possibly containing formatting codes) that's just not for normal formatting (e.g., may not be a normal-use paragraph, but might be for formatting as a footnote).
=encoding encodingname
This command is used for declaring the encoding of a document. Most users won't need this; but if your encoding isn't US-ASCII or Latin-1, then put a "=encoding encodingname" command early in the document so that Pod formatters will know how to decode the document. For encodingname, use a name recognized by the Encode::Supported module. Examples:
- =encoding utf8
- =encoding koi8-r
- =encoding ShiftJIS
- =encoding big5
"=encoding" affects the whole document, and must occur only once.
And don't forget, when using any other command, that the command lasts up until the end of its paragraph, not its line. So in the examples below, you can see that every command needs the blank line after it, to end its paragraph.
Some examples of lists include:
- =over
- =item *
- First item
- =item *
- Second item
- =back
- =over
- =item Foo()
- Description of Foo function
- =item Bar()
- Description of Bar function
- =back
In ordinary paragraphs and in some command paragraphs, various formatting codes (a.k.a. "interior sequences") can be used:
I<text> -- italic text
Used for emphasis ("be I<careful!>") and parameters ("redo I<LABEL>")
B<text> -- bold text
Used for switches ("perl's B<-n> switch"), programs ("some systems provide a B<chfn> for that"), emphasis ("be B<careful!>"), and so on ("and that feature is known as B<autovivification>").
C<code> -- code text
Renders code in a typewriter font, or gives some other indication that this represents program text ("C<gmtime($^T)>") or some other form of computerese ("C<drwxr-xr-x>").
L<name> -- a hyperlink
There are various syntaxes, listed below. In the syntaxes given, text, name, and section cannot contain the characters '/' and '|'; and any '<' or '>' should be matched.
L<name>
Link to a Perl manual page (e.g., L<Net::Ping>). Note that name should not contain spaces. This syntax is also occasionally used for references to Unix man pages, as in L<crontab(5)>.
L<name/"sec"> or L<name/sec>
Link to a section in another manual page. E.g., L<perlsyn/"For Loops">
L</"sec"> or L</sec>
Link to a section in this manual page. E.g., L</"Object Methods">
A section is started by the named heading or item. For example, L<perlvar/$.> or L<perlvar/"$."> both link to the section started by "=item $." in perlvar. And L<perlsyn/For Loops> or L<perlsyn/"For Loops"> both link to the section started by "=head2 For Loops" in perlsyn.
To control what text is used for display, you use "L<text|...>", as in:
L<text|name>
Link this text to that manual page. E.g., L<Perl Error Messages|perldiag>
L<text|name/"sec"> or L<text|name/sec>
Link this text to that section in that manual page. E.g., L<postfix "if"|perlsyn/"Statement Modifiers">
L<text|/"sec"> or L<text|/sec> or L<text|"sec">
Link this text to that section in this manual page. E.g., L<the various attributes|/"Member Data">
Or you can link to a web page:
L<scheme:...>
L<text|scheme:...>
Links to an absolute URL. For example, L<http://www.perl.org/> or L<The Perl Home Page|http://www.perl.org/>.
E<escape> -- a character escape
Very similar to HTML/XML &foo; "entity references":
E<lt> -- a literal < (less than)
E<gt> -- a literal > (greater than)
E<verbar> -- a literal | (vertical bar)
E<sol> -- a literal / (solidus)
The above four are optional except in other formatting codes, notably L<...>, and when preceded by a capital letter.
E<htmlname>
Some non-numeric HTML entity name, such as E<eacute>, meaning the same thing as &eacute; in HTML -- i.e., a lowercase e with an acute (/-shaped) accent.
E<number>
The ASCII/Latin-1/Unicode character with that number. A leading "0x" means that number is hex, as in E<0x201E>. A leading "0" means that number is octal, as in E<075>. Otherwise number is interpreted as being in decimal, as in E<181>.
Note that older Pod formatters might not recognize octal or hex numeric escapes, and that many formatters cannot reliably render characters above 255. (Some formatters may even have to use compromised renderings of Latin-1 characters, like rendering E<eacute> as just a plain "e".)
F<filename> -- used for filenames
Typically displayed in italics. Example: "F<.cshrc>"
S<text> -- text contains non-breaking spaces
This means that the words in text should not be broken across lines. Example: S<$x ? $y : $z>.
X<topic name> -- an index entry
This is ignored by most formatters, but some may use it for building indexes. It always renders as empty-string. Example: X<absolutizing relative URLs>
Z<> -- a null (zero-effect) formatting code
This is rarely used. It's one way to get around using an E<...> code sometimes. For example, instead of "NE<lt>3" (for "N<3") you could write "NZ<><3" (the "Z<>" breaks up the "N" and the "<" so they can't be considered part of a (fictitious) "N<...>" code).
Most of the time, you will need only a single set of angle brackets to delimit the beginning and end of formatting codes. However, sometimes you will want to put a real right angle bracket (a greater-than sign, '>') inside of a formatting code. This is particularly common when using a formatting code to provide a different font-type for a snippet of code. As with all things in Perl, there is more than one way to do it. One way is to simply escape the closing bracket using an E code:
- C<$a E<lt>=E<gt> $b>
This will produce: "$a <=> $b"
A more readable, and perhaps more "plain" way is to use an alternate set of delimiters that doesn't require a single ">" to be escaped. Doubled angle brackets ("<<" and ">>") may be used if and only if there is whitespace right after the opening delimiter and whitespace right before the closing delimiter! For example, the following will do the trick:
- C<< $a <=> $b >>
In fact, you can use as many repeated angle-brackets as you like so long as you have the same number of them in the opening and closing delimiters, and make sure that whitespace immediately follows the last '<' of the opening delimiter, and immediately precedes the first '>' of the closing delimiter. (The whitespace is ignored.) So the following will also work:
- C<<< $a <=> $b >>>
- C<<<< $a <=> $b >>>>
And they all mean exactly the same as this:
- C<$a E<lt>=E<gt> $b>
The multiple-bracket form does not affect the interpretation of the contents of the formatting code, only how it must end. That means that the examples above are also exactly the same as this:
- C<< $a E<lt>=E<gt> $b >>
As a further example, this means that if you wanted to put these bits of code in C (code) style:
- open(X, ">>thing.dat") || die $!
- $foo->bar();
you could do it like so:
- C<<< open(X, ">>thing.dat") || die $! >>>
- C<< $foo->bar(); >>
which is presumably easier to read than the old way:
- C<open(X, "E<gt>E<gt>thing.dat") || die $!>
- C<$foo-E<gt>bar();>
This is currently supported by pod2text (Pod::Text), pod2man (Pod::Man), and any other pod2xxx or Pod::Xxxx translators that use Pod::Parser 1.093 or later, or Pod::Tree 1.02 or later.
The intent is simplicity of use, not power of expression. Paragraphs look like paragraphs (block format), so that they stand out visually, and so that I could run them through fmt easily to reformat them (that's F7 in my version of vi, or Esc Q in my version of emacs). I wanted the translator to always leave the ' and ` and " quotes alone, in verbatim mode, so I could slurp in a working program, shift it over four spaces, and have it print out, er, verbatim. And presumably in a monospace font.
The Pod format is not necessarily sufficient for writing a book. Pod is just meant to be an idiot-proof common source for nroff, HTML, TeX, and other markup languages, as used for online documentation. Translators exist for pod2text, pod2html, pod2man (that's for nroff(1) and troff(1)), pod2latex, and pod2fm. Various others are available in CPAN.
You can embed Pod documentation in your Perl modules and scripts. Start your documentation with an empty line, a "=head1" command at the beginning, and end it with a "=cut" command and an empty line. Perl will ignore the Pod text. See any of the supplied library modules for examples. If you're going to put your Pod at the end of the file, and you're using an __END__ or __DATA__ cut mark, make sure to put an empty line there before the first Pod command.
Without that empty line before the "=head1", many translators wouldn't have recognized the "=head1" as starting a Pod block.
The podchecker command is provided for checking Pod syntax for errors and warnings. For example, it checks for completely blank lines in Pod blocks and for unknown commands and formatting codes. You should still also pass your document through one or more translators and proofread the result, or print out the result and proofread that. Some of the problems found may be bugs in the translators, which you may or may not wish to work around.
If you're more familiar with writing in HTML than with writing in Pod, you can try your hand at writing documentation in simple HTML, and converting it to Pod with the experimental Pod::HTML2Pod module, (available in CPAN), and looking at the resulting code. The experimental Pod::PXML module in CPAN might also be useful.
Many older Pod translators require the lines before every Pod command and after every Pod command (including "=cut"!) to be a blank line. Having something like this:
- # - - - - - - - - - - - -
- =item $firecracker->boom()
- This noisily detonates the firecracker object.
- =cut
- sub boom {
- ...
...will make such Pod translators completely fail to see the Pod block at all.
Instead, have it like this:
- # - - - - - - - - - - - -
- =item $firecracker->boom()
- This noisily detonates the firecracker object.
- =cut
- sub boom {
- ...
Some older Pod translators require paragraphs (including command paragraphs like "=head2 Functions") to be separated by completely empty lines. If you have an apparently empty line with some spaces on it, this might not count as a separator for those translators, and that could cause odd formatting.
Older translators might add wording around an L<> link, so that L<Foo::Bar> may become "the Foo::Bar manpage", for example. So you shouldn't write things like "the L<foo> documentation", if you want the translated document to read sensibly. Instead, write "the L<Foo::Bar|Foo::Bar> documentation" or "L<the Foo::Bar documentation|Foo::Bar>", to control how the link comes out.
Going past the 70th column in a verbatim block might be ungracefully wrapped by some formatters.
perlpodspec, PODs: Embedded Documentation in perlsyn, perlnewmod, perldoc, pod2html, pod2man, podchecker.
Larry Wall, Sean M. Burke
perlpodspec - Plain Old Documentation: format specification and notes
This document is detailed notes on the Pod markup language. Most people will only have to read perlpod to know how to write in Pod, but this document may answer some incidental questions to do with parsing and rendering Pod.
In this document, "must" / "must not", "should" / "should not", and "may" have their conventional (cf. RFC 2119) meanings: "X must do Y" means that if X doesn't do Y, it's against this specification, and should really be fixed. "X should do Y" means that it's recommended, but X may fail to do Y, if there's a good reason. "X may do Y" is merely a note that X can do Y at will (although it is up to the reader to detect any connotation of "and I think it would be nice if X did Y" versus "it wouldn't really bother me if X did Y").
Notably, when I say "the parser should do Y", the parser may fail to do Y, if the calling application explicitly requests that the parser not do Y. I often phrase this as "the parser should, by default, do Y." This doesn't require the parser to provide an option for turning off whatever feature Y is (like expanding tabs in verbatim paragraphs), although it implies that such an option may be provided.
Pod is embedded in files, typically Perl source files, although you can write a file that's nothing but Pod.
A line in a file consists of zero or more non-newline characters, terminated by either a newline or the end of the file.
A newline sequence is usually a platform-dependent concept, but Pod parsers should understand it to mean any of CR (ASCII 13), LF (ASCII 10), or a CRLF (ASCII 13 followed immediately by ASCII 10), in addition to any other system-specific meaning. The first CR/CRLF/LF sequence in the file may be used as the basis for identifying the newline sequence for parsing the rest of the file.
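That sniffing rule (use the first CR/CRLF/LF in the file as the newline for the rest of the parse) might look like this in practice. An illustrative Python sketch, not any particular parser's code; the function name is invented here:

```python
import re

def sniff_newline(data: str) -> str:
    """Return the first CR/CRLF/LF sequence in the file, defaulting
    to LF if none is found. CRLF is tried before bare CR so that a
    CR immediately followed by LF is treated as one sequence."""
    m = re.search(r'\r\n|\r|\n', data)
    return m.group(0) if m else '\n'
```

Once sniffed, the same sequence can be used to split the rest of the file into lines.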
A blank line is a line consisting entirely of zero or more spaces (ASCII 32) or tabs (ASCII 9), and terminated by a newline or end-of-file. A non-blank line is a line containing one or more characters other than space or tab (and terminated by a newline or end-of-file).
(Note: Many older Pod parsers did not accept a line consisting of spaces/tabs and then a newline as a blank line. The only lines they considered blank were lines consisting of no characters at all, terminated by a newline.)
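The difference between the two definitions of "blank" can be captured in a pair of predicates. Illustrative Python; the function names are invented here:

```python
import re

def is_blank(line: str) -> bool:
    """Blank per this spec: nothing but spaces/tabs before the terminator."""
    return re.fullmatch(r'[ \t]*', line.rstrip('\r\n')) is not None

def is_blank_strict(line: str) -> bool:
    """Blank per many older parsers: no characters at all."""
    return line.rstrip('\r\n') == ''
```

A line of only spaces separates paragraphs under the first definition but not under the older, stricter one.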
Whitespace is used in this document as a blanket term for spaces, tabs, and newline sequences. (By itself, this term usually refers to literal whitespace. That is, sequences of whitespace characters in Pod source, as opposed to "E<32>", which is a formatting code that denotes a whitespace character.)
A Pod parser is a module meant for parsing Pod (regardless of whether this involves calling callbacks or building a parse tree or directly formatting it). A Pod formatter (or Pod translator) is a module or program that converts Pod to some other format (HTML, plaintext, TeX, PostScript, RTF). A Pod processor might be a formatter or translator, or might be a program that does something else with the Pod (like counting words, scanning for index points, etc.).
Pod content is contained in Pod blocks. A Pod block starts with a line that matches m/\A=[a-zA-Z]/, and continues up to the next line that matches m/\A=cut/ or up to the end of the file if there is no m/\A=cut/ line.
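A minimal sketch of that boundary rule, in Python (illustrative only; real parsers such as Pod::Parser do considerably more, e.g. flagging a bare "=cut" as an error):

```python
import re

def pod_lines(source: str):
    """Yield the lines that fall inside Pod blocks, using the
    boundary rule above: =[a-zA-Z] opens a block, =cut closes it."""
    in_pod = False
    for line in source.splitlines():
        if not in_pod and re.match(r'=[a-zA-Z]', line):
            in_pod = True   # a real parser would flag a leading =cut here
        if in_pod:
            yield line
            if re.match(r'=cut', line):
                in_pod = False
```

Everything outside the yielded lines is program text, invisible to Pod processing.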
Within a Pod block, there are Pod paragraphs. A Pod paragraph consists of non-blank lines of text, separated by one or more blank lines.
For purposes of Pod processing, there are four types of paragraphs in a Pod block:
A command paragraph (also called a "directive"). The first line of this paragraph must match m/\A=[a-zA-Z]/. Command paragraphs are typically one line, as in:
- =head1 NOTES
- =item *
But they may span several (non-blank) lines:
- =for comment
- Hm, I wonder what it would look like if
- you tried to write a BNF for Pod from this.
- =head3 Dr. Strangelove, or: How I Learned to
- Stop Worrying and Love the Bomb
Some command paragraphs allow formatting codes in their content (i.e., after the part that matches m/\A=[a-zA-Z]\S*\s*/), as in:
- =head1 Did You Remember to C<use strict;>?
In other words, the Pod processing handler for "head1" will apply the same processing to "Did You Remember to C<use strict;>?" that it would to an ordinary paragraph (i.e., formatting codes like "C<...>" are parsed and presumably formatted appropriately, and whitespace in the form of literal spaces and/or tabs is not significant).
A verbatim paragraph. The first line of this paragraph must be a literal space or tab, and this paragraph must not be inside a "=begin identifier", ... "=end identifier" sequence unless "identifier" begins with a colon (":"). That is, if a paragraph starts with a literal space or tab, but is inside a "=begin identifier", ... "=end identifier" region, then it's a data paragraph, unless "identifier" begins with a colon.
Whitespace is significant in verbatim paragraphs (although, in processing, tabs are probably expanded).
An ordinary paragraph. A paragraph is an ordinary paragraph if its first line matches neither m/\A=[a-zA-Z]/ nor m/\A[ \t]/, and if it's not inside a "=begin identifier" ... "=end identifier" sequence unless "identifier" begins with a colon (":").
A data paragraph. This is a paragraph that is inside a "=begin identifier" ... "=end identifier" sequence where "identifier" does not begin with a literal colon (":"). In some sense, a data paragraph is not part of Pod at all (i.e., effectively it's "out-of-band"), since it's not subject to most kinds of Pod parsing; but it is specified here, since Pod parsers need to be able to call an event for it, or store it in some form in a parse tree, or at least just parse around it.
For example: consider the following paragraphs:
- # <- that's the 0th column
- =head1 Foo
- Stuff
- $foo->bar
- =cut
Here, "=head1 Foo" and "=cut" are command paragraphs because the first line of each matches m/\A=[a-zA-Z]/. "[space][space]$foo->bar" is a verbatim paragraph, because its first line starts with a literal whitespace character (and there's no "=begin"..."=end" region around).
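The classification logic above can be sketched as a single function. Illustrative Python; the data_region flag stands in for the =begin/=end state, and the function name is invented here:

```python
import re

def classify(paragraph: str, data_region: bool = False) -> str:
    """Classify one Pod paragraph into the four types described above.

    data_region is True inside a =begin/=end pair whose identifier
    does not start with a colon. Command paragraphs are checked first,
    so that =end is still recognized inside a data region."""
    first = paragraph.splitlines()[0]
    if re.match(r'=[a-zA-Z]', first):
        return 'command'
    if data_region:
        return 'data'
    if re.match(r'[ \t]', first):
        return 'verbatim'
    return 'ordinary'
```

Applied to the example above, "=head1 Foo" and "=cut" come out as command paragraphs and "  $foo->bar" as verbatim.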
The "=begin identifier" ... "=end identifier" commands stop paragraphs that they surround from being parsed as ordinary or verbatim paragraphs, if identifier doesn't begin with a colon. This is discussed in detail in the section About Data Paragraphs and =begin/=end Regions.
This section is intended to supplement and clarify the discussion in Command Paragraph in perlpod. These are the currently recognized Pod commands:
This command indicates that the text in the remainder of the paragraph is a heading. That text may contain formatting codes. Examples:
- =head1 Object Attributes
- =head3 What B<Not> to Do!
This command indicates that this paragraph begins a Pod block. (If we are already in the middle of a Pod block, this command has no effect at all.) If there is any text in this command paragraph after "=pod", it must be ignored. Examples:
- =pod
- This is a plain Pod paragraph.
- =pod This text is ignored.
This command indicates that this line is the end of this previously started Pod block. If there is any text after "=cut" on the line, it must be ignored. Examples:
- =cut
- =cut The documentation ends here.
- =cut
- # This is the first line of program text.
- sub foo { # This is the second.
It is an error to try to start a Pod block with a "=cut" command. In that case, the Pod processor must halt parsing of the input file, and must by default emit a warning.
This command indicates that this is the start of a list/indent region. If there is any text following the "=over", it must consist of only a nonzero positive numeral. The semantics of this numeral is explained in the About =over...=back Regions section, further below. Formatting codes are not expanded. Examples:
- =over 3
- =over 3.5
- =over
This command indicates that an item in a list begins here. Formatting codes are processed. The semantics of the (optional) text in the remainder of this paragraph are explained in the About =over...=back Regions section, further below. Examples:
- =item
- =item *
- =item *
- =item 14
- =item 3.
- =item C<< $thing->stuff(I<dodad>) >>
- =item For transporting us beyond seas to be tried for pretended
- offenses
- =item He is at this time transporting large armies of foreign
- mercenaries to complete the works of death, desolation and
- tyranny, already begun with circumstances of cruelty and perfidy
- scarcely paralleled in the most barbarous ages, and totally
- unworthy the head of a civilized nation.
This command indicates that this is the end of the region begun by the most recent "=over" command. It permits no text after the "=back" command.
This marks the following paragraphs (until the matching "=end formatname") as being for some special kind of processing. Unless "formatname" begins with a colon, the contained non-command paragraphs are data paragraphs. But if "formatname" does begin with a colon, then non-command paragraphs are ordinary paragraphs or data paragraphs. This is discussed in detail in the section About Data Paragraphs and =begin/=end Regions.
It is advised that formatnames match the regexp m/\A:?[-a-zA-Z0-9_]+\z/. Everything following whitespace after the formatname is a parameter that may be used by the formatter when dealing with this region. This parameter must not be repeated in the "=end" paragraph. Implementors should anticipate future expansion in the semantics and syntax of the first parameter to "=begin"/"=end"/"=for".
This marks the end of the region opened by the matching "=begin formatname" region. If "formatname" is not the formatname of the most recent open "=begin formatname" region, then this is an error, and must generate an error message. This is discussed in detail in the section About Data Paragraphs and =begin/=end Regions.
This is synonymous with:
- =begin formatname
- text...
- =end formatname
That is, it creates a region consisting of a single paragraph; that paragraph is to be treated as a normal paragraph if "formatname" begins with a ":"; if "formatname" doesn't begin with a colon, then "text..." will constitute a data paragraph. There is no way to use "=for formatname text..." to express "text..." as a verbatim paragraph.
This command, which should occur early in the document (at least before any non-US-ASCII data!), declares that this document is encoded in the encoding encodingname, which must be an encoding name that Encode recognizes. (Encode's list of supported encodings, in Encode::Supported, is useful here.) If the Pod parser cannot decode the declared encoding, it should emit a warning and may abort parsing the document altogether.
A document having more than one "=encoding" line should be considered an error. Pod processors may silently tolerate this if the not-first "=encoding" lines are just duplicates of the first one (e.g., if there's a "=encoding utf8" line, and later on another "=encoding utf8" line). But Pod processors should complain if there are contradictory "=encoding" lines in the same document (e.g., if there is a "=encoding utf8" early in the document and "=encoding big5" later). Pod processors that recognize BOMs may also complain if they see an "=encoding" line that contradicts the BOM (e.g., if a document with a UTF-16LE BOM has an "=encoding shiftjis" line).
If a Pod processor sees any command other than the ones listed above (like "=head", or "=haed1", or "=stuff", or "=cuttlefish", or "=w123"), that processor must by default treat this as an error. It must not process the paragraph beginning with that command, must by default warn of this as an error, and may abort the parse. A Pod parser may allow a way for particular applications to add to the above list of known commands, and to stipulate, for each additional command, whether formatting codes should be processed.
Future versions of this specification may add additional commands.
(Note that in previous drafts of this document and of perlpod, formatting codes were referred to as "interior sequences", and this term may still be found in the documentation for Pod parsers, and in error messages from Pod processors.)
There are two syntaxes for formatting codes:
A formatting code starts with a capital letter (just US-ASCII [A-Z]) followed by a "<", any number of characters, and ending with the first matching ">". Examples:
- That's what I<you> think!
- What's C<dump()> for?
- X<C<chmod> and C<unlink()> Under Different Operating Systems>
A formatting code starts with a capital letter (just US-ASCII [A-Z]) followed by two or more "<"'s, one or more whitespace characters, any number of characters, one or more whitespace characters, and ending with the first matching sequence of two or more ">"'s, where the number of ">"'s equals the number of "<"'s in the opening of this formatting code. Examples:
- That's what I<< you >> think!
- C<<< open(X, ">>thing.dat") || die $! >>>
- B<< $foo->bar(); >>
With this syntax, the whitespace character(s) after the "C<<<" and before the ">>>" (or whatever letter) are not renderable. They do not signify whitespace; they are merely part of the formatting codes themselves. That is, these are all synonymous:
- C<thing>
- C<< thing >>
- C<<  thing  >>
- C<<< thing >>>
- C<<<<
- thing
- >>>>
and so on.
Finally, the multiple-angle-bracket form does not alter the interpretation of nested formatting codes, meaning that the following four example lines are identical in meaning:
- B<example: C<$a E<lt>=E<gt> $b>>
- B<example: C<< $a <=> $b >>>
- B<example: C<< $a E<lt>=E<gt> $b >>>
- B<<< example: C<< $a E<lt>=E<gt> $b >> >>>
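Under these rules, stripping the delimiters from a single, non-nested formatting code might be sketched as follows. Illustrative Python only; the single-bracket branch simply takes the last ">", whereas a real parser must balance nested codes:

```python
import re

def strip_delimiters(code: str):
    """Return (letter, content) for one formatting code, covering both
    the C<...> and C<< ... >> syntaxes described above."""
    # Multi-bracket form: N '<'s, required whitespace, content,
    # required whitespace, N '>'s; the whitespace is not content.
    m = re.fullmatch(r'([A-Z])(<{2,})\s+(.*?)\s+(>{2,})', code, re.DOTALL)
    if m and len(m.group(2)) == len(m.group(4)):
        return m.group(1), m.group(3)
    # Single-bracket form (simplified: no nested-code balancing).
    m = re.fullmatch(r'([A-Z])<(.*)>', code, re.DOTALL)
    if m:
        return m.group(1), m.group(2)
    raise ValueError('not a recognizable formatting code')
```

Note how the multi-bracket form lets a ">" appear in the content without any escaping.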
In parsing Pod, a notably tricky part is the correct parsing of (potentially nested!) formatting codes. Implementors should consult the code in the parse_text routine in Pod::Parser as an example of a correct implementation.
I<text> -- italic text
See the brief discussion in Formatting Codes in perlpod.
B<text> -- bold text
See the brief discussion in Formatting Codes in perlpod.
C<code> -- code text
See the brief discussion in Formatting Codes in perlpod.
F<filename> -- style for filenames
See the brief discussion in Formatting Codes in perlpod.
X<topic name> -- an index entry
See the brief discussion in Formatting Codes in perlpod.
This code is unusual in that most formatters completely discard this code and its content. Other formatters will render it with invisible codes that can be used in building an index of the current document.
Z<>
-- a null (zero-effect) formatting code
Discussed briefly in Formatting Codes in perlpod.
This code is unusual in that it should have no content. That is, a processor may complain if it sees Z<potatoes>. Whether or not it complains, the potatoes text should be ignored.
L<name>
-- a hyperlink
The complicated syntaxes of this code are discussed at length in Formatting Codes in perlpod, and implementation details are discussed below, in About L<...> Codes. Parsing the contents of L<content> is tricky. Notably, the content has to be checked for whether it looks like a URL, or whether it has to be split on literal "|" and/or "/" (in the right order!), and so on, before E<...> codes are resolved.
E<escape>
-- a character escape
See Formatting Codes in perlpod, and several points in Notes on Implementing Pod Processors.
S<text>
-- text contains non-breaking spaces
This formatting code is syntactically simple, but semantically complex. What it means is that each space in the printable content of this code signifies a non-breaking space.
Consider:
- C<$x ? $y : $z>
- S<C<$x ? $y : $z>>
Both signify the monospace (c[ode] style) text consisting of "$x", one space, "?", one space, ":", one space, "$z". The difference is that in the latter, with the S code, those spaces are not "normal" spaces, but instead are non-breaking spaces.
If a Pod processor sees any formatting code other than the ones listed above (as in "N<...>", or "Q<...>", etc.), that processor must by default treat this as an error. A Pod parser may allow a way for particular applications to add to the above list of known formatting codes; a Pod parser might even allow a way to stipulate, for each additional command, whether it requires some form of special processing, as L<...> does.
Future versions of this specification may add additional formatting codes.
Historical note: A few older Pod processors would not see a ">" as closing a "C<" code, if the ">" was immediately preceded by a "-". This was so that this:
- C<$foo->bar>
would parse as equivalent to this:
- C<$foo-E<gt>bar>
instead of as equivalent to a "C" formatting code containing only "$foo-", and then a "bar>" outside the "C" formatting code. This problem has since been solved by the addition of syntaxes like this:
- C<< $foo->bar >>
Compliant parsers must not treat "->" as special.
Formatting codes absolutely cannot span paragraphs. If a code is opened in one paragraph, and no closing code is found by the end of that paragraph, the Pod parser must close that formatting code, and should complain (as in "Unterminated I code in the paragraph starting at line 123: 'Time objects are not...'"). So these two paragraphs:
- I<I told you not to do this!
- Don't make me say it again!>
...must not be parsed as two paragraphs in italics (with the I code starting in one paragraph and ending in another). Instead, the first paragraph should generate a warning, but that aside, the above code must parse as if it were:
- I<I told you not to do this!>
- Don't make me say it again!E<gt>
(In SGMLish jargon, all Pod commands are like block-level elements, whereas all Pod formatting codes are like inline-level elements.)
The following is a long section of miscellaneous requirements and suggestions to do with Pod processing.
Pod formatters should tolerate lines in verbatim blocks that are of any length, even if that means having to break them (possibly several times, for very long lines) to avoid text running off the side of the page. Pod formatters may warn of such line-breaking. Such warnings are particularly appropriate for lines that are over 100 characters long, which are usually not intentional.
Pod parsers must recognize all of the three well-known newline formats: CR, LF, and CRLF. See perlport.
Pod parsers should accept input lines that are of any length.
Since Perl recognizes a Unicode Byte Order Mark at the start of files as signaling that the file is Unicode encoded as in UTF-16 (whether big-endian or little-endian) or UTF-8, Pod parsers should do the same. Otherwise, the character encoding should be understood as being UTF-8 if the first highbit byte sequence in the file seems valid as a UTF-8 sequence, or otherwise as Latin-1.
Future versions of this specification may specify how Pod can accept other encodings. Presumably treatment of other encodings in Pod parsing would be as in XML parsing: whatever the encoding declared by a particular Pod file, content is to be stored in memory as Unicode characters.
The well known Unicode Byte Order Marks are as follows: if the file begins with the two literal byte values 0xFE 0xFF, this is the BOM for big-endian UTF-16. If the file begins with the two literal byte values 0xFF 0xFE, this is the BOM for little-endian UTF-16. If the file begins with the three literal byte values 0xEF 0xBB 0xBF, this is the BOM for UTF-8.
A naive but sufficient heuristic for testing the first highbit byte-sequence in a BOM-less file (whether in code or in Pod!), to see whether that sequence is valid as UTF-8 (RFC 2279), is to check whether the first byte in the sequence is in the range 0xC0 - 0xFD and whether the next byte is in the range 0x80 - 0xBF. If so, the parser may conclude that this file is in UTF-8, and all highbit sequences in the file should be assumed to be UTF-8. Otherwise the parser should treat the file as being in Latin-1. In the unlikely circumstance that the first highbit sequence in a truly non-UTF-8 file happens to appear to be UTF-8, one can cater to our heuristic (as well as any more intelligent heuristic) by prefacing that line with a comment line containing a highbit sequence that is clearly not valid as UTF-8. A line consisting of simply "#", an e-acute, and any non-highbit byte, is sufficient to establish this file's encoding.
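The BOM checks and the UTF-8 heuristic above can be sketched as follows (an illustrative Python sketch, not code from any Pod module; the function name is made up):

```python
def guess_pod_encoding(data: bytes) -> str:
    """Guess a Pod file's encoding per the BOM/heuristic rules above."""
    # Byte Order Marks take precedence.
    if data.startswith(b'\xfe\xff'):
        return 'utf-16-be'
    if data.startswith(b'\xff\xfe'):
        return 'utf-16-le'
    if data.startswith(b'\xef\xbb\xbf'):
        return 'utf-8'
    # Otherwise, test the first highbit byte sequence for UTF-8 validity.
    for i, b in enumerate(data):
        if b >= 0x80:
            nxt = data[i + 1] if i + 1 < len(data) else 0
            if 0xC0 <= b <= 0xFD and 0x80 <= nxt <= 0xBF:
                return 'utf-8'
            return 'latin-1'
    return 'ascii'  # no highbit bytes at all
```

Note that a single Latin-1 accented letter followed by an ASCII byte (e.g. 0xE9 0x20) fails the two-byte test and correctly falls through to Latin-1.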
This document's requirements and suggestions about encodings do not apply to Pod processors running on non-ASCII platforms, notably EBCDIC platforms.
Pod processors must treat a "=for [label] [content...]" paragraph as meaning the same thing as a "=begin [label]" paragraph, content, and an "=end [label]" paragraph. (The parser may conflate these two constructs, or may leave them distinct, in the expectation that the formatter will nevertheless treat them the same.)
When rendering Pod to a format that allows comments (i.e., to nearly any format other than plaintext), a Pod formatter must insert comment text identifying its name and version number, and the name and version numbers of any modules it might be using to process the Pod. Minimal examples:
- %% POD::Pod2PS v3.14159, using POD::Parser v1.92
- <!-- Pod::HTML v3.14159, using POD::Parser v1.92 -->
- {\doccomm generated by Pod::Tree::RTF 3.14159 using Pod::Tree 1.08}
- .\" Pod::Man version 3.14159, using POD::Parser version 1.92
Formatters may also insert additional comments, including: the release date of the Pod formatter program, the contact address for the author(s) of the formatter, the current time, the name of input file, the formatting options in effect, version of Perl used, etc.
Formatters may also choose to note errors/warnings as comments, besides or instead of emitting them otherwise (as in messages to STDERR, or dieing).
Pod parsers may emit warnings or error messages ("Unknown E code E<zslig>!") to STDERR (whether through printing to STDERR, or warning/carping, or dieing/croaking), but must allow suppressing all such STDERR output, and instead allow an option for reporting errors/warnings in some other way, whether by triggering a callback, or noting errors in some attribute of the document object, or some similarly unobtrusive mechanism -- or even by appending a "Pod Errors" section to the end of the parsed form of the document.
In cases of exceptionally aberrant documents, Pod parsers may abort the parse. Even then, using dieing/croaking is to be avoided; where possible, the parser library may simply close the input file and add text like "*** Formatting Aborted ***" to the end of the (partial) in-memory document.
In paragraphs where formatting codes (like E<...>, B<...>) are understood (i.e., not verbatim paragraphs, but including ordinary paragraphs, and command paragraphs that produce renderable text, like "=head1"), literal whitespace should generally be considered "insignificant", in that one literal space has the same meaning as any (nonzero) number of literal spaces, literal newlines, and literal tabs (as long as this produces no blank lines, since those would terminate the paragraph). Pod parsers should compact literal whitespace in each processed paragraph, but may provide an option for overriding this (since some processing tasks do not require it), or may follow additional special rules (for example, specially treating period-space-space or period-newline sequences).
Pod parsers should not, by default, try to coerce apostrophe (') and quote (") into smart quotes (little 9's, 66's, 99's, etc), nor try to turn backtick (`) into anything else but a single backtick character (distinct from an open quote character!), nor "--" into anything but two minus signs. They must never do any of those things to text in C<...> formatting codes, and never ever to text in verbatim paragraphs.
When rendering Pod to a format that has two kinds of hyphens (-), one that's a non-breaking hyphen, and another that's a breakable hyphen (as in "object-oriented", which can be split across lines as "object-", newline, "oriented"), formatters are encouraged to generally translate "-" to non-breaking hyphen, but may apply heuristics to convert some of these to breaking hyphens.
Pod formatters should make reasonable efforts to keep words of Perl code from being broken across lines. For example, "Foo::Bar" in some formatting systems is seen as eligible for being broken across lines as "Foo::" newline "Bar" or even "Foo::-" newline "Bar". This should be avoided where possible, either by disabling all line-breaking in mid-word, or by wrapping particular words with internal punctuation in "don't break this across lines" codes (which in some formats may not be a single code, but might be a matter of inserting non-breaking zero-width spaces between every pair of characters in a word.)
Pod parsers should, by default, expand tabs in verbatim paragraphs as they are processed, before passing them to the formatter or other processor. Parsers may also allow an option for overriding this.
Pod parsers should, by default, remove newlines from the end of ordinary and verbatim paragraphs before passing them to the formatter. For example, while the paragraph you're reading now could be considered, in Pod source, to end with (and contain) the newline(s) that end it, it should be processed as ending with (and containing) the period character that ends this sentence.
Pod parsers, when reporting errors, should make some effort to report an approximate line number ("Nested E<>'s in Paragraph #52, near line 633 of Thing/Foo.pm!"), instead of merely noting the paragraph number ("Nested E<>'s in Paragraph #52 of Thing/Foo.pm!"). Where this is problematic, the paragraph number should at least be accompanied by an excerpt from the paragraph ("Nested E<>'s in Paragraph #52 of Thing/Foo.pm, which begins 'Read/write accessor for the C<interest rate> attribute...'").
Pod parsers, when processing a series of verbatim paragraphs one after another, should consider them to be one large verbatim paragraph that happens to contain blank lines. I.e., these two lines, which have a blank line between them:
- use Foo;
-
- print Foo->VERSION
should be unified into one paragraph ("\tuse Foo;\n\n\tprint Foo->VERSION") before being passed to the formatter or other processor. Parsers may also allow an option for overriding this.
While this might be too cumbersome to implement in event-based Pod parsers, it is straightforward for parsers that return parse trees.
Pod formatters, where feasible, are advised to avoid splitting short verbatim paragraphs (under twelve lines, say) across pages.
Pod parsers must treat a line with only spaces and/or tabs on it as a "blank line" such as separates paragraphs. (Some older parsers recognized only two adjacent newlines as a "blank line", but would not recognize a newline, a space, and a newline as a blank line. This is noncompliant behavior.)
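The blank-line rule above, together with the earlier suggestion to merge adjacent verbatim paragraphs, can be sketched together (illustrative only; function names are made up, and a paragraph is modeled as a list of lines):

```python
import re

# A "blank line" is empty or contains only spaces and/or tabs.
BLANK = re.compile(r'^[ \t]*$')

def paragraphs(lines):
    """Split lines into paragraphs on blank lines (spaces/tabs-only rule)."""
    paras, cur = [], []
    for line in lines:
        if BLANK.match(line):
            if cur:
                paras.append(cur)
                cur = []
        else:
            cur.append(line)
    if cur:
        paras.append(cur)
    return paras

def merge_verbatim(paras):
    """Unify runs of adjacent verbatim paragraphs into one paragraph.

    A verbatim paragraph is one whose first line starts with whitespace;
    the separating blank line is kept inside the merged paragraph.
    """
    merged = []
    for p in paras:
        is_verbatim = p[0][:1] in (' ', '\t')
        if merged and is_verbatim and merged[-1][0][:1] in (' ', '\t'):
            merged[-1].extend(['', *p])
        else:
            merged.append(list(p))
    return merged
```

So two tab-indented lines separated by a blank line come back as one verbatim paragraph containing that blank line, matching the "\tuse Foo;\n\n\tprint Foo->VERSION" example above.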
Authors of Pod formatters/processors should make every effort to avoid writing their own Pod parser. There are already several in CPAN, with a wide range of interface styles -- and one of them, Pod::Parser, comes with modern versions of Perl.
Characters in Pod documents may be conveyed either as literals, or by number in E<n> codes, or by an equivalent mnemonic, as in E<eacute> which is exactly equivalent to E<233>.
Characters in the range 32-126 refer to those well known US-ASCII characters (also defined there by Unicode, with the same meaning), which all Pod formatters must render faithfully. Characters in the ranges 0-31 and 127-159 should not be used (neither as literals, nor as E<number> codes), except for the literal byte-sequences for newline (13, 13 10, or 10), and tab (9).
Characters in the range 160-255 refer to Latin-1 characters (also defined there by Unicode, with the same meaning). Characters above 255 should be understood to refer to Unicode characters.
Be warned that some formatters cannot reliably render characters outside 32-126; and many are able to handle 32-126 and 160-255, but nothing above 255.
Besides the well-known "E<lt>" and "E<gt>" codes for less-than and greater-than, Pod parsers must understand "E<sol>" for "/" (solidus, slash), and "E<verbar>" for "|" (vertical bar, pipe). Pod parsers should also understand "E<lchevron>" and "E<rchevron>" as legacy codes for characters 171 and 187, i.e., "left-pointing double angle quotation mark" = "left pointing guillemet" and "right-pointing double angle quotation mark" = "right pointing guillemet". (These look like little "<<" and ">>", and they are now preferably expressed with the HTML/XHTML codes "E<laquo>" and "E<raquo>".)
Pod parsers should understand all "E<html>" codes as defined in the entity declarations in the most recent XHTML specification at www.W3.org. Pod parsers must understand at least the entities that define characters in the range 160-255 (Latin-1). Pod parsers, when faced with some unknown "E<identifier>" code, shouldn't simply replace it with nullstring (by default, at least), but may pass it through as a string consisting of the literal characters E, less-than, identifier, greater-than. Or Pod parsers may offer the alternative option of processing such unknown "E<identifier>" codes by firing an event especially for such codes, or by adding a special node-type to the in-memory document tree. Such "E<identifier>" codes may have special meaning to some processors, or some processors may choose to add them to a special error report.
Pod parsers must also support the XHTML codes "E<quot>" for character 34 (doublequote, "), "E<amp>" for character 38 (ampersand, &), and "E<apos>" for character 39 (apostrophe, ').
Note that in all cases of "E<whatever>", whatever (whether an htmlname, or a number in any base) must consist only of alphanumeric characters -- that is, whatever must match m/\A\w+\z/. So "E< 0 1 2 3 >" is invalid, because it contains spaces, which aren't alphanumeric characters. This presumably does not need special treatment by a Pod processor; " 0 1 2 3 " doesn't look like a number in any base, so it would presumably be looked up in the table of HTML-like names. Since there isn't (and cannot be) an HTML-like entity called " 0 1 2 3 ", this will be treated as an error. However, Pod processors may treat "E< 0 1 2 3 >" or "E<e-acute>" as syntactically invalid, potentially earning a different error message than the error message (or warning, or event) generated by a merely unknown (but theoretically valid) htmlname, as in "E<qacute>" [sic]. However, Pod parsers are not required to make this distinction.
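A sketch of E<...> resolution under the rules above (illustrative only: the entity table here is deliberately tiny, where a real processor would load the full XHTML set, e.g. via Pod::Escapes; the hex and octal branches reflect perlpod's E<0x...> and E<0...> number forms):

```python
import re

# Tiny subset of the required entities (full XHTML set assumed elsewhere).
ENTITIES = {
    'lt': '<', 'gt': '>', 'sol': '/', 'verbar': '|',
    'quot': '"', 'amp': '&', 'apos': "'",
    'lchevron': '\xab', 'rchevron': '\xbb',
    'laquo': '\xab', 'raquo': '\xbb', 'eacute': '\xe9',
}

def resolve_E(content):
    """Resolve the content of one E<...> code to a character."""
    if not re.fullmatch(r'\w+', content):
        # Content must match m/\A\w+\z/ -- spaces etc. are invalid.
        raise ValueError('invalid E<...> content: %r' % content)
    if re.fullmatch(r'0x[0-9A-Fa-f]+', content):
        return chr(int(content, 16))    # hexadecimal, e.g. E<0xAD>
    if re.fullmatch(r'0[0-7]+', content):
        return chr(int(content, 8))     # octal, e.g. E<075>
    if content.isdigit():
        # E<number> always means the Unicode codepoint, never a
        # codepoint in some "native" character set.
        return chr(int(content))
    if content in ENTITIES:
        return ENTITIES[content]
    # Unknown but well-formed htmlname: pass through literally.
    return 'E<%s>' % content
```

So E<233> and E<eacute> resolve to the same e-acute character, as the specification requires.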
Note that E<number> must not be interpreted as simply "codepoint number in the current/native character set". It always means only "the character represented by codepoint number in Unicode." (This is identical to the semantics of &#number; in XML.)
This will likely require many formatters to have tables mapping from treatable Unicode codepoints (such as the "\xE9" for the e-acute character) to the escape sequences or codes necessary for conveying such sequences in the target output format. A converter to *roff would, for example, know that "\xE9" (whether conveyed literally, or via a E<...> sequence) is to be conveyed as "e\\*'". Similarly, a program rendering Pod in a Mac OS application window would presumably need to know that "\xE9" maps to codepoint 142 in MacRoman encoding that (at time of writing) is native for Mac OS. Such Unicode2whatever mappings are presumably already widely available for common output formats. (Such mappings may be incomplete! Implementers are not expected to bend over backwards in an attempt to render Cherokee syllabics, Etruscan runes, Byzantine musical symbols, or any of the other weird things that Unicode can encode.) And if a Pod document uses a character not found in such a mapping, the formatter should consider it an unrenderable character.
If, surprisingly, the implementor of a Pod formatter can't find a satisfactory pre-existing table mapping from Unicode characters to escapes in the target format (e.g., a decent table of Unicode characters to *roff escapes), it will be necessary to build such a table. If you are in this circumstance, you should begin with the characters in the range 0x00A0 - 0x00FF, which is mostly the heavily used accented characters. Then proceed (as patience permits and fastidiousness compels) through the characters that the (X)HTML standards groups judged important enough to merit mnemonics for. These are declared in the (X)HTML specifications at the www.W3.org site. At time of writing (September 2001), the most recent entity declaration files are:
- http://www.w3.org/TR/xhtml1/DTD/xhtml-lat1.ent
- http://www.w3.org/TR/xhtml1/DTD/xhtml-special.ent
- http://www.w3.org/TR/xhtml1/DTD/xhtml-symbol.ent
Then you can progress through any remaining notable Unicode characters in the range 0x2000-0x204D (consult the character tables at www.unicode.org), and whatever else strikes your fancy. For example, in xhtml-symbol.ent, there is the entry:
- <!ENTITY infin "&#8734;"> <!-- infinity, U+221E ISOtech -->
While the mapping "infin" to the character "\x{221E}" will (hopefully) have been already handled by the Pod parser, the presence of the character in this file means that it's reasonably important enough to include in a formatter's table that maps from notable Unicode characters to the codes necessary for rendering them. So for a Unicode-to-*roff mapping, for example, this would merit the entry:
- "\x{221E}" => '\(in',
It is eagerly hoped that in the future, increasing numbers of formats (and formatters) will support Unicode characters directly (as (X)HTML does with "&#x221E;", "&#8734;", or "&infin;"), reducing the need for idiosyncratic mappings of Unicode-to-my_escapes.
It is up to individual Pod formatters to display good judgement when confronted with an unrenderable character (which is distinct from an unknown E<thing> sequence that the parser couldn't resolve to anything, renderable or not). It is good practice to map Latin letters with diacritics (like "E<eacute>"/"E<233>") to the corresponding unaccented US-ASCII letters (like a simple character 101, "e"), but clearly this is often not feasible, and an unrenderable character may be represented as "?", or the like. In attempting a sane fallback (as from E<233> to "e"), Pod formatters may use the %Latin1Code_to_fallback table in Pod::Escapes, or Text::Unidecode, if available.
For example, this Pod text:
- magic is enabled if you set C<$Currency> to 'E<euro>'.
may be rendered as: "magic is enabled if you set $Currency to '?'", or as "magic is enabled if you set $Currency to '[euro]'", or as "magic is enabled if you set $Currency to '[x20AC]'", etc.
A Pod formatter may also note, in a comment or warning, a list of what unrenderable characters were encountered.
E<...> may freely appear in any formatting code (other than in another E<...> or in a Z<>). That is, "X<The E<euro>1,000,000 Solution>" is valid, as is "L<The E<euro>1,000,000 Solution|Million::Euros>".
Some Pod formatters output to formats that implement non-breaking spaces as an individual character (which I'll call "NBSP"), and others output to formats that implement non-breaking spaces just as spaces wrapped in a "don't break this across lines" code. Note that at the level of Pod, both sorts of codes can occur: Pod can contain a NBSP character (whether as a literal, or as a "E<160>" or "E<nbsp>" code); and Pod can contain "S<foo I<bar> baz>" codes, where "mere spaces" (character 32) in such codes are taken to represent non-breaking spaces. Pod parsers should consider supporting the optional parsing of "S<foo I<bar> baz>" as if it were "fooNBSPI<bar>NBSPbaz", and, going the other way, the optional parsing of groups of words joined by NBSP's as if each group were in a S<...> code, so that formatters may use the representation that maps best to what the output format demands.
Some processors may find that the S<...> code is easiest to implement by replacing each space in the parse tree under the content of the S, with an NBSP. But note: the replacement should apply not to spaces in all text, but only to spaces in printable text. (This distinction may or may not be evident in the particular tree/event model implemented by the Pod parser.) For example, consider this unusual case:
- S<L</Autoloaded Functions>>
This means that the space in the middle of the visible link text must not be broken across lines. In other words, it's the same as this:
- L<"AutoloadedE<160>Functions"/Autoloaded Functions>
However, a misapplied space-to-NBSP replacement could (wrongly) produce something equivalent to this:
- L<"AutoloadedE<160>Functions"/AutoloadedE<160>Functions>
...which is almost definitely not going to work as a hyperlink (assuming this formatter outputs a format supporting hypertext).
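The printable-text-only restriction can be sketched on a toy tree model (illustrative only; real parsers have richer node types, and here an 'L' node's payload deliberately separates its printable link text from its non-printable target):

```python
NBSP = '\xa0'

def nbspify(nodes):
    """Replace spaces with NBSP, but only in *printable* text.

    Tree items are plain strings (printable text) or (code, payload)
    pairs. For an 'L' node the payload is (link_text_subtree, target);
    only the link-text half is printable, so the target -- e.g. a
    section name used to resolve the hyperlink -- is left untouched.
    """
    out = []
    for n in nodes:
        if isinstance(n, str):
            out.append(n.replace(' ', NBSP))
        elif n[0] == 'L':
            link_text, target = n[1]
            out.append(('L', (nbspify(link_text), target)))
        else:
            out.append((n[0], nbspify(n[1])))
    return out
```

Applied to the S<L</Autoloaded Functions>> example, the visible "Autoloaded Functions" gains an NBSP while the "/Autoloaded Functions" target keeps its plain space, so the link still resolves.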
Formatters may choose to just not support the S format code, especially in cases where the output format simply has no NBSP character/code and no code for "don't break this stuff across lines".
Besides the NBSP character discussed above, implementors are reminded of the existence of the other "special" character in Latin-1, the "soft hyphen" character, also known as "discretionary hyphen" (i.e. E<173> = E<0xAD> = E<shy>). This character expresses an optional hyphenation point. That is, it normally renders as nothing, but may render as a "-" if a formatter breaks the word at that point. Pod formatters should, as appropriate, do one of the following: 1) render this with a code with the same meaning (e.g., "\-" in RTF), 2) pass it through in the expectation that the formatter understands this character as such, or 3) delete it.
For example:
- sigE<shy>action
- manuE<shy>script
- JarkE<shy>ko HieE<shy>taE<shy>nieE<shy>mi
These signal to a formatter that if it is to hyphenate "sigaction" or "manuscript", then it should be done as "sig-[linebreak]action" or "manu-[linebreak]script" (and if it doesn't hyphenate it, then the E<shy> doesn't show up at all). And if it is to hyphenate "Jarkko" and/or "Hietaniemi", it can do so only at the points where there is a E<shy> code.
In practice, it is anticipated that this character will not be used often, but formatters should either support it, or delete it.
If you think that you want to add a new command to Pod (like, say, a "=biblio" command), consider whether you could get the same effect with a for or begin/end sequence: "=for biblio ..." or "=begin biblio" ... "=end biblio". Pod processors that don't understand "=for biblio", etc, will simply ignore it, whereas they may complain loudly if they see "=biblio".
Throughout this document, "Pod" has been the preferred spelling for the name of the documentation format. One may also use "POD" or "pod". For the documentation that is (typically) in the Pod format, you may use "pod", or "Pod", or "POD". Understanding these distinctions is useful; but obsessing over how to spell them usually is not.
As you can tell from a glance at perlpod, the L<...> code is the most complex of the Pod formatting codes. The points below will hopefully clarify what it means and how processors should deal with it.
In parsing an L<...> code, Pod parsers must distinguish at least four attributes:
First: The link-text. If there is none, this must be undef. (E.g., in "L<Perl Functions|perlfunc>", the link-text is "Perl Functions". In "L<Time::HiRes>" and even "L<|Time::HiRes>", there is no link text. Note that link text may contain formatting.)
Second: The possibly inferred link-text; i.e., if there was no real link text, then this is the text that we'll infer in its place. (E.g., for "L<Getopt::Std>", the inferred link text is "Getopt::Std".)
Third: The name or URL, or undef if none. (E.g., in "L<Perl Functions|perlfunc>", the name (also sometimes called the page) is "perlfunc". In "L</CAVEATS>", the name is undef.)
Fourth: The section (AKA "item" in older perlpods), or undef if none. E.g., in "L<Getopt::Std/DESCRIPTION>", "DESCRIPTION" is the section. (Note that this is not the same as a manpage section like the "5" in "man 5 crontab". "Section Foo" in the Pod sense means the part of the text that's introduced by the heading or item whose text is "Foo".)
Pod parsers may also note additional attributes including:
Fifth: A flag for whether item 3 (if present) is a URL (like "http://lists.perl.org" is), in which case there should be no section attribute; a Pod name (like "perldoc" and "Getopt::Std" are); or possibly a man page name (like "crontab(5)" is).
Sixth: The raw original L<...> content, before text is split on "|", "/", etc, and before E<...> codes are expanded.
(The above were numbered only for concise reference below. It is not a requirement that these be passed as an actual list or array.)
For example:
- L<Foo::Bar>
- => undef, # link text
- "Foo::Bar", # possibly inferred link text
- "Foo::Bar", # name
- undef, # section
- 'pod', # what sort of link
- "Foo::Bar" # original content
- L<Perlport's section on NL's|perlport/Newlines>
- => "Perlport's section on NL's", # link text
- "Perlport's section on NL's", # possibly inferred link text
- "perlport", # name
- "Newlines", # section
- 'pod', # what sort of link
- "Perlport's section on NL's|perlport/Newlines" # orig. content
- L<perlport/Newlines>
- => undef, # link text
- '"Newlines" in perlport', # possibly inferred link text
- "perlport", # name
- "Newlines", # section
- 'pod', # what sort of link
- "perlport/Newlines" # original content
- L<crontab(5)/"DESCRIPTION">
- => undef, # link text
- '"DESCRIPTION" in crontab(5)', # possibly inferred link text
- "crontab(5)", # name
- "DESCRIPTION", # section
- 'man', # what sort of link
- 'crontab(5)/"DESCRIPTION"' # original content
- L</Object Attributes>
- => undef, # link text
- '"Object Attributes"', # possibly inferred link text
- undef, # name
- "Object Attributes", # section
- 'pod', # what sort of link
- "/Object Attributes" # original content
- L<http://www.perl.org/>
- => undef, # link text
- "http://www.perl.org/", # possibly inferred link text
- "http://www.perl.org/", # name
- undef, # section
- 'url', # what sort of link
- "http://www.perl.org/" # original content
- L<Perl.org|http://www.perl.org/>
- => "Perl.org", # link text
- "http://www.perl.org/", # possibly inferred link text
- "http://www.perl.org/", # name
- undef, # section
- 'url', # what sort of link
- "Perl.org|http://www.perl.org/" # original content
Note that you can distinguish URL-links from anything else by the fact that they match m/\A\w+:[^:\s]\S*\z/. So L<http://www.perl.com> is a URL, but L<HTTP::Response> isn't.
In case of L<...> codes with no "text|" part in them, older formatters have exhibited great variation in actually displaying the link or cross reference. For example, L<crontab(5)> would render as "the crontab(5) manpage", or "in the crontab(5) manpage" or just "crontab(5)".
Pod processors must now treat "text|"-less links as follows:
- L<name> => L<name|name>
- L</section> => L<"section"|/section>
- L<name/section> => L<"section" in name|name/section>
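The attribute extraction and the "text|"-less inference rules above can be sketched as follows (illustrative only: the function name is made up, E<...> expansion and formatting codes in the content are ignored, and only the four main attributes plus the link-kind flag are returned):

```python
import re

def parse_L(content):
    """Split one L<...> content string into its attributes.

    Returns (text, inferred_text, name, section, kind), where kind is
    'url', 'man', or 'pod'. Sketch only: E<...> codes and embedded
    formatting are not handled here.
    """
    text, rest = None, content
    if '|' in content:
        text, rest = content.split('|', 1)
        text = text or None            # "L<|Time::HiRes>" has no link text
    # URL-links match m/\A\w+:[^:\s]\S*\z/; per the spec's examples,
    # the inferred link text for a URL-link is the URL itself.
    if re.fullmatch(r'\w+:[^:\s]\S*', rest):
        return (text, rest, rest, None, 'url')
    name, section = rest, None
    if '/' in rest:
        name, section = rest.split('/', 1)
        name = name or None
        section = section.strip('"') or None
    # A parenthesized suffix like "crontab(5)" signals a man page.
    kind = 'man' if name and re.search(r'\(\S*\)$', name) else 'pod'
    if text is not None:
        inferred = text
    elif name and section:
        inferred = '"%s" in %s' % (section, name)   # L<name/section>
    elif section:
        inferred = '"%s"' % section                 # L</section>
    else:
        inferred = name                             # L<name>
    return (text, inferred, name, section, kind)
```

Run against the worked examples above, this reproduces the listed link text, inferred text, name, section, and link-kind values.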
Note that section names might contain markup. I.e., if a section starts with:
- =head2 About the C<-M> Operator
or with:
- =item About the C<-M> Operator
then a link to it would look like this:
- L<somedoc/About the C<-M> Operator>
Formatters may choose to ignore the markup for purposes of resolving the link and use only the renderable characters in the section name, as in:
- <h1><a name="About_the_-M_Operator">About the <code>-M</code>
- Operator</h1>
- ...
- <a href="somedoc#About_the_-M_Operator">About the <code>-M</code>
- Operator" in somedoc</a>
Previous versions of perlpod distinguished L<name/"section"> links from L<name/item> links (and their targets). These have been merged syntactically and semantically in the current specification, and section can refer either to a "=headn Heading Content" command or to a "=item Item Content" command. This specification does not specify what behavior should be in the case of a given document having several things all seeming to produce the same section identifier (e.g., in HTML, several things all producing the same anchorname in <a name="anchorname">...</a> elements). Where Pod processors can control this behavior, they should use the first such anchor. That is, L<Foo/Bar> refers to the first "Bar" section in Foo.
But for some processors/formats this cannot be easily controlled; as with the HTML example, the behavior of multiple ambiguous <a name="anchorname">...</a> is most easily just left up to browsers to decide.
In a L<text|...> code, text may contain formatting codes for formatting or for E<...> escapes, as in:
- L<B<ummE<234>stuff>|...>
For L<...> codes without a "text|" part, only E<...> and Z<> codes may occur. That is, authors should not use "L<B<Foo::Bar>>".
Note, however, that formatting codes and Z<>'s can occur in any and all parts of an L<...> (i.e., in name, section, text, and url).
Authors must not nest L<...> codes. For example, "L<The L<Foo::Bar> man page>" should be treated as an error.
Note that Pod authors may use formatting codes inside the "text" part of "L<text|name>" (and so on for L<text|/"sec">).
In other words, this is valid:
- Go read L<the docs on C<$.>|perlvar/"$.">
Some output formats that do allow rendering "L<...>" codes as hypertext might not allow the link-text to be formatted; in that case, formatters will have to just ignore that formatting.
At time of writing, L<name> values are of two types: either the name of a Pod page like L<Foo::Bar> (which might be a real Perl module or program in an @INC / PATH directory, or a .pod file in those places); or the name of a Unix man page, like L<crontab(5)>. In theory, L<chmod> is ambiguous between a Pod page called "chmod", or the Unix man page "chmod" (in whatever man-section). However, the presence of a string in parens, as in "crontab(5)", is sufficient to signal that what is being discussed is not a Pod page, and so is presumably a Unix man page. The distinction is of no importance to many Pod processors, but some processors that render to hypertext formats may need to distinguish them in order to know how to render a given L<foo> code.
Previous versions of perlpod allowed for a L<section> syntax (as in L<Object Attributes>), which was not easily distinguishable from L<name> syntax, and for L<"section">, which was only slightly less ambiguous. This syntax is no longer in the specification, and has been replaced by the L</section> syntax (where the slash was formerly optional). Pod parsers should tolerate the L<"section"> syntax, for a while at least. The suggested heuristic for distinguishing L<section> from L<name> is that if it contains any whitespace, it's a section. Pod processors should warn about this being deprecated syntax.
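The heuristic above is simple enough to sketch. Below is an illustrative classifier in Python (not Perl; the function name and return labels are mine, not part of any real Pod parser) that applies the spec's rules in order: quoted targets and whitespace-bearing targets are (deprecated) section links, a trailing "(section)" marks a Unix man page, and anything else is treated as a Pod page name.

```python
import re

def classify_link_target(target):
    """Classify an old-style L<...> target per the spec's heuristics.
    Returns "section", "man", or "page" (labels are illustrative)."""
    if target.startswith('"') and target.endswith('"'):
        return "section"              # deprecated L<"section"> syntax
    if re.search(r"\(\S*\)\s*$", target):
        return "man"                  # e.g. crontab(5): a Unix man page
    if re.search(r"\s", target):
        return "section"              # deprecated L<section> syntax
    return "page"                     # a Pod page such as Foo::Bar or chmod
```

A parser using this sketch would classify L<Foo::Bar> as a page, L<crontab(5)> as a man page, and L<Object Attributes> as a deprecated section link worth a warning.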
"=over"..."=back" regions are used for various kinds of list-like structures. (I use the term "region" here simply as a collective term for everything from the "=over" to the matching "=back".)
The non-zero numeric indentlevel in "=over indentlevel" ... "=back" gives the formatter a clue as to how many "spaces" (ems, or roughly equivalent units) it should tab over, although many formatters will have to convert this to an absolute measurement that may not exactly match the size of spaces (or M's) in the document's base font. Other formatters may have to ignore the number entirely. The lack of any explicit indentlevel parameter is equivalent to an indentlevel value of 4. Pod processors may complain if indentlevel is present but is not a positive number matching m/\A(\d*\.)?\d+\z/.
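As a minimal sketch of that validation rule (in Python, with Perl's \z rendered as Python's \Z; the function name is mine, not from any real parser):

```python
import re

# The spec's pattern m/\A(\d*\.)?\d+\z/ for the =over indentlevel.
INDENT_RE = re.compile(r"\A(\d*\.)?\d+\Z")

def over_indent(arg):
    """Return the indent for an =over command: 4 when no argument is
    given, the numeric value when it matches the spec's pattern, and
    None (so the caller can warn) otherwise."""
    arg = arg.strip()
    if not arg:
        return 4.0
    if INDENT_RE.match(arg):
        return float(arg)
    return None
```

Note that the pattern admits fractional values like "2.5" but rejects signs, so negative indents never reach the formatter.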
Authors of Pod formatters are reminded that "=over" ... "=back" may map to several different constructs in your output format. For example, in converting Pod to (X)HTML, it can map to any of <ul>...</ul>, <ol>...</ol>, <dl>...</dl>, or <blockquote>...</blockquote>. Similarly, "=item" can map to <li> or <dt>.
Each "=over" ... "=back" region should be one of the following:
An "=over" ... "=back" region containing only "=item *" commands, each followed by some number of ordinary/verbatim paragraphs, other nested "=over" ... "=back" regions, "=for..." paragraphs, and "=begin"..."=end" regions.
(Pod processors must tolerate a bare "=item" as if it were "=item *".) Whether "*" is rendered as a literal asterisk, an "o", or as some kind of real bullet character, is left up to the Pod formatter, and may depend on the level of nesting.
An "=over" ... "=back" region containing only m/\A=item\s+\d+\.?\s*\z/ paragraphs, each one (or each group of them) followed by some number of ordinary/verbatim paragraphs, other nested "=over" ... "=back" regions, "=for..." paragraphs, and/or "=begin"..."=end" regions. Note that the numbers must start at 1 in each section, and must proceed in order and without skipping numbers.
(Pod processors must tolerate lines like "=item 1" as if they were "=item 1.", with the period.)
An "=over" ... "=back" region containing only "=item [text]" commands, each one (or each group of them) followed by some number of ordinary/verbatim paragraphs, other nested "=over" ... "=back" regions, or "=for..." paragraphs, and "=begin"..."=end" regions.
The "=item [text]" paragraph should not match m/\A=item\s+\d+\.?\s*\z/ or m/\A=item\s+\*\s*\z/, nor should it match just m/\A=item\s*\z/.
An "=over" ... "=back" region containing no "=item" paragraphs at all, and containing only some number of ordinary/verbatim paragraphs, and possibly also some nested "=over" ... "=back" regions, "=for..." paragraphs, and "=begin"..."=end" regions. Such an itemless "=over" ... "=back" region in Pod is equivalent in meaning to a "<blockquote>...</blockquote>" element in HTML.
Note that with all the above cases, you can determine which type of "=over" ... "=back" you have, by examining the first (non-"=cut", non-"=pod") Pod paragraph after the "=over" command.
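The three "=item" patterns quoted above translate directly into a classifier for that first paragraph. A sketch in Python (function name and return labels are illustrative, not from any real Pod library):

```python
import re

BULLET_RE = re.compile(r"\A=item\s+\*\s*\Z")      # m/\A=item\s+\*\s*\z/
NUMBER_RE = re.compile(r"\A=item\s+(\d+)\.?\s*\Z") # m/\A=item\s+\d+\.?\s*\z/
BARE_RE   = re.compile(r"\A=item\s*\Z")           # m/\A=item\s*\z/

def item_kind(paragraph):
    """Classify an =item paragraph per the spec's patterns: a bare
    =item counts as a bullet, a number (with or without the trailing
    period) as numbered, anything else as a text item."""
    if BULLET_RE.match(paragraph) or BARE_RE.match(paragraph):
        return "bullet"
    m = NUMBER_RE.match(paragraph)
    if m:
        return ("number", int(m.group(1)))
    return "text"
```

Applying item_kind to the first "=item" after an "=over" then tells the formatter which of the region types above it is dealing with.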
Pod formatters must tolerate arbitrarily large amounts of text in the "=item text..." paragraph. In practice, most such paragraphs are short, as in:
- =item For cutting off our trade with all parts of the world
But they may be arbitrarily long:
- =item For transporting us beyond seas to be tried for pretended
- offenses
- =item He is at this time transporting large armies of foreign
- mercenaries to complete the works of death, desolation and
- tyranny, already begun with circumstances of cruelty and perfidy
- scarcely paralleled in the most barbarous ages, and totally
- unworthy the head of a civilized nation.
Pod processors should tolerate "=item *" / "=item number" commands with no accompanying paragraph. The middle item is an example:
- =over
- =item 1
- Pick up dry cleaning.
- =item 2
- =item 3
- Stop by the store. Get Abba Zabas, Stoli, and cheap lawn chairs.
- =back
No "=over" ... "=back" region can contain headings. Processors may treat such a heading as an error.
Note that an "=over" ... "=back" region should have some content. That is, authors should not have an empty region like this:
- =over
- =back
Pod processors seeing such a contentless "=over" ... "=back" region, may ignore it, or may report it as an error.
Processors must tolerate an "=over" list that goes off the end of the document (i.e., which has no matching "=back"), but they may warn about such a list.
Authors of Pod formatters should note that this construct:
- =item Neque
- =item Porro
- =item Quisquam Est
- Qui dolorem ipsum quia dolor sit amet, consectetur, adipisci
- velit, sed quia non numquam eius modi tempora incidunt ut
- labore et dolore magnam aliquam quaerat voluptatem.
- =item Ut Enim
is semantically ambiguous, in a way that makes formatting decisions a bit difficult. On the one hand, it could be mention of an item "Neque", mention of another item "Porro", and mention of another item "Quisquam Est", with just the last one requiring the explanatory paragraph "Qui dolorem ipsum quia dolor..."; and then an item "Ut Enim". In that case, you'd want to format it like so:
- Neque
- Porro
- Quisquam Est
- Qui dolorem ipsum quia dolor sit amet, consectetur, adipisci
- velit, sed quia non numquam eius modi tempora incidunt ut
- labore et dolore magnam aliquam quaerat voluptatem.
- Ut Enim
But it could equally well be a discussion of three (related or equivalent) items, "Neque", "Porro", and "Quisquam Est", followed by a paragraph explaining them all, and then a new item "Ut Enim". In that case, you'd probably want to format it like so:
- Neque
- Porro
- Quisquam Est
- Qui dolorem ipsum quia dolor sit amet, consectetur, adipisci
- velit, sed quia non numquam eius modi tempora incidunt ut
- labore et dolore magnam aliquam quaerat voluptatem.
- Ut Enim
But (for the foreseeable future), Pod does not provide any way for Pod authors to distinguish which grouping is meant by the above "=item"-cluster structure. So formatters should format it like so:
- Neque
- Porro
- Quisquam Est
- Qui dolorem ipsum quia dolor sit amet, consectetur, adipisci
- velit, sed quia non numquam eius modi tempora incidunt ut
- labore et dolore magnam aliquam quaerat voluptatem.
- Ut Enim
That is, there should be (at least roughly) equal spacing between items as between paragraphs (although that spacing may well be less than the full height of a line of text). This leaves it to the reader to use (con)textual cues to figure out whether the "Qui dolorem ipsum..." paragraph applies to the "Quisquam Est" item or to all three items "Neque", "Porro", and "Quisquam Est". While not an ideal situation, this is preferable to providing formatting cues that may be actually contrary to the author's intent.
Data paragraphs are typically used for inlining non-Pod data that is to be used (typically passed through) when rendering the document to a specific format:
- =begin rtf
- \par{\pard\qr\sa4500{\i Printed\~\chdate\~\chtime}\par}
- =end rtf
The exact same effect could, incidentally, be achieved with a single "=for" paragraph:
- =for rtf \par{\pard\qr\sa4500{\i Printed\~\chdate\~\chtime}\par}
(Although that is not formally a data paragraph, it has the same meaning as one, and Pod parsers may parse it as one.)
Another example of a data paragraph:
- =begin html
- I like <em>PIE</em>!
- <hr>Especially pecan pie!
- =end html
If these were ordinary paragraphs, the Pod parser would try to expand the "E</em>" (in the first paragraph) as a formatting code, just like "E<lt>" or "E<eacute>". But since this is in a "=begin identifier"..."=end identifier" region and the identifier "html" doesn't have a ":" prefix, the contents of this region are stored as data paragraphs, instead of being processed as ordinary paragraphs (or, if they began with spaces and/or tabs, as verbatim paragraphs).
As a further example: At time of writing, no "biblio" identifier is supported, but suppose some processor were written to recognize it as a way of (say) denoting a bibliographic reference (necessarily containing formatting codes in ordinary paragraphs). The fact that "biblio" paragraphs were meant for ordinary processing would be indicated by prefacing each "biblio" identifier with a colon:
- =begin :biblio
- Wirth, Niklaus. 1976. I<Algorithms + Data Structures =
- Programs.> Prentice-Hall, Englewood Cliffs, NJ.
- =end :biblio
This would signal to the parser that paragraphs in this begin...end region are subject to normal handling as ordinary/verbatim paragraphs (while still tagged as meant only for processors that understand the "biblio" identifier). The same effect could be had with:
- =for :biblio
- Wirth, Niklaus. 1976. I<Algorithms + Data Structures =
- Programs.> Prentice-Hall, Englewood Cliffs, NJ.
The ":" on these identifiers means simply "process this stuff normally, even though the result will be for some special target". I suggest that parser APIs report "biblio" as the target identifier, but also report that it had a ":" prefix. (And similarly, with the above "html", report "html" as the target identifier, and note the lack of a ":" prefix.)
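That reporting suggestion amounts to a two-value split of the identifier. A minimal sketch (the function name and tuple shape are mine; real parser APIs such as Pod::Simple expose this differently):

```python
def parse_target(identifier):
    """Split a =begin/=for target into (name, process_normally): a
    leading ":" means the region's paragraphs get ordinary Pod
    handling; without it they are data paragraphs."""
    if identifier.startswith(":"):
        return identifier[1:], True
    return identifier, False
```

So ":biblio" is reported as target "biblio" with normal processing, while "html" is reported as target "html" with data-paragraph handling.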
Note that a "=begin identifier"..."=end identifier" region where identifier begins with a colon, can contain commands. For example:
- =begin :biblio
- Wirth's classic is available in several editions, including:
- =for comment
- hm, check abebooks.com for how much used copies cost.
- =over
- =item
- Wirth, Niklaus. 1975. I<Algorithmen und Datenstrukturen.>
- Teubner, Stuttgart. [Yes, it's in German.]
- =item
- Wirth, Niklaus. 1976. I<Algorithms + Data Structures =
- Programs.> Prentice-Hall, Englewood Cliffs, NJ.
- =back
- =end :biblio
Note, however, a "=begin identifier"..."=end identifier" region where identifier does not begin with a colon, should not directly contain "=head1" ... "=head4" commands, nor "=over", nor "=back", nor "=item". For example, this may be considered invalid:
- =begin somedata
- This is a data paragraph.
- =head1 Don't do this!
- This is a data paragraph too.
- =end somedata
A Pod processor may signal that the above (specifically the "=head1" paragraph) is an error. Note, however, that the following should not be treated as an error:
- =begin somedata
- This is a data paragraph.
- =cut
- # Yup, this isn't Pod anymore.
- sub excl { (rand() > .5) ? "hoo!" : "hah!" }
- =pod
- This is a data paragraph too.
- =end somedata
And this too is valid:
- =begin someformat
- This is a data paragraph.
- And this is a data paragraph.
- =begin someotherformat
- This is a data paragraph too.
- And this is a data paragraph too.
- =begin :yetanotherformat
- =head2 This is a command paragraph!
- This is an ordinary paragraph!
- And this is a verbatim paragraph!
- =end :yetanotherformat
- =end someotherformat
- Another data paragraph!
- =end someformat
The contents of the above "=begin :yetanotherformat" ... "=end :yetanotherformat" region aren't data paragraphs, because the immediately containing region's identifier (":yetanotherformat") begins with a colon. In practice, most regions that contain data paragraphs will contain only data paragraphs; however, the above nesting is syntactically valid as Pod, even if it is rare. However, the handlers for some formats, like "html", will accept only data paragraphs, not nested regions; and they may complain if they see (targeted for them) nested regions, or commands, other than "=end", "=pod", and "=cut".
Also consider this valid structure:
- =begin :biblio
- Wirth's classic is available in several editions, including:
- =over
- =item
- Wirth, Niklaus. 1975. I<Algorithmen und Datenstrukturen.>
- Teubner, Stuttgart. [Yes, it's in German.]
- =item
- Wirth, Niklaus. 1976. I<Algorithms + Data Structures =
- Programs.> Prentice-Hall, Englewood Cliffs, NJ.
- =back
- Buy buy buy!
- =begin html
- <img src='wirth_spokesmodeling_book.png'>
- <hr>
- =end html
- Now now now!
- =end :biblio
There, the "=begin html"..."=end html" region is nested inside the larger "=begin :biblio"..."=end :biblio" region. Note that the content of the "=begin html"..."=end html" region is data paragraph(s), because the immediately containing region's identifier ("html") doesn't begin with a colon.
Pod parsers, when processing a series of data paragraphs one after another (within a single region), should consider them to be one large data paragraph that happens to contain blank lines. So the content of the above "=begin html"..."=end html" may be stored as two data paragraphs (one consisting of "<img src='wirth_spokesmodeling_book.png'>\n" and another consisting of "<hr>\n"), but should be stored as a single data paragraph (consisting of "<img src='wirth_spokesmodeling_book.png'>\n\n<hr>\n").
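The coalescing step above can be sketched in a few lines of Python (the helper name is mine, for illustration only):

```python
def coalesce_data(paragraphs):
    """Join a run of data paragraphs from one region into the single
    large paragraph the spec asks for, preserving the blank lines
    that separated them as "\n\n"."""
    return "\n\n".join(p.rstrip("\n") for p in paragraphs) + "\n"
```

Given the two paragraphs from the "=begin html" example, this yields the single "...\n\n<hr>\n" string the spec calls for.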
Pod processors should tolerate empty "=begin something"..."=end something" regions, empty "=begin :something"..."=end :something" regions, and contentless "=for something" and "=for :something" paragraphs. I.e., these should be tolerated:
- =for html
- =begin html
- =end html
- =begin :biblio
- =end :biblio
Incidentally, note that there's no easy way to express a data paragraph starting with something that looks like a command. Consider:
- =begin stuff
- =shazbot
- =end stuff
There, "=shazbot" will be parsed as a Pod command "shazbot", not as a data paragraph "=shazbot\n". However, you can express a data paragraph consisting of "=shazbot\n" using this code:
- =for stuff =shazbot
The situation where this is necessary, is presumably quite rare.
Note that =end commands must match the currently open =begin command. That is, they must properly nest. For example, this is valid:
- =begin outer
- X
- =begin inner
- Y
- =end inner
- Z
- =end outer
while this is invalid:
- =begin outer
- X
- =begin inner
- Y
- =end outer
- Z
- =end inner
This latter is improper because when the "=end outer" command is seen, the currently open region has the formatname "inner", not "outer". (It just happens that "outer" is the format name of a higher-up region.) This is an error. Processors must by default report this as an error, and may halt processing the document containing that error. A corollary of this is that regions cannot "overlap". That is, the latter block above does not represent a region called "outer" which contains X and Y, overlapping a region called "inner" which contains Y and Z. But because it is invalid (as all apparently overlapping regions would be), it doesn't represent that, or anything at all.
Similarly, this is invalid:
- =begin thing
- =end hting
This is an error because the region is opened by "thing", and the "=end" tries to close "hting" [sic].
This is also invalid:
- =begin thing
- =end
This is invalid because every "=end" command must have a formatname parameter.
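The nesting rules in this section reduce to a stack check: each "=end" must name the innermost open region, and every region must eventually be closed. A sketch in Python (the function, its input shape, and the error strings are illustrative, not any real parser's API):

```python
def check_begin_end(commands):
    """Verify that =begin/=end pairs nest properly, per the spec.
    Takes (cmd, formatname) tuples; returns a list of error strings
    (empty when the document is well formed)."""
    stack, errors = [], []
    for cmd, name in commands:
        if cmd == "begin":
            stack.append(name)
        elif cmd == "end":
            if not name:
                errors.append("=end with no formatname")
            elif not stack:
                errors.append("=end %s with no open region" % name)
            elif stack[-1] != name:
                errors.append("=end %s but open region is %s"
                              % (name, stack[-1]))
            else:
                stack.pop()
    errors.extend("=begin %s never closed" % n for n in stack)
    return errors
```

The "outer"/"inner" example above passes this check when properly nested and fails when the "=end outer" arrives while "inner" is still open, which is exactly the overlapping-regions error described.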
perlpod, PODs: Embedded Documentation in perlsyn, podchecker
Sean M. Burke
perlpodstyle - Perl POD style guide
These are general guidelines for how to write POD documentation for Perl scripts and modules, based on general guidelines for writing good UNIX man pages. All of these guidelines are, of course, optional, but following them will make your documentation more consistent with other documentation on the system.
The name of the program being documented is conventionally written in bold (using B<>) wherever it occurs, as are all program options. Arguments should be written in italics (I<>). Function names are traditionally written in italics; if you write a function as function(), Pod::Man will take care of this for you. Literal code or commands should be in C<>. References to other man pages should be in the form manpage(section) or L<manpage(section)>, and Pod::Man will automatically format those appropriately. The second form, with L<>, is used to request that a POD formatter make a link to the man page if possible. As an exception, one normally omits the section when referring to module documentation since it's not clear what section module documentation will be in; use L<Module::Name> for module references instead.
References to other programs or functions are normally in the form of man page references so that cross-referencing tools can provide the user with links and the like. It's possible to overdo this, though, so be careful not to clutter your documentation with too much markup. References to other programs that are not given as man page references should be enclosed in B<>.
The major headers should be set out using a =head1 directive, and are historically written in the rather startling ALL UPPER CASE format; this is not mandatory, but it's strongly recommended so that sections have consistent naming across different software packages. Minor headers may be included using =head2, and are typically in mixed case.
The standard sections of a manual page are:
Mandatory section; should be a comma-separated list of programs or functions documented by this POD page, such as:
- foo, bar - programs to do something
Manual page indexers are often extremely picky about the format of this section, so don't put anything in it except this line. Every program or function documented by this POD page should be listed, separated by a comma and a space. For a Perl module, just give the module name. A single dash, and only a single dash, should separate the list of programs or functions from the description. Do not use any markup such as C<> or B<> anywhere in this line. Functions should not be qualified with () or the like. The description should ideally fit on a single line, even if a man program replaces the dash with a few tabs.
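Those NAME-line rules lend themselves to a quick lint check. A minimal sketch in Python (an assumption-laden illustration, not what any real indexer runs; the function name is mine):

```python
import re

def name_line_ok(line):
    """Check a NAME line per the guidelines above: no formatting
    codes, and exactly one " - " separating the comma-separated
    names from the description."""
    if re.search(r"[A-Z]<", line):
        return False                  # no C<>, B<>, L<>, etc.
    parts = line.split(" - ")
    return len(parts) == 2 and bool(parts[0].strip()) and bool(parts[1].strip())
```

This accepts "foo, bar - programs to do something" and rejects lines with markup or with anything other than the single dash separator.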
A short usage summary for programs and functions. This section is mandatory for section 3 pages. For Perl module documentation, it's usually convenient to have the contents of this section be a verbatim block showing some (brief) examples of typical ways the module is used.
Extended description and discussion of the program or functions, or the body of the documentation for man pages that document something else. If particularly long, it's a good idea to break this up into subsections introduced with =head2 directives like:
- =head2 Normal Usage
- =head2 Advanced Features
- =head2 Writing Configuration Files
or whatever is appropriate for your documentation.
For a module, this is generally where the documentation of the interfaces provided by the module goes, usually in the form of a list with an =item for each interface. Depending on how many interfaces there are, you may want to put that documentation in separate METHODS, FUNCTIONS, CLASS METHODS, or INSTANCE METHODS sections instead and save the DESCRIPTION section for an overview.
Detailed description of each of the command-line options taken by the program. This should be separate from the description for the use of parsers like Pod::Usage. This is normally presented as a list, with each option as a separate =item. The specific option string should be enclosed in B<>. Any values that the option takes should be enclosed in I<>. For example, the section for the option --section=manext would be introduced with:
- =item B<--section>=I<manext>
Synonymous options (like both the short and long forms) are separated by a comma and a space on the same =item line, or optionally listed as their own item with a reference to the canonical name. For example, since --section can also be written as -s, the above would be:
- =item B<-s> I<manext>, B<--section>=I<manext>
Writing the short option first is recommended because it's easier to read. The long option is long enough to draw the eye to it anyway and the short option can otherwise get lost in visual noise.
What the program or function returns, if successful. This section can be omitted for programs whose precise exit codes aren't important, provided they return 0 on success and non-zero on failure as is standard. It should always be present for functions. For modules, it may be useful to summarize return values from the module interface here, or it may be more useful to discuss return values separately in the documentation of each function or method the module provides.
Exceptions, error return codes, exit statuses, and errno settings. Typically used for function or module documentation; program documentation uses DIAGNOSTICS instead. The general rule of thumb is that errors printed to STDOUT or STDERR and intended for the end user are documented in DIAGNOSTICS, while errors passed internal to the calling program and intended for other programmers are documented in ERRORS. When documenting a function that sets errno, a full list of the possible errno values should be given here.
All possible messages the program can print out and what they mean. You may wish to follow the same documentation style as the Perl documentation; see perldiag(1) for more details (and look at the POD source as well).
If applicable, please include details on what the user should do to correct the error; documenting an error as indicating "the input buffer is too small" without telling the user how to increase the size of the input buffer (or at least telling them that it isn't possible) isn't very useful.
Give some example uses of the program or function. Don't skimp; users often find this the most useful part of the documentation. The examples are generally given as verbatim paragraphs.
Don't just present an example without explaining what it does. Adding a short paragraph saying what the example will do can increase the value of the example immensely.
Environment variables that the program cares about, normally presented as a list using =over, =item, and =back. For example:
- =over 6
- =item HOME
- Used to determine the user's home directory. F<.foorc> in this
- directory is read for configuration details, if it exists.
- =back
Since environment variables are normally in all uppercase, no additional special formatting is generally needed; they're glaring enough as it is.
All files used by the program or function, normally presented as a list, and what it uses them for. File names should be enclosed in F<>. It's particularly important to document files that will be potentially modified.
Things to take special care with, sometimes called WARNINGS.
Things that are broken or just don't work quite right.
Bugs you don't plan to fix. :-)
Miscellaneous commentary.
Who wrote it (use AUTHORS for multiple people). It's a good idea to include your current e-mail address (or some e-mail address to which bug reports should be sent) or some other contact information so that users have a way of contacting you. Remember that program documentation tends to roam the wild for far longer than you expect and pick a contact method that's likely to last.
Programs derived from other sources sometimes have this. Some people keep a modification log here, but that usually gets long and is normally better maintained in a separate file.
For copyright:
- Copyright YEAR(s) YOUR NAME(s)
(No, (C) is not needed. No, "all rights reserved" is not needed.)
For licensing the easiest way is to use the same licensing as Perl itself:
- This library is free software; you may redistribute it and/or modify
- it under the same terms as Perl itself.
This makes it easy for people to use your module with Perl. Note that this licensing example is neither an endorsement nor a requirement; you are, of course, free to choose any licensing.
Other man pages to check out, like man(1), man(7), makewhatis(8), or catman(8). Normally a simple list of man pages separated by commas, or a paragraph giving the name of a reference work. Man page references, if they use the standard name(section) form, don't have to be enclosed in L<> (although it's recommended), but other things in this section probably should be when appropriate.
If the package has a mailing list, include a URL or subscription instructions here.
If the package has a web site, include a URL here.
Documentation of object-oriented libraries or modules may want to use CONSTRUCTORS and METHODS sections, or CLASS METHODS and INSTANCE METHODS sections, for detailed documentation of the parts of the library and save the DESCRIPTION section for an overview. Large modules with a function interface may want to use FUNCTIONS for similar reasons. Some people use OVERVIEW to summarize the description if it's quite long.
Section ordering varies, although NAME must always be the first section (you'll break some man page systems otherwise), and NAME, SYNOPSIS, DESCRIPTION, and OPTIONS generally always occur first and in that order if present. In general, SEE ALSO, AUTHOR, and similar material should be left for last. Some systems also move WARNINGS and NOTES to last. The order given above should be reasonable for most purposes.
Some systems use CONFORMING TO to note conformance to relevant standards and MT-LEVEL to note safeness for use in threaded programs or signal handlers. These headings are primarily useful when documenting parts of a C library.
Finally, as a general note, try not to use an excessive amount of markup. As documented here and in Pod::Man, you can safely leave Perl variables, function names, man page references, and the like unadorned by markup and the POD translators will figure it out for you. This makes it much easier to later edit the documentation. Note that many existing translators will do the wrong thing with e-mail addresses when wrapped in L<>, so don't do that.
For additional information that may be more accurate for your specific system, see either man(5) or man(7) depending on your system manual section numbering conventions.
This documentation is maintained as part of the podlators distribution. The current version is always available from its web site at <http://www.eyrie.org/~eagle/software/podlators/>.
Russ Allbery <rra@stanford.edu>, with large portions of this documentation taken from the documentation of the original pod2man implementation by Larry Wall and Tom Christiansen.
Copyright 1999, 2000, 2001, 2004, 2006, 2008, 2010 Russ Allbery <rra@stanford.edu>.
This documentation is free software; you may redistribute it and/or modify it under the same terms as Perl itself.
perlpolicy - Various and sundry policies and commitments related to the Perl core
This document is the master document which records all written policies about how the Perl 5 Porters collectively develop and maintain the Perl core.
Subscribers to perl5-porters (the porters themselves) come in several flavours. Some are quiet curious lurkers, who rarely pitch in and instead watch the ongoing development to ensure they're forewarned of new changes or features in Perl. Some are representatives of vendors, who are there to make sure that Perl continues to compile and work on their platforms. Some patch any reported bug that they know how to fix, some are actively patching their pet area (threads, Win32, the regexp engine), while others seem to do nothing but complain. In other words, it's your usual mix of technical people.
Over this group of porters presides Larry Wall. He has the final word in what does and does not change in any of the Perl programming languages. These days, Larry spends most of his time on Perl 6, while Perl 5 is shepherded by a "pumpking", a porter responsible for deciding what goes into each release and ensuring that releases happen on a regular basis.
Larry sees Perl development along the lines of the US government: there's the Legislature (the porters), the Executive branch (the pumpking), and the Supreme Court (Larry). The legislature can discuss and submit patches to the executive branch all they like, but the executive branch is free to veto them. Rarely, the Supreme Court will side with the executive branch over the legislature, or the legislature over the executive branch. Mostly, however, the legislature and the executive branch are supposed to get along and work out their differences without impeachment or court cases.
You might sometimes see reference to Rule 1 and Rule 2. Larry's power as Supreme Court is expressed in The Rules:
Larry is always by definition right about how Perl should behave. This means he has final veto power on the core functionality.
Larry is allowed to change his mind about any matter at a later date, regardless of whether he previously invoked Rule 1.
Got that? Larry is always right, even when he was wrong. It's rare to see either Rule exercised, but they are often alluded to.
Perl 5 is developed by a community, not a corporate entity. Every change contributed to the Perl core is the result of a donation. Typically, these donations are contributions of code or time by individual members of our community. On occasion, these donations come in the form of corporate or organizational sponsorship of a particular individual or project.
As a volunteer organization, the commitments we make are heavily dependent on the goodwill and hard work of individuals who have no obligation to contribute to Perl.
That being said, we value Perl's stability and security and have long had an unwritten covenant with the broader Perl community to support and maintain releases of Perl.
This document codifies the support and maintenance commitments that the Perl community should expect from Perl's developers:
We "officially" support the two most recent stable release series. 5.12.x and earlier are now out of support. As of the release of 5.18.0, we will "officially" end support for Perl 5.14.x, other than providing security updates as described below.
To the best of our ability, we will attempt to fix critical issues in the two most recent stable 5.x release series. Fixes for the current release series take precedence over fixes for the previous release series.
To the best of our ability, we will provide "critical" security patches / releases for any major version of Perl whose 5.x.0 release was within the past three years. We can only commit to providing these for the most recent .y release in any 5.x.y series.
We will not provide security updates or bug fixes for development releases of Perl.
We encourage vendors to ship the most recent supported release of Perl at the time of their code freeze.
As a vendor, you may have a requirement to backport security fixes beyond our 3 year support commitment. We can provide limited support and advice to you as you do so and, where possible, will try to apply those patches to the relevant maint branches in git, though we may or may not choose to make numbered releases or "official" patches available. Contact us at <perl5-security-report@perl.org> to begin that process.
Our community has a long-held belief that backward-compatibility is a virtue, even when the functionality in question is a design flaw.
We would all love to unmake some mistakes we've made over the past decades. Living with every design error we've ever made can lead to painful stagnation. Unwinding our mistakes is very, very difficult. Doing so without actively harming our users is nearly impossible.
Lately, ignoring or actively opposing compatibility with earlier versions of Perl has come into vogue. Sometimes, a change is proposed which wants to usurp syntax which previously had another meaning. Sometimes, a change wants to improve previously-crazy semantics.
Down this road lies madness.
Requiring end-user programmers to change just a few language constructs, even language constructs which no well-educated developer would ever intentionally use, is tantamount to saying "you should not upgrade to a new release of Perl unless you have 100% test coverage and can do a full manual audit of your codebase." If we were to have tools capable of reliably upgrading Perl source code from one version of Perl to another, this concern could be significantly mitigated.
We want to ensure that Perl continues to grow and flourish in the coming years and decades, but not at the expense of our user community.
Existing syntax and semantics should only be marked for destruction in very limited circumstances. If a given language feature's continued inclusion in the language will cause significant harm to the language or prevent us from making needed changes to the runtime, then it may be considered for deprecation.
Any language change which breaks backward-compatibility should be able to be enabled or disabled lexically. Unless code at a given scope declares that it wants the new behavior, that new behavior should be disabled. Which backward-incompatible changes are controlled implicitly by a 'use v5.x.y' is a decision which should be made by the pumpking in consultation with the community.
When a backward-incompatible change can't be toggled lexically, the decision to change the language must be considered very, very carefully. If it's possible to move the old syntax or semantics out of the core language and into XS-land, that XS module should be enabled by default unless the user declares that they want a newer revision of Perl.
Historically, we've held ourselves to a far higher standard than backward-compatibility -- bugward-compatibility. Any accident of implementation or unintentional side-effect of running some bit of code has been considered to be a feature of the language to be defended with the same zeal as any other feature or functionality. No matter how frustrating these unintentional features may be to us as we continue to improve Perl, these unintentional features often deserve our protection. It is very important that existing software written in Perl continue to work correctly. If end-user developers have adopted a bug as a feature, we need to treat it as such.
New syntax and semantics which don't break existing language constructs and syntax have a much lower bar. They merely need to prove themselves to be useful, elegant, well designed, and well tested.
To make sure we're talking about the same thing when we discuss the removal of features or functionality from the Perl core, we have specific definitions for a few words and phrases.
If something in the Perl core is marked as experimental, we may change its behaviour, deprecate or remove it without notice. While we'll always do our best to smooth the transition path for users of experimental features, you should contact the perl5-porters mailing list if you find an experimental feature useful and want to help shape its future.
If something in the Perl core is marked as deprecated, we may remove it from the core in the next stable release series, though we may not. As of Perl 5.12, deprecated features and modules warn the user as they're used. When a module is deprecated, it will also be made available on CPAN. Installing it from CPAN will silence deprecation warnings for that module.
If you use a deprecated feature or module and believe that its removal from the Perl core would be a mistake, please contact the perl5-porters mailing list and plead your case. We don't deprecate things without a good reason, but sometimes there's a counterargument we haven't considered. Historically, we did not distinguish between "deprecated" and "discouraged" features.
From time to time, we may mark language constructs and features which we consider to have been mistakes as discouraged. Discouraged features aren't candidates for removal in the next major release series, but we may later deprecate them if they're found to stand in the way of a significant improvement to the Perl core.
Once a feature, construct or module has been marked as deprecated for a stable release cycle, we may remove it from the Perl core. Unsurprisingly, we say we've removed these things. When a module is removed, it will no longer ship with Perl, but will continue to be available on CPAN.
New releases of maint should contain as few changes as possible. If there is any question about whether a given patch might merit inclusion in a maint release, then it almost certainly should not be included.
Portability fixes, such as changes to Configure and the files in hints/ are acceptable. Ports of Perl to a new platform, architecture or OS release that involve changes to the implementation are NOT acceptable.
Acceptable documentation updates are those that correct factual errors, explain significant bugs or deficiencies in the current implementation, or fix broken markup.
Patches that add new warnings or errors or deprecate features are not acceptable.
Patches that fix crashing bugs that do not otherwise change Perl's functionality or negatively impact performance are acceptable.
Patches that fix CVEs or security issues are acceptable, but should be run through the perl5-security-report@perl.org mailing list rather than applied directly.
Patches that fix regressions in perl's behavior relative to previous releases are acceptable.
Updates to dual-life modules should consist of minimal patches to fix crashing or security issues (as above).
Minimal patches that fix platform-specific test failures or installation issues are acceptable. When these changes are made to dual-life modules for which CPAN is canonical, any changes should be coordinated with the upstream author.
New versions of dual-life modules should NOT be imported into maint. Those belong in the next stable series.
Patches that add or remove features are not acceptable.
Patches that break binary compatibility are not acceptable. (Please talk to a pumpking.)
Historically, only the pumpking cherry-picked changes from bleadperl into maintperl. This has scaling problems. At the same time, maintenance branches of stable versions of Perl need to be treated with great care. To that end, as of Perl 5.12, we have a new process for maint branches.
Any committer may cherry-pick any commit from blead to a maint branch if they send mail to perl5-porters announcing their intent to cherry-pick a specific commit along with a rationale for doing so and at least two other committers respond to the list giving their assent. (This policy applies to current and former pumpkings, as well as other committers.)
What follows is a statement about artistic control, defined as the ability of authors of packages to guide the future of their code and maintain control over their work. It is a recognition that authors should have control over their work, and that it is a responsibility of the rest of the Perl community to ensure that they retain this control. It is an attempt to document the standards to which we, as Perl developers, intend to hold ourselves. It is an attempt to write down rough guidelines about the respect we owe each other as Perl developers.
This statement is not a legal contract. This statement is not a legal document in any way, shape, or form. Perl is distributed under the GNU Public License and under the Artistic License; those are the precise legal terms. This statement isn't about the law or licenses. It's about community, mutual respect, trust, and good-faith cooperation.
We recognize that the Perl core, defined as the software distributed with the heart of Perl itself, is a joint project on the part of all of us. From time to time, a script, module, or set of modules (hereafter referred to simply as a "module") will prove so widely useful and/or so integral to the correct functioning of Perl itself that it should be distributed with the Perl core. This should never be done without the author's explicit consent, and a clear recognition on all parts that this means the module is being distributed under the same terms as Perl itself. A module author should realize that inclusion of a module into the Perl core will necessarily mean some loss of control over it, since changes may occasionally have to be made on short notice or for consistency with the rest of Perl.
Once a module has been included in the Perl core, however, everyone involved in maintaining Perl should be aware that the module is still the property of the original author unless the original author explicitly gives up their ownership of it. In particular:
The version of the module in the Perl core should still be considered the work of the original author. All patches, bug reports, and so forth should be fed back to them. Their development directions should be respected whenever possible.
Patches may be applied by the pumpkin holder without the explicit cooperation of the module author if and only if they are very minor, time-critical in some fashion (such as urgent security fixes), or if the module author cannot be reached. Those patches must still be given back to the author when possible, and if the author decides on an alternate fix in their version, that fix should be strongly preferred unless there is a serious problem with it. Any changes not endorsed by the author should be marked as such, and the contributor of the change acknowledged.
The version of the module distributed with Perl should, whenever possible, be the latest version of the module as distributed by the author (the latest non-beta version in the case of public Perl releases), although the pumpkin holder may hold off on upgrading the version of the module distributed with Perl to the latest version until the latest version has had sufficient testing.
In other words, the author of a module should be considered to have final say on modifications to their module whenever possible (bearing in mind that it's expected that everyone involved will work together and arrive at reasonable compromises when there are disagreements).
As a last resort, however:
If the author's vision of the future of their module is sufficiently different from the vision of the pumpkin holder and perl5-porters as a whole so as to cause serious problems for Perl, the pumpkin holder may choose to formally fork the version of the module in the Perl core from the one maintained by the author. This should not be done lightly and should always if at all possible be done only after direct input from Larry. If this is done, it must then be made explicit in the module as distributed with the Perl core that it is a forked version and that while it is based on the original author's work, it is no longer maintained by them. This must be noted in both the documentation and in the comments in the source of the module.
Again, this should be a last resort only. Ideally, this should never happen, and every possible effort at cooperation and compromise should be made before doing this. If it does prove necessary to fork a module for the overall health of Perl, proper credit must be given to the original author in perpetuity and the decision should be constantly re-evaluated to see if a remerging of the two branches is possible down the road.
In all dealings with contributed modules, everyone maintaining Perl should keep in mind that the code belongs to the original author, that they may not be on perl5-porters at any given time, and that a patch is not official unless it has been integrated into the author's copy of the module. To aid with this, and with points #1, #2, and #3 above, contact information for the authors of all contributed modules should be kept with the Perl distribution.
Finally, the Perl community as a whole recognizes that respect for ownership of code, respect for artistic control, proper credit, and active effort to prevent unintentional code skew or communication gaps is vital to the health of the community and Perl itself. Members of a community should not normally have to resort to rules and laws to deal with each other, and this document, although it contains rules so as to be clear, is about an attitude and general approach. The first step in any dispute should be open communication, respect for opposing views, and an attempt at a compromise. In nearly every circumstance nothing more will be necessary, and certainly no more drastic measure should be used until every avenue of communication and discussion has failed.
Perl's documentation is an important resource for our users. It's incredibly important for Perl's documentation to be reasonably coherent and to accurately reflect the current implementation.
Just as P5P collectively maintains the codebase, we collectively maintain the documentation. Writing a particular bit of documentation doesn't give an author control of the future of that documentation. At the same time, just as source code changes should match the style of their surrounding blocks, so should documentation changes.
Examples in documentation should be illustrative of the concept they're explaining. Sometimes, the best way to show how a language feature works is with a small program the reader can run without modification. More often, examples will consist of a snippet of code containing only the "important" bits. The definition of "important" varies from snippet to snippet. Sometimes it's important to declare use strict and use warnings, initialize all variables, and fully catch every error condition. More often than not, though, those things obscure the lesson the example was intended to teach.
As Perl is developed by a global team of volunteers, our documentation often contains spellings which look funny to somebody. Choice of American/British/Other spellings is left as an exercise for the author of each bit of documentation. When patching documentation, try to emulate the documentation around you, rather than changing the existing prose.
In general, documentation should describe what Perl does "now" rather than what it used to do. It's perfectly reasonable to include notes in documentation about how behaviour has changed from previous releases, but, with very few exceptions, documentation isn't "dual-life" -- it doesn't need to fully describe how all old versions used to work.
"Social Contract about Contributed Modules" originally by Russ Allbery <rra@stanford.edu> and the perl5-porters.
perlport - Writing portable Perl
Perl runs on numerous operating systems. While most of them share much in common, they also have their own unique features.
This document is meant to help you to find out what constitutes portable Perl code. That way once you make a decision to write portably, you know where the lines are drawn, and you can stay within them.
There is a tradeoff between taking full advantage of one particular type of computer and taking advantage of a full range of them. Naturally, as you broaden your range and become more diverse, the common factors drop, and you are left with an increasingly smaller area of common ground in which you can operate to accomplish a particular task. Thus, when you begin attacking a problem, it is important to consider under which part of the tradeoff curve you want to operate. Specifically, you must decide whether it is important that the task that you are coding have the full generality of being portable, or whether to just get the job done right now. This is the hardest choice to be made. The rest is easy, because Perl provides many choices, whichever way you want to approach your problem.
Looking at it another way, writing portable code is usually about willfully limiting your available choices. Naturally, it takes discipline and sacrifice to do that. The product of portability and convenience may be a constant. You have been warned.
Be aware of two important points:
There is no reason you should not use Perl as a language to glue Unix tools together, or to prototype a Macintosh application, or to manage the Windows registry. If it makes no sense to aim for portability for one reason or another in a given program, then don't bother.
Don't be fooled into thinking that it is hard to create portable Perl code. It isn't. Perl tries its level-best to bridge the gaps between what's available on different platforms, and all the means available to use those features. Thus almost all Perl code runs on any machine without modification. But there are some significant issues in writing portable code, and this document is entirely about those issues.
Here's the general rule: When you approach a task commonly done using a whole range of platforms, think about writing portable code. That way, you don't sacrifice much by way of the implementation choices you can avail yourself of, and at the same time you can give your users lots of platform choices. On the other hand, when you have to take advantage of some unique feature of a particular platform, as is often the case with systems programming (whether for Unix, Windows, VMS, etc.), consider writing platform-specific code.
When the code will run on only two or three operating systems, you may need to consider only the differences of those particular systems. The important thing is to decide where the code will run and to be deliberate in your decision.
The material below is separated into three main sections: main issues of portability (ISSUES), platform-specific issues (PLATFORMS), and built-in perl functions that behave differently on various ports (FUNCTION IMPLEMENTATIONS).
This information should not be considered complete; it includes possibly transient information about idiosyncrasies of some of the ports, almost all of which are in a state of constant evolution. Thus, this material should be considered a perpetual work in progress.
In most operating systems, lines in files are terminated by newlines. Just what is used as a newline may vary from OS to OS. Unix traditionally uses \012, one type of DOSish I/O uses \015\012, and Mac OS uses \015.

Perl uses \n to represent the "logical" newline, where what is logical may depend on the platform in use. In MacPerl, \n always means \015. In DOSish perls, \n usually means \012, but when accessing a file in "text" mode, perl uses the :crlf layer that translates it to (or from) \015\012, depending on whether you're reading or writing. Unix does the same thing on ttys in canonical mode. \015\012 is commonly referred to as CRLF.
To trim trailing newlines from text lines use chomp(). With default settings that function looks for a trailing \n character and thus trims in a portable way.
When dealing with binary files (or text files in binary mode) be sure to explicitly set $/ to the appropriate value for your file format before using chomp().
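As a minimal sketch of the advice above (the NUL record terminator here is purely illustrative), chomp() removes whatever $/ currently holds, so adjusting $/ adjusts what gets trimmed:

```perl
use strict;
use warnings;

# chomp() removes whatever $/ currently holds from the end of a string,
# so with the default settings it trims "\n" portably.
my $line = "some text\n";
chomp $line;
die "default chomp failed" unless $line eq "some text";

# For a record-oriented binary format, set $/ (locally) to the record
# terminator before chomping; the NUL terminator here is an assumption
# for illustration, not a real file format.
my $record = "payload\0";
{
    local $/ = "\0";
    chomp $record;
}
die "record chomp failed" unless $record eq "payload";
print "ok\n";
```

Using local keeps the change to $/ scoped, so surrounding line-oriented code is unaffected.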
Because of the "text" mode translation, DOSish perls have limitations in using seek and tell on a file accessed in "text" mode. Stick to seek-ing to locations you got from tell (and no others), and you are usually free to use seek and tell even in "text" mode. Using seek or tell or other file operations may be non-portable. If you use binmode on a file, however, you can usually seek and tell with arbitrary values in safety.
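A small sketch of the safe pattern, using a temporary file so it is self-contained:

```perl
use strict;
use warnings;
use File::Temp qw(tempfile);

# On DOSish perls, "text" mode makes values returned by tell() the only
# safe targets for seek(); with binmode(), offsets are byte-exact on
# every platform, so arbitrary seeks are fine.
my ($out, $file) = tempfile(UNLINK => 1);
binmode $out;
print {$out} "alpha\nbeta\n";
close $out or die "close: $!";

open my $in, '<', $file or die "open: $!";
binmode $in;
my $first = <$in>;            # read one line
my $pos   = tell $in;         # remember a position obtained from tell()
my $rest  = <$in>;
seek $in, $pos, 0;            # rewinding to a tell()ed position is safe
my $again = <$in>;
die "seek/tell mismatch" unless $again eq $rest;
close $in;
print "ok\n";
```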
A common misconception in socket programming is that \n eq \012 everywhere. When using protocols such as common Internet protocols, \012 and \015 are called for specifically, and the values of the logical \n and \r (carriage return) are not reliable.

However, using \015\012 (or \cM\cJ, or \x0D\x0A) can be tedious and unsightly, as well as confusing to those maintaining the code. As such, the Socket module supplies the Right Thing for those who want it.
When reading from a socket, remember that the default input record separator $/ is \n, but robust socket code will recognize either \012 or \015\012 as end of line:
- while (<SOCKET>) {
- # ...
- }
Because both CRLF and LF end in LF, the input record separator can be set to LF and any CR stripped later. Better to write:
- while (<SOCKET>) {
-     s/\015?\012/\n/;
-     # process $_
- }
This example is preferred over the previous one--even for Unix platforms--because now any \015's (\cM's) are stripped out (and there was much rejoicing).
Similarly, functions that return text data--such as a function that fetches a web page--should sometimes translate newlines before returning the data, if they've not yet been translated to the local newline representation. A single line of code will often suffice:
- $data =~ s/\015?\012/\n/g;
- return $data;
Some of this may be confusing. Here's a handy reference to the ASCII CR and LF characters. You can print it out and stick it in your wallet.
- LF eq \012 eq \x0A eq \cJ eq chr(10) eq ASCII 10
- CR eq \015 eq \x0D eq \cM eq chr(13) eq ASCII 13
- | Unix | DOS | Mac |
- ---------------------------
- \n | LF | LF | CR |
- \r | CR | CR | LF |
- \n * | LF | CRLF | CR |
- \r * | CR | CR | LF |
- ---------------------------
- * text-mode STDIO
The Unix column assumes that you are not accessing a serial line (like a tty) in canonical mode. If you are, then CR on input becomes "\n", and "\n" on output becomes CRLF.
These are just the most common definitions of \n and \r in Perl. There may well be others. For example, on an EBCDIC implementation such as z/OS (OS/390) or OS/400 (using the ILE; the PASE is ASCII-based), the above material is similar to "Unix" but the code numbers change:
- LF eq \025 eq \x15 eq \cU eq chr(21) eq CP-1047 21
- LF eq \045 eq \x25 eq chr(37) eq CP-0037 37
- CR eq \015 eq \x0D eq \cM eq chr(13) eq CP-1047 13
- CR eq \015 eq \x0D eq \cM eq chr(13) eq CP-0037 13
- | z/OS | OS/400 |
- ----------------------
- \n | LF | LF |
- \r | CR | CR |
- \n * | LF | LF |
- \r * | CR | CR |
- ----------------------
- * text-mode STDIO
Different CPUs store integers and floating point numbers in different orders (called endianness) and widths (32-bit and 64-bit being the most common today). This affects your programs when they attempt to transfer numbers in binary format from one CPU architecture to another, usually either "live" via network connection, or by storing the numbers to secondary storage such as a disk file or tape.
Conflicting storage orders make an utter mess out of the numbers. If a little-endian host (Intel, VAX) stores 0x12345678 (305419896 in decimal), a big-endian host (Motorola, Sparc, PA) reads it as 0x78563412 (2018915346 in decimal). Alpha and MIPS can be either: Digital/Compaq used/uses them in little-endian mode; SGI/Cray uses them in big-endian mode. To avoid this problem in network (socket) connections use the pack and unpack formats n and N, the "network" orders. These are guaranteed to be portable.
As of perl 5.10.0, you can also use the > and < modifiers to force big- or little-endian byte-order. This is useful if you want to store signed integers or 64-bit integers, for example.
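A short sketch of both mechanisms, the "network" templates and the explicit byte-order modifiers:

```perl
use strict;
use warnings;

# pack "N"/"n" always produce big-endian ("network") byte order,
# regardless of the host CPU, so the result is portable on the wire.
my $net = pack "N", 0x12345678;
die "network order wrong" unless $net eq "\x12\x34\x56\x78";

# As of perl 5.10.0, the > and < modifiers force an explicit order for
# other templates, e.g. a little-endian unsigned 32-bit integer:
my $le = pack "L<", 0x12345678;
die "little-endian wrong" unless $le eq "\x78\x56\x34\x12";

# Round-trip: unpacking with the same template recovers the number on
# any architecture.
die "round trip failed" unless unpack("N", $net) == 0x12345678;
print "ok\n";
```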
You can explore the endianness of your platform by unpacking a data structure packed in native format such as:
- print unpack("h*", pack("s2", 1, 2)), "\n";
- # '10002000' on e.g. Intel x86 or Alpha 21064 in little-endian mode
- # '00100020' on e.g. Motorola 68040
If you need to distinguish between endian architectures you could use either of the variables set like so:
- $is_big_endian    = unpack("h*", pack("s", 1)) =~ /01/;
- $is_little_endian = unpack("h*", pack("s", 1)) =~ /^1/;
Differing widths can cause truncation even between platforms of equal endianness. The platform of shorter width loses the upper parts of the number. There is no good solution for this problem except to avoid transferring or storing raw binary numbers.
One can circumnavigate both these problems in two ways. Either transfer and store numbers always in text format, instead of raw binary, or else consider using modules like Data::Dumper and Storable (included as of perl 5.8). Keeping all data as text significantly simplifies matters.
The v-strings are portable only up to v2147483647 (0x7FFFFFFF), that's how far EBCDIC, or more precisely UTF-EBCDIC will go.
Most platforms these days structure files in a hierarchical fashion. So, it is reasonably safe to assume that all platforms support the notion of a "path" to uniquely identify a file on the system. How that path is really written, though, differs considerably.
Although similar, file path specifications differ between Unix, Windows, Mac OS, OS/2, VMS, VOS, RISC OS, and probably others. Unix, for example, is one of the few OSes that has the elegant idea of a single root directory.
DOS, OS/2, VMS, VOS, and Windows can work similarly to Unix with / as path separator, or in their own idiosyncratic ways (such as having several root directories and various "unrooted" device files such as NIL: and LPT:).

Mac OS 9 and earlier used : as a path separator instead of /.
The filesystem may support neither hard links (link) nor symbolic links (symlink, readlink, lstat).
The filesystem may support neither access timestamp nor change timestamp (meaning that about the only portable timestamp is the modification timestamp), or one second granularity of any timestamps (e.g. the FAT filesystem limits the time granularity to two seconds).
The "inode change timestamp" (the -C filetest) may really be the "creation timestamp" (which it is not in Unix).
VOS perl can emulate Unix filenames with / as path separator. The native pathname characters greater-than, less-than, number-sign, and percent-sign are always accepted.

RISC OS perl can emulate Unix filenames with / as path separator, or go native and use . for path separator and : to signal filesystems and disk names.
Don't assume Unix filesystem access semantics: that read, write, and execute are all the permissions there are, and even if they exist, that their semantics (for example what do r, w, and x mean on a directory) are the Unix ones. The various Unix/POSIX compatibility layers usually try to make interfaces like chmod() work, but sometimes there simply is no good mapping.
If all this is intimidating, have no (well, maybe only a little) fear. There are modules that can help. The File::Spec modules provide methods to do the Right Thing on whatever platform happens to be running the program.
File::Spec is available in the standard distribution as of version 5.004_05. File::Spec::Functions is only in File::Spec 0.7 and later, and some versions of perl come with version 0.6. If File::Spec is not updated to 0.7 or later, you must use the object-oriented interface from File::Spec (or upgrade File::Spec).
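A minimal sketch of the File::Spec approach (the path components below are made up for illustration):

```perl
use strict;
use warnings;
use File::Spec;

# Build and take apart paths with File::Spec instead of hardcoding '/';
# the same code then works on Unix, Windows, VMS, and other ports.
my $path = File::Spec->catfile('t', 'data', 'input.txt');

# splitpath() returns (volume, directories, filename) using the
# current platform's conventions.
my ($vol, $dirs, $file) = File::Spec->splitpath($path);
die "wrong basename" unless $file eq 'input.txt';

# catdir() does the same joining for directory-only paths.
my $dir = File::Spec->catdir('t', 'data');
die "file not under dir" unless index($path, $dir) == 0;
print "ok\n";
```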
In general, production code should not have file paths hardcoded. Making them user-supplied or read from a configuration file is better, keeping in mind that file path syntax varies on different machines.
This is especially noticeable in scripts like Makefiles and test suites, which often assume / as a path separator for subdirectories.
Also of use is File::Basename from the standard distribution, which splits a pathname into pieces (base filename, full path to directory, and file suffix).
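As a sketch of that split (the archive path is illustrative), fileparse() takes optional suffix patterns matched against the end of the name:

```perl
use strict;
use warnings;
use File::Basename qw(fileparse);

# fileparse() splits a path into (basename, directory, suffix); suffix
# patterns are regexes anchored at the end of the filename.
my ($name, $dir, $suffix) =
    fileparse('/tmp/archive.tar.gz', qr/\.tar\.gz/);

die "bad basename"  unless $name   eq 'archive';
die "bad directory" unless $dir    eq '/tmp/';
die "bad suffix"    unless $suffix eq '.tar.gz';
print "ok\n";
```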
Even when on a single platform (if you can call Unix a single platform), remember not to count on the existence or the contents of particular system-specific files or directories, like /etc/passwd, /etc/sendmail.conf, /etc/resolv.conf, or even /tmp/. For example, /etc/passwd may exist but not contain the encrypted passwords, because the system is using some form of enhanced security. Or it may not contain all the accounts, because the system is using NIS. If code does need to rely on such a file, include a description of the file and its format in the code's documentation, then make it easy for the user to override the default location of the file.
Don't assume a text file will end with a newline. They should, but people forget.
Do not have two files or directories of the same name with different case, like test.pl and Test.pl, as many platforms have case-insensitive (or at least case-forgiving) filenames. Also, try not to have non-word characters (except for .) in the names, and keep them to the 8.3 convention, for maximum portability, onerous a burden though this may appear.
Likewise, when using the AutoSplit module, try to keep your functions to 8.3 naming and case-insensitive conventions; or, at the least, make it so the resulting files have a unique (case-insensitively) first 8 characters.
Whitespace in filenames is tolerated on most systems, but not all, and even on systems where it might be tolerated, some utilities might become confused by such whitespace.
Many systems (DOS, VMS ODS-2) cannot have more than one . in their filenames.

Don't assume > won't be the first character of a filename. Always use < explicitly to open a file for reading, or even better, use the three-arg version of open, unless you want the user to be able to specify a pipe open.
- open my $fh, '<', $existing_file or die $!;
If filenames might use strange characters, it is safest to open them with sysopen instead of open. open is magic and can translate characters like >, <, and |, which may be the wrong thing to do. (Sometimes, though, it's the right thing.) Three-arg open can also help protect against this translation in cases where it is undesirable.
Don't use : as a part of a filename since many systems use that for their own semantics (Mac OS Classic for separating pathname components, many networking schemes and utilities for separating the nodename and the pathname, and so on). For the same reasons, avoid @, ;, and |.

Don't assume that in pathnames you can collapse two leading slashes // into one: some networking and clustering filesystems have special semantics for that. Let the operating system sort it out.
The portable filename characters as defined by ANSI C are
- a b c d e f g h i j k l m n o p q r s t u v w x y z
- A B C D E F G H I J K L M N O P Q R S T U V W X Y Z
- 0 1 2 3 4 5 6 7 8 9
- . _ -
and the "-" shouldn't be the first character. If you want to be hypercorrect, stay case-insensitive and within the 8.3 naming convention (all the files and directories have to be unique within one directory if their names are lowercased and truncated to eight characters before the ., if any, and to three characters after the ., if any). (And do not use .s in directory names.)
Not all platforms provide a command line. These are usually platforms that rely primarily on a Graphical User Interface (GUI) for user interaction. A program requiring a command line interface might not work everywhere. This is probably for the user of the program to deal with, so don't stay up late worrying about it.
Some platforms can't delete or rename files held open by the system; this limitation may also apply to changing filesystem metainformation like file permissions or owners. Remember to close files when you are done with them. Don't unlink or rename an open file. Don't tie or open a file already tied or opened; untie or close it first.
Don't open the same file more than once at a time for writing, as some operating systems put mandatory locks on such files.
Don't assume that write/modify permission on a directory gives the right to add or delete files/directories in that directory. That is filesystem specific: in some filesystems you need write/modify permission also (or even just) in the file/directory itself. In some filesystems (AFS, DFS) the permission to add/delete directory entries is a completely separate permission.
Don't assume that a single unlink completely gets rid of the file: some filesystems (most notably the ones in VMS) are versioned, and unlink() removes only the most recent version (it doesn't remove all the versions because by default the native tools on those platforms remove just the most recent version, too). The portable idiom to remove all the versions of a file is
- 1 while unlink "file";
This will terminate if the file is undeleteable for some reason (protected, not there, and so on).
Don't count on a specific environment variable existing in %ENV. Don't count on %ENV entries being case-sensitive, or even case-preserving.

Don't try to clear %ENV by saying %ENV = (), or, if you really have to, make it conditional on $^O ne 'VMS', since in VMS the %ENV table is much more than a per-process key-value string table.
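A minimal sketch of the guarded reset, keeping only the variables the program actually needs (the PATH-only whitelist is an illustrative choice):

```perl
use strict;
use warnings;

# Clearing %ENV wholesale is unsafe on VMS, where the hash fronts far
# more than a per-process string table, so guard the reset on $^O.
my $cleared = 0;
if ($^O ne 'VMS') {
    # 'local' scopes the change; the original environment is restored
    # when the block exits.
    local %ENV = (PATH => $ENV{PATH} // '');
    $cleared = (keys(%ENV) == 1);
}
die "environment not reset" unless $cleared || $^O eq 'VMS';
print "ok\n";
```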
On VMS, some entries in the %ENV hash are dynamically created when their key is used on a read if they did not previously exist. The values for $ENV{HOME}, $ENV{TERM}, $ENV{PATH}, and $ENV{USER} are known to be dynamically generated. The specific names that are dynamically generated may vary with the version of the C library on VMS, and more may exist than are documented.
On VMS by default, changes to the %ENV hash are persistent after the process exits. This can cause unintended issues.
Don't count on signals or %SIG for anything.

Don't count on filename globbing. Use opendir, readdir, and closedir instead.
Don't count on per-program environment variables, or per-program current directories.
Don't count on specific values of $!, neither numeric nor
especially the string values. Users may switch their locales, causing
error messages to be translated into their languages. If you can
trust a POSIXish environment, you can portably use the symbols defined
by the Errno module, like ENOENT. And don't trust the values of $!
at all except immediately after a failed system call.
Don't assume that the name used to invoke a command or program with
system or exec can also be used to test for the existence of the
file that holds the executable code for that command or program.
First, many systems have "internal" commands that are built-in to the
shell or OS and while these commands can be invoked, there is no
corresponding file. Second, some operating systems (e.g., Cygwin,
DJGPP, OS/2, and VOS) have required suffixes for executable files;
these suffixes are generally permitted on the command name but are not
required. Thus, a command like "perl" might exist in a file named
"perl", "perl.exe", or "perl.pm", depending on the operating system.
The variable "_exe" in the Config module holds the executable suffix,
if any. Third, the VMS port carefully sets up $^X and
$Config{perlpath} so that no further processing is required. This is
just as well, because the matching regular expression used below would
then have to deal with a possible trailing version number in the VMS
file name.
To convert $^X to a file pathname, taking account of the requirements of the various operating system possibilities, say:
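The code block for this conversion did not survive formatting; a sketch of the idiom, using the $Config{_exe} suffix described above:

```perl
use Config;

# Build up a file name (not a command name) from $^X.
my $this_perl = $^X;
if ($^O ne 'VMS') {
    # Append the executable suffix (e.g. ".exe") unless already present;
    # on VMS, $^X is already a complete file pathname.
    $this_perl .= $Config{_exe}
        unless $this_perl =~ m/$Config{_exe}$/i;
}
```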
To convert $Config{perlpath} to a file pathname, say:
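The corresponding block for $Config{perlpath} was also lost; a sketch following the same pattern (the variable name is illustrative):

```perl
use Config;

# Build a file pathname from the configured perl path.
my $secure_perl_path = $Config{perlpath};
if ($^O ne 'VMS') {
    # As above: add the platform's executable suffix when needed.
    $secure_perl_path .= $Config{_exe}
        unless $secure_perl_path =~ m/$Config{_exe}$/i;
}
```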
Don't assume that you can reach the public Internet.
Don't assume that there is only one way to get through firewalls to the public Internet.
Don't assume that you can reach outside world through any other port than 80, or some web proxy. ftp is blocked by many firewalls.
Don't assume that you can send email by connecting to the local SMTP port.
Don't assume that you can reach yourself or any node by the name 'localhost'. The same goes for '127.0.0.1'. You will have to try both.
Don't assume that the host has only one network card, or that it can't bind to many virtual IP addresses.
Don't assume a particular network device name.
Don't assume a particular set of ioctl()s will work.
Don't assume that you can ping hosts and get replies.
Don't assume that any particular port (service) will respond.
Don't assume that Sys::Hostname (or any other API or command) returns either a fully qualified hostname or a non-qualified hostname: it all depends on how the system had been configured. Also remember that for things such as DHCP and NAT, the hostname you get back might not be very useful.
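As a concrete illustration of the hostname caveat, Sys::Hostname ships with perl; whether it returns a fully qualified name is entirely system-dependent:

```perl
use Sys::Hostname;

# May be "foo", "foo.example.com", or something else entirely,
# depending on how the system is configured.
my $host = hostname();
```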
All the above "don'ts" may look daunting, and they are, but the key is to degrade gracefully if one cannot reach the particular network service one wants. Croaking or hanging do not look very professional.
In general, don't directly access the system in code meant to be
portable. That means, no system, exec, fork, pipe,
``, qx//, open with a |, nor any of the other things
that make being a perl hacker worth being.
Commands that launch external processes are generally supported on most platforms (though many of them do not support any type of forking). The problem with using them arises from what you invoke them on. External tools are often named differently on different platforms, may not be available in the same location, might accept different arguments, can behave differently, and often present their results in a platform-dependent way. Thus, you should seldom depend on them to produce consistent results. (Then again, if you're calling netstat -a, you probably don't expect it to run on both Unix and CP/M.)
One especially common bit of Perl code is opening a pipe to sendmail:
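The pipe-open itself was lost in formatting; a sketch of the classic Unix-only idiom, wrapped in a subroutine here (the /usr/lib/sendmail path is an assumption and varies between systems):

```perl
# Unix-only: relies on fork() and on a local sendmail binary.
sub send_via_sendmail {
    my ($to, $subject, $body) = @_;
    open(my $mail, '|-', '/usr/lib/sendmail -oi -t')
        or die "cannot fork sendmail: $!";
    print $mail "To: $to\nSubject: $subject\n\n$body\n";
    close $mail or die "sendmail failed: exit status $?";
}
```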
This is fine for systems programming when sendmail is known to be available. But it is not fine for many non-Unix systems, and even some Unix systems that may not have sendmail installed. If a portable solution is needed, see the various distributions on CPAN that deal with it. Mail::Mailer and Mail::Send in the MailTools distribution are commonly used, and provide several mailing methods, including mail, sendmail, and direct SMTP (via Net::SMTP) if a mail transfer agent is not available. Mail::Sendmail is a standalone module that provides simple, platform-independent mailing.
The Unix System V IPC (msg*(), sem*(), shm*()) is not available
even on all Unix platforms.
Do not use either the bare result of pack("N", 10, 20, 30, 40) or
bare v-strings (such as v10.20.30.40) to represent IPv4 addresses:
both forms just pack the four bytes into network order. That this
would be equal to the C language in_addr struct (which is what the
socket code internally uses) is not guaranteed. To be portable use
the routines of the Socket extension, such as inet_aton(),
inet_ntoa(), and sockaddr_in().
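For example, building an IPv4 socket address portably with the core Socket module:

```perl
use Socket qw(inet_aton inet_ntoa sockaddr_in);

my $packed_ip = inet_aton('10.20.30.40');     # 4 bytes in network order
my $sockaddr  = sockaddr_in(80, $packed_ip);  # packed sockaddr_in, port 80
my $readable  = inet_ntoa($packed_ip);        # back to "10.20.30.40"
```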
The rule of thumb for portable code is: Do it all in portable Perl, or use a module (that may internally implement it with platform-specific code, but expose a common interface).
XS code can usually be made to work with any platform, but dependent libraries, header files, etc., might not be readily available or portable, or the XS code itself might be platform-specific, just as Perl code might be. If the libraries and headers are portable, then it is normally reasonable to make sure the XS code is portable, too.
A different type of portability issue arises when writing XS code: availability of a C compiler on the end-user's system. C brings with it its own portability issues, and writing XS code will expose you to some of those. Writing purely in Perl is an easier way to achieve portability.
In general, the standard modules work across platforms. Notable exceptions are the CPAN module (which currently makes connections to external programs that may not be available), platform-specific modules (like ExtUtils::MM_VMS), and DBM modules.
There is no one DBM module available on all platforms. SDBM_File and the others are generally available on all Unix and DOSish ports, but not in MacPerl, where only NDBM_File and DB_File are available.
The good news is that at least some DBM module should be available, and AnyDBM_File will use whichever module it can find. Of course, then the code needs to be fairly strict, dropping to the greatest common factor (e.g., not exceeding 1K for each record), so that it will work with any DBM module. See AnyDBM_File for more details.
The system's notion of time of day and calendar date is controlled in
widely different ways. Don't assume the timezone is stored in $ENV{TZ},
and even if it is, don't assume that you can control the timezone through
that variable. Don't assume anything about the three-letter timezone
abbreviations (for example, that MST means Mountain Standard Time;
it has been known to stand for Moscow Standard Time). If you need to
use timezones, express them in some unambiguous format like the
exact number of minutes offset from UTC, or the POSIX timezone
format.
Don't assume that the epoch starts at 00:00:00, January 1, 1970,
because that is OS- and implementation-specific. It is better to
store a date in an unambiguous representation. The ISO 8601 standard
defines YYYY-MM-DD as the date format, or YYYY-MM-DDTHH:MM:SS
(that's a literal "T" separating the date from the time).
Please do use the ISO 8601 instead of making us guess what
date 02/03/04 might be. ISO 8601 even sorts nicely as-is.
A text representation (like "1987-12-18") can be easily converted
into an OS-specific value using a module like Date::Parse.
An array of values, such as those returned by localtime, can be
converted to an OS-specific representation using Time::Local.
When calculating specific times, such as for tests in time or date modules, it may be appropriate to calculate an offset for the epoch.
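The calculation the text refers to can be sketched with the core Time::Local module:

```perl
require Time::Local;

# Seconds between this system's epoch and the Unix epoch:
# 0 on Unix, a large positive number on Mac OS Classic.
my $offset = Time::Local::timegm(0, 0, 0, 1, 0, 1970);
```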
The value for $offset in Unix will be 0, but in Mac OS Classic
will be some large number. $offset can then be added to a Unix time
value to get what should be the proper value on any system.
Assume very little about character sets.
Assume nothing about numerical values (ord, chr) of characters.
Do not use explicit code point ranges (like \xHH-\xHH); use for
example symbolic character classes like [:print:].
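A small illustration: a POSIX character class keeps a test portable across character sets, where an explicit byte range would silently assume ASCII layout:

```perl
my $string = "abc\x01";

# Portable: symbolic class, no assumption about code point layout.
my $printable_count = () = $string =~ /[[:print:]]/g;   # counts a, b, c

# Non-portable alternative: /[\x20-\x7e]/ assumes ASCII ordering.
```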
Do not assume that the alphabetic characters are encoded contiguously (in the numeric sense). There may be gaps.
Do not assume anything about the ordering of the characters. The lowercase letters may come before or after the uppercase letters; the lowercase and uppercase may be interlaced so that both "a" and "A" come before "b"; the accented and other international characters may be interlaced so that ä comes before "b".
If you may assume POSIX (a rather large assumption), you may read more about the POSIX locale system from perllocale. The locale system at least attempts to make things a little bit more portable, or at least more convenient and native-friendly for non-English users. The system affects character sets and encoding, and date and time formatting--amongst other things.
If you really want to be international, you should consider Unicode. See perluniintro and perlunicode for more information.
If you want to use non-ASCII bytes (outside the bytes 0x00..0x7f) in
the "source code" of your code, to be portable you have to be explicit
about what bytes they are. Someone might for example be using your
code under a UTF-8 locale, in which case random native bytes might be
illegal ("Malformed UTF-8 ..."). This means that for example embedding
ISO 8859-1 bytes beyond 0x7f into your strings might cause trouble
later. If the bytes are native 8-bit bytes, you can use the bytes
pragma. If the bytes are in a string (a regular expression being a
curious kind of string), you can often also use the \xHH notation
instead of embedding the bytes as-is. If you want to write your code
in UTF-8, you can use the utf8 pragma.
If your code is destined for systems with severely constrained (or missing!) virtual memory systems then you want to be especially mindful of avoiding wasteful constructs such as:
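The example block here did not survive formatting; the following self-contained sketch reconstructs the string-building pair that the next paragraph discusses (File::Temp is used only to make the filehandles real):

```perl
use File::Temp qw(tempfile);

# A small demo file so the filehandles below are real.
my ($tmp, $name) = tempfile(UNLINK => 1);
print {$tmp} "line $_\n" for 1 .. 5;
close $tmp;

open(my $fh, '<', $name) or die "open: $!";
my $file = '';
while (<$fh>) { $file .= $_ }        # repeatedly grows a string
close $fh;

open($fh, '<', $name) or die "open: $!";
my $file2 = join('', <$fh>);         # allocates one large chunk in one go
close $fh;
```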
The last two constructs may appear unintuitive to most people. The first repeatedly grows a string, whereas the second allocates a large chunk of memory in one go. On some systems, the second is more efficient than the first.
Most multi-user platforms provide basic levels of security, usually implemented at the filesystem level. Some, however, unfortunately do not. Thus the notion of user id, or "home" directory, or even the state of being logged-in, may be unrecognizable on many platforms. If you write programs that are security-conscious, it is usually best to know what type of system you will be running under so that you can write code explicitly for that platform (or class of platforms).
Don't assume the Unix filesystem access semantics: the operating system or the filesystem may be using some ACL systems, which are richer languages than the usual rwx. Even if the rwx exist, their semantics might be different.
(From a security viewpoint, testing for permissions before attempting an operation is silly anyway: if one tries this, there is potential for race conditions. Someone or something might change the permissions between the permissions check and the actual operation. Just try the operation.)
Don't assume the Unix user and group semantics: especially, don't
expect $< and $> (or $( and $)) to work
for switching identities (or memberships).
Don't assume set-uid and set-gid semantics. (And even if you do, think twice: set-uid and set-gid are a known can of security worms.)
For those times when it is necessary to have platform-specific code,
consider keeping the platform-specific code in one place, making porting
to other platforms easier. Use the Config module and the special
variable $^O to differentiate platforms, as described in
PLATFORMS.
Be careful in the tests you supply with your module or programs.
Module code may be fully portable, but its tests might not be. This
often happens when tests spawn off other processes or call external
programs to aid in the testing, or when (as noted above) the tests
assume certain things about the filesystem and paths. Be careful not
to depend on a specific output style for errors, such as when checking
$! after a failed system call. Using $! for anything other than
displaying it as output is doubtful (though see the Errno module for
testing error values reasonably portably). Some platforms expect
a certain output format, and Perl on those platforms may have been
adjusted accordingly. Most specifically, don't anchor a regex when
testing an error value.
Modules uploaded to CPAN are tested by a variety of volunteers on different platforms. These CPAN testers are notified by mail of each new upload, and reply to the list with PASS, FAIL, NA (not applicable to this platform), or UNKNOWN (unknown), along with any relevant notations.
The purpose of the testing is twofold: one, to help developers fix any problems in their code that crop up because of lack of testing on other platforms; two, to provide users with information about whether a given module works on a given platform.
Also see:
Mailing list: cpan-testers-discuss@perl.org
Testing results: http://www.cpantesters.org/
Perl is built with a $^O variable that indicates the operating
system it was built on. This was implemented
to help speed up code that would otherwise have to use Config
and the value of $Config{osname}. Of course, to get more
detailed information about the system, looking into %Config is
certainly recommended.
%Config cannot always be trusted, however, because it was built
at compile time. If perl was built in one place, then transferred
elsewhere, some values may be wrong. The values may even have been
edited after the fact.
Perl works on a bewildering variety of Unix and Unix-like platforms (see
e.g. most of the files in the hints/ directory in the source code kit).
On most of these systems, the value of $^O (hence $Config{'osname'},
too) is determined either by lowercasing and stripping punctuation from the
first field of the string returned by typing uname -a (or a similar command)
at the shell prompt or by testing the file system for the presence of
uniquely named files such as a kernel or header file. Here, for example,
are a few of the more popular Unix flavors:
- uname         $^O        $Config{'archname'}
- --------------------------------------------
- AIX           aix        aix
- BSD/OS        bsdos      i386-bsdos
- Darwin        darwin     darwin
- dgux          dgux       AViiON-dgux
- DYNIX/ptx     dynixptx   i386-dynixptx
- FreeBSD       freebsd    freebsd-i386
- Haiku         haiku      BePC-haiku
- Linux         linux      arm-linux
- Linux         linux      i386-linux
- Linux         linux      i586-linux
- Linux         linux      ppc-linux
- HP-UX         hpux       PA-RISC1.1
- IRIX          irix       irix
- Mac OS X      darwin     darwin
- NeXT 3        next       next-fat
- NeXT 4        next       OPENSTEP-Mach
- openbsd       openbsd    i386-openbsd
- OSF1          dec_osf    alpha-dec_osf
- reliantunix-n svr4       RM400-svr4
- SCO_SV        sco_sv     i386-sco_sv
- SINIX-N       svr4       RM400-svr4
- sn4609        unicos     CRAY_C90-unicos
- sn6521        unicosmk   t3e-unicosmk
- sn9617        unicos     CRAY_J90-unicos
- SunOS         solaris    sun4-solaris
- SunOS         solaris    i86pc-solaris
- SunOS4        sunos      sun4-sunos
Because the value of $Config{archname} may depend on the
hardware architecture, it can vary more than the value of $^O.
Perl has long been ported to Intel-style microcomputers running under systems like PC-DOS, MS-DOS, OS/2, and most Windows platforms you can bring yourself to mention (except for Windows CE, if you count that). Users familiar with COMMAND.COM or CMD.EXE style shells should be aware that each of these file specifications may have subtle differences:
System calls accept either / or \ as the path separator.
However, many command-line utilities of DOS vintage treat / as
the option prefix, so they may get confused by filenames containing /.
Aside from calling any external programs, / will work just fine,
and probably better, as it is more consistent with popular usage,
and avoids the problem of remembering what to backwhack and what
not to.
The DOS FAT filesystem can accommodate only "8.3" style filenames. Under
the "case-insensitive, but case-preserving" HPFS (OS/2) and NTFS (NT)
filesystems you may have to be careful about case returned with functions
like readdir or used with functions like open or opendir.
DOS also treats several filenames as special, such as AUX, PRN, NUL, CON, COM1, LPT1, LPT2, etc. Unfortunately, sometimes these filenames won't even work if you include an explicit directory prefix. It is best to avoid such filenames, if you want your code to be portable to DOS and its derivatives. It's hard to know what these all are, unfortunately.
Users of these operating systems may also wish to make use of scripts such as pl2bat.bat or pl2cmd to put wrappers around your scripts.
Newline (\n) is translated as \015\012 by STDIO when reading from
and writing to files (see Newlines). binmode(FILEHANDLE)
will keep \n translated as \012 for that filehandle. Since it is a
no-op on other systems, binmode should be used for cross-platform code
that deals with binary data. That's assuming you realize in advance
that your data is in binary. General-purpose programs should
often assume nothing about their data.
The $^O variable and the $Config{archname} values for various
DOSish perls are as follows:
- OS            $^O        $Config{archname}  ID    Version
- --------------------------------------------------------
- MS-DOS        dos        ?
- PC-DOS        dos        ?
- OS/2          os2        ?
- Windows 3.1   ?          ?                  0     3 01
- Windows 95    MSWin32    MSWin32-x86        1     4 00
- Windows 98    MSWin32    MSWin32-x86        1     4 10
- Windows ME    MSWin32    MSWin32-x86        1     ?
- Windows NT    MSWin32    MSWin32-x86        2     4 xx
- Windows NT    MSWin32    MSWin32-ALPHA      2     4 xx
- Windows NT    MSWin32    MSWin32-ppc        2     4 xx
- Windows 2000  MSWin32    MSWin32-x86        2     5 00
- Windows XP    MSWin32    MSWin32-x86        2     5 01
- Windows 2003  MSWin32    MSWin32-x86        2     5 02
- Windows Vista MSWin32    MSWin32-x86        2     6 00
- Windows 7     MSWin32    MSWin32-x86        2     6 01
- Windows 7     MSWin32    MSWin32-x64        2     6 01
- Windows 2008  MSWin32    MSWin32-x86        2     6 01
- Windows 2008  MSWin32    MSWin32-x64        2     6 01
- Windows CE    MSWin32    ?                  3
- Cygwin        cygwin     cygwin
The various MSWin32 Perls can distinguish the OS they are running on via the value of the fifth element of the list returned from Win32::GetOSVersion(). For example:
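The example that belongs here was lost in formatting; a sketch, guarded so it is a no-op on other platforms:

```perl
# Only meaningful on Windows; elsewhere the block is skipped entirely.
if ($^O eq 'MSWin32') {
    require Win32;
    my @os_version_info = Win32::GetOSVersion();
    # The fifth element is the platform ID:
    # 0 = Win32s, 1 = Windows 9x, 2 = Windows NT family
    print +('3.1', '95', 'NT')[$os_version_info[4]], "\n";
}
```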
There are also Win32::IsWinNT() and Win32::IsWin95() (try perldoc Win32),
and as of libwin32 0.19 (not part of the core Perl distribution)
Win32::GetOSName(). The very portable POSIX::uname() will work too:
- c:\> perl -MPOSIX -we "print join '|', uname"
- Windows NT|moonru|5.0|Build 2195 (Service Pack 2)|x86
Also see:
The djgpp environment for DOS, http://www.delorie.com/djgpp/ and perldos.
The EMX environment for DOS, OS/2, etc. emx@iaehv.nl, ftp://hobbes.nmsu.edu/pub/os2/dev/emx/ Also perlos2.
Build instructions for Win32 in perlwin32, or under the Cygnus environment in perlcygwin.
The Win32::*
modules in Win32.
The ActiveState Pages, http://www.activestate.com/
The Cygwin environment for Win32; README.cygwin (installed as perlcygwin), http://www.cygwin.com/
The U/WIN environment for Win32, http://www.research.att.com/sw/tools/uwin/
Build instructions for OS/2, perlos2
Perl on VMS is discussed in perlvms in the perl distribution.
The official name of VMS as of this writing is OpenVMS.
Perl on VMS can accept either VMS- or Unix-style file specifications as in either of the following:
- $ perl -ne "print if /perl_setup/i" SYS$LOGIN:LOGIN.COM
- $ perl -ne "print if /perl_setup/i" /sys$login/login.com
but not a mixture of both as in:
- $ perl -ne "print if /perl_setup/i" sys$login:/login.com
- Can't open sys$login:/login.com: file specification syntax error
Interacting with Perl from the Digital Command Language (DCL) shell often requires a different set of quotation marks than Unix shells do. For example:
- $ perl -e "print ""Hello, world.\n"""
- Hello, world.
There are several ways to wrap your perl scripts in DCL .COM files, if you are so inclined. For example:
- $ write sys$output "Hello from DCL!"
- $ if p1 .eqs. ""
- $ then perl -x 'f$environment("PROCEDURE")
- $ else perl -x - 'p1 'p2 'p3 'p4 'p5 'p6 'p7 'p8
- $ deck/dollars="__END__"
- #!/usr/bin/perl
- print "Hello from Perl!\n";
- __END__
- $ endif
Do take care with $ ASSIGN/nolog/user SYS$COMMAND: SYS$INPUT if your
perl-in-DCL script expects to do things like $read = <STDIN>;.
The VMS operating system has two filesystems, known as ODS-2 and ODS-5.
For ODS-2, filenames are in the format "name.extension;version". The
maximum length for filenames is 39 characters, and the maximum length for
extensions is also 39 characters. Version is a number from 1 to
32767. Valid characters are /[A-Z0-9$_-]/.
The ODS-2 filesystem is case-insensitive and does not preserve case. Perl simulates this by converting all filenames to lowercase internally.
For ODS-5, filenames may have almost any character in them and can include
Unicode characters. Characters that could be misinterpreted by the DCL
shell or file parsing utilities need to be prefixed with the ^
character, or replaced with hexadecimal characters prefixed with the
^ character. Such prefixing is only needed when the pathnames are
in VMS format in applications. Programs that can accept the Unix format
of pathnames do not need the escape characters. The maximum length for
filenames is 255 characters. The ODS-5 file system can handle both
a case-preserving and a case-sensitive mode.
ODS-5 is only available on the 64-bit OpenVMS platforms.
Support for the extended file specifications is being done as optional settings to preserve backward compatibility with Perl scripts that assume the previous VMS limitations.
In general routines on VMS that get a Unix format file specification should return it in a Unix format, and when they get a VMS format specification they should return a VMS format unless they are documented to do a conversion.
For routines that generate a file specification, VMS allows configuring whether the C library that Perl is built on returns it in VMS format or in Unix format.
With the ODS-2 file system, there is not much difference in syntax of filenames without paths for VMS or Unix. With the extended character set available with ODS-5 there can be a significant difference.
Because of this, existing Perl scripts written for VMS were sometimes treating VMS and Unix filenames interchangeably. Without the extended character set enabled, this behavior will mostly be maintained for backwards compatibility.
When extended characters are enabled with ODS-5, the handling of Unix formatted file specifications matches that of a Unix system.
VMS file specifications without extensions have a trailing dot. An equivalent Unix file specification should not show the trailing dot.
The result of all of this is that for portable scripts on VMS, you cannot depend on Perl to present the filenames in lowercase or to be case-sensitive, and the filenames could be returned in either Unix or VMS format.
And if a routine returns a file specification, unless it is intended to convert it, it should return it in the same format as it found it.
readdir by default has traditionally returned lowercased filenames.
When the ODS-5 support is enabled, it will return the exact case of the
filename on the disk.
Files without extensions have a trailing period on them, so doing a
readdir in the default mode with a file named A.;5 will
return a. (though that file could be opened with open(FH, 'A')).
With support for extended file specifications, if opendir was
given a Unix format directory, a file named A.;5 will be returned as a,
optionally in the exact case on the disk. When opendir is given
a VMS format directory, then readdir should return a.,
again optionally in the exact case.
RMS had an eight level limit on directory depths from any rooted logical
(allowing 16 levels overall) prior to VMS 7.2, and even with versions of
VMS on VAX up through 7.3. Hence PERL_ROOT:[LIB.2.3.4.5.6.7.8] is a
valid directory specification but PERL_ROOT:[LIB.2.3.4.5.6.7.8.9] is
not. Makefile.PL authors might have to take this into account, but at
least they can refer to the former as /PERL_ROOT/lib/2/3/4/5/6/7/8/.
Pumpkings and module integrators can easily see whether files with too many directory levels have snuck into the core by running the following in the top-level source directory:
- $ perl -ne "$_=~s/\s+.*//; print if scalar(split /\//) > 8;" < MANIFEST
The VMS::Filespec module, which gets installed as part of the build process on VMS, is a pure Perl module that can easily be installed on non-VMS platforms and can be helpful for conversions to and from RMS native formats. It is also now the only way that you should check to see if VMS is in a case sensitive mode.
What \n represents depends on the type of file opened. It usually
represents \012 but it could also be \015, \012, \015\012,
\000, \040, or nothing depending on the file organization and
record format. The VMS::Stdio module provides access to the
special fopen() requirements of files with unusual attributes on VMS.
TCP/IP stacks are optional on VMS, so socket routines might not be implemented. UDP sockets may not be supported.
The TCP/IP library support for all current versions of VMS is dynamically loaded if present, so even if the routines are configured, they may return a status indicating that they are not implemented.
The value of $^O on OpenVMS is "VMS". To determine the architecture
that you are running on without resorting to loading all of %Config
you can examine the content of the @INC array like so:
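The inspection itself was lost in formatting; a sketch, assuming the architecture-specific @INC path components contain the VMS_* strings shown:

```perl
# On non-VMS systems none of the patterns match, so the else branch runs.
if (grep { /VMS_AXP/ } @INC) {
    print "I'm on Alpha!\n";
} elsif (grep { /VMS_VAX/ } @INC) {
    print "I'm on VAX!\n";
} elsif (grep { /VMS_IA64/ } @INC) {
    print "I'm on IA64!\n";
} else {
    print "I'm not so sure about where $^O is...\n";
}
```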
In general, the significant differences should only be if Perl is running on VMS_VAX or one of the 64-bit OpenVMS platforms.
On VMS, perl determines the UTC offset from the SYS$TIMEZONE_DIFFERENTIAL
logical name. Although the VMS epoch began at 17-NOV-1858 00:00:00.00,
calls to localtime are adjusted to count offsets from
01-JAN-1970 00:00:00.00, just like Unix.
Also see:
README.vms (installed as README_vms), perlvms
vmsperl list, vmsperl-subscribe@perl.org
vmsperl on the web, http://www.sidhe.org/vmsperl/index.html
Perl on VOS (also known as OpenVOS) is discussed in README.vos in the perl distribution (installed as perlvos). Perl on VOS can accept either VOS- or Unix-style file specifications as in either of the following:
- $ perl -ne "print if /perl_setup/i" >system>notices
- $ perl -ne "print if /perl_setup/i" /system/notices
or even a mixture of both as in:
- $ perl -ne "print if /perl_setup/i" >system/notices
Even though VOS allows the slash character to appear in object names, because the VOS port of Perl interprets it as a pathname delimiting character, VOS files, directories, or links whose names contain a slash character cannot be processed. Such files must be renamed before they can be processed by Perl.
Older releases of VOS (prior to OpenVOS Release 17.0) limit file
names to 32 or fewer characters, prohibit file names from
starting with a - character, and prohibit file names from
containing any character matching tr/ !#%&'()*;<=>?//.
Newer releases of VOS (OpenVOS Release 17.0 or later) support a
feature known as extended names. On these releases, file names
can contain up to 255 characters, are prohibited from starting
with a -
character, and the set of prohibited characters is
reduced to any character matching tr/#%*<>?//. There are
restrictions involving spaces and apostrophes: these characters
must not begin or end a name, nor can they immediately precede or
follow a period. Additionally, a space must not immediately
precede another space or hyphen. Specifically, the following
character combinations are prohibited: space-space,
space-hyphen, period-space, space-period, period-apostrophe,
apostrophe-period, leading or trailing space, and leading or
trailing apostrophe. Although an extended file name is limited
to 255 characters, a path name is still limited to 256
characters.
The value of $^O on VOS is "vos". To determine the
architecture that you are running on without resorting to loading
all of %Config you can examine the content of the @INC array
like so:
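The snippet itself was lost in formatting; a minimal sketch keyed on $^O:

```perl
# On VOS, $^O is "vos"; everywhere else the else branch runs.
if ($^O =~ /vos/) {
    print "I'm on a Stratus box!\n";
} else {
    print "I'm not on a Stratus box!\n";
}
```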
Also see:
README.vos (installed as perlvos)
The VOS mailing list.
There is no specific mailing list for Perl on VOS. You can contact the Stratus Technologies Customer Assistance Center (CAC) for your region, or you can use the contact information located in the distribution files on the Stratus Anonymous FTP site.
Stratus Technologies on the web at http://www.stratus.com
VOS Open-Source Software on the web at http://ftp.stratus.com/pub/vos/vos.html
Recent versions of Perl have been ported to platforms such as OS/400 on AS/400 minicomputers as well as OS/390, VM/ESA, and BS2000 for S/390 Mainframes. Such computers use EBCDIC character sets internally (usually Character Code Set ID 0037 for OS/400 and either 1047 or POSIX-BC for S/390 systems). On the mainframe perl currently works under the "Unix system services for OS/390" (formerly known as OpenEdition), VM/ESA OpenEdition, or the BS2000 POSIX-BC system (BS2000 is supported in perl 5.6 and greater). See perlos390 for details. Note that for OS/400 there is also a port of Perl 5.8.1/5.10.0 or later to the PASE which is ASCII-based (as opposed to ILE which is EBCDIC-based); see perlos400.
As of R2.5 of USS for OS/390 and Version 2.3 of VM/ESA, these Unix
sub-systems do not support the #! shebang trick for script invocation.
Hence, on OS/390 and VM/ESA perl scripts can be executed with a header
similar to the following simple script:
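The header script itself was lost in formatting; a sketch of the usual eval-exec trick (the /usr/local/bin/perl path is an assumption):

```perl
eval 'exec /usr/local/bin/perl -S $0 ${1+"$@"}'
    if 0;  # a shell executes the exec line; perl skips it via "if 0"
#!/usr/local/bin/perl    # at this point, just a comment to perl

print "Hello from perl!\n";
```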
OS/390 will support the #! shebang trick in release 2.8 and beyond.
Calls to system and backticks can use POSIX shell syntax on all
S/390 systems.
On the AS/400, if PERL5 is in your library list, you may need to wrap your perl scripts in a CL procedure to invoke them like so:
- BEGIN
- CALL PGM(PERL5/PERL) PARM('/QOpenSys/hello.pl')
- ENDPGM
This will invoke the perl script hello.pl in the root of the
QOpenSys file system. On the AS/400 calls to system or backticks
must use CL syntax.
On these platforms, bear in mind that the EBCDIC character set may have
an effect on what happens with some perl functions (such as chr,
pack, print, printf, ord, sort, sprintf, unpack), as
well as bit-fiddling with ASCII constants using operators like ^, &
and |, not to mention dealing with socket interfaces to ASCII computers
(see Newlines).
Fortunately, most web servers for the mainframe will correctly
translate the \n
in the following statement to its ASCII equivalent
(\r
is the same under both Unix and OS/390):
- print "Content-type: text/html\r\n\r\n";
The values of $^O on some of these platforms include:
- uname $^O $Config{'archname'}
- --------------------------------------------
- OS/390 os390 os390
- OS400 os400 os400
- POSIX-BC posix-bc BS2000-posix-bc
Some simple tricks for determining if you are running on an EBCDIC platform could include any of the following (perhaps all):
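The tricks themselves were lost in formatting; each of the following is true on common EBCDIC code pages (where tab is \05, "A" is 193, and code point 169 is "z") and false on ASCII platforms:

```perl
# On an ASCII platform, none of these print anything.
print "EBCDIC may be spoken here!\n" if "\t" eq "\05";
print "EBCDIC may be spoken here!\n" if ord('A') == 193;
print "EBCDIC may be spoken here!\n" if chr(169) eq 'z';
```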
One thing you may not want to rely on is the EBCDIC encoding of punctuation characters since these may differ from code page to code page (and once your module or script is rumoured to work with EBCDIC, folks will want it to work with all EBCDIC character sets).
Also see:
perlos390, README.os390, perlbs2000, perlebcdic.
The perl-mvs@perl.org list is for discussion of porting issues as well as general usage issues for all EBCDIC Perls. Send a message body of "subscribe perl-mvs" to majordomo@perl.org.
AS/400 Perl information at http://as400.rochester.ibm.com/ as well as on CPAN in the ports/ directory.
Because Acorns use ASCII with newlines (\n) in text files as \012 like
Unix, and because Unix filename emulation is turned on by default,
most simple scripts will probably work "out of the box". The native
filesystem is modular, and individual filesystems are free to be
case-sensitive or insensitive, and are usually case-preserving. Some
native filesystems have name length limits, which file and directory
names are silently truncated to fit. Scripts should be aware that the
standard filesystem currently has a name length limit of 10
characters, with up to 77 items in a directory, but other filesystems
may not impose such limitations.
Native filenames are of the form
- Filesystem#Special_Field::DiskName.$.Directory.Directory.File
where
- Special_Field is not usually present, but may contain . and $ .
- Filesystem =~ m|[A-Za-z0-9_]|
- DiskName =~ m|[A-Za-z0-9_/]|
- $ represents the root directory
- . is the path separator
- @ is the current directory (per filesystem but machine global)
- ^ is the parent directory
- Directory and File =~ m|[^\0- "\.\$\%\&:\@\\^\|\177]+|
The default filename translation is roughly tr|/.|./|;
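That translation simply swaps the two separator characters, as this sketch (with a hypothetical path) shows:

```perl
my $unix_path = "foo/bar.c";

# tr|/.|./| exchanges '/' and '.', turning a Unix path into the
# native RISC OS spelling.
(my $native = $unix_path) =~ tr|/.|./|;

print "$native\n";   # prints "foo.bar/c"
```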
Note that "ADFS::HardDisk.$.File" ne 'ADFS::HardDisk.$.File'
and that
the second stage of $
interpolation in regular expressions will fall
foul of the $.
if scripts are not careful.
Logical paths specified by system variables containing comma-separated
search lists are also allowed; hence System:Modules
is a valid
filename, and the filesystem will prefix Modules
with each section of
System$Path
until a name is made that points to an object on disk.
Writing to a new file System:Modules
would be allowed only if
System$Path
contains a single item list. The filesystem will also
expand system variables in filenames if enclosed in angle brackets, so
<System$Dir>.Modules
would look for the file
$ENV{'System$Dir'} . 'Modules'
. The obvious implication of this is
that fully qualified filenames can start with <>
and should
be protected when open is used for input.
Because . was in use as a directory separator and filenames could not
be assumed to be unique after 10 characters, Acorn implemented the C
compiler to strip the trailing .c .h .s and .o suffix from
filenames specified in source code and store the respective files in
subdirectories named after the suffix. Hence files are translated:
- foo.h h.foo
- C:foo.h C:h.foo (logical path variable)
- sys/os.h sys.h.os (C compiler groks Unix-speak)
- 10charname.c c.10charname
- 10charname.o o.10charname
- 11charname_.c c.11charname (assuming filesystem truncates at 10)
The Unix emulation library's translation of filenames to native assumes
that this sort of translation is required, and it allows a user-defined list
of known suffixes that it will transpose in this fashion. This may
seem transparent, but consider that with these rules foo/bar/baz.h
and foo/bar/h/baz both map to foo.bar.h.baz, and that readdir and
glob cannot and do not attempt to emulate the reverse mapping. Other
.'s in filenames are translated to /.
As implied above, the environment accessed through %ENV
is global, and
the convention is that program specific environment variables are of the
form Program$Name
. Each filesystem maintains a current directory,
and the current filesystem's current directory is the global current
directory. Consequently, sociable programs don't change the current
directory but rely on full pathnames, and programs (and Makefiles) cannot
assume that they can spawn a child process which can change the current
directory without affecting its parent (and everyone else for that
matter).
Because native operating system filehandles are global and are currently
allocated down from 255, with 0 being a reserved value, the Unix emulation
library emulates Unix filehandles. Consequently, you can't rely on
passing STDIN
, STDOUT
, or STDERR
to your children.
The desire of users to express filenames of the form
<Foo$Dir>.Bar
on the command line unquoted causes problems,
too: ``
command output capture has to perform a guessing game. It
assumes that a string <[^<>]+\$[^<>]> is a
reference to an environment variable, whereas anything else involving
<
or > is redirection, and generally manages to be 99%
right. Of course, the problem remains that scripts cannot rely on any
Unix tools being available, or that any tools found have Unix-like command
line arguments.
Extensions and XS are, in theory, buildable by anyone using free
tools. In practice, many don't, as users of the Acorn platform are
used to binary distributions. MakeMaker does run, but no available
make currently copes with MakeMaker's makefiles; even if and when
this should be fixed, the lack of a Unix-like shell will cause
problems with makefile rules, especially lines of the form cd
sdbm && make all
, and anything using quoting.
"RISC OS" is the proper name for the operating system, but the value
in $^O
is "riscos" (because we don't like shouting).
Perl has been ported to many platforms that do not fit into any of the categories listed above. Some, such as AmigaOS, QNX, Plan 9, and VOS, have been well-integrated into the standard Perl source code kit. You may need to see the ports/ directory on CPAN for information, and possibly binaries, for the likes of: aos, Atari ST, lynxos, riscos, Novell Netware, Tandem Guardian, etc. (Yes, we know that some of these OSes may fall under the Unix category, but we are not a standards body.)
Some approximate operating system names and their $^O
values
in the "OTHER" category include:
- OS $^O $Config{'archname'}
- ------------------------------------------
- Amiga DOS amigaos m68k-amigos
See also:
Amiga, README.amiga (installed as perlamiga).
A free perl5-based PERL.NLM for Novell Netware is available in precompiled binary and source code form from http://www.novell.com/ as well as from CPAN.
Plan 9, README.plan9
Listed below are functions that are either completely unimplemented or else have been implemented differently on various platforms. Following each description will be, in parentheses, a list of platforms that the description applies to.
The list may well be incomplete, or even wrong in some places. When in doubt, consult the platform-specific README files in the Perl source distribution, and any other documentation resources accompanying a given port.
Be aware, moreover, that even among Unix-ish systems there are variations.
For many functions, you can also query %Config
, exported by
default from the Config module. For example, to check whether the
platform has the lstat call, check $Config{d_lstat}
. See
Config for a full description of available variables.
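A probe along the lines described above might read:

```perl
use Config;

# $Config{d_lstat} is a true ("define") value when the platform
# provides the lstat call.
if ($Config{d_lstat}) {
    print "lstat is available; symlinks can be examined directly\n";
}
else {
    print "no lstat; falling back to plain stat\n";
}
```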
-w
only inspects the read-only file attribute (FILE_ATTRIBUTE_READONLY),
which determines whether the directory can be deleted, not whether it can
be written to. Directories always have read and write access unless denied
by discretionary access control lists (DACLs). (Win32)
-r
, -w
, -x
, and -o
tell whether the file is accessible,
which may not reflect UIC-based file protections. (VMS)
-s
by name on an open file will return the space reserved on disk,
rather than the current extent. -s
on an open filehandle returns the
current size. (RISC OS)
-R
, -W
, -X, -O
are indistinguishable from -r
, -w
,
-x
, -o
. (Win32, VMS, RISC OS)
-g
, -k
, -l
, -u
, -A
are not particularly meaningful.
(Win32, VMS, RISC OS)
-p
is not particularly meaningful. (VMS, RISC OS)
-d
is true if passed a device spec without an explicit directory.
(VMS)
-x
(or -X) determine if a file ends in one of the executable
suffixes. -S
is meaningless. (Win32)
-x
(or -X) determine if a file has an executable file type.
(RISC OS)
Emulated using timers that must be explicitly polled whenever Perl wants to dispatch "safe signals" and therefore cannot interrupt blocking system calls. (Win32)
Due to issues with various CPUs, math libraries, compilers, and standards,
results for atan2() may vary depending on any combination of the above.
Perl attempts to conform to the Open Group/IEEE standards for the results
returned from atan2(), but cannot force the issue if the system Perl is
run on does not allow it. (Tru64, HP-UX 10.20)
The current version of the standards for atan2() is available at
http://www.opengroup.org/onlinepubs/009695399/functions/atan2.html.
Meaningless. (RISC OS)
Reopens file and restores pointer; if function fails, underlying filehandle may be closed, or pointer may be in a different position. (VMS)
The value returned by tell may be affected after the call, and
the filehandle may be flushed. (Win32)
Only good for changing "owner" read-write access, "group", and "other" bits are meaningless. (Win32)
Only good for changing "owner" and "other" read-write access. (RISC OS)
Access permissions are mapped onto VOS access-control list changes. (VOS)
The actual permissions set depend on the value of the CYGWIN
in the SYSTEM environment settings. (Cygwin)
Not implemented. (Win32, Plan 9, RISC OS)
Does nothing, but won't fail. (Win32)
A little funky, because VOS's notion of ownership is a little funky (VOS).
Not implemented. (Win32, VMS, Plan 9, RISC OS, VOS)
May not be available if library or source was not provided when building perl. (Win32)
Not implemented. (VMS, Plan 9, VOS)
Not implemented. (VMS, Plan 9, VOS)
Not useful. (RISC OS)
Not supported. (Cygwin, Win32)
Invokes VMS debugger. (VMS)
Does not automatically flush output handles on some platforms. (SunOS, Solaris, HP-UX)
Not supported. (Symbian OS)
Emulates Unix exit() (which considers exit 1
to indicate an error) by
mapping the 1
to SS$_ABORT (44
). This behavior may be overridden
with the pragma use vmsish 'exit'
. As with the CRTL's exit()
function, exit 0
is also mapped to an exit status of SS$_NORMAL
(1
); this mapping cannot be overridden. Any other argument to exit()
is used directly as Perl's exit status. On VMS, unless the future
POSIX_EXIT mode is enabled, the exit code should always be a valid
VMS exit code and not a generic number. When the POSIX_EXIT mode is
enabled, a generic number will be encoded in a method compatible with
the C library _POSIX_EXIT macro so that it can be decoded by other
programs, particularly ones written in C, like the GNV package. (VMS)
exit() resets file pointers, which is a problem when called
from a child process (created by fork()) in BEGIN
.
A workaround is to use POSIX::_exit
. (Solaris)
Not implemented. (Win32)
Some functions available based on the version of VMS. (VMS)
Not implemented (VMS, RISC OS, VOS).
Not implemented. (AmigaOS, RISC OS, VMS)
Emulated using multiple interpreters. See perlfork. (Win32)
Does not automatically flush output handles on some platforms. (SunOS, Solaris, HP-UX)
Not implemented. (RISC OS)
Not implemented. (Win32, VMS, RISC OS)
Not implemented. (Win32, RISC OS)
Not implemented. (Win32, VMS, RISC OS, VOS)
Not implemented. (Win32)
Not useful. (RISC OS)
Not implemented. (Win32, VMS, RISC OS)
Not implemented. (Win32, Plan 9)
Not implemented. (Win32)
Not useful. (RISC OS)
Not implemented. (Win32, VMS, RISC OS)
Not implemented. (Win32, Plan 9)
Not implemented. (Win32)
Not implemented. (Win32, VMS)
gethostbyname('localhost') does not work everywhere: you may have
to use gethostbyname('127.0.0.1'). (Irix 5)
Not implemented. (Win32)
Not implemented. (Win32, Plan 9)
Not implemented. (Win32, Plan 9)
Not implemented. (Win32, Plan 9)
Not implemented. (Win32, Plan 9, RISC OS)
Not implemented. (Win32, Plan 9, RISC OS)
Not implemented. (Win32, Plan 9, RISC OS)
Not implemented. (Plan 9, Win32, RISC OS)
Not implemented. (Win32)
Not implemented. (RISC OS, VMS, Win32)
Not implemented. (Win32)
Not implemented. (Win32, Plan 9)
Not implemented. (Win32, Plan 9)
Not implemented. (Plan 9, Win32)
Not implemented. (Plan 9)
This operator is implemented via the File::Glob extension on most platforms. See File::Glob for portability information.
In theory, gmtime() is reliable from -2**63 to 2**63-1. However, because workarounds in the implementation use floating point numbers, it will become inaccurate as the time gets larger. This is a bug and will be fixed in the future.
On VOS, time values are 32-bit quantities.
Not implemented. (VMS)
Available only for socket handles, and it does what the ioctlsocket() call in the Winsock API does. (Win32)
Available only for socket handles. (RISC OS)
Not implemented, hence not useful for taint checking. (RISC OS)
kill() doesn't have the semantics of raise()
, i.e. it doesn't send
a signal to the identified process like it does on Unix platforms.
Instead kill($sig, $pid)
terminates the process identified by $pid,
and makes it exit immediately with exit status $sig. As in Unix, if
$sig is 0 and the specified process exists, it returns true without
actually terminating it. (Win32)
kill(-9, $pid)
will terminate the process specified by $pid and
recursively all child processes owned by it. This is different from
the Unix semantics, where the signal will be delivered to all
processes in the same process group as the process specified by
$pid. (Win32)
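As the text above notes, signal 0 only probes for existence and is honored on both Unix and Win32, which makes it the portable idiom for checking whether a process is alive:

```perl
# Signal 0 delivers nothing; kill() just reports whether the process
# exists (and, on Unix, whether we would be permitted to signal it).
my $pid   = $$;            # probe our own PID for demonstration
my $alive = kill 0, $pid;

print $alive ? "process $pid exists\n" : "no such process\n";
```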
Is not supported for process identification number of 0 or negative numbers. (VMS)
Not implemented. (RISC OS, VOS)
Link count not updated because hard links are not quite that hard (they are sort of half-way between hard and soft links). (AmigaOS)
Hard links are implemented on Win32 under NTFS only. They are natively supported on Windows 2000 and later. On Windows NT they are implemented using the Windows POSIX subsystem support and the Perl process will need Administrator or Backup Operator privileges to create hard links.
Available on 64 bit OpenVMS 8.2 and later. (VMS)
localtime() has the same range as gmtime, but because time zone rules change its accuracy for historical and future times may degrade but usually by no more than an hour.
Not implemented. (RISC OS)
Return values (especially for device and inode) may be bogus. (Win32)
Not implemented. (Win32, VMS, Plan 9, RISC OS, VOS)
open to |- and -| are unsupported. (Win32, RISC OS)
Opening a process does not automatically flush output handles on some platforms. (SunOS, Solaris, HP-UX)
Not implemented. (Win32, VMS, RISC OS)
Can't move directories between directories on different logical volumes. (Win32)
Will not cause readdir() to re-read the directory stream. The entries already read before the rewinddir() call will just be returned again from a cache buffer. (Win32)
Only implemented on sockets. (Win32, VMS)
Only reliable on sockets. (RISC OS)
Note that the select FILEHANDLE
form is generally portable.
Not implemented. (Win32, VMS, RISC OS)
Not implemented. (VMS, Win32, RISC OS)
Not implemented. (Win32, VMS, RISC OS, VOS)
Not implemented. (Win32, VMS, RISC OS, VOS)
Not implemented. (Win32, RISC OS)
Not implemented. (Plan 9)
Not implemented. (Win32, VMS, RISC OS)
Emulated using synchronization functions such that it can be interrupted by alarm(), and limited to a maximum of 4294967 seconds, approximately 49 days. (Win32)
A relatively recent addition to socket functions; it may not be implemented even on Unix platforms.
Not implemented. (RISC OS)
Available on 64 bit OpenVMS 8.2 and later. (VMS)
Platforms that do not have rdev, blksize, or blocks will return these as '', so numeric comparison or manipulation of these fields may cause 'not numeric' warnings.
ctime not supported on UFS (Mac OS X).
ctime is creation time instead of inode change time (Win32).
device and inode are not meaningful. (Win32)
device and inode are not necessarily reliable. (VMS)
mtime, atime and ctime all return the last modification time. Device and inode are not necessarily reliable. (RISC OS)
dev, rdev, blksize, and blocks are not available. inode is not meaningful and will differ between stat calls on the same file. (OS/2)
Some versions of Cygwin, when doing stat("foo") and not finding it, may then attempt stat("foo.exe"). (Cygwin)
On Win32 stat() needs to open the file to determine the link count and update attributes that may have been changed through hard links. Setting ${^WIN32_SLOPPY_STAT} to a true value speeds up stat() by not performing this operation. (Win32)
Not implemented. (Win32, RISC OS)
Implemented on 64 bit VMS 8.3. VMS requires the symbolic link to be in Unix syntax if it is intended to resolve to a valid path.
Not implemented. (Win32, VMS, RISC OS, VOS)
The traditional "0", "1", and "2" MODEs are implemented with different
numeric values on some systems. The flags exported by Fcntl
(O_RDONLY, O_WRONLY, O_RDWR) should work everywhere though. (Mac
OS, OS/390)
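A sketch of the portable style, using the symbolic Fcntl constants rather than raw 0/1/2 mode numbers (the file name here is a temporary one created for the example):

```perl
use Fcntl qw(O_WRONLY O_CREAT O_RDONLY);
use File::Temp qw(tempdir);

my $dir  = tempdir(CLEANUP => 1);
my $file = "$dir/demo.txt";

# O_WRONLY | O_CREAT instead of the non-portable numeric mode 1.
sysopen(my $out, $file, O_WRONLY | O_CREAT) or die "sysopen: $!";
print {$out} "hello\n";
close $out;

# O_RDONLY instead of the non-portable numeric mode 0.
sysopen(my $in, $file, O_RDONLY) or die "sysopen: $!";
print scalar <$in>;
close $in;
```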
As an optimization, may not call the command shell specified in
$ENV{PERL5SHELL}
. system(1, @args)
spawns an external
process and immediately returns its process designator, without
waiting for it to terminate. Return value may be used subsequently
in wait or waitpid. Failure to spawn() a subprocess is indicated
by setting $? to "255 << 8". $?
is set in a way compatible with
Unix (i.e. the exit status of the subprocess is obtained by "$? >> 8",
as described in the documentation). (Win32)
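The portable decoding of $? described above can be sketched like this ($^X is the currently running perl binary, used here as a convenient child to spawn):

```perl
# Run a child that exits with status 3, then recover that status
# from $? by shifting out the low signal/core bits.
system($^X, '-e', 'exit 3');

my $exit_status = $? >> 8;
print "child exited with status $exit_status\n";   # prints 3
```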
There is no shell to process metacharacters, and the native standard is
to pass a command line terminated by "\n", "\r", or "\0" to the spawned
program. Redirection such as > foo is performed (if at all) by
the run time library of the spawned program. system list will call
the Unix emulation library's exec emulation, which attempts to provide
emulation of the stdin, stdout, stderr in force in the parent, providing
the child program uses a compatible version of the emulation library.
A scalar argument will call the native command line directly, and no such emulation of a child Unix program will exist. Mileage will vary. (RISC OS)
Does not automatically flush output handles on some platforms. (SunOS, Solaris, HP-UX)
The return value is POSIX-like (shifted up by 8 bits), which only allows
room for a made-up value derived from the severity bits of the native
32-bit condition code (unless overridden by use vmsish 'status'
).
If the native condition code is one that has a POSIX value encoded, the
POSIX value will be decoded to extract the expected exit value.
For more details see $? in perlvms. (VMS)
"cumulative" times will be bogus. On anything other than Windows NT or Windows 2000, "system" time will be bogus, and "user" time is actually the time returned by the clock() function in the C runtime library. (Win32)
Not useful. (RISC OS)
Not implemented. (Older versions of VMS)
Truncation to same-or-shorter lengths only. (VOS)
If a FILEHANDLE is supplied, it must be writable and opened in append
mode (i.e., use open(FH, '>>filename')
or sysopen(FH,...,O_APPEND|O_RDWR). If a filename is supplied, it
should not be held open elsewhere. (Win32)
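A sketch of the filehandle form, opening the handle in append mode as the Win32 note requires (the file is a temporary one created for the example):

```perl
use Fcntl qw(O_APPEND O_RDWR);
use File::Temp qw(tempdir);

my $dir  = tempdir(CLEANUP => 1);
my $file = "$dir/log.txt";

open my $fh, '>', $file or die "open: $!";
print {$fh} "0123456789";        # file is now 10 bytes
close $fh;

# Append-mode, writable handle, as required on Win32.
sysopen(my $app, $file, O_APPEND | O_RDWR) or die "sysopen: $!";
truncate($app, 4) or die "truncate: $!";
close $app;

print -s $file, "\n";            # prints 4
```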
Returns undef where unavailable.
umask works but the correct permissions are set only when the file
is finally closed. (AmigaOS)
Only the modification time is updated. (VMS, RISC OS)
May not behave as expected. Behavior depends on the C runtime library's implementation of utime(), and the filesystem being used. The FAT filesystem typically does not support an "access time" field, and it may limit timestamps to a granularity of two seconds. (Win32)
Can only be applied to process handles returned for processes spawned
using system(1, ...)
or pseudo processes created with fork(). (Win32)
Not useful. (RISC OS)
The following platforms are known to build Perl 5.12 (as of April 2010, its release date) from the standard source code distribution available at http://www.cpan.org/src
Caveats:
The following platforms were supported by a previous version of Perl but have been officially removed from Perl's source code as of 5.12:
The following platforms were supported up to 5.10. They may still have worked in 5.12, but supporting code has been removed for 5.14:
As of July 2002 (the Perl release 5.8.0), the following platforms were able to build Perl from the standard source code distribution available at http://www.cpan.org/src/
- AIX
- BeOS
- BSD/OS (BSDi)
- Cygwin
- DG/UX
- DOS DJGPP 1)
- DYNIX/ptx
- EPOC R5
- FreeBSD
- HI-UXMPP (Hitachi) (5.8.0 worked but we didn't know it)
- HP-UX
- IRIX
- Linux
- Mac OS Classic
- Mac OS X (Darwin)
- MPE/iX
- NetBSD
- NetWare
- NonStop-UX
- ReliantUNIX (formerly SINIX)
- OpenBSD
- OpenVMS (formerly VMS)
- Open UNIX (Unixware) (since Perl 5.8.1/5.9.0)
- OS/2
- OS/400 (using the PASE) (since Perl 5.8.1/5.9.0)
- PowerUX
- POSIX-BC (formerly BS2000)
- QNX
- Solaris
- SunOS 4
- SUPER-UX (NEC)
- Tru64 UNIX (formerly DEC OSF/1, Digital UNIX)
- UNICOS
- UNICOS/mk
- UTS
- VOS / OpenVOS
- Win95/98/ME/2K/XP 2)
- WinCE
- z/OS (formerly OS/390)
- VM/ESA
- 1) in DOS mode either the DOS or OS/2 ports can be used
- 2) compilers: Borland, MinGW (GCC), VC6
The following platforms worked with the previous releases (5.6 and 5.7), but we did not manage either to fix or to test these in time for the 5.8.0 release. There is a very good chance that many of these will work fine with 5.8.0.
- BSD/OS
- DomainOS
- Hurd
- LynxOS
- MachTen
- PowerMAX
- SCO SV
- SVR4
- Unixware
- Windows 3.1
Known to be broken for 5.8.0 (but 5.6.1 and 5.7.2 can be used):
- AmigaOS
The following platforms have been known to build Perl from source in the past (5.005_03 and earlier), but we haven't been able to verify their status for the current release, either because the hardware/software platforms are rare or because we don't have an active champion on these platforms--or both. They used to work, though, so go ahead and try compiling them, and let perlbug@perl.org know of any trouble.
- 3b1
- A/UX
- ConvexOS
- CX/UX
- DC/OSx
- DDE SMES
- DOS EMX
- Dynix
- EP/IX
- ESIX
- FPS
- GENIX
- Greenhills
- ISC
- MachTen 68k
- MPC
- NEWS-OS
- NextSTEP
- OpenSTEP
- Opus
- Plan 9
- RISC/os
- SCO ODT/OSR
- Stellar
- SVR2
- TI1500
- TitanOS
- Ultrix
- Unisys Dynix
The following platforms have their own source code distributions and binaries available via http://www.cpan.org/ports/
- Perl release
- OS/400 (ILE) 5.005_02
- Tandem Guardian 5.004
The following platforms have only binaries available via http://www.cpan.org/ports/index.html :
- Perl release
- Acorn RISCOS 5.005_02
- AOS 5.002
- LynxOS 5.004_02
Although we do suggest that you always build your own Perl from the source code, both for maximal configurability and for security, in case you are in a hurry you can check http://www.cpan.org/ports/index.html for binary distributions.
perlaix, perlamiga, perlbs2000, perlce, perlcygwin, perldgux, perldos, perlebcdic, perlfreebsd, perlhurd, perlhpux, perlirix, perlmacos, perlmacosx, perlnetware, perlos2, perlos390, perlos400, perlplan9, perlqnx, perlsolaris, perltru64, perlunicode, perlvms, perlvos, perlwin32, and Win32.
Abigail <abigail@foad.org>, Charles Bailey <bailey@newman.upenn.edu>, Graham Barr <gbarr@pobox.com>, Tom Christiansen <tchrist@perl.com>, Nicholas Clark <nick@ccl4.org>, Thomas Dorner <Thomas.Dorner@start.de>, Andy Dougherty <doughera@lafayette.edu>, Dominic Dunlop <domo@computer.org>, Neale Ferguson <neale@vma.tabnsw.com.au>, David J. Fiander <davidf@mks.com>, Paul Green <Paul.Green@stratus.com>, M.J.T. Guy <mjtg@cam.ac.uk>, Jarkko Hietaniemi <jhi@iki.fi>, Luther Huffman <lutherh@stratcom.com>, Nick Ing-Simmons <nick@ing-simmons.net>, Andreas J. König <a.koenig@mind.de>, Markus Laker <mlaker@contax.co.uk>, Andrew M. Langmead <aml@world.std.com>, Larry Moore <ljmoore@freespace.net>, Paul Moore <Paul.Moore@uk.origin-it.com>, Chris Nandor <pudge@pobox.com>, Matthias Neeracher <neeracher@mac.com>, Philip Newton <pne@cpan.org>, Gary Ng <71564.1743@CompuServe.COM>, Tom Phoenix <rootbeer@teleport.com>, André Pirard <A.Pirard@ulg.ac.be>, Peter Prymmer <pvhp@forte.com>, Hugo van der Sanden <hv@crypt0.demon.co.uk>, Gurusamy Sarathy <gsar@activestate.com>, Paul J. Schinder <schinder@pobox.com>, Michael G Schwern <schwern@pobox.com>, Dan Sugalski <dan@sidhe.org>, Nathan Torkington <gnat@frii.com>, John Malmberg <wb8tyw@qsl.net>
perlpragma - how to write a user pragma
A pragma is a module which influences some aspect of the compile time or run
time behaviour of Perl, such as strict
or warnings
. With Perl 5.10 you
are no longer limited to the built in pragmata; you can now create user
pragmata that modify the behaviour of user functions within a lexical scope.
For example, say you need to create a class implementing overloaded
mathematical operators, and would like to provide your own pragma that
functions much like use integer;
You'd like this code
to give the output
- A: 4.6
- B: 4
- C: 4.6
- D: 4
- E: 4.6
i.e., where use myint;
is in effect, addition operations are forced
to integer, whereas by default they are not, with the default behaviour being
restored via no myint;
The minimal implementation of the package MyMaths
would be something like
this:
Note how we load the user pragma myint
with an empty list ()
to
prevent its import being called.
The interaction with the Perl compilation happens inside package myint
:
As pragmata are implemented as modules, like any other module, use myint;
becomes
- BEGIN {
- require myint;
- myint->import();
- }
and no myint;
is
- BEGIN {
- require myint;
- myint->unimport();
- }
Hence the import and unimport
routines are called at compile time
for the user's code.
User pragmata store their state by writing to the magical hash %^H
,
hence these two routines manipulate it. The state information in %^H
is
stored in the optree, and can be retrieved read-only at runtime with caller(),
at index 10 of the list of returned results. In the example pragma, retrieval
is encapsulated into the routine in_effect()
, which takes as parameter
the number of call frames to go up to find the value of the pragma in the
user's script. This uses caller() to determine the value of
$^H{"myint/in_effect"}
when each line of the user's script was called, and
therefore provides the correct semantics in the subroutine implementing the
overloaded addition.
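The pod's code samples did not survive this rendering; a minimal sketch of such a myint package (the overloaded-maths class is omitted) could read as follows, matching the mechanism just described:

```perl
package myint;

# Compile-time switches: "use myint" and "no myint" flip a key in the
# magical hint hash %^H, which is lexically scoped and recorded in
# the optree.
sub import   { $^H{"myint/in_effect"} = 1 }
sub unimport { $^H{"myint/in_effect"} = 0 }

# Runtime query: fetch the hint hash that was in effect at the
# caller's line, via element 10 of caller()'s return list.
sub in_effect {
    my $level    = shift // 0;
    my $hinthash = (caller($level))[10];
    return $hinthash->{"myint/in_effect"};
}

1;
```

An overloaded addition operator would then call myint::in_effect(1) to decide whether to truncate its result to an integer at the line where it was invoked.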
There is only a single %^H
, but arbitrarily many modules that want
to use its scoping semantics. To avoid stepping on each other's toes,
they need to be sure to use different keys in the hash. It is therefore
conventional for a module to use only keys that begin with the module's
name (the name of its main package) and a "/" character. After this
module-identifying prefix, the rest of the key is entirely up to the
module: it may include any characters whatsoever. For example, a module
Foo::Bar
should use keys such as Foo::Bar/baz
and Foo::Bar/$%/_!
.
Modules following this convention all play nicely with each other.
The Perl core uses a handful of keys in %^H
which do not follow this
convention, because they predate it. Keys that follow the convention
won't conflict with the core's historical keys.
The optree is shared between threads. This means there is a possibility that
the optree will outlive the particular thread (and therefore the interpreter
instance) that created it, so true Perl scalars cannot be stored in the
optree. Instead a compact form is used, which can only store values that are
integers (signed and unsigned), strings or undef - references and
floating point values are stringified. If you need to store multiple values
or complex structures, you should serialise them, for example with pack.
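For instance, a pair of integers can be flattened into a single string value with pack and recovered with unpack (the hint key shown in the comment is hypothetical, following the naming convention above):

```perl
# Serialise two integers into one byte string; a pragma could store
# this as $^H{"Foo::Bar/span"} at compile time.
my $packed = pack "NN", 3, 7;

my ($from, $to) = unpack "NN", $packed;
print "$from-$to\n";   # prints 3-7
```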
The deletion of a hash key from %^H
is recorded, and as ever can be
distinguished from the existence of a key with value undef with
exists.
Don't attempt to store references to data structures as integers which
are retrieved via caller and converted back, as this will not be threadsafe.
Accesses would be to the structure without locking (which is not safe for
Perl's scalars), and either the structure has to leak, or it has to be
freed when its creating thread terminates, which may be before the optree
referencing it is deleted, if other threads outlive it.
perlqnx - Perl version 5 on QNX
As of perl5.7.2 all tests pass under:
- QNX 4.24G
- Watcom 10.6 with Beta/970211.wcc.update.tar.F
- socket3r.lib Nov21 1996.
As of perl5.8.1 there is at least one test still failing.
Some tests may complain under known circumstances.
See below and hints/qnx.sh for more information.
Under QNX 6.2.0 there are still a few tests which fail. See below and hints/qnx.sh for more information.
As with many unix ports, this one depends on a few "standard" unix utilities which are not necessarily standard for QNX4.
This is used heavily by Configure and then by perl itself. QNX4's version is fine, but Configure will choke on the 16-bit version, so if you are running QNX 4.22, link /bin/sh to /bin32/ksh
This is the standard unix library builder. We use wlib. With Watcom 10.6, when wlib is linked as "ar", it behaves like ar and all is fine. Under 9.5, a cover is required. One is included in ../qnx
This is used (optionally) by configure to list the contents of libraries. I will generate a cover function on the fly in the UU directory.
Configure and perl need a way to invoke a C preprocessor. I have created a simple cover for cc which does the right thing. Without this, Configure will create its own wrapper which works, but it doesn't handle some of the command line arguments that perl will throw at it.
You really need GNU make to compile this. GNU make ships by default with QNX 4.23, but you can get it from quics for earlier versions.
There is no support for dynamically linked libraries in QNX4.
If you wish to compile with the Socket extension, you need to have the TCP/IP toolkit, and you need to make sure that -lsocket locates the correct copy of socket3r.lib. Beware that the Watcom compiler ships with a stub version of socket3r.lib which has very little functionality. Also beware the order in which wlink searches directories for libraries. You may have /usr/lib/socket3r.lib pointing to the correct library, but wlink may pick up /usr/watcom/10.6/usr/lib/socket3r.lib instead. Make sure they both point to the correct library, that is, /usr/tcptk/current/usr/lib/socket3r.lib.
The following tests may report errors under QNX4:
dist/Cwd/Cwd.t will complain if `pwd` and cwd don't give the same results. cwd calls `fullpath -t`, so if you cd `fullpath -t` before running the test, it will pass.
lib/File/Find/taint.t will complain if '.' is in your PATH. The PATH test is triggered because cwd calls `fullpath -t`.
ext/IO/lib/IO/t/io_sock.t: Subtests 14 and 22 are skipped due to the fact that the functionality to read back the non-blocking status of a socket is not implemented in QNX's TCP/IP. This has been reported to QNX and it may work with later versions of TCP/IP.
t/io/tell.t: Subtest 27 is failing. We are still investigating.
The files in the "qnx" directory are:
A script that emulates the standard unix archive (aka library) utility. Under Watcom 10.6, ar is linked to wlib and provides the expected interface. With Watcom 9.5, a cover function is required. This one is fairly crude but has proved adequate for compiling perl.
A script that provides C preprocessing functionality. Configure can generate a similar cover, but it doesn't handle all the command-line options that perl throws at it. This might be reasonably placed in /usr/local/bin.
The following tests are still failing for Perl 5.8.1 under QNX 6.2.0:
- op/sprintf.........................FAILED at test 91
- lib/Benchmark......................FAILED at test 26
This is due to a bug in the C library's printf routine. printf("'%e'", 0. ) produces '0.000000e+0', but ANSI requires '0.000000e+00'. QNX has acknowledged the bug.
Norton T. Allen (allen@huarp.harvard.edu)
perlre - Perl regular expressions
This page describes the syntax of regular expressions in Perl.
If you haven't used regular expressions before, a quick-start introduction is available in perlrequick, and a longer tutorial introduction is available in perlretut.
For reference on how regular expressions are used in matching
operations, plus various examples of the same, see discussions of
m//, s///, qr// and ??
in Regexp Quote-Like Operators in perlop.
Matching operations can have various modifiers. Modifiers that relate to the interpretation of the regular expression inside are listed below. Modifiers that alter the way a regular expression is used by Perl are detailed in Regexp Quote-Like Operators in perlop and Gory details of parsing quoted constructs in perlop.
Treat string as multiple lines. That is, change "^" and "$" from matching the start or end of line only at the left and right ends of the string to matching them anywhere within the string.
Treat string as single line. That is, change "." to match any character whatsoever, even a newline, which normally it would not match.
Used together, as /ms, they let the "." match any character whatsoever,
while still allowing "^" and "$" to match, respectively, just after
and just before newlines within the string.
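A minimal sketch of these modifiers in action (plain core Perl, no assumptions beyond the match operator):

```perl
my $str = "first line\nsecond line";

print "1\n" if $str =~ /^second/m;      # /m: "^" matches after the internal newline
print "2\n" if $str !~ /^second/;       # without /m, "^" matches only at the start
print "3\n" if $str =~ /line.second/s;  # /s: "." matches the newline
print "4\n" if $str !~ /line.second/;   # without /s, "." does not match "\n"
```

All four tests succeed, so the script prints 1 through 4.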
Do case-insensitive pattern matching.
If locale matching rules are in effect, the case map is taken from the current locale for code points less than 255, and from Unicode rules for larger code points. However, matches that would cross the Unicode rules/non-Unicode rules boundary (ords 255/256) will not succeed. See perllocale.
There are a number of Unicode characters that match multiple characters under /i. For example, LATIN SMALL LIGATURE FI should match the sequence fi. Perl is not currently able to do this when the multiple characters are in the pattern and are split between groupings, or when one or more are quantified. Thus
- "\N{LATIN SMALL LIGATURE FI}" =~ /fi/i; # Matches
- "\N{LATIN SMALL LIGATURE FI}" =~ /[fi][fi]/i; # Doesn't match!
- "\N{LATIN SMALL LIGATURE FI}" =~ /fi*/i; # Doesn't match!
- # The below doesn't match, and it isn't clear what $1 and $2 would
- # be even if it did!!
- "\N{LATIN SMALL LIGATURE FI}" =~ /(f)(i)/i; # Doesn't match!
Perl doesn't match multiple characters in a bracketed character class unless the character that maps to them is explicitly mentioned, and it doesn't match them at all if the character class is inverted, which otherwise could be highly confusing. See Bracketed Character Classes in perlrecharclass, and Negation in perlrecharclass.
Extend your pattern's legibility by permitting whitespace and comments. Details in /x below.
Preserve the string matched such that ${^PREMATCH}, ${^MATCH}, and ${^POSTMATCH} are available for use after matching.
Global matching, and keep the current position after failed matching. Unlike i, m, s and x, these two flags affect the way the regex is used rather than the regex itself. See Using regular expressions in Perl in perlretut for further explanation of the g and c modifiers.
These modifiers, all new in 5.14, affect which character-set semantics (Unicode, etc.) are used, as described below in Character set modifiers.
Regular expression modifiers are usually written in documentation as, e.g., "the /x modifier", even though the delimiter in question might not really be a slash. The modifiers /imsxadlup may also be embedded within the regular expression itself using the (?...) construct; see Extended Patterns below.
/x tells the regular expression parser to ignore most whitespace that is neither backslashed nor within a character class. You can use this to break up your regular expression into (slightly) more readable parts. The # character is also treated as a metacharacter introducing a comment, just as in ordinary Perl code. This also means that if you want real whitespace or # characters in the pattern (outside a character class, where they are unaffected by /x), then you'll either have to escape them (using backslashes or \Q...\E) or encode them using octal, hex, or \N{} escapes. Taken together, these features go a long way towards making Perl's regular expressions more readable. Note that you have to be careful not to include the pattern delimiter in the comment--perl has no way of knowing you did not intend to close the pattern early. See the C-comment deletion code in perlop. Also note that anything inside a \Q...\E stays unaffected by /x. And note that /x doesn't affect space interpretation within a single multi-character construct. For example, in \x{...}, regardless of the /x modifier, there can be no spaces. Same for a quantifier such as {3} or {5,}. Similarly, (?:...) can't have a space between the (, ?, and :. Within any delimiters for such a construct, allowed spaces are not affected by /x, and depend on the construct. For example, \x{...} can't have spaces because hexadecimal numbers don't have spaces in them. But Unicode properties can have spaces, so in \p{...} there can be spaces that follow the Unicode rules, for which see Properties accessible through \p{} and \P{} in perluniprops.
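As a sketch, here is the same pattern written compactly and again with /x; the phone-number pattern itself is just an illustration, not taken from this document:

```perl
# A compact pattern and an /x-annotated equivalent.
my $compact  = qr/\(\d{3}\)\s*\d{3}-\d{4}/;
my $readable = qr/
    \( \d{3} \)    # area code in parentheses
    \s*            # optional whitespace
    \d{3} - \d{4}  # exchange and line number
/x;

print "both match\n"
    if "(555) 867-5309" =~ $compact
    and "(555) 867-5309" =~ $readable;
```

Note that the spaces inside \d{3} could not be added: /x does not permit whitespace within a single construct such as a quantifier.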
/d, /u, /a, and /l, available starting in 5.14, are called the character set modifiers; they affect the character set semantics used for the regular expression.

The /d, /u, and /l modifiers are not likely to be of much use to you, and so you need not worry about them very much. They exist for Perl's internal use, so that complex regular expression data structures can be automatically serialized and later exactly reconstituted, including all their nuances. But, since Perl can't keep a secret, and there may be rare instances where they are useful, they are documented here.
The /a modifier, on the other hand, may be useful. Its purpose is to allow code that is to work mostly on ASCII data to not have to concern itself with Unicode.

Briefly, /l sets the character set to that of whatever Locale is in effect at the time of the execution of the pattern match.

/u sets the character set to Unicode.

/a also sets the character set to Unicode, BUT adds several restrictions for ASCII-safe matching.

/d is the old, problematic, pre-5.14 Default character set behavior. Its only use is to force that old behavior.
At any given time, exactly one of these modifiers is in effect. Their existence allows Perl to keep the originally compiled behavior of a regular expression, regardless of what rules are in effect when it is actually executed. And if it is interpolated into a larger regex, the original's rules continue to apply to it, and only it.
The /l and /u modifiers are automatically selected for regular expressions compiled within the scope of various pragmas, and we recommend that in general, you use those pragmas instead of specifying these modifiers explicitly. For one thing, the modifiers affect only pattern matching, and do not extend to even any replacement done, whereas using the pragmas gives consistent results for all appropriate operations within their scopes. For example,
- s/foo/\Ubar/il
will match "foo" using the locale's rules for case-insensitive matching, but the /l does not affect how the \U operates. Most likely you want both of them to use locale rules. To do this, instead compile the regular expression within the scope of use locale. This both implicitly adds the /l and applies locale rules to the \U. The lesson is to use locale and not /l explicitly.
Similarly, it would be better to use use feature 'unicode_strings' instead of
- s/foo/\Lbar/iu
to get Unicode rules, as the \L in the former (but not necessarily the latter) would also use Unicode rules.
More detail on each of the modifiers follows. Most likely you don't need to know this detail for /l, /u, and /d, and can skip ahead to /a.
means to use the current locale's rules (see perllocale) when pattern matching. For example, \w will match the "word" characters of that locale, and /i case-insensitive matching will match according to the locale's case folding rules. The locale used will be the one in effect at the time of execution of the pattern match. This may not be the same as the compilation-time locale, and can differ from one match to another if there is an intervening call of the setlocale() function.
Perl only supports single-byte locales. This means that code points
above 255 are treated as Unicode no matter what locale is in effect.
Under Unicode rules, there are a few case-insensitive matches that cross the 255/256 boundary. These are disallowed under /l. For example, 0xFF (on ASCII platforms) does not caselessly match the character at 0x178, LATIN CAPITAL LETTER Y WITH DIAERESIS, because 0xFF may not be LATIN SMALL LETTER Y WITH DIAERESIS in the current locale, and Perl has no way of knowing if that character even exists in the locale, much less what code point it is.
This modifier may be specified to be the default by use locale, but see Which character set modifier is in effect?.
means to use Unicode rules when pattern matching. On ASCII platforms, this means that the code points between 128 and 255 take on their Latin-1 (ISO-8859-1) meanings (which are the same as Unicode's). (Otherwise Perl considers their meanings to be undefined.) Thus, under this modifier, the ASCII platform effectively becomes a Unicode platform; and hence, for example, \w will match any of the more than 100_000 word characters in Unicode.
Unlike most locales, which are specific to a language and country pair, Unicode classifies all the characters that are letters somewhere in the world as \w. For example, your locale might not think that LATIN SMALL LETTER ETH is a letter (unless you happen to speak Icelandic), but Unicode does. Similarly, all the characters that are decimal digits somewhere in the world will match \d; this is hundreds, not 10, possible matches. And some of those digits look like some of the 10 ASCII digits, but mean a different number, so a human could easily think a number is a different quantity than it really is. For example, BENGALI DIGIT FOUR (U+09EA) looks very much like an ASCII DIGIT EIGHT (U+0038). And \d+ may match strings of digits that are a mixture from different writing systems, creating a security issue. num() in Unicode::UCD can be used to sort this out. Or the /a modifier can be used to force \d to match just the ASCII 0 through 9.
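A small sketch of this, assuming a perl of v5.14 or later (for the /u and /a suffixes and Unicode::UCD's num()):

```perl
use Unicode::UCD 'num';

my $bengali_four = "\x{09EA}";   # BENGALI DIGIT FOUR

print "Unicode digit\n" if $bengali_four =~ /\d/u;   # matches under /u
print "not ASCII\n"     if $bengali_four !~ /\d/a;   # but not under /a

# num() gives the numeric value of a digit string, and returns
# undef when the digits come from different writing systems.
print num($bengali_four), "\n";   # 4
print defined num("4\x{09EA}") ? "mixed allowed\n" : "mixed rejected\n";
```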
Also, under this modifier, case-insensitive matching works on the full
set of Unicode
characters. The KELVIN SIGN
, for example matches the letters "k" and
"K"; and LATIN SMALL LIGATURE FF
matches the sequence "ff", which,
if you're not prepared, might make it look like a hexadecimal constant,
presenting another potential security issue. See
http://unicode.org/reports/tr36 for a detailed discussion of Unicode
security issues.
This modifier may be specified to be the default by use feature 'unicode_strings', use locale ':not_characters', or use VERSION (or higher), but see Which character set modifier is in effect?.
This modifier means to use the "Default" native rules of the platform except when there is cause to use Unicode rules instead, as follows:
the target string is encoded in UTF-8; or
the pattern is encoded in UTF-8; or
the pattern explicitly mentions a code point that is above 255 (say by \x{100}); or
the pattern uses a Unicode name (\N{...}); or
the pattern uses a Unicode property (\p{...}); or
the pattern uses (?[ ])
Another mnemonic for this modifier is "Depends", as the rules actually used depend on various things, and as a result you can get unexpected results. See The Unicode Bug in perlunicode. The Unicode Bug has become rather infamous, leading to yet another (printable) name for this modifier, "Dodgy".
Unless the pattern or string is encoded in UTF-8, only ASCII characters can match positively.
Here are some examples of how that works on an ASCII platform:
- $str = "\xDF"; # $str is not in UTF-8 format.
- $str =~ /^\w/; # No match, as $str isn't in UTF-8 format.
- $str .= "\x{0e0b}"; # Now $str is in UTF-8 format.
- $str =~ /^\w/; # Match! $str is now in UTF-8 format.
- chop $str;
- $str =~ /^\w/; # Still a match! $str remains in UTF-8 format.
This modifier is automatically selected by default when none of the others are, so yet another name for it is "Default".
Because of the unexpected behaviors associated with this modifier, you probably should only use it to maintain weird backward compatibilities.
This modifier stands for ASCII-restrict (or ASCII-safe). This modifier, unlike the others, may be doubled-up to increase its effect.
When it appears singly, it causes the sequences \d, \s, \w, and the Posix character classes to match only in the ASCII range. They thus revert to their pre-5.6, pre-Unicode meanings. Under /a, \d always means precisely the digits "0" to "9"; \s means the five characters [ \f\n\r\t], and starting in Perl v5.18, experimentally, the vertical tab; \w means the 63 characters [A-Za-z0-9_]; and likewise, all the Posix classes such as [[:print:]] match only the appropriate ASCII-range characters.
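For example (THAI DIGIT TWO chosen here only as an arbitrary non-ASCII digit):

```perl
my $thai_digit = "\x{0E52}";   # THAI DIGIT TWO

print "Unicode digit\n" if $thai_digit =~ /^\d$/u;   # a digit under /u
print "not ASCII\n"     if $thai_digit !~ /\d/a;     # not a digit under /a

# Under /a, \w is exactly the 63 characters [A-Za-z0-9_]:
my $count = grep { chr($_) =~ /\w/a } 0 .. 127;
print "$count\n";   # 63
```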
This modifier is useful for people who only incidentally use Unicode, and who do not wish to be burdened with its complexities and security concerns.
With /a, one can write \d with confidence that it will only match ASCII characters, and should the need arise to match beyond ASCII, you can instead use \p{Digit} (or \p{Word} for \w). There are similar \p{...} constructs that can match beyond ASCII both white space (see Whitespace in perlrecharclass) and Posix classes (see POSIX Character Classes in perlrecharclass). Thus, this modifier doesn't mean you can't use Unicode, it means that to get Unicode matching you must explicitly use a construct (\p{}, \P{}) that signals Unicode.
As you would expect, this modifier causes, for example, \D to mean the same thing as [^0-9]; in fact, all non-ASCII characters match \D, \S, and \W. \b still means to match at the boundary between \w and \W, using the /a definitions of them (similarly for \B).
Otherwise, /a behaves like the /u modifier, in that case-insensitive matching uses Unicode semantics; for example, "k" will match the Unicode \N{KELVIN SIGN} under /i matching, and code points in the Latin-1 range above ASCII will have Unicode rules when it comes to case-insensitive matching.
To forbid ASCII/non-ASCII matches (like "k" with \N{KELVIN SIGN}), specify the "a" twice, for example /aai or /aia. (The first occurrence of "a" restricts the \d, etc., and the second occurrence adds the /i restrictions.) But, note that code points outside the ASCII range will use Unicode rules for /i matching, so the modifier doesn't really restrict things to just ASCII; it just forbids the intermixing of ASCII and non-ASCII.
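A sketch of the single versus doubled "a" (the Cyrillic pair is an arbitrary example of a fold that lies entirely outside ASCII):

```perl
my $kelvin = "\x{212A}";   # KELVIN SIGN

print "ok 1\n" if $kelvin =~ /k/ai;    # single /a: "k" still folds to KELVIN SIGN
print "ok 2\n" if $kelvin !~ /k/aai;   # doubled: ASCII/non-ASCII folds are forbidden

# Folds entirely outside ASCII remain allowed under /aa:
print "ok 3\n" if "\x{0410}" =~ /\x{0430}/aai;   # CYRILLIC CAPITAL/SMALL A
```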
To summarize, this modifier provides protection for applications that don't wish to be exposed to all of Unicode. Specifying it twice gives added protection.
This modifier may be specified to be the default by use re '/a' or use re '/aa'. If you do so, you may actually have occasion to use the /u modifier explicitly if there are a few regular expressions where you do want full Unicode rules (but even here, it's best if everything were under feature "unicode_strings", along with the use re '/aa'). Also see Which character set modifier is in effect?.
Which of these modifiers is in effect at any given point in a regular expression depends on a fairly complex set of interactions. These have been designed so that in general you don't have to worry about it, but this section gives the gory details. As explained below in Extended Patterns it is possible to explicitly specify modifiers that apply only to portions of a regular expression. The innermost always has priority over any outer ones, and one applying to the whole expression has priority over any of the default settings that are described in the remainder of this section.
The use re '/foo' pragma can be used to set
default modifiers (including these) for regular expressions compiled
within its scope. This pragma has precedence over the other pragmas
listed below that also change the defaults.
Otherwise, use locale sets the default modifier to /l; and use feature 'unicode_strings', or use VERSION (or higher), sets the default to /u when not in the same scope as either use locale or use bytes. (use locale ':not_characters' also sets the default to /u, overriding any plain use locale.)
Unlike the mechanisms mentioned above, these affect operations besides regular expression pattern matching, and so give more consistent results with other operators, including using \U, \l, etc. in substitution replacements.
If none of the above apply, for backwards compatibility reasons, the
/d modifier is the one in effect by default. As this can lead to
unexpected results, it is best to specify which other rule set should be
used.
Prior to 5.14, there were no explicit modifiers, but /l was implied for regexes compiled within the scope of use locale, and /d was implied otherwise. However, interpolating a regex into a larger regex would ignore the original compilation in favor of whatever was in effect at the time of the second compilation. There were a number of inconsistencies (bugs) with the /d modifier, where Unicode rules would be used when inappropriate, and vice versa. \p{} did not imply Unicode rules, and neither did all occurrences of \N{}, until 5.12.
The patterns used in Perl pattern matching evolved from those supplied in the Version 8 regex routines. (The routines are derived (distantly) from Henry Spencer's freely redistributable reimplementation of the V8 routines.) See Version 8 Regular Expressions for details.
In particular the following metacharacters have their standard egrep-ish meanings:
- \ Quote the next metacharacter
- ^ Match the beginning of the line
- . Match any character (except newline)
- $ Match the end of the line (or before newline at the end)
- | Alternation
- () Grouping
- [] Bracketed Character class
By default, the "^" character is guaranteed to match only the
beginning of the string, the "$" character only the end (or before the
newline at the end), and Perl does certain optimizations with the
assumption that the string contains only one line. Embedded newlines
will not be matched by "^" or "$". You may, however, wish to treat a
string as a multi-line buffer, such that the "^" will match after any
newline within the string (except if the newline is the last character in
the string), and "$" will match before any newline. At the
cost of a little more overhead, you can do this by using the /m modifier
on the pattern match operator. (Older programs did this by setting $*, but this option was removed in perl 5.10.)
To simplify multi-line substitutions, the "." character never matches a
newline unless you use the /s modifier, which in effect tells Perl to pretend
the string is a single line--even if it isn't.
The following standard quantifiers are recognized:
(If a curly bracket occurs in any other context and does not form part of a backslashed sequence like \x{...}, it is treated as a regular character. In particular, the lower quantifier bound is not optional, and a typo in a quantifier silently causes it to be treated as the literal characters. For example,
- /o{4,3}/
looks like a quantifier that matches 0 times, since 4 is greater than 3, but it really means to match the sequence of six characters "o { 4 , 3 }". It is planned to eventually require literal uses of curly brackets to be escaped, say by preceding them with a backslash or enclosing them within square brackets ("\{" or "[{]"). This change will allow for future syntax extensions (like making the lower bound of a quantifier optional), and better error checking. In the meantime, you should get in the habit of escaping all instances where you mean a literal "{".)
The "*" quantifier is equivalent to {0,}, the "+" quantifier to {1,}, and the "?" quantifier to {0,1}. n and m are limited to non-negative integral values less than a preset limit defined when perl is built. This is usually 32766 on the most common platforms. The actual limit can be seen in the error message generated by code such as this:
- $_ **= $_ , / {$_} / for 2 .. 42;
By default, a quantified subpattern is "greedy", that is, it will match as many times as possible (given a particular starting location) while still allowing the rest of the pattern to match. If you want it to match the minimum number of times possible, follow the quantifier with a "?". Note that the meanings don't change, just the "greediness":
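A quick illustration of the difference (the HTML-ish string is just a convenient example):

```perl
my $html = "<b>bold</b> and <i>italic</i>";

my ($greedy) = $html =~ /<(.*)>/;    # ".*" grabs as much as it can
my ($lazy)   = $html =~ /<(.*?)>/;   # ".*?" grabs as little as it can

print "$greedy\n";   # b>bold</b> and <i>italic</i
print "$lazy\n";     # b
```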
By default, when a quantified subpattern does not allow the rest of the overall pattern to match, Perl will backtrack. However, this behaviour is sometimes undesirable. Thus Perl provides the "possessive" quantifier form as well.
- *+ Match 0 or more times and give nothing back
- ++ Match 1 or more times and give nothing back
- ?+ Match 0 or 1 time and give nothing back
- {n}+ Match exactly n times and give nothing back (redundant)
- {n,}+ Match at least n times and give nothing back
- {n,m}+ Match at least n but not more than m times and give nothing back
For instance,
- 'aaaa' =~ /a++a/
will never match, as the a++ will gobble up all the a's in the string and won't leave any for the remaining part of the pattern. This feature can be extremely useful to give perl hints about where it shouldn't backtrack. For instance, the typical "match a double-quoted string" problem can be most efficiently performed when written as:
- /"(?:[^"\\]++|\\.)*+"/
as we know that if the final quote does not match, backtracking will not help. See the independent subexpression (?>pattern) for more details; possessive quantifiers are just syntactic sugar for that construct. For instance the above example could also be written as follows:
- /"(?>(?:(?>[^"\\]+)|\\.)*)"/
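A runnable sketch of both points, using the patterns above:

```perl
print "backtracks\n" if 'aaaa' =~ /a+a/;    # a+ gives one "a" back to the final "a"
print "possessive\n" if 'aaaa' !~ /a++a/;   # a++ keeps them all, so no match

# The possessive double-quoted-string pattern from the text:
my $input = q{say "quoted \" text" here};
print "matched\n" if $input =~ /"(?:[^"\\]++|\\.)*+"/;
```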
Because patterns are processed as double-quoted strings, the following also work:
- \t tab (HT, TAB)
- \n newline (LF, NL)
- \r return (CR)
- \f form feed (FF)
- \a alarm (bell) (BEL)
- \e escape (think troff) (ESC)
- \cK control char (example: VT)
- \x{}, \x00 character whose ordinal is the given hexadecimal number
- \N{name} named Unicode character or character sequence
- \N{U+263D} Unicode character (example: FIRST QUARTER MOON)
- \o{}, \000 character whose ordinal is the given octal number
- \l lowercase next char (think vi)
- \u uppercase next char (think vi)
- \L lowercase till \E (think vi)
- \U uppercase till \E (think vi)
- \Q quote (disable) pattern metacharacters till \E
- \E end either case modification or quoted section, think vi
Details are in Quote and Quote-like Operators in perlop.
In addition, Perl defines the following:
- Sequence Note Description
- [...] [1] Match a character according to the rules of the
- bracketed character class defined by the "...".
- Example: [a-z] matches "a" or "b" or "c" ... or "z"
- [[:...:]] [2] Match a character according to the rules of the POSIX
- character class "..." within the outer bracketed
- character class. Example: [[:upper:]] matches any
- uppercase character.
- (?[...]) [8] Extended bracketed character class
- \w [3] Match a "word" character (alphanumeric plus "_", plus
- other connector punctuation chars plus Unicode
- marks)
- \W [3] Match a non-"word" character
- \s [3] Match a whitespace character
- \S [3] Match a non-whitespace character
- \d [3] Match a decimal digit character
- \D [3] Match a non-digit character
- \pP [3] Match P, named property. Use \p{Prop} for longer names
- \PP [3] Match non-P
- \X [4] Match Unicode "eXtended grapheme cluster"
- \C Match a single C-language char (octet) even if that is
- part of a larger UTF-8 character. Thus it breaks up
- characters into their UTF-8 bytes, so you may end up
- with malformed pieces of UTF-8. Unsupported in
- lookbehind.
- \1 [5] Backreference to a specific capture group or buffer.
- '1' may actually be any positive integer.
- \g1 [5] Backreference to a specific or previous group,
- \g{-1} [5] The number may be negative indicating a relative
- previous group and may optionally be wrapped in
- curly brackets for safer parsing.
- \g{name} [5] Named backreference
- \k<name> [5] Named backreference
- \K [6] Keep the stuff left of the \K, don't include it in $&
- \N [7] Any character but \n. Not affected by /s modifier
- \v [3] Vertical whitespace
- \V [3] Not vertical whitespace
- \h [3] Horizontal whitespace
- \H [3] Not horizontal whitespace
- \R [4] Linebreak
See Bracketed Character Classes in perlrecharclass for details.
See POSIX Character Classes in perlrecharclass for details.
See Backslash sequences in perlrecharclass for details.
See Misc in perlrebackslash for details.
See Capture groups below for details.
See Extended Patterns below for details.
Note that \N has two meanings. When of the form \N{NAME}, it matches the character or character sequence whose name is NAME; and similarly when of the form \N{U+hex}, it matches the character whose Unicode code point is hex. Otherwise it matches any character but \n.
See Extended Bracketed Character Classes in perlrecharclass for details.
Perl defines the following zero-width assertions:
- \b Match a word boundary
- \B Match except at a word boundary
- \A Match only at beginning of string
- \Z Match only at end of string, or before newline at the end
- \z Match only at end of string
- \G Match only at pos() (e.g. at the end-of-match position
- of prior m//g)
A word boundary (\b) is a spot between two characters that has a \w on one side of it and a \W on the other side of it (in either order), counting the imaginary characters off the beginning and end of the string as matching a \W. (Within character classes \b represents backspace rather than a word boundary, just as it normally does in any double-quoted string.)
The \A and \Z are just like "^" and "$", except that they won't match multiple times when the /m modifier is used, while "^" and "$" will match at every internal line boundary. To match the actual end of the string and not ignore an optional trailing newline, use \z.
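A short demonstration of the distinctions:

```perl
my $str = "one\ntwo\n";

print "ok 1\n" if $str =~ /^two/m;      # "^" matches after the internal newline
print "ok 2\n" if $str !~ /\Atwo/m;     # \A does not, even under /m
print "ok 3\n" if $str =~ /two\Z/;      # \Z permits the trailing newline
print "ok 4\n" if $str !~ /two\z/;      # \z demands the true end of string
print "ok 5\n" if $str =~ /two\n\z/;
```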
The \G assertion can be used to chain global matches (using m//g), as described in Regexp Quote-Like Operators in perlop. It is also useful when writing lex-like scanners, when you have several patterns that you want to match against consequent substrings of your string; see the previous reference. The actual location where \G will match can also be influenced by using pos() as an lvalue: see pos. Note that the rule for zero-length matches (see Repeated Patterns Matching a Zero-length Substring) is modified somewhat, in that contents to the left of \G are not counted when determining the length of the match. Thus the following will not match forever:
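The code example that belongs here did not survive extraction; a minimal reconstruction consistent with the explanation that follows (a pattern with content to the left of \G, matched against 'ABC' with pos set to 1) might look like:

```perl
my $str = 'ABC';
pos($str) = 1;           # start the \G anchor at offset 1
while ($str =~ /.\G/g) { # "." matches the char before the \G position
    print $&;
}
print "\n";
```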
It will print 'A' and then terminate, as it considers the match to be zero-width, and thus will not match at the same position twice in a row.
It is worth noting that \G improperly used can result in an infinite loop. Take care when using patterns that include \G in an alternation.
The bracketing construct ( ... ) creates capture groups (also referred to as capture buffers). To refer to the current contents of a group later on, within the same pattern, use \g1 (or \g{1}) for the first, \g2 (or \g{2}) for the second, and so on. This is called a backreference.

There is no limit to the number of captured substrings that you may use. Groups are numbered with the leftmost open parenthesis being number 1, etc. If a group did not match, the associated backreference won't match either. (This can happen if the group is optional, or in a different branch of an alternation.)

You can omit the "g", and write "\1", etc, but there are some issues with this form, described below.
You can also refer to capture groups relatively, by using a negative number, so that \g-1 and \g{-1} both refer to the immediately preceding capture group, and \g-2 and \g{-2} both refer to the group before it. For example:
- /
- (Y) # group 1
- ( # group 2
- (X) # group 3
- \g{-1} # backref to group 3
- \g{-3} # backref to group 1
- )
- /x
would match the same as /(Y) ( (X) \g3 \g1 )/x. This allows you to interpolate regexes into larger regexes and not have to worry about the capture groups being renumbered.
You can dispense with numbers altogether and create named capture groups. The notation is (?<name>...) to declare and \g{name} to reference. (To be compatible with .NET regular expressions, \g{name} may also be written as \k{name}, \k<name> or \k'name'.) name must not begin with a number, nor contain hyphens. When different groups within the same pattern have the same name, any reference to that name assumes the leftmost defined group. Named groups count in absolute and relative numbering, and so can also be referred to by those numbers. (It's possible to do things with named capture groups that would otherwise require (??{}).)
Capture group contents are dynamically scoped and available to you outside the pattern until the end of the enclosing block or until the next successful match, whichever comes first. (See Compound Statements in perlsyn.) You can refer to them by absolute number (using "$1" instead of "\g1", etc); or by name via the %+ hash, using "$+{name}".

Braces are required in referring to named capture groups, but are optional for absolute or relative numbered ones. Braces are safer when creating a regex by concatenating smaller strings. For example if you have qr/$a$b/, and $a contained "\g1", and $b contained "37", you would get /\g137/ which is probably not what you intended.
The \g and \k notations were introduced in Perl 5.10.0. Prior to that there were no named nor relative numbered capture groups. Absolute numbered groups were referred to using \1, \2, etc., and this notation is still accepted (and likely always will be). But it leads to some ambiguities if there are more than 9 capture groups, as \10 could mean either the tenth capture group, or the character whose ordinal in octal is 010 (a backspace in ASCII). Perl resolves this ambiguity by interpreting \10 as a backreference only if at least 10 left parentheses have opened before it. Likewise \11 is a backreference only if at least 11 left parentheses have opened before it. And so on. \1 through \9 are always interpreted as backreferences. There are several examples below that illustrate these perils. You can avoid the ambiguity by always using \g{} or \g if you mean capturing groups; and for octal constants always using \o{}, or for \077 and below, using 3 digits padded with leading zeros, since a leading zero implies an octal constant.
The \digit notation also works in certain circumstances outside
the pattern. See Warning on \1 Instead of $1 below for details.
Examples:
- s/^([^ ]*) *([^ ]*)/$2 $1/; # swap first two words
- /(.)\g1/ # find first doubled char
- and print "'$1' is the first doubled character\n";
- /(?<char>.)\k<char>/ # ... a different way
- and print "'$+{char}' is the first doubled character\n";
- /(?'char'.)\g1/ # ... mix and match
- and print "'$1' is the first doubled character\n";
- if (/Time: (..):(..):(..)/) { # parse out values
- $hours = $1;
- $minutes = $2;
- $seconds = $3;
- }
- /(.)(.)(.)(.)(.)(.)(.)(.)(.)\g10/ # \g10 is a backreference
- /(.)(.)(.)(.)(.)(.)(.)(.)(.)\10/ # \10 is octal
- /((.)(.)(.)(.)(.)(.)(.)(.)(.))\10/ # \10 is a backreference
- /((.)(.)(.)(.)(.)(.)(.)(.)(.))\010/ # \010 is octal
- $a = '(.)\1'; # Creates problems when concatenated.
- $b = '(.)\g{1}'; # Avoids the problems.
- "aa" =~ /${a}/; # True
- "aa" =~ /${b}/; # True
- "aa0" =~ /${a}0/; # False!
- "aa0" =~ /${b}0/; # True
- "aa\x08" =~ /${a}0/; # True!
- "aa\x08" =~ /${b}0/; # False
Several special variables also refer back to portions of the previous match. $+ returns whatever the last bracket match matched. $& returns the entire matched string. (At one point $0 did also, but now it returns the name of the program.) $` returns everything before the matched string. $' returns everything after the matched string. And $^N contains whatever was matched by the most-recently closed group (submatch). $^N can be used in extended patterns (see below), for example to assign a submatch to a variable.
These special variables, like the %+ hash and the numbered match variables ($1, $2, $3, etc.) are dynamically scoped until the end of the enclosing block or until the next successful match, whichever comes first. (See Compound Statements in perlsyn.)
NOTE: Failed matches in Perl do not reset the match variables, which makes it easier to write code that tests for a series of more specific cases and remembers the best match.
WARNING: Once Perl sees that you need one of $&, $`, or $' anywhere in the program, it has to provide them for every pattern match. This may substantially slow your program. Perl uses the same mechanism to produce $1, $2, etc, so you also pay a price for each pattern that contains capturing parentheses. (To avoid this cost while retaining the grouping behaviour, use the extended regular expression (?: ... ) instead.) But if you never use $&, $` or $', then patterns without capturing parentheses will not be penalized. So avoid $&, $', and $` if you can, but if you can't (and some algorithms really appreciate them), once you've used them once, use them at will, because you've already paid the price. As of 5.17.4, the presence of each of the three variables in a program is recorded separately, and depending on circumstances, perl may be able to be more efficient knowing that only $& rather than all three have been seen, for example.
As a workaround for this problem, Perl 5.10.0 introduces ${^PREMATCH}, ${^MATCH} and ${^POSTMATCH}, which are equivalent to $`, $& and $', except that they are only guaranteed to be defined after a successful match that was executed with the /p (preserve) modifier. The use of these variables incurs no global performance penalty, unlike their punctuation-character equivalents; the trade-off is that you have to tell perl when you want to use them.
Backslashed metacharacters in Perl are alphanumeric, such as \b, \w, \n. Unlike some other regular expression languages, there are no backslashed symbols that aren't alphanumeric. So anything that looks like \\, \(, \), \[, \], \{, or \} is always interpreted as a literal character, not a metacharacter. This was once used in a common idiom to disable or quote the special meanings of regular expression metacharacters in a string that you want to use for a pattern. Simply quote all non-"word" characters:
- $pattern =~ s/(\W)/\\$1/g;
(If use locale is set, then this depends on the current locale.) Today it is more common to use the quotemeta() function or the \Q metaquoting escape sequence to disable all metacharacters' special meanings like this:
- /$unquoted\Q$quoted\E$unquoted/
Beware that if you put literal backslashes (those not inside interpolated variables) between \Q and \E, double-quotish backslash interpolation may lead to confusing results. If you need to use literal backslashes within \Q...\E, consult Gory details of parsing quoted constructs in perlop. quotemeta() and \Q are fully described in quotemeta.
Perl also defines a consistent extension syntax for features not found in standard tools like awk and lex. The syntax for most of these is a pair of parentheses with a question mark as the first thing within the parentheses. The character after the question mark indicates the extension.
The stability of these extensions varies widely. Some have been part of the core language for many years. Others are experimental and may change without warning or be completely removed. Check the documentation on an individual feature to verify its current status.
A question mark was chosen for this and for the minimal-matching construct because 1) question marks are rare in older regular expressions, and 2) whenever you see one, you should stop and "question" exactly what is going on. That's psychology....
(?#text)
A comment. The text is ignored. If the /x modifier enables whitespace formatting, a simple # will suffice. Note that Perl closes the comment as soon as it sees a ), so there is no way to put a literal ) in the comment.
(?adlupimsx-imsx)
(?^alupimsx)
One or more embedded pattern-match modifiers, to be turned on (or turned off, if preceded by -) for the remainder of the pattern or the remainder of the enclosing pattern group (if any).
This is particularly useful for dynamic patterns, such as those read in from a configuration file, taken from an argument, or specified in a table somewhere. Consider the case where some patterns want to be case-sensitive and some do not: The case-insensitive ones merely need to include (?i) at the front of the pattern. For example:
- $pattern = "foobar";
- if ( /$pattern/i ) { }
- # more flexible:
- $pattern = "(?i)foobar";
- if ( /$pattern/ ) { }
These modifiers are restored at the end of the enclosing group. For example,
- ( (?i) blah ) \s+ \g1
will match blah in any case, some spaces, and an exact (including the case!) repetition of the previous word, assuming the /x modifier, and no /i modifier outside this group.
These modifiers do not carry over into named subpatterns called in the enclosing group. In other words, a pattern such as ((?i)(?&NAME)) does not change the case-sensitivity of the "NAME" pattern.
Any of these modifiers can be set to apply globally to all regular expressions compiled within the scope of a use re. See '/flags' mode in re.
Starting in Perl 5.14, a "^" (caret or circumflex accent) immediately after the "?" is a shorthand equivalent to d-imsx. Flags (except "d") may follow the caret to override it. But a minus sign is not legal with it.
Note that the a, d, l, p, and u modifiers are special in that they can only be enabled, not disabled, and the a, d, l, and u modifiers are mutually exclusive: specifying one de-specifies the others, and a maximum of one (or two a's) may appear in the construct. Thus, for example, (?-p) will warn when compiled under use warnings; (?-d:...) and (?dl:...) are fatal errors.
Note also that the p modifier is special in that its presence anywhere in a pattern has a global effect.
(?:pattern)
(?adluimsx-imsx:pattern)
(?^aluimsx:pattern)
This is for clustering, not capturing; it groups subexpressions like "()", but doesn't make backreferences as "()" does. So
- @fields = split(/\b(?:a|b|c)\b/)
is like
- @fields = split(/\b(a|b|c)\b/)
but doesn't spit out extra fields. It's also cheaper not to capture characters if you don't need to.
Any letters between ? and : act as flags modifiers as with (?adluimsx-imsx). For example,
- /(?s-i:more.*than).*million/i
is equivalent to the more verbose
- /(?:(?s-i)more.*than).*million/i
Starting in Perl 5.14, a "^" (caret or circumflex accent) immediately after the "?" is a shorthand equivalent to d-imsx. Any positive flags (except "d") may follow the caret, so
- (?^x:foo)
is equivalent to
- (?x-ims:foo)
The caret tells Perl that this cluster doesn't inherit the flags of any surrounding pattern, but uses the system defaults (d-imsx), modified by any flags specified.
The caret allows for simpler stringification of compiled regular expressions. These look like
- (?^:pattern)
with any non-default flags appearing between the caret and the colon. A test that looks at such stringification thus doesn't need to have the system default flags hard-coded in it, just the caret. If new flags are added to Perl, the meaning of the caret's expansion will change to include the default for those flags, so the test will still work, unchanged.
Specifying a negative flag after the caret is an error, as the flag is redundant.
Mnemonic for (?^...): a fresh beginning, since the usual use of a caret is to match at the beginning.
(?|pattern)
This is the "branch reset" pattern, which has the special property that the capture groups are numbered from the same starting point in each alternation branch. It is available starting from perl 5.10.0.
Capture groups are numbered from left to right, but inside this construct the numbering is restarted for each branch.
The numbering within each branch will be as normal, and any groups following this construct will be numbered as though the construct contained only one branch, that being the one with the most capture groups in it.
This construct is useful when you want to capture one of a number of alternative matches.
Consider the following pattern. The numbers underneath show in which group the captured content will be stored.
- # before ---------------branch-reset----------- after
- / ( a ) (?| x ( y ) z | (p (q) r) | (t) u (v) ) ( z ) /x
- # 1 2 2 3 2 3 4
Be careful when using the branch reset pattern in combination with named captures. Named captures are implemented as being aliases to numbered groups holding the captures, and that interferes with the implementation of the branch reset pattern. If you are using named captures in a branch reset pattern, it's best to use the same names, in the same order, in each of the alternations:
- /(?| (?<a> x ) (?<b> y )
- | (?<a> z ) (?<b> w )) /x
Not doing so may lead to surprises:
The problem here is that both the group named a and the group named b are aliases for the group belonging to $1.
Look-around assertions are zero-width patterns which match a specific pattern without including it in $&. Positive assertions match when their subpattern matches, negative assertions match when their subpattern fails. Look-behind matches text up to the current match position, look-ahead matches text following the current match position.
(?=pattern)
A zero-width positive look-ahead assertion. For example, /\w+(?=\t)/ matches a word followed by a tab, without including the tab in $&.
(?!pattern)
A zero-width negative look-ahead assertion. For example /foo(?!bar)/
matches any occurrence of "foo" that isn't followed by "bar". Note
however that look-ahead and look-behind are NOT the same thing. You cannot
use this for look-behind.
If you are looking for a "bar" that isn't preceded by a "foo", /(?!foo)bar/
will not do what you want. That's because the (?!foo) is just saying that
the next thing cannot be "foo"--and it's not, it's a "bar", so "foobar" will
match. Use look-behind instead (see below).
(?<=pattern) \K
A zero-width positive look-behind assertion. For example, /(?<=\t)\w+/ matches a word that follows a tab, without including the tab in $&. Works only for fixed-width look-behind.
There is a special form of this construct, called \K, which causes the regex engine to "keep" everything it had matched prior to the \K and not include it in $&. This effectively provides variable-length look-behind. The use of \K inside of another look-around assertion is allowed, but the behaviour is currently not well defined.
For various reasons \K may be significantly more efficient than the equivalent (?<=...) construct, and it is especially useful in situations where you want to efficiently remove something following something else in a string. For instance
- s/(foo)bar/$1/g;
can be rewritten as the much more efficient
- s/foo\Kbar//g;
(?<!pattern)
A zero-width negative look-behind assertion. For example /(?<!bar)foo/ matches any occurrence of "foo" that does not follow "bar". Works only for fixed-width look-behind.
(?'NAME'pattern)
(?<NAME>pattern)
A named capture group. Identical in every respect to normal capturing parentheses () but for the additional fact that the group can be referred to by name in various regular expression constructs (like \g{NAME}) and can be accessed by name after a successful match via %+ or %-. See perlvar for more details on the %+ and %- hashes.
If multiple distinct capture groups have the same name then the $+{NAME} will refer to the leftmost defined group in the match.
The forms (?'NAME'pattern) and (?<NAME>pattern) are equivalent.
NOTE: While the notation of this construct is the same as the similar function in .NET regexes, the behavior is not. In Perl the groups are numbered sequentially regardless of being named or not. Thus in the pattern
- /(x)(?<foo>y)(z)/
$+{foo} will be the same as $2, and $3 will contain 'z' instead of the opposite which is what a .NET regex hacker might expect.
Currently NAME is restricted to simple identifiers only. In other words, it must match /^[_A-Za-z][_A-Za-z0-9]*\z/ or its Unicode extension (see utf8), though it isn't extended by the locale (see perllocale).
NOTE: In order to make things easier for programmers with experience with the Python or PCRE regex engines, the pattern (?P<NAME>pattern) may be used instead of (?<NAME>pattern); however this form does not support the use of single quotes as a delimiter for the name.
\k<NAME>
\k'NAME'
Named backreference. Similar to numeric backreferences, except that the group is designated by name and not number. If multiple groups have the same name then it refers to the leftmost defined group in the current match.
It is an error to refer to a name not defined by a (?<NAME>) earlier in the pattern. Both forms are equivalent.
NOTE: In order to make things easier for programmers with experience with the Python or PCRE regex engines, the pattern (?P=NAME) may be used instead of \k<NAME>.
(?{ code })
WARNING: This extended regular expression feature is considered experimental, and may be changed without notice. Code executed that has side effects may not perform identically from version to version due to the effect of future optimisations in the regex engine. The implementation of this feature was radically overhauled for the 5.18.0 release, and its behaviour in earlier versions of perl was much buggier, especially in relation to parsing, lexical vars, scoping, recursion and reentrancy.
This zero-width assertion executes any embedded Perl code. It always succeeds, and its return value is set as $^R.
In literal patterns, the code is parsed at the same time as the surrounding code. While within the pattern, control is passed temporarily back to the perl parser, until the logically-balancing closing brace is encountered. This is similar to the way that an array index expression in a literal string is handled, for example
- "abc$array[ 1 + f('[') + g()]def"
In particular, braces do not need to be balanced:
- s/abc(?{ f('{'); })/def/
Even in a pattern that is interpolated and compiled at run-time, literal code blocks will be compiled once, at perl compile time.
In patterns where the text of the code is derived from run-time information rather than appearing literally in a source code /pattern/, the code is compiled at the same time that the pattern is compiled, and for reasons of security, use re 'eval' must be in scope. This is to stop user-supplied patterns containing code snippets from being executable.
In situations where you need to enable this with use re 'eval', you should also have taint checking enabled. Better yet, use the carefully constrained evaluation within a Safe compartment. See perlsec for details about both these mechanisms.
From the viewpoint of parsing, lexical variable scope and closures,
- /AAA(?{ BBB })CCC/
behaves approximately like
- /AAA/ && do { BBB } && /CCC/
Similarly,
- qr/AAA(?{ BBB })CCC/
behaves approximately like
- local *bbb = sub { BBB };
- qr/AAA(?{ bbb() })CCC/
In particular:
Inside a (?{...}) block, $_ refers to the string the regular expression is matching against. You can also use pos() to know the current position of matching within this string.
The code block introduces a new scope from the perspective of lexical variable declarations, but not from the perspective of local and similar localizing behaviours. So later code blocks within the same pattern will still see the values which were localized in earlier blocks. These accumulated localizations are undone either at the end of a successful match, or if the assertion is backtracked (compare Backtracking). For example,
- $_ = 'a' x 8;
- m<
- (?{ $cnt = 0 }) # Initialize $cnt.
- (
- a
- (?{
- local $cnt = $cnt + 1; # Update $cnt,
- # backtracking-safe.
- })
- )*
- aaaa
- (?{ $res = $cnt }) # On success copy to
- # non-localized location.
- >x;
will initially increment $cnt up to 8; then during backtracking, its value will be unwound back to 4, which is the value assigned to $res. At the end of the regex execution, $cnt will be wound back to its initial value of 0.
This assertion may be used as the condition in a
- (?(condition)yes-pattern|no-pattern)
switch. If not used in this way, the result of evaluation of code is put into the special variable $^R. This happens immediately, so $^R can be used from other (?{ code }) assertions inside the same regular expression.
The assignment to $^R above is properly localized, so the old value of $^R is restored if the assertion is backtracked; compare Backtracking.
Note that the special variable $^N is particularly useful with code blocks to capture the results of submatches in variables without having to keep track of the number of nested parentheses. For example:
- $_ = "The brown fox jumps over the lazy dog";
- /the (\S+)(?{ $color = $^N }) (\S+)(?{ $animal = $^N })/i;
- print "color = $color, animal = $animal\n";
(??{ code })
WARNING: This extended regular expression feature is considered experimental, and may be changed without notice. Code executed that has side effects may not perform identically from version to version due to the effect of future optimisations in the regex engine.
This is a "postponed" regular subexpression. It behaves in exactly the same way as a (?{ code }) code block as described above, except that its return value, rather than being assigned to $^R, is treated as a pattern, compiled if it's a string (or used as-is if it's a qr// object), then matched as if it were inserted instead of this construct.
During the matching of this sub-pattern, it has its own set of captures which are valid during the sub-match, but are discarded once control returns to the main pattern. For example, the following matches, with the inner pattern capturing "B" and matching "BB", while the outer pattern captures "A":
- my $inner = '(.)\1';
- "ABBA" =~ /^(.)(??{ $inner })\1/ and print $1; # prints A
Note that this means that there is no way for the inner pattern to refer to a capture group defined outside. (The code block itself can use $1, etc., to refer to the enclosing pattern's capture groups.) Thus, although
- ('a' x 100)=~/(??{'(.)' x 100})/
will match, it will not set $1 on exit.
The following pattern matches a parenthesized group:
- $re = qr{
- \(
- (?:
- (?> [^()]+ ) # Non-parens without backtracking
- |
- (??{ $re }) # Group with matching parens
- )*
- \)
- }x;
See also (?PARNO) for a different, more efficient way to accomplish the same task.
Executing a postponed regular expression 50 times without consuming any input string will result in a fatal error. The maximum depth is compiled into perl, so changing it requires a custom build.
(?PARNO) (?-PARNO) (?+PARNO) (?R) (?0)
Similar to (??{ code }) except that it does not involve executing any code or potentially compiling a returned pattern string; instead it treats the part of the current pattern contained within a specified capture group as an independent pattern that must match at the current position. Capture groups contained by the pattern will have the value as determined by the outermost recursion.
PARNO is a sequence of digits (not starting with 0) whose value reflects the paren-number of the capture group to recurse to. (?R) recurses to the beginning of the whole pattern. (?0) is an alternate syntax for (?R). If PARNO is preceded by a plus or minus sign then it is assumed to be relative, with negative numbers indicating preceding capture groups and positive ones following. Thus (?-1) refers to the most recently declared group, and (?+1) indicates the next group to be declared. Note that the counting for relative recursion differs from that of relative backreferences, in that with recursion unclosed groups are included.
The following pattern matches a function foo() which may contain balanced parentheses as the argument.
- $re = qr{ ( # paren group 1 (full function)
- foo
- ( # paren group 2 (parens)
- \(
- ( # paren group 3 (contents of parens)
- (?:
- (?> [^()]+ ) # Non-parens without backtracking
- |
- (?2) # Recurse to start of paren group 2
- )*
- )
- \)
- )
- )
- }x;
If the pattern was used as follows
- 'foo(bar(baz)+baz(bop))'=~/$re/
- and print "\$1 = $1\n",
- "\$2 = $2\n",
- "\$3 = $3\n";
the output produced should be the following:
- $1 = foo(bar(baz)+baz(bop))
- $2 = (bar(baz)+baz(bop))
- $3 = bar(baz)+baz(bop)
If there is no corresponding capture group defined, then it is a fatal error. Recursing deeper than 50 times without consuming any input string will also result in a fatal error. The maximum depth is compiled into perl, so changing it requires a custom build.
The following shows how using negative indexing can make it easier to embed recursive patterns inside of a qr// construct for later use:
- my $parens = qr/(\((?:[^()]++|(?-1))*+\))/;
- if (/foo $parens \s+ \+ \s+ bar $parens/x) {
-    # do something here...
- }
Note that this pattern does not behave the same way as the equivalent PCRE or Python construct of the same form. In Perl you can backtrack into a recursed group, in PCRE and Python the recursed into group is treated as atomic. Also, modifiers are resolved at compile time, so constructs like (?i:(?1)) or (?:(?i)(?1)) do not affect how the sub-pattern will be processed.
(?&NAME)
Recurse to a named subpattern. Identical to (?PARNO) except that the
parenthesis to recurse to is determined by name. If multiple parentheses have
the same name, then it recurses to the leftmost.
It is an error to refer to a name that is not declared somewhere in the pattern.
NOTE: In order to make things easier for programmers with experience
with the Python or PCRE regex engines the pattern (?P>NAME)
may be used instead of (?&NAME).
(?(condition)yes-pattern|no-pattern)
(?(condition)yes-pattern)
Conditional expression. Matches yes-pattern if condition yields a true value, matches no-pattern otherwise. A missing pattern always matches.
(condition) should be one of: 1) an integer in parentheses (which is valid if the corresponding pair of parentheses matched); 2) a look-ahead/look-behind/evaluate zero-width assertion; 3) a name in angle brackets or single quotes (which is valid if a group with the given name matched); or 4) the special symbol (R) (true when evaluated inside of recursion or eval). Additionally the R may be followed by a number, (which will be true when evaluated when recursing inside of the appropriate group), or by &NAME, in which case it will be true only when evaluated during recursion in the named group.
Here's a summary of the possible predicates:
(1) (2) ...
Checks if the numbered capturing group has matched something.
(<NAME>) ('NAME')
Checks if a group with the given name has matched something.
(?=...) (?!...) (?<=...) (?<!...)
Checks whether the pattern matches (or does not match, for the '!' variants).
(?{ CODE })
Treats the return value of the code block as the condition.
(R)
Checks if the expression has been evaluated inside of recursion.
(R1) (R2) ...
Checks if the expression has been evaluated while executing directly inside of the n-th capture group. This check is the regex equivalent of
- if ((caller(0))[3] eq 'subname') { ... }
In other words, it does not check the full recursion stack.
(R&NAME)
Similar to (R1), this predicate checks to see if we're executing directly inside of the leftmost group with a given name (this is the same logic used by (?&NAME) to disambiguate). It does not check the full stack, but only the name of the innermost active recursion.
(DEFINE)
In this case, the yes-pattern is never directly executed, and no no-pattern is allowed. Similar in spirit to (?{0}) but more efficient. See below for details.
For example:
- m{ ( \( )?
- [^()]+
- (?(1) \) )
- }x
matches a chunk of non-parentheses, possibly included in parentheses themselves.
A special form is the (DEFINE) predicate, which never executes its yes-pattern directly, and does not allow a no-pattern. This allows one to define subpatterns which will be executed only by the recursion mechanism. This way, you can define a set of regular expression rules that can be bundled into any pattern you choose.
It is recommended that for this usage you put the DEFINE block at the end of the pattern, and that you name any subpatterns defined within it. Also, it's worth noting that patterns defined this way probably will not be as efficient, as the optimiser is not very clever about handling them.
An example of how this might be used is as follows:
- /(?<NAME>(?&NAME_PAT))(?<ADDR>(?&ADDRESS_PAT))
- (?(DEFINE)
- (?<NAME_PAT>....)
- (?<ADDRESS_PAT>....)
- )/x
Note that capture groups matched inside of recursion are not accessible after the recursion returns, so the extra layer of capturing groups is necessary. Thus $+{NAME_PAT} would not be defined even though $+{NAME} would be.
Finally, keep in mind that subpatterns created inside a DEFINE block count towards the absolute and relative number of captures, so this:
- my @captures = "a" =~ /(.)                  # First capture
-                        (?(DEFINE)
-                            (?<EXAMPLE> 1 )  # Second capture
-                        )/x;
- print scalar @captures;
will output 2, not 1. This is particularly important if you intend to compile the definitions with the qr// operator, and later interpolate them in another pattern.
(?>pattern)
An "independent" subexpression, one which matches the substring that a standalone pattern would match if anchored at the given position, and it matches nothing other than this substring. This construct is useful for optimizations of what would otherwise be "eternal" matches, because it will not backtrack (see Backtracking). It may also be useful in places where the "grab all you can, and do not give anything back" semantic is desirable.
For example: ^(?>a*)ab will never match, since (?>a*) (anchored at the beginning of string, as above) will match all characters a at the beginning of string, leaving no a for ab to match. In contrast, a*ab will match the same as a+b, since the match of the subgroup a* is influenced by the following group ab (see Backtracking). In particular, a* inside a*ab will match fewer characters than a standalone a*, since this makes the tail match.
(?>pattern) does not disable backtracking altogether once it has matched. It is still possible to backtrack past the construct, but not into it. So ((?>a*)|(?>b*))ar will still match "bar".
An effect similar to (?>pattern) may be achieved by writing (?=(pattern))\g{-1}. This matches the same substring as a standalone a+, and the following \g{-1} eats the matched string; it therefore makes a zero-length assertion into an analogue of (?>...). (The difference between these two constructs is that the second one uses a capturing group, thus shifting ordinals of backreferences in the rest of a regular expression.)
Consider this pattern:
- m{ \(
- (
- [^()]+ # x+
- |
- \( [^()]* \)
- )+
- \)
- }x
That will efficiently match a nonempty group with matching parentheses two levels deep or less. However, if there is no such group, it will take virtually forever on a long string. That's because there are so many different ways to split a long string into several substrings. This is what (.+)+ is doing, and (.+)+ is similar to a subpattern of the above pattern. Consider how the pattern above detects no-match on ((()aaaaaaaaaaaaaaaaaa in several seconds, but that each extra letter doubles this time. This exponential performance will make it appear that your program has hung. However, a tiny change to this pattern
- m{ \(
- (
- (?> [^()]+ ) # change x+ above to (?> x+ )
- |
- \( [^()]* \)
- )+
- \)
- }x
which uses (?>...) matches exactly when the one above does (verifying this yourself would be a productive exercise), but finishes in a fourth the time when used on a similar string with 1000000 a's. Be aware, however, that, when this construct is followed by a quantifier, it currently triggers a warning message under the use warnings pragma or -w switch saying it "matches null string many times in regex".
On simple groups, such as the pattern (?> [^()]+ ), a comparable effect may be achieved by negative look-ahead, as in [^()]+ (?! [^()] ). This was only 4 times slower on a string with 1000000 a's.
The "grab all you can, and do not give anything back" semantic is desirable in many situations where at first sight a simple ()* looks like the correct solution. Suppose we parse text with comments being delimited by # followed by some optional (horizontal) whitespace. Contrary to its appearance, #[ \t]* is not the correct subexpression to match the comment delimiter, because it may "give up" some whitespace if the remainder of the pattern can be made to match that way. The correct answer is either one of these:
- (?>#[ \t]*)
- #[ \t]*(?![ \t])
For example, to grab non-empty comments into $1, one should use either one of these:
- / (?> \# [ \t]* ) ( .+ ) /x;
- / \# [ \t]* ( [^ \t] .* ) /x;
Which one you pick depends on which of these expressions better reflects the above specification of comments.
In some literature this construct is called "atomic matching" or "possessive matching".
Possessive quantifiers are equivalent to putting the item they are applied to inside of one of these constructs. The following equivalences apply:
- Quantifier Form Bracketing Form
- --------------- ---------------
- PAT*+ (?>PAT*)
- PAT++ (?>PAT+)
- PAT?+ (?>PAT?)
- PAT{min,max}+ (?>PAT{min,max})
(?[ ])
See Extended Bracketed Character Classes in perlrecharclass.
WARNING: These patterns are experimental and subject to change or removal in a future version of Perl. Their usage in production code should be noted to avoid problems during upgrades.
These special patterns are generally of the form (*VERB:ARG). Unless otherwise stated the ARG argument is optional; in some cases, it is forbidden.
Any pattern containing a special backtracking verb that allows an argument has the special behaviour that when executed it sets the current package's $REGERROR and $REGMARK variables. When doing so the following rules apply:
On failure, the $REGERROR variable will be set to the ARG value of the verb pattern, if the verb was involved in the failure of the match. If the ARG part of the pattern was omitted, then $REGERROR will be set to the name of the last (*MARK:NAME) pattern executed, or to TRUE if there was none. Also, the $REGMARK variable will be set to FALSE.
On a successful match, the $REGERROR variable will be set to FALSE, and the $REGMARK variable will be set to the name of the last (*MARK:NAME) pattern executed. See the explanation for the (*MARK:NAME) verb below for more details.
NOTE: $REGERROR and $REGMARK are not magic variables like $1 and most other regex-related variables. They are not local to a scope, nor readonly, but instead are volatile package variables similar to $AUTOLOAD. Use local to localize changes to them to a specific scope if necessary.
If a pattern does not contain a special backtracking verb that allows an argument, then $REGERROR and $REGMARK are not touched at all.
(*PRUNE)
(*PRUNE:NAME)
This zero-width pattern prunes the backtracking tree at the current point when backtracked into on failure. Consider the pattern A (*PRUNE) B, where A and B are complex patterns. Until the (*PRUNE) verb is reached, A may backtrack as necessary to match. Once it is reached, matching continues in B, which may also backtrack as necessary; however, should B not match, then no further backtracking will take place, and the pattern will fail outright at the current starting position.
The following example counts all the possible matching strings in a pattern (without actually matching any of them).
- 'aaab' =~ /a+b?(?{print "$&\n"; $count++})(*FAIL)/;
- print "Count=$count\n";
which produces:
- aaab
- aaa
- aa
- a
- aab
- aa
- a
- ab
- a
- Count=9
If we add a (*PRUNE) before the count like the following
- 'aaab' =~ /a+b?(*PRUNE)(?{print "$&\n"; $count++})(*FAIL)/;
- print "Count=$count\n";
we prevent backtracking and find the count of the longest matching string at each matching starting point like so:
- aaab
- aab
- ab
- Count=3
Any number of (*PRUNE) assertions may be used in a pattern.
See also (?>pattern) and possessive quantifiers for other ways to control backtracking. In some cases, the use of (*PRUNE) can be replaced with a (?>pattern) with no functional difference; however, (*PRUNE) can be used to handle cases that cannot be expressed using a (?>pattern) alone.
(*SKIP)
(*SKIP:NAME)
This zero-width pattern is similar to (*PRUNE), except that on failure it also signifies that whatever text was matched leading up to the (*SKIP) pattern being executed cannot be part of any match of this pattern. This effectively means that the regex engine "skips" forward to this position on failure and tries to match again, (assuming that there is sufficient room to match).
The name of the (*SKIP:NAME) pattern has special significance. If a (*MARK:NAME) was encountered while matching, then it is that position which is used as the "skip point". If no (*MARK) of that name was encountered, then the (*SKIP) operator has no effect. When used without a name the "skip point" is where the match point was when executing the (*SKIP) pattern.
Compare the following to the examples in (*PRUNE); note the string is twice as long:
- 'aaabaaab' =~ /a+b?(*SKIP)(?{print "$&\n"; $count++})(*FAIL)/;
- print "Count=$count\n";
outputs
- aaab
- aaab
- Count=2
Once the 'aaab' at the start of the string has matched, and the (*SKIP) executed, the next starting point will be where the cursor was when the (*SKIP) was executed.
(*MARK:NAME) (*:NAME)
This zero-width pattern can be used to mark the point reached in a string
when a certain part of the pattern has been successfully matched. This
mark may be given a name. A later (*SKIP)
pattern will then skip
forward to that point if backtracked into on failure. Any number of
(*MARK)
patterns are allowed, and the NAME portion may be duplicated.
In addition to interacting with the (*SKIP)
pattern, (*MARK:NAME)
can be used to "label" a pattern branch, so that after matching, the
program can determine which branches of the pattern were involved in the
match.
When a match is successful, the $REGMARK
variable will be set to the
name of the most recently executed (*MARK:NAME) that was involved
in the match.
This can be used to determine which branch of a pattern was matched
without using a separate capture group for each branch, which in turn
can result in a performance improvement, as perl cannot optimize
/(?:(x)|(y)|(z))/ as efficiently as something like
/(?:x(*MARK:x)|y(*MARK:y)|z(*MARK:z))/.
When a match has failed, and unless another verb has been involved in
failing the match and has provided its own name to use, the $REGERROR
variable will be set to the name of the most recently executed
(*MARK:NAME).
See (*SKIP) for more details.
As a shortcut (*MARK:NAME) can be written (*:NAME).
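The branch-labeling behavior can be seen in a short runnable sketch (Perl 5.10 or later; the branch names `ex`, `why`, and `zed` are arbitrary illustrations). Note that $REGMARK is an ordinary package variable, so it needs `our` under strict:

```perl
use strict;
use warnings;

our $REGMARK;    # set by the engine after a successful match

for my $str (qw(x y z)) {
    if ( $str =~ /x(*MARK:ex)|y(*MARK:why)|z(*MARK:zed)/ ) {
        # $REGMARK names the branch that participated in the match
        print "'$str' matched via branch '$REGMARK'\n";
    }
}
```

This avoids one capture group per branch while still identifying which alternative succeeded.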
(*THEN)
(*THEN:NAME)
This is similar to the "cut group" operator :: from Perl 6. Like
(*PRUNE), this verb always matches, and when backtracked into on
failure, it causes the regex engine to try the next alternation in the
innermost enclosing group (capturing or otherwise) that has alternations.
The two branches of a (?(condition)yes-pattern|no-pattern) do not
count as an alternation, as far as (*THEN) is concerned.
Its name comes from the observation that this operation combined with the
alternation operator (|) can be used to create what is essentially a
pattern-based if/then/else block:
- ( COND (*THEN) FOO | COND2 (*THEN) BAR | COND3 (*THEN) BAZ )
Note that if this operator is used outside of an alternation, then
it acts exactly like the (*PRUNE) operator.
- / A (*PRUNE) B /
is the same as
- / A (*THEN) B /
but
- / ( A (*THEN) B | C ) /
is not the same as
- / ( A (*PRUNE) B | C ) /
as after matching the A but failing on the B the (*THEN)
verb will
backtrack and try C; but the (*PRUNE)
verb will simply fail.
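The difference shows up directly with an anchored pattern (a small sketch; the pattern and target string are arbitrary illustrations):

```perl
use strict;
use warnings;

# (*THEN) backtracks to the next alternation; (*PRUNE) abandons the
# current starting position entirely.  With a ^...$ anchor there is
# only one starting position, so the (*PRUNE) form fails outright.
my $then  = 'ab' =~ /^(?:a(*THEN)x|ab)$/  ? 1 : 0;  # falls through to 'ab'
my $prune = 'ab' =~ /^(?:a(*PRUNE)x|ab)$/ ? 1 : 0;  # never tries 'ab'

print "THEN: $then, PRUNE: $prune\n";   # THEN: 1, PRUNE: 0
```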
(*COMMIT)
This is the Perl 6 "commit pattern" <commit>
or :::. It's a
zero-width pattern similar to (*SKIP)
, except that when backtracked
into on failure it causes the match to fail outright. No further attempts
to find a valid match by advancing the start pointer will occur.
For example,
- 'aaabaaab' =~ /a+b?(*COMMIT)(?{print "$&\n"; $count++})(*FAIL)/;
- print "Count=$count\n";
outputs
- aaab
- Count=1
In other words, once the (*COMMIT)
has been entered, and if the pattern
does not match, the regex engine will not try any further matching on the
rest of the string.
(*FAIL)
(*F)
This pattern matches nothing and always fails. It can be used to force the
engine to backtrack. It is equivalent to (?!), but easier to read. In
fact, (?!) gets optimised into (*FAIL) internally.
It is probably useful only when combined with (?{}) or (??{}).
(*ACCEPT)
WARNING: This feature is highly experimental. It is not recommended for production code.
This pattern matches nothing and causes the end of successful matching at
the point at which the (*ACCEPT)
pattern was encountered, regardless of
whether there is actually more to match in the string. When inside of a
nested pattern, such as recursion, or in a subpattern dynamically generated
via (??{})
, only the innermost pattern is ended immediately.
If the (*ACCEPT)
is inside of capturing groups then the groups are
marked as ended at the point at which the (*ACCEPT)
was encountered.
For instance:
- 'AB' =~ /(A (A|B(*ACCEPT)|C) D)(E)/x;
will match, and $1
will be AB
and $2
will be B
, $3
will not
be set. If another branch in the inner parentheses was matched, such as in the
string 'ACDE', then the D
and E
would have to be matched as well.
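The example above can be checked directly; the assertions below simply restate the capture behavior described in the text:

```perl
use strict;
use warnings;

'AB' =~ /(A (A|B(*ACCEPT)|C) D)(E)/x
    or die "expected a match";

# (*ACCEPT) ends matching early: the open groups 1 and 2 are closed at
# that point, and group 3 is never reached.
printf "1=%s 2=%s 3=%s\n", $1, $2, defined $3 ? $3 : 'undef';
```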
NOTE: This section presents an abstract approximation of regular expression behavior. For a more rigorous (and complicated) view of the rules involved in selecting a match among possible alternatives, see Combining RE Pieces.
A fundamental feature of regular expression matching involves the
notion called backtracking, which is currently used (when needed)
by all non-possessive regular expression quantifiers, namely *, *?,
+, +?, {n,m}, and {n,m}?. Backtracking is often optimized
internally, but the general principle outlined here is valid.
For a regular expression to match, the entire regular expression must match, not just part of it. So if the beginning of a pattern containing a quantifier succeeds in a way that causes later parts in the pattern to fail, the matching engine backs up and recalculates the beginning part--that's why it's called backtracking.
Here is an example of backtracking: Let's say you want to find the word following "foo" in the string "Food is on the foo table.":
- $_ = "Food is on the foo table.";
- if ( /\b(foo)\s+(\w+)/i ) {
- print "$2 follows $1.\n";
- }
When the match runs, the first part of the regular expression (\b(foo)
)
finds a possible match right at the beginning of the string, and loads up
$1 with "Foo". However, as soon as the matching engine sees that there's
no whitespace following the "Foo" that it had saved in $1, it realizes its
mistake and starts over again one character after where it had the
tentative match. This time it goes all the way until the next occurrence
of "foo". The complete regular expression matches this time, and you get
the expected output of "table follows foo."
Sometimes minimal matching can help a lot. Imagine you'd like to match everything between "foo" and "bar". Initially, you write something like this:
- $_ = "The food is under the bar in the barn.";
- if ( /foo(.*)bar/ ) {
- print "got <$1>\n";
- }
Which perhaps unexpectedly yields:
- got <d is under the bar in the >
That's because .* was greedy, so you get everything between the
first "foo" and the last "bar". Here it's more effective
to use minimal matching to make sure you get the text between a "foo"
and the first "bar" thereafter.
- if ( /foo(.*?)bar/ ) { print "got <$1>\n" }
- got <d is under the >
Here's another example. Let's say you'd like to match a number at the end of a string, and you also want to keep the preceding part of the match. So you write this:
- $_ = "I have 2 numbers: 53147";
- if ( /(.*)(\d*)/ ) { # Wrong!
- print "Beginning is <$1>, number is <$2>.\n";
- }
That won't work at all, because .* was greedy and gobbled up the
whole string. As \d* can match the empty string, the complete
regular expression matched successfully.
- Beginning is <I have 2 numbers: 53147>, number is <>.
Here are some variants, most of which don't work:
- $_ = "I have 2 numbers: 53147";
- @pats = qw{
-     (.*)(\d*)
-     (.*)(\d+)
-     (.*?)(\d*)
-     (.*?)(\d+)
-     (.*)(\d+)$
-     (.*?)(\d+)$
-     (.*)\b(\d+)$
-     (.*\D)(\d+)$
- };
- for $pat (@pats) {
-     printf "%-12s ", $pat;
-     if ( /$pat/ ) {
-         print "<$1> <$2>\n";
-     } else {
-         print "FAIL\n";
-     }
- }
That will print out:
- (.*)(\d*) <I have 2 numbers: 53147> <>
- (.*)(\d+) <I have 2 numbers: 5314> <7>
- (.*?)(\d*) <> <>
- (.*?)(\d+) <I have > <2>
- (.*)(\d+)$ <I have 2 numbers: 5314> <7>
- (.*?)(\d+)$ <I have 2 numbers: > <53147>
- (.*)\b(\d+)$ <I have 2 numbers: > <53147>
- (.*\D)(\d+)$ <I have 2 numbers: > <53147>
As you see, this can be a bit tricky. It's important to realize that a regular expression is merely a set of assertions that gives a definition of success. There may be 0, 1, or several different ways that the definition might succeed against a particular string. And if there are multiple ways it might succeed, you need to understand backtracking to know which variety of success you will achieve.
When using look-ahead assertions and negations, this can all get even trickier. Imagine you'd like to find a sequence of non-digits not followed by "123". You might try to write that as
- $_ = "ABC123";
- if ( /^\D*(?!123)/ ) { # Wrong!
- print "Yup, no 123 in $_\n";
- }
But that isn't going to match; at least, not the way you're hoping. It claims that there is no 123 in the string. Here's a clearer picture of why that pattern matches, contrary to popular expectations:
- $x = 'ABC123';
- $y = 'ABC445';
- print "1: got $1\n" if $x =~ /^(ABC)(?!123)/;
- print "2: got $1\n" if $y =~ /^(ABC)(?!123)/;
- print "3: got $1\n" if $x =~ /^(\D*)(?!123)/;
- print "4: got $1\n" if $y =~ /^(\D*)(?!123)/;
This prints
- 2: got ABC
- 3: got AB
- 4: got ABC
You might have expected test 3 to fail because it seems to be a more
general-purpose version of test 1. The important difference between
them is that test 3 contains a quantifier (\D*
) and so can use
backtracking, whereas test 1 will not. What's happening is
that you've asked "Is it true that at the start of $x, following 0 or more
non-digits, you have something that's not 123?" If the pattern matcher had
let \D*
expand to "ABC", this would have caused the whole pattern to
fail.
The search engine will initially match \D*
with "ABC". Then it will
try to match (?!123)
with "123", which fails. But because
a quantifier (\D*
) has been used in the regular expression, the
search engine can backtrack and retry the match differently
in the hope of matching the complete regular expression.
The pattern really, really wants to succeed, so it uses the
standard pattern back-off-and-retry and lets \D*
expand to just "AB" this
time. Now there's indeed something following "AB" that is not
"123". It's "C123", which suffices.
We can deal with this by using both an assertion and a negation. We'll say that the first part in $1 must be followed both by a digit and by something that's not "123". Remember that the look-aheads are zero-width expressions--they only look, but don't consume any of the string in their match. So rewriting this way produces what you'd expect; that is, case 5 will fail, but case 6 succeeds:
- print "5: got $1\n" if $x =~ /^(\D*)(?=\d)(?!123)/;
- print "6: got $1\n" if $y =~ /^(\D*)(?=\d)(?!123)/;
- 6: got ABC
In other words, the two zero-width assertions next to each other work as though
they're ANDed together, just as you'd use any built-in assertions: /^$/
matches only if you're at the beginning of the line AND the end of the
line simultaneously. The deeper underlying truth is that juxtaposition in
regular expressions always means AND, except when you write an explicit OR
using the vertical bar. /ab/
means match "a" AND (then) match "b",
although the attempted matches are made at different positions because "a"
is not a zero-width assertion, but a one-width assertion.
WARNING: Particularly complicated regular expressions can take exponential time to solve because of the immense number of possible ways they can use backtracking to try for a match. For example, without internal optimizations done by the regular expression engine, this will take a painfully long time to run:
- 'aaaaaaaaaaaa' =~ /((a{0,5}){0,5})*[c]/
And if you used *
's in the internal groups instead of limiting them
to 0 through 5 matches, then it would take forever--or until you ran
out of stack space. Moreover, these internal optimizations are not
always applicable. For example, if you put {0,5}
instead of *
on the external group, no current optimization is applicable, and the
match takes a long time to finish.
A powerful tool for optimizing such beasts is what is known as an "independent group", which does not backtrack (see (?>pattern)). Note also that zero-length look-ahead/look-behind assertions will not backtrack to make the tail match, since they are in "logical" context: only whether they match is considered relevant. For an example where side-effects of look-ahead might have influenced the following match, see (?>pattern).
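A quick sketch of how an independent group refuses to give characters back (the strings here are arbitrary illustrations):

```perl
use strict;
use warnings;

# Greedy a* normally backtracks so that the trailing 'ab' can still
# match; inside (?>...) it keeps all the a's and the match fails.
my $plain = 'aaab' =~ /^a*ab/     ? 1 : 0;
my $indep = 'aaab' =~ /^(?>a*)ab/ ? 1 : 0;

print "plain: $plain, independent: $indep\n";   # plain: 1, independent: 0
```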
In case you're not familiar with the "regular" Version 8 regex routines, here are the pattern-matching rules not described above.
Any single character matches itself, unless it is a metacharacter with a special meaning described here or above. You can cause characters that normally function as metacharacters to be interpreted literally by prefixing them with a "\" (e.g., "\." matches a ".", not any character; "\\" matches a "\"). This escape mechanism is also required for the character used as the pattern delimiter.
A series of characters matches that series of characters in the target
string, so the pattern blurfl
would match "blurfl" in the target
string.
You can specify a character class, by enclosing a list of characters
in []
, which will match any character from the list. If the
first character after the "[" is "^", the class matches any character not
in the list. Within a list, the "-" character specifies a
range, so that a-z
represents all characters between "a" and "z",
inclusive. If you want either "-" or "]" itself to be a member of a
class, put it at the start of the list (possibly after a "^"), or
escape it with a backslash. "-" is also taken literally when it is
at the end of the list, just before the closing "]". (The
following all specify the same class of three characters: [-az],
[az-], and [a\-z]. All are different from [a-z], which
specifies a class containing twenty-six characters, even on EBCDIC-based
character sets.) Also, if you try to use the character
classes \w, \W, \s, \S, \d, or \D as endpoints of
a range, the "-" is understood literally.
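The class-versus-range distinction can be verified directly:

```perl
use strict;
use warnings;

# [-az] and [az-] are three-character classes (-, a, z); [a-z] is a range.
die unless '-' =~ /^[-az]$/;
die unless '-' =~ /^[az-]$/;
die unless 'q' !~ /^[-az]$/;   # 'q' is not one of the three characters
die unless 'q' =~ /^[a-z]$/;   # but it is inside the range

print "class vs range checks passed\n";
```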
Note also that the whole range idea is rather unportable between character sets--and even within character sets they may cause results you probably didn't expect. A sound principle is to use only ranges that begin from and end at either alphabetics of equal case ([a-e], [A-E]), or digits ([0-9]). Anything else is unsafe. If in doubt, spell out the character sets in full.
Characters may be specified using a metacharacter syntax much like that
used in C: "\n" matches a newline, "\t" a tab, "\r" a carriage return,
"\f" a form feed, etc. More generally, \nnn, where nnn is a string
of three octal digits, matches the character whose coded character set value
is nnn. Similarly, \xnn, where nn are hexadecimal digits,
matches the character whose ordinal is nn. The expression \cx
matches the character control-x. Finally, the "." metacharacter
matches any character except "\n" (unless you use /s).
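These escapes can be exercised in a few lines (the sample strings are arbitrary):

```perl
use strict;
use warnings;

die unless 'A'    =~ /^\101$/;   # \101 is octal for 'A'
die unless 'A'    =~ /^\x41$/;   # \x41 is hex for 'A'
die unless "\t"   =~ /^\cI$/;    # \cI is Control-I, i.e. tab
die unless "a\nb" =~ /a.b/s;     # /s lets "." match "\n"
die if     "a\nb" =~ /a.b/;      # without /s it does not

print "escape checks passed\n";
```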
You can specify a series of alternatives for a pattern using "|" to
separate them, so that fee|fie|foe
will match any of "fee", "fie",
or "foe" in the target string (as would f(e|i|o)e). The
first alternative includes everything from the last pattern delimiter
("(", "(?:", etc. or the beginning of the pattern) up to the first "|", and
the last alternative contains everything from the last "|" to the next
closing pattern delimiter. That's why it's common practice to include
alternatives in parentheses: to minimize confusion about where they
start and end.
Alternatives are tried from left to right, so the first
alternative found for which the entire expression matches, is the one that
is chosen. This means that alternatives are not necessarily greedy. For
example: when matching foo|foot
against "barefoot", only the "foo"
part will match, as that is the first alternative tried, and it successfully
matches the target string. (This might not seem important, but it is
important when you are capturing matched text using parentheses.)
Also remember that "|" is interpreted as a literal within square brackets,
so if you write [fee|fie|foe] you're really only matching [feio|].
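Both points are easy to demonstrate:

```perl
use strict;
use warnings;

# Alternatives are tried left to right: "foo" wins over "foot".
'barefoot' =~ /(foo|foot)/ or die;
die unless $1 eq 'foo';

# Inside brackets "|" is literal: [fee|fie|foe] is just [feio|].
die unless '|' =~ /^[fee|fie|foe]$/;
die if     't' =~ /^[fee|fie|foe]$/;

print "alternation checks passed\n";
```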
Within a pattern, you may designate subpatterns for later reference
by enclosing them in parentheses, and you may refer back to the
nth subpattern later in the pattern using the metacharacter
\n or \gn. Subpatterns are numbered based on the left to right order
of their opening parenthesis. A backreference matches whatever
actually matched the subpattern in the string being examined, not
the rules for that subpattern. Therefore, (0|0x)\d*\s\g1\d*
will
match "0x1234 0x4321", but not "0x1234 01234", because subpattern
1 matched "0x", even though the rule 0|0x
could potentially match
the leading 0 in the second number.
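The example in the text runs as stated:

```perl
use strict;
use warnings;

# \g1 matches what group 1 actually captured ("0x"), not its rule (0|0x).
die unless '0x1234 0x4321' =~ /(0|0x)\d*\s\g1\d*/;
die if     '0x1234 01234'  =~ /(0|0x)\d*\s\g1\d*/;

print "backreference checks passed\n";
```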
Some people get too used to writing things like:
- $pattern =~ s/(\W)/\\\1/g;
This is grandfathered (for \1 to \9) for the RHS of a substitute to avoid
shocking the
sed addicts, but it's a dirty habit to get into. That's because in
PerlThink, the righthand side of an s/// is a double-quoted string. \1
in
the usual double-quoted string means a control-A. The customary Unix
meaning of \1
is kludged in for s///. However, if you get into the habit
of doing that, you get yourself into trouble if you then add an /e
modifier.
- s/(\d+)/ \1 + 1 /eg; # causes warning under -w
Or if you try to do
- s/(\d+)/\1000/;
You can't disambiguate that by saying \{1}000
, whereas you can fix it with
${1}000. The operation of interpolation should not be confused
with the operation of matching a backreference. Certainly they mean two
different things on the left side of the s///.
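A small sketch of the recommended forms (the sample strings are arbitrary):

```perl
use strict;
use warnings;

my $s = 'width 5';
$s =~ s/(\d+)/${1}000/;        # ${1} disambiguates from the illegal \1000
die unless $s eq 'width 5000';

my $t = '41';
$t =~ s/(\d+)/ $1 + 1 /e;      # /e evaluates the replacement as code
die unless $t eq '42';

print "substitution checks passed\n";
```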
WARNING: Difficult material (and prose) ahead. This section needs a rewrite.
Regular expressions provide a terse and powerful programming language. As with most other power tools, power comes together with the ability to wreak havoc.
A common abuse of this power stems from the ability to make infinite loops using regular expressions, with something as innocuous as:
- 'foo' =~ m{ ( o? )* }x;
The o? matches at the beginning of 'foo'
, and since the position
in the string is not moved by the match, o? would match again and again
because of the *
quantifier. Another common way to create a similar cycle
is with the looping modifier //g
:
- @matches = ( 'foo' =~ m{ o? }xg );
or
- print "match: <$&>\n" while 'foo' =~ m{ o? }xg;
or the loop implied by split().
However, long experience has shown that many programming tasks may be significantly simplified by using repeated subexpressions that may match zero-length substrings. Here's a simple example being:
- @chars = split //, $string; # // is not magic in split
- ($whitewashed = $string) =~ s/()/ /g; # parens avoid magic s// /
Thus Perl allows such constructs, by forcefully breaking
the infinite loop. The rules for this are different for lower-level
loops given by the greedy quantifiers *+{}
, and for higher-level
ones like the /g modifier or split() operator.
The lower-level loops are interrupted (that is, the loop is broken) when Perl detects that a repeated expression matched a zero-length substring. Thus
- m{ (?: NON_ZERO_LENGTH | ZERO_LENGTH )* }x;
is made equivalent to
- m{ (?: NON_ZERO_LENGTH )* (?: ZERO_LENGTH )? }x;
For example, this program
- #!perl -l
- "aaaaab" =~ /
-   (?:
-      a                 # non-zero
-      |                 # or
-      (?{print "hello"}) # print hello whenever this
-                         #    branch is tried
-      (?=(b))           # zero-width assertion
-   )*  # any number of times
-  /x;
- print $&;
- print $1;
prints
- hello
- aaaaa
- b
Notice that "hello" is only printed once, as when Perl sees that the sixth
iteration of the outermost (?:)*
matches a zero-length string, it stops
the *
.
The higher-level loops preserve an additional state between iterations: whether the last match was zero-length. To break the loop, the following match after a zero-length match is prohibited to have a length of zero. This prohibition interacts with backtracking (see Backtracking), and so the second best match is chosen if the best match is of zero length.
For example:
- $_ = 'bar';
- s/\w??/<$&>/g;
results in <><b><><a><><r><>
. At each position of the string the best
match given by non-greedy ??
is the zero-length match, and the second
best match is what is matched by \w
. Thus zero-length matches
alternate with one-character-long matches.
Similarly, for repeated m/()/g the second-best match is the match at the
position one notch further in the string.
The additional state of being matched with zero-length is associated with
the matched string, and is reset by each assignment to pos().
Zero-length matches at the end of the previous match are ignored
during split.
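A minimal runnable sketch of the loop-breaking rules described above, reusing the document's own examples:

```perl
use strict;
use warnings;

# Zero-length matches are permitted, but the implied loops are broken
# rather than running forever.
my @chars = split //, 'foo';          # // is not magic in split
die unless "@chars" eq 'f o o';

(my $spaced = 'ab') =~ s/()/ /g;      # a space at every position
die unless $spaced eq ' a b ';

(my $s = 'bar') =~ s/\w??/<$&>/g;     # zero-length and one-char matches alternate
die unless $s eq '<><b><><a><><r><>';

print "loop-breaking checks passed\n";
```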
Each of the elementary pieces of regular expressions which were described
before (such as ab or \Z) could match at most one substring
at the given position of the input string. However, in a typical regular
expression these elementary pieces are combined into more complicated
patterns using combining operators ST, S|T, S*, etc.
(in these examples S and T are regular subexpressions).
Such combinations can include alternatives, leading to a problem of choice:
if we match a regular expression a|ab
against "abc"
, will it match
substring "a"
or "ab"
? One way to describe which substring is
actually matched is the concept of backtracking (see Backtracking).
However, this description is too low-level and makes you think
in terms of a particular implementation.
Another description starts with notions of "better"/"worse". All the substrings which may be matched by the given regular expression can be sorted from the "best" match to the "worst" match, and it is the "best" match which is chosen. This replaces the question of "what is chosen?" with the question of "which matches are better, and which are worse?".
Again, for elementary pieces there is no such question, since at most
one match at a given position is possible. This section describes the
notion of better/worse for combining operators. In the description
below S
and T
are regular subexpressions.
ST
Consider two possible matches, AB and A'B', where A and A' are
substrings which can be matched by S, and B and B' are substrings
which can be matched by T.
If A is a better match for S than A', then AB is a better
match than A'B'.
If A and A' coincide: AB is a better match than AB' if
B is a better match for T than B'.
S|T
When S can match, it is a better match than when only T can match.
Ordering of two matches for S is the same as for S. Similarly for
two matches for T.
S{REPEAT_COUNT}
Matches as SSS...S
(repeated as many times as necessary).
S{min,max}
Matches as S{max}|S{max-1}|...|S{min+1}|S{min}.
S{min,max}?
Matches as S{min}|S{min+1}|...|S{max-1}|S{max}.
S?, S*, S+
Same as S{0,1}, S{0,BIG_NUMBER}, S{1,BIG_NUMBER} respectively.
S??, S*?, S+?
Same as S{0,1}?, S{0,BIG_NUMBER}?, S{1,BIG_NUMBER}? respectively.
(?>S)
Matches the best match for S
and only that.
(?=S), (?<=S)
Only the best match for S
is considered. (This is important only if
S
has capturing parentheses, and backreferences are used somewhere
else in the whole regular expression.)
(?!S), (?<!S)
For this grouping operator there is no need to describe the ordering, since
only whether or not S
can match is important.
(??{ EXPR })
, (?PARNO)
The ordering is the same as for the regular expression which is the result of EXPR, or the pattern contained by capture group PARNO.
(?(condition)yes-pattern|no-pattern)
Recall that which of yes-pattern
or no-pattern actually matches is
already determined. The ordering of the matches is the same as for the
chosen subexpression.
The above recipes describe the ordering of matches at a given position. One more rule is needed to understand how a match is determined for the whole regular expression: a match at an earlier position is always better than a match at a later position.
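These ordering rules can be observed directly (a small sketch; the strings are arbitrary):

```perl
use strict;
use warnings;

# Greedy S{min,max} prefers the longest match; S{min,max}? the shortest.
'aaa' =~ /^(a{1,3})/  or die;
die unless $1 eq 'aaa';
'aaa' =~ /^(a{1,3}?)/ or die;
die unless $1 eq 'a';

# A match at an earlier position always beats one at a later position,
# even for a later-listed alternative.
'abc' =~ /(b|a)/ or die;
die unless $1 eq 'a';

print "ordering checks passed\n";
```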
As of Perl 5.10.0, one can create custom regular expression engines. This is not for the faint of heart, as they have to plug in at the C level. See perlreapi for more details.
As an alternative, overloaded constants (see overload) provide a simple way to extend the functionality of the RE engine, by substituting one pattern for another.
Suppose that we want to enable a new RE escape-sequence \Y|
which
matches at a boundary between whitespace characters and non-whitespace
characters. Note that (?=\S)(?<!\S)|(?!\S)(?<=\S)
matches exactly
at these positions, so we want to have each \Y|
in the place of the
more complicated version. We can create a module customre
to do
this:
- package customre;
- use overload;
- sub import {
- shift;
- die "No argument to customre::import allowed" if @_;
- overload::constant 'qr' => \&convert;
- }
- sub invalid { die "/$_[0]/: invalid escape '\\$_[1]'"}
- # We must also take care of not escaping the legitimate \\Y|
- # sequence, hence the presence of '\\' in the conversion rules.
- my %rules = ( '\\' => '\\\\',
- 'Y|' => qr/(?=\S)(?<!\S)|(?!\S)(?<=\S)/ );
- sub convert {
- my $re = shift;
- $re =~ s{
- \\ ( \\ | Y . )
- }
- { $rules{$1} or invalid($re,$1) }sgex;
- return $re;
- }
Now use customre
enables the new escape in constant regular
expressions, i.e., those without any runtime variable interpolations.
As documented in overload, this conversion will work only over
literal parts of regular expressions. For \Y|$re\Y|
the variable
part of this regular expression needs to be converted explicitly
(but only if the special meaning of \Y|
should be enabled inside $re):
- use customre;
- $re = <>;
- chomp $re;
- $re = customre::convert $re;
- /\Y|$re\Y|/;
As of Perl 5.10.0, Perl supports several Python/PCRE-specific extensions to the regex syntax. While Perl programmers are encouraged to use the Perl-specific syntax, the following are also accepted:
(?P<NAME>pattern)
Define a named capture group. Equivalent to (?<NAME>pattern).
(?P=NAME)
Backreference to a named capture group. Equivalent to \g{NAME}
.
(?P>NAME)
Subroutine call to a named capture group. Equivalent to (?&NAME).
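The three forms can be exercised together (Perl 5.10 or later; the names and strings are arbitrary illustrations):

```perl
use strict;
use warnings;

# (?P<NAME>...) defines a named group, available via %+.
'hello world' =~ /(?P<first>\w+)\s+(?P<second>\w+)/ or die;
die unless $+{first} eq 'hello' && $+{second} eq 'world';

# (?P=NAME) is a backreference to what the group captured.
'abcabc' =~ /^(?P<x>abc)(?P=x)$/ or die "backreference failed";

# (?P>NAME) re-runs the group's pattern as a subroutine call.
'abcabc' =~ /^(?P<x>abc)(?P>x)$/ or die "subroutine call failed";

print "PCRE-syntax checks passed\n";
```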
Many regular expression constructs don't work on EBCDIC platforms.
There are a number of issues with regard to case-insensitive matching
in Unicode rules. See i
under Modifiers above.
This document varies from difficult to understand to completely and utterly opaque. The wandering prose riddled with jargon is hard to fathom in several places.
This document needs a rewrite that separates the tutorial content from the reference content.
Regexp Quote-Like Operators in perlop.
Gory details of parsing quoted constructs in perlop.
pos.
Mastering Regular Expressions by Jeffrey Friedl, published by O'Reilly and Associates.
perlreapi - Perl regular expression plugin interface
As of Perl 5.9.5 there is a new interface for plugging and using regular expression engines other than the default one.
Each engine is supposed to provide access to a constant structure of the following format:
- typedef struct regexp_engine {
- REGEXP* (*comp) (pTHX_
- const SV * const pattern, const U32 flags);
- I32 (*exec) (pTHX_
- REGEXP * const rx,
- char* stringarg,
- char* strend, char* strbeg,
- I32 minend, SV* screamer,
- void* data, U32 flags);
- char* (*intuit) (pTHX_
- REGEXP * const rx, SV *sv,
- char *strpos, char *strend, U32 flags,
- struct re_scream_pos_data_s *data);
- SV* (*checkstr) (pTHX_ REGEXP * const rx);
- void (*free) (pTHX_ REGEXP * const rx);
- void (*numbered_buff_FETCH) (pTHX_
- REGEXP * const rx,
- const I32 paren,
- SV * const sv);
- void (*numbered_buff_STORE) (pTHX_
- REGEXP * const rx,
- const I32 paren,
- SV const * const value);
- I32 (*numbered_buff_LENGTH) (pTHX_
- REGEXP * const rx,
- const SV * const sv,
- const I32 paren);
- SV* (*named_buff) (pTHX_
- REGEXP * const rx,
- SV * const key,
- SV * const value,
- U32 flags);
- SV* (*named_buff_iter) (pTHX_
- REGEXP * const rx,
- const SV * const lastkey,
- const U32 flags);
- SV* (*qr_package)(pTHX_ REGEXP * const rx);
- #ifdef USE_ITHREADS
- void* (*dupe) (pTHX_ REGEXP * const rx, CLONE_PARAMS *param);
- #endif
- REGEXP* (*op_comp) (...);
- } regexp_engine;
When a regexp is compiled, its engine
field is then set to point at
the appropriate structure, so that when it needs to be used Perl can find
the right routines to do so.
In order to install a new regexp handler, $^H{regcomp}
is set
to an integer which (when cast appropriately) resolves to one of these
structures. When compiling, the comp
method is executed, and the
resulting regexp
structure's engine field is expected to point back at
the same structure.
The pTHX_ symbol in the definition is a macro used by Perl under threading to provide an extra argument to the routine holding a pointer back to the interpreter that is executing the regexp. So under threading all routines get an extra argument.
- REGEXP* comp(pTHX_ const SV * const pattern, const U32 flags);
Compile the pattern stored in pattern
using the given flags
and
return a pointer to a prepared REGEXP
structure that can perform
the match. See The REGEXP structure below for an explanation of
the individual fields in the REGEXP struct.
The pattern
parameter is the scalar that was used as the
pattern. Previous versions of Perl would pass two char*
indicating
the start and end of the stringified pattern; the following snippet can
be used to get the old parameters:
- STRLEN plen;
- char*  exp = SvPV(pattern, plen);
Since any scalar can be passed as a pattern, it's possible to implement
an engine that does something with an array ("ook" =~ [ qw/ eek
hlagh / ]
) or with the non-stringified form of a compiled regular
expression ("ook" =~ qr/eek/
). Perl's own engine will always
stringify everything using the snippet above, but that doesn't mean
other engines have to.
The flags
parameter is a bitfield which indicates which of the
msixp
flags the regex was compiled with. It also contains
additional info, such as if use locale
is in effect.
The eogc
flags are stripped out before being passed to the comp
routine. The regex engine does not need to know if any of these
are set, as those flags should only affect what Perl does with the
pattern and its match variables, not how it gets compiled and
executed.
By the time the comp callback is called, some of these flags have
already had effect (noted below where applicable). However most of
their effect occurs after the comp callback has run, in routines that
read the rx->extflags
field which it populates.
In general the flags should be preserved in rx->extflags
after
compilation, although the regex engine might want to add or delete
some of them to invoke or disable some special behavior in Perl. The
flags along with any special behavior they cause are documented below:
The pattern modifiers:
/m - RXf_PMf_MULTILINE
If this is in rx->extflags
it will be passed to
Perl_fbm_instr
by pp_split
which will treat the subject string
as a multi-line string.
/s - RXf_PMf_SINGLELINE
/i - RXf_PMf_FOLD
/x - RXf_PMf_EXTENDED
If present on a regex, "#"
comments will be handled differently by the
tokenizer in some cases.
TODO: Document those cases.
/p - RXf_PMf_KEEPCOPY
TODO: Document this
The character set semantics are determined by an enum that is contained
in this field. This is still experimental and subject to change, but
the current interface returns the rules by use of the in-line function
get_regex_charset(const U32 flags)
. The only currently documented
value returned from it is REGEX_LOCALE_CHARSET, which is set if
use locale
is in effect. If present in rx->extflags
,
split will use the locale dependent definition of whitespace
when RXf_SKIPWHITE or RXf_WHITE is in effect. ASCII whitespace
is defined as per isSPACE, and by the internal
macros is_utf8_space
under UTF-8, and isSPACE_LC
under use
locale
.
Additional flags:
RXf_SPLIT
This flag was removed in perl 5.18.0. split ' ' is now special-cased
is now special-cased
solely in the parser. RXf_SPLIT is still #defined, so you can test for it.
This is how it used to work:
If split is invoked as split ' '
or with no arguments (which
really means split(' ', $_)
, see split), Perl will
set this flag. The regex engine can then check for it and set the
SKIPWHITE and WHITE extflags. To do this, the Perl engine does:
- if (flags & RXf_SPLIT && r->prelen == 1 && r->precomp[0] == ' ')
- r->extflags |= (RXf_SKIPWHITE|RXf_WHITE);
These flags can be set during compilation to enable optimizations in
the split operator.
RXf_SKIPWHITE
This flag was removed in perl 5.18.0. It is still #defined, so you can set it, but doing so will have no effect. This is how it used to work:
If the flag is present in rx->extflags
split will delete
whitespace from the start of the subject string before it's operated
on. What is considered whitespace depends on if the subject is a
UTF-8 string and if the RXf_PMf_LOCALE
flag is set.
If RXf_WHITE is set in addition to this flag, split will behave like
split " "
under the Perl engine.
RXf_START_ONLY
Tells the split operator to split the target string on newlines
(\n
) without invoking the regex engine.
Perl's engine sets this if the pattern is /^/
(plen == 1 && *exp
== '^'
), even under /^/s
; see split. Of course a
different regex engine might want to use the same optimizations
with a different syntax.
RXf_WHITE
Tells the split operator to split the target string on whitespace without invoking the regex engine. The definition of whitespace varies depending on whether the target string is a UTF-8 string and on whether RXf_PMf_LOCALE is set.
Perl's engine sets this flag if the pattern is \s+.
Tells the split operator to split the target string on characters. The definition of character varies depending on if the target string is a UTF-8 string.
Perl's engine sets this flag on empty patterns, this optimization
makes split //
much faster than it would otherwise be. It's even
faster than unpack.
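From the Perl level, the effect of this optimization can be seen with an empty-pattern split (a minimal sketch; the speed comparison is the document's claim, not measured here):

```perl
# split // explodes a string into individual characters without
# invoking the regex engine (the RXf_NULL fast path).
my @chars = split //, "Perl";
print "@chars\n";            # P e r l

# unpack can produce the same list, but (per the text above) the
# split // path is faster.
my @chars2 = unpack "(a)*", "Perl";
```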
Added in perl 5.18.0, this flag indicates that a regular expression might perform an operation that would interfere with in-place substitution. For instance it might contain lookbehind, or assign to non-magical variables (such as $REGMARK and $REGERROR) during matching. s/// will skip certain optimisations when this is set.
- I32 exec(pTHX_ REGEXP * const rx,
- char *stringarg, char* strend, char* strbeg,
- I32 minend, SV* screamer,
- void* data, U32 flags);
Execute a regexp. The arguments are
The regular expression to execute.
This strangely-named arg is the SV to be matched against. Note that the
actual char array to be matched against is supplied by the arguments
described below; the SV is just used to determine UTF8ness, pos() etc.
Pointer to the physical start of the string.
Pointer to the character following the physical end of the string (i.e. the \0).
Pointer to the position in the string where matching should start; it might not be equal to strbeg (for example in a later iteration of /.../g).
Minimum length of string (measured in bytes from stringarg
) that must
match; if the engine reaches the end of the match but hasn't reached this
position in the string, it should fail.
Optimisation data; subject to change.
Optimisation flags; subject to change.
- char* intuit(pTHX_ REGEXP * const rx,
- SV *sv, char *strpos, char *strend,
- const U32 flags, struct re_scream_pos_data_s *data);
Find the start position where a regex match should be attempted,
or possibly if the regex engine should not be run because the
pattern can't match. This is called, as appropriate, by the core,
depending on the values of the extflags
member of the regexp
structure.
- SV* checkstr(pTHX_ REGEXP * const rx);
Return a SV containing a string that must appear in the pattern. Used
by split for optimising matches.
- void free(pTHX_ REGEXP * const rx);
Called by Perl when it is freeing a regexp pattern so that the engine
can release any resources pointed to by the pprivate
member of the
regexp
structure. This is only responsible for freeing private data;
Perl will handle releasing anything else contained in the regexp
structure.
Called to get/set the value of $`, $', $& and their named equivalents, ${^PREMATCH}, ${^POSTMATCH} and ${^MATCH}, as well as the numbered capture groups ($1, $2, ...).
The paren parameter will be 1 for $1, 2 for $2 and so forth, and have these symbolic values for the special variables:
- ${^PREMATCH} RX_BUFF_IDX_CARET_PREMATCH
- ${^POSTMATCH} RX_BUFF_IDX_CARET_POSTMATCH
- ${^MATCH} RX_BUFF_IDX_CARET_FULLMATCH
- $` RX_BUFF_IDX_PREMATCH
- $' RX_BUFF_IDX_POSTMATCH
- $& RX_BUFF_IDX_FULLMATCH
Note that in Perl 5.17.3 and earlier, the last three constants were also used for the caret variants of the variables.
The names have been chosen by analogy with Tie::Scalar method names, with an additional LENGTH callback for efficiency. However, named capture variables are currently not tied internally but implemented via magic.
- void numbered_buff_FETCH(pTHX_ REGEXP * const rx, const I32 paren,
- SV * const sv);
Fetch a specified numbered capture. sv should be set to the scalar to return; the scalar is passed as an argument rather than being returned from the function because when it's called Perl already has a scalar to store the value, and creating another one would be redundant. The scalar can be set with sv_setsv, sv_setpvn and friends; see perlapi.
This callback is where Perl untaints its own capture variables under
taint mode (see perlsec). See the Perl_reg_numbered_buff_fetch
function in regcomp.c for how to untaint capture variables if
that's something you'd like your engine to do as well.
- void (*numbered_buff_STORE) (pTHX_
- REGEXP * const rx,
- const I32 paren,
- SV const * const value);
Set the value of a numbered capture variable. value is the scalar that is to be used as the new value. It's up to the engine to make sure this is used as the new value (or to reject it).
Example:
- if ("ook" =~ /(o*)/) {
- # 'paren' will be '1' and 'value' will be 'ee'
- $1 =~ tr/o/e/;
- }
Perl's own engine will croak on any attempt to modify the capture variables; to do this in another engine, use the following callback (copied from Perl_reg_numbered_buff_store):
- void
- Example_reg_numbered_buff_store(pTHX_
- REGEXP * const rx,
- const I32 paren,
- SV const * const value)
- {
- PERL_UNUSED_ARG(rx);
- PERL_UNUSED_ARG(paren);
- PERL_UNUSED_ARG(value);
- if (!PL_localizing)
- Perl_croak(aTHX_ PL_no_modify);
- }
Actually Perl will not always croak in a statement that looks like it would modify a numbered capture variable. This is because the STORE callback will not be called if Perl can determine that it doesn't have to modify the value. This is exactly how tied variables behave in the same situation:
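A tied-scalar sketch of that situation (the CaptureVar class and its method bodies are illustrative, not from the Perl core):

```perl
package CaptureVar;
sub TIESCALAR { bless {}, shift }
sub FETCH     { undef }                      # the "capture" is undef
sub STORE     { die "STORE was called\n" }   # would reject the change

package main;
tie my $sv, 'CaptureVar';
$sv =~ y/a/b/;    # does not die: FETCH returns undef, the y/// has
                  # nothing to transliterate, so STORE is never called
```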
Because $sv is undef when the y/// operator is applied to it, the transliteration won't actually execute and the program won't die. This is different from how 5.8 and earlier versions behaved, since the capture variables were READONLY variables then; now they'll just die when assigned to in the default engine.
- I32 numbered_buff_LENGTH (pTHX_
- REGEXP * const rx,
- const SV * const sv,
- const I32 paren);
Get the length of a capture variable. There's a special callback for this so that Perl doesn't have to do a FETCH and run length on the result. Since the length is (in Perl's case) known from an offset stored in rx->offs, this is much more efficient:
- I32 s1 = rx->offs[paren].start;
- I32 s2 = rx->offs[paren].end;
- I32 len = s2 - s1;
This is a little bit more complex in the case of UTF-8, see what
Perl_reg_numbered_buff_length
does with
is_utf8_string_loclen.
Called to get/set the value of %+ and %-, as well as by some utility functions in re.
There are two callbacks: named_buff is called in all the cases the FETCH, STORE, DELETE, CLEAR, EXISTS and SCALAR Tie::Hash callbacks would be on changes to %+ and %-, and named_buff_iter in the same cases as FIRSTKEY and NEXTKEY.
The flags
parameter can be used to determine which of these
operations the callbacks should respond to. The following flags are
currently defined:
Which Tie::Hash operation is being performed from the Perl level on %+ or %-, if any:
- RXapif_FETCH
- RXapif_STORE
- RXapif_DELETE
- RXapif_CLEAR
- RXapif_EXISTS
- RXapif_SCALAR
- RXapif_FIRSTKEY
- RXapif_NEXTKEY
Whether %+ or %- is being operated on, if any.
- RXapif_ONE /* %+ */
- RXapif_ALL /* %- */
Whether this is being called as re::regname, re::regnames or re::regnames_count, if any. The first two will be combined with RXapif_ONE or RXapif_ALL.
- RXapif_REGNAME
- RXapif_REGNAMES
- RXapif_REGNAMES_COUNT
Internally %+ and %- are implemented with a real tied interface via Tie::Hash::NamedCapture. The methods in that package will call back into these functions. However, the usage of Tie::Hash::NamedCapture for this purpose might change in future releases. For instance, this might be implemented by magic instead (which would need an extension to mgvtbl).
- SV* (*named_buff) (pTHX_ REGEXP * const rx, SV * const key,
- SV * const value, U32 flags);
- SV* (*named_buff_iter) (pTHX_
- REGEXP * const rx,
- const SV * const lastkey,
- const U32 flags);
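From the Perl level, the data these callbacks expose looks like this (a sketch modelled on the %- example in perlvar):

```perl
if ('1234' =~ /(?<A>1)(?<B>2)(?<A>3)(?<B>4)/) {
    print $+{A};        # "1" - the leftmost defined buffer named A
    print $-{A}[0];     # "1" - all buffers named A, in order
    print $-{A}[1];     # "3"
}
```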
- SV* qr_package(pTHX_ REGEXP * const rx);
The package the qr// magic object is blessed into (as seen by ref qr//). It is recommended that engines change this to their package name for identification purposes, regardless of whether they implement methods on the object.
The package this method returns should also have the internal Regexp package in its @ISA; qr//->isa("Regexp") should always be true regardless of what engine is being used.
Example implementation might be:
- SV*
- Example_qr_package(pTHX_ REGEXP * const rx)
- {
- PERL_UNUSED_ARG(rx);
- return newSVpvs("re::engine::Example");
- }
Any method calls on an object created with qr// will be dispatched to the
package as a normal object.
To retrieve the REGEXP
object from the scalar in an XS function use
the SvRX
macro, see REGEXP Functions in perlapi.
- void meth(SV * rv)
- PPCODE:
- REGEXP * re = SvRX(rv);
- void* dupe(pTHX_ REGEXP * const rx, CLONE_PARAMS *param);
On threaded builds a regexp may need to be duplicated so that the pattern
can be used by multiple threads. This routine is expected to handle the
duplication of any private data pointed to by the pprivate
member of
the regexp
structure. It will be called with the preconstructed new
regexp
structure as an argument, the pprivate
member will point at
the old private structure, and it is this routine's responsibility to construct a copy and return a pointer to it (which Perl will then use to overwrite the field as passed to this routine).
This allows the engine to dupe its private data, and also, if it really must, to modify the final structure.
On unthreaded builds this field doesn't exist.
This is private to the Perl core and subject to change. Should be left null.
The REGEXP struct is defined in regexp.h. All regex engines must be able to correctly build such a structure in their comp routine.
The REGEXP structure contains all the data that Perl needs to be aware of to properly work with the regular expression. It includes data about optimisations that Perl can use to determine if the regex engine should really be used, and various other control info that is needed to properly execute patterns in various contexts, such as if the pattern anchored in some way, or what flags were used during the compile, or if the program contains special constructs that Perl needs to be aware of.
In addition it contains two fields that are intended for the private
use of the regex engine that compiled the pattern. These are the
intflags
and pprivate
members. pprivate
is a void pointer to
an arbitrary structure, whose use and management is the responsibility
of the compiling engine. Perl will never modify either of these
values.
- typedef struct regexp {
- /* what engine created this regexp? */
- const struct regexp_engine* engine;
- /* what re is this a lightweight copy of? */
- struct regexp* mother_re;
- /* Information about the match that the Perl core uses to manage
- * things */
- U32 extflags; /* Flags used both externally and internally */
- I32 minlen; /* minimum possible number of chars in
- string to match */
- I32 minlenret; /* minimum possible number of chars in $& */
- U32 gofs; /* chars left of pos that we search from */
- /* substring data about strings that must appear
- in the final match, used for optimisations */
- struct reg_substr_data *substrs;
- U32 nparens; /* number of capture groups */
- /* private engine specific data */
- U32 intflags; /* Engine Specific Internal flags */
- void *pprivate; /* Data private to the regex engine which
- created this object. */
- /* Data about the last/current match. These are modified during
- * matching*/
- U32 lastparen; /* highest close paren matched ($+) */
- U32 lastcloseparen; /* last close paren matched ($^N) */
- regexp_paren_pair *swap; /* Swap copy of *offs */
- regexp_paren_pair *offs; /* Array of offsets for (@-) and
- (@+) */
- char *subbeg; /* saved or original string so \digit works
- forever. */
- SV_SAVED_COPY /* If non-NULL, SV which is COW from original */
- I32 sublen; /* Length of string pointed by subbeg */
- I32 suboffset; /* byte offset of subbeg from logical start of
- str */
- I32 subcoffset; /* suboffset equiv, but in chars (for @-/@+) */
- /* Information about the match that isn't often used */
- I32 prelen; /* length of precomp */
- const char *precomp; /* pre-compilation regular expression */
- char *wrapped; /* wrapped version of the pattern */
- I32 wraplen; /* length of wrapped */
- I32 seen_evals; /* number of eval groups in the pattern - for
- security checks */
- HV *paren_names; /* Optional hash of paren names */
- I32 refcnt; /* Refcount of this regexp */
- } regexp;
The fields are discussed in more detail below:
engine
This field points at a regexp_engine
structure which contains pointers
to the subroutines that are to be used for performing a match. It
is the compiling routine's responsibility to populate this field before
returning the regexp object.
Internally this is set to NULL unless a custom engine is specified in $^H{regcomp}; Perl's own set of callbacks can be accessed in the struct pointed to by RE_ENGINE_PTR.
mother_re
TODO, see http://www.mail-archive.com/perl5-changes@perl.org/msg17328.html
extflags
This will be used by Perl to see what flags the regexp was compiled with; this will normally be set to the value of the flags parameter by the comp callback. See the comp documentation for valid flags.
minlen
minlenret
The minimum string length (in characters) required for the pattern to match. This is used to prune the search space by not bothering to match any closer to the end of a string than would allow a match. For instance there is no point in even starting the regex engine if the minlen is 10 but the string is only 5 characters long. There is no way that the pattern can match.
minlenret
is the minimum length (in characters) of the string that would
be found in $& after a match.
The difference between minlen
and minlenret
can be seen in the
following pattern:
- /ns(?=\d)/
where the minlen
would be 3 but minlenret
would only be 2 as the \d is
required to match but is not actually
included in the matched content. This
distinction is particularly important as the substitution logic uses the
minlenret
to tell if it can do in-place substitutions (these can
result in considerable speed-up).
gofs
Left offset from pos() to start match at.
substrs
Substring data about strings that must appear in the final match. This is currently only used internally by Perl's engine, but might be used in the future for all engines for optimisations.
nparens
, lastparen
, and lastcloseparen
These fields are used to keep track of how many paren groups could be matched in the pattern, which was the last open paren to be entered, and which was the last close paren to be entered.
intflags
The engine's private copy of the flags the pattern was compiled with. Usually
this is the same as extflags
unless the engine chose to modify one of them.
pprivate
A void* pointing to an engine-defined
data structure. The Perl engine uses the
regexp_internal
structure (see Base Structures in perlreguts) but a custom
engine should use something else.
swap
Unused. Left in for compatibility with Perl 5.10.0.
offs
A regexp_paren_pair
structure which defines offsets into the string being
matched which correspond to the $&
and $1
, $2
etc. captures, the
regexp_paren_pair
struct is defined as follows:
- typedef struct regexp_paren_pair {
- I32 start;
- I32 end;
- } regexp_paren_pair;
If ->offs[num].start or ->offs[num].end is -1 then that capture group did not match. ->offs[0].start/end represents $& (or ${^MATCH} under //p) and ->offs[paren].start/end matches $$paren where $paren >= 1.
precomp
prelen
Used for optimisations. precomp
holds a copy of the pattern that
was compiled and prelen
its length. When a new pattern is to be
compiled (such as inside a loop) the internal regcomp
operator
checks if the last compiled REGEXP
's precomp
and prelen
are equivalent to the new one, and if so uses the old pattern instead
of compiling a new one.
The relevant snippet from Perl_pp_regcomp
:
- if (!re || !re->precomp || re->prelen != (I32)len ||
- memNE(re->precomp, t, len))
- /* Compile a new pattern */
paren_names
This is a hash used internally to track named capture groups and their offsets. The keys are the names of the buffers, and the values are dualvars, with the IV slot holding the number of buffers with the given name and the PV being an embedded array of I32. The values may also be contained independently in the data array in cases where named backreferences are used.
substrs
Holds information on the longest string that must occur at a fixed offset from the start of the pattern, and the longest string that must occur at a floating offset from the start of the pattern. Used to do fast Boyer-Moore searches on the string to find out if it's worth using the regex engine at all, and if so where in the string to search.
subbeg
sublen
saved_copy
suboffset
subcoffset
Used during the execution phase for managing search and replace patterns,
and for providing the text for $&
, $1
etc. subbeg
points to a
buffer (either the original string, or a copy in the case of
RX_MATCH_COPIED(rx)
), and sublen
is the length of the buffer. The
RX_OFFS
start and end indices index into this buffer.
In the presence of the REXEC_COPY_STR
flag, but with the addition of
the REXEC_COPY_SKIP_PRE
or REXEC_COPY_SKIP_POST
flags, an engine
can choose not to copy the full buffer (although it must still do so in
the presence of RXf_PMf_KEEPCOPY
or the relevant bits being set in
PL_sawampersand
). In this case, it may set suboffset
to indicate the
number of bytes from the logical start of the buffer to the physical start
(i.e. subbeg
). It should also set subcoffset
, the number of
characters in the offset. The latter is needed to support @-
and @+
which work in characters, not bytes.
wrapped
wraplen
Stores the string qr// stringifies to. The Perl engine for example
stores (?^:eek)
in the case of qr/eek/.
When using a custom engine that doesn't support the (?:) construct for inline modifiers, it's probably best to have qr// stringify to the supplied pattern; note that this will create undesired patterns when such a stringification is interpolated into a larger pattern, since the modifiers the original pattern was compiled with will not be carried along.
There's no solution for this problem other than making the custom
engine understand a construct like (?:).
seen_evals
This stores the number of eval groups in
the pattern. This is used for security
purposes when embedding compiled regexes into larger patterns with qr//.
refcnt
The number of times the structure is referenced. When this falls to 0, the regexp is automatically freed by a call to pregfree. This should be set to 1 in each engine's comp routine.
Originally part of perlreguts.
Originally written by Yves Orton, expanded by Ævar Arnfjörð Bjarmason.
Copyright 2006 Yves Orton and 2007 Ævar Arnfjörð Bjarmason.
This program is free software; you can redistribute it and/or modify it under the same terms as Perl itself.
perlrebackslash - Perl Regular Expression Backslash Sequences and Escapes
The top level documentation about Perl regular expressions is found in perlre.
This document describes all backslash and escape sequences. After explaining the role of the backslash, it lists all the sequences that have a special meaning in Perl regular expressions (in alphabetical order), then describes each of them.
Most sequences are described in detail in different documents; the primary purpose of this document is to have a quick reference guide describing all backslash and escape sequences.
In a regular expression, the backslash can perform one of two tasks:
it either takes away the special meaning of the character following it
(for instance, \| matches a vertical bar, it's not an alternation),
or it is the start of a backslash or escape sequence.
The rules determining what it is are quite simple: if the character following the backslash is an ASCII punctuation (non-word) character (that is, anything that is not a letter, digit, or underscore), then the backslash just takes away any special meaning of the character following it.
If the character following the backslash is an ASCII letter or an ASCII digit, then the sequence may be special; if so, it's listed below. A few letters have not been used yet, so escaping them with a backslash doesn't change them to be special. A future version of Perl may assign a special meaning to them, so if you have warnings turned on, Perl issues a warning if you use such a sequence. [1].
It is, however, guaranteed that no backslash or escape sequence will ever have a punctuation character following the backslash, not now, and not in a future version of Perl 5. So it is safe to put a backslash in front of a non-word character.
Note that the backslash itself is special; if you want to match a backslash,
you have to escape the backslash with a backslash: /\\/
matches a single
backslash.
There is one exception. If you use an alphanumeric character as the delimiter of your pattern (which you probably shouldn't do for readability reasons), you have to escape the delimiter if you want to match it. Perl won't warn then. See also Gory details of parsing quoted constructs in perlop.
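A few lines illustrating these rules (nothing here beyond what the text above states):

```perl
"a|b"  =~ /a\|b/;   # matches: \| is a literal vertical bar, not
                    # an alternation
"a\\b" =~ /\\/;     # matches: /\\/ matches a single backslash
"10%"  =~ /10\%/;   # matches: % is a non-word character, so the
                    # backslash is harmless and simply ignored
```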
Those not usable within a bracketed character class (like [\da-z]
) are marked
as Not in [].
- \000 Octal escape sequence. See also \o{}.
- \1 Absolute backreference. Not in [].
- \a Alarm or bell.
- \A Beginning of string. Not in [].
- \b Word/non-word boundary. (Backspace in []).
- \B Not a word/non-word boundary. Not in [].
- \cX Control-X.
- \C Single octet, even under UTF-8. Not in [].
- \d Character class for digits.
- \D Character class for non-digits.
- \e Escape character.
- \E Turn off \Q, \L and \U processing. Not in [].
- \f Form feed.
- \F Foldcase till \E. Not in [].
- \g{}, \g1 Named, absolute or relative backreference.
- Not in [].
- \G Pos assertion. Not in [].
- \h Character class for horizontal whitespace.
- \H Character class for non horizontal whitespace.
- \k{}, \k<>, \k'' Named backreference. Not in [].
- \K Keep the stuff left of \K. Not in [].
- \l Lowercase next character. Not in [].
- \L Lowercase till \E. Not in [].
- \n (Logical) newline character.
- \N Any character but newline. Not in [].
- \N{} Named or numbered (Unicode) character or sequence.
- \o{} Octal escape sequence.
- \p{}, \pP Character with the given Unicode property.
- \P{}, \PP Character without the given Unicode property.
- \Q Quote (disable) pattern metacharacters till \E. Not
- in [].
- \r Return character.
- \R Generic new line. Not in [].
- \s Character class for whitespace.
- \S Character class for non whitespace.
- \t Tab character.
- \u Titlecase next character. Not in [].
- \U Uppercase till \E. Not in [].
- \v Character class for vertical whitespace.
- \V Character class for non vertical whitespace.
- \w Character class for word characters.
- \W Character class for non-word characters.
- \x{}, \x00 Hexadecimal escape sequence.
- \X Unicode "extended grapheme cluster". Not in [].
- \z End of string. Not in [].
- \Z End of string. Not in [].
A handful of characters have a dedicated character escape. The following table shows them, along with their ASCII code points (in decimal and hex), their ASCII name, the control escape on ASCII platforms and a short description. (For EBCDIC platforms, see OPERATOR DIFFERENCES in perlebcdic.)
\b
is the backspace character only inside a character class. Outside a
character class, \b
is a word/non-word boundary.
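For example (the character-class case uses chr(8), the actual backspace character):

```perl
"cat"    =~ /\bcat\b/;   # matches: \b asserts word boundaries here
"a\010b" =~ /[\b]/;      # matches: inside [], \b is a backspace
                         # (chr(8)), not a boundary assertion
```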
\n
matches a logical newline. Perl converts between \n
and your
OS's native newline character when reading from or writing to text files.
- $str =~ /\t/; # Matches if $str contains a (horizontal) tab.
\c
is used to denote a control character; the character following \c
determines the value of the construct. For example the value of \cA
is
chr(1), and the value of \cb
is chr(2), etc.
The gory details are in Regexp Quote-Like Operators in perlop. A complete
list of what chr(1), etc. means for ASCII and EBCDIC platforms is in
OPERATOR DIFFERENCES in perlebcdic.
Note that \c\
alone at the end of a regular expression (or doubled-quoted
string) is not valid. The backslash must be followed by another character.
That is, \c\X means chr(28) . 'X' for all characters X.
To write platform-independent code, you must use \N{NAME} instead, like
\N{ESCAPE}
or \N{U+001B}
, see charnames.
Mnemonic: control character.
- $str =~ /\cK/; # Matches if $str contains a vertical tab (control-K).
Unicode characters have a Unicode name and numeric code point (ordinal)
value. Use the
\N{}
construct to specify a character by either of these values.
Certain sequences of characters also have names.
To specify by name, the name of the character or character sequence goes between the curly braces.
To specify a character by Unicode code point, use the form \N{U+code
point}, where code point is a number in hexadecimal that gives the
code point that Unicode has assigned to the desired character. It is
customary but not required to use leading zeros to pad the number to 4
digits. Thus \N{U+0041}
means LATIN CAPITAL LETTER A
, and you will
rarely see it written without the two leading zeros. \N{U+0041}
means
"A" even on EBCDIC machines (where the ordinal value of "A" is not 0x41).
It is even possible to give your own names to characters and character sequences. For details, see charnames.
(There is an expanded internal form that you may see in debug output:
\N{U+code point.code point...}.
The ...
means any number of these code points separated by dots.
This represents the sequence formed by the characters. This is an internal
form only, subject to change, and you should not try to use it yourself.)
Mnemonic: Named character.
Note that a character or character sequence expressed as a named or numbered character is considered a character without special meaning by the regex engine, and will match "as is".
- $str =~ /\N{THAI CHARACTER SO SO}/; # Matches the Thai SO SO character
- use charnames 'Cyrillic'; # Loads Cyrillic names.
- $str =~ /\N{ZHE}\N{KA}/; # Match "ZHE" followed by "KA".
There are two forms of octal escapes. Each is used to specify a character by its code point specified in octal notation.
One form, available starting in Perl 5.14, looks like \o{...}, where the dots represent one or more octal digits. It can be used for any Unicode character.
It was introduced to avoid the potential problems with the other form, available in all Perls. That form consists of a backslash followed by three octal digits. One problem with this form is that it can look exactly like an old-style backreference (see Disambiguation rules between old-style octal escapes and backreferences below.) You can avoid this by making the first of the three digits always a zero, but that makes \077 the largest code point specifiable.
In some contexts, a backslash followed by two or even one octal digits may be interpreted as an octal escape, sometimes with a warning, and because of some bugs, sometimes with surprising results. Also, if you are creating a regex out of smaller snippets concatenated together, and you use fewer than three digits, the beginning of one snippet may be interpreted as adding digits to the ending of the snippet before it. See Absolute referencing for more discussion and examples of the snippet problem.
Note that a character expressed as an octal escape is considered a character without special meaning by the regex engine, and will match "as is".
To summarize, the \o{}
form is always safe to use, and the other form is
safe to use for code points through \077 when you use exactly three digits to
specify them.
Mnemonic: 0ctal or octal.
- $str = "Perl";
- $str =~ /\o{120}/; # Match, "\120" is "P".
- $str =~ /\120/; # Same.
- $str =~ /\o{120}+/; # Match, "\120" is "P",
- # it's repeated at least once.
- $str =~ /\120+/; # Same.
- $str =~ /P\053/; # No match, "\053" is "+" and taken literally.
- /\o{23073}/ # Black foreground, white background smiling face.
- /\o{4801234567}/ # Raises a warning, and yields chr(4).
Octal escapes of the \000
form outside of bracketed character classes
potentially clash with old-style backreferences (see Absolute referencing
below). They both consist of a backslash followed by numbers. So Perl has to
use heuristics to determine whether it is a backreference or an octal escape.
Perl uses the following rules to disambiguate:
If the backslash is followed by a single digit, it's a backreference.
If the first digit following the backslash is a 0, it's an octal escape.
If the number following the backslash is N (in decimal), and Perl already has seen N capture groups, Perl considers this a backreference. Otherwise, it considers it an octal escape. If N has more than three digits, Perl takes only the first three for the octal escape; the rest are matched as is.
- my $pat = "(" x 999;
- $pat .= "a";
- $pat .= ")" x 999;
- /^($pat)\1000$/; # Matches 'aa'; there are 1000 capture groups.
- /^$pat\1000$/; # Matches 'a@0'; there are 999 capture groups
- # and \1000 is seen as \100 (a '@') and a '0'.
You can always force a backreference interpretation by using the \g{...} form. You can always force an octal interpretation by using the \o{...} form, or for numbers up through \077 (= 63 decimal), by using three digits beginning with a "0".
Like octal escapes, there are two forms of hexadecimal escapes, but both start
with the same thing, \x
. This is followed by either exactly two hexadecimal
digits forming a number, or a hexadecimal number of arbitrary length surrounded
by curly braces. The hexadecimal number is the code point of the character you
want to express.
Note that a character expressed as one of these escapes is considered a character without special meaning by the regex engine, and will match "as is".
Mnemonic: hexadecimal.
- $str = "Perl";
- $str =~ /\x50/; # Match, "\x50" is "P".
- $str =~ /\x50+/; # Match, "\x50" is "P", it is repeated at least once
- $str =~ /P\x2B/; # No match, "\x2B" is "+" and taken literally.
- /\x{2603}\x{2602}/ # Snowman with an umbrella.
- # The Unicode character 2603 is a snowman,
- # the Unicode character 2602 is an umbrella.
- /\x{263B}/ # Black smiling face.
- /\x{263b}/ # Same, the hex digits A - F are case insensitive.
A number of backslash sequences have to do with changing the character,
or characters following them. \l
will lowercase the character following
it, while \u
will uppercase (or, more accurately, titlecase) the
character following it. They provide functionality similar to the
functions lcfirst and ucfirst.
To uppercase or lowercase several characters, one might want to use
\L
or \U
, which will lowercase/uppercase all characters following
them, until either the end of the pattern or the next occurrence of
\E
, whichever comes first. They provide functionality similar to what
the functions lc and uc provide.
\Q
is used to quote (disable) pattern metacharacters, up to the next
\E
or the end of the pattern. \Q
adds a backslash to any character
that could have special meaning to Perl. In the ASCII range, it quotes
every character that isn't a letter, digit, or underscore. See
quotemeta for details on what gets quoted for non-ASCII
code points. Using this ensures that any character between \Q
and
\E
will be matched literally, not interpreted as a metacharacter by
the regex engine.
\F
can be used to casefold all characters following, up to the next \E
or the end of the pattern. It provides functionality similar to the fc function.
Mnemonic: Lowercase, Uppercase, Fold-case, Quotemeta, End.
- $sid = "sid";
- $greg = "GrEg";
- $miranda = "(Miranda)";
- $str =~ /\u$sid/; # Matches 'Sid'
- $str =~ /\L$greg/; # Matches 'greg'
- $str =~ /\Q$miranda\E/; # Matches '(Miranda)', as if the pattern
- # had been written as /\(Miranda\)/
Perl regular expressions have a large range of character classes. Some of the character classes are written as a backslash sequence. We will briefly discuss those here; full details of character classes can be found in perlrecharclass.
\w
is a character class that matches any single word character
(letters, digits, Unicode marks, and connector punctuation (like the
underscore)). \d
is a character class that matches any decimal
digit, while the character class \s matches any whitespace character.
New in perl 5.10.0 are the classes \h
and \v
which match horizontal
and vertical whitespace characters.
The exact set of characters matched by \d
, \s, and \w
varies
depending on various pragma and regular expression modifiers. It is
possible to restrict the match to the ASCII range by using the /a
regular expression modifier. See perlrecharclass.
The uppercase variants (\W, \D, \S, \H, and \V) are
character classes that match, respectively, any character that isn't a
word character, digit, whitespace, horizontal whitespace, or vertical
whitespace.
Mnemonics: word, digit, space, horizontal, vertical.
\pP
(where P
is a single letter) and \p{Property}
are used to
match a character that matches the given Unicode property; properties
include things like "letter", or "thai character". Capitalizing the
sequence to \PP
and \P{Property}
makes the sequence match a character
that doesn't match the given Unicode property. For more details, see
Backslash sequences in perlrecharclass and
Unicode Character Properties in perlunicode.
Mnemonic: property.
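As a quick illustration (a minimal sketch; these one-liners assume a reasonably modern, Unicode-aware perl):

```perl
# \p{...} matches a character having the given Unicode property;
# \P{...} matches a character lacking it.
"A" =~ /\p{Uppercase}/ or die "expected a match";  # 'A' has the Uppercase property
"a" =~ /\P{Uppercase}/ or die "expected a match";  # 'a' does not
"7" =~ /\p{Nd}/        or die "expected a match";  # Nd: a decimal-number character
```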
If capturing parentheses are used in a regular expression, we can refer to the part of the source string that was matched, and match exactly the same thing. There are three ways of referring to such a backreference: absolutely, relatively, and by name.
Either \gN (starting in Perl 5.10.0), or \N (old-style) where N
is a positive (unsigned) decimal number of any length is an absolute reference
to a capturing group.
N refers to the Nth set of parentheses, so \gN refers to whatever has
been matched by that set of parentheses. Thus \g1
refers to the first
capture group in the regex.
The \gN form can be equivalently written as \g{N}
which avoids ambiguity when building a regex by concatenating shorter
strings. Otherwise if you had a regex qr/$a$b/, and $a
contained
"\g1"
, and $b
contained "37"
, you would get /\g137/
which is
probably not what you intended.
In the \N form, N must not begin with a "0", and there must be at
least N capturing groups, or else N is considered an octal escape
(but something like \18
is the same as \0018
; that is, the octal escape
"\001"
followed by a literal digit "8"
).
Mnemonic: group.
- /(\w+) \g1/; # Finds a duplicated word, (e.g. "cat cat").
- /(\w+) \1/; # Same thing; written old-style.
- /(.)(.)\g2\g1/; # Match a four letter palindrome (e.g. "ABBA").
\g{-N} (starting in Perl 5.10.0) is used for relative addressing. (It can
also be written as \g-N.) It refers to the Nth capture group before the
\g{-N}.
The big advantage of this form is that it makes it much easier to write patterns with references that can be interpolated in larger patterns, even if the larger pattern also contains capture groups.
- /(A) # Group 1
- ( # Group 2
- (B) # Group 3
- \g{-1} # Refers to group 3 (B)
- \g{-3} # Refers to group 1 (A)
- )
- /x; # Matches "ABBA".
- my $qr = qr /(.)(.)\g{-2}\g{-1}/; # Matches 'abab', 'cdcd', etc.
- /$qr$qr/ # Matches 'ababcdcd'.
\g{name} (starting in Perl 5.10.0) can be used to back refer to a
named capture group, dispensing completely with having to think about capture
buffer positions.
To be compatible with .Net regular expressions, \g{name}
may also be
written as \k{name}
, \k<name>
or \k'name'.
To prevent any ambiguity, name must not start with a digit nor contain a hyphen.
- /(?<word>\w+) \g{word}/ # Finds duplicated word, (e.g. "cat cat")
- /(?<word>\w+) \k{word}/ # Same.
- /(?<word>\w+) \k<word>/ # Same.
- /(?<letter1>.)(?<letter2>.)\g{letter2}\g{letter1}/
- # Match a four letter palindrome (e.g. "ABBA")
Assertions are conditions that have to be true; they don't actually match parts of the substring. There are six assertions that are written as backslash sequences.
\A
only matches at the beginning of the string. If the /m modifier
isn't used, then /\A/
is equivalent to /^/
. However, if the /m
modifier is used, then /^/
matches internal newlines, but the meaning
of /\A/
isn't changed by the /m modifier. \A
matches at the beginning
of the string regardless whether the /m modifier is used.
\z
and \Z
match at the end of the string. If the /m modifier isn't
used, then /\Z/
is equivalent to /$/
; that is, it matches at the
end of the string, or one before the newline at the end of the string. If the
/m modifier is used, then /$/
matches at internal newlines, but the
meaning of /\Z/
isn't changed by the /m modifier. \Z
matches at
the end of the string (or just before a trailing newline) regardless whether
the /m modifier is used.
\z
is just like \Z
, except that it does not match before a trailing
newline. \z
matches at the end of the string only, regardless of the
modifiers used, and not just before a newline. It is how to anchor the
match to the true end of the string under all conditions.
\G
is usually used only in combination with the /g modifier. If the
/g modifier is used and the match is done in scalar context, Perl
remembers where in the source string the last match ended, and the next time,
it will start the match from where it ended the previous time.
\G
matches the point where the previous match on that string ended,
or the beginning of that string if there was no previous match.
Mnemonic: Global.
\b
matches at any place between a word and a non-word character; \B
matches at any place between characters where \b
doesn't match. \b
and \B
assume there's a non-word character before the beginning and after
the end of the source string; so \b
will match at the beginning (or end)
of the source string if the source string begins (or ends) with a word
character. Otherwise, \B
will match.
Do not use something like \b=head\d\b and expect it to match the
beginning of a line. It can't, because for there to be a boundary before
the non-word "=", there must be a word character immediately before it.
All boundary determinations look for word characters alone, not for
non-word characters nor for string ends. It may help to understand how
\b and \B work by equating them as follows:
- \b really means (?:(?<=\w)(?!\w)|(?<!\w)(?=\w))
- \B really means (?:(?<=\w)(?=\w)|(?<!\w)(?!\w))
Mnemonic: boundary.
- "cat" =~ /\Acat/; # Match.
- "cat" =~ /cat\Z/; # Match.
- "cat\n" =~ /cat\Z/; # Match.
- "cat\n" =~ /cat\z/; # No match.
- "cat" =~ /\bcat\b/; # Matches.
- "cats" =~ /\bcat\b/; # No match.
- "cat" =~ /\bcat\B/; # No match.
- "cats" =~ /\bcat\B/; # Match.
- while ("cat dog" =~ /(\w+)/g) {
- print $1; # Prints 'catdog'
- }
- while ("cat dog" =~ /\G(\w+)/g) {
- print $1; # Prints 'cat'
- }
Here we document the backslash sequences that don't fall in one of the categories above. These are:
\C
always matches a single octet, even if the source string is encoded
in UTF-8 format, and the character to be matched is a multi-octet character.
This is very dangerous, because it violates
the logical character abstraction and can cause UTF-8 sequences to become malformed.
Mnemonic: oCtet.
\K appeared in perl 5.10.0. Anything matched to the left of \K is
not included in $&
, and will not be replaced if the pattern is
used in a substitution. This lets you write s/PAT1 \K PAT2/REPL/x
instead of s/(PAT1) PAT2/${1}REPL/x
or s/(?<=PAT1) PAT2/REPL/x
.
Mnemonic: Keep.
\N, available starting in v5.12, matches any character
that is not a newline. It is a short-hand for writing [^\n], and is
identical to the . metasymbol, except under the /s flag, which changes
the meaning of ., but not that of \N.
Note that \N{...} can mean a named or numbered character.
Mnemonic: Complement of \n.
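A small sketch of \N against the dot (assuming perl v5.12 or later):

```perl
use v5.12;

"x"  =~ /^\N$/   or die "expected a match";  # any non-newline matches \N
"\n" =~ /^\N$/  and die "unexpected match";  # a newline does not...
"\n" =~ /^.$/s   or die "expected a match";  # ...while dot under /s matches it
```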
\R
matches a generic newline; that is, anything considered a
linebreak sequence by Unicode. This includes all characters matched by
\v
(vertical whitespace), and the multi character sequence "\x0D\x0A"
(carriage return followed by a line feed, sometimes called the network
newline; it's the end of line sequence used in Microsoft text files opened
in binary mode). \R
is equivalent to (?>\x0D\x0A|\v)
. (The
reason it doesn't backtrack is that the sequence is considered
inseparable. That means that
- "\x0D\x0A" =~ /^\R\x0A$/ # No match
fails, because the \R
matches the entire string, and won't backtrack
to match just the "\x0D"
.) Since
\R
can match a sequence of more than one character, it cannot be put
inside a bracketed character class; /[\R]/
is an error; use \v
instead. \R
was introduced in perl 5.10.0.
Note that this does not respect any locale that might be in effect; it matches according to the platform's native character set.
Mnemonic: none really. \R
was picked because PCRE already uses \R
,
and more importantly because Unicode recommends such a regular expression
metacharacter, and suggests \R
as its notation.
\X matches a Unicode extended grapheme cluster.
\X
matches quite well what normal (non-Unicode-programmer) usage
would consider a single character. As an example, consider a G with some sort
of diacritic mark, such as an arrow. There is no such single character in
Unicode, but one can be composed by using a G followed by a Unicode "COMBINING
UPWARDS ARROW BELOW", and would be displayed by Unicode-aware software as if it
were a single character.
Mnemonic: eXtended Unicode character.
- "\x{256}" =~ /^\C\C$/; # Match as chr(0x256) takes
- # 2 octets in UTF-8.
- $str =~ s/foo\Kbar/baz/g; # Change any 'bar' following a 'foo' to 'baz'
- $str =~ s/(.)\K\g1//g; # Delete duplicated characters.
- "\n" =~ /^\R$/; # Match, \n is a generic newline.
- "\r" =~ /^\R$/; # Match, \r is a generic newline.
- "\r\n" =~ /^\R$/; # Match, \r\n is a generic newline.
- "P\x{307}" =~ /^\X$/ # \X matches a P with a dot above.
perlrecharclass - Perl Regular Expression Character Classes
The top level documentation about Perl regular expressions is found in perlre.
This manual page discusses the syntax and use of character classes in Perl regular expressions.
A character class is a way of denoting a set of characters in such a way that one character of the set is matched. It's important to remember that: matching a character class consumes exactly one character in the source string. (The source string is the string the regular expression is matched against.)
There are three types of character classes in Perl regular expressions: the dot, backslash sequences, and the form enclosed in square brackets. Keep in mind, though, that often the term "character class" is used to mean just the bracketed form. Certainly, most Perl documentation does that.
The dot (or period), ., is probably the most used, and certainly
the most well-known character class. By default, a dot matches any
character, except for the newline. That default can be changed to
add matching the newline by using the single line modifier: either
for the entire regular expression with the /s modifier, or
locally with (?s). (The \N
backslash sequence, described
below, matches any character except newline without regard to the
single line modifier.)
Here are some examples:
- "a" =~ /./ # Match
- "." =~ /./ # Match
- "" =~ /./ # No match (dot has to match a character)
- "\n" =~ /./ # No match (dot does not match a newline)
- "\n" =~ /./s # Match (global 'single line' modifier)
- "\n" =~ /(?s:.)/ # Match (local 'single line' modifier)
- "ab" =~ /^.$/ # No match (dot matches one character)
A backslash sequence is a sequence of characters, the first one of which is a backslash. Perl ascribes special meaning to many such sequences, and some of these are character classes. That is, they match a single character each, provided that the character belongs to the specific set of characters defined by the sequence.
Here's a list of the backslash sequences that are character classes. They are discussed in more detail below. (For the backslash sequences that aren't character classes, see perlrebackslash.)
- \d Match a decimal digit character.
- \D Match a non-decimal-digit character.
- \w Match a "word" character.
- \W Match a non-"word" character.
- \s Match a whitespace character.
- \S Match a non-whitespace character.
- \h Match a horizontal whitespace character.
- \H Match a character that isn't horizontal whitespace.
- \v Match a vertical whitespace character.
- \V Match a character that isn't vertical whitespace.
- \N Match a character that isn't a newline.
- \pP, \p{Prop} Match a character that has the given Unicode property.
- \PP, \P{Prop} Match a character that doesn't have the Unicode property
\N
, available starting in v5.12, like the dot, matches any
character that is not a newline. The difference is that \N
is not influenced
by the single line regular expression modifier (see The dot above). Note
that the form \N{...}
may mean something completely different. When the
{...}
is a quantifier, it means to match a non-newline
character that many times. For example, \N{3}
means to match 3
non-newlines; \N{5,}
means to match 5 or more non-newlines. But if {...}
is not a legal quantifier, it is presumed to be a named character. See
charnames for those. For example, none of \N{COLON}
, \N{4F}, and
\N{F4}
contain legal quantifiers, so Perl will try to find characters whose
names are respectively COLON
, 4F, and F4
.
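For instance (a minimal sketch, assuming perl v5.12 or later for \N):

```perl
use v5.12;
use charnames ':full';   # enables \N{NAME} lookups on older perls

"abc"  =~ /^\N{3}$/     or die "expected a match";  # {3} is a quantifier: three non-newlines
"ab\n" =~ /^\N{3}$/    and die "unexpected match";  # a newline can't be one of the three
":"    =~ /^\N{COLON}$/ or die "expected a match";  # {COLON} is no quantifier: the named ':' character
```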
\d
matches a single character considered to be a decimal digit.
If the /a
regular expression modifier is in effect, it matches [0-9].
Otherwise, it
matches anything that is matched by \p{Digit}
, which includes [0-9].
(An unlikely possible exception is that under locale matching rules, the
current locale might not have [0-9] matched by \d
, and/or might match
other characters whose code point is less than 256. Such a locale
definition would be in violation of the C language standard, but Perl
doesn't currently assume anything in regard to this.)
What this means is that unless the /a modifier is in effect, \d not
only matches the digits '0' - '9', but also Arabic, Devanagari, and
digits from other languages. This may cause some confusion, and some
security issues.
Some digits that \d
matches look like some of the [0-9] ones, but
have different values. For example, BENGALI DIGIT FOUR (U+09EA) looks
very much like an ASCII DIGIT EIGHT (U+0038). An application that
is expecting only the ASCII digits might be misled, or if the match is
\d+
, the matched string might contain a mixture of digits from
different writing systems that look like they signify a number different
than they actually do. num() in Unicode::UCD can
be used to safely
calculate the value, returning undef if the input string contains
such a mixture.
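A sketch of that check (assuming perl v5.14 or later, where Unicode::UCD provides num()):

```perl
use Unicode::UCD 'num';

# A digit string drawn from a single script has a well-defined value.
my $value = num("123");           # 123

# BENGALI DIGIT FOUR mixed in with ASCII digits: num() refuses to guess.
my $mixed = num("12\x{09EA}");    # undef

die "unexpected" unless $value == 123 && !defined $mixed;
```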
What \p{Digit}
means (and hence \d
except under the /a
modifier) is \p{General_Category=Decimal_Number}
, or synonymously,
\p{General_Category=Digit}
. Starting with Unicode version 4.1, this
is the same set of characters matched by \p{Numeric_Type=Decimal}
.
But Unicode also has a different property with a similar name,
\p{Numeric_Type=Digit}
, which matches a completely different set of
characters. These characters are things such as CIRCLED DIGIT ONE
or subscripts, or are from writing systems that lack all ten digits.
The design intent is for \d
to exactly match the set of characters
that can safely be used with "normal" big-endian positional decimal
syntax, where, for example 123 means one 'hundred', plus two 'tens',
plus three 'ones'. This positional notation does not necessarily apply
to characters that match the other type of "digit",
\p{Numeric_Type=Digit}
, and so \d
doesn't match them.
The Tamil digits (U+0BE6 - U+0BEF) can also legally be used in old-style Tamil numbers in which they would appear no more than one in a row, separated by characters that mean "times 10", "times 100", etc. (See http://www.unicode.org/notes/tn21.)
Any character not matched by \d
is matched by \D
.
A \w
matches a single alphanumeric character (an alphabetic character, or a
decimal digit); or a connecting punctuation character, such as an
underscore ("_"); or a "mark" character (like some sort of accent) that
attaches to one of those. It does not match a whole word. To match a
whole word, use \w+
. This isn't the same thing as matching an
English word, but in the ASCII range it is the same as a string of
Perl-identifier characters.
If the /a modifier is in effect, \w matches the 63 characters
[a-zA-Z0-9_]. Otherwise, for code points above 255, \w matches the same
as \p{Word} matches in this range. That is, it matches Thai letters,
Greek letters, etc. This includes connector punctuation (like the
underscore), which connects two words together, or diacritics, such as a
COMBINING TILDE, and the modifier letters, which are generally used to
add auxiliary markings to letters.
Which rules apply are determined as described in Which character set modifier is in effect? in perlre.
There are a number of security issues with the full Unicode list of word characters. See http://unicode.org/reports/tr36.
Also, for a somewhat finer-grained set of characters that are in programming
language identifiers beyond the ASCII range, you may wish to instead use the
more customized Unicode Properties, \p{ID_Start}
,
\p{ID_Continue}
, \p{XID_Start}
, and \p{XID_Continue}
. See
http://unicode.org/reports/tr31.
Any character not matched by \w
is matched by \W
.
\s matches any single character considered whitespace.
If the /a modifier is in effect, \s matches (in all Perl versions) the 5
characters [\t\n\f\r ]; that is, the horizontal tab, the newline, the
form feed, the carriage return, and the space. Starting in Perl v5.18,
experimentally, it also matches the vertical tab, \cK. See note [1]
below for a discussion of this.
Otherwise, for code points above 255, \s matches exactly the code points
shown with an "s" column in the table below. For code points below 256,
if locale rules are in effect, \s matches whatever the locale considers
to be whitespace; if Unicode rules are in effect, \s matches exactly the
characters shown with an "s" column in the table below; and under the
default rules, \s matches [\t\n\f\r ] and, starting experimentally in
Perl v5.18, the vertical tab, \cK. (See note [1] below for a discussion
of this.) Note that this last list doesn't include the non-breaking
space.
Which rules apply are determined as described in Which character set modifier is in effect? in perlre.
Any character not matched by \s is matched by \S
.
\h
matches any character considered horizontal whitespace;
this includes the platform's space and tab characters and several others
listed in the table below. \H
matches any character
not considered horizontal whitespace. They use the platform's native
character set, and do not consider any locale that may otherwise be in
use.
\v
matches any character considered vertical whitespace;
this includes the platform's carriage return and line feed characters (newline)
plus several other characters, all listed in the table below.
\V
matches any character not considered vertical whitespace.
They use the platform's native character set, and do not consider any
locale that may otherwise be in use.
\R
matches anything that can be considered a newline under Unicode
rules. It's not a character class, as it can match a multi-character
sequence. Therefore, it cannot be used inside a bracketed character
class; use \v
instead (vertical whitespace). It uses the platform's
native character set, and does not consider any locale that may
otherwise be in use.
Details are discussed in perlrebackslash.
Note that unlike \s (and \d
and \w
), \h
and \v
always match
the same characters, without regard to other factors, such as the active
locale or whether the source string is in UTF-8 format.
One might think that \s is equivalent to [\h\v]
. This is indeed true
starting in Perl v5.18, but prior to that, the sole difference was that the
vertical tab ("\cK"
) was not matched by \s.
The following table is a complete listing of characters matched by
\s, \h
and \v
as of Unicode 6.0.
The first column gives the Unicode code point of the character (in hex format),
the second column gives the (Unicode) name. The third column indicates
by which class(es) the character is matched (assuming no locale is in
effect that changes the \s matching).
- 0x0009 CHARACTER TABULATION h s
- 0x000a LINE FEED (LF) vs
- 0x000b LINE TABULATION vs [1]
- 0x000c FORM FEED (FF) vs
- 0x000d CARRIAGE RETURN (CR) vs
- 0x0020 SPACE h s
- 0x0085 NEXT LINE (NEL) vs [2]
- 0x00a0 NO-BREAK SPACE h s [2]
- 0x1680 OGHAM SPACE MARK h s
- 0x180e MONGOLIAN VOWEL SEPARATOR h s
- 0x2000 EN QUAD h s
- 0x2001 EM QUAD h s
- 0x2002 EN SPACE h s
- 0x2003 EM SPACE h s
- 0x2004 THREE-PER-EM SPACE h s
- 0x2005 FOUR-PER-EM SPACE h s
- 0x2006 SIX-PER-EM SPACE h s
- 0x2007 FIGURE SPACE h s
- 0x2008 PUNCTUATION SPACE h s
- 0x2009 THIN SPACE h s
- 0x200a HAIR SPACE h s
- 0x2028 LINE SEPARATOR vs
- 0x2029 PARAGRAPH SEPARATOR vs
- 0x202f NARROW NO-BREAK SPACE h s
- 0x205f MEDIUM MATHEMATICAL SPACE h s
- 0x3000 IDEOGRAPHIC SPACE h s
Prior to Perl v5.18, \s did not match the vertical tab. The change
in v5.18 is considered an experiment, which means it could be backed out
in v5.20 or v5.22 if experience indicates that it breaks too much
existing code. If this change adversely affects you, send email to
perlbug@perl.org
; if it affects you positively, email
perlthanks@perl.org
. In the meantime, [^\S\cK]
(obscurely)
matches what \s traditionally did.
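To see that [^\S\cK] behaves the same on either side of the v5.18 change (a minimal sketch):

```perl
# [^\S\cK] matches anything that is neither non-whitespace (\S) nor the
# vertical tab, i.e. traditional whitespace, in every perl version.
" "   =~ /^[^\S\cK]$/  or die "expected a match";
"\t"  =~ /^[^\S\cK]$/  or die "expected a match";
"\cK" =~ /^[^\S\cK]$/ and die "unexpected match";  # \cK is excluded explicitly
"x"   =~ /^[^\S\cK]$/ and die "unexpected match";  # 'x' is \S, so excluded too
```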
NEXT LINE and NO-BREAK SPACE may or may not match \s depending
on the rules in effect. See
the beginning of this section.
\pP
and \p{Prop}
are character classes to match characters that fit given
Unicode properties. One-letter property names can be used in the \pP
form,
with the property name following the \p
, otherwise, braces are required.
When using braces, there is a single form, which is just the property name
enclosed in the braces, and a compound form which looks like \p{name=value}
,
which means to match if the property "name" for the character has that particular
"value".
For instance, a match for a number can be written as /\pN/
or as
/\p{Number}/
, or as /\p{Number=True}/
.
Lowercase letters are matched by the property Lowercase_Letter which
has the short form Ll. They need the braces, so are written as /\p{Ll}/
or
/\p{Lowercase_Letter}/
, or /\p{General_Category=Lowercase_Letter}/
(the underscores are optional).
/\pLl/
is valid, but means something different.
It matches a two character string: a letter (Unicode property \pL
),
followed by a lowercase l
.
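For example (a small sketch of why the braces matter):

```perl
"a"  =~ /^\p{Ll}$/ or die "expected a match";  # one lowercase letter
"al" =~ /^\pLl$/   or die "expected a match";  # a letter ('a'), then a literal 'l'
"aL" =~ /^\pLl$/  and die "unexpected match";  # 'L' is not a literal 'l'
```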
If locale rules are not in effect, the use of a Unicode property will force the regular expression into using Unicode rules, if it isn't already.
Note that almost all properties are immune to case-insensitive matching.
That is, adding a /i regular expression modifier does not change what
they match. There are two sets that are affected. The first set is
Uppercase_Letter, Lowercase_Letter, and Titlecase_Letter, all of which
match Cased_Letter under /i matching. The second set is Uppercase,
Lowercase, and Titlecase, all of which match Cased under /i matching.
(The difference between these sets is that some things, such as Roman
numerals, come in both upper and lower case, so they are Cased
, but
aren't considered to be letters, so they aren't Cased_Letter
s. They're
actually Letter_Number
s.)
This set also includes its subsets PosixUpper
and PosixLower
, both
of which under /i match PosixAlpha
.
For more details on Unicode properties, see Unicode Character Properties in perlunicode; for a
complete list of possible properties, see
Properties accessible through \p{} and \P{} in perluniprops,
which notes all forms that have /i differences.
It is also possible to define your own properties. This is discussed in
User-Defined Character Properties in perlunicode.
Unicode properties are defined (surprise!) only on Unicode code points. A warning is raised and all matches fail on non-Unicode code points (those above the legal Unicode maximum of 0x10FFFF). This can be somewhat surprising:
- chr(0x110000) =~ /\p{ASCII_Hex_Digit=True}/ # Fails.
- chr(0x110000) =~ /\p{ASCII_Hex_Digit=False}/ # Also fails!
Even though these two matches might be thought of as complements, they are so only on Unicode code points.
- "a" =~ /\w/ # Match, "a" is a 'word' character.
- "7" =~ /\w/ # Match, "7" is a 'word' character as well.
- "a" =~ /\d/ # No match, "a" isn't a digit.
- "7" =~ /\d/ # Match, "7" is a digit.
- " " =~ /\s/ # Match, a space is whitespace.
- "a" =~ /\D/ # Match, "a" is a non-digit.
- "7" =~ /\D/ # No match, "7" is not a non-digit.
- " " =~ /\S/ # No match, a space is not non-whitespace.
- " " =~ /\h/ # Match, space is horizontal whitespace.
- " " =~ /\v/ # No match, space is not vertical whitespace.
- "\r" =~ /\v/ # Match, a return is vertical whitespace.
- "a" =~ /\pL/ # Match, "a" is a letter.
- "a" =~ /\p{Lu}/ # No match, /\p{Lu}/ matches upper case letters.
- "\x{0e0b}" =~ /\p{Thai}/ # Match, \x{0e0b} is the character
- # 'THAI CHARACTER SO SO', and that's in
- # Thai Unicode class.
- "a" =~ /\P{Lao}/ # Match, as "a" is not a Laotian character.
It is worth emphasizing that \d
, \w
, etc., match single characters, not
complete numbers or words. To match a number (that consists of digits),
use \d+
; to match a word, use \w+
. But be aware of the security
considerations in doing so, as mentioned above.
The third form of character class you can use in Perl regular expressions
is the bracketed character class. In its simplest form, it lists the characters
that may be matched, surrounded by square brackets, like this: [aeiou]
.
This matches one of a
, e
, i
, o
or u
. Like the other
character classes, exactly one character is matched.* To match
a longer string consisting of characters mentioned in the character
class, follow the character class with a quantifier. For
instance, [aeiou]+
matches one or more lowercase English vowels.
Repeating a character in a character class has no effect; it's considered to be in the set only once.
Examples:
- "e" =~ /[aeiou]/ # Match, as "e" is listed in the class.
- "p" =~ /[aeiou]/ # No match, "p" is not listed in the class.
- "ae" =~ /^[aeiou]$/ # No match, a character class only matches
- # a single character.
- "ae" =~ /^[aeiou]+$/ # Match, due to the quantifier.
- -------
* There is an exception to a bracketed character class matching a
single character only. When the class is to match caselessly under /i
matching rules, and a character that is explicitly mentioned inside the
class matches a
multiple-character sequence caselessly under Unicode rules, the class
(when not inverted) will also match that sequence. For
example, Unicode says that the letter LATIN SMALL LETTER SHARP S
should match the sequence ss
under /i rules. Thus,
- 'ss' =~ /\A\N{LATIN SMALL LETTER SHARP S}\z/i # Matches
- 'ss' =~ /\A[aeioust\N{LATIN SMALL LETTER SHARP S}]\z/i # Matches
For this to happen, the character must be explicitly specified, and not be part of a multi-character range (not even as one of its endpoints). (Character Ranges will be explained shortly.) Therefore,
- 'ss' =~ /\A[\0-\x{ff}]\z/i # Doesn't match
- 'ss' =~ /\A[\0-\N{LATIN SMALL LETTER SHARP S}]\z/i # No match
- 'ss' =~ /\A[\xDF-\xDF]\z/i # Matches on ASCII platforms, since \xDF
- # is LATIN SMALL LETTER SHARP S, and the
- # range is just a single element
Note that it isn't a good idea to specify these types of ranges anyway.
Most characters that are meta characters in regular expressions (that
is, characters that carry a special meaning like ., *
, or () lose
their special meaning and can be used inside a character class without
the need to escape them. For instance, [()]
matches either an opening
parenthesis, or a closing parenthesis, and the parens inside the character
class don't group or capture.
Characters that may carry a special meaning inside a character class are
\, ^, -, [, and ], and are discussed below. They can be
escaped with a backslash, although this is sometimes not needed, in which
case the backslash may be omitted.
The sequence \b
is special inside a bracketed character class. While
outside the character class, \b
is an assertion indicating a point
that does not have either two word characters or two non-word characters
on either side, inside a bracketed character class, \b
matches a
backspace character.
The sequences \a, \c, \e, \f, \n, \N{NAME}, \N{U+hex char}, \r, \t, and \x
are also special and have the same meanings as they do outside a
bracketed character class. (However, inside a bracketed character
class, if \N{NAME} expands to a sequence of characters, only the first
one in the sequence is used, with a warning.)
Also, a backslash followed by two or three octal digits is considered an octal number.
A [ is not special inside a character class, unless it's the start of a
POSIX character class (see POSIX Character Classes below). It normally does
not need escaping.
A ] is normally either the end of a POSIX character class (see
POSIX Character Classes below), or it signals the end of the bracketed
character class. If you want to include a ] in the set of characters, you
must generally escape it.
However, if the ] is the first (or the second if the first
character is a caret) character of a bracketed character class, it
does not denote the end of the class (as you cannot have an empty class)
and is considered part of the set of characters that can be matched without
escaping.
Examples:
- "+" =~ /[+?*]/ # Match, "+" in a character class is not special.
- "\cH" =~ /[\b]/ # Match, \b inside a character class
- # is equivalent to a backspace.
- "]" =~ /[][]/ # Match, as the character class contains
- # both [ and ].
- "[]" =~ /[[]]/ # Match, the pattern contains a character class
- # containing just ], and the character class is
- # followed by a ].
It is not uncommon to want to match a range of characters. Luckily, instead
of listing all characters in the range, one may use the hyphen (-
).
If inside a bracketed character class you have two characters separated
by a hyphen, it's treated as if all characters between the two were in
the class. For instance, [0-9]
matches any ASCII digit, and [a-m]
matches any lowercase letter from the first half of the ASCII alphabet.
Note that the two characters on either side of the hyphen are not
necessarily both letters or both digits. Any character is possible,
although not advisable. ['-?] contains a range of characters, but
most people will not know which characters that means. Furthermore,
such ranges may lead to portability problems if the code has to run on
a platform that uses a different character set, such as EBCDIC.
If a hyphen in a character class cannot syntactically be part of a range, for instance because it is the first or the last character of the character class, or if it immediately follows a range, the hyphen isn't special, and so is considered a character to be matched literally. If you want a hyphen in your set of characters to be matched and its position in the class is such that it could be considered part of a range, you must escape that hyphen with a backslash.
Examples:
- [a-z] # Matches a character that is a lower case ASCII letter.
- [a-fz] # Matches any letter between 'a' and 'f' (inclusive) or
- # the letter 'z'.
- [-z] # Matches either a hyphen ('-') or the letter 'z'.
- [a-f-m] # Matches any letter between 'a' and 'f' (inclusive), the
- # hyphen ('-'), or the letter 'm'.
- ['-?] # Matches any of the characters '()*+,-./0123456789:;<=>?
- # (But not on an EBCDIC platform).
It is also possible to instead list the characters you do not want to
match. You can do so by using a caret (^) as the first character in the
character class. For instance, [^a-z] matches any character that is not a
lowercase ASCII letter, which therefore includes more than a million
Unicode code points. The class is said to be "negated" or "inverted".
This syntax makes the caret a special character inside a bracketed character class, but only if it is the first character of the class. So if you want the caret as one of the characters to match, either escape the caret or else don't list it first.
In inverted bracketed character classes, Perl ignores the Unicode rules
that normally say that certain characters should match a sequence of
multiple characters under caseless /i matching. Following those
rules could lead to highly confusing situations:
- "ss" =~ /^[^\xDF]+$/ui; # Matches!
This should match any sequences of characters that aren't \xDF
nor
what \xDF
matches under /i. "s"
isn't \xDF
, but Unicode
says that "ss"
is what \xDF
matches under /i. So which one
"wins"? Do you fail the match because the string has ss
or accept it
because it has an s followed by another s? Perl has chosen the
latter.
Examples:
- "e" =~ /[^aeiou]/ # No match, the 'e' is listed.
- "x" =~ /[^aeiou]/ # Match, as 'x' isn't a lowercase vowel.
- "^" =~ /[^^]/ # No match, matches anything that isn't a caret.
- "^" =~ /[x^]/ # Match, caret is not special here.
You can put any backslash sequence character class (with the exception of \N and \R) inside a bracketed character class, and it will act just as if you had put all characters matched by the backslash sequence inside the character class. For instance, [a-f\d] matches any decimal digit, or any of the lowercase letters between 'a' and 'f' inclusive.
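A quick check of the [a-f\d] example above (a minimal sketch; any one-character strings would do):

```perl
# [a-f\d] matches lowercase 'a'..'f' or any decimal digit
print "3" =~ /[a-f\d]/ ? 1 : 0;   # 1: '3' is a digit
print "b" =~ /[a-f\d]/ ? 1 : 0;   # 1: 'b' is between 'a' and 'f'
print "g" =~ /[a-f\d]/ ? 1 : 0;   # 0: 'g' is neither
```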
\N within a bracketed character class must be of the forms \N{name} or \N{U+hex char}, and NOT be the form that matches non-newlines, for the same reason that a dot . inside a bracketed character class loses its special meaning: it matches nearly anything, which generally isn't what you want to happen.
Examples:
- /[\p{Thai}\d]/ # Matches a character that is either a Thai
- # character, or a digit.
- /[^\p{Arabic}()]/ # Matches a character that is neither an Arabic
- # character, nor a parenthesis.
Backslash sequence character classes cannot form one of the endpoints of a range. Thus, you can't say:
- /[\p{Thai}-\d]/ # Wrong!
POSIX character classes have the form [:class:], where class is a name and [: and :] are the delimiters. POSIX character classes appear only inside bracketed character classes, and are a convenient and descriptive way of listing a group of characters.
Be careful about the syntax,
- # Correct:
- $string =~ /[[:alpha:]]/
- # Incorrect (will warn):
- $string =~ /[:alpha:]/
The latter pattern would be a character class consisting of a colon and the letters 'a', 'l', 'p' and 'h'.
POSIX character classes can be part of a larger bracketed character class.
For example,
- [01[:alpha:]%]
is valid and matches '0', '1', any alphabetic character, and the percent sign.
Perl recognizes the following POSIX character classes:
- alpha Any alphabetical character ("[A-Za-z]").
- alnum Any alphanumeric character ("[A-Za-z0-9]").
- ascii Any character in the ASCII character set.
- blank A GNU extension, equal to a space or a horizontal tab ("\t").
- cntrl Any control character. See Note [2] below.
- digit Any decimal digit ("[0-9]"), equivalent to "\d".
- graph Any printable character, excluding a space. See Note [3] below.
- lower Any lowercase character ("[a-z]").
- print Any printable character, including a space. See Note [4] below.
- punct Any graphical character excluding "word" characters. Note [5].
- space Any whitespace character. "\s" including the vertical tab
- ("\cK").
- upper Any uppercase character ("[A-Z]").
- word A Perl extension ("[A-Za-z0-9_]"), equivalent to "\w".
- xdigit Any hexadecimal digit ("[0-9a-fA-F]").
Most POSIX character classes have two Unicode-style \p property counterparts. (They are not official Unicode properties, but Perl extensions derived from official Unicode properties.) The table below shows the relation between POSIX character classes and these counterparts.
One counterpart, in the column labelled "ASCII-range Unicode" in the table, matches only characters in the ASCII character set.
The other counterpart, in the column labelled "Full-range Unicode", matches any appropriate characters in the full Unicode character set. For example, \p{Alpha} matches not just the ASCII alphabetic characters, but any character in the entire Unicode character set considered alphabetic. An entry in the column labelled "backslash sequence" is a (short) equivalent.
- [[:...:]] ASCII-range Full-range backslash Note
- Unicode Unicode sequence
- -----------------------------------------------------
- alpha \p{PosixAlpha} \p{XPosixAlpha}
- alnum \p{PosixAlnum} \p{XPosixAlnum}
- ascii \p{ASCII}
- blank \p{PosixBlank} \p{XPosixBlank} \h [1]
- or \p{HorizSpace} [1]
- cntrl \p{PosixCntrl} \p{XPosixCntrl} [2]
- digit \p{PosixDigit} \p{XPosixDigit} \d
- graph \p{PosixGraph} \p{XPosixGraph} [3]
- lower \p{PosixLower} \p{XPosixLower}
- print \p{PosixPrint} \p{XPosixPrint} [4]
- punct \p{PosixPunct} \p{XPosixPunct} [5]
- \p{PerlSpace} \p{XPerlSpace} \s [6]
- space \p{PosixSpace} \p{XPosixSpace} [6]
- upper \p{PosixUpper} \p{XPosixUpper}
- word \p{PosixWord} \p{XPosixWord} \w
- xdigit \p{PosixXDigit} \p{XPosixXDigit}
[1] \p{Blank} and \p{HorizSpace} are synonyms.
[2] Control characters don't produce output as such, but instead usually control the terminal somehow: for example, newline and backspace are control characters. In the ASCII range, characters whose code points are between 0 and 31 inclusive, plus 127 (DEL), are control characters.
[3] Any character that is graphical, that is, visible. This class consists of all alphanumeric characters and all punctuation characters.
[4] All printable characters, which is the set of all graphical characters plus those whitespace characters which are not also controls.
[5] \p{PosixPunct} and [[:punct:]] in the ASCII range match all non-control, non-alphanumeric, non-space characters: [-!"#$%&'()*+,./:;<=>?@[\\\]^_`{|}~] (although if a locale is in effect, it could alter the behavior of [[:punct:]]). The similarly named property, \p{Punct}, matches a somewhat different set in the ASCII range, namely [-!"#%&'()*,./:;?@[\\\]_{}]. That is, it is missing the nine characters [$+<=>^`|~].
This is because Unicode splits what POSIX considers to be punctuation into two
categories, Punctuation and Symbols.
\p{XPosixPunct} and (under Unicode rules) [[:punct:]] match what \p{PosixPunct} matches in the ASCII range, plus what \p{Punct} matches. This is different than strictly matching according to \p{Punct}. Another way to say it is that if Unicode rules are in effect, [[:punct:]] matches all characters that Unicode considers punctuation, plus all ASCII-range characters that Unicode considers symbols.
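The dollar sign shows the split concretely: POSIX counts it as punctuation, while Unicode files it under Symbols (a minimal sketch):

```perl
my $dollar = '$';
print $dollar =~ /[[:punct:]]/    ? 1 : 0;   # 1: POSIX punctuation
print $dollar =~ /\p{PosixPunct}/ ? 1 : 0;   # 1: same set as above
print $dollar =~ /\p{Punct}/      ? 1 : 0;   # 0: '$' is a Symbol (Sc) in Unicode
```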
[6] \p{SpacePerl} and \p{Space} match identically starting with Perl v5.18. In earlier versions, these differ only in that in non-locale matching, \p{SpacePerl} does not match the vertical tab, \cK. Same for the two ASCII-only range forms.
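As a quick illustration of the table's digit row: the POSIX class and its ASCII-range \p counterpart agree on ASCII input, while only the Full-range form reaches beyond it (a minimal sketch; the Arabic-Indic digit is just an arbitrary non-ASCII example):

```perl
print "7" =~ /[[:digit:]]/    ? 1 : 0;   # 1
print "7" =~ /\p{PosixDigit}/ ? 1 : 0;   # 1: ASCII-range counterpart agrees

my $arabic_one = "\x{0661}";  # ARABIC-INDIC DIGIT ONE
print $arabic_one =~ /\p{XPosixDigit}/ ? 1 : 0;   # 1: Full-range form matches
print $arabic_one =~ /\p{PosixDigit}/  ? 1 : 0;   # 0: outside the ASCII range
```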
There are various other synonyms that can be used besides the names listed in the table. For example, \p{PosixAlpha} can be written as \p{Alpha}. All are listed in Properties accessible through \p{} and \P{} in perluniprops.
Both the \p counterparts always assume Unicode rules are in effect. On ASCII platforms, this means they assume that the code points from 128 to 255 are Latin-1, and that means that using them under locale rules is unwise unless the locale is guaranteed to be Latin-1 or UTF-8. In contrast, the POSIX character classes are useful under locale rules. They are affected by the actual rules in effect, as follows:
- If the /a modifier is in effect ...
-     Each of the POSIX classes matches exactly the same as its ASCII-range counterpart.
- otherwise ...
-     For code points above 255 ...
-         The POSIX class matches the same as its Full-range counterpart.
-     For code points below 256 ...
-         if locale rules are in effect ...
-             The POSIX class matches according to the locale, except that word uses
-             the platform's native underscore character, no matter what the locale is.
-         if, instead, Unicode rules are in effect ...
-             The POSIX class matches the same as the Full-range counterpart.
-         otherwise ...
-             The POSIX class matches the same as the ASCII-range counterpart.
Which rules apply are determined as described in Which character set modifier is in effect? in perlre.
It is proposed to change this behavior in a future release of Perl so that whether or not Unicode rules are in effect would not change the behavior: outside of locale, the POSIX classes would behave like their ASCII-range counterparts. If you wish to comment on this proposal, send email to perl5-porters@perl.org.
A Perl extension to the POSIX character class is the ability to
negate it. This is done by prefixing the class name with a caret (^).
Some examples:
- POSIX ASCII-range Full-range backslash
- Unicode Unicode sequence
- -----------------------------------------------------
- [[:^digit:]] \P{PosixDigit} \P{XPosixDigit} \D
- [[:^space:]] \P{PosixSpace} \P{XPosixSpace}
- \P{PerlSpace} \P{XPerlSpace} \S
- [[:^word:]] \P{PerlWord} \P{XPosixWord} \W
The backslash sequence can mean either ASCII- or Full-range Unicode, depending on various factors as described in Which character set modifier is in effect? in perlre.
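For instance, [[:^digit:]] and \D agree, as the first row of the table says (a minimal check):

```perl
print "x" =~ /[[:^digit:]]/ ? 1 : 0;   # 1: not a digit
print "5" =~ /[[:^digit:]]/ ? 1 : 0;   # 0: digits are excluded
print "x" =~ /\D/           ? 1 : 0;   # 1: same as the negated POSIX class
```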
Perl recognizes the POSIX character classes [=class=]
and
[.class.], but does not (yet?) support them. Any attempt to use
either construct raises an exception.
- /[[:digit:]]/ # Matches a character that is a digit.
- /[01[:lower:]]/ # Matches a character that is either a
- # lowercase letter, or '0' or '1'.
- /[[:digit:][:^xdigit:]]/ # Matches a character that can be anything
- # except the letters 'a' to 'f' and 'A' to
- # 'F'. This is because the main character
- # class is composed of two POSIX character
- # classes that are ORed together, one that
- # matches any digit, and the other that
- # matches anything that isn't a hex digit.
- # The OR adds the digits, leaving only the
- # letters 'a' to 'f' and 'A' to 'F' excluded.
This is a fancy bracketed character class that can be used for more readable and less error-prone classes, and to perform set operations, such as intersection. An example is
- /(?[ \p{Thai} & \p{Digit} ])/
This will match all the digit characters that are in the Thai script.
This is an experimental feature available starting in 5.18, and is subject to change as we gain field experience with it. Any attempt to use it will raise a warning, unless disabled via
- no warnings "experimental::regex_sets";
Comments on this feature are welcome; send email to perl5-porters@perl.org.
We can extend the example above:
- /(?[ ( \p{Thai} + \p{Lao} ) & \p{Digit} ])/
This matches digits that are in either the Thai or Laotian scripts.
Notice the white space in these examples. This construct always has
the /x modifier turned on.
The available binary operators are:
- & intersection
- + union
- | another name for '+', hence means union
- - subtraction (the result matches the set consisting of those
- code points matched by the first operand, excluding any that
- are also matched by the second operand)
- ^ symmetric difference (the union minus the intersection). This
- is like an exclusive or, in that the result is the set of code
- points that are matched by either, but not both, of the
- operands.
There is one unary operator:
- ! complement
All the binary operators left associate, and are of equal precedence. The unary operator right associates, and has higher precedence. Use parentheses to override the default associations. Some feedback we've received indicates a desire for intersection to have higher precedence than union. This is something that feedback from the field may cause us to change in future releases; you may want to parenthesize copiously to avoid such changes affecting your code, until this feature is no longer considered experimental.
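A sketch of why the equal precedence matters: with left association, an intersection written after a union applies to the whole union, not just to the nearest operand.

```perl
no warnings 'experimental::regex_sets';

# Parsed as ([a] + [b]) & [bc], which leaves only the set {b}.
my $flat   = qr/(?[ [a] + [b] & [bc] ])/;

# Parenthesizing keeps 'a' in the result: {a} + ({b} & {b,c}) = {a,b}.
my $parens = qr/(?[ [a] + ( [b] & [bc] ) ])/;

print "a" =~ $flat   ? 1 : 0;   # 0: 'a' was intersected away
print "a" =~ $parens ? 1 : 0;   # 1
print "b" =~ $flat   ? 1 : 0;   # 1
```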
The main restriction is that everything is a metacharacter. Thus, you cannot refer to single characters by doing something like this:
- /(?[ a + b ])/ # Syntax error!
The easiest way to specify an individual typable character is to enclose it in brackets:
- /(?[ [a] + [b] ])/
(This is the same thing as [ab].) You could also have said the equivalent:
- /(?[[ a b ]])/
(You can, of course, specify single characters by using \x{ }, \N{ }, etc.)
This last example shows the use of this construct to specify an ordinary
bracketed character class without additional set operations. Note the
white space within it; /x is turned on even within bracketed
character classes, except you can't have comments inside them. Hence,
- (?[ [#] ])
matches the literal character "#". To specify a literal white space character, you can escape it with a backslash, like:
- /(?[ [ a e i o u \ ] ])/
This matches the English vowels plus the SPACE character. All the other escapes accepted by normal bracketed character classes are accepted here as well; but unrecognized escapes that generate warnings in normal classes are fatal errors here.
All warnings from these class elements are fatal, as well as some practices that don't currently warn. For example you cannot say
- /(?[ [ \xF ] ])/ # Syntax error!
You have to have two hex digits after a braceless \x
(use a leading
zero to make two). These restrictions are to lower the incidence of
typos causing the class to not match what you thought it would.
The final difference between regular bracketed character classes and these is that it is not possible to get these to match a multi-character fold. Thus,
- /(?[ [\xDF] ])/iu
does not match the string "ss".
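A sketch of that difference, reusing \xDF (LATIN SMALL LETTER SHARP S) from the earlier fold discussion:

```perl
no warnings 'experimental::regex_sets';

# An ordinary bracketed class can match the two-character fold "ss" ...
print "ss"   =~ /^[\xDF]$/iu        ? 1 : 0;   # 1
# ... but (?[ ]) never matches a multi-character fold.
print "ss"   =~ /^(?[ [\xDF] ])$/iu ? 1 : 0;   # 0
print "\xDF" =~ /^(?[ [\xDF] ])$/iu ? 1 : 0;   # 1: the character itself still matches
```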
You don't have to enclose POSIX class names inside double brackets, hence both of the following work:
- /(?[ [:word:] - [:lower:] ])/
- /(?[ [[:word:]] - [[:lower:]] ])/
Any contained POSIX character classes, including things like \w and \D, respect the /a (and /aa) modifiers.
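A sketch of /a reaching inside the construct; the Arabic-Indic digit is just an arbitrary non-ASCII character that plain \d matches:

```perl
no warnings 'experimental::regex_sets';

my $uni   = qr/(?[ \d ])/;    # (?[ ]) implies /u, so \d spans all of Unicode
my $ascii = qr/(?[ \d ])/a;   # /a restricts the contained \d to [0-9]

my $arabic_one = "\x{0661}";  # ARABIC-INDIC DIGIT ONE
print $arabic_one =~ $uni   ? 1 : 0;   # 1
print $arabic_one =~ $ascii ? 1 : 0;   # 0
print "7"         =~ $ascii ? 1 : 0;   # 1
```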
(?[ ]) is a regex-compile-time construct. Any attempt to use something which isn't knowable at the time the containing regular expression is compiled is a fatal error. In practice, this means just three limitations:
This construct cannot be used within the scope of use locale (or the /l regex modifier).
Any user-defined property used must be already defined by the time the regular expression is compiled (but note that this construct can be used instead of such properties).
A regular expression that otherwise would compile using /d rules, and which uses this construct, will instead use /u. Thus this construct tells Perl that you don't want /d rules for the entire regular expression containing it.
The /x processing within this class is an extended form.
Besides the characters that are considered white space in normal /x
processing, there are 5 others, recommended by the Unicode standard:
- U+0085 NEXT LINE
- U+200E LEFT-TO-RIGHT MARK
- U+200F RIGHT-TO-LEFT MARK
- U+2028 LINE SEPARATOR
- U+2029 PARAGRAPH SEPARATOR
Note that skipping white space applies only to the interior of this construct. There must not be any space between any of the characters that form the initial (?[, nor may there be space between the closing ]) characters.
Just as in all regular expressions, the pattern can be built up by including variables that are interpolated at regex compilation time. Care must be taken to ensure that you are getting what you expect. For example:
- my $thai_or_lao = '\p{Thai} + \p{Lao}';
- ...
- qr/(?[ \p{Digit} & $thai_or_lao ])/;
compiles to
- qr/(?[ \p{Digit} & \p{Thai} + \p{Lao} ])/;
But this does not have the effect that someone reading the code would likely expect, as the intersection applies just to \p{Thai}, excluding the Laotian. Pitfalls like this can be avoided by parenthesizing the component pieces:
- my $thai_or_lao = '( \p{Thai} + \p{Lao} )';
But any modifiers will still apply to all the components:
- my $lower = '\p{Lower} + \p{Digit}';
- qr/(?[ \p{Greek} & $lower ])/i;
matches upper case things. You can avoid surprises by making the components into instances of this construct by compiling them:
- my $lower = qr/(?[ \p{Lower} + \p{Digit} ])/;
- qr/(?[ \p{Greek} & $lower ])/i;
When these are embedded in another pattern, what they match does not change, regardless of parenthesization or what modifiers are in effect in that outer pattern.
Due to the way that Perl parses things, your parentheses and brackets
may need to be balanced, even including comments. If you run into any
examples, please send them to perlbug@perl.org
, so that we can have a
concrete example for this man page.
We may change it so that things that remain legal uses in normal bracketed
character classes might become illegal within this experimental
construct. One proposal, for example, is to forbid adjacent uses of the
same character, as in (?[ [aa] ])
. The motivation for such a change
is that this usage is likely a typo, as the second "a" adds nothing.
perlref - Perl references and nested data structures
This is complete documentation about all aspects of references. For a shorter, tutorial introduction to just the essential features, see perlreftut.
Before release 5 of Perl it was difficult to represent complex data structures, because all references had to be symbolic--and even then it was difficult to refer to a variable instead of a symbol table entry. Perl now not only makes it easier to use symbolic references to variables, but also lets you have "hard" references to any piece of data or code. Any scalar may hold a hard reference. Because arrays and hashes contain scalars, you can now easily build arrays of arrays, arrays of hashes, hashes of arrays, arrays of hashes of functions, and so on.
Hard references are smart--they keep track of reference counts for you, automatically freeing the thing referred to when its reference count goes to zero. (Reference counts for values in self-referential or cyclic data structures may not go to zero without a little help; see Circular References for a detailed explanation.) If that thing happens to be an object, the object is destructed. See perlobj for more about objects. (In a sense, everything in Perl is an object, but we usually reserve the word for references to objects that have been officially "blessed" into a class package.)
Symbolic references are names of variables or other objects, just as a
symbolic link in a Unix filesystem contains merely the name of a file.
The *glob
notation is something of a symbolic reference. (Symbolic
references are sometimes called "soft references", but please don't call
them that; references are confusing enough without useless synonyms.)
In contrast, hard references are more like hard links in a Unix file system: They are used to access an underlying object without concern for what its (other) name is. When the word "reference" is used without an adjective, as in the following paragraph, it is usually talking about a hard reference.
References are easy to use in Perl. There is just one overriding principle: in general, Perl does no implicit referencing or dereferencing. When a scalar is holding a reference, it always behaves as a simple scalar. It doesn't magically start being an array or hash or subroutine; you have to tell it explicitly to do so, by dereferencing it.
That said, be aware that Perl version 5.14 introduces an exception to the rule, for syntactic convenience. Experimental array and hash container function behavior allows array and hash references to be handled by Perl as if they had been explicitly syntactically dereferenced. See Syntactical Enhancements in perl5140delta and perlfunc for details.
References can be created in several ways.
By using the backslash operator on a variable, subroutine, or value. (This works much like the & (address-of) operator in C.) This typically creates another reference to a variable, because there's already a reference to the variable in the symbol table. But the symbol table reference might go away, and you'll still have the reference that the backslash returned. Here are some examples:
- $scalarref = \$foo;
- $arrayref = \@ARGV;
- $hashref = \%ENV;
- $coderef = \&handler;
- $globref = \*foo;
It isn't possible to create a true reference to an IO handle (filehandle
or dirhandle) using the backslash operator. The most you can get is a
reference to a typeglob, which is actually a complete symbol table entry.
But see the explanation of the *foo{THING}
syntax below. However,
you can still use type globs and globrefs as though they were IO handles.
A reference to an anonymous array can be created using square brackets:
- $arrayref = [1, 2, ['a', 'b', 'c']];
Here we've created a reference to an anonymous array of three elements
whose final element is itself a reference to another anonymous array of three
elements. (The multidimensional syntax described later can be used to
access this. For example, after the above, $arrayref->[2][1]
would have
the value "b".)
Taking a reference to an enumerated list is not the same as using square brackets--instead it's the same as creating a list of references!
- @list = (\$a, \@b, \%c);
- @list = \($a, @b, %c); # same thing!
As a special case, \(@foo) returns a list of references to the contents of @foo, not a reference to @foo itself. Likewise for %foo, except that the key references are to copies (since the keys are just strings rather than full-fledged scalars).
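A small sketch of the special case; note that the returned references are aliases to @foo's elements, so writing through one changes the array:

```perl
my @foo  = (10, 20);
my @refs = \(@foo);          # (\$foo[0], \$foo[1]), not \@foo

print scalar @refs, "\n";    # 2
${ $refs[0] } = 99;          # writes through to $foo[0]
print $foo[0], "\n";         # 99
```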
A reference to an anonymous hash can be created using curly brackets:
- $hashref = {
- 'Adam' => 'Eve',
- 'Clyde' => 'Bonnie',
- };
Anonymous hash and array composers like these can be intermixed freely to produce as complicated a structure as you want. The multidimensional syntax described below works for these too. The values above are literals, but variables and expressions would work just as well, because assignment operators in Perl (even within local() or my()) are executable statements, not compile-time declarations.
Because curly brackets (braces) are used for several other things
including BLOCKs, you may occasionally have to disambiguate braces at the
beginning of a statement by putting a +
or a return in front so
that Perl realizes the opening brace isn't starting a BLOCK. The economy and
mnemonic value of using curlies is deemed worth this occasional extra
hassle.
For example, if you wanted a function to make a new hash and return a reference to it, you have these options:
- sub hashem { { @_ } } # silently wrong
- sub hashem { +{ @_ } } # ok
- sub hashem { return { @_ } } # ok
On the other hand, if you want the other meaning, you can do this:
- sub showem { { @_ } } # ambiguous (currently ok, but may change)
- sub showem { {; @_ } } # ok
- sub showem { { return @_ } } # ok
The leading +{ and {; always serve to disambiguate
the expression to mean either the HASH reference, or the BLOCK.
A reference to an anonymous subroutine can be created by using sub without a subname:
- $coderef = sub { print "Boink!\n" };
Note the semicolon. Except for the code inside not being immediately executed, a sub {} is not so much a declaration as it is an operator, like do{} or eval{}. (However, no matter how many times you execute that particular line (unless you're in an eval("...")), $coderef will still have a reference to the same anonymous subroutine.)
Anonymous subroutines act as closures with respect to my() variables, that is, variables lexically visible within the current scope. Closure is a notion out of the Lisp world that says if you define an anonymous function in a particular lexical context, it pretends to run in that context even when it's called outside the context.
In human terms, it's a funny way of passing arguments to a subroutine when you define it as well as when you call it. It's useful for setting up little bits of code to run later, such as callbacks. You can even do object-oriented stuff with it, though Perl already provides a different mechanism to do that--see perlobj.
You might also think of closure as a way to write a subroutine template without using eval(). Here's a small example of how closures work:
- sub newprint {
-     my $x = shift;
-     return sub { my $y = shift; print "$x, $y!\n"; };
- }
- $h = newprint("Howdy");
- $g = newprint("Greetings");
- # Time passes...
- &$h("world");
- &$g("earthlings");
This prints
- Howdy, world!
- Greetings, earthlings!
Note particularly that $x continues to refer to the value passed into newprint() despite "my $x" having gone out of scope by the time the anonymous subroutine runs. That's what a closure is all about.
This applies only to lexical variables, by the way. Dynamic variables continue to work as they have always worked. Closure is not something that most Perl programmers need trouble themselves about to begin with.
References are often returned by special subroutines called constructors. Perl objects are just references to a special type of object that happens to know which package it's associated with. Constructors are just special subroutines that know how to create that association. They do so by starting with an ordinary reference, and it remains an ordinary reference even while it's also being an object. Constructors are often named new(). You can call them indirectly:
- $objref = new Doggie( Tail => 'short', Ears => 'long' );
But that can produce ambiguous syntax in certain cases, so it's often better to use the direct method invocation approach:
- $objref = Doggie->new( Tail => 'short', Ears => 'long' );
References of the appropriate type can spring into existence if you dereference them in a context that assumes they exist. Because we haven't talked about dereferencing yet, we can't show you any examples yet.
A reference can be created by using a special syntax, lovingly known as the *foo{THING} syntax. *foo{THING} returns a reference to the THING slot in *foo (which is the symbol table entry which holds everything known as foo).
- $scalarref = *foo{SCALAR};
- $arrayref = *ARGV{ARRAY};
- $hashref = *ENV{HASH};
- $coderef = *handler{CODE};
- $ioref = *STDIN{IO};
- $globref = *foo{GLOB};
- $formatref = *foo{FORMAT};
- $globname = *foo{NAME}; # "foo"
- $pkgname = *foo{PACKAGE}; # "main"
Most of these are self-explanatory, but *foo{IO} deserves special attention. It returns the IO handle, used for file handles (open), sockets (socket and socketpair), and directory handles (opendir). For compatibility with previous versions of Perl, *foo{FILEHANDLE} is a synonym for *foo{IO}, though it is deprecated as of 5.8.0. If deprecation warnings are in effect, it will warn of its use.
*foo{THING}
returns undef if that particular THING hasn't been used yet,
except in the case of scalars. *foo{SCALAR}
returns a reference to an
anonymous scalar if $foo hasn't been used yet. This might change in a
future release.
*foo{NAME} and *foo{PACKAGE} are the exception, in that they return strings, rather than references. These return the package and name of the typeglob itself, rather than one that has been assigned to it. So, after *foo = *Foo::bar, *foo will become "*Foo::bar" when used as a string, but *foo{PACKAGE} and *foo{NAME} will continue to produce "main" and "foo", respectively.
*foo{IO}
is an alternative to the *HANDLE
mechanism given in
Typeglobs and Filehandles in perldata for passing filehandles
into or out of subroutines, or storing into larger data structures.
Its disadvantage is that it won't create a new filehandle for you.
Its advantage is that you have less risk of clobbering more than
you want to with a typeglob assignment. (It still conflates file
and directory handles, though.) However, if you assign the incoming
value to a scalar instead of a typeglob as we do in the examples
below, there's no risk of that happening.
- splutter(*STDOUT); # pass the whole glob
- splutter(*STDOUT{IO}); # pass both file and dir handles
- sub splutter {
- my $fh = shift;
- print $fh "her um well a hmmm\n";
- }
- $rec = get_rec(*STDIN); # pass the whole glob
- $rec = get_rec(*STDIN{IO}); # pass both file and dir handles
- sub get_rec {
- my $fh = shift;
- return scalar <$fh>;
- }
That's it for creating references. By now you're probably dying to know how to use references to get back to your long-lost data. There are several basic methods.
Anywhere you'd put an identifier (or chain of identifiers) as part of a variable or subroutine name, you can replace the identifier with a simple scalar variable containing a reference of the correct type:
- $bar = $$scalarref;
- push(@$arrayref, $filename);
- $$arrayref[0] = "January";
- $$hashref{"KEY"} = "VALUE";
- &$coderef(1,2,3);
- print $globref "output\n";
It's important to understand that we are specifically not dereferencing
$arrayref[0]
or $hashref{"KEY"}
there. The dereference of the
scalar variable happens before it does any key lookups. Anything more
complicated than a simple scalar variable must use methods 2 or 3 below.
However, a "simple scalar" includes an identifier that itself uses method
1 recursively. Therefore, the following prints "howdy".
- $refrefref = \\\"howdy";
- print $$$$refrefref;
Anywhere you'd put an identifier (or chain of identifiers) as part of a variable or subroutine name, you can replace the identifier with a BLOCK returning a reference of the correct type. In other words, the previous examples could be written like this:
- $bar = ${$scalarref};
- push(@{$arrayref}, $filename);
- ${$arrayref}[0] = "January";
- ${$hashref}{"KEY"} = "VALUE";
- &{$coderef}(1,2,3);
- $globref->print("output\n"); # iff IO::Handle is loaded
Admittedly, it's a little silly to use the curlies in this case, but the BLOCK can contain any arbitrary expression, in particular, subscripted expressions:
- &{ $dispatch{$index} }(1,2,3); # call correct routine
Because of being able to omit the curlies for the simple case of $$x, people often make the mistake of viewing the dereferencing symbols as proper operators, and wonder about their precedence. If they were, though, you could use parentheses instead of braces. That's not the case. Consider the difference below; case 0 is a short-hand version of case 1, not case 2:
- $$hashref{"KEY"} = "VALUE"; # CASE 0
- ${$hashref}{"KEY"} = "VALUE"; # CASE 1
- ${$hashref{"KEY"}} = "VALUE"; # CASE 2
- ${$hashref->{"KEY"}} = "VALUE"; # CASE 3
Case 2 is also deceptive in that you're accessing a variable called %hashref, not dereferencing through $hashref to the hash it's presumably referencing. That would be case 3.
Subroutine calls and lookups of individual array elements arise often enough that it gets cumbersome to use method 2. As a form of syntactic sugar, the examples for method 2 may be written:
- $arrayref->[0] = "January"; # Array element
- $hashref->{"KEY"} = "VALUE"; # Hash element
- $coderef->(1,2,3); # Subroutine call
The left side of the arrow can be any expression returning a reference,
including a previous dereference. Note that $array[$x]
is not the
same thing as $array->[$x]
here:
- $array[$x]->{"foo"}->[0] = "January";
This is one of the cases we mentioned earlier in which references could
spring into existence when in an lvalue context. Before this
statement, $array[$x]
may have been undefined. If so, it's
automatically defined with a hash reference so that we can look up
{"foo"}
in it. Likewise $array[$x]->{"foo"}
will automatically get
defined with an array reference so that we can look up [0]
in it.
This process is called autovivification.
One more thing here. The arrow is optional between bracketed subscripts, so you can shrink the above down to
- $array[$x]{"foo"}[0] = "January";
Which, in the degenerate case of using only ordinary arrays, gives you multidimensional arrays just like C's:
- $score[$x][$y][$z] += 42;
Well, okay, not entirely like C's arrays, actually. C doesn't know how to grow its arrays on demand. Perl does.
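A minimal sketch of autovivification: one assignment conjures both intermediate references.

```perl
my @array;
$array[3]{color}[0] = "red";   # autovivifies a hash ref, then an array ref

print ref $array[3], "\n";          # HASH
print ref $array[3]{color}, "\n";   # ARRAY
print $array[3]{color}[0], "\n";    # red
```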
If a reference happens to be a reference to an object, then there are probably methods to access the things referred to, and you should probably stick to those methods unless you're in the class package that defines the object's methods. In other words, be nice, and don't violate the object's encapsulation without a very good reason. Perl does not enforce encapsulation. We are not totalitarians here. We do expect some basic civility though.
Using a string or number as a reference produces a symbolic reference, as explained above. Using a reference as a number produces an integer representing its storage location in memory. The only useful thing to be done with this is to compare two references numerically to see whether they refer to the same location.
- if ($ref1 == $ref2) { # cheap numeric compare of references
- print "refs 1 and 2 refer to the same thing\n";
- }
Using a reference as a string produces both its referent's type, including any package blessing as described in perlobj, as well as the numeric address expressed in hex. The ref() operator returns just the type of thing the reference is pointing to, without the address. See ref for details and examples of its use.
The bless() operator may be used to associate the object a reference points to with a package functioning as an object class. See perlobj.
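A short sketch contrasting stringification, ref(), and the effect of bless() (the Doggie class name is just illustrative):

```perl
my $aref = [1, 2, 3];
print ref($aref), "\n";   # ARRAY -- type only, no address
print "$aref", "\n";      # e.g. ARRAY(0x55e4a9c2b8d0) -- type plus hex address

my $obj = bless [], 'Doggie';
print ref($obj), "\n";    # Doggie -- the blessed package name
print "$obj", "\n";       # e.g. Doggie=ARRAY(0x55e4a9c2b9f8)
```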
A typeglob may be dereferenced the same way a reference can, because
the dereference syntax always indicates the type of reference desired.
So ${*foo}
and ${\$foo}
both indicate the same scalar variable.
Here's a trick for interpolating a subroutine call into a string:
- print "My sub returned @{[mysub(1,2,3)]} that time.\n";
The way it works is that when the @{...} is seen in the double-quoted string, it's evaluated as a block. The block creates a reference to an anonymous array containing the results of the call to mysub(1,2,3). So the whole block returns a reference to an array, which is then dereferenced by @{...} and stuck into the double-quoted string. This chicanery is also useful for arbitrary expressions:
- print "That yields @{[$n + 5]} widgets\n";
Similarly, an expression that returns a reference to a scalar can be dereferenced via ${...}. Thus, the above expression may be written as:
- print "That yields ${\($n + 5)} widgets\n";
It is possible to create a "circular reference" in Perl, which can lead to memory leaks. A circular reference occurs when two references contain a reference to each other, like this:
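A minimal sketch of two variables referring to each other:

```perl
my $foo = {};
my $bar = { foo => $foo };
$foo->{bar} = $bar;    # now $foo and $bar each hold a reference to the other
```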
You can also create a circular reference with a single variable:
- my $foo;
- $foo = \$foo;
In this case, the reference count for the variables will never reach 0, and the references will never be garbage-collected. This can lead to memory leaks.
Because objects in Perl are implemented as references, it's possible to have circular references with objects as well. Imagine a TreeNode class where each node references its parent and child nodes. Any node with a parent will be part of a circular reference.
You can break circular references by creating a "weak reference". A
weak reference does not increment the reference count for a variable,
which means that the object can go out of scope and be destroyed. You
can weaken a reference with the weaken
function exported by the
Scalar::Util module.
Here's how we can make the first example safer:
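A sketch using Scalar::Util's weaken on the two-variable example above:

```perl
use Scalar::Util 'weaken';

my $foo = {};
my $bar = { foo => $foo };
$foo->{bar} = $bar;

# Weaken the reference held in $foo, so $bar can be garbage-collected
# once the $bar variable itself goes out of scope:
weaken $foo->{bar};
```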
The reference from $foo
to $bar
has been weakened. When the
$bar
variable goes out of scope, it will be garbage-collected. The
next time you look at the value of the $foo->{bar}
key, it will
be undef.
This action at a distance can be confusing, so you should be careful with your use of weaken. You should weaken the reference in the variable that will go out of scope first. That way, the longer-lived variable will contain the expected reference until it goes out of scope.
We said that references spring into existence as necessary if they are undefined, but we didn't say what happens if a value used as a reference is already defined, but isn't a hard reference. If you use it as a reference, it'll be treated as a symbolic reference. That is, the value of the scalar is taken to be the name of a variable, rather than a direct link to a (possibly) anonymous value.
People frequently expect it to work like this. So it does.
- $name = "foo";
- $$name = 1; # Sets $foo
- ${$name} = 2; # Sets $foo
- ${$name x 2} = 3; # Sets $foofoo
- $name->[0] = 4; # Sets $foo[0]
- @$name = (); # Clears @foo
- &$name(); # Calls &foo()
- $pack = "THAT";
- ${"${pack}::$name"} = 5; # Sets $THAT::foo without eval
This is powerful, and slightly dangerous, in that it's possible to intend (with the utmost sincerity) to use a hard reference, and accidentally use a symbolic reference instead. To protect against that, you can say
- use strict 'refs';
and then only hard references will be allowed for the rest of the enclosing block. An inner block may countermand that with
- no strict 'refs';
Only package variables (globals, even if localized) are visible to symbolic references. Lexical variables (declared with my()) aren't in a symbol table, and thus are invisible to this mechanism. For example:
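A minimal sketch of the effect:

```perl
no strict 'refs';       # symbolic references need this under strict
local $value = 10;      # package variable: visible to symbolic refs
my $ref = "value";
{
    my $value = 20;     # lexical: invisible to the symbolic reference
    print $$ref, "\n";  # prints 10, not 20
}
```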
This will still print 10, not 20. Remember that local() affects package variables, which are all "global" to the package.
Brackets around a symbolic reference can simply serve to isolate an identifier or variable name from the rest of an expression, just as they always have within a string. For example,
- $push = "pop on ";
- print "${push}over";
has always meant to print "pop on over", even though push is a reserved word. This is generalized to work the same without the enclosing double quotes, so that
- print ${push} . "over";
and even
- print ${ push } . "over";
will have the same effect. This construct is not considered to be a symbolic reference when you're using strict refs:
- use strict 'refs';
- ${ bareword }; # Okay, means $bareword.
- ${ "bareword" }; # Error, symbolic reference.
Similarly, because of all the subscripting that is done using single words, the same rule applies to any bareword that is used for subscripting a hash. So now, instead of writing
- $array{ "aaa" }{ "bbb" }{ "ccc" }
you can write just
- $array{ aaa }{ bbb }{ ccc }
and not worry about whether the subscripts are reserved words. In the rare event that you do wish to do something like
- $array{ shift }
you can force interpretation as a reserved word by adding anything that makes it more than a bareword:
- $array{ shift() }
- $array{ +shift }
- $array{ shift @_ }
The use warnings
pragma or the -w switch will warn you if it
interprets a reserved word as a string.
But it will no longer warn you about using lowercase words, because the
string is effectively quoted.
Pseudo-hashes have been removed from Perl. The 'fields' pragma remains available.
As explained above, an anonymous function with access to the lexical variables visible when that function was compiled, creates a closure. It retains access to those variables even though it doesn't get run until later, such as in a signal handler or a Tk callback.
Using a closure as a function template allows us to generate many functions that act similarly. Suppose you wanted functions named after the colors that generated HTML font changes for the various colors:
- print "Be ", red("careful"), "with that ", green("light");
The red() and green() functions would be similar. To create these, we'll assign a closure to a typeglob of the name of the function we're trying to build.
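A sketch of the generator loop (the color list is assumed; the uppercase glob assignment is what makes RED(), BLUE(), etc. available as well):

```perl
my @colors = qw(red blue green yellow orange purple violet);
for my $name (@colors) {
    no strict 'refs';    # allow symbolic assignment to the glob
    *$name = *{ uc $name } = sub { "<FONT COLOR='$name'>@_</FONT>" };
}
```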
Now all those different functions appear to exist independently. You can
call red(), RED(), blue(), BLUE(), green(), etc. This technique saves on
both compile time and memory use, and is less error-prone as well, since
syntax checks happen at compile time. It's critical that any variables in
the anonymous subroutine be lexicals in order to create a proper closure.
That's the reason for the my on the loop iteration variable.
This is one of the only places where giving a prototype to a closure makes much sense. If you wanted to impose scalar context on the arguments of these functions (probably not a wise idea for this particular example), you could have written it this way instead:
- *$name = sub ($) { "<FONT COLOR='$name'>$_[0]</FONT>" };
However, since prototype checking happens at compile time, the assignment above happens too late to be of much use. You could address this by putting the whole loop of assignments within a BEGIN block, forcing it to occur during compilation.
Access to lexicals that change over time--like those in the for
loop
above, basically aliases to elements from the surrounding lexical scopes--
only works with anonymous subs, not with named subroutines. In general,
named subroutines do not nest properly and should only be declared
in the main package scope.
This is because named subroutines are created at compile time so their lexical variables get assigned to the parent lexicals from the first execution of the parent block. If a parent scope is entered a second time, its lexicals are created again, while the nested subs still reference the old ones.
Anonymous subroutines get to capture each time you execute the sub
operator, as they are created on the fly. If you are accustomed to using
nested subroutines in other programming languages with their own private
variables, you'll have to work at it a bit in Perl. The intuitive coding
of this type of thing incurs mysterious warnings about "will not stay
shared" due to the reasons explained above.
For example, this won't work:
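A sketch of the broken version, with a named sub nested inside another:

```perl
sub outer {
    my $x = $_[0] + 35;
    sub inner { return $x * 19 }   # $x "will not stay shared" across calls
    return $x + inner();
}
```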
A work-around is the following:
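A sketch of the work-around, assigning an anonymous sub to a localized glob:

```perl
sub outer {
    my $x = $_[0] + 35;
    local *inner = sub { return $x * 19 };   # fresh closure on every call
    return $x + inner();
}
```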
Now inner() can only be called from within outer(), because of the temporary assignments of the anonymous subroutine. But when it does, it has normal access to the lexical variable $x from the scope of outer() at the time outer is invoked.
This has the interesting effect of creating a function local to another function, something not normally supported in Perl.
You may not (usefully) use a reference as the key to a hash. It will be converted into a string:
- $x{ \$a } = $a;
If you try to dereference the key, it won't do a hard dereference, and you won't accomplish what you're attempting. You might want to do something more like
- $r = \@a;
- $x{ $r } = $r;
And then at least you can use the values(), which will be real refs, instead of the keys(), which won't.
The standard Tie::RefHash module provides a convenient workaround to this.
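A short sketch with the core Tie::RefHash module, which lets keys survive as real references:

```perl
use Tie::RefHash;

tie my %hash, 'Tie::RefHash';
my $r = [ 1, 2, 3 ];
$hash{$r} = "stored under a real reference";

# Keys come back as actual references, not flattened strings:
for my $key (keys %hash) {
    print "@$key\n" if ref $key;
}
```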
Besides the obvious documents, source code can be instructive. Some pathological examples of the use of references can be found in the t/op/ref.t regression test in the Perl source directory.
See also perldsc and perllol for how to use references to create complex data structures, and perlootut and perlobj for how to use them to create objects.
perlreftut - Mark's very short tutorial about references
One of the most important new features in Perl 5 was the capability to manage complicated data structures like multidimensional arrays and nested hashes. To enable these, Perl 5 introduced a feature called 'references', and using references is the key to managing complicated, structured data in Perl. Unfortunately, there's a lot of funny syntax to learn, and the main manual page can be hard to follow. The manual is quite complete, and sometimes people find that a problem, because it can be hard to tell what is important and what isn't.
Fortunately, you only need to know 10% of what's in the main page to get 90% of the benefit. This page will show you that 10%.
One problem that comes up all the time is needing a hash whose values are lists. Perl has hashes, of course, but the values have to be scalars; they can't be lists.
Why would you want a hash of lists? Let's take a simple example: You have a file of city and country names, like this:
- Chicago, USA
- Frankfurt, Germany
- Berlin, Germany
- Washington, USA
- Helsinki, Finland
- New York, USA
and you want to produce an output like this, with each country mentioned once, and then an alphabetical list of the cities in that country:
- Finland: Helsinki.
- Germany: Berlin, Frankfurt.
- USA: Chicago, New York, Washington.
The natural way to do this is to have a hash whose keys are country names. Associated with each country name key is a list of the cities in that country. Each time you read a line of input, split it into a country and a city, look up the list of cities already known to be in that country, and append the new city to the list. When you're done reading the input, iterate over the hash as usual, sorting each list of cities before you print it out.
If hash values couldn't be lists, you'd lose. You'd probably have to combine all the cities into a single string somehow, and then when time came to write the output, you'd have to break the string into a list, sort the list, and turn it back into a string. This is messy and error-prone. And it's frustrating, because Perl already has perfectly good lists that would solve the problem if only you could use them.
By the time Perl 5 rolled around, we were already stuck with this design: Hash values must be scalars. The solution to this is references.
A reference is a scalar value that refers to an entire array or an entire hash (or to just about anything else). Names are one kind of reference that you're already familiar with. Think of the President of the United States: a messy, inconvenient bag of blood and bones. But to talk about him, or to represent him in a computer program, all you need is the easy, convenient scalar string "Barack Obama".
References in Perl are like names for arrays and hashes. They're Perl's private, internal names, so you can be sure they're unambiguous. Unlike "Barack Obama", a reference only refers to one thing, and you always know what it refers to. If you have a reference to an array, you can recover the entire array from it. If you have a reference to a hash, you can recover the entire hash. But the reference is still an easy, compact scalar value.
You can't have a hash whose values are arrays; hash values can only be scalars. We're stuck with that. But a single reference can refer to an entire array, and references are scalars, so you can have a hash of references to arrays, and it'll act a lot like a hash of arrays, and it'll be just as useful as a hash of arrays.
We'll come back to this city-country problem later, after we've seen some syntax for managing references.
There are just two ways to make a reference, and just two ways to use it once you have it.
If you put a \
in front of a variable, you get a
reference to that variable.
- $aref = \@array; # $aref now holds a reference to @array
- $href = \%hash; # $href now holds a reference to %hash
- $sref = \$scalar; # $sref now holds a reference to $scalar
Once the reference is stored in a variable like $aref or $href, you can copy it or store it just the same as any other scalar value:
- $xy = $aref; # $xy now holds a reference to @array
- $p[3] = $href; # $p[3] now holds a reference to %hash
- $z = $p[3]; # $z now holds a reference to %hash
These examples show how to make references to variables with names.
Sometimes you want to make an array or a hash that doesn't have a
name. This is analogous to the way you like to be able to use the
string "\n"
or the number 80 without having to store it in a named
variable first.
Make Rule 2
[ ITEMS ]
makes a new, anonymous array, and returns a reference to
that array. { ITEMS }
makes a new, anonymous hash, and returns a
reference to that hash.
- $aref = [ 1, "foo", undef, 13 ];
- # $aref now holds a reference to an array
- $href = { APR => 4, AUG => 8 };
- # $href now holds a reference to a hash
The references you get from rule 2 are the same kind of references that you get from rule 1:
- # This:
- $aref = [ 1, 2, 3 ];
- # Does the same as this:
- @array = (1, 2, 3);
- $aref = \@array;
The first line is an abbreviation for the following two lines, except
that it doesn't create the superfluous array variable @array
.
If you write just []
, you get a new, empty anonymous array.
If you write just {}
, you get a new, empty anonymous hash.
What can you do with a reference once you have it? It's a scalar value, and we've seen that you can store it as a scalar and get it back again just like any scalar. There are just two more ways to use it:
You can always use an array reference, in curly braces, in place of
the name of an array. For example, @{$aref}
instead of @array
.
Here are some examples of that:
Arrays:
- @a @{$aref} An array
- reverse @a reverse @{$aref} Reverse the array
- $a[3] ${$aref}[3] An element of the array
- $a[3] = 17; ${$aref}[3] = 17 Assigning an element
On each line are two expressions that do the same thing. The
left-hand versions operate on the array @a
. The right-hand
versions operate on the array that is referred to by $aref
. Once
they find the array they're operating on, both versions do the same
things to the arrays.
Using a hash reference is exactly the same:
- %h %{$href} A hash
- keys %h keys %{$href} Get the keys from the hash
- $h{'red'} ${$href}{'red'} An element of the hash
- $h{'red'} = 17 ${$href}{'red'} = 17 Assigning an element
Whatever you want to do with a reference, Use Rule 1 tells you how
to do it. You just write the Perl code that you would have written
for doing the same thing to a regular array or hash, and then replace
the array or hash name with {$reference}
. "How do I loop over an
array when all I have is a reference?" Well, to loop over an array, you
would write
- for my $element (@array) {
- ...
- }
so replace the array name, @array
, with the reference:
- for my $element (@{$aref}) {
- ...
- }
"How do I print out the contents of a hash when all I have is a reference?" First write the code for printing out a hash:
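Printing a hash by name might look like this (hash contents made up for illustration):

```perl
my %hash = ( red => 1, blue => 2 );
for my $key (sort keys %hash) {
    print "$key => $hash{$key}\n";
}
```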
And then replace the hash name with the reference:
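Following Use Rule 1, replacing the hash name with {$href} gives (a sketch):

```perl
my $href = { red => 1, blue => 2 };
for my $key (sort keys %{$href}) {
    print "$key => ${$href}{$key}\n";
}
```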
Use Rule 1 is all you really need, because it tells you how to do absolutely everything you ever need to do with references. But the most common thing to do with an array or a hash is to extract a single element, and the Use Rule 1 notation is cumbersome. So there is an abbreviation.
${$aref}[3]
is too hard to read, so you can write $aref->[3]
instead.
${$href}{red}
is too hard to read, so you can write
$href->{red}
instead.
If $aref
holds a reference to an array, then $aref->[3]
is
the fourth element of the array. Don't confuse this with $aref[3]
,
which is the fourth element of a totally different array, one
deceptively named @aref
. $aref
and @aref
are unrelated the
same way that $item
and @item
are.
Similarly, $href->{'red'}
is part of the hash referred to by
the scalar variable $href
, perhaps even one with no name.
$href{'red'}
is part of the deceptively named %href
hash. It's
easy to leave out the ->
, and if you do, you'll get
bizarre results when your program gets array and hash elements out of
totally unexpected hashes and arrays that weren't the ones you wanted
to use.
Let's see a quick example of how all this is useful.
First, remember that [1, 2, 3]
makes an anonymous array containing
(1, 2, 3)
, and gives you a reference to that array.
Now think about
- @a = ( [1, 2, 3],
- [4, 5, 6],
- [7, 8, 9]
- );
@a is an array with three elements, and each one is a reference to another array.
$a[1]
is one of these references. It refers to an array, the array
containing (4, 5, 6)
, and because it is a reference to an array,
Use Rule 2 says that we can write $a[1]->[2]
to get the
third element from that array. $a[1]->[2]
is the 6.
Similarly, $a[0]->[1]
is the 2. What we have here is like a
two-dimensional array; you can write $a[ROW]->[COLUMN]
to get
or set the element in any row and any column of the array.
The notation still looks a little cumbersome, so there's one more abbreviation:
In between two subscripts, the arrow is optional.
Instead of $a[1]->[2]
, we can write $a[1][2]
; it means the
same thing. Instead of $a[0]->[1] = 23
, we can write
$a[0][1] = 23
; it means the same thing.
Now it really looks like two-dimensional arrays!
You can see why the arrows are important. Without them, we would have
had to write ${$a[1]}[2]
instead of $a[1][2]
. For
three-dimensional arrays, they let us write $x[2][3][5]
instead of
the unreadable ${${$x[2]}[3]}[5]
.
Here's the answer to the problem I posed earlier, of reformatting a file of city and country names.
- 1 my %table;
- 2 while (<>) {
- 3 chomp;
- 4 my ($city, $country) = split /, /;
- 5 $table{$country} = [] unless exists $table{$country};
- 6 push @{$table{$country}}, $city;
- 7 }
- 8 foreach $country (sort keys %table) {
- 9 print "$country: ";
- 10 my @cities = @{$table{$country}};
- 11 print join ', ', sort @cities;
- 12 print ".\n";
- 13 }
The program has two pieces: Lines 2-7 read the input and build a data
structure, and lines 8-13 analyze the data and print out the report.
We're going to have a hash, %table
, whose keys are country names,
and whose values are references to arrays of city names. The data
structure will look like this:
- %table
- +-------+---+
- | | | +-----------+--------+
- |Germany| *---->| Frankfurt | Berlin |
- | | | +-----------+--------+
- +-------+---+
- | | | +----------+
- |Finland| *---->| Helsinki |
- | | | +----------+
- +-------+---+
- | | | +---------+------------+----------+
- | USA | *---->| Chicago | Washington | New York |
- | | | +---------+------------+----------+
- +-------+---+
We'll look at output first. Supposing we already have this structure, how do we print it out?
- 8 foreach $country (sort keys %table) {
- 9 print "$country: ";
- 10 my @cities = @{$table{$country}};
- 11 print join ', ', sort @cities;
- 12 print ".\n";
- 13 }
%table
is an
ordinary hash, and we get a list of keys from it, sort the keys, and
loop over the keys as usual. The only use of references is in line 10.
$table{$country}
looks up the key $country
in the hash
and gets the value, which is a reference to an array of cities in that country.
Use Rule 1 says that
we can recover the array by saying
@{$table{$country}}
. Line 10 is just like
- @cities = @array;
except that the name array
has been replaced by the reference
{$table{$country}}
. The @
tells Perl to get the entire array.
Having gotten the list of cities, we sort it, join it, and print it
out as usual.
Lines 2-7 are responsible for building the structure in the first place. Here they are again:
- 2 while (<>) {
- 3 chomp;
- 4 my ($city, $country) = split /, /;
- 5 $table{$country} = [] unless exists $table{$country};
- 6 push @{$table{$country}}, $city;
- 7 }
Lines 2-4 acquire a city and country name. Line 5 looks to see if the
country is already present as a key in the hash. If it's not, the
program uses the []
notation (Make Rule 2) to manufacture a new,
empty anonymous array of cities, and installs a reference to it into
the hash under the appropriate key.
Line 6 installs the city name into the appropriate array.
$table{$country}
now holds a reference to the array of cities seen
in that country so far. Line 6 is exactly like
- push @array, $city;
except that the name array
has been replaced by the reference
{$table{$country}}
. The push adds a city name to the end of the
referred-to array.
There's one fine point I skipped. Line 5 is unnecessary, and we can get rid of it.
- 2 while (<>) {
- 3 chomp;
- 4 my ($city, $country) = split /, /;
- 5 #### $table{$country} = [] unless exists $table{$country};
- 6 push @{$table{$country}}, $city;
- 7 }
If there's already an entry in %table
for the current $country
,
then nothing is different. Line 6 will locate the value in
$table{$country}
, which is a reference to an array, and push
$city
into the array. But
what does it do when
$country
holds a key, say Greece
, that is not yet in %table
?
This is Perl, so it does exactly the right thing. It sees that you want
to push Athens
onto an array that doesn't exist, so it helpfully
makes a new, empty, anonymous array for you, installs it into
%table
, and then pushes Athens
onto it. This is called
'autovivification'--bringing things to life automatically. Perl saw
that the key wasn't in the hash, so it created a new hash entry
automatically. Perl saw that you wanted to use the hash value as an
array, so it created a new empty array and installed a reference to it
in the hash automatically. And as usual, Perl made the array one
element longer to hold the new city name.
I promised to give you 90% of the benefit with 10% of the details, and that means I left out 90% of the details. Now that you have an overview of the important parts, it should be easier to read the perlref manual page, which discusses 100% of the details.
Some of the highlights of perlref:
You can make references to anything, including scalars, functions, and other references.
In Use Rule 1, you can omit the curly brackets whenever the thing
inside them is an atomic scalar variable like $aref
. For example,
@$aref
is the same as @{$aref}
, and $$aref[1]
is the same as
${$aref}[1]
. If you're just starting out, you may want to adopt
the habit of always including the curly brackets.
This doesn't copy the underlying array:
- $aref2 = $aref1;
You get two references to the same array. If you modify
$aref1->[23]
and then look at
$aref2->[23]
you'll see the change.
To copy the array, use
- $aref2 = [@{$aref1}];
This uses [...]
notation to create a new anonymous array, and
$aref2
is assigned a reference to the new array. The new array is
initialized with the contents of the array referred to by $aref1
.
Similarly, to copy an anonymous hash, you can use
- $href2 = {%{$href1}};
To see if a variable contains a reference, use the ref function. It
returns true if its argument is a reference. Actually it's a little
better than that: It returns HASH
for hash references and ARRAY
for array references.
If you try to use a reference like a string, you get strings like
- ARRAY(0x80f5dec) or HASH(0x826afc0)
If you ever see a string that looks like this, you'll know you printed out a reference by mistake.
A side effect of this representation is that you can use eq
to see
if two references refer to the same thing. (But you should usually use
==
instead because it's much faster.)
You can use a string as if it were a reference. If you use the string
"foo"
as an array reference, it's taken to be a reference to the
array @foo
. This is called a soft reference or symbolic
reference. The declaration use strict 'refs'
disables this
feature, which can cause all sorts of trouble if you use it by accident.
You might prefer to go on to perllol instead of perlref; it discusses lists of lists and multidimensional arrays in detail. After that, you should move on to perldsc; it's a Data Structure Cookbook that shows recipes for using and printing out arrays of hashes, hashes of arrays, and other kinds of data.
Everyone needs compound data structures, and in Perl the way you get them is with references. There are four important rules for managing references: Two for making references and two for using them. Once you know these rules you can do most of the important things you need to do with references.
Author: Mark Jason Dominus, Plover Systems (mjd-perl-ref+@plover.com
)
This article originally appeared in The Perl Journal ( http://www.tpj.com/ ) volume 3, #2. Reprinted with permission.
The original title was Understand References Today.
Copyright 1998 The Perl Journal.
This documentation is free; you can redistribute it and/or modify it under the same terms as Perl itself.
Irrespective of its distribution, all code examples in these files are hereby placed into the public domain. You are permitted and encouraged to use this code in your own programs for fun or for profit as you see fit. A simple comment in the code giving credit would be courteous but is not required.
perlreguts - Description of the Perl regular expression engine.
This document is an attempt to shine some light on the guts of the regex engine and how it works. The regex engine represents a significant chunk of the perl codebase, but is relatively poorly understood. This document is a meagre attempt at addressing this situation. It is derived from the author's experience, comments in the source code, other papers on the regex engine, feedback on the perl5-porters mail list, and no doubt other places as well.
NOTICE! It should be clearly understood that the behavior and structures discussed in this document represent the state of the engine as the author understood it at the time of writing. It is NOT an API definition; it is purely an internals guide for those who want to hack the regex engine, or understand how the regex engine works. Readers of this document are expected to understand perl's regex syntax and its usage in detail. If you want to learn about the basics of Perl's regular expressions, see perlre. And if you want to replace the regex engine with your own, see perlreapi.
There is some debate as to whether to say "regexp" or "regex". In this document we will use the term "regex" unless there is a special reason not to, in which case we will explain why.
When speaking about regexes we need to distinguish between their source code form and their internal form. In this document we will use the term "pattern" when we speak of their textual, source code form, and the term "program" when we speak of their internal representation. These correspond to the terms S-regex and B-regex that Mark Jason Dominus employs in his paper on "Rx" ([1] in REFERENCES).
A regular expression engine is a program that takes a set of constraints specified in a mini-language, and then applies those constraints to a target string, and determines whether or not the string satisfies the constraints. See perlre for a full definition of the language.
In less grandiose terms, the first part of the job is to turn a pattern into something the computer can efficiently use to find the matching point in the string, and the second part is performing the search itself.
To do this we need to produce a program by parsing the text. We then need to execute the program to find the point in the string that matches. And we need to do the whole thing efficiently.
Although it is a bit confusing and some people object to the terminology, it is worth taking a look at a comment that has been in regexp.h for years:
This is essentially a linear encoding of a nondeterministic finite-state machine (aka syntax charts or "railroad normal form" in parsing technology).
The term "railroad normal form" is a bit esoteric, with "syntax diagram/charts", or "railroad diagram/charts" being more common terms. Nevertheless it provides a useful mental image of a regex program: each node can be thought of as a unit of track, with a single entry and in most cases a single exit point (there are pieces of track that fork, but statistically not many), and the whole forms a layout with a single entry and single exit point. The matching process can be thought of as a car that moves along the track, with the particular route through the system being determined by the character read at each possible connector point. A car can fall off the track at any point but it may only proceed as long as it matches the track.
Thus the pattern /foo(?:\w+|\d+|\s+)bar/
can be thought of as the
following chart:
- [start]
- |
- <foo>
- |
- +-----+-----+
- | | |
- <\w+> <\d+> <\s+>
- | | |
- +-----+-----+
- |
- <bar>
- |
- [end]
The truth of the matter is that perl's regular expressions these days are much more complex than this kind of structure, but visualising it this way can help when trying to get your bearings, and it matches the current implementation pretty closely.
To be more precise, we will say that a regex program is an encoding of a graph. Each node in the graph corresponds to part of the original regex pattern, such as a literal string or a branch, and has a pointer to the nodes representing the next component to be matched. Since "node" and "opcode" already have other meanings in the perl source, we will call the nodes in a regex program "regops".
The program is represented by an array of regnode
structures, one or
more of which represent a single regop of the program. Struct
regnode
is the smallest struct needed, and has a field structure which is
shared with all the other larger structures.
The "next" pointers of all regops except BRANCH
implement concatenation;
a "next" pointer with a BRANCH
on both ends of it is connecting two
alternatives. [Here we have one of the subtle syntax dependencies: an
individual BRANCH
(as opposed to a collection of them) is never
concatenated with anything because of operator precedence.]
The operand of some types of regop is a literal string; for others,
it is a regop leading into a sub-program. In particular, the operand
of a BRANCH
node is the first regop of the branch.
NOTE: As the railroad metaphor suggests, this is not a tree
structure: the tail of the branch connects to the thing following the
set of BRANCH
es. It is like a single line of railway track that
splits as it goes into a station or railway yard and rejoins as it comes
out the other side.
The base structure of a regop is defined in regexp.h as follows:
- struct regnode {
- U8 flags; /* Various purposes, sometimes overridden */
- U8 type; /* Opcode value as specified by regnodes.h */
- U16 next_off; /* Offset in size regnode */
- };
Other larger regnode
-like structures are defined in regcomp.h. They
are almost like subclasses in that they have the same fields as
regnode
, with possibly additional fields following in
the structure, and in some cases the specific meaning (and name)
of some of the base fields is overridden. The following is a more
complete description.
regnode_1
regnode_2
regnode_1
structures have the same header, followed by a single
four-byte argument; regnode_2
structures contain two two-byte
arguments instead:
- regnode_1 U32 arg1;
- regnode_2 U16 arg1; U16 arg2;
regnode_string
regnode_string
structures, used for literal strings, follow the header
with a one-byte length and then the string data. Strings are padded on
the end with zero bytes so that the total length of the node is a
multiple of four bytes:
- regnode_string char string[1];
- U8 str_len; /* overrides flags */
regnode_charclass
Character classes are represented by regnode_charclass structures, which have a four-byte argument and then a 32-byte (256-bit) bitmap indicating which characters are included in the class.
- regnode_charclass U32 arg1;
- char bitmap[ANYOF_BITMAP_SIZE];
regnode_charclass_class
There is also a larger form of a char class structure used to represent POSIX char classes, called regnode_charclass_class, which has an additional 4-byte (32-bit) bitmap indicating which POSIX char classes have been included.
- regnode_charclass_class U32 arg1;
- char bitmap[ANYOF_BITMAP_SIZE];
- char classflags[ANYOF_CLASSBITMAP_SIZE];
regnodes.h defines an array called regarglen[] which gives the size of each opcode in units of size regnode (4 bytes). A macro is used to calculate the size of an EXACT node based on its str_len field.
The regops are defined in regnodes.h which is generated from regcomp.sym by regcomp.pl. Currently the maximum possible number of distinct regops is restricted to 256, with about a quarter already used.
A set of macros makes accessing the fields easier and more consistent. These include OP(), which is used to determine the type of a regnode-like structure; NEXT_OFF(), which is the offset to the next node (more on this later); ARG(), ARG1(), ARG2(), ARG_SET(), and equivalents for reading and setting the arguments; and STR_LEN(), STRING() and OPERAND() for manipulating strings and regop bearing types.
There are three distinct concepts of "next" in the regex engine, and it is important to keep them clear.
There is the "next regnode" from a given regnode, a value which is rarely useful except that sometimes it matches up in terms of value with one of the others, and that sometimes the code assumes this to always be so.
There is the "next regop" from a given regop/regnode. This is the regop physically located after the current one, as determined by the size of the current regop. This is often useful; for instance, when dumping the structure, we traverse it in this order. Sometimes the code assumes that the "next regnode" is the same as the "next regop", or in other words assumes that the size of a given regop type is always going to be one regnode large.
There is the "regnext" from a given regop. This is the regop which is reached by jumping forward by the value of NEXT_OFF(), or in a few cases for longer jumps by the arg1 field of the regnode_1 structure. The subroutine regnext() handles this transparently. This is the logical successor of the node, which in some cases, like that of the BRANCH regop, has special meaning.
Broadly speaking, performing a match of a string against a pattern involves the following steps:
Where these steps occur in the actual execution of a perl program is
determined by whether the pattern involves interpolating any string
variables. If interpolation occurs, then compilation happens at run time. If it
does not, then compilation is performed at compile time. (The /o modifier changes this,
as does qr// to a certain extent.) The engine doesn't really care that
much.
This code resides primarily in regcomp.c, along with the header files regcomp.h, regexp.h and regnodes.h.
Compilation starts with pregcomp(), which is mostly an initialisation wrapper which farms work out to two other routines for the heavy lifting: the first is reg(), which is the start point for parsing; the second, study_chunk(), is responsible for optimisation.
Initialisation in pregcomp() mostly involves the creation and data-filling of a special structure, RExC_state_t (defined in regcomp.c). Almost all internally-used routines in regcomp.h take a pointer to one of these structures as their first argument, with the name pRExC_state. This structure is used to store the compilation state and contains many fields. Likewise there are many macros which operate on this variable: anything that looks like RExC_xxxx is a macro that operates on this pointer/structure.
In this pass the input pattern is parsed in order to calculate how much space is needed for each regop we would need to emit. The size is also used to determine whether long jumps will be required in the program.
This stage is controlled by the macro SIZE_ONLY being set.

The parse proceeds pretty much exactly as it does during the construction phase, except that most routines are short-circuited to change the size field RExC_size and not do anything else.

Once the size of the program has been determined, the pattern is parsed again, but this time for real. Now SIZE_ONLY will be false, and the actual construction can occur.
reg() is the start of the parse process. It is responsible for parsing an arbitrary chunk of pattern up to either the end of the string, or the first closing parenthesis it encounters in the pattern. This means it can be used to parse the top-level regex, or any section inside of a grouping parenthesis. It also handles the "special parens" that perl's regexes have. For instance, when parsing /x(?:foo)y/, reg() will at one point be called to parse from the "?" symbol up to and including the ")".
Additionally, reg() is responsible for parsing the one or more branches from the pattern, and for "finishing them off" by correctly setting their next pointers. In order to do the parsing, it repeatedly calls out to regbranch(), which is responsible for handling up to the first | symbol it sees.
regbranch() in turn calls regpiece(), which handles "things" followed by a quantifier. In order to parse the "things", regatom() is called. This is the lowest level routine, which parses out constant strings, character classes, and the various special symbols like $. If regatom() encounters a "(" character it in turn calls reg().
The routine regtail() is called by both reg() and regbranch() in order to "set the tail pointer" correctly. When executing, once we get to the end of a branch, we need to go to the node following the grouping parens. When parsing, however, we don't know where the end will be until we get there, so when we do we must go back and update the offsets as appropriate. regtail() is used to make this easier.
A subtlety of the parsing process means that a regex like /foo/ is originally parsed into an alternation with a single branch. It is only afterwards that the optimiser converts single branch alternations into the simpler form.
The call graph looks like this:
- reg() # parse a top level regex, or inside of
- # parens
- regbranch() # parse a single branch of an alternation
- regpiece() # parse a pattern followed by a quantifier
- regatom() # parse a simple pattern
- regclass() # used to handle a class
- reg() # used to handle a parenthesised
- # subpattern
- ....
- ...
- regtail() # finish off the branch
- ...
- regtail() # finish off the branch sequence. Tie each
- # branch's tail to the tail of the
- # sequence
- # (NEW) In Debug mode this is
- # regtail_study().
A grammar form might be something like this:
- atom : constant | class
- quant : '*' | '+' | '?' | '{min,max}'
- _branch: piece
- | piece _branch
- | nothing
- branch: _branch
- | _branch '|' branch
- group : '(' branch ')'
- _piece: atom | group
- piece : _piece
- | _piece quant
The implication of the above description is that a pattern containing nested parentheses will result in a call graph which cycles through reg(), regbranch(), regpiece(), regatom(), reg(), regbranch() etc. multiple times, until the deepest level of nesting is reached. All the above routines return a pointer to a regnode, which is usually the last regnode added to the program. However, one complication is that reg() returns NULL when parsing (?:) syntax for embedded modifiers, setting the flag TRYAGAIN. The TRYAGAIN flag propagates upwards until it is captured, in some cases by regatom(), but otherwise unconditionally by regbranch(). Hence it will never be returned by regbranch() to reg(). This flag permits patterns such as (?i)+ to be detected as errors (Quantifier follows nothing in regex; marked by <-- HERE in m/(?i)+ <-- HERE /).
Another complication is that the representation used for the program differs if it needs to store Unicode, but it's not always possible to know for sure whether it does until midway through parsing. The Unicode representation for the program is larger, and cannot be matched as efficiently. (See Unicode and Localisation Support below for more details as to why.) If the pattern contains literal Unicode, it's obvious that the program needs to store Unicode. Otherwise, the parser optimistically assumes that the more efficient representation can be used, and starts sizing on this basis. However, if it then encounters something in the pattern which must be stored as Unicode, such as an \x{...} escape sequence representing a character literal, then all previously calculated sizes need to be redone, using values appropriate for the Unicode representation. Currently, all regular expression constructions which can trigger this are parsed by code in regatom().
To avoid wasted work when a restart is needed, the sizing pass is abandoned: regatom() immediately returns NULL, setting the flag RESTART_UTF8. (This action is encapsulated using the macro REQUIRE_UTF8.) This restart request is propagated up the call chain in a similar fashion, until it is "caught" in Perl_re_op_compile(), which marks the pattern as containing Unicode, and restarts the sizing pass. It is also possible for constructions within run-time code blocks to turn out to need Unicode representation, which is signalled by S_compile_runtime_code() returning false to Perl_re_op_compile().
The restart was previously implemented using a longjmp in regatom() back to a setjmp in Perl_re_op_compile(), but this proved to be problematic as the latter is a large function containing many automatic variables, which interact badly with the emergent control flow of setjmp.
In the 5.9.x development version of perl you can use re Debug => 'PARSE'
to see some trace information about the parse process. We will start with some
simple patterns and build up to more complex patterns.
So when we parse /foo/ we see something like the following table. The left shows what is being parsed, and the number indicates where the next regop would go. The stuff on the right is the trace output of the graph. The names are chosen to be short to make it less dense on the screen. 'tsdy' is a special form of regtail() which does some extra analysis.
- >foo< 1 reg
- brnc
- piec
- atom
- >< 4 tsdy~ EXACT <foo> (EXACT) (1)
- ~ attach to END (3) offset to 2
The resulting program then looks like:
- 1: EXACT <foo>(3)
- 3: END(0)
As you can see, even though we parsed out a branch and a piece, it was ultimately only an atom. The final program shows us how things work. We have an EXACT regop, followed by an END regop. The number in parens indicates where the regnext of the node goes. The regnext of an END regop is unused, as END regops mean we have successfully matched. The number on the left indicates the position of the regop in the regnode array.
Now let's try a harder pattern. We will add a quantifier, so now we have the pattern /foo+/. We will see that regbranch() calls regpiece() twice.
- >foo+< 1 reg
- brnc
- piec
- atom
- >o+< 3 piec
- atom
- >< 6 tail~ EXACT <fo> (1)
- 7 tsdy~ EXACT <fo> (EXACT) (1)
- ~ PLUS (END) (3)
- ~ attach to END (6) offset to 3
And we end up with the program:
- 1: EXACT <fo>(3)
- 3: PLUS(6)
- 4: EXACT <o>(0)
- 6: END(0)
Now we have a special case. The EXACT regop has a regnext of 0. This is because if it matches it should try to match itself again. The PLUS regop handles the actual failure of the EXACT regop and acts appropriately (going to regnode 6 if the EXACT matched at least once, or failing if it didn't).
Now for something much more complex: /x(?:foo*|b[a][rR])(foo|bar)$/
- >x(?:foo*|b... 1 reg
- brnc
- piec
- atom
- >(?:foo*|b[... 3 piec
- atom
- >?:foo*|b[a... reg
- >foo*|b[a][... brnc
- piec
- atom
- >o*|b[a][rR... 5 piec
- atom
- >|b[a][rR])... 8 tail~ EXACT <fo> (3)
- >b[a][rR])(... 9 brnc
- 10 piec
- atom
- >[a][rR])(f... 12 piec
- atom
- >a][rR])(fo... clas
- >[rR])(foo|... 14 tail~ EXACT <b> (10)
- piec
- atom
- >rR])(foo|b... clas
- >)(foo|bar)... 25 tail~ EXACT <a> (12)
- tail~ BRANCH (3)
- 26 tsdy~ BRANCH (END) (9)
- ~ attach to TAIL (25) offset to 16
- tsdy~ EXACT <fo> (EXACT) (4)
- ~ STAR (END) (6)
- ~ attach to TAIL (25) offset to 19
- tsdy~ EXACT <b> (EXACT) (10)
- ~ EXACT <a> (EXACT) (12)
- ~ ANYOF[Rr] (END) (14)
- ~ attach to TAIL (25) offset to 11
- >(foo|bar)$< tail~ EXACT <x> (1)
- piec
- atom
- >foo|bar)$< reg
- 28 brnc
- piec
- atom
- >|bar)$< 31 tail~ OPEN1 (26)
- >bar)$< brnc
- 32 piec
- atom
- >)$< 34 tail~ BRANCH (28)
- 36 tsdy~ BRANCH (END) (31)
- ~ attach to CLOSE1 (34) offset to 3
- tsdy~ EXACT <foo> (EXACT) (29)
- ~ attach to CLOSE1 (34) offset to 5
- tsdy~ EXACT <bar> (EXACT) (32)
- ~ attach to CLOSE1 (34) offset to 2
- >$< tail~ BRANCH (3)
- ~ BRANCH (9)
- ~ TAIL (25)
- piec
- atom
- >< 37 tail~ OPEN1 (26)
- ~ BRANCH (28)
- ~ BRANCH (31)
- ~ CLOSE1 (34)
- 38 tsdy~ EXACT <x> (EXACT) (1)
- ~ BRANCH (END) (3)
- ~ BRANCH (END) (9)
- ~ TAIL (END) (25)
- ~ OPEN1 (END) (26)
- ~ BRANCH (END) (28)
- ~ BRANCH (END) (31)
- ~ CLOSE1 (END) (34)
- ~ EOL (END) (36)
- ~ attach to END (37) offset to 1
Resulting in the program
- 1: EXACT <x>(3)
- 3: BRANCH(9)
- 4: EXACT <fo>(6)
- 6: STAR(26)
- 7: EXACT <o>(0)
- 9: BRANCH(25)
- 10: EXACT <ba>(14)
- 12: OPTIMIZED (2 nodes)
- 14: ANYOF[Rr](26)
- 25: TAIL(26)
- 26: OPEN1(28)
- 28: TRIE-EXACT(34)
- [StS:1 Wds:2 Cs:6 Uq:5 #Sts:7 Mn:3 Mx:3 Stcls:bf]
- <foo>
- <bar>
- 30: OPTIMIZED (4 nodes)
- 34: CLOSE1(36)
- 36: EOL(37)
- 37: END(0)
Here we can see a much more complex program, with various optimisations in play. At regnode 10 we see an example where a character class with only one character in it was turned into an EXACT node. We can also see where an entire alternation was turned into a TRIE-EXACT node. As a consequence, some of the regnodes have been marked as optimised away. We can see that the $ symbol has been converted into an EOL regop, a special piece of code that looks for \n or the end of the string.
The next pointer for BRANCHes is interesting in that it points at where execution should go if the branch fails. When executing, if the engine tries to traverse from a branch to a regnext that isn't a branch then the engine will know that the entire set of branches has failed.
The regular expression engine can be a weighty tool to wield. On long strings and complex patterns it can end up having to do a lot of work to find a match, and even more to decide that no match is possible. Consider a situation like the following pattern.
- 'ababababababababababab' =~ /(a|b)*z/
The (a|b)* part can match at every char in the string, and then fail every time because there is no z in the string. So obviously we can avoid using the regex engine unless there is a z in the string.
Likewise in a pattern like:
- /foo(\w+)bar/
In this case we know that the string must contain a foo which must be followed by bar. We can use Fast Boyer-Moore matching as implemented in fbm_instr() to find the location of these strings. If they don't exist then we don't need to resort to the much more expensive regex engine. Even better, if they do exist then we can use their positions to reduce the search space that the regex engine needs to cover to determine if the entire pattern matches.
There are various aspects of the pattern that can be used to facilitate optimisations along these lines:
Another form of optimisation that can occur is the post-parse "peep-hole" optimisation, where inefficient constructs are replaced by more efficient constructs. The TAIL regops which are used during parsing to mark the end of branches and the end of groups are examples of this. These regops are used as place-holders during construction and "always match", so they can be "optimised away" by making the things that point to the TAIL point to the thing that TAIL points to, thus "skipping" the node.
Another optimisation that can occur is that of "EXACT merging", which is where two consecutive EXACT nodes are merged into a single regop. An even more aggressive form of this is that a branch sequence of the form EXACT BRANCH ... EXACT can be converted into a TRIE-EXACT regop.
All of this occurs in the routine study_chunk(), which uses a special structure, scan_data_t, to store the analysis that it has performed, and does the "peep-hole" optimisations as it goes.

The code involved in study_chunk() is extremely cryptic. Be careful. :-)
Execution of a regex generally involves two phases, the first being finding the start point in the string where we should match from, and the second being running the regop interpreter.
If we can tell that there is no valid start point then we don't bother running the interpreter at all. Likewise, if we know from the analysis phase that we cannot detect a short-cut to the start position, we go straight to the interpreter.
The two entry points are re_intuit_start() and pregexec(). These routines have a somewhat incestuous relationship with overlap between their functions, and pregexec() may even call re_intuit_start() on its own. Nevertheless other parts of the perl source code may call into either, or both.
Execution of the interpreter itself used to be recursive, but thanks to the efforts of Dave Mitchell in the 5.9.x development track, that has changed: now an internal stack is maintained on the heap and the routine is fully iterative. This can make it tricky as the code is quite conservative about what state it stores, with the result that two consecutive lines in the code can actually be running in totally different contexts due to the simulated recursion.
re_intuit_start() is responsible for handling start points and no-match optimisations as determined by the results of the analysis done by study_chunk() (and described in Peep-hole Optimisation and Analysis).
The basic structure of this routine is to try to find the start- and/or end-points of where the pattern could match, and to ensure that the string is long enough to match the pattern. It tries to use more efficient methods over less efficient methods and may involve considerable cross-checking of constraints to find the place in the string that matches. For instance it may try to determine that a given fixed string must be not only present but a certain number of chars before the end of the string, or whatever.
It calls several other routines, such as fbm_instr() which does Fast Boyer-Moore matching and find_byclass() which is responsible for finding the start using the first mandatory regop in the program.

When the optimisation criteria have been satisfied, regtry() is called to perform the match.
pregexec() is the main entry point for running a regex. It contains support for initialising the regex interpreter's state, running re_intuit_start() if needed, and running the interpreter on the string from various start positions as needed. When it is necessary to use the regex interpreter, pregexec() calls regtry().
regtry() is the entry point into the regex interpreter. It expects as arguments a pointer to a regmatch_info structure and a pointer to a string. It returns 1 for success and 0 for failure. It is basically a set-up wrapper around regmatch().
regmatch() is the main "recursive loop" of the interpreter. It is basically a giant switch statement that implements a state machine, where the possible states are the regops themselves, plus a number of additional intermediate and failure states. A few of the states are implemented as subroutines but the bulk are inline code.
When dealing with strings containing characters that cannot be represented using an eight-bit character set, perl uses an internal representation that is a permissive version of Unicode's UTF-8 encoding[2]. This uses single bytes to represent characters from the ASCII character set, and sequences of two or more bytes for all other characters. (See perlunitut for more information about the relationship between UTF-8 and perl's encoding, utf8. The difference isn't important for this discussion.)
No matter how you look at it, Unicode support is going to be a pain in a regex engine. Tricks that might be fine when you have 256 possible characters often won't scale to handle the size of the UTF-8 character set. Things you can take for granted with ASCII may not be true with Unicode. For instance, in ASCII, it is safe to assume that sizeof(char1) == sizeof(char2), but in UTF-8 it isn't. Unicode case folding is vastly more complex than the simple rules of ASCII, and even when not using Unicode but only localised single byte encodings, things can get tricky (for example, LATIN SMALL LETTER SHARP S (U+00DF, ß) should match 'SS' in localised case-insensitive matching).
Making things worse is that UTF-8 support was a later addition to the regex engine (as it was to perl) and this necessarily made things a lot more complicated. Obviously it is easier to design a regex engine with Unicode support in mind from the beginning than it is to retrofit it to one that wasn't.
Nearly all regops that involve looking at the input string have two cases, one for UTF-8, and one not. In fact, it's often more complex than that, as the pattern may be UTF-8 as well.
Care must be taken when making changes to make sure that you handle UTF-8 properly, both at compile time and at execution time, including when the string and pattern are mismatched.
The following comment in regcomp.h gives an example of exactly how tricky this can be:
- Two problematic code points in Unicode casefolding of EXACT nodes:
- U+0390 - GREEK SMALL LETTER IOTA WITH DIALYTIKA AND TONOS
- U+03B0 - GREEK SMALL LETTER UPSILON WITH DIALYTIKA AND TONOS
- which casefold to
- Unicode UTF-8
- U+03B9 U+0308 U+0301 0xCE 0xB9 0xCC 0x88 0xCC 0x81
- U+03C5 U+0308 U+0301 0xCF 0x85 0xCC 0x88 0xCC 0x81
- This means that in case-insensitive matching (or "loose matching",
- as Unicode calls it), an EXACTF of length six (the UTF-8 encoded
- byte length of the above casefolded versions) can match a target
- string of length two (the byte length of UTF-8 encoded U+0390 or
- U+03B0). This would rather mess up the minimum length computation.
- What we'll do is to look for the tail four bytes, and then peek
- at the preceding two bytes to see whether we need to decrease
- the minimum length by four (six minus two).
- Thanks to the design of UTF-8, there cannot be false matches:
- A sequence of valid UTF-8 bytes cannot be a subsequence of
- another valid sequence of UTF-8 bytes.
The regexp structure described in perlreapi is common to all regex engines. Two of its fields are intended for the private use of the regex engine that compiled the pattern: the intflags and pprivate members. The pprivate member is a void pointer to an arbitrary structure whose use and management is the responsibility of the compiling engine; perl will never modify either of these values. In the case of the stock engine the structure pointed to by pprivate is called regexp_internal.
There are two structures used to store a compiled regular expression. One, the regexp structure described in perlreapi, is populated by the engine currently being used, and some of its fields are read by perl to implement things such as the stringification of qr//. The other structure is pointed to by the regexp struct's pprivate field and, along with intflags in the same struct, is considered to be the property of the regex engine which compiled the regular expression.
The regexp structure contains all the data that perl needs to be aware of to properly work with the regular expression. It includes data about optimisations that perl can use to determine if the regex engine should really be used, and various other control info that is needed to properly execute patterns in various contexts, such as whether the pattern is anchored in some way, what flags were used during the compile, and whether the program contains special constructs that perl needs to be aware of.
In addition it contains two fields that are intended for the private use of the regex engine that compiled the pattern. These are the intflags and pprivate members. The pprivate member is a void pointer to an arbitrary structure whose use and management is the responsibility of the compiling engine. perl will never modify either of these values.
As mentioned earlier, in the case of the default engines, the pprivate
will be a pointer to a regexp_internal structure which holds the compiled
program and any additional data that is private to the regex engine
implementation.
pprivate structure

The following structure is used as the pprivate struct by perl's regex engine. Since it is specific to perl it is only of curiosity value to other engine implementations.
- typedef struct regexp_internal {
- U32 *offsets; /* offset annotations 20001228 MJD
- * data about mapping the program to
- * the string*/
- regnode *regstclass; /* Optional startclass as identified or
- * constructed by the optimiser */
- struct reg_data *data; /* Additional miscellaneous data used
- * by the program. Used to make it
- * easier to clone and free arbitrary
- * data that the regops need. Often the
- * ARG field of a regop is an index
- * into this structure */
- regnode program[1]; /* Unwarranted chumminess with
- * compiler. */
- } regexp_internal;
offsets
Offsets holds a mapping of offset in the program to offset in the precomp string. This is only used by ActiveState's visual regex debugger.
regstclass
Special regop that is used by re_intuit_start() to check if a pattern can match at a certain position. For instance if the regex engine knows that the pattern must start with a 'Z' then it can scan the string until it finds one and then launch the regex engine from there. The routine that handles this is called find_byclass(). Sometimes this field points at a regop embedded in the program, and sometimes it points at an independent synthetic regop that has been constructed by the optimiser.
data
This field points at a reg_data structure, which is defined as follows:
- struct reg_data {
- U32 count;
- U8 *what;
- void* data[1];
- };
This structure is used for handling data structures that the regex engine needs to handle specially during a clone or free operation on the compiled product. Each element in the data array has a corresponding element in the what array. During compilation regops that need special structures stored will add an element to each array using the add_data() routine and then store the index in the regop.
program
Compiled program. Inlined into the structure so the entire struct can be treated as a single blob.
by Yves Orton, 2006.
With excerpts from Perl, and contributions and suggestions from Ronald J. Kimball, Dave Mitchell, Dominic Dunlop, Mark Jason Dominus, Stephen McCamant, and David Landgren.
Same terms as Perl.
perlrepository - Links to current information on the Perl source repository
Perl's source code is stored in a Git repository.
See perlhack for an explanation of Perl development, including the Super Quick Patch Guide for making and submitting a small patch.
See perlgit for detailed information about Perl's Git repository.
(The above documents supersede the information that was formerly here in perlrepository.)
perlrequick - Perl regular expressions quick start
This page covers the very basics of understanding, creating and using regular expressions ('regexes') in Perl.
The simplest regex is simply a word, or more generally, a string of characters. A regex consisting of a word matches any string that contains that word:
- "Hello World" =~ /World/; # matches
In this statement, World is a regex and the // enclosing /World/ tells Perl to search a string for a match. The operator =~ associates the string with the regex match and produces a true value if the regex matched, or false if the regex did not match. In our case, World matches the second word in "Hello World", so the expression is true. This idea has several variations.
Expressions like this are useful in conditionals:
- print "It matches\n" if "Hello World" =~ /World/;
The sense of the match can be reversed by using the !~ operator:
- print "It doesn't match\n" if "Hello World" !~ /World/;
The literal string in the regex can be replaced by a variable:
- $greeting = "World";
- print "It matches\n" if "Hello World" =~ /$greeting/;
If you're matching against $_, the $_ =~ part can be omitted:
- $_ = "Hello World";
- print "It matches\n" if /World/;
Finally, the // default delimiters for a match can be changed to arbitrary delimiters by putting an 'm' out front:
- "Hello World" =~ m!World!; # matches, delimited by '!'
- "Hello World" =~ m{World}; # matches, note the matching '{}'
- "/usr/bin/perl" =~ m"/perl"; # matches after '/usr/bin',
- # '/' becomes an ordinary char
Regexes must match a part of the string exactly in order for the statement to be true:
- "Hello World" =~ /world/; # doesn't match, case sensitive
- "Hello World" =~ /o W/; # matches, ' ' is an ordinary char
- "Hello World" =~ /World /; # doesn't match, no ' ' at end
Perl will always match at the earliest possible point in the string:
- "Hello World" =~ /o/; # matches 'o' in 'Hello'
- "That hat is red" =~ /hat/; # matches 'hat' in 'That'
Not all characters can be used 'as is' in a match. Some characters, called metacharacters, are reserved for use in regex notation. The metacharacters are
- {}[]()^$.|*+?\
A metacharacter can be matched by putting a backslash before it:
- "2+2=4" =~ /2+2/; # doesn't match, + is a metacharacter
- "2+2=4" =~ /2\+2/; # matches, \+ is treated like an ordinary +
- 'C:\WIN32' =~ /C:\\WIN/; # matches
- "/usr/bin/perl" =~ /\/usr\/bin\/perl/; # matches
In the last regex, the forward slash '/' is also backslashed, because it is used to delimit the regex.
Non-printable ASCII characters are represented by escape sequences. Common examples are \t for a tab, \n for a newline, and \r for a carriage return. Arbitrary bytes are represented by octal escape sequences, e.g., \033, or hexadecimal escape sequences, e.g., \x1B:
- "1000\t2000" =~ m(0\t2) # matches
- "cat" =~ /\143\x61\x74/ # matches in ASCII, but a weird way to spell cat
Regexes are treated mostly as double-quoted strings, so variable substitution works:
- $foo = 'house';
- 'cathouse' =~ /cat$foo/; # matches
- 'housecat' =~ /${foo}cat/; # matches
With all of the regexes above, if the regex matched anywhere in the string, it was considered a match. To specify where it should match, we would use the anchor metacharacters ^ and $. The anchor ^ means match at the beginning of the string and the anchor $ means match at the end of the string, or before a newline at the end of the string. Some examples:
- "housekeeper" =~ /keeper/; # matches
- "housekeeper" =~ /^keeper/; # doesn't match
- "housekeeper" =~ /keeper$/; # matches
- "housekeeper\n" =~ /keeper$/; # matches
- "housekeeper" =~ /^housekeeper$/; # matches
A character class allows a set of possible characters, rather than just a single character, to match at a particular point in a regex. Character classes are denoted by brackets [...], with the set of characters to be possibly matched inside. Here are some examples:
- /cat/; # matches 'cat'
- /[bcr]at/; # matches 'bat', 'cat', or 'rat'
- "abc" =~ /[cab]/; # matches 'a'
In the last statement, even though 'c' is the first character in the class, the earliest point at which the regex can match is 'a'.
- /[yY][eE][sS]/; # match 'yes' in a case-insensitive way
- # 'yes', 'Yes', 'YES', etc.
- /yes/i; # also match 'yes' in a case-insensitive way
The last example shows a match with an 'i' modifier, which makes the match case-insensitive.
Character classes also have ordinary and special characters, but the sets of ordinary and special characters inside a character class are different than those outside a character class. The special characters for a character class are -]\^$ and are matched using an escape:
- /[\]c]def/; # matches ']def' or 'cdef'
- $x = 'bcr';
- /[$x]at/; # matches 'bat', 'cat', or 'rat'
- /[\$x]at/; # matches '$at' or 'xat'
- /[\\$x]at/; # matches '\at', 'bat', 'cat', or 'rat'
The special character '-' acts as a range operator within character
classes, so that the unwieldy [0123456789] and [abc...xyz]
become the svelte [0-9] and [a-z]:
- /item[0-9]/; # matches 'item0' or ... or 'item9'
- /[0-9a-fA-F]/; # matches a hexadecimal digit
If '-' is the first or last character in a character class, it is
treated as an ordinary character.
The special character ^ in the first position of a character class
denotes a negated character class, which matches any character but
those in the brackets. Both [...] and [^...] must match a
character, or the match fails. Then
- /[^a]at/; # doesn't match 'aat' or 'at', but matches
- # all other 'bat', 'cat', '0at', '%at', etc.
- /[^0-9]/; # matches a non-numeric character
- /[a^]at/; # matches 'aat' or '^at'; here '^' is ordinary
Perl has several abbreviations for common character classes. (These
definitions are those that Perl uses in ASCII-safe mode with the /a
modifier. Otherwise they could match many more non-ASCII Unicode
characters as well. See Backslash sequences in perlrecharclass for details.)
\d is a digit and represents
- [0-9]
\s is a whitespace character and represents
- [\ \t\r\n\f]
\w is a word character (alphanumeric or _) and represents
- [0-9a-zA-Z_]
\D is a negated \d; it represents any character but a digit
- [^0-9]
\S is a negated \s; it represents any non-whitespace character
- [^\s]
\W is a negated \w; it represents any non-word character
- [^\w]
The period '.' matches any character but "\n".
The \d\s\w\D\S\W abbreviations can be used both inside and outside
of character classes. Here are some in use:
- /\d\d:\d\d:\d\d/; # matches a hh:mm:ss time format
- /[\d\s]/; # matches any digit or whitespace character
- /\w\W\w/; # matches a word char, followed by a
- # non-word char, followed by a word char
- /..rt/; # matches any two chars, followed by 'rt'
- /end\./; # matches 'end.'
- /end[.]/; # same thing, matches 'end.'
The word anchor \b matches a boundary between a word
character and a non-word character \w\W or \W\w:
- $x = "Housecat catenates house and cat";
- $x =~ /\bcat/; # matches cat in 'catenates'
- $x =~ /cat\b/; # matches cat in 'housecat'
- $x =~ /\bcat\b/; # matches 'cat' at end of string
In the last example, the end of the string is considered a word boundary.
We can match different character strings with the alternation
metacharacter '|'. To match dog or cat, we form the regex
dog|cat. As before, Perl will try to match the regex at the
earliest possible point in the string. At each character position,
Perl will first try to match the first alternative, dog. If
dog doesn't match, Perl will then try the next alternative, cat.
If cat doesn't match either, then the match fails and Perl moves to
the next position in the string. Some examples:
- "cats and dogs" =~ /cat|dog|bird/; # matches "cat"
- "cats and dogs" =~ /dog|cat|bird/; # matches "cat"
Even though dog is the first alternative in the second regex,
cat is able to match earlier in the string.
- "cats" =~ /c|ca|cat|cats/; # matches "c"
- "cats" =~ /cats|cat|ca|c/; # matches "cats"
At a given character position, the first alternative that allows the regex match to succeed will be the one that matches. Here, all the alternatives match at the first string position, so the first matches.
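This left-to-right preference is easy to check directly; here is a small sketch (using $&, the variable holding the entire matched string):

```perl
# At each string position Perl tries the alternatives left to right,
# so the first alternative that succeeds is the one that matches.
"cats" =~ /c|ca|cat|cats/;
my $first_listed = $&;    # 'c'
"cats" =~ /cats|cat|ca|c/;
my $longest_first = $&;   # 'cats'
print "$first_listed $longest_first\n";   # prints "c cats"
```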
The grouping metacharacters () allow a part of a regex to be
treated as a single unit. Parts of a regex are grouped by enclosing
them in parentheses. The regex house(cat|keeper) means match
house followed by either cat or keeper. Some more examples
are
- /(a|b)b/; # matches 'ab' or 'bb'
- /(^a|b)c/; # matches 'ac' at start of string or 'bc' anywhere
- /house(cat|)/; # matches either 'housecat' or 'house'
- /house(cat(s|)|)/; # matches either 'housecats' or 'housecat' or
- # 'house'. Note groups can be nested.
- "20" =~ /(19|20|)\d\d/; # matches the null alternative '()\d\d',
- # because '20\d\d' can't match
The grouping metacharacters () also allow the extraction of the
parts of a string that matched. For each grouping, the part that
matched inside goes into the special variables $1, $2, etc.
They can be used just as ordinary variables:
- # extract hours, minutes, seconds
- $time =~ /(\d\d):(\d\d):(\d\d)/; # match hh:mm:ss format
- $hours = $1;
- $minutes = $2;
- $seconds = $3;
In list context, a match /regex/ with groupings will return the
list of matched values ($1,$2,...). So we could rewrite it as
- ($hours, $minutes, $seconds) = ($time =~ /(\d\d):(\d\d):(\d\d)/);
If the groupings in a regex are nested, $1 gets the group with the
leftmost opening parenthesis, $2 the next opening parenthesis,
etc. For example, here is a complex regex and the matching variables
indicated below it:
- /(ab(cd|ef)((gi)|j))/;
-  1  2      34
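A quick way to see the numbering in action is to match the regex above against a string it accepts and print the capture variables; a small sketch (the test string "abcdgi" is my own choice):

```perl
# $1 is the leftmost '(', $2 the next opening paren, and so on.
"abcdgi" =~ /(ab(cd|ef)((gi)|j))/;
print "$1 $2 $3 $4\n";   # prints "abcdgi cd gi gi"
```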
Associated with the matching variables $1, $2, ... are
the backreferences \g1, \g2, ... Backreferences are
matching variables that can be used inside a regex:
- /(\w\w\w)\s\g1/; # find sequences like 'the the' in string
$1, $2, ... should only be used outside of a regex, and \g1,
\g2, ... only inside a regex.
The quantifier metacharacters ?, *, +, and {} allow us
to determine the number of repeats of a portion of a regex we
consider to be a match. Quantifiers are put immediately after the
character, character class, or grouping that we want to specify. They
have the following meanings:
a? = match 'a' 1 or 0 times
a* = match 'a' 0 or more times, i.e., any number of times
a+ = match 'a' 1 or more times, i.e., at least once
a{n,m} = match at least n times, but not more than m times
a{n,} = match at least n times
a{n} = match exactly n times
Here are some examples:
- /[a-z]+\s+\d*/; # match a lowercase word, at least some space, and
- # any number of digits
- /(\w+)\s+\g1/; # match doubled words of arbitrary length
- $year =~ /^\d{2,4}$/; # make sure year is at least 2 but not more
- # than 4 digits
- $year =~ /^\d{4}$|^\d{2}$/; # better match; throw out 3 digit dates
These quantifiers will try to match as much of the string as possible, while still allowing the regex to match. So we have
- $x = 'the cat in the hat';
- $x =~ /^(.*)(at)(.*)$/; # matches,
- # $1 = 'the cat in the h'
- # $2 = 'at'
- # $3 = '' (0 matches)
The first quantifier .* grabs as much of the string as possible
while still having the regex match. The second quantifier .* has
no string left to it, so it matches 0 times.
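Appending ? to a quantifier makes it minimal ("non-greedy"), taking as little of the string as possible; a brief sketch of the contrast:

```perl
my $x = 'the cat in the hat';
$x =~ /^(.*)at/;    # greedy: .* grabs as much as possible,
my $greedy = $1;    # then backs off to the last 'at' -> 'the cat in the h'
$x =~ /^(.*?)at/;   # minimal: .*? grabs as little as possible,
my $minimal = $1;   # stopping at the first 'at' -> 'the c'
print "$greedy|$minimal\n";
```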
There are a few more things you might want to know about matching
operators.
The global modifier //g
allows the matching operator to match
within a string as many times as possible. In scalar context,
successive matches against a string will have //g
jump from match
to match, keeping track of position in the string as it goes along.
You can get or set the position with the pos() function.
For example,
- $x = "cat dog house"; # 3 words
- while ($x =~ /(\w+)/g) {
-     print "Word is $1, ends at position ", pos $x, "\n";
- }
prints
- Word is cat, ends at position 3
- Word is dog, ends at position 7
- Word is house, ends at position 13
A failed match or changing the target string resets the position. If
you don't want the position reset after failure to match, add the
//c modifier, as in /regex/gc.
In list context, //g
returns a list of matched groupings, or if
there are no groupings, a list of matches to the whole regex. So
- @words = ($x =~ /(\w+)/g); # matches,
- # $word[0] = 'cat'
- # $word[1] = 'dog'
- # $word[2] = 'house'
Search and replace is performed using s/regex/replacement/modifiers.
The replacement is a Perl double-quoted string that replaces in the
string whatever is matched with the regex. The operator =~ is
also used here to associate a string with s///. If matching
against $_, the $_ =~ can be dropped. If there is a match,
s/// returns the number of substitutions made; otherwise it returns
false. Here are a few examples:
- $x = "Time to feed the cat!";
- $x =~ s/cat/hacker/; # $x contains "Time to feed the hacker!"
- $y = "'quoted words'";
- $y =~ s/^'(.*)'$/$1/; # strip single quotes,
- # $y contains "quoted words"
With the s/// operator, the matched variables $1, $2, etc.
are immediately available for use in the replacement expression. With
the global modifier, s///g will search and replace all occurrences
of the regex in the string:
- $x = "I batted 4 for 4";
- $x =~ s/4/four/; # $x contains "I batted four for 4"
- $x = "I batted 4 for 4";
- $x =~ s/4/four/g; # $x contains "I batted four for four"
The non-destructive modifier s///r causes the result of the substitution
to be returned instead of modifying $_ (or whatever variable the
substitute was bound to with =~):
- $x = "I like dogs.";
- $y = $x =~ s/dogs/cats/r;
- print "$x $y\n"; # prints "I like dogs. I like cats."
- $x = "Cats are great.";
- print $x =~ s/Cats/Dogs/r =~ s/Dogs/Frogs/r =~ s/Frogs/Hedgehogs/r, "\n";
- # prints "Hedgehogs are great."
- @foo = map { s/[a-z]/X/r } qw(a b c 1 2 3);
- # @foo is now qw(X X X 1 2 3)
The evaluation modifier s///e wraps an eval{...} around the
replacement string and the evaluated result is substituted for the
matched substring. Some examples:
- # reverse all the words in a string
- $x = "the cat in the hat";
- $x =~ s/(\w+)/reverse $1/ge; # $x contains "eht tac ni eht tah"
- # convert percentage to decimal
- $x = "A 39% hit rate";
- $x =~ s!(\d+)%!$1/100!e; # $x contains "A 0.39 hit rate"
The last example shows that s/// can use other delimiters, such as
s!!! and s{}{}, and even s{}//. If single quotes are used
s''', then the regex and replacement are treated as single-quoted
strings.
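The single-quote form can be seen in a short sketch (the variable names are illustrative):

```perl
# With s''' neither the pattern nor the replacement is interpolated.
my $n = 'four';
my $x = "I batted 4";
(my $y = $x) =~ s/4/$n/;   # double-quotish: $n interpolates
(my $z = $x) =~ s'4'$n';   # single-quoted: the literal text '$n' goes in
print "$y / $z\n";         # prints "I batted four / I batted $n"
```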
split /regex/, string splits string into a list of substrings
and returns that list. The regex determines the character sequence
that string is split with respect to. For example, to split a
string into words, use
- $x = "Calvin and Hobbes";
- @word = split /\s+/, $x; # $word[0] = 'Calvin'
- # $word[1] = 'and'
- # $word[2] = 'Hobbes'
To extract a comma-delimited list of numbers, use
- $x = "1.618,2.718, 3.142";
- @const = split /,\s*/, $x; # $const[0] = '1.618'
- # $const[1] = '2.718'
- # $const[2] = '3.142'
If the empty regex // is used, the string is split into individual
characters. If the regex has groupings, then the list produced contains
the matched substrings from the groupings as well:
- $x = "/usr/bin";
- @parts = split m!(/)!, $x; # $parts[0] = ''
- # $parts[1] = '/'
- # $parts[2] = 'usr'
- # $parts[3] = '/'
- # $parts[4] = 'bin'
Since the first character of $x matched the regex, split prepended
an empty initial element to the list.
None.
This is just a quick start guide. For a more in-depth tutorial on regexes, see perlretut and for the reference page, see perlre.
Copyright (c) 2000 Mark Kvale All rights reserved.
This document may be distributed under the same terms as Perl itself.
The author would like to thank Mark-Jason Dominus, Tom Christiansen, Ilya Zakharevich, Brad Hughes, and Mike Giroux for all their helpful comments.
perlreref - Perl Regular Expressions Reference
This is a quick reference to Perl's regular expressions. For full information see perlre and perlop, as well as the SEE ALSO section in this document.
=~ determines to which variable the regex is applied.
In its absence, $_ is used.
- $var =~ /foo/;
!~ determines to which variable the regex is applied,
and negates the result of the match; it returns
false if the match succeeds, and true if it fails.
- $var !~ /foo/;
m/pattern/msixpogcdual searches a string for a pattern match,
applying the given options.
- m Multiline mode - ^ and $ match internal lines
- s match as a Single line - . matches \n
- i case-Insensitive
- x eXtended legibility - free whitespace and comments
- p Preserve a copy of the matched string -
- ${^PREMATCH}, ${^MATCH}, ${^POSTMATCH} will be defined.
- o compile pattern Once
- g Global - all occurrences
- c don't reset pos on failed matches when using /g
- a restrict \d, \s, \w and [:posix:] to match ASCII only
- aa (two a's) also exclude /i matches between ASCII and non-ASCII
- l match according to current locale
- u match according to Unicode rules
- d match according to native rules unless something indicates
- Unicode
If 'pattern' is an empty string, the last successfully matched
regex is used. Delimiters other than '/' may be used for both this
operator and the following ones. The leading m can be omitted
if the delimiter is '/'.
qr/pattern/msixpodual lets you store a regex in a variable,
or pass one around. Modifiers as for m//, and are stored
within the regex.
s/pattern/replacement/msixpogcedual substitutes matches of
'pattern' with 'replacement'. Modifiers as for m//,
with two additions:
- e Evaluate 'replacement' as an expression
- r Return substitution and leave the original string untouched.
'e' may be specified multiple times. 'replacement' is interpreted
as a double quoted string unless a single-quote (') is the delimiter.
?pattern? is like m/pattern/ but matches only once. No alternate
delimiters can be used. Must be reset with reset().
- \ Escapes the character immediately following it
- . Matches any single character except a newline (unless /s is
- used)
- ^ Matches at the beginning of the string (or line, if /m is used)
- $ Matches at the end of the string (or line, if /m is used)
- * Matches the preceding element 0 or more times
- + Matches the preceding element 1 or more times
- ? Matches the preceding element 0 or 1 times
- {...} Specifies a range of occurrences for the element preceding it
- [...] Matches any one of the characters contained within the brackets
- (...) Groups subexpressions for capturing to $1, $2...
- (?:...) Groups subexpressions without capturing (cluster)
- | Matches either the subexpression preceding or following it
- \g1 or \g{1}, \g2 ... Matches the text from the Nth group
- \1, \2, \3 ... Matches the text from the Nth group
- \g-1 or \g{-1}, \g-2 ... Matches the text from the Nth previous group
- \g{name} Named backreference
- \k<name> Named backreference
- \k'name' Named backreference
- (?P=name) Named backreference (python syntax)
These work as in normal strings.
- \a Alarm (beep)
- \e Escape
- \f Formfeed
- \n Newline
- \r Carriage return
- \t Tab
- \037 Char whose ordinal is the 3 octal digits, max \777
- \o{2307} Char whose ordinal is the octal number, unrestricted
- \x7f Char whose ordinal is the 2 hex digits, max \xFF
- \x{263a} Char whose ordinal is the hex number, unrestricted
- \cx Control-x
- \N{name} A named Unicode character or character sequence
- \N{U+263D} A Unicode character by hex ordinal
- \l Lowercase next character
- \u Titlecase next character
- \L Lowercase until \E
- \U Uppercase until \E
- \F Foldcase until \E
- \Q Disable pattern metacharacters until \E
- \E End modification
For Titlecase, see Titlecase.
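For instance, \Q is handy when interpolating a variable that may contain metacharacters; a small sketch:

```perl
my $sum = "1+1";
my $raw    = ("1+1=2" =~ /$sum=2/)     ? 1 : 0;  # 0: '+' acts as a quantifier
my $quoted = ("1+1=2" =~ /\Q$sum\E=2/) ? 1 : 0;  # 1: metacharacters disabled
print "$raw $quoted\n";   # prints "0 1"
```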
This one works differently from normal strings:
- \b An assertion, not backspace, except in a character class
- [amy] Match 'a', 'm' or 'y'
- [f-j] Dash specifies "range"
- [f-j-] Dash escaped or at start or end means 'dash'
- [^f-j] Caret indicates "match any character _except_ these"
The following sequences (except \N) work within or without a character class.
The first six are locale aware, all are Unicode aware. See perllocale
and perlunicode for details.
- \d A digit
- \D A nondigit
- \w A word character
- \W A non-word character
- \s A whitespace character
- \S A non-whitespace character
- \h A horizontal whitespace character
- \H A non-horizontal-whitespace character
- \N A non-newline (when not followed by '{NAME}';
- not valid in a character class; equivalent to [^\n]; it's
- like '.' without the /s modifier)
- \v A vertical whitespace character
- \V A non-vertical-whitespace character
- \R A generic newline (?>\v|\x0D\x0A)
- \C Match a byte (with Unicode, '.' matches a character)
- \pP Match P-named (Unicode) property
- \p{...} Match Unicode property with name longer than 1 character
- \PP Match non-P
- \P{...} Match lack of Unicode property with name longer than 1 char
- \X Match Unicode extended grapheme cluster
POSIX character classes and their Unicode and Perl equivalents:
- ASCII- Full-
- POSIX range range backslash
- [[:...:]] \p{...} \p{...} sequence Description
- -----------------------------------------------------------------------
- alnum PosixAlnum XPosixAlnum Alpha plus Digit
- alpha PosixAlpha XPosixAlpha Alphabetic characters
- ascii ASCII Any ASCII character
- blank PosixBlank XPosixBlank \h Horizontal whitespace;
- full-range also
- written as
- \p{HorizSpace} (GNU
- extension)
- cntrl PosixCntrl XPosixCntrl Control characters
- digit PosixDigit XPosixDigit \d Decimal digits
- graph PosixGraph XPosixGraph Alnum plus Punct
- lower PosixLower XPosixLower Lowercase characters
- print PosixPrint XPosixPrint Graph plus Print, but
- not any Cntrls
- punct PosixPunct XPosixPunct Punctuation and Symbols
- in ASCII-range; just
- punct outside it
- space PosixSpace XPosixSpace [\s\cK]
- PerlSpace XPerlSpace \s Perl's whitespace def'n
- upper PosixUpper XPosixUpper Uppercase characters
- word PosixWord XPosixWord \w Alnum + Unicode marks +
- connectors, like '_'
- (Perl extension)
- xdigit ASCII_Hex_Digit XPosixDigit Hexadecimal digit,
- ASCII-range is
- [0-9A-Fa-f]
Also, various synonyms like \p{Alpha} for \p{XPosixAlpha}; all are listed
in Properties accessible through \p{} and \P{} in perluniprops.
Within a character class:
- POSIX traditional Unicode
- [:digit:] \d \p{Digit}
- [:^digit:] \D \P{Digit}
All are zero-width assertions.
- ^ Match string start (or line, if /m is used)
- $ Match string end (or line, if /m is used) or before newline
- \b Match word boundary (between \w and \W)
- \B Match except at word boundary (between \w and \w or \W and \W)
- \A Match string start (regardless of /m)
- \Z Match string end (before optional newline)
- \z Match absolute string end
- \G Match where previous m//g left off
- \K Keep the stuff left of the \K, don't include it in $&
Quantifiers are greedy by default and match the longest leftmost.
- Maximal Minimal Possessive Allowed range
- ------- ------- ---------- -------------
- {n,m} {n,m}? {n,m}+ Must occur at least n times
- but no more than m times
- {n,} {n,}? {n,}+ Must occur at least n times
- {n} {n}? {n}+ Must occur exactly n times
- * *? *+ 0 or more times (same as {0,})
- + +? ++ 1 or more times (same as {1,})
- ? ?? ?+ 0 or 1 time (same as {0,1})
The possessive forms (new in Perl 5.10) prevent backtracking: what gets matched by a pattern with a possessive quantifier will not be backtracked into, even if that causes the whole match to fail.
There is no quantifier {,n}. That's interpreted as a literal string.
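The three columns behave differently when backtracking is needed; a minimal sketch (the possessive form needs Perl 5.10+):

```perl
my $s = 'aaa';
$s =~ /^(a+)a/;                      # greedy, but gives one 'a' back
my $max = $1;                        # 'aa'
$s =~ /^(a+?)a/;                     # minimal: takes as few as possible
my $min = $1;                        # 'a'
my $ok = ($s =~ /^(a++)a/) ? 1 : 0;  # possessive: a++ keeps all three,
print "$max $min $ok\n";             # no backtracking, so the match fails
```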
- (?#text) A comment
- (?:...) Groups subexpressions without capturing (cluster)
- (?pimsx-imsx:...) Enable/disable option (as per m// modifiers)
- (?=...) Zero-width positive lookahead assertion
- (?!...) Zero-width negative lookahead assertion
- (?<=...) Zero-width positive lookbehind assertion
- (?<!...) Zero-width negative lookbehind assertion
- (?>...) Grab what we can, prohibit backtracking
- (?|...) Branch reset
- (?<name>...) Named capture
- (?'name'...) Named capture
- (?P<name>...) Named capture (python syntax)
- (?{ code }) Embedded code, return value becomes $^R
- (??{ code }) Dynamic regex, return value used as regex
- (?N) Recurse into subpattern number N
- (?-N), (?+N) Recurse into Nth previous/next subpattern
- (?R), (?0) Recurse at the beginning of the whole pattern
- (?&name) Recurse into a named subpattern
- (?P>name) Recurse into a named subpattern (python syntax)
- (?(cond)yes|no)
- (?(cond)yes) Conditional expression, where "cond" can be:
- (?=pat) look-ahead
- (?!pat) negative look-ahead
- (?<=pat) look-behind
- (?<!pat) negative look-behind
- (N) subpattern N has matched something
- (<name>) named subpattern has matched something
- ('name') named subpattern has matched something
- (?{code}) code condition
- (R) true if recursing
- (RN) true if recursing into Nth subpattern
- (R&name) true if recursing into named subpattern
- (DEFINE) always false, no no-pattern allowed
- $_ Default variable for operators to use
- $` Everything prior to the matched string
- $& Entire matched string
- $' Everything after the matched string
- ${^PREMATCH} Everything prior to the matched string
- ${^MATCH} Entire matched string
- ${^POSTMATCH} Everything after the matched string
The use of $`, $& or $' will slow down all regex use
within your program. Consult perlvar for @-
to see equivalent expressions that won't cause slow down.
See also Devel::SawAmpersand. Starting with Perl 5.10, you
can also use the equivalent variables ${^PREMATCH}, ${^MATCH}
and ${^POSTMATCH}, but for them to be defined, you have to
specify the /p (preserve) modifier on your regular expression.
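A minimal sketch of the /p variables (on very recent perls they may be defined even without /p, but /p is the portable spelling):

```perl
"Hello World" =~ /o W/p;   # /p makes the ${^...} variables available
print ${^PREMATCH}, "|", ${^MATCH}, "|", ${^POSTMATCH}, "\n";
# prints "Hell|o W|orld"
```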
- $1, $2 ... hold the Xth captured expr
- $+ Last parenthesized pattern match
- $^N Holds the most recently closed capture
- $^R Holds the result of the last (?{...}) expr
- @- Offsets of starts of groups. $-[0] holds start of whole match
- @+ Offsets of ends of groups. $+[0] holds end of whole match
- %+ Named capture groups
- %- Named capture groups, as array refs
Captured groups are numbered according to their opening paren.
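Here is a small sketch of this capture bookkeeping, using a date string of my own choosing (named captures and %+ need Perl 5.10+):

```perl
"2024-06-01" =~ /(?<year>\d{4})-(?<month>\d{2})/;
print "$+{year} $+{month}\n";   # prints "2024 06" - %+ holds named groups
print "$-[1] $+[1]\n";          # prints "0 4" - start/end offsets of group 1
```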
- lc Lowercase a string
- lcfirst Lowercase first char of a string
- uc Uppercase a string
- ucfirst Titlecase first char of a string
- fc Foldcase a string
- pos Return or set current match position
- quotemeta Quote metacharacters
- reset Reset ?pattern? status
- study Analyze string for optimizing matching
- split Use a regex to split a string into parts
The first five of these are like the escape sequences \L, \l,
\U, \u, and \F. For Titlecase, see Titlecase; for
Foldcase, see Foldcase.
Unicode concept which most often is equal to uppercase, but for certain characters like the German "sharp s" there is a difference.
Unicode form that is useful when comparing strings regardless of case, as certain characters have complex one-to-many case mappings. Primarily a variant of lowercase.
Iain Truskett. Updated by the Perl 5 Porters.
This document may be distributed under the same terms as Perl itself.
perlretut for a tutorial on regular expressions.
perlrequick for a rapid tutorial.
perlre for more details.
perlvar for details on the variables.
perlop for details on the operators.
perlfunc for details on the functions.
perlfaq6 for FAQs on regular expressions.
perlrebackslash for a reference on backslash sequences.
perlrecharclass for a reference on character classes.
The re module to alter behaviour and aid debugging.
perluniintro, perlunicode, charnames and perllocale for details on regexes and internationalisation.
Mastering Regular Expressions by Jeffrey Friedl (http://oreilly.com/catalog/9780596528126/) for a thorough grounding and reference on the topic.
David P.C. Wollmann, Richard Soderberg, Sean M. Burke, Tom Christiansen, Jim Cromie, and Jeffrey Goff for useful advice.
perlretut - Perl regular expressions tutorial
This page provides a basic tutorial on understanding, creating and
using regular expressions in Perl. It serves as a complement to the
reference page on regular expressions perlre. Regular expressions
are an integral part of the m//, s///, qr// and split
operators and so this tutorial also overlaps with
Regexp Quote-Like Operators in perlop and split.
Perl is widely renowned for excellence in text processing, and regular expressions are one of the big factors behind this fame. Perl regular expressions display an efficiency and flexibility unknown in most other computer languages. Mastering even the basics of regular expressions will allow you to manipulate text with surprising ease.
What is a regular expression? A regular expression is simply a string
that describes a pattern. Patterns are in common use these days;
examples are the patterns typed into a search engine to find web pages
and the patterns used to list files in a directory, e.g., ls *.txt
or dir *.*. In Perl, the patterns described by regular expressions
are used to search strings, extract desired parts of strings, and to
do search and replace operations.
Regular expressions have the undeserved reputation of being abstract
and difficult to understand. Regular expressions are constructed using
simple concepts like conditionals and loops and are no more difficult
to understand than the corresponding if
conditionals and while
loops in the Perl language itself. In fact, the main challenge in
learning regular expressions is just getting used to the terse
notation used to express these concepts.
This tutorial flattens the learning curve by discussing regular expression concepts, along with their notation, one at a time and with many examples. The first part of the tutorial will progress from the simplest word searches to the basic regular expression concepts. If you master the first part, you will have all the tools needed to solve about 98% of your needs. The second part of the tutorial is for those comfortable with the basics and hungry for more power tools. It discusses the more advanced regular expression operators and introduces the latest cutting-edge innovations.
A note: to save time, 'regular expression' is often abbreviated as regexp or regex. Regexp is a more natural abbreviation than regex, but is harder to pronounce. The Perl pod documentation is evenly split on regexp vs regex; in Perl, there is more than one way to abbreviate it. We'll use regexp in this tutorial.
The simplest regexp is simply a word, or more generally, a string of characters. A regexp consisting of a word matches any string that contains that word:
- "Hello World" =~ /World/; # matches
What is this Perl statement all about? "Hello World" is a simple
double-quoted string. World is the regular expression and the
// enclosing /World/ tells Perl to search a string for a match.
The operator =~ associates the string with the regexp match and
produces a true value if the regexp matched, or false if the regexp
did not match. In our case, World matches the second word in
"Hello World", so the expression is true. Expressions like this
are useful in conditionals:
- print "It matches\n" if "Hello World" =~ /World/;
There are useful variations on this theme. The sense of the match can
be reversed by using the !~ operator:
- print "It doesn't match\n" if "Hello World" !~ /World/;
The literal string in the regexp can be replaced by a variable:
- $greeting = "World";
- print "It matches\n" if "Hello World" =~ /$greeting/;
If you're matching against the special default variable $_, the
$_ =~ part can be omitted:
- $_ = "Hello World";
- print "It matches\n" if /World/;
And finally, the // default delimiters for a match can be changed
to arbitrary delimiters by putting an 'm' out front:
- "Hello World" =~ m!World!; # matches, delimited by '!'
- "Hello World" =~ m{World}; # matches, note the matching '{}'
- "/usr/bin/perl" =~ m"/perl"; # matches after '/usr/bin',
- # '/' becomes an ordinary char
/World/, m!World!, and m{World} all represent the
same thing. When, e.g., the quote (") is used as a delimiter, the forward
slash '/' becomes an ordinary character and can be used in this regexp
without trouble.
Let's consider how different regexps would match "Hello World":
- "Hello World" =~ /world/; # doesn't match
- "Hello World" =~ /o W/; # matches
- "Hello World" =~ /oW/; # doesn't match
- "Hello World" =~ /World /; # doesn't match
The first regexp world doesn't match because regexps are
case-sensitive. The second regexp matches because the substring
'o W' occurs in the string "Hello World". The space
character ' ' is treated like any other character in a regexp and is
needed to match in this case. The lack of a space character is the
reason the third regexp 'oW' doesn't match. The fourth regexp
'World ' doesn't match because there is a space at the end of the
regexp, but not at the end of the string. The lesson here is that
regexps must match a part of the string exactly in order for the
statement to be true.
If a regexp matches in more than one place in the string, Perl will always match at the earliest possible point in the string:
- "Hello World" =~ /o/; # matches 'o' in 'Hello'
- "That hat is red" =~ /hat/; # matches 'hat' in 'That'
With respect to character matching, there are a few more points you need to know about. First of all, not all characters can be used 'as is' in a match. Some characters, called metacharacters, are reserved for use in regexp notation. The metacharacters are
- {}[]()^$.|*+?\
The significance of each of these will be explained in the rest of the tutorial, but for now, it is important only to know that a metacharacter can be matched by putting a backslash before it:
- "2+2=4" =~ /2+2/; # doesn't match, + is a metacharacter
- "2+2=4" =~ /2\+2/; # matches, \+ is treated like an ordinary +
- "The interval is [0,1)." =~ /[0,1)./ # is a syntax error!
- "The interval is [0,1)." =~ /\[0,1\)\./ # matches
- "#!/usr/bin/perl" =~ /#!\/usr\/bin\/perl/; # matches
In the last regexp, the forward slash '/' is also backslashed,
because it is used to delimit the regexp. This can lead to LTS
(leaning toothpick syndrome), however, and it is often more readable
to change delimiters.
- "#!/usr/bin/perl" =~ m!#\!/usr/bin/perl!; # easier to read
The backslash character '\' is a metacharacter itself and needs to
be backslashed:
- 'C:\WIN32' =~ /C:\\WIN/; # matches
In addition to the metacharacters, there are some ASCII characters
which don't have printable character equivalents and are instead
represented by escape sequences. Common examples are \t for a
tab, \n for a newline, \r for a carriage return and \a for a
bell (or alert). If your string is better thought of as a sequence of arbitrary
bytes, the octal escape sequence, e.g., \033, or hexadecimal escape
sequence, e.g., \x1B, may be a more natural representation for your
bytes. Here are some examples of escapes:
- "1000\t2000" =~ m(0\t2) # matches
- "1000\n2000" =~ /0\n20/ # matches
- "1000\t2000" =~ /\000\t2/ # doesn't match, "0" ne "\000"
- "cat" =~ /\o{143}\x61\x74/ # matches in ASCII, but a weird way
- # to spell cat
If you've been around Perl a while, all this talk of escape sequences may seem familiar. Similar escape sequences are used in double-quoted strings and in fact the regexps in Perl are mostly treated as double-quoted strings. This means that variables can be used in regexps as well. Just like double-quoted strings, the values of the variables in the regexp will be substituted in before the regexp is evaluated for matching purposes. So we have:
- $foo = 'house';
- 'housecat' =~ /$foo/; # matches
- 'cathouse' =~ /cat$foo/; # matches
- 'housecat' =~ /${foo}cat/; # matches
So far, so good. With the knowledge above you can already perform searches with just about any literal string regexp you can dream up. Here is a very simple emulation of the Unix grep program:
- % cat > simple_grep
- #!/usr/bin/perl
- $regexp = shift;
- while (<>) {
- print if /$regexp/;
- }
- ^D
- % chmod +x simple_grep
- % simple_grep abba /usr/dict/words
- Babbage
- cabbage
- cabbages
- sabbath
- Sabbathize
- Sabbathizes
- sabbatical
- scabbard
- scabbards
This program is easy to understand. #!/usr/bin/perl is the standard
way to invoke a perl program from the shell.
$regexp = shift; saves the first command line argument as the
regexp to be used, leaving the rest of the command line arguments to
be treated as files. while (<>) loops over all the lines in
all the files. For each line, print if /$regexp/; prints the
line if the regexp matches the line. In this line, both print and
/$regexp/ use the default variable $_ implicitly.
With all of the regexps above, if the regexp matched anywhere in the
string, it was considered a match. Sometimes, however, we'd like to
specify where in the string the regexp should try to match. To do
this, we would use the anchor metacharacters ^ and $. The
anchor ^ means match at the beginning of the string and the anchor
$ means match at the end of the string, or before a newline at the
end of the string. Here is how they are used:
- "housekeeper" =~ /keeper/; # matches
- "housekeeper" =~ /^keeper/; # doesn't match
- "housekeeper" =~ /keeper$/; # matches
- "housekeeper\n" =~ /keeper$/; # matches
The second regexp doesn't match because ^ constrains keeper to
match only at the beginning of the string, but "housekeeper" has
keeper starting in the middle. The third regexp does match, since the
$ constrains keeper to match only at the end of the string.
When both ^ and $ are used at the same time, the regexp has to
match both the beginning and the end of the string, i.e., the regexp
matches the whole string. Consider
- "keeper" =~ /^keep$/; # doesn't match
- "keeper" =~ /^keeper$/; # matches
- "" =~ /^$/; # ^$ matches an empty string
The first regexp doesn't match because the string has more to it than keep. Since the second regexp is exactly the string, it matches. Using both ^ and $ in a regexp forces the complete string to match, so it gives you complete control over which strings match and which don't. Suppose you are looking for a fellow named bert, off in a string by himself:
- "dogbert" =~ /bert/; # matches, but not what you want
- "dilbert" =~ /^bert/; # doesn't match, but ..
- "bertram" =~ /^bert/; # matches, so still not good enough
- "bertram" =~ /^bert$/; # doesn't match, good
- "dilbert" =~ /^bert$/; # doesn't match, good
- "bert" =~ /^bert$/; # matches, perfect
Of course, in the case of a literal string, one could just as easily use the string comparison $string eq 'bert' and it would be more efficient. The ^...$ regexp really becomes useful when we add in the more powerful regexp tools below.
Although one can already do quite a lot with the literal string regexps above, we've only scratched the surface of regular expression technology. In this and subsequent sections we will introduce regexp concepts (and associated metacharacter notations) that will allow a regexp to represent not just a single character sequence, but a whole class of them.
One such concept is that of a character class. A character class allows a set of possible characters, rather than just a single character, to match at a particular point in a regexp. Character classes are denoted by brackets [...], with the set of characters to be possibly matched inside. Here are some examples:
- /cat/; # matches 'cat'
- /[bcr]at/; # matches 'bat', 'cat', or 'rat'
- /item[0123456789]/; # matches 'item0' or ... or 'item9'
- "abc" =~ /[cab]/; # matches 'a'
In the last statement, even though 'c' is the first character in the class, 'a' matches because the first character position in the string is the earliest point at which the regexp can match.
- /[yY][eE][sS]/; # match 'yes' in a case-insensitive way
- # 'yes', 'Yes', 'YES', etc.
This regexp displays a common task: perform a case-insensitive match. Perl provides a way of avoiding all those brackets by simply appending an 'i' to the end of the match. Then /[yY][eE][sS]/; can be rewritten as /yes/i;. The 'i' stands for case-insensitive and is an example of a modifier of the matching operation. We will meet other modifiers later in the tutorial.
We saw in the section above that there were ordinary characters, which represented themselves, and special characters, which needed a backslash \ to represent themselves. The same is true in a character class, but the sets of ordinary and special characters inside a character class are different than those outside a character class. The special characters for a character class are -]\^$ (and the pattern delimiter, whatever it is).
] is special because it denotes the end of a character class. $ is special because it denotes a scalar variable. \ is special because it is used in escape sequences, just like above. Here is how the special characters ]$\ are handled:
- /[\]c]def/; # matches ']def' or 'cdef'
- $x = 'bcr';
- /[$x]at/; # matches 'bat', 'cat', or 'rat'
- /[\$x]at/; # matches '$at' or 'xat'
- /[\\$x]at/; # matches '\at', 'bat', 'cat', or 'rat'
The last two are a little tricky. In [\$x], the backslash protects the dollar sign, so the character class has two members $ and x. In [\\$x], the backslash is protected, so $x is treated as a variable and substituted in double quote fashion.
The special character '-' acts as a range operator within character classes, so that a contiguous set of characters can be written as a range. With ranges, the unwieldy [0123456789] and [abc...xyz] become the svelte [0-9] and [a-z]. Some examples are
- /item[0-9]/; # matches 'item0' or ... or 'item9'
- /[0-9bx-z]aa/; # matches '0aa', ..., '9aa',
- # 'baa', 'xaa', 'yaa', or 'zaa'
- /[0-9a-fA-F]/; # matches a hexadecimal digit
- /[0-9a-zA-Z_]/; # matches a "word" character,
- # like those in a Perl variable name
If '-' is the first or last character in a character class, it is treated as an ordinary character; [-ab], [ab-] and [a\-b] are all equivalent.
The special character ^ in the first position of a character class denotes a negated character class, which matches any character but those in the brackets. Both [...] and [^...] must match a character, or the match fails. Then
- /[^a]at/; # doesn't match 'aat' or 'at', but matches
- # all other 'bat', 'cat', '0at', '%at', etc.
- /[^0-9]/; # matches a non-numeric character
- /[a^]at/; # matches 'aat' or '^at'; here '^' is ordinary
Now, even [0-9] can be a bother to write multiple times, so in the interest of saving keystrokes and making regexps more readable, Perl has several abbreviations for common character classes, as shown below. Since the introduction of Unicode, unless the //a modifier is in effect, these character classes match more than just a few characters in the ASCII range.
\d matches a digit, not just [0-9] but also digits from non-roman scripts
\s matches a whitespace character, the set [\ \t\r\n\f] and others
\w matches a word character (alphanumeric or _), not just [0-9a-zA-Z_] but also digits and characters from non-roman scripts
\D is a negated \d; it represents any other character than a digit, or [^\d]
\S is a negated \s; it represents any non-whitespace character [^\s]
\W is a negated \w; it represents any non-word character [^\w]
The period '.' matches any character but "\n" (unless the modifier //s is in effect, as explained below).
\N, like the period, matches any character but "\n", but it does so regardless of whether the modifier //s is in effect.
The //a modifier, available starting in Perl 5.14, is used to restrict the matches of \d, \s, and \w to just those in the ASCII range. It is useful to keep your program from being needlessly exposed to full Unicode (and its accompanying security considerations) when all you want is to process English-like text. (The "a" may be doubled, //aa, to provide even more restrictions, preventing case-insensitive matching of ASCII with non-ASCII characters; otherwise a Unicode "Kelvin Sign" would caselessly match a "k" or "K".)
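As a small illustration (the sample character here is our own choice, ARABIC-INDIC DIGIT FIVE):

```perl
# Without //a, \d matches digits from non-roman scripts as well.
"\x{0665}" =~ /\d/;     # matches: U+0665 is a Unicode digit
"\x{0665}" =~ /\d/a;    # doesn't match: //a restricts \d to [0-9]
"5"        =~ /\d/a;    # matches
```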
The \d\s\w\D\S\W abbreviations can be used both inside and outside
of character classes. Here are some in use:
- /\d\d:\d\d:\d\d/; # matches a hh:mm:ss time format
- /[\d\s]/; # matches any digit or whitespace character
- /\w\W\w/; # matches a word char, followed by a
- # non-word char, followed by a word char
- /..rt/; # matches any two chars, followed by 'rt'
- /end\./; # matches 'end.'
- /end[.]/; # same thing, matches 'end.'
Because a period is a metacharacter, it needs to be escaped to match as an ordinary period. Because, for example, \d and \w are sets of characters, it is incorrect to think of [^\d\w] as [\D\W]; in fact [^\d\w] is the same as [^\w], which is the same as [\W]. Think DeMorgan's laws.
An anchor useful in basic regexps is the word anchor \b. This matches a boundary between a word character and a non-word character \w\W or \W\w:
- $x = "Housecat catenates house and cat";
- $x =~ /cat/; # matches cat in 'housecat'
- $x =~ /\bcat/; # matches cat in 'catenates'
- $x =~ /cat\b/; # matches cat in 'housecat'
- $x =~ /\bcat\b/; # matches 'cat' at end of string
Note in the last example, the end of the string is considered a word boundary.
You might wonder why '.' matches everything but "\n" - why not every character? The reason is that often one is matching against lines and would like to ignore the newline characters. For instance, while the string "\n" represents one line, we would like to think of it as empty. Then
- "" =~ /^$/; # matches
- "\n" =~ /^$/; # matches, $ anchors before "\n"
- "" =~ /./; # doesn't match; it needs a char
- "" =~ /^.$/; # doesn't match; it needs a char
- "\n" =~ /^.$/; # doesn't match; it needs a char other than "\n"
- "a" =~ /^.$/; # matches
- "a\n" =~ /^.$/; # matches, $ anchors before "\n"
This behavior is convenient, because we usually want to ignore newlines when we count and match characters in a line. Sometimes, however, we want to keep track of newlines. We might even want ^ and $ to anchor at the beginning and end of lines within the string, rather than just the beginning and end of the string. Perl allows us to choose between ignoring and paying attention to newlines by using the //s and //m modifiers. //s and //m stand for single line and multi-line and they determine whether a string is to be treated as one continuous string, or as a set of lines. The two modifiers affect two aspects of how the regexp is interpreted: 1) how the '.' character class is defined, and 2) where the anchors ^ and $ are able to match. Here are the four possible combinations:
no modifiers (//): Default behavior. '.' matches any character except "\n". ^ matches only at the beginning of the string and $ matches only at the end or before a newline at the end.
s modifier (//s): Treat string as a single long line. '.' matches any character, even "\n". ^ matches only at the beginning of the string and $ matches only at the end or before a newline at the end.
m modifier (//m): Treat string as a set of multiple lines. '.' matches any character except "\n". ^ and $ are able to match at the start or end of any line within the string.
both s and m modifiers (//sm): Treat string as a single long line, but detect multiple lines. '.' matches any character, even "\n". ^ and $, however, are able to match at the start or end of any line within the string.
Here are examples of //s and //m in action:
- $x = "There once was a girl\nWho programmed in Perl\n";
- $x =~ /^Who/; # doesn't match, "Who" not at start of string
- $x =~ /^Who/s; # doesn't match, "Who" not at start of string
- $x =~ /^Who/m; # matches, "Who" at start of second line
- $x =~ /^Who/sm; # matches, "Who" at start of second line
- $x =~ /girl.Who/; # doesn't match, "." doesn't match "\n"
- $x =~ /girl.Who/s; # matches, "." matches "\n"
- $x =~ /girl.Who/m; # doesn't match, "." doesn't match "\n"
- $x =~ /girl.Who/sm; # matches, "." matches "\n"
Most of the time, the default behavior is what is wanted, but //s and //m are occasionally very useful. If //m is being used, the start of the string can still be matched with \A and the end of the string can still be matched with the anchors \Z (matches both the end and the newline before, like $), and \z (matches only the end):
- $x =~ /^Who/m; # matches, "Who" at start of second line
- $x =~ /\AWho/m; # doesn't match, "Who" is not at start of string
- $x =~ /girl$/m; # matches, "girl" at end of first line
- $x =~ /girl\Z/m; # doesn't match, "girl" is not at end of string
- $x =~ /Perl\Z/m; # matches, "Perl" is at newline before end
- $x =~ /Perl\z/m; # doesn't match, "Perl" is not at end of string
We now know how to create choices among classes of characters in a regexp. What about choices among words or character strings? Such choices are described in the next section.
Sometimes we would like our regexp to be able to match different possible words or character strings. This is accomplished by using the alternation metacharacter |. To match dog or cat, we form the regexp dog|cat. As before, Perl will try to match the regexp at the earliest possible point in the string. At each character position, Perl will first try to match the first alternative, dog. If dog doesn't match, Perl will then try the next alternative, cat. If cat doesn't match either, then the match fails and Perl moves to the next position in the string. Some examples:
- "cats and dogs" =~ /cat|dog|bird/; # matches "cat"
- "cats and dogs" =~ /dog|cat|bird/; # matches "cat"
Even though dog is the first alternative in the second regexp, cat is able to match earlier in the string.
- "cats" =~ /c|ca|cat|cats/; # matches "c"
- "cats" =~ /cats|cat|ca|c/; # matches "cats"
Here, all the alternatives match at the first string position, so the first alternative is the one that matches. If some of the alternatives are truncations of the others, put the longest ones first to give them a chance to match.
- "cab" =~ /a|b|c/ # matches "c"
- # /a|b|c/ == /[abc]/
The last example points out that character classes are like alternations of characters. At a given character position, the first alternative that allows the regexp match to succeed will be the one that matches.
Alternation allows a regexp to choose among alternatives, but by itself it is unsatisfying. The reason is that each alternative is a whole regexp, but sometimes we want alternatives for just part of a regexp. For instance, suppose we want to search for housecats or housekeepers. The regexp housecat|housekeeper fits the bill, but is inefficient because we had to type house twice. It would be nice to have parts of the regexp be constant, like house, and some parts have alternatives, like cat|keeper.
The grouping metacharacters () solve this problem. Grouping allows parts of a regexp to be treated as a single unit. Parts of a regexp are grouped by enclosing them in parentheses. Thus we could solve the housecat|housekeeper problem by forming the regexp as house(cat|keeper). The regexp house(cat|keeper) means match house followed by either cat or keeper. Some more examples are
- /(a|b)b/; # matches 'ab' or 'bb'
- /(ac|b)b/; # matches 'acb' or 'bb'
- /(^a|b)c/; # matches 'ac' at start of string or 'bc' anywhere
- /(a|[bc])d/; # matches 'ad', 'bd', or 'cd'
- /house(cat|)/; # matches either 'housecat' or 'house'
- /house(cat(s|)|)/; # matches either 'housecats' or 'housecat' or
- # 'house'. Note groups can be nested.
- /(19|20|)\d\d/; # match years 19xx, 20xx, or the Y2K problem, xx
- "20" =~ /(19|20|)\d\d/; # matches the null alternative '()\d\d',
- # because '20\d\d' can't match
Alternations behave the same way in groups as out of them: at a given string position, the leftmost alternative that allows the regexp to match is taken. So in the last example at the first string position, "20" matches the second alternative, but there is nothing left over to match the next two digits \d\d. So Perl moves on to the next alternative, which is the null alternative and that works, since "20" is two digits.
The process of trying one alternative, seeing if it matches, and moving on to the next alternative, while going back in the string from where the previous alternative was tried, if it doesn't, is called backtracking. The term 'backtracking' comes from the idea that matching a regexp is like a walk in the woods. Successfully matching a regexp is like arriving at a destination. There are many possible trailheads, one for each string position, and each one is tried in order, left to right. From each trailhead there may be many paths, some of which get you there, and some which are dead ends. When you walk along a trail and hit a dead end, you have to backtrack along the trail to an earlier point to try another trail. If you hit your destination, you stop immediately and forget about trying all the other trails. You are persistent, and only if you have tried all the trails from all the trailheads and not arrived at your destination, do you declare failure. To be concrete, here is a step-by-step analysis of what Perl does when it tries to match the regexp
- "abcde" =~ /(abd|abc)(df|d|de)/;
Start with the first letter in the string 'a'.
Try the first alternative in the first group 'abd'.
Match 'a' followed by 'b'. So far so good.
'd' in the regexp doesn't match 'c' in the string - a dead end. So backtrack two characters and pick the second alternative in the first group 'abc'.
Match 'a' followed by 'b' followed by 'c'. We are on a roll and have satisfied the first group. Set $1 to 'abc'.
Move on to the second group and pick the first alternative 'df'.
Match the 'd'.
'f' in the regexp doesn't match 'e' in the string, so a dead end. Backtrack one character and pick the second alternative in the second group 'd'.
'd' matches. The second grouping is satisfied, so set $2 to 'd'.
We are at the end of the regexp, so we are done! We have matched 'abcd' out of the string "abcde".
There are a couple of things to note about this analysis. First, the third alternative in the second group 'de' also allows a match, but we stopped before we got to it - at a given character position, leftmost wins. Second, we were able to get a match at the first character position of the string 'a'. If there were no matches at the first position, Perl would move to the second character position 'b' and attempt the match all over again. Only when all possible paths at all possible character positions have been exhausted does Perl give up and declare $string =~ /(abd|abc)(df|d|de)/; to be false.
Even with all this work, regexp matching happens remarkably fast. To speed things up, Perl compiles the regexp into a compact sequence of opcodes that can often fit inside a processor cache. When the code is executed, these opcodes can then run at full throttle and search very quickly.
The grouping metacharacters () also serve another completely different function: they allow the extraction of the parts of a string that matched. This is very useful to find out what matched and for text processing in general. For each grouping, the part that matched inside goes into the special variables $1, $2, etc. They can be used just as ordinary variables:
- # extract hours, minutes, seconds
- if ($time =~ /(\d\d):(\d\d):(\d\d)/) { # match hh:mm:ss format
- $hours = $1;
- $minutes = $2;
- $seconds = $3;
- }
Now, we know that in scalar context, $time =~ /(\d\d):(\d\d):(\d\d)/ returns a true or false value. In list context, however, it returns the list of matched values ($1,$2,$3). So we could write the code more compactly as
- # extract hours, minutes, seconds
- ($hours, $minutes, $seconds) = ($time =~ /(\d\d):(\d\d):(\d\d)/);
If the groupings in a regexp are nested, $1 gets the group with the leftmost opening parenthesis, $2 the next opening parenthesis, etc. Here is a regexp with nested groups:
- /(ab(cd|ef)((gi)|j))/;
-  1  2      34
If this regexp matches, $1 contains a string starting with 'ab', $2 is either set to 'cd' or 'ef', $3 equals either 'gi' or 'j', and $4 is either set to 'gi', just like $3, or it remains undefined.
For convenience, Perl sets $+ to the string held by the highest numbered $1, $2,... that got assigned (and, somewhat related, $^N to the value of the $1, $2,... most-recently assigned; i.e. the $1, $2,... associated with the rightmost closing parenthesis used in the match).
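A brief sketch of the difference (the string and pattern are our own):

```perl
# Two capture groups; both are assigned in this match.
"Mmm...donut" =~ /(Mmm|Yech)\.\.\.(donut|peas)/;
# $+  holds 'donut': $2 is the highest-numbered group that was assigned
# $^N also holds 'donut' here: $2 is the most recently closed group
```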
Closely associated with the matching variables $1, $2, ... are the backreferences \g1, \g2,... Backreferences are simply matching variables that can be used inside a regexp. This is a really nice feature; what matches later in a regexp is made to depend on what matched earlier in the regexp. Suppose we wanted to look for doubled words in a text, like 'the the'. The following regexp finds all 3-letter doubles with a space in between:
- /\b(\w\w\w)\s\g1\b/;
The grouping assigns a value to \g1, so that the same 3-letter sequence is used for both parts.
A similar task is to find words consisting of two identical parts:
- % simple_grep '^(\w\w\w\w|\w\w\w|\w\w|\w)\g1$' /usr/dict/words
- beriberi
- booboo
- coco
- mama
- murmur
- papa
The regexp has a single grouping which considers 4-letter combinations, then 3-letter combinations, etc., and uses \g1 to look for a repeat. Although $1 and \g1 represent the same thing, care should be taken to use matched variables $1, $2,... only outside a regexp and backreferences \g1, \g2,... only inside a regexp; not doing so may lead to surprising and unsatisfactory results.
Counting the opening parentheses to get the correct number for a backreference is error-prone as soon as there is more than one capturing group. A more convenient technique became available with Perl 5.10: relative backreferences. To refer to the immediately preceding capture group one now may write \g{-1}, the next but last is available via \g{-2}, and so on.
Another good reason in addition to readability and maintainability for using relative backreferences is illustrated by the following example, where a simple pattern for matching peculiar strings is used:
- $a99a = '([a-z])(\d)\g2\g1'; # matches a11a, g22g, x33x, etc.
Now that we have this pattern stored as a handy string, we might feel tempted to use it as a part of some other pattern:
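For instance (the surrounding pattern and the test string here are illustrative):

```perl
$a99a = '([a-z])(\d)\g2\g1';   # matches a11a, g22g, x33x, etc.
"x11x" =~ /^$a99a$/;              # matches, as intended
"code=x11x" =~ /^(\w+)=$a99a$/;   # doesn't match!
```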
But this doesn't match, at least not the way one might expect. Only after inserting the interpolated $a99a and looking at the resulting full text of the regexp is it obvious that the backreferences have backfired. The subexpression (\w+) has snatched number 1 and demoted the groups in $a99a by one rank. This can be avoided by using relative backreferences:
- $a99a = '([a-z])(\d)\g{-1}\g{-2}'; # safe for being interpolated
Perl 5.10 also introduced named capture groups and named backreferences. To attach a name to a capturing group, you write either (?<name>...) or (?'name'...). The backreference may then be written as \g{name}. It is permissible to attach the same name to more than one group, but then only the leftmost one of the eponymous set can be referenced. Outside of the pattern a named capture group is accessible through the %+ hash.
Assuming that we have to match calendar dates which may be given in one of the three formats yyyy-mm-dd, mm/dd/yyyy or dd.mm.yyyy, we can write three suitable patterns where we use 'd', 'm' and 'y' respectively as the names of the groups capturing the pertaining components of a date. The matching operation combines the three patterns as alternatives:
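One way of writing this down (the variable name $date and the use of the //x modifier for layout are our own choices):

```perl
# Each alternative names its pieces 'y', 'm' and 'd'; whichever
# alternative matches fills in $+{y}, $+{m} and $+{d}.
if ( $date =~
     m{^(?<y>\d{4})-(?<m>\d\d)-(?<d>\d\d)$
      |^(?<m>\d\d)/(?<d>\d\d)/(?<y>\d{4})$
      |^(?<d>\d\d)\.(?<m>\d\d)\.(?<y>\d{4})$}x )
{
    print "day=$+{d} month=$+{m} year=$+{y}\n";
}
```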
If any of the alternatives matches, the hash %+ is bound to contain the three key-value pairs.
Yet another capturing group numbering technique (also as from Perl 5.10) deals with the problem of referring to groups within a set of alternatives. Consider a pattern for matching a time of the day, civil or military style:
- if ( $time =~ /(\d\d|\d):(\d\d)|(\d\d)(\d\d)/ ){
- # process hour and minute
- }
Processing the results requires an additional if statement to determine whether $1 and $2 or $3 and $4 contain the goodies. It would be easier if we could use group numbers 1 and 2 in the second alternative as well, and this is exactly what the parenthesized construct (?|...), set around an alternative, achieves. Here is an extended version of the previous pattern:
- if ( $time =~ /(?|(\d\d|\d):(\d\d)|(\d\d)(\d\d))\s+([A-Z][A-Z][A-Z])/ ){
- print "hour=$1 minute=$2 zone=$3\n";
- }
Within the alternative numbering group, group numbers start at the same position for each alternative. After the group, numbering continues with one higher than the maximum reached across all the alternatives.
In addition to what was matched, Perl also provides the positions of what was matched as contents of the @- and @+ arrays. $-[0] is the position of the start of the entire match and $+[0] is the position of the end. Similarly, $-[n] is the position of the start of the $n match and $+[n] is the position of the end. If $n is undefined, so are $-[n] and $+[n]. Then this code
- $x = "Mmm...donut, thought Homer";
- $x =~ /^(Mmm|Yech)\.\.\.(donut|peas)/; # matches
- foreach $expr (1..$#-) {
- print "Match $expr: '${$expr}' at position ($-[$expr],$+[$expr])\n";
- }
prints
- Match 1: 'Mmm' at position (0,3)
- Match 2: 'donut' at position (6,11)
Even if there are no groupings in a regexp, it is still possible to find out what exactly matched in a string. If you use them, Perl will set $` to the part of the string before the match, will set $& to the part of the string that matched, and will set $' to the part of the string after the match. An example:
- $x = "the cat caught the mouse";
- $x =~ /cat/; # $` = 'the ', $& = 'cat', $' = ' caught the mouse'
- $x =~ /the/; # $` = '', $& = 'the', $' = ' cat caught the mouse'
In the second match, $` equals '' because the regexp matched at the first character position in the string and stopped; it never saw the second 'the'. It is important to note that using $` and $' slows down regexp matching quite a bit, while $& slows it down to a lesser extent, because if they are used in one regexp in a program, they are generated for all regexps in the program. So if raw performance is a goal of your application, they should be avoided. If you need to extract the corresponding substrings, use @- and @+ instead:
- $` is the same as substr( $x, 0, $-[0] )
- $& is the same as substr( $x, $-[0], $+[0]-$-[0] )
- $' is the same as substr( $x, $+[0] )
As of Perl 5.10, the ${^PREMATCH}, ${^MATCH} and ${^POSTMATCH} variables may be used. These are only set if the /p modifier is present. Consequently they do not penalize the rest of the program.
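A brief sketch, reusing the earlier cat-and-mouse string:

```perl
$x = "the cat caught the mouse";
$x =~ /cat/p;                  # /p makes the ${^...} variables available
# ${^PREMATCH}  is 'the '
# ${^MATCH}     is 'cat'
# ${^POSTMATCH} is ' caught the mouse'
```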
A group that is required to bundle a set of alternatives may or may not be
useful as a capturing group. If it isn't, it just creates a superfluous
addition to the set of available capture group values, inside as well as
outside the regexp. Non-capturing groupings, denoted by (?:regexp),
still allow the regexp to be treated as a single unit, but don't establish
a capturing group at the same time. Both capturing and non-capturing
groupings are allowed to co-exist in the same regexp. Because there is
no extraction, non-capturing groupings are faster than capturing
groupings. Non-capturing groupings are also handy for choosing exactly
which parts of a regexp are to be extracted to matching variables:
- # match a number, $1-$4 are set, but we only want $1
- /([+-]?\ *(\d+(\.\d*)?|\.\d+)([eE][+-]?\d+)?)/;
- # match a number faster , only $1 is set
- /([+-]?\ *(?:\d+(?:\.\d*)?|\.\d+)(?:[eE][+-]?\d+)?)/;
- # match a number, get $1 = whole number, $2 = exponent
- /([+-]?\ *(?:\d+(?:\.\d*)?|\.\d+)(?:[eE]([+-]?\d+))?)/;
Non-capturing groupings are also useful for removing nuisance elements gathered from a split operation where parentheses are required for some reason:
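For example (the sample string is our own):

```perl
$x = "12aba34ba5";
@num = split /(a|b)+/, $x;    # capturing: separator captures are kept,
                              # @num = ('12','a','34','a','5')
@num = split /(?:a|b)+/, $x;  # non-capturing: no nuisance fields,
                              # @num = ('12','34','5')
```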
The examples in the previous section display an annoying weakness. We were only matching 3-letter words, or chunks of words of 4 letters or less. We'd like to be able to match words or, more generally, strings of any length, without writing out tedious alternatives like \w\w\w\w|\w\w\w|\w\w|\w.
This is exactly the problem the quantifier metacharacters ?, *, +, and {} were created for. They allow us to delimit the number of repeats for a portion of a regexp we consider to be a match. Quantifiers are put immediately after the character, character class, or grouping that we want to specify. They have the following meanings:
a? means: match 'a' 1 or 0 times
a* means: match 'a' 0 or more times, i.e., any number of times
a+ means: match 'a' 1 or more times, i.e., at least once
a{n,m} means: match at least n times, but not more than m times
a{n,} means: match at least n or more times
a{n} means: match exactly n times
Here are some examples:
- /[a-z]+\s+\d*/; # match a lowercase word, at least one space, and
- # any number of digits
- /(\w+)\s+\g1/; # match doubled words of arbitrary length
- /y(es)?/i; # matches 'y', 'Y', or a case-insensitive 'yes'
- $year =~ /^\d{2,4}$/; # make sure year is at least 2 but not more
- # than 4 digits
- $year =~ /^\d{4}$|^\d{2}$/; # better match; throw out 3-digit dates
- $year =~ /^\d{2}(\d{2})?$/; # same thing written differently. However,
- # this captures the last two digits in $1
- # and the other does not.
- % simple_grep '^(\w+)\g1$' /usr/dict/words # isn't this easier?
- beriberi
- booboo
- coco
- mama
- murmur
- papa
For all of these quantifiers, Perl will try to match as much of the string as possible, while still allowing the regexp to succeed. Thus with /a?.../, Perl will first try to match the regexp with the a present; if that fails, Perl will try to match the regexp without the a present. For the quantifier *, we get the following:
- $x = "the cat in the hat";
- $x =~ /^(.*)(cat)(.*)$/; # matches,
- # $1 = 'the '
- # $2 = 'cat'
- # $3 = ' in the hat'
Which is what we might expect, the match finds the only cat in the string and locks onto it. Consider, however, this regexp:
- $x =~ /^(.*)(at)(.*)$/; # matches,
- # $1 = 'the cat in the h'
- # $2 = 'at'
- # $3 = '' (0 characters match)
One might initially guess that Perl would find the at in cat and stop there, but that wouldn't give the longest possible string to the first quantifier .*. Instead, the first quantifier .* grabs as much of the string as possible while still having the regexp match. In this example, that means having the at sequence with the final at in the string. The other important principle illustrated here is that, when there are two or more elements in a regexp, the leftmost quantifier, if there is one, gets to grab as much of the string as possible, leaving the rest of the regexp to fight over scraps. Thus in our example, the first quantifier .* grabs most of the string, while the second quantifier .* gets the empty string. Quantifiers that grab as much of the string as possible are called maximal match or greedy quantifiers.
When a regexp can match a string in several different ways, we can use the principles above to predict which way the regexp will match:
Principle 0: Taken as a whole, any regexp will be matched at the earliest possible position in the string.
Principle 1: In an alternation a|b|c..., the leftmost alternative that allows a match for the whole regexp will be the one used.
Principle 2: The maximal matching quantifiers ?, *, + and {n,m} will in general match as much of the string as possible while still allowing the whole regexp to match.
Principle 3: If there are two or more elements in a regexp, the leftmost greedy quantifier, if any, will match as much of the string as possible while still allowing the whole regexp to match. The next leftmost greedy quantifier, if any, will try to match as much of the string remaining available to it as possible, while still allowing the whole regexp to match. And so on, until all the regexp elements are satisfied.
As we have seen above, Principle 0 overrides the others. The regexp will be matched as early as possible, with the other principles determining how the regexp matches at that earliest character position.
Here is an example of these principles in action:
- $x = "The programming republic of Perl";
- $x =~ /^(.+)(e|r)(.*)$/; # matches,
- # $1 = 'The programming republic of Pe'
- # $2 = 'r'
- # $3 = 'l'
This regexp matches at the earliest string position, 'T'. One might think that e, being leftmost in the alternation, would be matched, but r produces the longest string in the first quantifier.
- $x =~ /(m{1,2})(.*)$/; # matches,
- # $1 = 'mm'
- # $2 = 'ing republic of Perl'
Here, the earliest possible match is at the first 'm' in programming. m{1,2} is the first quantifier, so it gets to match a maximal mm.
- $x =~ /.*(m{1,2})(.*)$/; # matches,
- # $1 = 'm'
- # $2 = 'ing republic of Perl'
Here, the regexp matches at the start of the string. The first quantifier .* grabs as much as possible, leaving just a single 'm' for the second quantifier m{1,2}.
- $x =~ /(.?)(m{1,2})(.*)$/; # matches,
- # $1 = 'a'
- # $2 = 'mm'
- # $3 = 'ing republic of Perl'
Here, .? eats its maximal one character at the earliest possible position in the string, 'a' in programming, leaving m{1,2} the opportunity to match both m's. Finally,
- "aXXXb" =~ /(X*)/; # matches with $1 = ''
because it can match zero copies of 'X' at the beginning of the string. If you definitely want to match at least one 'X', use X+, not X*.
Sometimes greed is not good. At times, we would like quantifiers to match a minimal piece of string, rather than a maximal piece. For this purpose, Larry Wall created the minimal match or non-greedy quantifiers ??, *?, +?, and {}?. These are the usual quantifiers with a ? appended to them. They have the following meanings:
a?? means: match 'a' 0 or 1 times. Try 0 first, then 1.
a*? means: match 'a' 0 or more times, i.e., any number of times, but as few times as possible
a+? means: match 'a' 1 or more times, i.e., at least once, but as few times as possible
a{n,m}? means: match at least n times, not more than m times, as few times as possible
a{n,}? means: match at least n times, but as few times as possible
a{n}? means: match exactly n times. Because we match exactly n times, a{n}? is equivalent to a{n} and is just there for notational consistency.
Let's look at the example above, but with minimal quantifiers:
- $x = "The programming republic of Perl";
- $x =~ /^(.+?)(e|r)(.*)$/; # matches,
- # $1 = 'Th'
- # $2 = 'e'
- # $3 = ' programming republic of Perl'
The minimal string that will allow both the start of the string ^ and the alternation to match is Th, with the alternation e|r matching e. The second quantifier .* is free to gobble up the rest of the string.
- $x =~ /(m{1,2}?)(.*?)$/; # matches,
- # $1 = 'm'
- # $2 = 'ming republic of Perl'
The first string position that this regexp can match is at the first 'm' in programming. At this position, the minimal m{1,2}? matches just one 'm'. Although the second quantifier .*? would prefer to match no characters, it is constrained by the end-of-string anchor $ to match the rest of the string.
- $x =~ /(.*?)(m{1,2}?)(.*)$/; # matches,
- # $1 = 'The progra'
- # $2 = 'm'
- # $3 = 'ming republic of Perl'
In this regexp, you might expect the first minimal quantifier .*?
to match the empty string, because it is not constrained by a ^
anchor to match the beginning of the word. Principle 0 applies here,
however. Because it is possible for the whole regexp to match at the
start of the string, it will match at the start of the string. Thus
the first quantifier has to match everything up to the first m. The
second minimal quantifier matches just one m and the third
quantifier matches the rest of the string.
- $x =~ /(.??)(m{1,2})(.*)$/; # matches,
- # $1 = 'a'
- # $2 = 'mm'
- # $3 = 'ing republic of Perl'
Just as in the previous regexp, the first quantifier .?? can match earliest at position 'a', so it does. The second quantifier is greedy, so it matches mm, and the third matches the rest of the string.
We can modify principle 3 above to take into account non-greedy quantifiers:
Principle 3: If there are two or more elements in a regexp, the leftmost greedy (non-greedy) quantifier, if any, will match as much (little) of the string as possible while still allowing the whole regexp to match. The next leftmost greedy (non-greedy) quantifier, if any, will try to match as much (little) of the string remaining available to it as possible, while still allowing the whole regexp to match. And so on, until all the regexp elements are satisfied.
Just like alternation, quantifiers are also susceptible to backtracking. Here is a step-by-step analysis of the example
- $x = "the cat in the hat";
- $x =~ /^(.*)(at)(.*)$/; # matches,
- # $1 = 'the cat in the h'
- # $2 = 'at'
- # $3 = '' (0 matches)
Start with the first letter in the string 't'.
The first quantifier '.*' starts out by matching the whole string 'the cat in the hat'.
'a' in the regexp element 'at' doesn't match the end of the string. Backtrack one character.
'a' in the regexp element 'at' still doesn't match the last letter of the string 't', so backtrack one more character.
Now we can match the 'a' and the 't'.
Move on to the third element '.*'. Since we are at the end of the string and '.*' can match 0 times, assign it the empty string.
We are done!
Most of the time, all this moving forward and backtracking happens quickly and searching is fast. There are some pathological regexps, however, whose execution time exponentially grows with the size of the string. A typical structure that blows up in your face is of the form
- /(a|b+)*/;
The problem is the nested indeterminate quantifiers. There are many different ways of partitioning a string of length n between the + and *: one repetition with b+ of length n, two repetitions with the first b+ of length k and the second of length n-k, m repetitions whose bits add up to length n, etc. In fact there are an exponential number of ways to partition a string as a function of its length. A regexp may get lucky and match early in the process, but if there is no match, Perl will try every possibility before giving up. So be careful with nested *'s, {n,m}'s, and +'s. The book Mastering Regular Expressions by Jeffrey Friedl gives a wonderful discussion of this and other efficiency issues.
Backtracking during the relentless search for a match may be a waste of time, particularly when the match is bound to fail. Consider the simple pattern
- /^\w+\s+\w+$/; # a word, spaces, a word
Whenever this is applied to a string which doesn't quite meet the pattern's expectations, such as "abc " or "abc def ", the regex engine will backtrack, approximately once for each character in the string. But we know that there is no way around taking all of the initial word characters to match the first repetition, that all spaces must be eaten by the middle part, and the same goes for the second word.
With the introduction of the possessive quantifiers in Perl 5.10, we have a way of instructing the regex engine not to backtrack, with the usual quantifiers with a + appended to them. This makes them greedy as well as stingy; once they succeed they won't give anything back to permit another solution. They have the following meanings:
a{n,m}+ means: match at least n times, not more than m times, as many times as possible, and don't give anything up. a?+ is short for a{0,1}+
a{n,}+ means: match at least n times, but as many times as possible, and don't give anything up. a*+ is short for a{0,}+ and a++ is short for a{1,}+.
a{n}+ means: match exactly n times. It is just there for notational consistency.
These possessive quantifiers represent a special case of a more general concept, the independent subexpression, see below.
As an example where a possessive quantifier is suitable we consider matching a quoted string, as it appears in several programming languages. The backslash is used as an escape character that indicates that the next character is to be taken literally, as another character for the string. Therefore, after the opening quote, we expect a (possibly empty) sequence of alternatives: either some character except an unescaped quote or backslash or an escaped character.
- /"(?:[^"\\]++|\\.)*+"/;
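For instance, the pattern accepts a string that contains escaped quotes. (This is an illustrative check, not from the original text; the test string is made up.)

```perl
# Illustrative use of the possessive quoted-string pattern (Perl 5.10+).
$str = 'say "a \"quoted\" string" here';   # contains literal backslashes
print "matched\n" if $str =~ /"(?:[^"\\]++|\\.)*+"/;
```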
At this point, we have all the basic regexp concepts covered, so let's give a more involved example of a regular expression. We will build a regexp that matches numbers.
The first task in building a regexp is to decide what we want to match and what we want to exclude. In our case, we want to match both integers and floating point numbers and we want to reject any string that isn't a number.
The next task is to break the problem down into smaller problems that are easily converted into a regexp.
The simplest case is integers. These consist of a sequence of digits, with an optional sign in front. The digits we can represent with \d+ and the sign can be matched with [+-]. Thus the integer regexp is
- /[+-]?\d+/; # matches integers
A floating point number potentially has a sign, an integral part, a
decimal point, a fractional part, and an exponent. One or more of these
parts is optional, so we need to check out the different
possibilities. Floating point numbers which are in proper form include
123., 0.345, .34, -1e6, and 25.4E-72. As with integers, the sign out
front is completely optional and can be matched by [+-]?. We can
see that if there is no exponent, floating point numbers must have a
decimal point, otherwise they are integers. We might be tempted to
model these with \d*\.\d*, but this would also match just a single
decimal point, which is not a number. So the three cases of floating
point number without exponent are
- /[+-]?\d+\./; # 1., 321., etc.
- /[+-]?\.\d+/; # .1, .234, etc.
- /[+-]?\d+\.\d+/; # 1.0, 30.56, etc.
These can be combined into a single regexp with a three-way alternation:
- /[+-]?(\d+\.\d+|\d+\.|\.\d+)/; # floating point, no exponent
In this alternation, it is important to put '\d+\.\d+' before '\d+\.'. If '\d+\.' were first, the regexp would happily match that and ignore the fractional part of the number.
Now consider floating point numbers with exponents. The key observation here is that both integers and numbers with decimal points are allowed in front of an exponent. Then exponents, like the overall sign, are independent of whether we are matching numbers with or without decimal points, and can be 'decoupled' from the mantissa. The overall form of the regexp now becomes clear:
- /^(optional sign)(integer | f.p. mantissa)(optional exponent)$/;
The exponent is an e or E, followed by an integer. So the exponent regexp is
- /[eE][+-]?\d+/; # exponent
Putting all the parts together, we get a regexp that matches numbers:
- /^[+-]?(\d+\.\d+|\d+\.|\.\d+|\d+)([eE][+-]?\d+)?$/; # Ta da!
Long regexps like this may impress your friends, but can be hard to
decipher. In complex situations like this, the //x
modifier for a
match is invaluable. It allows one to put nearly arbitrary whitespace
and comments into a regexp without affecting their meaning. Using it,
we can rewrite our 'extended' regexp in the more pleasing form
- /^
- [+-]? # first, match an optional sign
- ( # then match integers or f.p. mantissas:
- \d+\.\d+ # mantissa of the form a.b
- |\d+\. # mantissa of the form a.
- |\.\d+ # mantissa of the form .b
- |\d+ # integer of the form a
- )
- ([eE][+-]?\d+)? # finally, optionally match an exponent
- $/x;
If whitespace is mostly irrelevant, how does one include space characters in an extended regexp? The answer is to backslash it '\ ' or put it in a character class [ ]. The same thing goes for pound signs: use \# or [#]. For instance, Perl allows a space between the sign and the mantissa or integer, and we could add this to our regexp as follows:
- /^
- [+-]?\ * # first, match an optional sign *and space*
- ( # then match integers or f.p. mantissas:
- \d+\.\d+ # mantissa of the form a.b
- |\d+\. # mantissa of the form a.
- |\.\d+ # mantissa of the form .b
- |\d+ # integer of the form a
- )
- ([eE][+-]?\d+)? # finally, optionally match an exponent
- $/x;
In this form, it is easier to see a way to simplify the alternation. Alternatives 1, 2, and 4 all start with \d+, so it could be factored out:
- /^
- [+-]?\ * # first, match an optional sign
- ( # then match integers or f.p. mantissas:
- \d+ # start out with a ...
- (
- \.\d* # mantissa of the form a.b or a.
- )? # ? takes care of integers of the form a
- |\.\d+ # mantissa of the form .b
- )
- ([eE][+-]?\d+)? # finally, optionally match an exponent
- $/x;
or written in the compact form,
- /^[+-]?\ *(\d+(\.\d*)?|\.\d+)([eE][+-]?\d+)?$/;
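As a quick sanity check (an illustrative test, not part of the original text; the candidate strings are made up), we can run a few strings through the final regexp:

```perl
# Illustrative check of the final number-matching regexp.
foreach $s ("25.4E-72", "-1e6", ".34", "12", "1.2.3", "abc") {
    $verdict = ($s =~ /^[+-]?\ *(\d+(\.\d*)?|\.\d+)([eE][+-]?\d+)?$/)
             ? "is a number" : "is not a number";
    print "$s $verdict\n";
}
```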
This is our final regexp. To recap, we built a regexp by
specifying the task in detail,
breaking down the problem into smaller parts,
translating the small parts into regexps,
combining the regexps,
and optimizing the final combined regexp.
These are also the typical steps involved in writing a computer program. This makes perfect sense, because regular expressions are essentially programs written in a little computer language that specifies patterns.
The last topic of Part 1 briefly covers how regexps are used in Perl programs. Where do they fit into Perl syntax?
We have already introduced the matching operator in its default /regexp/ and arbitrary delimiter m!regexp! forms. We have used the binding operator =~ and its negation !~ to test for string matches. Associated with the matching operator, we have discussed the single line //s, multi-line //m, case-insensitive //i and extended //x modifiers. There are a few more things you might want to know about matching operators.
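One such detail is that a pattern may be interpolated from a variable, and the //o modifier tells Perl to compile the interpolated regexp only once. A minimal sketch (the variable name $pattern is illustrative):

```perl
$pattern = 'Seuss';
while (<>) {
    print if /$pattern/o;   # //o: interpolate and compile only once
}
```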
If you change $pattern
after the first substitution happens, Perl
will ignore it. If you don't want any substitutions at all, use the
special delimiter m'':
- @pattern = ('Seuss');
- while (<>) {
- print if m'@pattern'; # matches literal '@pattern', not 'Seuss'
- }
Similar to strings, m'' acts like apostrophes on a regexp; all other
m delimiters act like quotes. If the regexp evaluates to the empty string,
the regexp in the last successful match is used instead. So we have
- "dog" =~ /d/; # 'd' matches
- "dogbert" =~ //; # this matches the 'd' regexp used before
The final two modifiers we will discuss here, //g and //c, concern multiple matches.
The modifier //g stands for global matching and allows the matching operator to match within a string as many times as possible. In scalar context, successive invocations against a string will have //g jump from match to match, keeping track of position in the string as it goes along. You can get or set the position with the pos() function.
The use of //g is shown in the following example. Suppose we have a string that consists of words separated by spaces. If we know how many words there are in advance, we could extract the words using groupings:
- $x = "cat dog house"; # 3 words
- $x =~ /^\s*(\w+)\s+(\w+)\s+(\w+)\s*$/; # matches,
- # $1 = 'cat'
- # $2 = 'dog'
- # $3 = 'house'
But what if we had an indeterminate number of words? This is the sort of task //g was made for. To extract all words, form the simple regexp (\w+) and loop over all matches with /(\w+)/g:
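A short loop does the job; pos() reports where each match ended:

```perl
$x = "cat dog house"; # 3 words
while ($x =~ /(\w+)/g) {
    print "Word is $1, ends at position ", pos $x, "\n";
}
```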
prints
- Word is cat, ends at position 3
- Word is dog, ends at position 7
- Word is house, ends at position 13
A failed match or changing the target string resets the position. If you don't want the position reset after failure to match, add the //c, as in /regexp/gc. The current position in the string is associated with the string, not the regexp. This means that different strings have different positions and their respective positions can be set or read independently.
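A quick sketch of that independence (the strings here are illustrative):

```perl
$s1 = "cat dog";
$s2 = "fish";
$s1 =~ /\w+/g;   # matches 'cat'; pos($s1) is now 3
$s2 =~ /\w+/g;   # matches 'fish'; pos($s2) is now 4
print pos($s1), " ", pos($s2), "\n";   # the two positions are independent
```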
In list context, //g returns a list of matched groupings, or if there are no groupings, a list of matches to the whole regexp. So if we wanted just the words, we could use
- @words = ($x =~ /(\w+)/g); # matches,
- # $words[0] = 'cat'
- # $words[1] = 'dog'
- # $words[2] = 'house'
Closely associated with the //g modifier is the \G anchor. The \G anchor matches at the point where the previous //g match left off. \G allows us to easily do context-sensitive matching:
- $metric = 1; # use metric units
- ...
- $x = <FILE>; # read in measurement
- $x =~ /^([+-]?\d+)\s*/g; # get magnitude
- $weight = $1;
- if ($metric) { # error checking
- print "Units error!" unless $x =~ /\Gkg\./g;
- }
- else {
- print "Units error!" unless $x =~ /\Glbs\./g;
- }
- $x =~ /\G\s+(widget|sprocket)/g; # continue processing
The combination of //g and \G allows us to process the string a bit at a time and use arbitrary Perl logic to decide what to do next. Currently, the \G anchor is only fully supported when used to anchor to the start of the pattern.
\G is also invaluable in processing fixed-length records with regexps. Suppose we have a snippet of coding region DNA, encoded as base pair letters ATCGTTGAAT... and we want to find all the stop codons TGA. In a coding region, codons are 3-letter sequences, so we can think of the DNA snippet as a sequence of 3-letter records. The naive regexp
- # expanded, this is "ATC GTT GAA TGC AAA TGA CAT GAC"
- $dna = "ATCGTTGAATGCAAATGACATGAC";
- $dna =~ /TGA/;
doesn't work; it may match a TGA, but there is no guarantee that the match is aligned with codon boundaries, e.g., the substring GTT GAA gives a match. A better solution is
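One way to write it is a sketch like this, where the non-greedy (?:\w\w\w)*? advances through whole codons before each TGA:

```perl
$dna = "ATCGTTGAATGCAAATGACATGAC";
while ($dna =~ /(?:\w\w\w)*?TGA/g) {   # step through codons, non-greedily
    print "Got a TGA stop codon at position ", pos $dna, "\n";
}
```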
which prints
- Got a TGA stop codon at position 18
- Got a TGA stop codon at position 23
Position 18 is good, but position 23 is bogus. What happened?
The answer is that our regexp works well until we get past the last
real match. Then the regexp will fail to match a synchronized TGA
and start stepping ahead one character position at a time, not what we
want. The solution is to use \G
to anchor the match to the codon
alignment:
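For example, an anchored version along these lines:

```perl
$dna = "ATCGTTGAATGCAAATGACATGAC";
while ($dna =~ /\G(?:\w\w\w)*?TGA/g) {   # \G keeps each try codon-aligned
    print "Got a TGA stop codon at position ", pos $dna, "\n";
}
```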
This prints
- Got a TGA stop codon at position 18
which is the correct answer. This example illustrates that it is important not only to match what is desired, but to reject what is not desired.
(There are other regexp modifiers that are available, such as //o, but their specialized uses are beyond the scope of this introduction.)
Regular expressions also play a big role in search and replace
operations in Perl. Search and replace is accomplished with the
s/// operator. The general form is
s/regexp/replacement/modifiers, with everything we know about
regexps and modifiers applying in this case as well. The replacement is a Perl double-quoted string that replaces in the string whatever is matched with the regexp. The operator =~ is also used here to associate a string with s///. If matching against $_, the $_ =~ can be dropped. If there is a match, s/// returns the number of substitutions made; otherwise it returns false. Here are a few examples:
- $x = "Time to feed the cat!";
- $x =~ s/cat/hacker/; # $x contains "Time to feed the hacker!"
- if ($x =~ s/^(Time.*hacker)!$/$1 now!/) {
- $more_insistent = 1;
- }
- $y = "'quoted words'";
- $y =~ s/^'(.*)'$/$1/; # strip single quotes,
- # $y contains "quoted words"
In the last example, the whole string was matched, but only the part inside the single quotes was grouped. With the s/// operator, the matched variables $1, $2, etc. are immediately available for use in the replacement expression, so we use $1 to replace the quoted string with just what was quoted. With the global modifier, s///g will search and replace all occurrences of the regexp in the string:
- $x = "I batted 4 for 4";
- $x =~ s/4/four/; # doesn't do it all:
- # $x contains "I batted four for 4"
- $x = "I batted 4 for 4";
- $x =~ s/4/four/g; # does it all:
- # $x contains "I batted four for four"
If you prefer 'regex' over 'regexp' in this tutorial, you could use the following program to replace it:
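A minimal simple_replace program along these lines would do it (a sketch consistent with the description below):

```perl
#!/usr/bin/perl
# simple_replace - replace <regexp> with <replacement> on each input line
# usage: simple_replace <regexp> <replacement> file1 file2 ...
$regexp = shift;
$replacement = shift;
while (<>) {
    s/$regexp/$replacement/g;   # substitute on $_ implicitly
    print;                      # print $_ implicitly
}
```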
In simple_replace we used the s///g modifier to replace all occurrences of the regexp on each line. (Even though the regular expression appears in a loop, Perl is smart enough to compile it only once.) As with simple_grep, both the print and the s/$regexp/$replacement/g use $_ implicitly.
If you don't want s/// to change your original variable you can use
the non-destructive substitute modifier, s///r. This changes the
behavior so that s///r returns the final substituted string
(instead of the number of substitutions):
- $x = "I like dogs.";
- $y = $x =~ s/dogs/cats/r;
- print "$x $y\n";
That example will print "I like dogs. I like cats". Notice the original $x variable has not been affected. The overall result of the substitution is instead stored in $y. If the substitution doesn't affect anything then the original string is returned:
- $x = "I like dogs.";
- $y = $x =~ s/elephants/cougars/r;
- print "$x $y\n"; # prints "I like dogs. I like dogs."
One other interesting thing that the s///r flag allows is chaining
substitutions:
- $x = "Cats are great.";
- print $x =~ s/Cats/Dogs/r =~ s/Dogs/Frogs/r =~ s/Frogs/Hedgehogs/r, "\n";
- # prints "Hedgehogs are great."
A modifier available specifically to search and replace is the
s///e evaluation modifier. s///e treats the
replacement text as Perl code, rather than a double-quoted
string. The value that the code returns is substituted for the
matched substring. s///e is useful if you need to do a bit of
computation in the process of replacing text. This example counts
character frequencies in a line:
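A sketch of such a program (among characters with equal counts, the output order may vary from run to run):

```perl
$x = "Bill the cat";
$x =~ s/(.)/$chars{$1}++;$1/eg;   # final $1 replaces each char with itself
print "frequency of '$_' is $chars{$_}\n"
    foreach (sort { $chars{$b} <=> $chars{$a} } keys %chars);
```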
This prints
- frequency of ' ' is 2
- frequency of 't' is 2
- frequency of 'l' is 2
- frequency of 'B' is 1
- frequency of 'c' is 1
- frequency of 'e' is 1
- frequency of 'h' is 1
- frequency of 'i' is 1
- frequency of 'a' is 1
As with the match m// operator, s/// can use other delimiters,
such as s!!! and s{}{}, and even s{}//. If single quotes are
used s''', then the regexp and replacement are
treated as single-quoted strings and there are no
variable substitutions. s/// in list context
returns the same thing as in scalar context, i.e., the number of
matches.
The split() function is another place where a regexp is used. split /regexp/, string, limit separates the string operand into a list of substrings and returns that list. The regexp must be designed to match whatever constitutes the separators for the desired substrings. The limit, if present, constrains splitting into no more than limit number of strings. For example, to split a string into words, use
- $x = "Calvin and Hobbes";
- @words = split /\s+/, $x; # $words[0] = 'Calvin'
- # $words[1] = 'and'
- # $words[2] = 'Hobbes'
If the empty regexp // is used, the regexp always matches and the string is split into individual characters. If the regexp has groupings, then the resulting list contains the matched substrings from the groupings as well. For instance,
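A sketch of what such a split looks like:

```perl
$x = "/usr/bin/perl";
@dirs  = split m!/!,   $x;   # ('', 'usr', 'bin', 'perl')
@parts = split m!(/)!, $x;   # ('', '/', 'usr', '/', 'bin', '/', 'perl')
```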
Since the first character of $x matched the regexp, split prepended
an empty initial element to the list.
If you have read this far, congratulations! You now have all the basic tools needed to use regular expressions to solve a wide range of text processing problems. If this is your first time through the tutorial, why not stop here and play around with regexps a while.... Part 2 concerns the more esoteric aspects of regular expressions and those concepts certainly aren't needed right at the start.
OK, you know the basics of regexps and you want to know more. If matching regular expressions is analogous to a walk in the woods, then the tools discussed in Part 1 are analogous to topo maps and a compass, basic tools we use all the time. Most of the tools in part 2 are analogous to flare guns and satellite phones. They aren't used too often on a hike, but when we are stuck, they can be invaluable.
What follows are the more advanced, less used, or sometimes esoteric capabilities of Perl regexps. In Part 2, we will assume you are comfortable with the basics and concentrate on the advanced features.
There are a number of escape sequences and character classes that we haven't covered yet.
There are several escape sequences that convert characters or strings between upper and lower case, and they are also available within patterns. \l and \u convert the next character to lower or upper case, respectively:
- $x = "perl";
- $string =~ /\u$x/; # matches 'Perl' in $string
- $x = "M(rs?|s)\\."; # note the double backslash
- $string =~ /\l$x/; # matches 'mr.', 'mrs.', and 'ms.',
A \L or \U indicates a lasting conversion of case, until terminated by \E or thrown over by another \U or \L:
- $x = "This word is in lower case:\L SHOUT\E";
- $x =~ /shout/; # matches
- $x = "I STILL KEYPUNCH CARDS FOR MY 360";
- $x =~ /\Ukeypunch/; # matches punch card string
If there is no \E, case is converted until the end of the string. The regexps \L\u$word or \u\L$word convert the first character of $word to uppercase and the rest of the characters to lowercase.
Control characters can be escaped with \c, so that a control-Z character would be matched with \cZ. The escape sequence \Q...\E quotes, or protects, most non-alphabetic characters. For instance,
- $x = "\QThat !^*&%~& cat!";
- $x =~ /\Q!^*&%~&\E/; # check for rough language
It does not protect $ or @, so that variables can still be substituted.
\Q, \L, \l, \U, \u and \E are actually part of double-quotish syntax, and not part of regexp syntax proper. They will work if they appear in a regular expression embedded directly in a program, but not when contained in a string that is interpolated in a pattern.
Perl regexps can handle more than just the standard ASCII character set. Perl supports Unicode, a standard for representing the alphabets from virtually all of the world's written languages, and a host of symbols. Perl's text strings are Unicode strings, so they can contain characters with a value (codepoint or character number) higher than 255.
What does this mean for regexps? Well, regexp users don't need to know much about Perl's internal representation of strings. But they do need to know 1) how to represent Unicode characters in a regexp and 2) that a matching operation will treat the string to be searched as a sequence of characters, not bytes. The answer to 1) is that Unicode characters greater than chr(255) are represented using the \x{hex} notation, because \x hex (without curly braces) doesn't go further than 255. (Starting in Perl 5.14, if you're an octal fan, you can also use \o{oct}.)
- /\x{263a}/; # match a Unicode smiley face :)
NOTE: In Perl 5.6.0 it used to be that one needed to say use utf8 to use any Unicode features. This is no longer the case: for almost all Unicode processing, the explicit utf8 pragma is not needed. (The only case where it matters is if your Perl script is in Unicode and encoded in UTF-8, then an explicit use utf8 is needed.)
Figuring out the hexadecimal sequence of a Unicode character you want
or deciphering someone else's hexadecimal Unicode regexp is about as
much fun as programming in machine code. So another way to specify
Unicode characters is to use the named character escape
sequence \N{name}. name is a name for the Unicode character, as
specified in the Unicode standard. For instance, if we wanted to
represent or match the astrological sign for the planet Mercury, we
could use
- $x = "abc\N{MERCURY}def";
- $x =~ /\N{MERCURY}/; # matches
One can also use "short" names:
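For instance, a sketch using the charnames pragma's ':short' option:

```perl
use charnames ':short';
$x = "abc\N{greek:Sigma}def";   # \N{greek:Sigma} is an uppercase sigma
$x =~ /\N{greek:Sigma}/;        # matches
```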
You can also restrict names to a certain alphabet by specifying the charnames pragma:
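For example, restricting names to the Greek alphabet (a sketch):

```perl
use charnames qw(greek);
$x = "abc\N{sigma}def";   # \N{sigma} is a lowercase sigma
$x =~ /\N{sigma}/;        # matches
```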
An index of character names is available on-line from the Unicode Consortium, http://www.unicode.org/charts/charindex.html; explanatory material with links to other resources at http://www.unicode.org/standard/where.
The answer to requirement 2) is that a regexp (mostly)
uses Unicode characters. The "mostly" is for messy backward
compatibility reasons, but starting in Perl 5.14, any regex compiled in
the scope of a use feature 'unicode_strings'
(which is automatically
turned on within the scope of a use 5.012
or higher) will turn that
"mostly" into "always". If you want to handle Unicode properly, you
should ensure that 'unicode_strings'
is turned on.
Internally, this is encoded to bytes using either UTF-8 or a native 8
bit encoding, depending on the history of the string, but conceptually
it is a sequence of characters, not bytes. See perlunitut for a
tutorial about that.
Let us now discuss Unicode character classes. Just as with Unicode characters, there are named Unicode character classes represented by the \p{name} escape sequence. Closely associated is the \P{name} character class, which is the negation of the \p{name} class. For example, to match lower and uppercase characters,
- $x = "BOB";
- $x =~ /^\p{IsUpper}/; # matches, uppercase char class
- $x =~ /^\P{IsUpper}/; # doesn't match, char class sans uppercase
- $x =~ /^\p{IsLower}/; # doesn't match, lowercase char class
- $x =~ /^\P{IsLower}/; # matches, char class sans lowercase
(The "Is" is optional.)
Here is the association between some Perl named classes and the traditional Unicode classes:
- Perl class name Unicode class name or regular expression
- IsAlpha /^[LM]/
- IsAlnum /^[LMN]/
- IsASCII $code <= 127
- IsCntrl /^C/
- IsBlank $code =~ /^(0020|0009)$/ || /^Z[^lp]/
- IsDigit Nd
- IsGraph /^([LMNPS]|Co)/
- IsLower Ll
- IsPrint /^([LMNPS]|Co|Zs)/
- IsPunct /^P/
- IsSpace /^Z/ || ($code =~ /^(0009|000A|000B|000C|000D)$/)
- IsSpacePerl /^Z/ || ($code =~ /^(0009|000A|000C|000D|0085|2028|2029)$/)
- IsUpper /^L[ut]/
- IsWord /^[LMN]/ || $code eq "005F"
- IsXDigit $code =~ /^00(3[0-9]|[46][1-6])$/
You can also use the official Unicode class names with \p and \P, like \p{L} for Unicode 'letters', \p{Lu} for uppercase letters, or \P{Nd} for non-digits. If a name is just one letter, the braces can be dropped. For instance, \pM is the character class of Unicode 'marks', for example accent marks. For the full list see perlunicode.
Unicode has also been separated into various sets of characters which you can test with \p{...} (in) and \P{...} (not in). To test whether a character is (or is not) an element of a script you would use the script name, for example \p{Latin}, \p{Greek}, or \P{Katakana}.
What we have described so far is the single form of the \p{...} character classes. There is also a compound form which you may run into. These look like \p{name=value} or \p{name:value} (the equals sign and colon can be used interchangeably). These are more general than the single form, and in fact most of the single forms are just Perl-defined shortcuts for common compound forms. For example, the script examples in the previous paragraph could be written equivalently as \p{Script=Latin}, \p{Script:Greek}, and \P{script=katakana} (case is irrelevant between the {} braces). You may never have to use the compound forms, but sometimes it is necessary, and their use can make your code easier to understand.
\X is an abbreviation for a character class that comprises a Unicode extended grapheme cluster. This represents a "logical character": what appears to be a single character, but may be represented internally by more than one. As an example, using the Unicode full names, e.g., A + COMBINING RING is a grapheme cluster with base character A and combining character COMBINING RING, which translates in Danish to A with the circle atop it, as in the word Angstrom.
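A small sketch of \X in action (the combining character here is COMBINING RING ABOVE, U+030A):

```perl
$str = "A\x{30A}";   # 'A' followed by COMBINING RING ABOVE: two codepoints
print "one grapheme\n" if $str =~ /^\X$/;   # \X matches the whole cluster
```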
For the full and latest information about Unicode see the latest Unicode standard, or the Unicode Consortium's website http://www.unicode.org
As if all those classes weren't enough, Perl also defines POSIX-style character classes. These have the form [:name:], with name the name of the POSIX class. The POSIX classes are alpha, alnum, ascii, cntrl, digit, graph, lower, print, punct, space, upper, and xdigit, and two extensions, word (a Perl extension to match \w), and blank (a GNU extension). The //a modifier restricts these to matching just in the ASCII range; otherwise they can match the same as their corresponding Perl Unicode classes: [:upper:] is the same as \p{IsUpper}, etc. (There are some exceptions and gotchas with this; see perlrecharclass for a full discussion.) The [:digit:], [:word:], and [:space:] correspond to the familiar \d, \w, and \s character classes. To negate a POSIX class, put a ^ in front of the name, so that, e.g., [:^digit:] corresponds to \D and, under Unicode, \P{IsDigit}. The Unicode and POSIX character classes can be used just like \d, with the exception that POSIX character classes can only be used inside of a character class:
- /\s+[abc[:digit:]xyz]\s*/; # match a,b,c,x,y,z, or a digit
- /^=item\s[[:digit:]]/; # match '=item',
- # followed by a space and a digit
- /\s+[abc\p{IsDigit}xyz]\s+/; # match a,b,c,x,y,z, or a digit
- /^=item\s\p{IsDigit}/; # match '=item',
- # followed by a space and a digit
Whew! That is all the rest of the characters and character classes.
In Part 1 we mentioned that Perl compiles a regexp into a compact sequence of opcodes. Thus, a compiled regexp is a data structure that can be stored once and used again and again. The regexp quote qr// does exactly that: qr/string/ compiles the string as a regexp and transforms the result into a form that can be assigned to a variable:
- $reg = qr/foo+bar?/; # reg contains a compiled regexp
Then $reg can be used as a regexp:
- $x = "fooooba";
- $x =~ $reg; # matches, just like /foo+bar?/
- $x =~ /$reg/; # same thing, alternate form
$reg can also be interpolated into a larger regexp:
- $x =~ /(abc)?$reg/; # still matches
As with the matching operator, the regexp quote can use different
delimiters, e.g., qr!!, qr{} or qr~~. Apostrophes
as delimiters (qr'') inhibit any interpolation.
Pre-compiled regexps are useful for creating dynamic matches that don't need to be recompiled each time they are encountered. Using pre-compiled regexps, we write a grep_step program which greps for a sequence of patterns, advancing to the next pattern as soon as one has been satisfied.
- % cat > grep_step
- #!/usr/bin/perl
- # grep_step - match <number> regexps, one after the other
- # usage: grep_step <number> regexp1 regexp2 ... file1 file2 ...
- $number = shift;
- $regexp[$_] = shift foreach (0..$number-1);
- @compiled = map qr/$_/, @regexp;
- while ($line = <>) {
- if ($line =~ /$compiled[0]/) {
- print $line;
- shift @compiled;
- last unless @compiled;
- }
- }
- ^D
- % grep_step 3 shift print last grep_step
- $number = shift;
- print $line;
- last unless @compiled;
Storing pre-compiled regexps in an array @compiled
allows us to
simply loop through the regexps without any recompilation, thus gaining
flexibility without sacrificing speed.
Backtracking is more efficient than repeated tries with different regular
expressions. If there are several regular expressions and a match with
any of them is acceptable, then it is possible to combine them into a set
of alternatives. If the individual expressions are input data, this
can be done by programming a join operation. We'll exploit this idea in
an improved version of the simple_grep
program: a program that matches
multiple patterns:
- % cat > multi_grep
- #!/usr/bin/perl
- # multi_grep - match any of <number> regexps
- # usage: multi_grep <number> regexp1 regexp2 ... file1 file2 ...
- $number = shift;
- $regexp[$_] = shift foreach (0..$number-1);
- $pattern = join '|', @regexp;
- while ($line = <>) {
- print $line if $line =~ /$pattern/;
- }
- ^D
- % multi_grep 2 shift for multi_grep
- $number = shift;
- $regexp[$_] = shift foreach (0..$number-1);
Sometimes it is advantageous to construct a pattern from the input that is to be analyzed and use the permissible values on the left hand side of the matching operations. As an example for this somewhat paradoxical situation, let's assume that our input contains a command verb which should match one out of a set of available command verbs, with the additional twist that commands may be abbreviated as long as the given string is unique. The program below demonstrates the basic algorithm.
- % cat > keymatch
- #!/usr/bin/perl
- $kwds = 'copy compare list print';
- while( $command = <> ){
- $command =~ s/^\s+|\s+$//g; # trim leading and trailing spaces
- if( ( @matches = $kwds =~ /\b($command\w*)/g ) == 1 ){
- print "command: '@matches'\n";
- } elsif( @matches == 0 ){
- print "no such command: '$command'\n";
- } else {
- print "not unique: '$command' (could be one of: @matches)\n";
- }
- }
- ^D
- % keymatch
- li
- command: 'list'
- co
- not unique: 'co' (could be one of: copy compare)
- printer
- no such command: 'printer'
Rather than trying to match the input against the keywords, we match the
combined set of keywords against the input. The pattern matching
operation $kwds =~ /\b($command\w*)/g
does several things at the
same time. It makes sure that the given command begins where a keyword
begins (\b
). It tolerates abbreviations due to the added \w*
. It
tells us the number of matches (scalar @matches
) and all the keywords
that were actually matched. You could hardly ask for more.
Starting with this section, we will be discussing Perl's set of
extended patterns. These are extensions to the traditional regular
expression syntax that provide powerful new tools for pattern
matching. We have already seen extensions in the form of the minimal
matching constructs ??
, *?
, +?, {n,m}?, and {n,}?. Most
of the extensions below have the form (?char...), where the
char
is a character that determines the type of extension.
The first extension is an embedded comment (?#text). This embeds a
comment into the regular expression without affecting its meaning. The
comment should not have any closing parentheses in the text. An
example is
- /(?# Match an integer:)[+-]?\d+/;
This style of commenting has been largely superseded by the raw,
freeform commenting that is allowed with the //x
modifier.
Most modifiers, such as //i
, //m
, //s
and //x
(or any
combination thereof) can also be embedded in
a regexp using (?i), (?m), (?s), and (?x). For instance,
- /(?i)yes/; # match 'yes' case insensitively
- /yes/i; # same thing
- /(?x)( # freeform version of an integer regexp
- [+-]? # match an optional sign
- \d+ # match a sequence of digits
- )
- /x;
Embedded modifiers can have two important advantages over the usual modifiers. Embedded modifiers allow a custom set of modifiers for each regexp pattern. This is great for matching an array of regexps that must have different modifiers:
- $pattern[0] = '(?i)doctor';
- $pattern[1] = 'Johnson';
- ...
- while (<>) {
- foreach $patt (@pattern) {
- print if /$patt/;
- }
- }
The second advantage is that embedded modifiers (except //p
, which
modifies the entire regexp) only affect the regexp
inside the group the embedded modifier is contained in. So grouping
can be used to localize the modifier's effects:
- /Answer: ((?i)yes)/; # matches 'Answer: yes', 'Answer: YES', etc.
Embedded modifiers can also turn off any modifiers already present
by using, e.g., (?-i). Modifiers can also be combined into
a single expression, e.g., (?s-i) turns on single line mode and
turns off case insensitivity.
Embedded modifiers may also be added to a non-capturing grouping.
(?i-m:regexp) is a non-capturing grouping that matches regexp
case insensitively and turns off multi-line mode.
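For instance (a small sketch), a modifier can be confined to just one part of the pattern, leaving the rest case-sensitive:

```perl
# (?i:...) makes only the grouped word case-insensitive; the literal
# 'Answer:' part of the pattern remains case-sensitive
print "matched\n" if "Answer: YES" =~ /Answer: (?i:yes)/;  # prints 'matched'
print "matched\n" if "ANSWER: yes" =~ /Answer: (?i:yes)/;  # no match: 'ANSWER' fails
```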
This section concerns the lookahead and lookbehind assertions. First, a little background.
In Perl regular expressions, most regexp elements 'eat up' a certain
amount of string when they match. For instance, the regexp element
[abc] eats up one character of the string when it matches, in the
sense that Perl moves to the next character position in the string
after the match. There are some elements, however, that don't eat up
characters (advance the character position) if they match. The examples
we have seen so far are the anchors. The anchor ^ matches the
beginning of the line, but doesn't eat any characters. Similarly, the
word boundary anchor \b
matches wherever a character matching \w
is next to a character that doesn't, but it doesn't eat up any
characters itself. Anchors are examples of zero-width assertions:
zero-width, because they consume
no characters, and assertions, because they test some property of the
string. In the context of our walk in the woods analogy to regexp
matching, most regexp elements move us along a trail, but anchors have
us stop a moment and check our surroundings. If the local environment
checks out, we can proceed forward. But if the local environment
doesn't satisfy us, we must backtrack.
Checking the environment entails either looking ahead on the trail,
looking behind, or both. ^ looks behind, to see that there are no
characters before. $
looks ahead, to see that there are no
characters after. \b
looks both ahead and behind, to see if the
characters on either side differ in their "word-ness".
The lookahead and lookbehind assertions are generalizations of the
anchor concept. Lookahead and lookbehind are zero-width assertions
that let us specify which characters we want to test for. The
lookahead assertion is denoted by (?=regexp) and the lookbehind
assertion is denoted by (?<=fixed-regexp). Some examples are
- $x = "I catch the housecat 'Tom-cat' with catnip";
- $x =~ /cat(?=\s)/; # matches 'cat' in 'housecat'
- @catwords = ($x =~ /(?<=\s)cat\w+/g); # matches,
- # $catwords[0] = 'catch'
- # $catwords[1] = 'catnip'
- $x =~ /\bcat\b/; # matches 'cat' in 'Tom-cat'
- $x =~ /(?<=\s)cat(?=\s)/; # doesn't match; no isolated 'cat' in
- # middle of $x
Note that the parentheses in (?=regexp) and (?<=regexp) are
non-capturing, since these are zero-width assertions. Thus in the
second regexp, the substrings captured are those of the whole regexp
itself. Lookahead (?=regexp) can match arbitrary regexps, but
lookbehind (?<=fixed-regexp) only works for regexps of fixed
width, i.e., a fixed number of characters long. Thus
(?<=(ab|bc)) is fine, but (?<=(ab)*) is not. The
negated versions of the lookahead and lookbehind assertions are
denoted by (?!regexp) and (?<!fixed-regexp) respectively.
They evaluate true if the regexps do not match:
- $x = "foobar";
- $x =~ /foo(?!bar)/; # doesn't match, 'bar' follows 'foo'
- $x =~ /foo(?!baz)/; # matches, 'baz' doesn't follow 'foo'
- $x =~ /(?<!\s)foo/; # matches, there is no \s before 'foo'
The \C
is unsupported in lookbehind, because the already
treacherous definition of \C
would become even more so
when going backwards.
Here is an example where a string containing blank-separated words,
numbers and single dashes is to be split into its components.
Using /\s+/
alone won't work, because spaces are not required between
dashes, or a word or a dash. Additional places for a split are established
by looking ahead and behind:
- $str = "one two - --6-8";
- @toks = split / \s+ # a run of spaces
- | (?<=\S) (?=-) # any non-space followed by '-'
- | (?<=-) (?=\S) # a '-' followed by any non-space
- /x, $str; # @toks = qw(one two - - - 6 - 8)
Independent subexpressions are regular expressions, in the
context of a larger regular expression, that function independently of
the larger regular expression. That is, they consume as much or as
little of the string as they wish without regard for the ability of
the larger regexp to match. Independent subexpressions are represented
by (?>regexp). We can illustrate their behavior by first
considering an ordinary regexp:
- $x = "ab";
- $x =~ /a*ab/; # matches
This obviously matches, but in the process of matching, the
subexpression a*
first grabbed the a
. Doing so, however,
wouldn't allow the whole regexp to match, so after backtracking, a*
eventually gave back the a
and matched the empty string. Here, what
a*
matched was dependent on what the rest of the regexp matched.
Contrast that with an independent subexpression:
- $x =~ /(?>a*)ab/; # doesn't match!
The independent subexpression (?>a*) doesn't care about the rest
of the regexp, so it sees an a
and grabs it. Then the rest of the
regexp ab
cannot match. Because (?>a*) is independent, there
is no backtracking and the independent subexpression does not give
up its a
. Thus the match of the regexp as a whole fails. A similar
behavior occurs with completely independent regexps:
- $x = "ab";
- $x =~ /a*/g; # matches, eats an 'a'
- $x =~ /\Gab/g; # doesn't match, no 'a' available
Here //g
and \G
create a 'tag team' handoff of the string from
one regexp to the other. Regexps with an independent subexpression are
much like this, with a handoff of the string to the independent
subexpression, and a handoff of the string back to the enclosing
regexp.
The ability of an independent subexpression to prevent backtracking can be quite useful. Suppose we want to match a non-empty string enclosed in parentheses up to two levels deep. Then the following regexp matches:
- $x = "abc(de(fg)h"; # unbalanced parentheses
- $x =~ /\( ( [^()]+ | \([^()]*\) )+ \)/x;
The regexp matches an open parenthesis, one or more copies of an
alternation, and a close parenthesis. The alternation is two-way, with
the first alternative [^()]+ matching a substring with no
parentheses and the second alternative \([^()]*\) matching a
substring delimited by parentheses. The problem with this regexp is
that it is pathological: it has nested indeterminate quantifiers
of the form (a+|b)+. We discussed in Part 1 how nested quantifiers
like this could take an exponentially long time to execute if there
was no match possible. To prevent the exponential blowup, we need to
prevent useless backtracking at some point. This can be done by
enclosing the inner quantifier as an independent subexpression:
- $x =~ /\( ( (?>[^()]+) | \([^()]*\) )+ \)/x;
Here, (?>[^()]+) breaks the degeneracy of string partitioning
by gobbling up as much of the string as possible and keeping it. Then
match failures fail much more quickly.
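As an aside, perls 5.10 and later also provide possessive quantifiers, such as a++, which behave like the corresponding independent subexpression (?>a+). A minimal sketch of the equivalence:

```perl
# Once a possessive or independent quantifier has grabbed characters,
# it never gives them back to the rest of the pattern
my $x = "aaab";
print "greedy matched\n"      if $x =~ /a+ab/;      # matches: a+ backtracks
print "independent matched\n" if $x =~ /(?>a+)ab/;  # no match: nothing given back
print "possessive matched\n"  if $x =~ /a++ab/;     # no match: same as (?>a+)
```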
A conditional expression is a form of if-then-else statement
that allows one to choose which patterns are to be matched, based on
some condition. There are two types of conditional expression:
(?(condition)yes-regexp) and
(?(condition)yes-regexp|no-regexp). (?(condition)yes-regexp) is
like an 'if () {}'
statement in Perl. If the condition
is true,
the yes-regexp
will be matched. If the condition
is false, the
yes-regexp
will be skipped and Perl will move onto the next regexp
element. The second form is like an 'if () {} else {}'
statement
in Perl. If the condition
is true, the yes-regexp
will be
matched, otherwise the no-regexp will be matched.
The condition
can have several forms. The first form is simply an
integer in parentheses (integer)
. It is true if the corresponding
backreference \integer
matched earlier in the regexp. The same
thing can be done with a name associated with a capture group, written
as (<name>)
or ('name')
. The second form is a bare
zero-width assertion (?...), either a lookahead, a lookbehind, or a
code assertion (discussed in the next section). The third set of forms
provides tests that return true if the expression is executed within
a recursion ((R)
) or is being called from some capturing group,
referenced either by number ((R1)
, (R2)
,...) or by name
((R&name)
).
The integer or name form of the condition
allows us to choose,
with more flexibility, what to match based on what matched earlier in the
regexp. This searches for words of the form "$x$x"
or "$x$y$y$x"
:
- % simple_grep '^(\w+)(\w+)?(?(2)\g2\g1|\g1)$' /usr/dict/words
- beriberi
- coco
- couscous
- deed
- ...
- toot
- toto
- tutu
The lookbehind condition
allows, along with backreferences,
an earlier part of the match to influence a later part of the
match. For instance,
- /[ATGC]+(?(?<=AA)G|C)$/;
matches a DNA sequence such that it either ends in AAG
, or some
other base pair combination and C
. Note that the form is
(?(?<=AA)G|C) and not (?((?<=AA))G|C); for the
lookahead, lookbehind or code assertions, the parentheses around the
conditional are not needed.
Some regular expressions use identical subpatterns in several places.
Starting with Perl 5.10, it is possible to define named subpatterns in
a section of the pattern so that they can be called up by name
anywhere in the pattern. This syntactic pattern for this definition
group is (?(DEFINE)(?<name>pattern)...). An insertion
of a named pattern is written as (?&name).
The example below illustrates this feature using the pattern for floating point numbers that was presented earlier on. The three subpatterns that are used more than once are the optional sign, the digit sequence for an integer and the decimal fraction. The DEFINE group at the end of the pattern contains their definition. Notice that the decimal fraction pattern is the first place where we can reuse the integer pattern.
- /^ (?&osg)\ * ( (?&int)(?&dec)? | (?&dec) )
- (?: [eE](?&osg)(?&int) )?
- $
- (?(DEFINE)
- (?<osg>[-+]?) # optional sign
- (?<int>\d++) # integer
- (?<dec>\.(?&int)) # decimal fraction
- )/x
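To try the pattern out, we can store it in a qr// and match a few candidate strings against it (a usage sketch; the test strings are illustrative):

```perl
# The floating point pattern from above, with its DEFINE block
my $float = qr/^ (?&osg)\ * ( (?&int)(?&dec)? | (?&dec) )
      (?: [eE](?&osg)(?&int) )?
      $
      (?(DEFINE)
        (?<osg>[-+]?)         # optional sign
        (?<int>\d++)          # integer
        (?<dec>\.(?&int))     # decimal fraction
      )/x;

for my $s ( "-1.25e4", ".5", "12", "abc" ) {
    print "$s: ", ( $s =~ $float ? "number" : "not a number" ), "\n";
}
```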
This feature (introduced in Perl 5.10) significantly extends the
power of Perl's pattern matching. By referring to some other
capture group anywhere in the pattern with the construct
(?group-ref), the pattern within the referenced group is used
as an independent subpattern in place of the group reference itself.
Because the group reference may be contained within the group it
refers to, it is now possible to apply pattern matching to tasks that
hitherto required a recursive parser.
To illustrate this feature, we'll design a pattern that matches if a string contains a palindrome. (This is a word or a sentence that, ignoring spaces, punctuation and case, reads the same backwards as forwards.) We begin by observing that the empty string or a string containing just one word character is a palindrome. Otherwise it must have a word character up front and the same at its end, with another palindrome in between.
- /(?: (\w) (?...Here be a palindrome...) \g{-1} | \w? )/x
Adding \W*
at either end to eliminate what is to be ignored, we already
have the full pattern:
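Such a full pattern might look like this (a sketch: group 1 holds the whole unit so that (?1) can recurse into it, and the test strings are illustrative):

```perl
# Each recursion level is \W* (\w) (?1) \g{-1} \W*, or a lone optional
# word character at the center; /i makes the backreference match
# case-insensitively
my $pp = qr/^( \W* (?: (\w) (?1) \g{-1} | \w? ) \W* )$/ix;

for my $s ( "saippuakauppias", "A man, a plan, a canal: Panama!" ) {
    print "'$s' is a palindrome\n" if $s =~ $pp;
}
```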
In (?...) both absolute and relative backreferences may be used.
The entire pattern can be reinserted with (?R) or (?0).
If you prefer to name your groups, you can use (?&name) to
recurse into that group.
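For example (an illustrative sketch), (?R) can match nested, balanced parentheses by reinserting the whole pattern for each inner group:

```perl
# The whole pattern matches '(' ... ')' where the inside is either
# text free of parentheses or, via (?R), another balanced group
my $x = "abc(de(fg)h)ij";
if ( $x =~ /\( (?: [^()]+ | (?R) )* \)/x ) {
    print "balanced group: $&\n";   # prints: balanced group: (de(fg)h)
}
```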
Normally, regexps are a part of Perl expressions.
Code evaluation expressions turn that around by allowing
arbitrary Perl code to be a part of a regexp. A code evaluation
expression is denoted (?{code}), with code a string of Perl
statements.
Be warned that this feature is considered experimental, and may be changed without notice.
Code expressions are zero-width assertions, and the value they return
depends on their environment. There are two possibilities: either the
code expression is used as a conditional in a conditional expression
(?(condition)...), or it is not. If the code expression is a
conditional, the code is evaluated and the result (i.e., the result of
the last statement) is used to determine truth or falsehood. If the
code expression is not used as a conditional, the assertion always
evaluates true and the result is put into the special variable
$^R
. The variable $^R
can then be used in code expressions later
in the regexp. Here are some silly examples:
- $x = "abcdef";
- $x =~ /abc(?{print "Hi Mom!";})def/; # matches,
- # prints 'Hi Mom!'
- $x =~ /aaa(?{print "Hi Mom!";})def/; # doesn't match,
- # no 'Hi Mom!'
Pay careful attention to the next example:
- $x =~ /abc(?{print "Hi Mom!";})ddd/; # doesn't match,
- # no 'Hi Mom!'
- # but why not?
At first glance, you'd think that it shouldn't print, because obviously
the ddd
isn't going to match the target string. But look at this
example:
- $x =~ /abc(?{print "Hi Mom!";})[dD]dd/; # doesn't match,
- # but _does_ print
Hmm. What happened here? If you've been following along, you know that
the above pattern should be effectively (almost) the same as the last one;
enclosing the d
in a character class isn't going to change what it
matches. So why does the first not print while the second one does?
The answer lies in the optimizations the regex engine makes. In the first
case, all the engine sees are plain old characters (aside from the
?{} construct). It's smart enough to realize that the string 'ddd'
doesn't occur in our target string before actually running the pattern
through. But in the second case, we've tricked it into thinking that our
pattern is more complicated. It takes a look, sees our
character class, and decides that it will have to actually run the
pattern to determine whether or not it matches, and in the process of
running it hits the print statement before it discovers that we don't
have a match.
To take a closer look at how the engine does optimizations, see the section Pragmas and debugging below.
More fun with ?{}:
- $x =~ /(?{print "Hi Mom!";})/; # matches,
- # prints 'Hi Mom!'
- $x =~ /(?{$c = 1;})(?{print "$c";})/; # matches,
- # prints '1'
- $x =~ /(?{$c = 1;})(?{print "$^R";})/; # matches,
- # prints '1'
The bit of magic mentioned in the section title occurs when the regexp
backtracks in the process of searching for a match. If the regexp
backtracks over a code expression and if the variables used within are
localized using local, the changes in the variables produced by the
code expression are undone! Thus, if we wanted to count how many times
a character got matched inside a group, we could use, e.g.,
- $x = "aaaa";
- $count = 0; # initialize 'a' count
- $c = "bob"; # test if $c gets clobbered
- $x =~ /(?{local $c = 0;}) # initialize count
- ( a # match 'a'
- (?{local $c = $c + 1;}) # increment count
- )* # do this any number of times,
- aa # but match 'aa' at the end
- (?{$count = $c;}) # copy local $c var into $count
- /x;
- print "'a' count is $count, \$c variable is '$c'\n";
This prints
- 'a' count is 2, $c variable is 'bob'
If we replace the (?{local $c = $c + 1;}) with
(?{$c = $c + 1;}), the variable changes are not undone
during backtracking, and we get
- 'a' count is 4, $c variable is 'bob'
Note that only localized variable changes are undone. Other side effects of code expression execution are permanent. Thus
- $x = "aaaa";
- $x =~ /(a(?{print "Yow\n";}))*aa/;
produces
- Yow
- Yow
- Yow
- Yow
The result $^R
is automatically localized, so that it will behave
properly in the presence of backtracking.
This example uses a code expression in a conditional to match a definite article, either 'the' in English or 'der|die|das' in German:
- $lang = 'DE'; # use German
- ...
- $text = "das";
- print "matched\n"
- if $text =~ /(?(?{
- $lang eq 'EN'; # is the language English?
- })
- the | # if so, then match 'the'
- (der|die|das) # else, match 'der|die|das'
- )
- /xi;
Note that the syntax here is (?(?{...})yes-regexp|no-regexp), not
(?((?{...}))yes-regexp|no-regexp). In other words, in the case of a
code expression, we don't need the extra parentheses around the
conditional.
If you try to use code expressions where the code text is contained within an interpolated variable, rather than appearing literally in the pattern, Perl may surprise you:
- $bar = 5;
- $pat = '(?{ 1 })';
- /foo(?{ $bar })bar/; # compiles ok, $bar not interpolated
- /foo(?{ 1 })$bar/; # compiles ok, $bar interpolated
- /foo${pat}bar/; # compile error!
- $pat = qr/(?{ $foo = 1 })/; # precompile code regexp
- /foo${pat}bar/; # compiles ok
If a regexp has a variable that interpolates a code expression, Perl treats the regexp as an error. If the code expression is precompiled into a variable, however, interpolating is ok. The question is, why is this an error?
The reason is that variable interpolation and code expressions together pose a security risk. The combination is dangerous because many programmers who write search engines often take user input and plug it directly into a regexp:
- $regexp = <>; # read user-supplied regexp
- chomp $regexp; # get rid of possible newline
- $text =~ /$regexp/; # search $text for the $regexp
If the $regexp
variable contains a code expression, the user could
then execute arbitrary Perl code. For instance, some joker could
search for system('rm -rf *');
to erase your files. In this
sense, the combination of interpolation and code expressions taints
your regexp. So by default, using both interpolation and code
expressions in the same regexp is not allowed. If you're not
concerned about malicious users, it is possible to bypass this
security check by invoking use re 'eval'
:
- use re 'eval'; # throw caution out the door
- $bar = 5;
- $pat = '(?{ 1 })';
- /foo${pat}bar/; # compiles ok
Another form of code expression is the pattern code expression. The pattern code expression is like a regular code expression, except that the result of the code evaluation is treated as a regular expression and matched immediately. A simple example is
- $length = 5;
- $char = 'a';
- $x = 'aaaaabb';
- $x =~ /(??{$char x $length})/x; # matches, there are 5 of 'a'
This final example contains both ordinary and pattern code
expressions. It detects whether a binary string 1101010010001...
has a
Fibonacci spacing 0,1,1,2,3,5,... of the 1
's:
- $x = "1101010010001000001";
- $z0 = ''; $z1 = '0'; # initial conditions
- print "It is a Fibonacci sequence\n"
- if $x =~ /^1 # match an initial '1'
- (?:
- ((??{ $z0 })) # match some '0'
- 1 # and then a '1'
- (?{ $z0 = $z1; $z1 .= $^N; })
- )+ # repeat as needed
- $ # that is all there is
- /x;
- printf "Largest sequence matched was %d\n", length($z1)-length($z0);
Remember that $^N
is set to whatever was matched by the last
completed capture group. This prints
- It is a Fibonacci sequence
- Largest sequence matched was 5
Ha! Try that with your garden variety regexp package...
Note that the variables $z0
and $z1
are not substituted when the
regexp is compiled, as happens for ordinary variables outside a code
expression. Rather, the whole code block is parsed as perl code at the
same time as perl is compiling the code containing the literal regexp
pattern.
The regexp without the //x
modifier is
- /^1(?:((??{ $z0 }))1(?{ $z0 = $z1; $z1 .= $^N; }))+$/
which shows that spaces are still possible in the code parts. Nevertheless, when working with code and conditional expressions, the extended form of regexps is almost necessary in creating and debugging regexps.
Perl 5.10 introduced a number of control verbs intended to provide detailed control over the backtracking process, by directly influencing the regexp engine and by providing monitoring techniques. As all the features in this group are experimental and subject to change or removal in a future version of Perl, the interested reader is referred to Special Backtracking Control Verbs in perlre for a detailed description.
Below is just one example, illustrating the control verb (*FAIL)
,
which may be abbreviated as (*F)
. If this is inserted in a regexp
it will cause it to fail, just as it would at some
mismatch between the pattern and the string. Processing
of the regexp continues as it would after any "normal"
failure, so that, for instance, the next position in the string or another
alternative will be tried. As failing to match doesn't preserve capture
groups or produce results, it may be necessary to use this in
combination with embedded code.
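A sketch of such an example (the input string and output format here are illustrative assumptions): count the vowels of a string by matching one, recording it in embedded code, and then failing on purpose so the engine moves on:

```perl
# Each time [aeiou] matches, bump that vowel's counter; (*FAIL) then
# forces a mismatch, so the engine advances to the next position
my %count;
"supercalifragilisticexpialidocious" =~ /([aeiou])(?{ $count{$1}++ })(*FAIL)/;
printf "%3d '%s'\n", $count{$_}, $_ for sort keys %count;
```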
The pattern begins with a class matching a subset of letters. Whenever
this matches, a statement like $count{'a'}++;
is executed, incrementing
the letter's counter. Then (*FAIL)
does what it says, and
the regexp engine proceeds according to the book: as long as the end of
the string hasn't been reached, the position is advanced before looking
for another vowel. Thus, match or no match makes no difference, and the
regexp engine proceeds until the entire string has been inspected.
(It's remarkable that an alternative solution, such as splitting the string
into individual characters and counting the vowels in an ordinary loop,
is considerably slower.)
Speaking of debugging, there are several pragmas available to control
and debug regexps in Perl. We have already encountered one pragma in
the previous section, use re 'eval';
, that allows variable
interpolation and code expressions to coexist in a regexp. The other
pragmas are
- use re 'taint';
- $tainted = <>;
- @parts = ($tainted =~ /(\w+)\s+(\w+)/); # @parts is now tainted
The taint
pragma causes any substrings from a match with a tainted
variable to be tainted as well. This is not normally the case, as
regexps are often used to extract the safe bits from a tainted
variable. Use taint
when you are not extracting safe bits, but are
performing some other processing. Both taint
and eval pragmas
are lexically scoped, which means they are in effect only until
the end of the block enclosing the pragmas.
- use re '/m'; # or any other flags
- $multiline_string =~ /^foo/; # /m is implied
The re '/flags'
pragma (introduced in Perl
5.14) turns on the given regular expression flags
until the end of the lexical scope. See
'/flags' mode in re for more
detail.
The global debug
and debugcolor
pragmas allow one to get
detailed debugging info about regexp compilation and
execution. debugcolor
is the same as debug, except the debugging
information is displayed in color on terminals that can display
termcap color sequences. Here is example output:
- % perl -e 'use re "debug"; "abc" =~ /a*b+c/;'
- Compiling REx 'a*b+c'
- size 9 first at 1
- 1: STAR(4)
- 2: EXACT <a>(0)
- 4: PLUS(7)
- 5: EXACT <b>(0)
- 7: EXACT <c>(9)
- 9: END(0)
- floating 'bc' at 0..2147483647 (checking floating) minlen 2
- Guessing start of match, REx 'a*b+c' against 'abc'...
- Found floating substr 'bc' at offset 1...
- Guessed: match at offset 0
- Matching REx 'a*b+c' against 'abc'
- Setting an EVAL scope, savestack=3
- 0 <> <abc> | 1: STAR
- EXACT <a> can match 1 times out of 32767...
- Setting an EVAL scope, savestack=3
- 1 <a> <bc> | 4: PLUS
- EXACT <b> can match 1 times out of 32767...
- Setting an EVAL scope, savestack=3
- 2 <ab> <c> | 7: EXACT <c>
- 3 <abc> <> | 9: END
- Match successful!
- Freeing REx: 'a*b+c'
If you have gotten this far into the tutorial, you can probably guess what the different parts of the debugging output tell you. The first part
- Compiling REx 'a*b+c'
- size 9 first at 1
- 1: STAR(4)
- 2: EXACT <a>(0)
- 4: PLUS(7)
- 5: EXACT <b>(0)
- 7: EXACT <c>(9)
- 9: END(0)
describes the compilation stage. STAR(4)
means that there is a
starred object, in this case 'a'
, and if it matches, goto line 4,
i.e., PLUS(7)
. The middle lines describe some heuristics and
optimizations performed before a match:
- floating 'bc' at 0..2147483647 (checking floating) minlen 2
- Guessing start of match, REx 'a*b+c' against 'abc'...
- Found floating substr 'bc' at offset 1...
- Guessed: match at offset 0
Then the match is executed and the remaining lines describe the process:
- Matching REx 'a*b+c' against 'abc'
- Setting an EVAL scope, savestack=3
- 0 <> <abc> | 1: STAR
- EXACT <a> can match 1 times out of 32767...
- Setting an EVAL scope, savestack=3
- 1 <a> <bc> | 4: PLUS
- EXACT <b> can match 1 times out of 32767...
- Setting an EVAL scope, savestack=3
- 2 <ab> <c> | 7: EXACT <c>
- 3 <abc> <> | 9: END
- Match successful!
- Freeing REx: 'a*b+c'
Each step is of the form n <x> <y>, with <x>
the
part of the string matched and <y>
the part not yet
matched. The | 1: STAR says that Perl is at line number 1
in the compilation list above. See
Debugging Regular Expressions in perldebguts for much more detail.
An alternative method of debugging regexps is to embed print
statements within the regexp. This provides a blow-by-blow account of
the backtracking in an alternation:
- "that this" =~ m@(?{print "Start at position ", pos, "\n";})
- t(?{print "t1\n";})
- h(?{print "h1\n";})
- i(?{print "i1\n";})
- s(?{print "s1\n";})
- |
- t(?{print "t2\n";})
- h(?{print "h2\n";})
- a(?{print "a2\n";})
- t(?{print "t2\n";})
- (?{print "Done at position ", pos, "\n";})
- @x;
prints
- Start at position 0
- t1
- h1
- t2
- h2
- a2
- t2
- Done at position 4
Code expressions, conditional expressions, and independent expressions are experimental. Don't use them in production code. Yet.
This is just a tutorial. For the full story on Perl regular expressions, see the perlre regular expressions reference page.
For more information on the matching m// and substitution s///
operators, see Regexp Quote-Like Operators in perlop. For
information on the split operation, see split.
For an excellent all-around resource on the care and feeding of regular expressions, see the book Mastering Regular Expressions by Jeffrey Friedl (published by O'Reilly, ISBN 1-56592-257-3).
Copyright (c) 2000 Mark Kvale All rights reserved.
This document may be distributed under the same terms as Perl itself.
The inspiration for the stop codon DNA example came from the ZIP code example in chapter 7 of Mastering Regular Expressions.
The author would like to thank Jeff Pinyan, Andrew Johnson, Peter Haworth, Ronald J Kimball, and Joe Smith for all their helpful comments.
perlriscos - Perl version 5 for RISC OS
This document gives instructions for building Perl for RISC OS. It is complicated by the need to cross compile. There is a binary version of perl available from http://www.cp15.org/perl/ which you may wish to use instead of trying to compile it yourself.
You need an installed and working gccsdk cross compiler http://gccsdk.riscos.info/ and REXEN http://www.cp15.org/programming/
Firstly, copy the source and build a native copy of perl for your host system. Then, in the source to be cross compiled:
- $ ./Configure
Select the riscos hint file. The default answers for the rest of the questions are usually sufficient.
Note that, if you wish to run Configure non-interactively (see the INSTALL document for details), to have it select the correct hint file, you'll need to provide the argument -Dhintfile=riscos on the Configure command-line.
- $ make miniperl
This should build miniperl and then fail when it tries to run it.
Copy the miniperl executable from the native build done earlier to replace the cross compiled miniperl.
- $ make
This will use miniperl to complete the rest of the build.
Alex Waugh <alex@alexwaugh.com>
perlrun - how to execute the Perl interpreter
perl [ -sTtuUWX ] [ -hv ] [ -V[:configvar] ] [ -cw ] [ -d[t][:debugger] ] [ -D[number/list] ] [ -pna ] [ -Fpattern ] [ -l[octal] ] [ -0[octal/hexadecimal] ] [ -Idir ] [ -m[-]module ] [ -M[-]'module...' ] [ -f ] [ -C [number/list] ] [ -S ] [ -x[dir] ] [ -i[extension] ] [ [-e|-E] 'command' ] [ -- ] [ programfile ] [ argument ]...
The normal way to run a Perl program is by making it directly executable, or else by passing the name of the source file as an argument on the command line. (An interactive Perl environment is also possible--see perldebug for details on how to do that.) Upon startup, Perl looks for your program in one of the following places:
Specified line by line via -e or -E switches on the command line.
Contained in the file specified by the first filename on the command line.
(Note that systems supporting the #! notation invoke interpreters this way. See Location of Perl.)
Passed in implicitly via standard input. This works only if there are no filename arguments--to pass arguments to a STDIN-read program you must explicitly specify a "-" for the program name.
With methods 2 and 3, Perl starts parsing the input file from the beginning, unless you've specified a -x switch, in which case it scans for the first line starting with #! and containing the word "perl", and starts there instead. This is useful for running a program embedded in a larger message. (In this case you would indicate the end of the program using the __END__ token.)
The #! line is always examined for switches as the line is being parsed. Thus, if you're on a machine that allows only one argument with the #! line, or worse, doesn't even recognize the #! line, you still can get consistent switch behaviour regardless of how Perl was invoked, even if -x was used to find the beginning of the program.
Because historically some operating systems silently chopped off kernel interpretation of the #! line after 32 characters, some switches may be passed in on the command line, and some may not; you could even get a "-" without its letter, if you're not careful. You probably want to make sure that all your switches fall either before or after that 32-character boundary. Most switches don't actually care if they're processed redundantly, but getting a "-" instead of a complete switch could cause Perl to try to execute standard input instead of your program. And a partial -I switch could also cause odd results.
Some switches do care if they are processed twice, for instance combinations of -l and -0. Either put all the switches after the 32-character boundary (if applicable), or replace the use of -0digits by BEGIN{ $/ = "\0digits"; }.
Parsing of the #! switches starts wherever "perl" is mentioned in the line. The sequences "-*" and "- " are specifically ignored so that you could, if you were so inclined, say
- #!/bin/sh
- #! -*-perl-*- -p
- eval 'exec perl -x -wS $0 ${1+"$@"}'
- if 0;
to let Perl see the -p switch.
A similar trick involves the env program, if you have it.
- #!/usr/bin/env perl
The examples above use a relative path to the perl interpreter, getting whatever version is first in the user's path. If you want a specific version of Perl, say, perl5.14.1, you should place that directly in the #! line's path.
If the #! line does not contain the word "perl" or the word "indir", the program named after the #! is executed instead of the Perl interpreter. This is slightly bizarre, but it helps people on machines that don't do #!, because they can tell a program that their SHELL is /usr/bin/perl, and Perl will then dispatch the program to the correct interpreter for them.
After locating your program, Perl compiles the entire program to an internal form. If there are any compilation errors, execution of the program is not attempted. (This is unlike the typical shell script, which might run part-way through before finding a syntax error.)
If the program is syntactically correct, it is executed. If the program
runs off the end without hitting an exit() or die() operator, an implicit
exit(0) is provided to indicate successful completion.
Unix's #! technique can be simulated on other systems:
Put
- extproc perl -S -your_switches
as the first line in your *.cmd file (-S due to a bug in cmd.exe's `extproc' handling).
Create a batch file to run your program, and codify it in ALTERNATE_SHEBANG (see the dosish.h file in the source distribution for more information).
The Win95/NT installation, when using the ActiveState installer for Perl, will modify the Registry to associate the .pl extension with the perl interpreter. If you install Perl by other means (including building from the sources), you may have to modify the Registry yourself. Note that this means you can no longer tell the difference between an executable Perl program and a Perl library file.
Put
- $ perl -mysw 'f$env("procedure")' 'p1' 'p2' 'p3' 'p4' 'p5' 'p6' 'p7' 'p8' !
- $ exit++ + ++$status != 0 and $exit = $status = undef;
at the top of your program, where -mysw are any command line switches you want to pass to Perl. You can now invoke the program directly, by saying perl program, or as a DCL procedure, by saying @program (or implicitly via DCL$PATH by just using the name of the program).
This incantation is a bit much to remember, but Perl will display it for you if you say perl "-V:startperl".
Command-interpreters on non-Unix systems have rather different ideas on quoting than Unix shells. You'll need to learn the special characters in your command-interpreter (*, \ and " are common) and how to protect whitespace and these characters to run one-liners (see -e below).
On some systems, you may have to change single-quotes to double ones, which you must not do on Unix or Plan 9 systems. You might also have to change a single % to a %%.
For example:
- # Unix
- perl -e 'print "Hello world\n"'
- # MS-DOS, etc.
- perl -e "print \"Hello world\n\""
- # VMS
- perl -e "print ""Hello world\n"""
The problem is that none of this is reliable: it depends on the command and it is entirely possible neither works. If 4DOS were the command shell, this would probably work better:
- perl -e "print <Ctrl-x>"Hello world\n<Ctrl-x>""
CMD.EXE in Windows NT slipped a lot of standard Unix functionality in when nobody was looking, but just try to find documentation for its quoting rules.
There is no general solution to all of this. It's just a mess.
It may seem obvious to say, but Perl is useful only when users can easily find it. When possible, it's good for both /usr/bin/perl and /usr/local/bin/perl to be symlinks to the actual binary. If that can't be done, system administrators are strongly encouraged to put (symlinks to) perl and its accompanying utilities into a directory typically found along a user's PATH, or in some other obvious and convenient place.
In this documentation, #!/usr/bin/perl on the first line of the program will stand in for whatever method works on your system. You are advised to use a specific path if you care about a specific version.
- #!/usr/local/bin/perl5.14
or if you just want to be running at least a certain version, place a statement like this at the top of your program:
- use 5.014;
As with all standard commands, a single-character switch may be clustered with the following switch, if any.
- #!/usr/bin/perl -spi.orig # same as -s -p -i.orig
Switches include:
specifies the input record separator ($/) as an octal or hexadecimal number. If there are no digits, the null character is the separator. Other switches may precede or follow the digits. For example, if you have a version of find which can print filenames terminated by the null character, you can say this:
- find . -name '*.orig' -print0 | perl -n0e unlink
The special value 00 will cause Perl to slurp files in paragraph mode. Any value 0400 or above will cause Perl to slurp files whole, but by convention the value 0777 is the one normally used for this purpose.
You can also specify the separator character using hexadecimal notation: -0xHHH..., where the H are valid hexadecimal digits. Unlike the octal form, this one may be used to specify any Unicode character, even those beyond 0xFF. So if you really want a record separator of 0777, specify it as -0x1FF. (This means that you cannot use the -x option with a directory name that consists of hexadecimal digits, or else Perl will think you have specified a hex number to -0.)
turns on autosplit mode when used with a -n or -p. An implicit split command to the @F array is done as the first thing inside the implicit while loop produced by the -n or -p.
- perl -ane 'print pop(@F), "\n";'
is equivalent to
- while (<>) {
-     @F = split(' ');
-     print pop(@F), "\n";
- }
An alternate delimiter may be specified using -F.
The -C flag controls some of the Perl Unicode features.
As of 5.8.1, the -C can be followed either by a number or a list of option letters. The letters, their numeric values, and effects are as follows; listing the letters is equal to summing the numbers.
- I 1 STDIN is assumed to be in UTF-8
- O 2 STDOUT will be in UTF-8
- E 4 STDERR will be in UTF-8
- S 7 I + O + E
- i 8 UTF-8 is the default PerlIO layer for input streams
- o 16 UTF-8 is the default PerlIO layer for output streams
- D 24 i + o
- A 32 the @ARGV elements are expected to be strings encoded
- in UTF-8
- L 64 normally the "IOEioA" are unconditional, the L makes
- them conditional on the locale environment variables
- (the LC_ALL, LC_CTYPE, and LANG, in the order of
- decreasing precedence) -- if the variables indicate
- UTF-8, then the selected "IOEioA" are in effect
- a 256 Set ${^UTF8CACHE} to -1, to run the UTF-8 caching
- code in debugging mode.
For example, -COE and -C6 will both turn on UTF-8-ness on both STDOUT and STDERR. Repeating letters is just redundant, not cumulative nor toggling.
The io options mean that any subsequent open() (or similar I/O operations) in the current file scope will have the :utf8 PerlIO layer implicitly applied to them; in other words, UTF-8 is expected from any input stream, and UTF-8 is produced to any output stream. This is just the default; with explicit layers in open() and with binmode() one can manipulate streams as usual.
-C on its own (not followed by any number or option list), or the empty string "" for the PERL_UNICODE environment variable, has the same effect as -CSDL. In other words, the standard I/O handles and the default open() layer are UTF-8-fied but only if the locale environment variables indicate a UTF-8 locale. This behaviour follows the implicit (and problematic) UTF-8 behaviour of Perl 5.8.0. (See UTF-8 no longer default under UTF-8 locales in perl581delta.)
You can use -C0 (or "0" for PERL_UNICODE) to explicitly disable all the above Unicode features.
The read-only magic variable ${^UNICODE} reflects the numeric value of this setting. This variable is set during Perl startup and is thereafter read-only. If you want runtime effects, use the three-arg open() (see open), the two-arg binmode() (see binmode), and the open pragma (see open).
(In Perls earlier than 5.8.1 the -C switch was a Win32-only switch that enabled the use of Unicode-aware "wide system call" Win32 APIs. This feature was practically unused, however, and the command line switch was therefore "recycled".)
Note: Since perl 5.10.1, if the -C option is used on the #! line, it must be specified on the command line as well, since the standard streams are already set up at this point in the execution of the perl interpreter.
You can also use binmode() to set the encoding of an I/O stream.
causes Perl to check the syntax of the program and then exit without executing it. Actually, it will execute any BEGIN, UNITCHECK, or CHECK blocks and any use statements: these are considered as occurring outside the execution of your program. INIT and END blocks, however, will be skipped.
runs the program under the Perl debugger. See perldebug. If t is specified, it indicates to the debugger that threads will be used in the code being debugged.
runs the program under the control of a debugging, profiling, or tracing module installed as Devel::MOD. E.g., -d:DProf executes the program using the Devel::DProf profiler. As with the -M flag, options may be passed to the Devel::MOD package where they will be received and interpreted by the Devel::MOD::import routine. Again, like -M, use -d:-MOD to call Devel::MOD::unimport instead of import. The comma-separated list of options must follow a = character. If t is specified, it indicates to the debugger that threads will be used in the code being debugged. See perldebug.
sets debugging flags. To watch how it executes your program, use -Dtls. (This works only if debugging is compiled into your Perl.) Another nice value is -Dx, which lists your compiled syntax tree. And -Dr displays compiled regular expressions; the format of the output is explained in perldebguts.
As an alternative, specify a number instead of list of letters (e.g., -D14 is equivalent to -Dtls):
- 1 p Tokenizing and parsing (with v, displays parse stack)
- 2 s Stack snapshots (with v, displays all stacks)
- 4 l Context (loop) stack processing
- 8 t Trace execution
- 16 o Method and overloading resolution
- 32 c String/numeric conversions
- 64 P Print profiling info, source file input state
- 128 m Memory and SV allocation
- 256 f Format processing
- 512 r Regular expression parsing and execution
- 1024 x Syntax tree dump
- 2048 u Tainting checks
- 4096 U Unofficial, User hacking (reserved for private,
- unreleased use)
- 8192 H Hash dump -- usurps values()
- 16384 X Scratchpad allocation
- 32768 D Cleaning up
- 65536 S Op slab allocation
- 131072 T Tokenizing
- 262144 R Include reference counts of dumped variables (eg when
- using -Ds)
- 524288 J show s,t,P-debug (don't Jump over) on opcodes within
- package DB
- 1048576 v Verbose: use in conjunction with other flags
- 2097152 C Copy On Write
- 4194304 A Consistency checks on internal structures
- 8388608 q quiet - currently only suppresses the "EXECUTING"
- message
- 16777216 M trace smart match resolution
- 33554432 B dump suBroutine definitions, including special Blocks
- like BEGIN
All these flags require -DDEBUGGING when you compile the Perl executable (but see :opd in Devel::Peek or 'debug' mode in re which may change this). See the INSTALL file in the Perl source distribution for how to do this. This flag is automatically set if you include the -g option when Configure asks you about optimizer/debugger flags.
If you're just trying to get a print out of each line of Perl code as it executes, the way that sh -x provides for shell scripts, you can't use Perl's -D switch. Instead do this:
- # If you have "env" utility
- env PERLDB_OPTS="NonStop=1 AutoTrace=1 frame=2" perl -dS program
- # Bourne shell syntax
- $ PERLDB_OPTS="NonStop=1 AutoTrace=1 frame=2" perl -dS program
- # csh syntax
- % (setenv PERLDB_OPTS "NonStop=1 AutoTrace=1 frame=2"; perl -dS program)
See perldebug for details and variations.
may be used to enter one line of program. If -e is given, Perl will not look for a filename in the argument list. Multiple -e commands may be given to build up a multi-line script. Make sure to use semicolons where you would in a normal program.
behaves just like -e, except that it implicitly enables all optional features (in the main compilation unit). See feature.
Disable executing $Config{sitelib}/sitecustomize.pl at startup.
Perl can be built so that it by default will try to execute $Config{sitelib}/sitecustomize.pl at startup (in a BEGIN block). This is a hook that allows the sysadmin to customize how Perl behaves. It can for instance be used to add entries to the @INC array to make Perl find modules in non-standard locations.
Perl actually inserts the following code:
Since it is an actual do (not a require), sitecustomize.pl doesn't need to return a true value. The code is run in package main, in its own lexical scope. However, if the script dies, $@ will not be set.
The value of $Config{sitelib} is also determined in C code and not read from Config.pm, which is not loaded.
The code is executed very early. For example, any changes made to @INC will show up in the output of `perl -V`. Of course, END blocks will be likewise executed very late.
To determine at runtime if this capability has been compiled in your perl, you can check the value of $Config{usesitecustomize}.
specifies the pattern to split on if -a is also in effect. The pattern may be surrounded by //, "", or '', otherwise it will be put in single quotes. You can't use literal whitespace in the pattern.
prints a summary of the options.
specifies that files processed by the <> construct are to be edited in-place. It does this by renaming the input file, opening the output file by the original name, and selecting that output file as the default for print() statements. The extension, if supplied, is used to modify the name of the old file to make a backup copy, following these rules:
If no extension is supplied, and your system supports it, the original file is kept open without a name while the output is redirected to a new file with the original filename. When perl exits, cleanly or not, the original file is unlinked.
If the extension doesn't contain a *, then it is appended to the end of the current filename as a suffix. If the extension does contain one or more * characters, then each * is replaced with the current filename. In Perl terms, you could think of this as:
- ($backup = $extension) =~ s/\*/$file_name/g;
This allows you to add a prefix to the backup file, instead of (or in addition to) a suffix:
- $ perl -pi'orig_*' -e 's/bar/baz/' fileA # backup to
- # 'orig_fileA'
Or even to place backup copies of the original files into another directory (provided the directory already exists):
- $ perl -pi'old/*.orig' -e 's/bar/baz/' fileA # backup to
- # 'old/fileA.orig'
These sets of one-liners are equivalent:
- $ perl -pi -e 's/bar/baz/' fileA # overwrite current file
- $ perl -pi'*' -e 's/bar/baz/' fileA # overwrite current file
- $ perl -pi'.orig' -e 's/bar/baz/' fileA # backup to 'fileA.orig'
- $ perl -pi'*.orig' -e 's/bar/baz/' fileA # backup to 'fileA.orig'
From the shell, saying
- $ perl -p -i.orig -e "s/foo/bar/; ... "
is the same as using the program:
- #!/usr/bin/perl -pi.orig
- s/foo/bar/;
which is equivalent to
- #!/usr/bin/perl
- $extension = '.orig';
- LINE: while (<>) {
- if ($ARGV ne $oldargv) {
- if ($extension !~ /\*/) {
- $backup = $ARGV . $extension;
- }
- else {
- ($backup = $extension) =~ s/\*/$ARGV/g;
- }
- rename($ARGV, $backup);
- open(ARGVOUT, ">$ARGV");
- select(ARGVOUT);
- $oldargv = $ARGV;
- }
- s/foo/bar/;
- }
- continue {
- print; # this prints to original filename
- }
- select(STDOUT);
except that the -i form doesn't need to compare $ARGV to $oldargv to know when the filename has changed. It does, however, use ARGVOUT for the selected filehandle. Note that STDOUT is restored as the default output filehandle after the loop.
As shown above, Perl creates the backup file whether or not any output is actually changed. So this is just a fancy way to copy files:
- $ perl -p -i'/some/file/path/*' -e 1 file1 file2 file3...
- or
- $ perl -p -i'.orig' -e 1 file1 file2 file3...
You can use eof without parentheses to locate the end of each input file, in case you want to append to each file, or reset line numbering (see example in eof).
If, for a given file, Perl is unable to create the backup file as specified in the extension then it will skip that file and continue on with the next one (if it exists).
For a discussion of issues surrounding file permissions and -i, see Why does Perl let me delete read-only files? Why does -i clobber protected files? Isn't this a bug in Perl? in perlfaq5.
You cannot use -i to create directories or to strip extensions from files.
Perl does not expand ~ in filenames, which is good, since some folks use it for their backup files:
- $ perl -pi~ -e 's/foo/bar/' file1 file2 file3...
Note that because -i renames or deletes the original file before creating a new file of the same name, Unix-style soft and hard links will not be preserved.
Finally, the -i switch does not impede execution when no files are given on the command line. In this case, no backup is made (the original file cannot, of course, be determined) and processing proceeds from STDIN to STDOUT as might be expected.
Directories specified by -I are prepended to the search path for modules (@INC).
enables automatic line-ending processing. It has two separate effects. First, it automatically chomps $/ (the input record separator) when used with -n or -p. Second, it assigns $\ (the output record separator) to have the value of octnum so that any print statements will have that separator added back on. If octnum is omitted, sets $\ to the current value of $/. For instance, to trim lines to 80 columns:
- perl -lpe 'substr($_, 80) = ""'
Note that the assignment $\ = $/ is done when the switch is processed, so the input record separator can be different than the output record separator if the -l switch is followed by a -0 switch:
- gnufind / -print0 | perl -ln0e 'print "found $_" if -p'
This sets $\ to newline and then sets $/ to the null character.
-mmodule executes use module (); before executing your program.
-Mmodule executes use module; before executing your program. You can use quotes to add extra code after the module name, e.g., '-MMODULE qw(foo bar)'.
If the first character after the -M or -m is a dash (-) then the 'use' is replaced with 'no'.
A little builtin syntactic sugar means you can also say -mMODULE=foo,bar or -MMODULE=foo,bar as a shortcut for '-MMODULE qw(foo bar)'. This avoids the need to use quotes when importing symbols. The actual code generated by -MMODULE=foo,bar is use module split(/,/,q{foo,bar}). Note that the = form removes the distinction between -m and -M.
A consequence of this is that -MMODULE=number never does a version check, unless MODULE::import() itself is set up to do a version check, which could happen for example if MODULE inherits from Exporter.
causes Perl to assume the following loop around your program, which makes it iterate over filename arguments somewhat like sed -n or awk:
- LINE:
- while (<>) {
- ... # your program goes here
- }
Note that the lines are not printed by default. See -p to have lines printed. If a file named by an argument cannot be opened for some reason, Perl warns you about it and moves on to the next file.
Also note that <> passes command line arguments to open, which doesn't necessarily interpret them as file names. See perlop for possible security implications.
Here is an efficient way to delete all files that haven't been modified for at least a week:
- find . -mtime +7 -print | perl -nle unlink
This is faster than using the -exec switch of find because you don't have to start a process on every filename found. It does suffer from the bug of mishandling newlines in pathnames, which you can fix if you follow the example under -0.
BEGIN and END blocks may be used to capture control before or after the implicit program loop, just as in awk.
causes Perl to assume the following loop around your program, which makes it iterate over filename arguments somewhat like sed:
- LINE:
- while (<>) {
-     ... # your program goes here
- } continue {
-     print or die "-p destination: $!\n";
- }
If a file named by an argument cannot be opened for some reason, Perl warns you about it, and moves on to the next file. Note that the lines are printed automatically. An error occurring during printing is treated as fatal. To suppress printing use the -n switch. A -p overrides a -n switch.
BEGIN and END blocks may be used to capture control before or after the implicit loop, just as in awk.
enables rudimentary switch parsing for switches on the command line after the program name but before any filename arguments (or before an argument of --). Any switch found there is removed from @ARGV and sets the corresponding variable in the Perl program. The following program prints "1" if the program is invoked with a -xyz switch, and "abc" if it is invoked with -xyz=abc.
- #!/usr/bin/perl -s
- if ($xyz) { print "$xyz\n" }
Do note that a switch like --help creates the variable ${-help}, which is not compliant with use strict "refs". Also, when using this option on a script with warnings enabled you may get a lot of spurious "used only once" warnings.
makes Perl use the PATH environment variable to search for the program unless the name of the program contains path separators.
On some platforms, this also makes Perl append suffixes to the filename while searching for it. For example, on Win32 platforms, the ".bat" and ".cmd" suffixes are appended if a lookup for the original name fails, and if the name does not already end in one of those suffixes. If your Perl was compiled with DEBUGGING turned on, using the -Dp switch to Perl shows how the search progresses.
Typically this is used to emulate #! startup on platforms that don't support #!. It's also convenient when debugging a script that uses #!, and is thus normally found by the shell's $PATH search mechanism.
This example works on many platforms that have a shell compatible with Bourne shell:
- #!/usr/bin/perl
- eval 'exec /usr/bin/perl -wS $0 ${1+"$@"}'
- if $running_under_some_shell;
The system ignores the first line and feeds the program to /bin/sh, which proceeds to try to execute the Perl program as a shell script. The shell executes the second line as a normal shell command, and thus starts up the Perl interpreter. On some systems $0 doesn't always contain the full pathname, so the -S tells Perl to search for the program if necessary. After Perl locates the program, it parses the lines and ignores them because the variable $running_under_some_shell is never true. If the program will be interpreted by csh, you will need to replace ${1+"$@"} with $*, even though that doesn't understand embedded spaces (and such) in the argument list. To start up sh rather than csh, some systems may have to replace the #! line with a line containing just a colon, which will be politely ignored by Perl. Other systems can't control that, and need a totally devious construct that will work under any of csh, sh, or Perl, such as the following:
- eval '(exit $?0)' && eval 'exec perl -wS $0 ${1+"$@"}'
- & eval 'exec /usr/bin/perl -wS $0 $argv:q'
- if $running_under_some_shell;
If the filename supplied contains directory separators (and so is an absolute or relative pathname), and if that file is not found, platforms that append file extensions will do so and try to look for the file with those extensions added, one by one.
On DOS-like platforms, if the program does not contain directory separators, it will first be searched for in the current directory before being searched for on the PATH. On Unix platforms, the program will be searched for strictly on the PATH.
Like -T, but taint checks will issue warnings rather than fatal errors. These warnings can now be controlled normally with no warnings qw(taint).
Note: This is not a substitute for -T! This is meant to be used only as a temporary development aid while securing legacy code: for real production code and for new secure code written from scratch, always use the real -T.
turns on taint checks so you can test them. Ordinarily these checks are done only when running setuid or setgid. It's a good idea to turn them on explicitly for programs that run on behalf of someone else whom you might not necessarily trust, such as CGI programs or any internet servers you might write in Perl. See perlsec for details. For security reasons, this option must be seen by Perl quite early; usually this means it must appear early on the command line or in the #! line for systems which support that construct.
This switch causes Perl to dump core after compiling your program. You can then in theory take this core dump and turn it into an executable file by using the undump program (not supplied). This speeds startup at the expense of some disk space (which you can minimize by stripping the executable). (Still, a "hello world" executable comes out to about 200K on my machine.) If you want to execute a portion of your program before dumping, use the dump() operator instead. Note: availability of undump is platform specific and may not be available for a specific port of Perl.
allows Perl to do unsafe operations. Currently the only "unsafe" operations are attempting to unlink directories while running as superuser and running setuid programs with fatal taint checks turned into warnings. Note that warnings must be enabled along with this option to actually generate the taint-check warnings.
prints the version and patchlevel of your perl executable.
prints summary of the major perl configuration values and the current values of @INC.
Prints to STDOUT the value of the named configuration variable(s), with multiples when your configvar argument looks like a regex (has non-letters). For example:
- $ perl -V:libc
- libc='/lib/libc-2.2.4.so';
- $ perl -V:lib.
- libs='-lnsl -lgdbm -ldb -ldl -lm -lcrypt -lutil -lc';
- libc='/lib/libc-2.2.4.so';
- $ perl -V:lib.*
- libpth='/usr/local/lib /lib /usr/lib';
- libs='-lnsl -lgdbm -ldb -ldl -lm -lcrypt -lutil -lc';
- lib_ext='.a';
- libc='/lib/libc-2.2.4.so';
- libperl='libperl.a';
- ....
Additionally, extra colons can be used to control formatting. A trailing colon suppresses the linefeed and terminator ";", allowing you to embed queries into shell commands. (mnemonic: PATH separator ":".)
- $ echo "compression-vars: " `perl -V:z.*: ` " are here !"
- compression-vars: zcat='' zip='zip' are here !
A leading colon removes the "name=" part of the response, this allows you to map to the name you need. (mnemonic: empty label)
- $ echo "goodvfork="`./perl -Ilib -V::usevfork`
- goodvfork=false;
Leading and trailing colons can be used together if you need positional parameter values without the names. Note that in the case below, the PERL_API params are returned in alphabetical order.
- $ echo building_on `perl -V::osname: -V::PERL_API_.*:` now
- building_on 'linux' '5' '1' '9' now
prints warnings about dubious constructs, such as variable names mentioned only once and scalar variables used before being set; redefined subroutines; references to undefined filehandles; filehandles opened read-only that you are attempting to write on; values used as a number that don't look like numbers; using an array as though it were a scalar; if your subroutines recurse more than 100 deep; and innumerable other things.
This switch really just enables the global $^W variable; normally, the lexically scoped use warnings pragma is preferred. You can disable or promote into fatal errors specific warnings using __WARN__ hooks, as described in perlvar and warn. See also perldiag and perltrap. A fine-grained warning facility is also available if you want to manipulate entire classes of warnings; see warnings or perllexwarn.
Enables all warnings regardless of no warnings or $^W. See perllexwarn.
Disables all warnings regardless of use warnings or $^W. See perllexwarn.
tells Perl that the program is embedded in a larger chunk of unrelated text, such as in a mail message. Leading garbage will be discarded until the first line that starts with #! and contains the string "perl". Any meaningful switches on that line will be applied. All references to line numbers by the program (warnings, errors, ...) will treat the #! line as the first line. Thus a warning on the 2nd line of the program, which is on the 100th line in the file, will be reported as line 2, not as line 100. This can be overridden by using the #line directive. (See Plain Old Comments (Not!) in perlsyn.)
If a directory name is specified, Perl will switch to that directory before running the program. The -x switch controls only the disposal of leading garbage. The program must be terminated with __END__ if there is trailing garbage to be ignored; the program can process any or all of the trailing garbage via the DATA filehandle if desired.
The directory, if specified, must appear immediately following the -x with no intervening whitespace.
Used if chdir has no argument.
Used if chdir has no argument and HOME is not set.
Used in executing subprocesses, and in finding the program if -S is used.
A list of directories in which to look for Perl library files before looking in the standard library and the current directory. Any architecture-specific and version-specific directories, such as version/archname/, version/, or archname/ under the specified locations are automatically included if they exist, with this lookup done at interpreter startup time. In addition, any directories matching the entries in $Config{inc_version_list} are added. (These typically would be for older compatible perl versions installed in the same directory tree.)
If PERL5LIB is not defined, PERLLIB is used. Directories are separated
(like in PATH) by a colon on Unixish platforms and by a semicolon on
Windows (the proper path separator being given by the command perl
-V:path_sep).
When running taint checks, either because the program was running setuid or setgid, or the -T or -t switch was specified, neither PERL5LIB nor PERLLIB is consulted. The program should instead say:
- use lib "/my/directory";
Command-line options (switches). Switches in this variable are treated as if they were on every Perl command line. Only the -[CDIMUdmtwW] switches are allowed. When running taint checks (either because the program was running setuid or setgid, or because the -T or -t switch was used), this variable is ignored. If PERL5OPT begins with -T, tainting will be enabled and subsequent options ignored. If PERL5OPT begins with -t, tainting will be enabled, a writable dot removed from @INC, and subsequent options honored.
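For instance (a sketch; the child one-liner is illustrative), setting PERL5OPT to -Mstrict makes a spawned perl reject an undeclared variable:

```perl
use strict;
use warnings;

# Switches in PERL5OPT behave as if they were on every perl command
# line, so the child below effectively runs with "use strict".
$ENV{PERL5OPT} = '-Mstrict';
my $out    = `$^X -e '\$x = 1' 2>&1`;
my $status = $?;
print $status == 0 ? "compiled\n" : "rejected by strict\n";
```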
A space (or colon) separated list of PerlIO layers. If perl is built to use the PerlIO system for IO (the default), these layers affect Perl's IO.
It is conventional to start layer names with a colon (for example, :perlio) to emphasize their similarity to variable "attributes". But the code that parses layer specification strings, which is also used to decode the PERLIO environment variable, treats the colon as a separator.
An unset or empty PERLIO is equivalent to the default set of layers for
your platform; for example, :unix:perlio on Unix-like systems
and :unix:crlf on Windows and other DOS-like systems.
The list becomes the default for all Perl's IO. Consequently only built-in layers can appear in this list, as external layers (such as :encoding()) need IO in order to load them! See the open pragma for how to add external encodings as defaults.
Layers it makes sense to include in the PERLIO environment variable are briefly summarized below. For more details see PerlIO.
A pseudolayer that turns the :utf8 flag off for the layer below; unlikely to be useful on its own in the global PERLIO environment variable. You perhaps were thinking of :crlf:bytes or :perlio:bytes.
A layer which does CRLF to "\n" translation, distinguishing "text" and "binary" files in the manner of MS-DOS and similar operating systems. (It currently does not mimic MS-DOS as far as treating Control-Z as an end-of-file marker.)
A layer that implements "reading" of files by using mmap(2) to make an entire file appear in the process's address space, and then using that as PerlIO's "buffer".
This is a re-implementation of stdio-like buffering written as a PerlIO layer. As such it will call whatever layer is below it for its operations, typically :unix.
An experimental pseudolayer that removes the topmost layer. Use with the same care as is reserved for nitroglycerine.
A pseudolayer that manipulates other layers. Applying the :raw layer is equivalent to calling binmode($fh). It makes the stream pass each byte as-is without translation. In particular, both CRLF translation and intuiting :utf8 from the locale are disabled. Unlike in earlier versions of Perl, :raw is not just the inverse of :crlf: other layers which would affect the binary nature of the stream are also removed or disabled.
This layer provides a PerlIO interface by wrapping the system's ANSI C "stdio" library calls. The layer provides both buffering and IO. Note that the :stdio layer does not do CRLF translation even if that is the platform's normal behaviour. You will need a :crlf layer above it to do that.
A pseudolayer that enables a flag in the layer below to tell Perl that output should be in utf8 and that input should be regarded as already in valid utf8 form. WARNING: It does not check for validity and as such should be handled with extreme caution for input, because security violations can occur with non-shortest UTF-8 encodings, etc. Generally :encoding(utf8) is the best option when reading UTF-8 encoded data.
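In program code the same recommendation is usually applied per filehandle rather than through PERLIO; a minimal sketch (the file contents are illustrative):

```perl
use strict;
use warnings;
use File::Temp qw(tempfile);

# Write a short UTF-8 encoded file.
my ($fh, $file) = tempfile();
binmode $fh, ':encoding(UTF-8)';
print $fh "caf\x{e9}\n";              # "café"
close $fh;

# :encoding(UTF-8) decodes *and validates* the input; the bare :utf8
# pseudolayer would skip the validity check.
open my $in, '<:encoding(UTF-8)', $file or die "open: $!";
my $line = <$in>;
close $in;
print length($line), "\n";            # 5 characters, not 6 bytes
```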
On Win32 platforms this experimental layer uses native "handle" IO rather than a Unix-like numeric file descriptor layer. Known to be buggy in this release (5.14).
The default set of layers should give acceptable results on all platforms. For Unix platforms that will be the equivalent of "unix perlio" or "stdio". Configure is set up to prefer the "stdio" implementation if the system's library provides for fast access to the buffer; otherwise, it uses the "unix perlio" implementation.
On Win32 the default in this release (5.14) is "unix crlf". Win32's "stdio" has a number of bugs/mis-features for Perl IO which depend somewhat on the version and vendor of the C compiler. Using our own crlf layer as the buffer avoids those issues and makes things more uniform. The crlf layer provides CRLF conversion as well as buffering. This release (5.14) uses unix as the bottom layer on Win32, and so still uses the C compiler's numeric file descriptor routines. There is an experimental native win32 layer, which is expected to be enhanced and should eventually become the default under Win32.
The PERLIO environment variable is completely ignored when Perl is run in taint mode.
If set to the name of a file or device, certain operations of PerlIO subsystem will be logged to that file, which is opened in append mode. Typical uses are in Unix:
- % env PERLIO_DEBUG=/dev/tty perl script ...
and under Win32, the approximately equivalent:
- > set PERLIO_DEBUG=CON
- perl script ...
This functionality is disabled for setuid scripts and for scripts run with -T.
A list of directories in which to look for Perl library files before looking in the standard library and the current directory. If PERL5LIB is defined, PERLLIB is not used.
The PERLLIB environment variable is completely ignored when Perl is run in taint mode.
The command used to load the debugger code. The default is:
- BEGIN { require "perl5db.pl" }
The PERL5DB environment variable is only used when Perl is started with a bare -d switch.
If set to a true value, indicates to the debugger that the code being debugged uses threads.
On Win32 ports only, may be set to an alternative shell that Perl must use internally for executing "backtick" commands or system(). Default is cmd.exe /x/d/c on WindowsNT and command.com /c on Windows95. The value is considered space-separated. Precede any character that needs to be protected, like a space or backslash, with another backslash.
Note that Perl doesn't use COMSPEC for this purpose because COMSPEC has a high degree of variability among users, leading to portability concerns. Besides, Perl can use a shell that may not be fit for interactive use, and setting COMSPEC to such a shell may interfere with the proper functioning of other programs (which usually look in COMSPEC to find a shell fit for interactive use).
Before Perl 5.10.0 and 5.8.8, PERL5SHELL was not taint checked when running external commands. It is recommended that you explicitly set (or delete) $ENV{PERL5SHELL} when running in taint mode under Windows.
Set to 1 to allow the use of non-IFS compatible LSPs (Layered Service Providers). Perl normally searches for an IFS-compatible LSP because this is required for its emulation of Windows sockets as real filehandles. However, this may cause problems if you have a firewall such as McAfee Guardian, which requires that all applications use its LSP but which is not IFS-compatible, because clearly Perl will normally avoid using such an LSP.
Setting this environment variable to 1 means that Perl will simply use the first suitable LSP enumerated in the catalog, which keeps McAfee Guardian happy--and in that particular case Perl still works too because McAfee Guardian's LSP actually plays other games which allow applications requiring IFS compatibility to work.
Relevant only if Perl is compiled with the malloc included with the Perl distribution; that is, if perl -V:d_mymalloc is "define".
If set, this dumps out memory statistics after execution. If set to an integer greater than one, also dumps out memory statistics after compilation.
Relevant only if your Perl executable was built with -DDEBUGGING, this controls the behaviour of global destruction of objects and other references. See PERL_DESTRUCT_LEVEL in perlhacktips for more information.
Set to "1"
to have Perl resolve all undefined symbols when it loads
a dynamic library. The default behaviour is to resolve symbols when
they are used. Setting this variable is useful during testing of
extensions, as it ensures that you get an error on misspelled function
names even if the test suite doesn't call them.
If using the use encoding pragma without an explicit encoding name, the PERL_ENCODING environment variable is consulted for an encoding name.
(Since Perl 5.8.1, new semantics in Perl 5.18.0) Used to override the randomization of Perl's internal hash function. The value is expressed in hexadecimal, and may include a leading 0x. Truncated patterns are treated as though they are suffixed with sufficient 0's as required.
If the option is provided, and PERL_PERTURB_KEYS is NOT set, then a value of '0' implies PERL_PERTURB_KEYS=0 and any other value implies PERL_PERTURB_KEYS=2.
PLEASE NOTE: The hash seed is sensitive information. Hashes are randomized to protect against local and remote attacks against Perl code. By manually setting a seed, this protection may be partially or completely lost.
See Algorithmic Complexity Attacks in perlsec, and PERL_PERTURB_KEYS and PERL_HASH_SEED_DEBUG below, for more information.
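Because the seed is read at interpreter startup, it has to be set before perl runs; one way to observe the effect (a sketch assuming Perl 5.18+, with an illustrative one-liner) is to spawn child interpreters with the environment set:

```perl
use strict;
use warnings;

# With a fixed seed and key perturbation disabled, two separate
# interpreter runs should report the same key order.
$ENV{PERL_HASH_SEED}    = '0x0';
$ENV{PERL_PERTURB_KEYS} = '0';
my $prog   = 'my %h = map { $_ => 1 } ("a".."j"); print join(",", keys %h)';
my $first  = `$^X -e '$prog'`;
my $second = `$^X -e '$prog'`;
print $first eq $second ? "repeatable\n" : "differs\n";
```

Remove the two environment assignments and the two runs will usually disagree, since each process then gets its own random seed.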
(Since Perl 5.18.0) When set to "0" or "NO", traversing keys will be repeatable from run to run for the same PERL_HASH_SEED. Insertion into a hash will not change the order, except to provide for more space in the hash. When combined with setting PERL_HASH_SEED, this mode is as close to pre-5.18 behavior as you can get.
When set to "1"
or "RANDOM"
then traversing keys will be randomized.
Every time a hash is inserted into the key order will change in a random
fashion. The order may not be repeatable in a following program run
even if the PERL_HASH_SEED has been specified. This is the default
mode for perl.
When set to "2"
or "DETERMINISTIC"
then inserting keys into a hash
will cause the key order to change, but in a way that is repeatable
from program run to program run.
NOTE: Use of this option is considered insecure, and is intended only for debugging non-deterministic behavior in Perl's hash function. Do not use it in production.
See Algorithmic Complexity Attacks in perlsec and PERL_HASH_SEED and PERL_HASH_SEED_DEBUG for more information. You can get and set the key traversal mask for a specific hash by using the hash_traversal_mask() function from Hash::Util.
(Since Perl 5.8.1.) Set to "1" to display (to STDERR) information about the hash function, seed, and what type of key traversal randomization is in effect at the beginning of execution. This, combined with PERL_HASH_SEED and PERL_PERTURB_KEYS, is intended to aid in debugging nondeterministic behaviour caused by hash randomization. Note that any information about the hash function, especially the hash seed, is sensitive information: by knowing it, one can craft a denial-of-service attack against Perl code, even remotely; see Algorithmic Complexity Attacks in perlsec for more information. Do not disclose the hash seed to people who don't need to know it. See also hash_seed() and key_traversal_mask() in Hash::Util.
An example output might be:
- HASH_FUNCTION = ONE_AT_A_TIME_HARD HASH_SEED = 0x652e9b9349a7a032 PERTURB_KEYS = 1 (RANDOM)
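Hash::Util also exposes the seed programmatically; a minimal sketch (note that recent perls return the raw seed bytes from hash_seed(), while very old versions returned an integer):

```perl
use strict;
use warnings;
use Hash::Util qw(hash_seed);

# Print this process's hash seed in hex. Remember that the seed is
# sensitive information: do not log it in production code.
my $seed = hash_seed();
printf "HASH_SEED = 0x%s\n", unpack "H*", $seed;
```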
If your Perl was configured with -Accflags=-DPERL_MEM_LOG, setting the environment variable PERL_MEM_LOG enables logging debug messages. The value has the form <number>[m][s][t], where number is the file descriptor number you want to write to (2 is default), and the combination of letters specifies that you want information about (m)emory and/or (s)v, optionally with (t)imestamps. For example, PERL_MEM_LOG=1mst logs all information to stdout. You can write to other opened file descriptors in a variety of ways:
- $ 3>foo3 PERL_MEM_LOG=3m perl ...
A translation-concealed rooted logical name that contains Perl and the logical device for the @INC path on VMS only. Other logical names that affect Perl on VMS include PERLSHR, PERL_ENV_TABLES, and SYS$TIMEZONE_DIFFERENTIAL, but are optional and discussed further in perlvms and in README.vms in the Perl source distribution.
Available in Perls 5.8.1 and later. If set to "unsafe", the pre-Perl-5.8.0 signal behaviour (which is immediate but unsafe) is restored. If set to "safe", then safe (but deferred) signals are used. See Deferred Signals (Safe Signals) in perlipc.
Equivalent to the -C command-line switch. Note that this is not a boolean variable. Setting this to "1" is not the right way to "enable Unicode" (whatever that would mean). You can use "0" to "disable Unicode", though (or alternatively unset PERL_UNICODE in your shell before starting Perl). See the description of the -C switch for more information.
Used if chdir has no argument and HOME and LOGDIR are not set.
Perl also has environment variables that control how Perl handles data specific to particular natural languages; see perllocale.
Perl and its various modules and components, including its test frameworks, may sometimes make use of certain other environment variables. Some of these are specific to a particular platform. Please consult the appropriate module documentation and any documentation for your platform (like perlsolaris, perllinux, perlmacosx, perlwin32, etc) for variables peculiar to those specific situations.
Perl makes all environment variables available to the program being executed, and passes these along to any child processes it starts. However, programs running setuid would do well to execute the following lines before doing anything else, just to keep people honest:
perlsec - Perl security
Perl is designed to make it easy to program securely even when running with extra privileges, like setuid or setgid programs. Unlike most command line shells, which are based on multiple substitution passes on each line of the script, Perl uses a more conventional evaluation scheme with fewer hidden snags. Additionally, because the language has more builtin functionality, it can rely less upon external (and possibly untrustworthy) programs to accomplish its purposes.
If you believe you have found a security vulnerability in Perl, please email perl5-security-report@perl.org with details. This points to a closed subscription, unarchived mailing list. Please only use this address for security issues in the Perl core, not for modules independently distributed on CPAN.
Perl automatically enables a set of special security checks, called taint mode, when it detects its program running with differing real and effective user or group IDs. The setuid bit in Unix permissions is mode 04000, the setgid bit mode 02000; either or both may be set. You can also enable taint mode explicitly by using the -T command line flag. This flag is strongly suggested for server programs and any program run on behalf of someone else, such as a CGI script. Once taint mode is on, it's on for the remainder of your script.
While in this mode, Perl takes special precautions called taint checks to prevent both obvious and subtle traps. Some of these checks are reasonably simple, such as verifying that path directories aren't writable by others; careful programmers have always used checks like these. Other checks, however, are best supported by the language itself, and it is these checks especially that contribute to making a set-id Perl program more secure than the corresponding C program.
You may not use data derived from outside your program to affect something else outside your program--at least, not by accident. All command line arguments, environment variables, locale information (see perllocale), results of certain system calls (readdir(), readlink(), the variable of shmread(), the messages returned by msgrcv(), the password, gcos and shell fields returned by the getpwxxx() calls), and all file input are marked as "tainted". Tainted data may not be used directly or indirectly in any command that invokes a sub-shell, nor in any command that modifies files, directories, or processes, with the following exceptions:
Arguments to print and syswrite are not checked for taintedness.
Symbolic methods
- $obj->$method(@args);
and symbolic sub references
- &{$foo}(@args);
- $foo->(@args);
are not checked for taintedness. This requires extra carefulness unless you want external data to affect your control flow. Unless you carefully limit what these symbolic values are, people are able to call functions outside your Perl code, such as POSIX::system, in which case they are able to run arbitrary external code.
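One defensive pattern (all names here are illustrative) is to allow only method names drawn from a fixed table, so that tainted input can never select arbitrary code:

```perl
use strict;
use warnings;

package Service;
sub new    { return bless {}, shift }
sub status { return "ok" }

package main;

# Whitelist of method names callable through symbolic dispatch;
# anything else dies before the symbolic call happens.
my %allowed = map { $_ => 1 } qw(status);

sub dispatch {
    my ($obj, $name, @args) = @_;
    die "method '$name' not permitted\n" unless $allowed{$name};
    return $obj->$name(@args);
}

my $svc = Service->new;
print dispatch($svc, 'status'), "\n";   # ok
```

With this shape, even a fully attacker-controlled $name can only ever reach the methods you listed.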
Hash keys are never tainted.
For efficiency reasons, Perl takes a conservative view of whether data is tainted. If an expression contains tainted data, any subexpression may be considered tainted, even if the value of the subexpression is not itself affected by the tainted data.
Because taintedness is associated with each scalar value, some elements of an array or hash can be tainted and others not. The keys of a hash are never tainted.
For example:
- $arg = shift; # $arg is tainted
- $hid = $arg . 'bar'; # $hid is also tainted
- $line = <>; # Tainted
- $line = <STDIN>; # Also tainted
- open FOO, "/home/me/bar" or die $!;
- $line = <FOO>; # Still tainted
- $path = $ENV{'PATH'}; # Tainted, but see below
- $data = 'abc'; # Not tainted
- system "echo $arg"; # Insecure
- system "/bin/echo", $arg; # Considered insecure
- # (Perl doesn't know about /bin/echo)
- system "echo $hid"; # Insecure
- system "echo $data"; # Insecure until PATH set
- $path = $ENV{'PATH'}; # $path now tainted
- $ENV{'PATH'} = '/bin:/usr/bin';
- delete @ENV{'IFS', 'CDPATH', 'ENV', 'BASH_ENV'};
- $path = $ENV{'PATH'}; # $path now NOT tainted
- system "echo $data"; # Is secure now!
- open(FOO, "< $arg"); # OK - read-only file
- open(FOO, "> $arg"); # Not OK - trying to write
- open(FOO,"echo $arg|"); # Not OK
- open(FOO,"-|")
- or exec 'echo', $arg; # Also not OK
- $shout = `echo $arg`; # Insecure, $shout now tainted
- unlink $data, $arg; # Insecure
- umask $arg; # Insecure
- exec "echo $arg"; # Insecure
- exec "echo", $arg; # Insecure
- exec "sh", '-c', $arg; # Very insecure!
- @files = <*.c>; # insecure (uses readdir() or similar)
- @files = glob('*.c'); # insecure (uses readdir() or similar)
- # In either case, the results of glob are tainted, since the list of
- # filenames comes from outside of the program.
- $bad = ($arg, 23); # $bad will be tainted
- $arg, `true`; # Insecure (although it isn't really)
If you try to do something insecure, you will get a fatal error saying something like "Insecure dependency" or "Insecure $ENV{PATH}".
The exception to the principle of "one tainted value taints the whole
expression" is with the ternary conditional operator ?:. Since code
with a ternary conditional
- $result = $tainted_value ? "Untainted" : "Also untainted";
is effectively
- if ( $tainted_value ) {
- $result = "Untainted";
- } else {
- $result = "Also untainted";
- }
it doesn't make sense for $result to be tainted.
To test whether a variable contains tainted data, and whose use would thus trigger an "Insecure dependency" message, you can use the tainted() function of the Scalar::Util module, available in your nearby CPAN mirror, and included in Perl since release 5.8.0. Or you may be able to use the following is_tainted() function.
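One formulation (essentially the one perlsec itself gives) exploits the fact that merely evaluating a tainted string with eval is an "Insecure dependency" under -T:

```perl
use strict;
use warnings;

# The substr contributes zero characters but carries the arguments'
# taint into the eval'd string; without -T the eval always succeeds.
sub is_tainted {
    local $@;   # Don't pollute caller's value.
    return ! eval { eval("#" . substr(join("", @_), 0, 0)); 1 };
}

print is_tainted($ENV{PATH}) ? "tainted\n" : "not tainted\n";
```

Run under perl -T the %ENV value is reported as tainted; without -T nothing is ever tainted, so the function returns false.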
This function makes use of the fact that the presence of tainted data anywhere within an expression renders the entire expression tainted. It would be inefficient for every operator to test every argument for taintedness. Instead, the slightly more efficient and conservative approach is used that if any tainted value has been accessed within the same expression, the whole expression is considered tainted.
But testing for taintedness gets you only so far. Sometimes you have just to clear your data's taintedness. Values may be untainted by using them as keys in a hash; otherwise the only way to bypass the tainting mechanism is by referencing subpatterns from a regular expression match. Perl presumes that if you reference a substring using $1, $2, etc., that you knew what you were doing when you wrote the pattern. That means using a bit of thought--don't just blindly untaint anything, or you defeat the entire mechanism. It's better to verify that the variable has only good characters (for certain values of "good") rather than checking whether it has any bad characters. That's because it's far too easy to miss bad characters that you never thought of.
Here's a test to make sure that the data contains nothing but "word" characters (alphabetics, numerics, and underscores), a hyphen, an at sign, or a dot.
- if ($data =~ /^([-\@\w.]+)$/) {
- $data = $1; # $data now untainted
- } else {
- die "Bad data in '$data'"; # log this somewhere
- }
This is fairly secure because /\w+/ doesn't normally match shell metacharacters, nor are dot, dash, or at going to mean something special to the shell. Use of /.+/ would have been insecure in theory because it lets everything through, but Perl doesn't check for that. The lesson is that when untainting, you must be exceedingly careful with your patterns.
Laundering data using regular expressions is the only mechanism for untainting dirty data, unless you use the strategy detailed below to fork a child of lesser privilege.
The example does not untaint $data if use locale is in effect, because the characters matched by \w are determined by the locale. Perl considers that locale definitions are untrustworthy because they contain data from outside the program. If you are writing a locale-aware program, and want to launder data with a regular expression containing \w, put no locale ahead of the expression in the same block. See SECURITY in perllocale for further discussion and examples.
When you make a script executable, in order to make it usable as a command, the system will pass switches to perl from the script's #! line. Perl checks that any command line switches given to a setuid (or setgid) script actually match the ones set on the #! line. Some Unix and Unix-like environments impose a one-switch limit on the #! line, so you may need to use something like -wU instead of -w -U under such systems. (This issue should arise only in Unix or Unix-like environments that support #! and setuid or setgid scripts.)
When the taint mode (-T) is in effect, the "." directory is removed from @INC, and the environment variables PERL5LIB and PERLLIB are ignored by Perl. You can still adjust @INC from outside the program by using the -I command line option as explained in perlrun. The two environment variables are ignored because they are obscured, and a user running a program could be unaware that they are set, whereas the -I option is clearly visible and therefore permitted.
Another way to modify @INC without modifying the program is to use the lib pragma, e.g.:
- perl -Mlib=/foo program
The benefit of using -Mlib=/foo over -I/foo is that the former will automagically remove any duplicated directories, while the latter will not.
Note that if a tainted string is added to @INC, the following problem will be reported:
- Insecure dependency in require while running with -T switch
For "Insecure $ENV{PATH}
" messages, you need to set $ENV{'PATH'}
to
a known value, and each directory in the path must be absolute and
non-writable by others than its owner and group. You may be surprised to
get this message even if the pathname to your executable is fully
qualified. This is not generated because you didn't supply a full path
to the program; instead, it's generated because you never set your PATH
environment variable, or you didn't set it to something that was safe.
Because Perl can't guarantee that the executable in question isn't itself
going to turn around and execute some other program that is dependent on
your PATH, it makes sure you set the PATH.
The PATH isn't the only environment variable which can cause problems. Because some shells may use the variables IFS, CDPATH, ENV, and BASH_ENV, Perl checks that those are either empty or untainted when starting subprocesses. You may wish to add something like this to your setid and taint-checking scripts.
- delete @ENV{qw(IFS CDPATH ENV BASH_ENV)}; # Make %ENV safer
It's also possible to get into trouble with other operations that don't care whether they use tainted values. Make judicious use of the file tests in dealing with any user-supplied filenames. When possible, do opens and such after properly dropping any special user (or group!) privileges. Perl doesn't prevent you from opening tainted filenames for reading, so be careful what you print out. The tainting mechanism is intended to prevent stupid mistakes, not to remove the need for thought.
Perl does not call the shell to expand wild cards when you pass system
and exec explicit parameter lists instead of strings with possible shell
wildcards in them. Unfortunately, the open, glob, and
backtick functions provide no such alternate calling convention, so more
subterfuge will be required.
Perl provides a reasonably safe way to open a file or pipe from a setuid
or setgid program: just create a child process with reduced privilege who
does the dirty work for you. First, fork a child using the special
open syntax that connects the parent and child by a pipe. Now the
child resets its ID set and any other per-process attributes, like
environment variables, umasks, current working directories, back to the
originals or known safe values. Then the child process, which no longer
has any special permissions, does the open or other system call.
Finally, the child passes the data it managed to access back to the
parent. Because the file or pipe was opened in the child while running
under less privilege than the parent, it's not apt to be tricked into
doing something it shouldn't.
Here's a way to do backticks reasonably safely. Notice how the exec is
not called with a string that the shell could expand. This is by far the
best way to call something that might be subjected to shell escapes: just
never call the shell at all.
- use English '-no_match_vars';
- die "Can't fork: $!" unless defined($pid = open(KID, "-|"));
- if ($pid) { # parent
- while (<KID>) {
- # do something
- }
- close KID;
- } else {
- my @temp = ($EUID, $EGID);
- my $orig_uid = $UID;
- my $orig_gid = $GID;
- $EUID = $UID;
- $EGID = $GID;
- # Drop privileges
- $UID = $orig_uid;
- $GID = $orig_gid;
- # Make sure privs are really gone
- ($EUID, $EGID) = @temp;
- die "Can't drop privileges"
- unless $UID == $EUID && $GID eq $EGID;
- $ENV{PATH} = "/bin:/usr/bin"; # Minimal PATH.
- # Consider sanitizing the environment even more.
- exec 'myprog', 'arg1', 'arg2'
- or die "can't exec myprog: $!";
- }
A similar strategy would work for wildcard expansion via glob, although
you can use readdir instead.
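A sketch of shell-free wildcard expansion with readdir (the pattern and directory contents are illustrative):

```perl
use strict;
use warnings;
use File::Temp qw(tempdir);

# Expand "*.c" by hand: no shell is involved, and the filename
# filter is an explicit whitelist of characters.
sub list_c_files {
    my ($dir) = @_;
    opendir my $dh, $dir or die "opendir $dir: $!";
    my @files = sort grep { /\A[\w.-]+\.c\z/ } readdir $dh;
    closedir $dh;
    return @files;
}

# Demonstrate on a throwaway directory.
my $dir = tempdir(CLEANUP => 1);
open my $fh, '>', "$dir/hello.c" or die $!;
close $fh;
open $fh, '>', "$dir/notes.txt" or die $!;
close $fh;
print join(" ", list_c_files($dir)), "\n";   # hello.c
```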
Taint checking is most useful when although you trust yourself not to have written a program to give away the farm, you don't necessarily trust those who end up using it not to try to trick it into doing something bad. This is the kind of security checking that's useful for set-id programs and programs launched on someone else's behalf, like CGI programs.
This is quite different, however, from not even trusting the writer of the code not to try to do something evil. That's the kind of trust needed when someone hands you a program you've never seen before and says, "Here, run this." For that kind of safety, you might want to check out the Safe module, included standard in the Perl distribution. This module allows the programmer to set up special compartments in which all system operations are trapped and namespace access is carefully controlled. Safe should not be considered bullet-proof, though: it will not prevent the foreign code from setting up infinite loops, allocating gigabytes of memory, or even abusing perl bugs to make the host interpreter crash or behave in unpredictable ways. In any case it's better avoided completely if you're really concerned about security.
Beyond the obvious problems that stem from giving special privileges to systems as flexible as scripts, on many versions of Unix, set-id scripts are inherently insecure right from the start. The problem is a race condition in the kernel. Between the time the kernel opens the file to see which interpreter to run and when the (now-set-id) interpreter turns around and reopens the file to interpret it, the file in question may have changed, especially if you have symbolic links on your system.
Fortunately, sometimes this kernel "feature" can be disabled. Unfortunately, there are two ways to disable it. The system can simply outlaw scripts with any set-id bit set, which doesn't help much. Alternately, it can simply ignore the set-id bits on scripts.
However, if the kernel set-id script feature isn't disabled, Perl will complain loudly that your set-id script is insecure. You'll need to either disable the kernel set-id script feature, or put a C wrapper around the script. A C wrapper is just a compiled program that does nothing except call your Perl program. Compiled programs are not subject to the kernel bug that plagues set-id scripts. Here's a simple wrapper, written in C:
- #include <unistd.h>
- #define REAL_PATH "/path/to/script"
- int main(int argc, char **argv)
- {
-     execv(REAL_PATH, argv);
-     return 127; /* reached only if execv fails */
- }
Compile this wrapper into a binary executable and then make it rather than your script setuid or setgid.
In recent years, vendors have begun to supply systems free of this inherent security bug. On such systems, when the kernel passes the name of the set-id script to open to the interpreter, rather than using a pathname subject to meddling, it instead passes /dev/fd/3. This is a special file already opened on the script, so that there can be no race condition for evil scripts to exploit. On these systems, Perl should be compiled with -DSETUID_SCRIPTS_ARE_SECURE_NOW. The Configure program that builds Perl tries to figure this out for itself, so you should never have to specify this yourself. Most modern releases of SysVr4 and BSD 4.4 use this approach to avoid the kernel race condition.
There are a number of ways to hide the source to your Perl programs, with varying levels of "security".
First of all, however, you can't take away read permission, because the source code has to be readable in order to be compiled and interpreted. (That doesn't mean that a CGI script's source is readable by people on the web, though.) So you have to leave the permissions at the socially friendly 0755 level. This lets only people on your local system see your source.
Some people mistakenly regard this as a security problem. If your program does insecure things, and relies on people not knowing how to exploit those insecurities, it is not secure. It is often possible for someone to determine the insecure things and exploit them without viewing the source. Security through obscurity, the name for hiding your bugs instead of fixing them, is little security indeed.
You can try using encryption via source filters (Filter::* from CPAN, or Filter::Util::Call and Filter::Simple since Perl 5.8). But crackers might be able to decrypt it. You can try using the byte code compiler and interpreter described below, but crackers might be able to de-compile it. You can try using the native-code compiler described below, but crackers might be able to disassemble it. These pose varying degrees of difficulty to people wanting to get at your code, but none can definitively conceal it (this is true of every language, not just Perl).
If you're concerned about people profiting from your code, then the bottom line is that nothing but a restrictive license will give you legal security. License your software and pepper it with threatening statements like "This is unpublished proprietary software of XYZ Corp. Your access to it does not give you permission to use it blah blah blah." You should see a lawyer to be sure your license's wording will stand up in court.
Unicode is a new and complex technology and one may easily overlook certain security pitfalls. See perluniintro for an overview and perlunicode for details, and Security Implications of Unicode in perlunicode for security implications in particular.
Certain internal algorithms used in the implementation of Perl can be attacked by choosing the input carefully to consume large amounts of time, space, or both. This can lead to so-called Denial of Service (DoS) attacks.
Hash Algorithm - Hash algorithms like the one used in Perl are well known to be vulnerable to collision attacks on their hash function. Such attacks involve constructing a set of keys which collide into the same bucket, producing inefficient behavior. Such attacks often depend on discovering the seed of the hash function used to map the keys to buckets. That seed is then used to brute-force a key set which can be used to mount a denial of service attack. In Perl 5.8.1 changes were introduced to harden Perl against such attacks, and then later in Perl 5.18.0 these features were enhanced and additional protections added.
At the time of this writing, Perl 5.18.0 is considered to be well-hardened against algorithmic complexity attacks on its hash implementation, largely owing to the following mitigating measures:
In order to make it impossible to know what seed to generate an attack key set for, this seed is randomly initialized at process start. This may be overridden by using the PERL_HASH_SEED environment variable, see PERL_HASH_SEED in perlrun. This environment variable controls how items are actually stored, not how they are presented via keys, values and each.
Independent of which seed is used in the hash function, keys, values, and each return items in a per-hash randomized order. Modifying a hash by insertion will change the iteration order of that hash. This behavior can be overridden by using hash_traversal_mask() from Hash::Util or by using the PERL_PERTURB_KEYS environment variable, see PERL_PERTURB_KEYS in perlrun. Note that this feature controls the "visible" order of the keys, and not the actual order they are stored in.
When items collide into a given hash bucket, the order in which they are stored in the chain is no longer predictable in Perl 5.18. This is intended to make it harder to observe collisions. This behavior can be overridden by using the PERL_PERTURB_KEYS environment variable, see PERL_PERTURB_KEYS in perlrun.
The default hash function has been modified with the intention of making it harder to infer the hash seed.
The source code includes multiple hash algorithms to choose from. While we believe that the default perl hash is robust to attack, we have included the hash function Siphash as a fall-back option. At the time of release of Perl 5.18.0 Siphash is believed to be of cryptographic strength. This is not the default as it is much slower than the default hash.
Without compiling a special Perl, there is no way to get the exact same behavior of any versions prior to Perl 5.18.0. The closest one can get is by setting PERL_PERTURB_KEYS to 0 and setting the PERL_HASH_SEED to a known value. We do not advise those settings for production use due to the above security considerations.
Perl has never guaranteed any ordering of the hash keys, and the ordering has already changed several times during the lifetime of Perl 5. Also, the ordering of hash keys has always been, and continues to be, affected by the insertion order and the history of changes made to the hash over its lifetime.
Also note that while the order of the hash elements might be randomized, this "pseudo-ordering" should not be used for applications like shuffling a list randomly (use List::Util::shuffle() for that, see List::Util, a standard core module since Perl 5.8.0; or the CPAN module Algorithm::Numerical::Shuffle), or for generating permutations (use e.g. the CPAN modules Algorithm::Permute or Algorithm::FastPermute), or for any cryptographic applications.
Regular expressions - Perl's regular expression engine is a so-called NFA (Non-deterministic Finite Automaton), which among other things means that it can rather easily consume large amounts of both time and space if the regular expression can match in several ways. Careful crafting of the regular expressions can help, but quite often there really isn't much one can do (the book "Mastering Regular Expressions" is required reading, see perlfaq2). Running out of space manifests itself as Perl running out of memory.
Sorting - the quicksort algorithm used in Perls before 5.8.0 to implement the sort() function is very easy to trick into misbehaving so that it consumes a lot of time. Starting from Perl 5.8.0 a different sorting algorithm, mergesort, is used by default. Mergesort cannot misbehave on any input.
See http://www.cs.rice.edu/~scrosby/hash/ for more information, and any computer science textbook on algorithmic complexity.
perlrun for its description of cleaning up environment variables.
perlsolaris - Perl version 5 on Solaris systems
This document describes various features of Sun's Solaris operating system that will affect how Perl version 5 (hereafter just perl) is compiled and/or runs. Some issues relating to the older SunOS 4.x are also discussed, though they may be out of date.
For the most part, everything should just work.
Starting with Solaris 8, perl5.00503 (or higher) is supplied with the operating system, so you might not even need to build a newer version of perl at all. The Sun-supplied version is installed in /usr/perl5 with /usr/bin/perl pointing to /usr/perl5/bin/perl. Do not disturb that installation unless you really know what you are doing. If you remove the perl supplied with the OS, you will render some bits of your system inoperable. If you wish to install a newer version of perl, install it under a different prefix from /usr/perl5. Common prefixes to use are /usr/local and /opt/perl.
You may wish to put your version of perl in the PATH of all users by changing the link /usr/bin/perl. This is probably OK, as most perl scripts shipped with Solaris use an explicit path. (There are a few exceptions, such as /usr/bin/rpm2cpio and /etc/rcm/scripts/README, but these are also sufficiently generic that the actual version of perl probably doesn't matter too much.)
Solaris ships with a range of Solaris-specific modules. If you choose to install your own version of perl you will find the source of many of these modules is available on CPAN under the Sun::Solaris:: namespace.
Solaris may include two versions of perl, e.g. Solaris 9 includes both 5.005_03 and 5.6.1. This is to provide stability across Solaris releases, in cases where a later perl version has incompatibilities with the version included in the preceding Solaris release. The default perl version will always be the most recent, and in general the old version will only be retained for one Solaris release. Note also that the default perl will NOT be configured to search for modules in the older version, again due to compatibility/stability concerns. As a consequence if you upgrade Solaris, you will have to rebuild/reinstall any additional CPAN modules that you installed for the previous Solaris version. See the CPAN manpage under 'autobundle' for a quick way of doing this.
As an interim measure, you may either change the #! line of your scripts to specifically refer to the old perl version, e.g. on Solaris 9 use #!/usr/perl5/5.00503/bin/perl to use the perl version that was the default for Solaris 8, or if you have a large number of scripts it may be more convenient to make the old version of perl the default on your system. You can do this by changing the appropriate symlinks under /usr/perl5 as follows (example for Solaris 9):
- # cd /usr/perl5
- # rm bin man pod
- # ln -s ./5.00503/bin
- # ln -s ./5.00503/man
- # ln -s ./5.00503/lib/pod
- # rm /usr/bin/perl
- # ln -s ../perl5/5.00503/bin/perl /usr/bin/perl
In both cases this should only be considered to be a temporary measure - you should upgrade to the later version of perl as soon as is practicable.
Note also that the perl command-line utilities (e.g. perldoc) and any that are added by modules that you install will be under /usr/perl5/bin, so that directory should be added to your PATH.
For consistency with common usage, perl's Configure script performs some minor manipulations on the operating system name and version number as reported by uname. Here's a partial translation table:
- Sun: perl's Configure:
- uname uname -r Name osname osvers
- SunOS 4.1.3 Solaris 1.1 sunos 4.1.3
- SunOS 5.6 Solaris 2.6 solaris 2.6
- SunOS 5.8 Solaris 8 solaris 2.8
- SunOS 5.9 Solaris 9 solaris 2.9
- SunOS 5.10 Solaris 10 solaris 2.10
The complete table can be found in the Sun Managers' FAQ ftp://ftp.cs.toronto.edu/pub/jdd/sunmanagers/faq under "9.1) Which Sun models run which versions of SunOS?".
There are many, many sources for Solaris information. A few of the important ones for perl:
The Solaris FAQ is available at http://www.science.uva.nl/pub/solaris/solaris2.html.
The Sun Managers' FAQ is available at ftp://ftp.cs.toronto.edu/pub/jdd/sunmanagers/faq
Precompiled binaries, links to many sites, and much, much more are available at http://www.sunfreeware.com/ and http://www.blastwave.org/.
All Solaris documentation is available on-line at http://docs.sun.com/.
Be sure to use a tar program compiled under Solaris (not SunOS 4.x) to extract the perl-5.x.x.tar.gz file. Do not use GNU tar compiled for SunOS4 on Solaris. (GNU tar compiled for Solaris should be fine.) When you run SunOS4 binaries on Solaris, the run-time system magically alters pathnames matching m#lib/locale# so that when tar tries to create lib/locale.pm, a file named lib/oldlocale.pm gets created instead. If you found this advice too late and used a SunOS4-compiled tar anyway, you must find the incorrectly renamed file and move it back to lib/locale.pm.
You must use an ANSI C compiler to build perl. Perl can be compiled with either Sun's add-on C compiler or with gcc. The C compiler that shipped with SunOS4 will not do.
Several tools needed to build perl are located in /usr/ccs/bin/: ar, as, ld, and make. Make sure that /usr/ccs/bin/ is in your PATH.
On all the released versions of Solaris (8, 9 and 10) you need to make sure the following packages are installed (this info is extracted from the Solaris FAQ):
for tools (sccs, lex, yacc, make, nm, truss, ld, as): SUNWbtool, SUNWsprot, SUNWtoo
for libraries & headers: SUNWhea, SUNWarc, SUNWlibm, SUNWlibms, SUNWdfbh, SUNWcg6h, SUNWxwinc
Additionally, on Solaris 8 and 9 you also need:
for 64 bit development: SUNWarcx, SUNWbtoox, SUNWdplx, SUNWscpux, SUNWsprox, SUNWtoox, SUNWlmsx, SUNWlmx, SUNWlibCx
And only on Solaris 8 you also need:
for libraries & headers: SUNWolinc
If you are in doubt which package contains a file you are missing, try to find an installation that has that file. Then do a
- $ grep /my/missing/file /var/sadm/install/contents
This will display a line like this:
/usr/include/sys/errno.h f none 0644 root bin 7471 37605 956241356 SUNWhea
The last item listed (SUNWhea in this example) is the package you need.
You don't need to have /usr/ucb/ in your PATH to build perl. If you want /usr/ucb/ in your PATH anyway, make sure that /usr/ucb/ is NOT in your PATH before the directory containing the right C compiler.
If you use Sun's C compiler, make sure the correct directory (usually /opt/SUNWspro/bin/) is in your PATH (before /usr/ucb/).
If you use gcc, make sure your installation is recent and complete. perl versions since 5.6.0 build fine with gcc > 2.8.1 on Solaris >= 2.6.
You must Configure perl with
- $ sh Configure -Dcc=gcc
If you don't, you may experience strange build errors.
If you have updated your Solaris version, you may also have to update your gcc. For example, if you are running Solaris 2.6 and your gcc is installed under /usr/local, check in /usr/local/lib/gcc-lib and make sure you have the appropriate directory, sparc-sun-solaris2.6/ or i386-pc-solaris2.6/. If gcc's directory is for a different version of Solaris than you are running, then you will need to rebuild gcc for your new version of Solaris.
You can get a precompiled version of gcc from http://www.sunfreeware.com/ or http://www.blastwave.org/. Make sure you pick up the package for your Solaris release.
If you wish to use gcc to build add-on modules for use with the perl shipped with Solaris, you should use the Solaris::PerlGcc module which is available from CPAN. The perl shipped with Solaris is configured and built with the Sun compilers, and the compiler configuration information stored in Config.pm is therefore only relevant to the Sun compilers. The Solaris::PerlGcc module contains a replacement Config.pm that is correct for gcc - see the module for details.
The following information applies to gcc version 2. Volunteers to update it as appropriately for gcc version 3 would be appreciated.
The versions of as and ld supplied with Solaris work fine for building perl. There is normally no need to install the GNU versions to compile perl.
If you decide to ignore this advice and use the GNU versions anyway, then be sure that they are relatively recent. Versions newer than 2.7 are apparently new enough. Older versions may have trouble with dynamic loading.
If you wish to use GNU ld, then you need to pass it the -Wl,-E flag. The hints/solaris_2.sh file tries to do this automatically by setting the following Configure variables:
- ccdlflags="$ccdlflags -Wl,-E"
- lddlflags="$lddlflags -Wl,-E -G"
However, over the years, changes in gcc, GNU ld, and Solaris ld have made it difficult to automatically detect which ld ultimately gets called. You may have to manually edit config.sh and add the -Wl,-E flags yourself, or else run Configure interactively and add the flags at the appropriate prompts.
If your gcc is configured to use GNU as and ld but you want to use the Solaris ones instead to build perl, then you'll need to add -B/usr/ccs/bin/ to the gcc command line. One convenient way to do that is with
- $ sh Configure -Dcc='gcc -B/usr/ccs/bin/'
Note that the trailing slash is required. This will result in some harmless warnings as Configure is run:
- gcc: file path prefix `/usr/ccs/bin/' never used
These messages may safely be ignored. (Note that for a SunOS4 system, you must use -B/bin/ instead.)
Alternatively, you can use the GCC_EXEC_PREFIX environment variable to ensure that Sun's as and ld are used. Consult your gcc documentation for further information on the -B option and the GCC_EXEC_PREFIX variable.
The make under /usr/ccs/bin works fine for building perl. If you have the Sun C compilers, you will also have a parallel version of make (dmake). This works fine to build perl, but can sometimes cause problems when running 'make test' due to underspecified dependencies between the different test harness files. The same problem can also affect the building of some add-on modules, so in those cases either specify '-m serial' on the dmake command line, or use /usr/ccs/bin/make instead. If you wish to use GNU make, be sure that the set-group-id bit is not set. If it is, then arrange your PATH so that /usr/ccs/bin/make is before GNU make or else have the system administrator disable the set-group-id bit on GNU make.
Solaris provides some BSD-compatibility functions in /usr/ucblib/libucb.a. Perl will not build and run correctly if linked against -lucb since it contains routines that are incompatible with the standard Solaris libc. Normally this is not a problem since the solaris hints file prevents Configure from even looking in /usr/ucblib for libraries, and also explicitly omits -lucb.
Make sure your PATH includes the compiler (/opt/SUNWspro/bin/ if you're using Sun's compiler) as well as /usr/ccs/bin/ to pick up the other development tools (such as make, ar, as, and ld). Make sure your path either doesn't include /usr/ucb or that it includes it after the compiler and compiler tools and other standard Solaris directories. You definitely don't want /usr/ucb/cc.
If you have the LD_LIBRARY_PATH environment variable set, be sure that it does NOT include /lib or /usr/lib. If you will be building extensions that call third-party shared libraries (e.g. Berkeley DB) then make sure that your LD_LIBRARY_PATH environment variable includes the directory with that library (e.g. /usr/local/lib).
If you get an error message
- dlopen: stub interception failed
it is probably because your LD_LIBRARY_PATH environment variable includes a directory which is a symlink to /usr/lib (such as /lib). The reason this causes a problem is quite subtle. The file libdl.so.1.0 actually *only* contains functions which generate 'stub interception failed' errors! The runtime linker intercepts links to "/usr/lib/libdl.so.1.0" and links in internal implementations of those functions instead. [Thanks to Tim Bunce for this explanation.]
See the INSTALL file for general information regarding Configure. Only Solaris-specific issues are discussed here. Usually, the defaults should be fine.
See the INSTALL file for general information regarding 64-bit compiles. In general, the defaults should be fine for most people.
By default, perl-5.6.0 (or later) is compiled as a 32-bit application with largefile and long-long support.
Solaris 7 and above will run in either 32 bit or 64 bit mode on SPARC CPUs, via a reboot. You can build 64 bit apps whilst running 32 bit mode and vice-versa. 32 bit apps will run under Solaris running in either 32 or 64 bit mode. 64 bit apps require Solaris to be running 64 bit mode.
Existing 32 bit apps are properly known as LP32, i.e. Longs and Pointers are 32 bits. 64-bit apps are more properly known as LP64. The discriminating feature of an LP64 app is its ability to utilise a 64-bit address space. It is perfectly possible to have an LP32 app that supports both 64-bit integers (long long) and largefiles (> 2GB), and this is the default for perl-5.6.0.
For a more complete explanation of 64-bit issues, see the "Solaris 64-bit Developer's Guide" at http://docs.sun.com/
You can detect the OS mode using "isainfo -v", e.g.
- $ isainfo -v # Ultra 30 in 64 bit mode
- 64-bit sparcv9 applications
- 32-bit sparc applications
By default, perl will be compiled as a 32-bit application. Unless you want to allocate more than ~ 4GB of memory inside perl, or unless you need more than 255 open file descriptors, you probably don't need perl to be a 64-bit app.
For Solaris 2.6 and onwards, there are two different ways for 32-bit applications to manipulate large files (files whose size is > 2GByte). (A 64-bit application automatically has largefile support built in by default.)
First is the "transitional compilation environment", described in lfcompile64(5). According to the man page,
- The transitional compilation environment exports all the
- explicit 64-bit functions (xxx64()) and types in addition to
- all the regular functions (xxx()) and types. Both xxx() and
- xxx64() functions are available to the program source. A
- 32-bit application must use the xxx64() functions in order
- to access large files. See the lf64(5) manual page for a
- complete listing of the 64-bit transitional interfaces.
The transitional compilation environment is obtained with the following compiler and linker flags:
- getconf LFS64_CFLAGS # -D_LARGEFILE64_SOURCE
- getconf LFS64_LDFLAGS # nothing special needed
- getconf LFS64_LIBS # nothing special needed
Second is the "large file compilation environment", described in lfcompile(5). According to the man page,
- Each interface named xxx() that needs to access 64-bit entities
- to access large files maps to a xxx64() call in the
- resulting binary. All relevant data types are defined to be
- of correct size (for example, off_t has a typedef definition
- for a 64-bit entity).
- An application compiled in this environment is able to use
- the xxx() source interfaces to access both large and small
- files, rather than having to explicitly utilize the transitional
- xxx64() interface calls to access large files.
Two exceptions are fseek() and ftell(). 32-bit applications should use fseeko(3C) and ftello(3C). These will get automatically mapped to fseeko64() and ftello64().
The large file compilation environment is obtained with
- getconf LFS_CFLAGS # -D_LARGEFILE_SOURCE -D_FILE_OFFSET_BITS=64
- getconf LFS_LDFLAGS # nothing special needed
- getconf LFS_LIBS # nothing special needed
By default, perl uses the large file compilation environment and relies on Solaris to do the underlying mapping of interfaces.
To compile a 64-bit application on an UltraSparc with a recent Sun Compiler, you need to use the flag "-xarch=v9". getconf(1) will tell you this, e.g.
- $ getconf -a | grep v9
- XBS5_LP64_OFF64_CFLAGS: -xarch=v9
- XBS5_LP64_OFF64_LDFLAGS: -xarch=v9
- XBS5_LP64_OFF64_LINTFLAGS: -xarch=v9
- XBS5_LPBIG_OFFBIG_CFLAGS: -xarch=v9
- XBS5_LPBIG_OFFBIG_LDFLAGS: -xarch=v9
- XBS5_LPBIG_OFFBIG_LINTFLAGS: -xarch=v9
- _XBS5_LP64_OFF64_CFLAGS: -xarch=v9
- _XBS5_LP64_OFF64_LDFLAGS: -xarch=v9
- _XBS5_LP64_OFF64_LINTFLAGS: -xarch=v9
- _XBS5_LPBIG_OFFBIG_CFLAGS: -xarch=v9
- _XBS5_LPBIG_OFFBIG_LDFLAGS: -xarch=v9
- _XBS5_LPBIG_OFFBIG_LINTFLAGS: -xarch=v9
This flag is supported in Sun WorkShop Compilers 5.0 and onwards (now marketed under the name Forte) when used on Solaris 7 or later on UltraSparc systems.
If you are using gcc, you would need to use -mcpu=v9 -m64 instead. This option is not yet supported as of gcc 2.95.2; from install/SPECIFIC in that release:
- GCC version 2.95 is not able to compile code correctly for sparc64
- targets. Users of the Linux kernel, at least, can use the sparc32
- program to start up a new shell invocation with an environment that
- causes configure to recognize (via uname -a) the system as sparc-*-*
- instead.
All this should be handled automatically by the hints file, if requested.
As of 5.8.1, long doubles are working if you use the Sun compilers (needed for additional math routines not included in libm).
It is possible to build a threaded version of perl on Solaris. The entire perl thread implementation is still experimental, however, so beware.
Starting from perl 5.7.1 perl uses the Solaris malloc, since the perl malloc breaks when dealing with more than 2GB of memory, and the Solaris malloc also seems to be faster.
If you for some reason (such as binary backward compatibility) really need to use perl's malloc, you can rebuild perl from the sources and Configure the build with
- $ sh Configure -Dusemymalloc
You should not use perl's malloc if you are building with gcc. There are reports of core dumps, especially in the PDL module. The problem appears to go away under -DDEBUGGING, so it has been difficult to track down. Sun's compiler appears to be okay with or without perl's malloc. [XXX further investigation is needed here.]
If you have problems with dynamic loading using gcc on SunOS or Solaris, and you are using GNU as and GNU ld, see the section GNU as and GNU ld above.
If you get this message on SunOS or Solaris, and you're using gcc, it's probably the GNU as or GNU ld problem in the previous item GNU as and GNU ld.
The primary cause of the 'dlopen: stub interception failed' message is that the LD_LIBRARY_PATH environment variable includes a directory which is a symlink to /usr/lib (such as /lib). See LD_LIBRARY_PATH above.
This is a common error when trying to build perl on Solaris 2.6 with a gcc installation from Solaris 2.5 or 2.5.1. The Solaris header files changed, so you need to update your gcc installation. You can either rerun the fixincludes script from gcc or take the opportunity to update your gcc installation.
This is a message from your shell telling you that the command 'ar' was not found. You need to check your PATH environment variable to make sure that it includes the directory with the 'ar' command. This is a common problem on Solaris, where 'ar' is in the /usr/ccs/bin/ directory.
op/stat.t test 4 may fail if you are on a tmpfs of some sort. Building in /tmp sometimes shows this behavior. The test suite detects if you are building in /tmp, but it may not be able to catch all tmpfs situations.
See nss_delete core dump from op/pwent or op/grent in perlhpux.
You can pick up prebuilt binaries for Solaris from http://www.sunfreeware.com/, http://www.blastwave.org, ActiveState http://www.activestate.com/, and http://www.perl.com/ under the Binaries list at the top of the page. There are probably other sources as well. Please note that these sites are under the control of their respective owners, not the perl developers.
The stdio(3C) manpage notes that for LP32 applications, only 255 files may be opened using fopen(), and only file descriptors 0 through 255 can be used in a stream. Since perl calls open() and then fdopen(3C) with the resulting file descriptor, perl is limited to 255 simultaneous open files, even if sysopen() is used. If this proves to be an insurmountable problem, you can compile perl as a LP64 application, see Building an LP64 perl for details. Note also that the default resource limit for open file descriptors on Solaris is 255, so you will have to modify your ulimit or rctl (Solaris 9 onwards) appropriately.
See the modules under the Solaris:: and Sun::Solaris namespaces on CPAN, see http://www.cpan.org/modules/by-module/Solaris/ and http://www.cpan.org/modules/by-module/Sun/.
Proc::ProcessTable does not compile on Solaris with perl5.6.0 and higher if you have LARGEFILES defined. Since largefile support is the default in 5.6.0 and later, you have to take special steps to use this module.
The problem is that various structures visible via procfs use off_t, and if you compile with largefile support these change from 32 bits to 64 bits. Thus what you get back from procfs doesn't match up with the structures in perl, resulting in garbage. See proc(4) for further discussion.
A fix for Proc::ProcessTable is to edit Makefile to explicitly remove the largefile flags from the ones MakeMaker picks up from Config.pm. This will result in Proc::ProcessTable being built under the correct environment. Everything should then be OK as long as Proc::ProcessTable doesn't try to share off_t's with the rest of perl, or if it does they should be explicitly specified as off64_t.
BSD::Resource versions earlier than 1.09 do not compile on Solaris with perl 5.6.0 and higher, for the same reasons as Proc::ProcessTable. BSD::Resource versions starting from 1.09 have a workaround for the problem.
Net::SSLeay requires a /dev/urandom to be present. This device is available from Solaris 9 onwards. For earlier Solaris versions you can either get the package SUNWski (packaged with several Sun software products, for example the Sun WebServer, which is part of the Solaris Server Intranet Extension, or the Sun Directory Services, part of Solaris for ISPs) or download the ANDIrand package from http://www.cosy.sbg.ac.at/~andi/. If you use SUNWski, make a symbolic link /dev/urandom pointing to /dev/random. For more details, see Document ID27606 entitled "Differing /dev/random support requirements within Solaris[TM] Operating Environments", available at http://sunsolve.sun.com .
It may be possible to use the Entropy Gathering Daemon (written in Perl!), available from http://www.lothar.com/tech/crypto/.
In SunOS 4.x you most probably want to use the SunOS ld, /usr/bin/ld, since the more recent versions of GNU ld (like 2.13) do not seem to work for building Perl anymore. When linking the extensions, the GNU ld gets very unhappy and spews a lot of errors like this
- ... relocation truncated to fit: BASE13 ...
and dies. Therefore the SunOS 4.1 hints file explicitly sets the ld to be /usr/bin/ld.
As of Perl 5.8.1 the dynamic loading of libraries (DynaLoader, XSLoader) also seems to have become broken in SunOS 4.x. Therefore the default is to build Perl statically.
Running the test suite in SunOS 4.1 is a bit tricky since the lib/Tie/File/t/09_gen_rs test hangs (subtest #51, FWIW) for some unknown reason. Just stop the test and kill that particular Perl process.
There are various other failures, that as of SunOS 4.1.4 and gcc 3.2.2 look a lot like gcc bugs. Many of the failures happen in the Encode tests, where for example when the test expects "0" you get "0" which should after a little squinting look very odd indeed. Another example is earlier in t/run/fresh_perl where chr(0xff) is expected but the test fails because the result is chr(0xff). Exactly.
This is the "make test" result from the said combination:
- Failed 27 test scripts out of 745, 96.38% okay.
Running the harness is painful because the many failing Unicode-related tests output megabytes of failure messages, but if one patiently waits, one gets these results:
- Failed Test Stat Wstat Total Fail Failed List of Failed
- -----------------------------------------------------------------------------
- ...
- ../ext/Encode/t/at-cn.t 4 1024 29 4 13.79% 14-17
- ../ext/Encode/t/at-tw.t 10 2560 17 10 58.82% 2 4 6 8 10 12
- 14-17
- ../ext/Encode/t/enc_data.t 29 7424 ?? ?? % ??
- ../ext/Encode/t/enc_eucjp.t 29 7424 ?? ?? % ??
- ../ext/Encode/t/enc_module.t 29 7424 ?? ?? % ??
- ../ext/Encode/t/encoding.t 29 7424 ?? ?? % ??
- ../ext/Encode/t/grow.t 12 3072 24 12 50.00% 2 4 6 8 10 12 14
- 16 18 20 22 24
- Failed Test Stat Wstat Total Fail Failed List of Failed
- ------------------------------------------------------------------------------
- ../ext/Encode/t/guess.t 255 65280 29 40 137.93% 10-29
- ../ext/Encode/t/jperl.t 29 7424 15 30 200.00% 1-15
- ../ext/Encode/t/mime-header.t 2 512 10 2 20.00% 2-3
- ../ext/Encode/t/perlio.t 22 5632 38 22 57.89% 1-4 9-16 19-20
- 23-24 27-32
- ../ext/List/Util/t/shuffle.t 0 139 ?? ?? % ??
- ../ext/PerlIO/t/encoding.t 14 1 7.14% 11
- ../ext/PerlIO/t/fallback.t 9 2 22.22% 3 5
- ../ext/Socket/t/socketpair.t 0 2 45 70 155.56% 11-45
- ../lib/CPAN/t/vcmp.t 30 1 3.33% 25
- ../lib/Tie/File/t/09_gen_rs.t 0 15 ?? ?? % ??
- ../lib/Unicode/Collate/t/test.t 199 30 15.08% 7 26-27 71-75
- 81-88 95 101
- 103-104 106 108-
- 109 122 124 161
- 169-172
- ../lib/sort.t 0 139 119 26 21.85% 107-119
- op/alarm.t 4 1 25.00% 4
- op/utfhash.t 97 1 1.03% 31
- run/fresh_perl.t 91 1 1.10% 32
- uni/tr_7jis.t ?? ?? % ??
- uni/tr_eucjp.t 29 7424 6 12 200.00% 1-6
- uni/tr_sjis.t 29 7424 6 12 200.00% 1-6
- 56 tests and 467 subtests skipped.
- Failed 27/811 test scripts, 96.67% okay. 1383/75399 subtests failed, 98.17% okay.
The alarm() test failure is caused by system() apparently blocking alarm(). That is probably a libc bug, and given that SunOS 4.x has been end-of-lifed years ago, don't hold your breath for a fix. In addition to that, don't try anything too Unicode-y, especially with Encode, and you should be fine in SunOS 4.x.
The original was written by Andy Dougherty doughera@lafayette.edu drawing heavily on advice from Alan Burlison, Nick Ing-Simmons, Tim Bunce, and many other Solaris users over the years.
Please report any errors, updates, or suggestions to perlbug@perl.org.
perlsource - A guide to the Perl source tree
This document describes the layout of the Perl source tree. If you're hacking on the Perl core, this will help you find what you're looking for.
The Perl source tree is big. Here are some of the things you'll find in it:
The C source code and header files mostly live in the root of the source tree. There are a few platform-specific directories which contain C code. In addition, some of the modules shipped with Perl include C or XS code.
See perlinterp for more details on the files that make up the Perl interpreter, as well as details on how it works.
Modules shipped as part of the Perl core live in four subdirectories. Two of these directories contain modules that live only in the core, and two contain modules that can also be released separately on CPAN. Modules which can be released on CPAN are known as "dual-life" modules.
This directory (lib/) contains pure-Perl modules which are released only as part of the core. Unlike the other module directories, it contains the modules and their tests side by side.
This directory contains XS-using modules which are only released as part of the core. These modules generally have their own Makefile.PL and are laid out more like a typical CPAN module.
This directory is for dual-life modules where the blead source is canonical. Note that some modules in this directory may not yet have been released separately on CPAN.
This directory contains dual-life modules where the CPAN module is canonical. Do not patch these modules directly! Changes to these modules should be submitted to the maintainer of the CPAN module. Once those changes are applied and released, the new version of the module will be incorporated into the core.
For some dual-life modules, it has not yet been determined if the CPAN version or the blead source is canonical. Until that is done, those modules should be in cpan/.
The Perl core has an extensive test suite. If you add new tests (or new modules with tests), you may need to update the t/TEST file so that the tests are run.
Tests for core modules in the lib/ directory are right next to the module itself. For example, we have lib/strict.pm and lib/strict.t.
Tests for modules in ext/ and the dual-life modules are in t/ subdirectories for each module, like a standard CPAN distribution.
Tests for the absolute basic functionality of Perl. This includes if, basic file reads and writes, simple regexes, etc. These are run first in the test suite and if any of them fail, something is really broken.
Tests for basic control structures, if/else, while, subroutines, etc.
Tests for basic issues of how Perl parses and compiles itself.
Tests for built-in IO functions, including command line arguments.
Tests for perl's method resolution order implementations (see mro).
Tests for perl's built-in functions that don't fit into any of the other directories.
Tests for perl's built-in functions which, like those in t/op/, do not fit into any of the other directories, but which, in addition, cannot use t/test.pl, as that program depends on functionality which the test file itself is testing.
Tests for regex related functions or behaviour. (These used to live in t/op).
Tests for features of how perl actually runs, including exit codes and handling of PERL* environment variables.
Tests for the core support of Unicode.
Windows-specific tests.
Tests the state of the source tree for various common errors. For example, it tests that everyone who is listed in the git log has a corresponding entry in the AUTHORS file.
The old home for the module tests; you shouldn't put anything new in here. There are still some bits and pieces hanging around in here that need to be moved. Perhaps you could move them? Thanks!
A test suite for the s2p converter.
All of the core documentation intended for end users lives in pod/. Individual modules in lib/, ext/, dist/, and cpan/ usually have their own documentation, either in the Module.pm file or an accompanying Module.pod file.
Finally, documentation intended for core Perl developers lives in the Porting/ directory.
The Porting directory contains a grab bag of code and documentation intended to help porters work on Perl. Some of the highlights include:
These are scripts which will check the source for things like ANSI C violations, POD encoding issues, etc.
These files contain information on who maintains which modules. Run
- perl Porting/Maintainers -M Module::Name
to find out more information about a dual-life module.
Tidies a pod file. It's a good idea to run this on a pod file you've patched.
The Perl build system starts with the Configure script in the root directory.
Platform-specific pieces of the build system also live in platform-specific directories like win32/, vms/, etc.
The Configure script is ultimately responsible for generating a Makefile.
The build system that Perl uses is called metaconfig. This system is maintained separately from the Perl core.
The metaconfig system has its own git repository. Please see its README file at http://perl5.git.perl.org/metaconfig.git/ for more details.
The Cross directory contains various files related to cross-compiling Perl. See Cross/README for more details.
This file lists everyone who's contributed to Perl. If you submit a patch, you should add your name to this file as part of the patch.
The MANIFEST file in the root of the source tree contains a list of every file in the Perl core, as well as a brief description of each file.
You can get an overview of all the files with this command:
- % perl -lne 'print if /^[^\/]+\.[ch]\s+/' MANIFEST
perlstyle - Perl style guide
Each programmer will, of course, have his or her own preferences in regards to formatting, but there are some general guidelines that will make your programs easier to read, understand, and maintain.
The most important thing is to run your programs under the -w flag at all times. You may turn it off explicitly for particular portions of code via the no warnings pragma or the $^W variable if you must. You should also always run under use strict or know the reason why not. The use sigtrap and even use diagnostics pragmas may also prove useful.
Regarding aesthetics of code lay out, about the only thing Larry cares strongly about is that the closing curly bracket of a multi-line BLOCK should line up with the keyword that started the construct. Beyond that, he has other preferences that aren't so strong:
4-column indent.
Opening curly on same line as keyword, if possible, otherwise line up.
Space before the opening curly of a multi-line BLOCK.
One-line BLOCK may be put on one line, including curlies.
No space before the semicolon.
Semicolon omitted in "short" one-line BLOCK.
Space around most operators.
Space around a "complex" subscript (inside brackets).
Blank lines between chunks that do different things.
Uncuddled elses.
No space between function name and its opening parenthesis.
Space after each comma.
Long lines broken after an operator (except and and or).
Space after last parenthesis matching on current line.
Line up corresponding items vertically.
Omit redundant punctuation as long as clarity doesn't suffer.
Larry has his reasons for each of these things, but he doesn't claim that everyone else's mind works the same as his does.
Here are some other more substantive style issues to think about:
Just because you CAN do something a particular way doesn't mean that you SHOULD do it that way. Perl is designed to give you several ways to do anything, so consider picking the most readable one. For instance
- open(my $fh, '<', $foo) || die "Can't open $foo: $!";
is better than
- die "Can't open $foo: $!" unless open(my $fh, '<', $foo);
because the second way hides the main point of the statement in a modifier. On the other hand
- print "Starting analysis\n" if $verbose;
is better than
- $verbose && print "Starting analysis\n";
because the main point isn't whether the user typed -v or not.
Similarly, just because an operator lets you assume default arguments doesn't mean that you have to make use of the defaults. The defaults are there for lazy systems programmers writing one-shot programs. If you want your program to be readable, consider supplying the argument.
Along the same lines, just because you CAN omit parentheses in many places doesn't mean that you ought to:
When in doubt, parenthesize. At the very least it will let some poor schmuck bounce on the % key in vi.
Even if you aren't in doubt, consider the mental welfare of the person who has to maintain the code after you, and who will probably put parentheses in the wrong place.
Don't go through silly contortions to exit a loop at the top or the bottom, when Perl provides the last operator so you can exit in the middle. Just "outdent" it a little to make it more visible:
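For example (a minimal sketch; the input lines and the END sentinel are illustrative):

```perl
use strict;
use warnings;

# Collect lines until "END", skipping comments. The outdented
# "last" makes the loop's exit point easy to spot.
my @kept;
LINE:
    for my $line ("alpha", "# a comment", "beta", "END", "gamma") {
      last LINE if $line eq 'END';   # exit in the middle, outdented
        next LINE if $line =~ /^#/;  # skip comment lines
        push @kept, $line;
    }
print "@kept\n";   # prints "alpha beta"
```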
Don't be afraid to use loop labels--they're there to enhance readability as well as to allow multilevel loop breaks. See the previous example.
Avoid using grep() (or map()) or `backticks` in a void context, that is, when you just throw away their return values. Those functions all have return values, so use them. Otherwise use a foreach() loop or the system() function instead.
For portability, when using features that may not be implemented on every machine, test the construct in an eval to see if it fails. If you know what version or patchlevel a particular feature was implemented, you can test $] ($PERL_VERSION in English) to see if it will be there. The Config module will also let you interrogate values determined by the Configure program when Perl was installed.
Choose mnemonic identifiers. If you can't remember what mnemonic means, you've got a problem.
While short identifiers like $gotit are probably ok, use underscores to separate words in longer identifiers. It is generally easier to read $var_names_like_this than $VarNamesLikeThis, especially for non-native speakers of English. It's also a simple rule that works consistently with VAR_NAMES_LIKE_THIS.
Package names are sometimes an exception to this rule. Perl informally reserves lowercase module names for "pragma" modules like integer and strict. Other modules should begin with a capital letter and use mixed case, but probably without underscores due to limitations in primitive file systems' representations of module names as files that must fit into a few sparse bytes.
You may find it helpful to use letter case to indicate the scope or nature of a variable. For example:
- $ALL_CAPS_HERE constants only (beware clashes with perl vars!)
- $Some_Caps_Here package-wide global/static
- $no_caps_here function scope my() or local() variables
Function and method names seem to work best as all lowercase. E.g., $obj->as_string().
You can use a leading underscore to indicate that a variable or function should not be used outside the package that defined it.
If you have a really hairy regular expression, use the /x modifier and put in some whitespace to make it look a little less like line noise.
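A minimal sketch of a /x pattern spread out with whitespace and comments (the phone-number pattern is illustrative):

```perl
use strict;
use warnings;

my $str = "call 555-1234 now";
if ($str =~ /
        (\d{3})   # three-digit prefix
        -         # literal separator
        (\d{4})   # four-digit line number
    /x) {
    print "$1-$2\n";   # prints "555-1234"
}
```

Under /x, the whitespace and # comments inside the pattern are ignored, so the layout carries no cost.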
Don't use slash as a delimiter when your regexp has slashes or backslashes.
Use the new and and or operators to avoid having to parenthesize list operators so much, and to reduce the incidence of punctuation operators like && and ||. Call your subroutines as if they were functions or list operators to avoid excessive ampersands and parentheses.
Use here documents instead of repeated print() statements.
Line up corresponding things vertically, especially if it'd be too long to fit on one line anyway.
Always check the return codes of system calls. Good error messages should go to STDERR, include which program caused the problem, what the failed system call and arguments were, and (VERY IMPORTANT) should contain the standard system error message for what went wrong. Here's a simple but sufficient example:
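A sketch of such a message (the directory name is illustrative; $0 is the program name and $! the system's own error text):

```perl
use strict;
use warnings;

# On failure, die reports the program ($0), the failed system call
# and its argument, and $! -- the standard system error message.
my $dir = '.';
opendir(my $dh, $dir)
    or die "$0: can't opendir $dir: $!\n";
closedir $dh;
```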
Line up your transliterations when it makes sense:
- tr [abc]
- [xyz];
Think about reusability. Why waste brainpower on a one-shot when you might want to do something like it again? Consider generalizing your code. Consider writing a module or object class. Consider making your code run cleanly with use strict and use warnings (or -w) in effect. Consider giving away your code. Consider changing your whole world view. Consider... oh, never mind.
Try to document your code and use Pod formatting in a consistent way. Here are commonly expected conventions:
use C<> for function, variable and module names (and more generally anything that can be considered part of code, like filehandles or specific values). Note that function names are considered more readable with parentheses after their name, that is function().
use B<> for command names like cat or grep.
use F<> or C<> for file names. F<> should be the only Pod code for file names, but as most Pod formatters render it as italic, Unix and Windows paths with their slashes and backslashes may be less readable, and better rendered with C<>.
Be consistent.
Be nice.
perlsub - Perl subroutines
To declare subroutines:
- sub NAME; # A "forward" declaration.
- sub NAME(PROTO); # ditto, but with prototypes
- sub NAME : ATTRS; # with attributes
- sub NAME(PROTO) : ATTRS; # with attributes and prototypes
- sub NAME BLOCK # A declaration and a definition.
- sub NAME(PROTO) BLOCK # ditto, but with prototypes
- sub NAME : ATTRS BLOCK # with attributes
- sub NAME(PROTO) : ATTRS BLOCK # with prototypes and attributes
To define an anonymous subroutine at runtime:
- $subref = sub BLOCK; # no proto
- $subref = sub (PROTO) BLOCK; # with proto
- $subref = sub : ATTRS BLOCK; # with attributes
- $subref = sub (PROTO) : ATTRS BLOCK; # with proto and attributes
To import subroutines:
- use MODULE qw(NAME1 NAME2 NAME3);
To call subroutines:
- NAME(LIST); # & is optional with parentheses.
- NAME LIST; # Parentheses optional if predeclared/imported.
- &NAME(LIST); # Circumvent prototypes.
- &NAME; # Makes current @_ visible to called subroutine.
Like many languages, Perl provides for user-defined subroutines. These may be located anywhere in the main program, loaded in from other files via the do, require, or use keywords, or generated on the fly using eval or anonymous subroutines. You can even call a function indirectly using a variable containing its name or a CODE reference.
The Perl model for function call and return values is simple: all functions are passed as parameters one single flat list of scalars, and all functions likewise return to their caller one single flat list of scalars. Any arrays or hashes in these call and return lists will collapse, losing their identities--but you may always use pass-by-reference instead to avoid this. Both call and return lists may contain as many or as few scalar elements as you'd like. (Often a function without an explicit return statement is called a subroutine, but there's really no difference from Perl's perspective.)
Any arguments passed in show up in the array @_. Therefore, if you called a function with two arguments, those would be stored in $_[0] and $_[1]. The array @_ is a local array, but its elements are aliases for the actual scalar parameters. In particular, if an element $_[0] is updated, the corresponding argument is updated (or an error occurs if it is not updatable). If an argument is an array or hash element which did not exist when the function was called, that element is created only when (and if) it is modified or a reference to it is taken. (Some earlier versions of Perl created the element whether or not the element was assigned to.) Assigning to the whole array @_ removes that aliasing, and does not update any arguments.
A return statement may be used to exit a subroutine, optionally specifying the returned value, which will be evaluated in the appropriate context (list, scalar, or void) depending on the context of the subroutine call. If you specify no return value, the subroutine returns an empty list in list context, the undefined value in scalar context, or nothing in void context. If you return one or more aggregates (arrays and hashes), these will be flattened together into one large indistinguishable list.
If no return is found and if the last statement is an expression, its value is returned. If the last statement is a loop control structure like a foreach or a while, the returned value is unspecified. The empty sub returns the empty list.
Perl does not have named formal parameters. In practice all you do is assign to a my() list of these. Variables that aren't declared to be private are global variables. For gory details on creating private variables, see Private Variables via my() and Temporary Values via local(). To create protected environments for a set of functions in a separate package (and probably a separate file), see Packages in perlmod.
Example:
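A minimal sketch of a subroutine that scans its flat argument list for the largest value (the name and sample values are illustrative):

```perl
use strict;
use warnings;

# Return the largest value in the argument list.
sub max {
    my $max = shift(@_);
    foreach my $foo (@_) {
        $max = $foo if $max < $foo;
    }
    return $max;
}

my $bestday = max(60, 62, 58, 65, 61);
print "$bestday\n";   # prints 65
```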
Example:
- # get a line, combining continuation lines
- # that start with whitespace
- sub get_line {
- $thisline = $lookahead; # global variables!
- LINE: while (defined($lookahead = <STDIN>)) {
- if ($lookahead =~ /^[ \t]/) {
- $thisline .= $lookahead;
- }
- else {
- last LINE;
- }
- }
- return $thisline;
- }
- $lookahead = <STDIN>; # get first line
- while (defined($line = get_line())) {
- ...
- }
Assigning to a list of private variables to name your arguments:
- sub maybeset {
- my($key, $value) = @_;
- $Foo{$key} = $value unless $Foo{$key};
- }
Because the assignment copies the values, this also has the effect of turning call-by-reference into call-by-value. Otherwise a function is free to do in-place modifications of @_ and change its caller's values.
You aren't allowed to modify constants in this way, of course. If an argument were actually literal and you tried to change it, you'd take a (presumably fatal) exception. For example, this won't work:
- upcase_in("frederick");
It would be much safer if the upcase_in() function were written to return a copy of its parameters instead of changing them in place:
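One way to write such a copying upcase() (a sketch; wantarray lets it behave sensibly in scalar and void context too):

```perl
use strict;
use warnings;

# Return uppercased copies of the arguments instead of
# modifying them in place.
sub upcase {
    return unless defined wantarray;   # void context: do nothing
    my @parms = @_;
    for (@parms) { tr/a-z/A-Z/ }
    return wantarray ? @parms : $parms[0];
}

my ($v3, $v4) = upcase("fred", "barney");
print "$v3 $v4\n";   # prints "FRED BARNEY"
```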
Notice how this (unprototyped) function doesn't care whether it was passed real scalars or arrays. Perl sees all arguments as one big, long, flat parameter list in @_. This is one area where Perl's simple argument-passing style shines. The upcase() function would work perfectly well without changing the upcase() definition even if we fed it things like this:
- @newlist = upcase(@list1, @list2);
- @newlist = upcase( split /:/, $var );
Do not, however, be tempted to do this:
- (@a, @b) = upcase(@list1, @list2);
Like the flattened incoming parameter list, the return list is also flattened on return. So all you have managed to do here is store everything in @a and make @b empty. See Pass by Reference for alternatives.
A subroutine may be called using an explicit & prefix. The & is optional in modern Perl, as are parentheses if the subroutine has been predeclared. The & is not optional when just naming the subroutine, such as when it's used as an argument to defined() or undef(). Nor is it optional when you want to do an indirect subroutine call with a subroutine name or reference using the &$subref() or &{$subref}() constructs, although the $subref->() notation solves that problem. See perlref for more about all that.
Subroutines may be called recursively. If a subroutine is called using the & form, the argument list is optional, and if omitted, no @_ array is set up for the subroutine: the @_ array at the time of the call is visible to the subroutine instead. This is an efficiency mechanism that new users may wish to avoid.
- &foo(1,2,3); # pass three arguments
- foo(1,2,3); # the same
- foo(); # pass a null list
- &foo(); # the same
- &foo; # foo() gets current args, like foo(@_) !!
- foo; # like foo() IFF sub foo predeclared, else "foo"
Not only does the & form make the argument list optional, it also disables any prototype checking on arguments you do provide. This is partly for historical reasons, and partly for having a convenient way to cheat if you know what you're doing. See Prototypes below.
Since Perl 5.16.0, the __SUB__ token is available under use feature 'current_sub' and use 5.16.0. It will evaluate to a reference to the currently-running sub, which allows for recursive calls without knowing your subroutine's name.
The behaviour of __SUB__ within a regex code block (such as /(?{...})/) is subject to change.
Subroutines whose names are in all upper case are reserved to the Perl core, as are modules whose names are in all lower case. A subroutine in all capitals is a loosely-held convention meaning it will be called indirectly by the run-time system itself, usually due to a triggered event. Subroutines that do special, pre-defined things include AUTOLOAD, CLONE, and DESTROY, plus all functions mentioned in perltie and PerlIO::via.
The BEGIN, UNITCHECK, CHECK, INIT and END subroutines are not so much subroutines as named special code blocks, of which you can have more than one in a package, and which you can not call explicitly. See BEGIN, UNITCHECK, CHECK, INIT and END in perlmod.
Synopsis:
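The basic forms look like this (a sketch; the variable names and initializers are illustrative):

```perl
use strict;
use warnings;

my @bar = (1, 2, 3);

my $foo;             # declare $foo lexically local
my (@wid, %get);     # declare a list of variables local
my $name = "flurp";  # declare $name lexical, and init it
my @oof  = @bar;     # declare @oof lexical, and init it
print "$name @oof\n";   # prints "flurp 1 2 3"
```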
WARNING: The use of attribute lists on my declarations is still evolving. The current semantics and interface are subject to change. See attributes and Attribute::Handlers.
The my operator declares the listed variables to be lexically confined to the enclosing block, conditional (if/unless/elsif/else), loop (for/foreach/while/until/continue), subroutine, eval, or do/require/use'd file. If more than one value is listed, the list must be placed in parentheses. All listed elements must be legal lvalues. Only alphanumeric identifiers may be lexically scoped--magical built-ins like $/ must currently be localized with local instead.
Unlike dynamic variables created by the local operator, lexical variables declared with my are totally hidden from the outside world, including any called subroutines. This is true if it's the same subroutine called from itself or elsewhere--every call gets its own copy.
This doesn't mean that a my variable declared in a statically enclosing lexical scope would be invisible. Only dynamic scopes are cut off. For example, the bumpx() function below has access to the lexical $x variable because both the my and the sub occurred at the same scope, presumably file scope.
- my $x = 10;
- sub bumpx { $x++ }
An eval(), however, can see lexical variables of the scope it is being evaluated in, so long as the names aren't hidden by declarations within the eval() itself. See perlref.
The parameter list to my() may be assigned to if desired, which allows you to initialize your variables. (If no initializer is given for a particular variable, it is created with the undefined value.) Commonly this is used to name input parameters to a subroutine. Examples:
- $arg = "fred"; # "global" variable
- $n = cube_root(27);
- print "$arg thinks the root is $n\n";
- fred thinks the root is 3
- sub cube_root {
- my $arg = shift; # name doesn't matter
- $arg **= 1/3;
- return $arg;
- }
The my is simply a modifier on something you might assign to. So when you do assign to variables in its argument list, my doesn't change whether those variables are viewed as a scalar or an array. So both
- my @foo = <STDIN>;
- my ($foo) = <STDIN>;
supply a list context to the right-hand side, while
- my $foo = <STDIN>;
supplies a scalar context. But the following declares only one variable:
- my $foo, $bar = 1; # WRONG
That has the same effect as
- my $foo;
- $bar = 1;
The declared variable is not introduced (is not visible) until after the current statement. Thus,
- my $x = $x;
can be used to initialize a new $x with the value of the old $x, and the expression
- my $x = 123 and $x == 123
is false unless the old $x happened to have the value 123.
Lexical scopes of control structures are not bounded precisely by the braces that delimit their controlled blocks; control expressions are part of that scope, too. Thus in the loop
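(A sketch of such a loop, using an in-memory filehandle for the input; $line is declared in the control expression:)

```perl
use strict;
use warnings;

# $line, declared in the while's control expression, is visible in
# both the loop body and the continue clause.
open my $fh, '<', \"ONE\nTWO\n" or die "open: $!";
my $out = '';
while (my $line = <$fh>) {
    $line = lc $line;
} continue {
    $out .= $line;   # still in scope here
}
close $fh;
print $out;   # prints "one" and "two", lowercased
```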
the scope of $line extends from its declaration throughout the rest of the loop construct (including the continue clause), but not beyond it. Similarly, in the conditional
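(A sketch of such a conditional; the input value is illustrative:)

```perl
use strict;
use warnings;

# $answer, declared in the if's control expression, remains visible
# through the elsif and else clauses.
my $input  = "no";
my $result;
if ((my $answer = $input) =~ /^yes$/i) {
    $result = "agreed";
} elsif ($answer =~ /^no$/i) {
    $result = "declined";
} else {
    chomp $answer;
    $result = "'$answer' is neither 'yes' nor 'no'";
}
print "$result\n";   # prints "declined"
```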
the scope of $answer extends from its declaration through the rest of that conditional, including any elsif and else clauses, but not beyond it. See Simple Statements in perlsyn for information on the scope of variables in statements with modifiers.
The foreach loop defaults to scoping its index variable dynamically in the manner of local. However, if the index variable is prefixed with the keyword my, or if there is already a lexical by that name in scope, then a new lexical is created instead. Thus in the loop
- for my $i (1, 2, 3) {
- some_function();
- }
the scope of $i extends to the end of the loop, but not beyond it, rendering the value of $i inaccessible within some_function().
Some users may wish to encourage the use of lexically scoped variables. As an aid to catching implicit uses of package variables, which are always global, if you say
- use strict 'vars';
then any variable mentioned from there to the end of the enclosing block must either refer to a lexical variable, be predeclared via our or use vars, or else must be fully qualified with the package name. A compilation error results otherwise. An inner block may countermand this with no strict 'vars'.
A my has both a compile-time and a run-time effect. At compile time, the compiler takes notice of it. The principal usefulness of this is to quiet use strict 'vars', but it is also essential for generation of closures as detailed in perlref. Actual initialization is delayed until run time, though, so it gets executed at the appropriate time, such as each time through a loop, for example.
Variables declared with my are not part of any package and are therefore never fully qualified with the package name. In particular, you're not allowed to try to make a package variable (or other global) lexical:
- my $pack::var; # ERROR! Illegal syntax
In fact, dynamic variables (also known as package or global variables) are still accessible using the fully qualified :: notation even while a lexical of the same name is also visible:
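For example (a sketch; the values 10 and 20 are illustrative):

```perl
package main;

$x = 10;     # package variable $main::x
my $x = 20;  # a lexical $x of the same name shadows it
# The lexical wins for plain $x; $::x still reaches the package one.
print "$x and $::x\n";   # prints "20 and 10"
```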
That will print out 20 and 10.
You may declare my variables at the outermost scope of a file to hide any such identifiers from the world outside that file. This is similar in spirit to C's static variables when they are used at the file level. To do this with a subroutine requires the use of a closure (an anonymous function that accesses enclosing lexicals).
If you want to create a private subroutine that cannot be called from outside that block, it can declare a lexical variable containing an anonymous sub reference:
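For example (a sketch; the version string is illustrative):

```perl
use strict;
use warnings;

# A file-scoped lexical holding an anonymous sub: callable here,
# but invisible to any other package or file.
my $secret_version = '1.001-beta';
my $secret_sub     = sub { print $secret_version };
&$secret_sub();   # prints "1.001-beta"
```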
As long as the reference is never returned by any function within the module, no outside module can see the subroutine, because its name is not in any package's symbol table. Remember that it's not REALLY called $some_pack::secret_version or anything; it's just $secret_version, unqualified and unqualifiable.
This does not work with object methods, however; all object methods have to be in the symbol table of some package to be found. See Function Templates in perlref for something of a work-around to this.
There are two ways to build persistent private variables in Perl 5.10. First, you can simply use the state feature. Or, you can use closures, if you want to stay compatible with releases older than 5.10.
Beginning with Perl 5.10.0, you can declare variables with the state keyword in place of my. For that to work, though, you must have enabled that feature beforehand, either by using the feature pragma, or by using -E on one-liners (see feature). Beginning with Perl 5.16, the CORE::state form does not require the feature pragma.
The state keyword creates a lexical variable (following the same scoping rules as my) that persists from one subroutine call to the next. If a state variable resides inside an anonymous subroutine, then each copy of the subroutine has its own copy of the state variable. However, the value of the state variable will still persist between calls to the same copy of the anonymous subroutine. (Don't forget that sub { ... } creates a new subroutine each time it is executed.)
For example, the following code maintains a private counter, incremented each time the gimme_another() function is called:
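A sketch of such a counter (the name gimme_another is illustrative):

```perl
use strict;
use warnings;
use feature 'state';

# $x persists from one call to the next, but is invisible outside.
sub gimme_another { state $x; return ++$x; }

print gimme_another() for 1 .. 3;   # prints 123
```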
And this example uses anonymous subroutines to create separate counters:
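A sketch of separate counters via closures over state (the names are illustrative):

```perl
use strict;
use warnings;
use feature 'state';

# Each call returns a brand-new anonymous sub, and each such copy
# carries its own state $x.
sub create_counter {
    return sub {
        state $x;
        return ++$x;
    };
}

my $c1 = create_counter();
my $c2 = create_counter();
print $c1->(), $c1->(), $c2->();   # prints 121
```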
Also, since $x is lexical, it can't be reached or modified by any Perl code outside.
When combined with variable declaration, simple scalar assignment to state variables (as in state $x = 42) is executed only the first time. When such statements are evaluated subsequent times, the assignment is ignored. The behavior of this sort of assignment to non-scalar variables is undefined.
Just because a lexical variable is lexically (also called statically) scoped to its enclosing block, eval, or do FILE, this doesn't mean that within a function it works like a C static. It normally works more like a C auto, but with implicit garbage collection.
Unlike local variables in C or C++, Perl's lexical variables don't necessarily get recycled just because their scope has exited. If something more permanent is still aware of the lexical, it will stick around. So long as something else references a lexical, that lexical won't be freed--which is as it should be. You wouldn't want memory being freed until you were done using it, or kept around once you were done. Automatic garbage collection takes care of this for you.
This means that you can pass back or save away references to lexical variables, whereas to return a pointer to a C auto is a grave error. It also gives us a way to simulate C's function statics. Here's a mechanism for giving a function private variables with both lexical scoping and a static lifetime. If you do want to create something like C's static variables, just enclose the whole function in an extra block, and put the static variable outside the function but in the block.
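A sketch of the enclosing-block technique (names illustrative):

```perl
use strict;
use warnings;

{
    # A "static" shared only by gimme_another(): the block hides
    # $secret_val, but the sub defined inside keeps access to it.
    my $secret_val = 0;
    sub gimme_another {
        return ++$secret_val;
    }
}
# $secret_val is out of scope here, yet the counter persists.
print gimme_another(), gimme_another();   # prints 12
```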
If this function is being sourced in from a separate file via require or use, then this is probably just fine. If it's all in the main program, you'll need to arrange for the my to be executed early, either by putting the whole block above your main program, or more likely, placing merely a BEGIN code block around it to make sure it gets executed before your program starts to run:
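A sketch of the BEGIN variant (names illustrative):

```perl
use strict;
use warnings;

BEGIN {
    # Runs at compile time, so the my is executed before the
    # main program starts.
    my $secret_val = 0;
    sub gimme_another {
        return ++$secret_val;
    }
}

print gimme_another();   # prints 1
```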
See BEGIN, UNITCHECK, CHECK, INIT and END in perlmod about the special triggered code blocks, BEGIN, UNITCHECK, CHECK, INIT and END.
If declared at the outermost scope (the file scope), then lexicals work somewhat like C's file statics. They are available to all functions in that same file declared below them, but are inaccessible from outside that file. This strategy is sometimes used in modules to create private variables that the whole module can see.
WARNING: In general, you should be using my instead of local, because it's faster and safer. Exceptions to this include the global punctuation variables, global filehandles and formats, and direct manipulation of the Perl symbol table itself. local is mostly used when the current value of a variable must be visible to called subroutines.
Synopsis:
- # localization of values
- local $foo; # make $foo dynamically local
- local (@wid, %get); # make list of variables local
- local $foo = "flurp"; # make $foo dynamic, and init it
- local @oof = @bar; # make @oof dynamic, and init it
- local $hash{key} = "val"; # sets a local value for this hash entry
- delete local $hash{key}; # delete this entry for the current block
- local ($cond ? $v1 : $v2); # several types of lvalues support
- # localization
- # localization of symbols
- local *FH; # localize $FH, @FH, %FH, &FH ...
- local *merlyn = *randal; # now $merlyn is really $randal, plus
- # @merlyn is really @randal, etc
- local *merlyn = 'randal'; # SAME THING: promote 'randal' to *randal
- local *merlyn = \$randal; # just alias $merlyn, not @merlyn etc
A local modifies its listed variables to be "local" to the enclosing block, eval, or do FILE--and to any subroutine called from within that block. A local just gives temporary values to global (meaning package) variables. It does not create a local variable. This is known as dynamic scoping. Lexical scoping is done with my, which works more like C's auto declarations.
Some types of lvalues can be localized as well: hash and array elements and slices, conditionals (provided that their result is always localizable), and symbolic references. As for simple variables, this creates new, dynamically scoped values.
If more than one variable or expression is given to local, they must be placed in parentheses. This operator works by saving the current values of those variables in its argument list on a hidden stack and restoring them upon exiting the block, subroutine, or eval. This means that called subroutines can also reference the local variable, but not the global one. The argument list may be assigned to if desired, which allows you to initialize your local variables. (If no initializer is given for a particular variable, it is created with an undefined value.)
Because local is a run-time operator, it gets executed each time through a loop. Consequently, it's more efficient to localize your variables outside the loop.
A local is simply a modifier on an lvalue expression. When you assign to
a localized variable, the local doesn't change whether its list is viewed
as a scalar or an array. So
- local($foo) = <STDIN>;
- local @FOO = <STDIN>;
both supply a list context to the right-hand side, while
- local $foo = <STDIN>;
supplies a scalar context.
If you localize a special variable, you'll be giving a new value to it, but its magic won't go away. That means that all side-effects related to this magic still work with the localized value.
This feature allows code like this to work:
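For instance, localizing $/ (the input record separator) keeps its magic while it holds the temporary value, so a whole file can be slurped in one read. A self-contained sketch (the file name is illustrative; a stand-in file is created first):

```perl
use strict;
use warnings;

# Create a small file to demonstrate with.
my $file = "motd_demo.txt";
open my $out, '>', $file or die "open: $!";
print $out "line one\nline two\n";
close $out;

my $slurp;
{
    open my $in, '<', $file or die "open: $!";
    local $/;              # undef: disable the record separator
    $slurp = <$in>;        # the whole file arrives as one string
    close $in;
}
print $slurp;
unlink $file;
```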
Note, however, that this restricts localization of some values; for example, the following statement dies, as of perl 5.10.0, with an error Modification of a read-only value attempted, because the $1 variable is magical and read-only:
- local $1 = 2;
One exception is the default scalar variable: starting with perl 5.14
local($_) will always strip all magic from $_, to make it possible
to safely reuse $_ in a subroutine.
WARNING: Localization of tied arrays and hashes does not currently work as described. This will be fixed in a future release of Perl; in the meantime, avoid code that relies on any particular behaviour of localising tied arrays or hashes (localising individual elements is still okay). See Localising Tied Arrays and Hashes Is Broken in perl58delta for more details.
The construct
- local *name;
creates a whole new symbol table entry for the glob name
in the
current package. That means that all variables in its glob slot ($name,
@name, %name, &name, and the name
filehandle) are dynamically reset.
This implies, among other things, that any magic eventually carried by
those variables is locally lost. In other words, saying local */
will not have any effect on the internal value of the input record
separator.
It's also worth taking a moment to explain what happens when you
localize a member of a composite type (i.e. an array or hash element).
In this case, the element is localized by name. This means that
when the scope of the local() ends, the saved value will be
restored to the hash element whose key was named in the local(), or
the array element whose index was named in the local(). If that
element was deleted while the local() was in effect (e.g. by a
delete() from a hash or a shift() of an array), it will spring
back into existence, possibly extending an array and filling in the
skipped elements with undef. For instance, if you say
- %hash = ( 'This' => 'is', 'a' => 'test' );
- @ary = ( 0..5 );
- {
- local($ary[5]) = 6;
- local($hash{'a'}) = 'drill';
- while (my $e = pop(@ary)) {
- print "$e . . .\n";
- last unless $e > 3;
- }
- if (@ary) {
- $hash{'only a'} = 'test';
- delete $hash{'a'};
- }
- }
- print join(' ', map { "$_ $hash{$_}" } sort keys %hash),".\n";
- print "The array has ",scalar(@ary)," elements: ",
- join(', ', map { defined $_ ? $_ : 'undef' } @ary),"\n";
Perl will print
- 6 . . .
- 4 . . .
- 3 . . .
- This is a test only a test.
- The array has 6 elements: 0, 1, 2, undef, undef, 5
The behavior of local() on non-existent members of composite types is subject to change in future.
You can use the delete local $array[$idx]
and delete local $hash{key}
constructs to delete a composite type entry for the current block and restore
it when it ends. They return the array/hash value before the localization,
which means that they are respectively equivalent to
- do {
-     my $val = $array[$idx];
-     local  $array[$idx];
-     delete $array[$idx];
-     $val
- }
and
- do {
-     my $val = $hash{key};
-     local  $hash{key};
-     delete $hash{key};
-     $val
- }
except that for those the local is scoped to the do block. Slices are
also accepted.
- my %hash = (
- a => [ 7, 8, 9 ],
- b => 1,
- )
- {
- my $a = delete local $hash{a};
- # $a is [ 7, 8, 9 ]
- # %hash is (b => 1)
- {
- my @nums = delete local @$a[0, 2]
- # @nums is (7, 9)
- # $a is [ undef, 8 ]
- $a[0] = 999; # will be erased when the scope ends
- }
- # $a is back to [ 7, 8, 9 ]
- }
- # %hash is back to its original state
WARNING: Lvalue subroutines are still experimental and the implementation may change in future versions of Perl.
It is possible to return a modifiable value from a subroutine. To do this, you have to declare the subroutine to return an lvalue.
- my $val;
- sub canmod : lvalue {
- $val; # or: return $val;
- }
- sub nomod {
- $val;
- }
- canmod() = 5; # assigns to $val
- nomod() = 5; # ERROR
The scalar/list context for the subroutine and for the right-hand side of assignment is determined as if the subroutine call is replaced by a scalar. For example, consider:
- data(2,3) = get_data(3,4);
Both subroutines here are called in a scalar context, while in:
- (data(2,3)) = get_data(3,4);
and in:
- (data(2),data(3)) = get_data(3,4);
all the subroutines are called in a list context.
They appear to be convenient, but there is at least one reason to be circumspect.
They violate encapsulation. A normal mutator can check the supplied argument before setting the attribute it is protecting; an lvalue subroutine never gets that chance. Consider:
- my $some_array_ref = []; # protected by mutators ??
- sub set_arr { # normal mutator
- my $val = shift;
- die("expected array, you supplied ", ref $val)
- unless ref $val eq 'ARRAY';
- $some_array_ref = $val;
- }
- sub set_arr_lv : lvalue { # lvalue mutator
- $some_array_ref;
- }
- # set_arr_lv cannot stop this!
- set_arr_lv() = { a => 1 };
WARNING: Lexical subroutines are still experimental. The feature may be modified or removed in future versions of Perl.
Lexical subroutines are only available under the use feature
'lexical_subs'
pragma, which produces a warning unless the
"experimental::lexical_subs" warnings category is disabled.
Beginning with Perl 5.18, you can declare a private subroutine with my
or state. As with state variables, the state keyword is only
available under use feature 'state'
or use 5.010
or higher.
These subroutines are only visible within the block in which they are declared, and only after that declaration:
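A minimal sketch of that visibility rule (the sub name whom is illustrative):

```perl
use v5.18;
use feature 'lexical_subs';
no warnings 'experimental::lexical_subs';

sub whom { "package" }          # a package sub of the same name

my ($inner, $outer);
{
    my sub whom { "lexical" }   # visible only after this declaration,
                                # and only inside this block
    $inner = whom();            # the lexical sub wins here
}
$outer = whom();                # back to the package sub
print "$inner $outer\n";        # lexical package
```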
To use a lexical subroutine from inside the subroutine itself, you must
predeclare it. The sub foo {...}
subroutine definition syntax respects
any previous my sub;
or state sub;
declaration.
- my sub baz; # predeclaration
- sub baz { # define the "my" sub
- baz(); # recursive call
- }
state sub vs my sub
What is the difference between "state" subs and "my" subs? Each time that execution enters a block when "my" subs are declared, a new copy of each sub is created. "State" subroutines persist from one execution of the containing block to the next.
So, in general, "state" subroutines are faster. But "my" subs are necessary if you want to create closures:
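A sketch of such a closure (names whatever and inner are illustrative): each call to whatever gets a fresh $x, and a fresh inner that sees it.

```perl
use v5.18;
use feature 'lexical_subs';
no warnings 'experimental::lexical_subs';

sub whatever {
    my $x = shift;
    my sub inner {        # a new copy per call, closing over this $x
        return $x * 2;
    }
    return inner();
}
print whatever(21), "\n";   # 42
print whatever(5),  "\n";   # 10
```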
In this example, a new $x is created when whatever is called, and also a new inner, which can see the new $x. A "state" sub will only see the $x from the first call to whatever.
our subroutines
Like our $variable, our sub creates a lexical alias to the package subroutine of the same name.
The two main uses for this are to switch back to using the package sub inside an inner scope:
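For instance (sub name greet is illustrative), our sub can reach past a shadowing lexical sub back to the package sub:

```perl
use v5.18;
use feature 'lexical_subs';
no warnings 'experimental::lexical_subs';

sub greet { "package greet" }

sub demo {
    my sub greet { "lexical greet" }
    my $shadowed = greet();        # the lexical sub wins here
    my $restored;
    {
        our sub greet;             # alias back to the package sub
        $restored = greet();
    }
    return ($shadowed, $restored);
}

my ($x, $y) = demo();
print "$x / $y\n";                 # lexical greet / package greet
```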
and to make a subroutine visible to other packages in the same scope:
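A sketch of the second use (package and sub names are illustrative): the lexical alias keeps resolving even after a package switch in the same scope.

```perl
use v5.18;
use feature 'lexical_subs';
no warnings 'experimental::lexical_subs';

package MyModule;
our sub helper { "MyModule helper" }   # lexical alias to the package sub

sub from_elsewhere {
    package Other;        # now compiling in a different package...
    return helper();      # ...but the lexical alias still resolves
}

package main;
print MyModule::from_elsewhere(), "\n";   # MyModule helper
```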
WARNING: The mechanism described in this section was originally the only way to simulate pass-by-reference in older versions of Perl. While it still works fine in modern versions, the new reference mechanism is generally easier to work with. See below.
Sometimes you don't want to pass the value of an array to a subroutine
but rather the name of it, so that the subroutine can modify the global
copy of it rather than working with a local copy. In perl you can
refer to all objects of a particular name by prefixing the name
with a star: *foo
. This is often known as a "typeglob", because the
star on the front can be thought of as a wildcard match for all the
funny prefix characters on variables and subroutines and such.
When evaluated, the typeglob produces a scalar value that represents
all the objects of that name, including any filehandle, format, or
subroutine. When assigned to, it causes the name mentioned to refer to
whatever *
value was assigned to it. Example:
- sub doubleary {
- local(*someary) = @_;
- foreach $elem (@someary) {
- $elem *= 2;
- }
- }
- doubleary(*foo);
- doubleary(*bar);
Scalars are already passed by reference, so you can modify
scalar arguments without using this mechanism by referring explicitly
to $_[0]
etc. You can modify all the elements of an array by passing
all the elements as scalars, but you have to use the *
mechanism (or
the equivalent reference mechanism) to push, pop, or change the size of
an array. It will certainly be faster to pass the typeglob (or reference).
Even if you don't want to modify an array, this mechanism is useful for passing multiple arrays in a single LIST, because normally the LIST mechanism will merge all the array values so that you can't extract out the individual arrays. For more on typeglobs, see Typeglobs and Filehandles in perldata.
Despite the existence of my, there are still three places where the
local operator shines. In fact, in these three places, you
must use local instead of my.
You need to give a global variable a temporary value, especially $_.
The global variables, like @ARGV
or the punctuation variables, must be
localized with local(). This block reads in /etc/motd, and splits
it up into chunks separated by lines of equal signs, which are placed
in @Fields
.
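A self-contained sketch of that pattern (a stand-in file replaces /etc/motd, and the /m modifier is used so ^ and $ match at line boundaries inside the slurped string):

```perl
use strict;
use warnings;

my @Fields;
my $file = "motd_demo.txt";
open my $out, '>', $file or die "open: $!";
print $out "first chunk\n====\nsecond chunk\n";
close $out;

{
    local @ARGV = ($file);  # <> reads from our file
    local $/;               # slurp mode
    local $_ = <>;          # localize $_ before assigning to it
    @Fields = split /^\s*=+\s*$/m;
}
print scalar(@Fields), " fields\n";   # 2 fields
unlink $file;
```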
In particular, it's important to localize $_ in any routine that assigns
to it. Look out for implicit assignments in while
conditionals.
You need to create a local file or directory handle or a local function.
A function that needs a filehandle of its own must use
local() on a complete typeglob. This can be used to create new symbol
table entries:
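For instance, a sketch of such a function (the handle names READER and WRITER are illustrative):

```perl
use strict;
use warnings;

sub ioqueue {
    local (*READER, *WRITER);     # not my! typeglobs need local
    pipe(READER, WRITER)          or die "can't pipe: $!";
    return (*READER, *WRITER);    # the globs keep the open handles
}

my ($head, $tail) = ioqueue();
print $tail "hello\n";
close $tail;                      # flush so the read below sees it
my $line = <$head>;
print $line;                      # hello
```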
See the Symbol module for a way to create anonymous symbol table entries.
Because assignment of a reference to a typeglob creates an alias, this can be used to create what is effectively a local function, or at least, a local alias.
- {
- local *grow = \&shrink; # only until this block exits
- grow(); # really calls shrink()
- move(); # if move() grow()s, it shrink()s too
- }
- grow(); # get the real grow() again
See Function Templates in perlref for more about manipulating functions by name in this way.
You want to temporarily change just one element of an array or hash.
You can localize just one element of an aggregate. Usually this
is done on dynamics:
- {
- local $SIG{INT} = 'IGNORE';
- funct(); # uninterruptible
- }
- # interruptibility automatically restored here
But it also works on lexically declared aggregates.
If you want to pass more than one array or hash into a function--or return them from it--and have them maintain their integrity, then you're going to have to use an explicit pass-by-reference. Before you do that, you need to understand references as detailed in perlref. This section may not make much sense to you otherwise.
Here are a few simple examples. First, let's pass in several arrays
to a function and have it pop all of them, returning a new list
of all their former last elements:
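A sketch of such a function (the name popmany is illustrative):

```perl
use strict;
use warnings;

# Pop the last element off each array referenced in @_
# and return the list of removed elements.
sub popmany {
    my @retlist;
    for my $aref (@_) {
        push @retlist, pop @$aref;
    }
    return @retlist;
}

my @a = (1, 2, 3);
my @b = (4, 5, 6);
my @tailings = popmany(\@a, \@b);
print "@tailings\n";   # 3 6
```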
Here's how you might write a function that returns a list of keys occurring in all the hashes passed to it:
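One way to write it (the name inter is illustrative): count every key, then keep those seen in as many hashes as were passed.

```perl
use strict;
use warnings;

# Return the keys that occur in every hash passed by reference.
sub inter {
    my %seen;
    for my $href (@_) {
        $seen{$_}++ for keys %$href;
    }
    return grep { $seen{$_} == @_ } keys %seen;
}

my %foo = (a => 1, b => 2, c => 3);
my %bar = (b => 4, c => 5, d => 6);
my @common = sort +inter(\%foo, \%bar);
print "@common\n";   # b c
```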
So far, we're using just the normal list return mechanism. What happens if you want to pass or return a hash? Well, if you're using only one of them, or you don't mind them concatenating, then the normal calling convention is ok, although a little expensive.
Where people get into trouble is here:
- (@a, @b) = func(@c, @d);
- or
- (%a, %b) = func(%c, %d);
That syntax simply won't work. It sets just @a or %a and
clears the @b or %b. Plus the function didn't get passed
into two separate arrays or hashes: it got one long list in @_,
as always.
If you can arrange for everyone to deal with this through references, it's cleaner code, although not so nice to look at. Here's a function that takes two array references as arguments, returning the two array elements in order of how many elements they have in them:
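A sketch under that convention (the name func is illustrative; it simply returns the two references with the larger array first):

```perl
use strict;
use warnings;

# Take two array references; return them with the larger array first.
sub func {
    my ($cref, $dref) = @_;
    return @$cref > @$dref ? ($cref, $dref) : ($dref, $cref);
}

my @c = (1 .. 5);
my @d = (1 .. 3);
my ($aref, $bref) = func(\@c, \@d);
print scalar(@$aref), " then ", scalar(@$bref), "\n";   # 5 then 3
```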
It turns out that you can actually do this also:
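A sketch of the typeglob variant (the names are illustrative; note the arrays must be package globals, which is why this fragment avoids strict 'vars'):

```perl
# The typeglob trick needs package variables, so no "use strict 'vars'".
@c = (1 .. 5);
@d = (1 .. 3);

sub bigfirst {
    local (*x, *y) = @_;        # alias @x and @y to the caller's arrays
    return @x > @y ? (\@x, \@y) : (\@y, \@x);
}

(*big, *small) = bigfirst(\@c, \@d);   # alias @big and @small to the results
print scalar(@big), " ", scalar(@small), "\n";   # 5 3
```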
Here we're using the typeglobs to do symbol table aliasing. It's
a tad subtle, though, and also won't work if you're using my
variables, because only globals (even in disguise as locals)
are in the symbol table.
If you're passing around filehandles, you could usually just use the bare
typeglob, like *STDOUT, but typeglob references work, too.
For example:
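A sketch (the sub names splutter and get_rec are illustrative):

```perl
use strict;
use warnings;

sub splutter {
    my $fh = shift;                 # a glob ref acts as a filehandle
    print {$fh} "her um well a hmmm\n";
}
splutter(\*STDOUT);

sub get_rec {
    my $fh = shift;
    return scalar <$fh>;
}
# my $rec = get_rec(\*STDIN);       # would read one line from STDIN
```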
If you're planning on generating new filehandles, you could do this. Note that you pass back just the bare *FH, not its reference.
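A sketch (the name openit is illustrative; a stand-in file is created so the example is self-contained):

```perl
use strict;
use warnings;

sub openit {
    my $path = shift;
    local *FH;                                   # new symbol table entry
    return open(FH, '<', $path) ? *FH : undef;   # bare *FH, not \*FH
}

# Demo: write a file, reopen it through openit(), read it back.
open my $out, '>', 'openit_demo.txt' or die "open: $!";
print $out "first line\n";
close $out;

my $fh = openit('openit_demo.txt');
print scalar <$fh>;    # first line
close $fh;
unlink 'openit_demo.txt';
```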
Perl supports a very limited kind of compile-time argument checking using function prototyping. If you declare
- sub mypush (+@)
then mypush() takes arguments exactly like push() does. The
function declaration must be visible at compile time. The prototype
affects only interpretation of new-style calls to the function,
where new-style is defined as not using the &
character. In
other words, if you call it like a built-in function, then it behaves
like a built-in function. If you call it like an old-fashioned
subroutine, then it behaves like an old-fashioned subroutine. It
naturally falls out from this rule that prototypes have no influence
on subroutine references like \&foo
or on indirect subroutine
calls like &{$subref}
or $subref->()
.
Method calls are not influenced by prototypes either, because the function to be called is indeterminate at compile time, since the exact code called depends on inheritance.
Because the intent of this feature is primarily to let you define subroutines that work like built-in functions, here are prototypes for some other functions that parse almost exactly like the corresponding built-in.
- Declared as Called as
- sub mylink ($$) mylink $old, $new
- sub myvec ($$$) myvec $var, $offset, 1
- sub myindex ($$;$) myindex &getstring, "substr"
- sub mysyswrite ($$$;$) mysyswrite $buf, 0, length($buf) - $off, $off
- sub myreverse (@) myreverse $a, $b, $c
- sub myjoin ($@) myjoin ":", $a, $b, $c
- sub mypop (+) mypop @array
- sub mysplice (+$$@) mysplice @array, 0, 2, @pushme
- sub mykeys (+) mykeys %{$hashref}
- sub myopen (*;$) myopen HANDLE, $name
- sub mypipe (**) mypipe READHANDLE, WRITEHANDLE
- sub mygrep (&@) mygrep { /foo/ } $a, $b, $c
- sub myrand (;$) myrand 42
- sub mytime () mytime
Any backslashed prototype character represents an actual argument
that must start with that character (optionally preceded by my,
our or local), with the exception of $
, which will
accept any scalar lvalue expression, such as $foo = 7
or
my_function()->[0]
. The value passed as part of @_
will be a
reference to the actual argument given in the subroutine call,
obtained by applying \
to that argument.
You can use the \[]
backslash group notation to specify more than one
allowed argument type. For example:
- sub myref (\[$@%&*])
will allow calling myref() as
- myref $var
- myref @array
- myref %hash
- myref &sub
- myref *glob
and the first argument of myref() will be a reference to a scalar, an array, a hash, a code, or a glob.
Unbackslashed prototype characters have special meanings. Any
unbackslashed @
or %
eats all remaining arguments, and forces
list context. An argument represented by $
forces scalar context. An
&
requires an anonymous subroutine, which, if passed as the first
argument, does not require the sub keyword or a subsequent comma.
A *
allows the subroutine to accept a bareword, constant, scalar expression,
typeglob, or a reference to a typeglob in that slot. The value will be
available to the subroutine either as a simple scalar, or (in the latter
two cases) as a reference to the typeglob. If you wish to always convert
such arguments to a typeglob reference, use Symbol::qualify_to_ref() as
follows:
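A sketch (the sub name foo is illustrative):

```perl
use strict;
use warnings;
use Symbol 'qualify_to_ref';

# Always turn the (*)-slot argument into a typeglob reference,
# whether it arrived as a bareword, a glob, or a glob reference.
sub foo (*) {
    my $fh = qualify_to_ref(shift, caller);
    print {$fh} "via a glob reference\n";
}

foo(*STDOUT);
```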
The +
prototype is a special alternative to $
that will act like
\[@%]
when given a literal array or hash variable, but will otherwise
force scalar context on the argument. This is useful for functions which
should accept either a literal array or an array reference as the argument:
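A sketch of such a function (the name mypush is illustrative):

```perl
use strict;
use warnings;

# With + in the first slot, a literal array arrives as a reference,
# and anything else is evaluated in scalar context.
sub mypush (+@) {
    my $aref = shift;
    die "Not an array reference" unless ref $aref eq 'ARRAY';
    push @$aref, @_;
}

my @list = (1, 2);
mypush @list, 3, 4;      # literal array: passed as \@list
mypush [5, 6], 7;        # an explicit reference works the same way
print "@list\n";         # 1 2 3 4
```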
When using the +
prototype, your function must check that the argument
is of an acceptable type.
A semicolon (;
) separates mandatory arguments from optional arguments.
It is redundant before @
or %
, which gobble up everything else.
As the last character of a prototype, or just before a semicolon, a @
or a %
, you can use _
in place of $
: if this argument is not
provided, $_
will be used instead.
Note how the last three examples in the table above are treated
specially by the parser. mygrep()
is parsed as a true list
operator, myrand()
is parsed as a true unary operator with unary
precedence the same as rand(), and mytime()
is truly without
arguments, just like time(). That is, if you say
- mytime +2;
you'll get mytime() + 2
, not mytime(2)
, which is how it would be parsed
without a prototype. If you want to force a unary function to have the
same precedence as a list operator, add ;
to the end of the prototype:
- sub mygetprotobynumber($;);
- mygetprotobynumber $a > $b; # parsed as mygetprotobynumber($a > $b)
The interesting thing about &
is that you can generate new syntax with it,
provided it's in the initial position:
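For example, a try/catch pair built from & prototypes (this is the construct the next paragraph's "unphooey" refers to):

```perl
use strict;
use warnings;

sub try (&@) {
    my ($try, $catch) = @_;
    eval { &$try };
    if ($@) {
        local $_ = $@;    # make the error available to the handler
        &$catch;
    }
}
sub catch (&) { $_[0] }   # just returns its code block

try {
    die "phooey";
} catch {
    /phooey/ and print "unphooey\n";
};
```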
That prints "unphooey"
. (Yes, there are still unresolved
issues having to do with visibility of @_
. I'm ignoring that
question for the moment. (But note that if we make @_
lexically
scoped, those anonymous subroutines can act like closures... (Gee,
is this sounding a little Lispish? (Never mind.))))
And here's a reimplementation of the Perl grep operator:
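A sketch of such a reimplementation (the name mygrep matches the prototype table above):

```perl
use strict;
use warnings;

sub mygrep (&@) {
    my $code = shift;
    my @result;
    foreach $_ (@_) {             # the block tests each element via $_
        push @result, $_ if &$code;
    }
    return @result;
}

my @picked = mygrep { $_ % 2 } 1 .. 6;
print "@picked\n";   # 1 3 5
```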
Some folks would prefer full alphanumeric prototypes. Alphanumerics have been intentionally left out of prototypes for the express purpose of someday in the future adding named, formal parameters. The current mechanism's main goal is to let module writers provide better diagnostics for module users. Larry feels the notation quite understandable to Perl programmers, and that it will not intrude greatly upon the meat of the module, nor make it harder to read. The line noise is visually encapsulated into a small pill that's easy to swallow.
If you try to use an alphanumeric sequence in a prototype you will generate an optional warning - "Illegal character in prototype...". Unfortunately earlier versions of Perl allowed the prototype to be used as long as its prefix was a valid prototype. The warning may be upgraded to a fatal error in a future version of Perl once the majority of offending code is fixed.
It's probably best to prototype new functions, not retrofit prototyping into older ones. That's because you must be especially careful about silent impositions of differing list versus scalar contexts. For example, if you decide that a function should take just one parameter, like this:
- sub func ($) {
-     my $n = shift;
-     print "you gave me $n\n";
- }
and someone has been calling it with an array or expression returning a list:
- func(@foo);
- func( split /:/ );
Then you've just supplied an automatic scalar in front of their
argument, which can be more than a bit surprising. The old @foo
which used to hold one thing doesn't get passed in. Instead,
func()
now gets passed in a 1
; that is, the number of elements
in @foo
. And the split gets called in scalar context so it
starts scribbling on your @_
parameter list. Ouch!
This is all very powerful, of course, and should be used only in moderation to make the world a better place.
Functions with a prototype of ()
are potential candidates for
inlining. If the result after optimization and constant folding
is either a constant or a lexically-scoped scalar which has no other
references, then it will be used in place of function calls made
without &
. Calls made using &
are never inlined. (See
constant.pm for an easy way to declare most constants.)
The following functions would all be inlined:
- sub pi () { 3.14159 } # Not exact, but close.
- sub PI () { 4 * atan2 1, 1 } # As good as it gets,
- # and it's inlined, too!
- sub ST_DEV () { 0 }
- sub ST_INO () { 1 }
- sub FLAG_FOO () { 1 << 8 }
- sub FLAG_BAR () { 1 << 9 }
- sub FLAG_MASK () { FLAG_FOO | FLAG_BAR }
- sub OPT_BAZ () { not (0x1B58 & FLAG_MASK) }
- sub N () { int(OPT_BAZ) / 3 }
- sub FOO_SET () { 1 if FLAG_MASK & FLAG_FOO }
Be aware that these will not be inlined; as they contain inner scopes, the constant folding doesn't reduce them to a single constant:
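Concretely (the constants from above are repeated so this sketch is self-contained; foo_set and baz_val are illustrative names):

```perl
# Repeating the constants from above so this sketch is self-contained.
sub FLAG_FOO  () { 1 << 8 }
sub FLAG_BAR  () { 1 << 9 }
sub FLAG_MASK () { FLAG_FOO | FLAG_BAR }
sub OPT_BAZ   () { not (0x1B58 & FLAG_MASK) }

# The inner scopes below defeat constant folding, so neither is inlined:
sub foo_set () { if (FLAG_MASK & FLAG_FOO) { 1 } }
sub baz_val () {
    if (OPT_BAZ) { return 23 }
    else         { return 42 }
}
print baz_val(), "\n";   # 42
```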
If you redefine a subroutine that was eligible for inlining, you'll get
a warning by default. (You can use this warning to tell whether or not a
particular subroutine is considered constant.) The warning is
considered severe enough not to be affected by the -w
switch (or its absence) because previously compiled
invocations of the function will still be using the old value of the
function. If you need to be able to redefine the subroutine, you need to
ensure that it isn't inlined, either by dropping the ()
prototype
(which changes calling semantics, so beware) or by thwarting the
inlining mechanism in some other way, such as
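One such trick is to mention a value that can vary at run time, so the body cannot be folded to a constant (the sub name is illustrative):

```perl
# Mentioning a run-time value (here $$, the process id) keeps the
# () sub from being inlined, so it can be redefined without warnings.
sub not_inlined () {
    23 if $$;
}
print not_inlined(), "\n";   # 23 ($$ is always true)
```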
Many built-in functions may be overridden, though this should be tried only occasionally and for good reason. Typically this might be done by a package attempting to emulate missing built-in functionality on a non-Unix system.
Overriding may be done only by importing the name from a module at
compile time--ordinary predeclaration isn't good enough. However, the
use subs
pragma lets you, in effect, predeclare subs
via the import syntax, and these names may then override built-in ones:
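A sketch of overriding chdir this way (the override just records its argument for illustration instead of changing directory):

```perl
use subs 'chdir';      # predeclare so the name overrides the built-in

our $last_dir;
sub chdir {
    $last_dir = shift;            # record instead of changing directory
    # CORE::chdir($last_dir);     # could still delegate to the real one
}

chdir('/tmp');                    # calls our chdir(), not the built-in
print "chdir asked for $last_dir\n";
```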
To unambiguously refer to the built-in form, precede the
built-in name with the special package qualifier CORE::
. For example,
saying CORE::open()
always refers to the built-in open(), even
if the current package has imported some other subroutine called
&open()
from elsewhere. Even though it looks like a regular
function call, it isn't: the CORE:: prefix in that case is part of Perl's
syntax, and works for any keyword, regardless of what is in the CORE
package. Taking a reference to it, that is, \&CORE::open
, only works
for some keywords. See CORE.
Library modules should not in general export built-in names like open
or chdir as part of their default @EXPORT
list, because these may
sneak into someone else's namespace and change the semantics unexpectedly.
Instead, if the module adds that name to @EXPORT_OK
, then it's
possible for a user to import the name explicitly, but not implicitly.
That is, they could say
- use Module 'open';
and it would import the open override. But if they said
- use Module;
they would get the default imports without overrides.
The foregoing mechanism for overriding built-ins is restricted, quite
deliberately, to the package that requests the import. There is a second
method that is sometimes applicable when you wish to override a built-in
everywhere, without regard to namespace boundaries. This is achieved by
importing a sub into the special namespace CORE::GLOBAL::
. Here is an
example that quite brazenly replaces the glob operator with something
that understands regular expressions.
- package REGlob;
- require Exporter;
- @ISA = 'Exporter';
- @EXPORT_OK = 'glob';
- sub import {
- my $pkg = shift;
- return unless @_;
- my $sym = shift;
- my $where = ($sym =~ s/^GLOBAL_// ? 'CORE::GLOBAL' : caller(0));
- $pkg->export($where, $sym, @_);
- }
- sub glob {
- my $pat = shift;
- my @got;
- if (opendir my $d, '.') {
- @got = grep /$pat/, readdir $d;
- closedir $d;
- }
- return @got;
- }
- 1;
And here's how it could be (ab)used:
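A sketch of such (ab)use; the REGlob package from above is inlined in a BEGIN block here only so the fragment runs standalone:

```perl
# Inline the REGlob package from above so this runs standalone.
BEGIN {
    package REGlob;
    require Exporter;
    our @ISA = 'Exporter';
    our @EXPORT_OK = 'glob';
    sub import {
        my $pkg = shift;
        return unless @_;
        my $sym = shift;
        my $where = ($sym =~ s/^GLOBAL_// ? 'CORE::GLOBAL' : caller(0));
        $pkg->export($where, $sym, @_);
    }
    sub glob {
        my $pat = shift;
        my @got;
        if (opendir my $d, '.') {
            @got = grep /$pat/, readdir $d;
            closedir $d;
        }
        return @got;
    }
    $INC{'REGlob.pm'} = 1;   # pretend it was loaded from a file
}

#use REGlob 'GLOBAL_glob';     # would override glob() in ALL namespaces
package Foo;
use REGlob 'glob';             # override glob() in Foo:: only
print "$_\n" for <\.p[lm]\$>;  # now <...> takes a regex
```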
The initial comment shows a contrived, even dangerous example.
By overriding glob globally, you would be forcing the new (and
subversive) behavior for the glob operator for every namespace,
without the complete cognizance or cooperation of the modules that own
those namespaces. Naturally, this should be done with extreme caution--if
it must be done at all.
The REGlob
example above does not implement all the support needed to
cleanly override perl's glob operator. The built-in glob has
different behaviors depending on whether it appears in a scalar or list
context, but our REGlob
doesn't. Indeed, many perl built-ins have such
context sensitive behaviors, and these must be adequately supported by
a properly written override. For a fully functional example of overriding
glob, study the implementation of File::DosGlob
in the standard
library.
When you override a built-in, your replacement should be consistent (if
possible) with the built-in native syntax. You can achieve this by using
a suitable prototype. To get the prototype of an overridable built-in,
use the prototype function with an argument of "CORE::builtin_name"
(see prototype).
Note however that some built-ins can't have their syntax expressed by a
prototype (such as system or chomp). If you override them you won't
be able to fully mimic their original syntax.
The built-ins do, require and glob can also be overridden, but due
to special magic, their original syntax is preserved, and you don't have
to define a prototype for their replacements. (You can't override the
do BLOCK
syntax, though).
require has special additional dark magic: if you invoke your
require replacement as require Foo::Bar
, it will actually receive
the argument "Foo/Bar.pm"
in @_. See require.
And, as you'll have noticed from the previous example, if you override
glob, the <*>
glob operator is overridden as well.
In a similar fashion, overriding the readline function also overrides
the equivalent I/O operator <FILEHANDLE>
. Also, overriding
readpipe also overrides the operators ``
and qx//.
Finally, some built-ins (e.g. exists or grep) can't be overridden.
If you call a subroutine that is undefined, you would ordinarily
get an immediate, fatal error complaining that the subroutine doesn't
exist. (Likewise for subroutines being used as methods, when the
method doesn't exist in any base class of the class's package.)
However, if an AUTOLOAD
subroutine is defined in the package or
packages used to locate the original subroutine, then that
AUTOLOAD
subroutine is called with the arguments that would have
been passed to the original subroutine. The fully qualified name
of the original subroutine magically appears in the global $AUTOLOAD
variable of the same package as the AUTOLOAD
routine. The name
is not passed as an ordinary argument because, er, well, just
because, that's why. (As an exception, a method call to a nonexistent
import or unimport
method is just skipped instead. Also, if
the AUTOLOAD subroutine is an XSUB, there are other ways to retrieve the
subroutine name. See Autoloading with XSUBs in perlguts for details.)
Many AUTOLOAD
routines load in a definition for the requested
subroutine using eval(), then execute that subroutine using a special
form of goto() that erases the stack frame of the AUTOLOAD
routine
without a trace. (See the source to the standard module documented
in AutoLoader, for example.) But an AUTOLOAD
routine can
also just emulate the routine and never define it. For example,
let's pretend that a function that wasn't defined should just invoke
system with those arguments. All you'd do is:
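A sketch of such an AUTOLOAD (echo is used here instead of date or who so the demo stays harmless; $last_program is recorded purely for illustration):

```perl
use strict;
use warnings;

our $AUTOLOAD;
my $last_program;

sub AUTOLOAD {
    my $program = $AUTOLOAD;
    $program =~ s/.*:://;          # strip the package qualifier
    $last_program = $program;      # recorded only for illustration
    return system($program, @_);
}

echo('hello from AUTOLOAD');       # equivalent to system("echo", ...)
```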
In fact, if you predeclare functions you want to call that way, you don't even need parentheses:
- use subs qw(date who ls);
- date;
- who "am", "i";
- ls '-l';
A more complete example of this is the Shell module on CPAN, which can treat undefined subroutine calls as calls to external programs.
Mechanisms are available to help modules writers split their modules into autoloadable files. See the standard AutoLoader module described in AutoLoader and in AutoSplit, the standard SelfLoader modules in SelfLoader, and the document on adding C functions to Perl code in perlxs.
A subroutine declaration or definition may have a list of attributes
associated with it. If such an attribute list is present, it is
broken up at space or colon boundaries and treated as though a
use attributes
had been seen. See attributes for details
about what attributes are currently supported.
Unlike the limitation with the obsolescent use attrs
, the
sub : ATTRLIST
syntax works to associate the attributes with
a pre-declaration, and not just with a subroutine definition.
The attributes must be valid as simple identifier names (without any punctuation other than the '_' character). They may have a parameter list appended, which is only checked for whether its parentheses ('(',')') nest properly.
Examples of valid syntax (even though the attributes are unknown):
- sub fnord (&\%) : switch(10,foo(7,3))  :  expensive;
- sub plugh () : Ugly('\(") : Bad;
- sub xyzzy : _5x5 { ... }
Examples of invalid syntax:
- sub fnord : switch(10,foo(); # ()-string not balanced
- sub snoid : Ugly('('); # ()-string not balanced
- sub xyzzy : 5x5; # "5x5" not a valid identifier
- sub plugh : Y2::north; # "Y2::north" not a simple identifier
- sub snurt : foo + bar; # "+" not a colon or space
The attribute list is passed as a list of constant strings to the code which associates them with the subroutine. In particular, the second example of valid syntax above currently looks like this in terms of how it's parsed and invoked:
- use attributes __PACKAGE__, \&plugh, q[Ugly('\(")], 'Bad';
For further details on attribute lists and their manipulation, see attributes and Attribute::Handlers.
See Function Templates in perlref for more about references and closures. See perlxs if you'd like to learn about calling C subroutines from Perl. See perlembed if you'd like to learn about calling Perl subroutines from C. See perlmod to learn about bundling up your functions in separate files. See perlmodlib to learn what library modules come standard on your system. See perlootut to learn how to make object method calls.
perlsymbian - Perl version 5 on Symbian OS
This document describes various features of the Symbian operating system that will affect how Perl version 5 (hereafter just Perl) is compiled and/or runs.
NOTE: this port (as of 0.4.1) does not compile into a Symbian OS GUI application, but instead it results in a Symbian DLL. The DLL includes a C++ class called CPerlBase, which one can then (derive from and) use to embed Perl into applications, see symbian/README.
The base port of Perl to Symbian only implements the basic POSIX-like functionality; it does not implement any further Symbian or Series 60, Series 80, or UIQ bindings for Perl.
It is also possible to generate Symbian executables for "miniperl" and "perl", but since there is no standard command line interface for Symbian (nor full keyboards in the devices), these are useful mainly as demonstrations.
(0) You need to have the appropriate Symbian SDK installed.
- These instructions have been tested under various Nokia Series 60
- Symbian SDKs (1.2 to 2.6, 2.8 should also work, 1.2 compiles but
- does not work), Series 80 2.0, and Nokia 7710 (Series 90) SDK.
- You can get the SDKs from Forum Nokia (L<http://www.forum.nokia.com/>).
- A very rough port ("it compiles") to UIQ 2.1 has also been made.
- A prerequisite for any of the SDKs is to install ActivePerl
- from ActiveState, L<http://www.activestate.com/Products/ActivePerl/>
- Having the SDK installed also means that you need to have either
- the Metrowerks CodeWarrior installed (2.8 and 3.0 were used in testing)
- or the Microsoft Visual C++ 6.0 installed (SP3 minimum, SP5 recommended).
- Note that for example the Series 60 2.0 VC SDK installation talks
- about ActivePerl build 518, which no longer exists (as of mid-2005)
- at the ActiveState website. The ActivePerl 5.8.4 build 810 was
- used successfully for compiling Perl on Symbian. The 5.6.x ActivePerls
- do not work.
- Other SDKs or compilers like Visual.NET, command-line-only
- Visual.NET, Borland, GnuPoc, or sdk2unix have not been tried.
- These instructions almost certainly won't work with older Symbian
- releases or other SDKs. Patches to get this port running in other
- releases, SDKs, compilers, platforms, or devices are naturally welcome.
(1) Get a Perl source code distribution (for example the file perl-5.9.2.tar.gz is fine) from http://www.cpan.org/src/ and unpack it into the C:/Symbian directory of your Windows system.
(2) Change to the perl source directory.
- cd c:\Symbian\perl-5.x.x
(3) Run the following script using the perl coming with the SDK
- perl symbian\config.pl
- You must use cmd.exe; the Cygwin shell will not work.
- The PATH must include the SDK tools, including a Perl,
- which should be the case under cmd.exe. If you do not
- have that, see the end of symbian\sdk.pl for notes of
- how your environment should be set up for Symbian compiles.
(4) Build the project, either by
- make all
- in cmd.exe or by using either the Metrowerks CodeWarrior
- or the Visual C++ 6.0, or the Visual Studio 8 (the Visual C++
- 2005 Express Edition works fine).
- If you use the VC IDE, you will have to run F<symbian\config.pl>
- first using the cmd.exe, and then run 'make win.mf vc6.mf' to generate
- the VC6 makefiles and workspaces. "make vc6" will compile for the VC6,
- and "make cw" for the CodeWarrior.
- The following SDK and compiler configurations and Nokia phones were
- tested at some point in time (+ = compiled and PerlApp run, - = not),
- both for Perl 5.8.x and 5.9.x:
- SDK | VC | CW |
- --------+----+----+---
- S60 1.2 | + | + | 3650 (*)
- S60 2.0 | + | + | 6600
- S60 2.1 | - | + | 6670
- S60 2.6 | + | + | 6630
- S60 2.8 | + | + | (not tested in a device)
- S80 2.6 | - | + | 9300
- S90 1.1 | + | - | 7710
- UIQ 2.1 | - | + | (not tested in a device)
- (*) Compiles but does not work, unfortunately, a problem with Symbian.
- If you are using the 'make' directly, it is the GNU make from the SDKs,
- and it will invoke the right make commands for the Windows emulator
- build and the Arm target builds ('thumb' by default) as necessary.
- The build scripts assume the 'absolute style' SDK installs under C:,
- the 'subst style' will not work.
- If using the VC IDE, to build use for example the File->Open Workspace->
- C:\Symbian\8.0a\S60_2nd_FP2\epoc32\build\symbian\perl\perl\wins\perl.dsw
- The emulator binaries will appear in the same directory.
- If using the VC IDE, you will see a lot of warnings at the beginning
- of the build because many headers mentioned by the source cannot
- be found, but this is not serious since those headers are not used.
- The Metrowerks compiler will give a lot of warnings about unused
- variables and empty declarations; you can ignore those.
- When the Windows and Arm DLLs are built, do not be scared by the very
- long messages whizzing by: it is the "export freeze" phase, where the
- whole (rather large) API of Perl is listed.
- Once the build is completed you need to create the DLL SIS file by
- make perldll.sis
- which will create the file perlXYZ.sis (the XYZ being the Perl version)
- which you can then install into your Symbian device: an easy way
- to do this is to send them via Bluetooth or infrared and just open
- the messages.
- Since the total size of all Perl SIS files once installed is
- over 2 MB, it is recommended to do the installation into a
- memory card (drive E:) instead of the C: drive.
- The size of the perlXYZ.SIS is about 370 kB, but once it is in the
- device it takes about 750 kB (according to the application manager).
- The perlXYZ.sis includes only the Perl DLL: to create an additional
- SIS file which includes some of the standard (pure) Perl libraries,
- issue the command
- make perllib.sis
- Some of the standard Perl libraries are included, but not all:
- see L</HISTORY> or F<symbian\install.cfg> for more details
- (250 kB -> 700 kB).
- Some of the standard Perl XS extensions (see L</HISTORY>) are
- also available:
- make perlext.sis
- which will create perlXYZext.sis (290 kB -> 770 kB).
- To compile the demonstration application PerlApp you need first to
- install the Perl headers under the SDK.
- To install the Perl headers and the class CPerlBase documentation
- so that you no longer need the Perl sources around to compile Perl
- applications using the SDK:
- make sdkinstall
- The destination directory is C:\Symbian\perl\X.Y.Z. For more
- details, see F<symbian\PerlBase.pod>.
- Once the headers have been installed, you can create a SIS for
- the PerlApp:
- make perlapp.sis
- The perlapp.sis (11 kB -> 16 kB) will be built in the symbian
- subdirectory, but a copy will also be made to the main directory.
- If you want to package the Perl DLLs (one for WINS, one for ARMI),
- the headers, and the documentation:
- make perlsdk.zip
- which will create perlXYZsdk.zip that can be used in another
- Windows system with the SDK, without having to compile Perl in
- that system.
- If you want to package the PerlApp sources:
- make perlapp.zip
- If you want to package the perl.exe and miniperl.exe, you
- can use the perlexe.sis and miniperlexe.sis make targets.
- You also probably want the perllib.sis for the libraries
- and maybe even the perlapp.sis for the recognizer.
- The make target 'allsis' combines all the above SIS targets.
- To clean up after compilation you can use either of
- make clean
- make distclean
- depending on how clean you want to be.
If you see this right after "make"
- cat makefile.sh >makefile
- 'cat' is not recognized as an internal or external command,
- operable program or batch file.
it means you need to (re)run the symbian\config.pl.
If you get the error
- 'perl' is not recognized as an internal or external command,
- operable program or batch file.
you may need to reinstall the ActivePerl.
If you see this
- ren makedef.pl nomakedef.pl
- The system cannot find the file specified.
- C:\Symbian\...\make.exe: [rename_makedef] Error 1 (ignored)
please ignore it since it is nothing serious (the build process renames the Perl makedef.pl to nomakedef.pl to avoid confusing it with a makedef.pl of the SDK).
The PerlApp application demonstrates how to embed Perl interpreters in a Symbian application. The "Time" menu item runs the following Perl code:
- print "Running in ", $^O, "\n", scalar localtime
the "Oneliner" item allows one to type in Perl code, and the "Run" item opens a file chooser for selecting a Perl file to run.
The PerlApp also is started when the "Perl recognizer" (also included and installed) detects a Perl file being activated through the GUI, and offers either to install it under \Perl (if the Perl file is in the inbox of the messaging application) or to run it (if the Perl file is under \Perl).
In the symbian subdirectory there is sisify.pl utility which can be used to package Perl scripts and/or Perl library directories into SIS files, which can be installed to the device. To run the sisify.pl utility, you will need to have the 'makesis' and 'uidcrc' utilities already installed. If you don't have the Win32 SDKs, you may try for example http://gnupoc.sourceforge.net/ or http://symbianos.org/~andreh/.
First of all note that you have full access to the Symbian device when using Perl: you can do a lot of damage to your device (like removing system files) unless you are careful. Please do take backups before doing anything.
The Perl port has been done for the most part using the Symbian standard POSIX-ish STDLIB library. It is a reasonably complete library, but certain corners of such emulation libraries that tend to be left unimplemented on non-UNIX platforms have been left unimplemented also this time: fork(), signals(), user/group ids, select() working for sockets, non-blocking sockets, and so forth. See the file symbian/config.sh and look for 'undef' to find the unsupported APIs (or from Perl use Config).
The filesystem of Symbian devices uses DOSish syntax, "drives" separated from paths by a colon, and backslashes for the path. The exact assignment of the drives probably varies between platforms, but for example in Series 60 you might see C: as the (flash) main memory, D: as the RAM drive, E: as the memory card (MMC), Z: as the ROM. In Series 80 D: is the memory card. As far the devices go the NUL: is the bit bucket, the COMx: are the serial lines, IRCOMx: are the IR ports, TMP: might be C:\System\Temp. Remember to double those backslashes in doublequoted strings.
The Perl DLL is installed in \System\Libs\. The Perl libraries and extension DLLs are installed in \System\Libs\Perl\X.Y.Z\. The PerlApp is installed in \System\Apps\, and the SIS also installs a couple of demo scripts in \Perl\ (C:\Mydocs\Perl\ on Nokia 7710).
Note that the Symbian filesystem is very picky: it strongly prefers the \ instead of the /.
When doing XS / Symbian C++ programming include first the Symbian headers, then any standard C/POSIX headers, then Perl headers, and finally any application headers.
New() and Copy() are unfortunately used by both Symbian and Perl code so you'll have to play cpp games if you need them. PerlBase.h undefines the Perl definitions and redefines them as PerlNew() and PerlCopy().
Lots. See symbian/TODO.
As of Perl Symbian port version 0.4.1, no part of Perl's standard regression test suite has been run on a real Symbian device using the ported Perl, so innumerable bugs may lie in wait. Therefore there is absolutely no warranty.
When creating and extending application programming interfaces (APIs) for Symbian or Series 60 or Series 80 or Series 90, it is suggested that trademarks, registered trademarks, or trade names are not used in the API names. Instead, developers should consider basing the API naming on the existing (C++, or maybe Java) public component and API naming, modified as appropriate by the rules of the programming language the new APIs are for.
Nokia is a registered trademark of Nokia Corporation. Nokia's product names are trademarks or registered trademarks of Nokia. Other product and company names mentioned herein may be trademarks or trade names of their respective owners.
Jarkko Hietaniemi
Copyright (c) 2004-2005 Nokia. All rights reserved.
Copyright (c) 2006-2007 Jarkko Hietaniemi.
The Symbian port is licensed under the same terms as Perl itself.
0.1.0: April 2005
(This will show as "0.01" in the Symbian Installer.)
- - The console window is a very simple console indeed: one can
- get the newline with "000" and the "C" button is a backspace.
- Do not expect a terminal capable of vt100 or ANSI sequences.
- The console is also "ASCII", you cannot input e.g. any accented
- letters. Because of obvious physical constraints the console is
- also very small: (in Nokia 6600) 22 columns, 17 rows.
- - The following libraries are available:
- AnyDBM_File AutoLoader base Carp Config Cwd constant
- DynaLoader Exporter File::Spec integer lib strict Symbol
- vars warnings XSLoader
- - The following extensions are available:
- attributes Compress::Zlib Cwd Data::Dumper Devel::Peek Digest::MD5 DynaLoader
- Fcntl File::Glob Filter::Util::Call IO List::Util MIME::Base64
- PerlIO::scalar PerlIO::via SDBM_File Socket Storable Time::HiRes
- - The following extensions are missing for various technical reasons:
- B ByteLoader Devel::DProf Devel::PPPort Encode GDBM_File
- I18N::Langinfo IPC::SysV NDBM_File Opcode PerlIO::encoding POSIX
- re Safe Sys::Hostname Sys::Syslog
- threads threads::shared Unicode::Normalize
- - Using MakeMaker or the Module::* to build and install modules
- is not supported.
- - Building XS other than the ones in the core is not supported.
Since this is 0.something release, any future releases are almost guaranteed to be binary incompatible. As a sign of this the Symbian symbol exports are kept unfrozen and the .def files fully rebuilt every time.
0.2.0: October 2005
- - Perl 5.9.3 (patch level 25741)
- - Compress::Zlib and IO::Zlib supported
- - sisify.pl added
We maintain the binary incompatibility.
0.3.0: October 2005
- - Perl 5.9.3 (patch level 25911)
- - Series 80 2.0 and UIQ 2.1 support
We maintain the binary incompatibility.
0.4.0: November 2005
- - Perl 5.9.3 (patch level 26052)
- - adding a sample Symbian extension
We maintain the binary incompatibility.
0.4.1: December 2006
- - Perl 5.9.5-to-be (patch level 30002)
- - added extensions: Compress/Raw/Zlib, Digest/SHA,
- Hash/Util, Math/BigInt/FastCalc, Text/Soundex, Time/Piece
- - port to S90 1.1 by Alexander Smishlajev
We maintain the binary incompatibility.
0.4.2: March 2007
- - catchup with Perl 5.9.5-to-be (patch level 30812)
- - tested to build with Microsoft Visual C++ 2005 Express Edition
- (which uses Microsoft Visual C 8, instead of the old VC6),
- SDK used for testing S60_2nd_FP3 aka 8.1a
We maintain the binary incompatibility.
perlsyn - Perl syntax
A Perl program consists of a sequence of declarations and statements which run from the top to the bottom. Loops, subroutines, and other control structures allow you to jump around within the code.
Perl is a free-form language: you can format and indent it however you like. Whitespace serves mostly to separate tokens, unlike languages like Python where it is an important part of the syntax, or Fortran where it is immaterial.
Many of Perl's syntactic elements are optional. Rather than requiring you to put parentheses around every function call and declare every variable, you can often leave such explicit elements off and Perl will figure out what you meant. This is known as Do What I Mean, abbreviated DWIM. It allows programmers to be lazy and to code in a style with which they are comfortable.
Perl borrows syntax and concepts from many languages: awk, sed, C, Bourne Shell, Smalltalk, Lisp and even English. Other languages have borrowed syntax from Perl, particularly its regular expression extensions. So if you have programmed in another language you will see familiar pieces in Perl. They often work the same, but see perltrap for information about how they differ.
The only things you need to declare in Perl are report formats and subroutines (and sometimes not even subroutines). A scalar variable holds the undefined value (undef) until it has been assigned a defined value, which is anything other than undef. When used as a number, undef is treated as 0; when used as a string, it is treated as the empty string, ""; and when used as a reference that isn't being assigned to, it is treated as an error. If you enable warnings, you'll be notified of an uninitialized value whenever you treat undef as a string or a number. Well, usually. Boolean contexts, such as:
- if ($a) {}
are exempt from warnings (because they care about truth rather than definedness). Operators such as ++, --, +=, -=, and .=, that operate on undefined variables such as:
- undef $a;
- $a++;
are also always exempt from such warnings.
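The exemption can be seen directly with warnings enabled. A runnable sketch (variable names are illustrative):

```perl
use strict;
use warnings;

my $x;           # $x holds undef
$x++;            # exempt: no "uninitialized" warning; $x is now 1

my $y;
my $z = $y + 1;  # this one does warn: "Use of uninitialized value"
print "$x $z\n";
```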
A declaration can be put anywhere a statement can, but has no effect on
the execution of the primary sequence of statements: declarations all
take effect at compile time. All declarations are typically put at
the beginning or the end of the script. However, if you're using
lexically-scoped private variables created with my(),
state(), or our(), you'll have to make sure
your format or subroutine definition is within the same block scope
as the my if you expect to be able to access those private variables.
Declaring a subroutine allows a subroutine name to be used as if it were a
list operator from that point forward in the program. You can declare a
subroutine without defining it by saying sub name, thus:
- sub myname;
- $me = myname $0 or die "can't get myname";
A bare declaration like that declares the function to be a list operator, not a unary operator, so you have to be careful to use parentheses (or "or" instead of ||). The || operator binds too tightly to use after list operators; it becomes part of the last element. You can always use parentheses around the list operator's arguments to turn the list operator back into something that behaves more like a function call. Alternatively, you can use the prototype ($) to turn the subroutine into a unary operator:
- sub myname ($);
- $me = myname $0 || die "can't get myname";
That now parses as you'd expect, but you still ought to get in the habit of using parentheses in that situation. For more on prototypes, see perlsub.
Subroutine declarations can also be loaded up with the require statement
or both loaded and imported into your namespace with a use statement.
See perlmod for details on this.
A statement sequence may contain declarations of lexically-scoped variables, but apart from declaring a variable name, the declaration acts like an ordinary statement, and is elaborated within the sequence of statements as if it were an ordinary statement. That means it actually has both compile-time and run-time effects.
Text from a "#" character until the end of the line is a comment, and is ignored. Exceptions include "#" inside a string or regular expression.
The only kind of simple statement is an expression evaluated for its
side-effects. Every simple statement must be terminated with a
semicolon, unless it is the final statement in a block, in which case
the semicolon is optional. But put the semicolon in anyway if the
block takes up more than one line, because you may eventually add
another line. Note that there are operators like eval {}, sub {}, and do {} that look like compound statements, but aren't--they're just TERMs in an expression--and thus need an explicit termination when used as the last item in a statement.
The number 0, the strings '0' and "", the empty list (), and undef are all false in a boolean context. All other values are true. Negation of a true value by ! or not returns a special false value. When evaluated as a string it is treated as "", but as a number, it is treated as 0. Most Perl operators that return true or false behave this way.
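For instance, a runnable sketch (the value list is illustrative):

```perl
use strict;
use warnings;

# 0, '0', "", (), and undef are false; everything else is true,
# including '0.0' and '00', which are non-empty strings other than '0'.
foreach my $case ([0, "number 0"], ['0', "string '0'"], ['', "empty string"],
                  ['0.0', "string '0.0'"], ['00', "string '00'"]) {
    my ($value, $label) = @$case;
    printf "%-14s is %s\n", $label, $value ? "true" : "false";
}
```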
Any simple statement may optionally be followed by a SINGLE modifier, just before the terminating semicolon (or block ending). The possible modifiers are:
The EXPR following the modifier is referred to as the "condition". Its truth or falsehood determines how the modifier will behave.
if executes the statement once if and only if the condition is true. unless is the opposite: it executes the statement unless the condition is true (that is, if the condition is false).
The for(each) modifier is an iterator: it executes the statement once for each item in the LIST (with $_ aliased to each item in turn).
while repeats the statement while the condition is true. until does the opposite: it repeats the statement until the condition is true (or while the condition is false):
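A few illustrative one-liners using each modifier (a sketch; the values are made up):

```perl
use strict;
use warnings;

my @evens;
my $n = 0;

print "starting\n" if 1;            # if: runs when condition is true
print "never\n"    unless 1;        # unless: runs when condition is false
push @evens, $_ * 2 for 1 .. 3;     # foreach: $_ aliases each list item
$n++ while $n < 5;                  # while: repeats while true
$n-- until $n <= 3;                 # until: repeats until true
print "@evens and n=$n\n";          # "2 4 6 and n=3"
```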
The while and until modifiers have the usual "while loop" semantics (conditional evaluated first), except when applied to a do-BLOCK (or to the Perl4 do-SUBROUTINE statement), in which case the block executes once before the conditional is evaluated.
This is so that you can write loops like:
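The code example here appears to have been lost in extraction; the classic illustration (reading lines until a lone dot) goes along these lines, using an in-memory handle instead of STDIN so the sketch is self-contained:

```perl
use strict;
use warnings;

# Simulated STDIN; with a real terminal the shape is identical.
open my $in, '<', \"first\nsecond\n.\nignored\n" or die $!;

my @lines;
my $line;
do {
    $line = <$in>;
    push @lines, $line if defined $line && $line ne ".\n";
} until !defined($line) || $line eq ".\n";

print @lines;   # the block runs once before the condition is tested
```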
See do. Note also that the loop control statements described
later will NOT work in this construct, because modifiers don't take
loop labels. Sorry. You can always put another block inside of it
(for next) or around it (for last) to do that sort of thing.
For next, just double the braces:
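The sample block is missing here; the doubled-brace idiom looks roughly like this (the counts are illustrative):

```perl
use strict;
use warnings;

my ($x, $z) = (0, 4);
my $work = 0;
do {{
    next if $x % 2;    # "next" exits the inner bare block,
                       # so the do/until condition still runs
    $work++;
}} until $x++ >= $z;

print "did work $work times\n";
```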
For last, you have to be more elaborate:
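Again the sample was lost; for last you wrap the do-BLOCK in a labeled bare block, roughly:

```perl
use strict;
use warnings;

my ($x, $z) = (0, 10);
my @seen;
LOOP: {
    do {
        last LOOP if $x == 4;   # "last" needs a real loop; the
                                # labeled bare block provides one
        push @seen, $x;
    } while $x++ <= $z;
}
print "@seen\n";   # 0 1 2 3
```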
NOTE: The behaviour of a my, state, or our modified with a statement modifier conditional or loop construct (for example, my $x if ...) is undefined. The value of the my variable may be undef, any previously assigned value, or possibly anything else. Don't rely on it. Future versions of perl might do something different from the version of perl you try it out on. Here be dragons.
The when modifier is an experimental feature that first appeared in Perl 5.14. To use it, you should include a use v5.14 declaration. (Technically, it requires only the switch feature, but that aspect of it was not available before 5.14.) Operative only from within a foreach loop or a given block, it executes the statement only if the smartmatch $_ ~~ EXPR is true. If the statement executes, it is followed by a next from inside a foreach and break from inside a given.
Under the current implementation, the foreach loop can be anywhere within the when modifier's dynamic scope, but must be within the given block's lexical scope. This restriction may be relaxed in a future release. See Switch Statements below.
In Perl, a sequence of statements that defines a scope is called a block. Sometimes a block is delimited by the file containing it (in the case of a required file, or the program as a whole), and sometimes a block is delimited by the extent of a string (in the case of an eval).
But generally, a block is delimited by curly brackets, also known as braces. We will call this syntactic construct a BLOCK.
The following compound statements may be used to control flow:
- if (EXPR) BLOCK
- if (EXPR) BLOCK else BLOCK
- if (EXPR) BLOCK elsif (EXPR) BLOCK ...
- if (EXPR) BLOCK elsif (EXPR) BLOCK ... else BLOCK
- unless (EXPR) BLOCK
- unless (EXPR) BLOCK else BLOCK
- unless (EXPR) BLOCK elsif (EXPR) BLOCK ...
- unless (EXPR) BLOCK elsif (EXPR) BLOCK ... else BLOCK
- given (EXPR) BLOCK
- LABEL while (EXPR) BLOCK
- LABEL while (EXPR) BLOCK continue BLOCK
- LABEL until (EXPR) BLOCK
- LABEL until (EXPR) BLOCK continue BLOCK
- LABEL for (EXPR; EXPR; EXPR) BLOCK
- LABEL for VAR (LIST) BLOCK
- LABEL for VAR (LIST) BLOCK continue BLOCK
- LABEL foreach (EXPR; EXPR; EXPR) BLOCK
- LABEL foreach VAR (LIST) BLOCK
- LABEL foreach VAR (LIST) BLOCK continue BLOCK
- LABEL BLOCK
- LABEL BLOCK continue BLOCK
- PHASE BLOCK
The experimental given statement is not automatically enabled; see Switch Statements below for how to do so, and the attendant caveats.
Unlike in C and Pascal, in Perl these are all defined in terms of BLOCKs, not statements. This means that the curly brackets are required--no dangling statements allowed. If you want to write conditionals without curly brackets, there are several other ways to do it. The following all do the same thing:
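The four equivalent spellings were lost in extraction; classically they are shown with open() || die. A runnable sketch with a plain condition instead:

```perl
use strict;
use warnings;

my $ok = 0;      # pretend this is the result of some test
my @ran;

if (!$ok) { push @ran, "if-not" }        # statement with BLOCK
push @ran, "unless" unless $ok;          # statement modifier
$ok || push @ran, "or";                  # short-circuit operator
$ok ? () : push @ran, "ternary";         # conditional operator

print "@ran\n";   # all four conditionals fired
```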
The if statement is straightforward. Because BLOCKs are always bounded by curly brackets, there is never any ambiguity about which if an else goes with. If you use unless in place of if, the sense of the test is reversed. Like if, unless can be followed by else. unless can even be followed by one or more elsif statements, though you may want to think twice before using that particular language construct, as everyone reading your code will have to think at least twice before they can understand what's going on.
The while statement executes the block as long as the expression is true.
The until statement executes the block as long as the expression is false.
The LABEL is optional, and if present, consists of an identifier followed
by a colon. The LABEL identifies the loop for the loop control
statements next, last, and redo.
If the LABEL is omitted, the loop control statement
refers to the innermost enclosing loop. This may include dynamically looking back through your call-stack at run time to find the LABEL. Such desperate behavior triggers a warning if you use the use warnings pragma or the -w flag.
If there is a continue BLOCK, it is always executed just before the
conditional is about to be evaluated again. Thus it can be used to
increment a loop variable, even when the loop has been continued via
the next statement.
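A small sketch of a continue block incrementing the loop variable even when the loop body is skipped via next:

```perl
use strict;
use warnings;

my $i = 0;
my @kept;
while ($i < 6) {
    next if $i % 2;        # next jumps to the continue block,
    push @kept, $i;        # so $i still gets incremented
} continue {
    $i++;
}
print "@kept\n";   # 0 2 4
```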
When a block is preceded by a compilation phase keyword such as BEGIN, END, INIT, CHECK, or UNITCHECK, then the block will run only during the corresponding phase of execution. See perlmod for more details.
Extension modules can also hook into the Perl parser to define new kinds of compound statements. These are introduced by a keyword which the extension recognizes, and the syntax following the keyword is defined entirely by the extension. If you are an implementor, see PL_keyword_plugin in perlapi for the mechanism. If you are using such a module, see the module's documentation for details of the syntax that it defines.
The next command starts the next iteration of the loop:
- LINE: while (<STDIN>) {
- next LINE if /^#/; # discard comments
- ...
- }
The last command immediately exits the loop in question. The
continue block, if any, is not executed:
- LINE: while (<STDIN>) {
- last LINE if /^$/; # exit when done with header
- ...
- }
The redo command restarts the loop block without evaluating the
conditional again. The continue block, if any, is not executed.
This command is normally used by programs that want to lie to themselves
about what was just input.
For example, when processing a file like /etc/termcap, if your input lines might end in backslashes to indicate continuation, you want to skip ahead and get the next record.
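The code example did not survive extraction; the traditional version, reading from an in-memory handle here so it can run standalone, is roughly:

```perl
use strict;
use warnings;

# Simulated input: the first line ends in a backslash (continuation).
open my $fh, '<', \"foo\\\nbar\nbaz\n" or die $!;

my @records;
while (<$fh>) {
    chomp;
    if (s/\\$//) {
        $_ .= <$fh>;           # glue the next physical line on
        redo unless eof($fh);  # and re-run the block on the result
    }
    push @records, $_;         # now process the full logical line
}
print "@records\n";   # "foobar baz"
```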
which is Perl shorthand for the more explicitly written version:
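The explicitly spelled-out version was likewise lost; it reads roughly like this (a lexical handle stands in for ARGV so the sketch runs on its own):

```perl
use strict;
use warnings;

open my $fh, '<', \"foo\\\nbar\nbaz\n" or die $!;

my @records;
my $line;
LINE: while (defined($line = <$fh>)) {
    chomp $line;
    if ($line =~ s/\\$//) {
        $line .= <$fh>;
        redo LINE unless eof($fh);
    }
    push @records, $line;   # now process $line
}
print "@records\n";
```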
Note that if there were a continue block on the above code, it would
get executed only on lines discarded by the regex (since redo skips the
continue block). A continue block is often used to reset line counters
or m?pat? one-time matches:
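The example here was lost as well; the idea is that the continue block prints per-line state and resets it at end of each file. A self-contained sketch that writes two temporary files and watches $. reset when ARGV is closed:

```perl
use strict;
use warnings;
use File::Temp qw(tempdir);

# Create two small input files.
my $dir = tempdir(CLEANUP => 1);
for my $name (qw(one two)) {
    open my $out, '>', "$dir/$name" or die $!;
    print $out "a\nb\n";
    close $out;
}

local @ARGV = ("$dir/one", "$dir/two");
my @linenos;
while (<>) {
    # process $_ here
} continue {
    push @linenos, $.;        # current line number
    close ARGV if eof;        # closing ARGV resets $. per file
}
print "@linenos\n";   # 1 2 1 2
```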
If the word while is replaced by the word until, the sense of the test is reversed, but the conditional is still tested before the first iteration.
Loop control statements don't work in an if or unless, since they aren't loops. You can double the braces to make them such, though.
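A sketch of that trick (the words and patterns are illustrative):

```perl
use strict;
use warnings;

my @out;
for my $word (qw(fred barney wilma)) {
    local $_ = $word;
    if (/./) {{
        last if /fred/;        # leaves just the doubled bare block
        next if /barney/;      # ditto: same effect as "last" here
        push @out, $_;
    }}
    push @out, "after-$word";
}
print "@out\n";
```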
This is caused by the fact that a block by itself acts as a loop that executes once, see Basic BLOCKs.
The form while/if BLOCK BLOCK, available in Perl 4, is no longer available. Replace any occurrence of if BLOCK by if (do BLOCK).
Perl's C-style for loop works like the corresponding while loop; that means that this:
- for ($i = 1; $i < 10; $i++) {
- ...
- }
is the same as this:
- $i = 1;
- while ($i < 10) {
- ...
- } continue {
- $i++;
- }
There is one minor difference: if variables are declared with my in the initialization section of the for, the lexical scope of those variables is exactly the for loop (the body of the loop and the control sections).
Besides the normal array index looping, for can lend itself to many other interesting applications. Here's one that avoids the problem you get into if you explicitly test for end-of-file on an interactive file descriptor, causing your program to appear to hang.
- $on_a_tty = -t STDIN && -t STDOUT;
- sub prompt { print "yes? " if $on_a_tty }
- for ( prompt(); <STDIN>; prompt() ) {
- # do something
- }
Using readline (or the operator form, <EXPR>) as the conditional of a for loop is shorthand for the following. This behaviour is the same as a while loop conditional.
- for ( prompt(); defined( $_ = <STDIN> ); prompt() ) {
- # do something
- }
The foreach loop iterates over a normal list value and sets the variable VAR to be each element of the list in turn. If the variable is preceded with the keyword my, then it is lexically scoped, and is therefore visible only within the loop. Otherwise, the variable is implicitly local to the loop and regains its former value upon exiting the loop. If the variable was previously declared with my, it uses that variable instead of the global one, but it's still localized to the loop. This implicit localization occurs only in a foreach loop.
The foreach keyword is actually a synonym for the for keyword, so you can use either. If VAR is omitted, $_ is set to each value.
If any element of LIST is an lvalue, you can modify it by modifying VAR inside the loop. Conversely, if any element of LIST is NOT an lvalue, any attempt to modify that element will fail. In other words, the foreach loop index variable is an implicit alias for each item in the list that you're looping over.
If any part of LIST is an array, foreach will get very confused if you add or remove elements within the loop body, for example with splice. So don't do that.
foreach probably won't do what you expect if VAR is a tied or other special variable. Don't do that either.
Examples:
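The example list vanished in extraction; the classic perlsyn examples are along these lines:

```perl
use strict;
use warnings;

my @ary = ("foo one", "foo two");
for (@ary) { s/foo/bar/ }          # modifies @ary in place via $_
print "@ary\n";                    # "bar one bar two"

my @elements = (1, 2, 3);
for my $elem (@elements) {
    $elem *= 2;                    # $elem aliases each element
}
print "@elements\n";               # "2 4 6"

for my $count (reverse(1 .. 3), "BOOM") {
    print "$count ";               # 3 2 1 BOOM
}
print "\n";
```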
Here's how a C programmer might code up a particular algorithm in Perl:
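This code block was also lost; the comparison traditionally uses nested C-style loops, roughly (data values here are illustrative):

```perl
use strict;
use warnings;

my @ary1 = (1, 10);
my @ary2 = (2, 8);
for (my $i = 0; $i < @ary1; $i++) {
    for (my $j = 0; $j < @ary2; $j++) {
        if ($ary1[$i] > $ary2[$j]) {
            last;    # can't easily jump to the *outer* loop's next
        }
        $ary1[$i] += $ary2[$j];
    }
    # this is where that "last" lands
}
print "@ary1\n";
```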
Whereas here's how a Perl programmer more comfortable with the idiom might do it:
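And the idiomatic counterpart, also reconstructed with the same illustrative data:

```perl
use strict;
use warnings;

my @ary1 = (1, 10);
my @ary2 = (2, 8);
OUTER: for my $wid (@ary1) {
    INNER: for my $jet (@ary2) {
        next OUTER if $wid > $jet;   # jumps straight to the next $wid
        $wid += $jet;                # $wid aliases the @ary1 element
    }
}
print "@ary1\n";   # same result as the C-style version
```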
See how much easier this is? It's cleaner, safer, and faster. It's cleaner because it's less noisy. It's safer because if code gets added between the inner and outer loops later on, the new code won't be accidentally executed. The next explicitly iterates the other loop rather than merely terminating the inner one. And it's faster because Perl executes a foreach statement more rapidly than it would the equivalent for loop.
A BLOCK by itself (labeled or not) is semantically equivalent to a loop that executes once. Thus you can use any of the loop control statements in it to leave or restart the block. (Note that this is NOT true in eval{}, sub{}, or, contrary to popular belief, do{} blocks, which do NOT count as loops.) The continue block is optional.
The BLOCK construct can be used to emulate case structures.
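The original SWITCH example was dropped by the extractor; it looks like this:

```perl
use strict;
use warnings;

my ($abc, $def, $xyz, $nothing) = (0, 0, 0, 0);
local $_ = "defensive";

SWITCH: {
    if (/^abc/) { $abc = 1; last SWITCH; }
    if (/^def/) { $def = 1; last SWITCH; }
    if (/^xyz/) { $xyz = 1; last SWITCH; }
    $nothing = 1;   # the "default" case
}
print "def matched\n" if $def;
```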
You'll also find the foreach loop used to create a topicalizer and a switch:
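The missing example uses for to topicalize $_, roughly:

```perl
use strict;
use warnings;

my $var = "xyzzy";
my ($abc, $def, $xyz, $nothing) = (0, 0, 0, 0);

SWITCH: for ($var) {          # aliases $_ to $var: a "topicalizer"
    if (/^abc/) { $abc = 1; last SWITCH; }
    if (/^def/) { $def = 1; last SWITCH; }
    if (/^xyz/) { $xyz = 1; last SWITCH; }
    $nothing = 1;
}
print "xyz matched\n" if $xyz;
```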
Such constructs are quite frequently used, both because older versions of Perl had no official switch statement, and also because the new version described immediately below remains experimental and can sometimes be confusing.
Starting from Perl 5.10.1 (well, 5.10.0, but it didn't work right), you can say
- use feature "switch";
to enable an experimental switch feature. This is loosely based on an old version of a Perl 6 proposal, but it no longer resembles the Perl 6 construct. You also get the switch feature whenever you declare that your code prefers to run under a version of Perl that is 5.10 or later. For example:
- use v5.14;
Under the "switch" feature, Perl gains the experimental keywords given, when, default, continue, and break.
Starting from Perl 5.16, one can prefix the switch keywords with CORE:: to access the feature without a use feature statement. The keywords given and when are analogous to switch and case in other languages, so the code in the previous section could be rewritten as
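The rewritten example was lost; with the switch feature enabled it reads roughly as follows (note this feature is experimental, warns on modern perls, and has been removed from the newest ones):

```perl
use v5.10.1;   # enables the "switch" feature bundle
use strict;
no warnings;   # when/smartmatch warn as experimental on modern perls

my $var = "defghi";
my ($abc, $def, $xyz, $nothing) = (0, 0, 0, 0);

for ($var) {
    when (/^abc/) { $abc = 1 }
    when (/^def/) { $def = 1 }
    when (/^xyz/) { $xyz = 1 }
    default       { $nothing = 1 }
}
print "def matched\n" if $def;
```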
The foreach is the non-experimental way to set a topicalizer.
If you wish to use the highly experimental given, that could be written like this:
As of 5.14, that can also be written this way:
Or if you don't care to play it safe, like this:
The arguments to given and when are in scalar context, and given assigns the $_ variable its topic value.
Exactly what the EXPR argument to when does is hard to describe precisely, but in general, it tries to guess what you want done. Sometimes it is interpreted as $_ ~~ EXPR, and sometimes it is not. It also behaves differently when lexically enclosed by a given block than it does when dynamically enclosed by a foreach loop. The rules are far too difficult to understand to be described here. See Experimental Details on given and when later on.
Due to an unfortunate bug in how given was implemented between Perl 5.10 and 5.16, under those implementations the version of $_ governed by given is merely a lexically scoped copy of the original, not a dynamically scoped alias to the original, as it would be if it were a foreach, or under both the original and the current Perl 6 language specification. This bug was fixed in Perl 5.18. If you really want a lexical $_, specify that explicitly, but note that my $_ is now deprecated and will warn unless warnings have been disabled:
If your code still needs to run on older versions, stick to foreach for your topicalizer and you will be less unhappy.
Although not for the faint of heart, Perl does support a goto statement. There are three forms: goto-LABEL, goto-EXPR, and goto-&NAME. A loop's LABEL is not actually a valid target for a goto; it's just the name of the loop.
The goto-LABEL form finds the statement labeled with LABEL and resumes execution there. It may not be used to go into any construct that requires initialization, such as a subroutine or a foreach loop. It also can't be used to go into a construct that is optimized away. It can be used to go almost anywhere else within the dynamic scope, including out of subroutines, but it's usually better to use some other construct such as last or die. The author of Perl has never felt the need to use this form of goto (in Perl, that is--C is another matter).
The goto-EXPR form expects a label name, whose scope will be resolved
dynamically. This allows for computed gotos per FORTRAN, but isn't
necessarily recommended if you're optimizing for maintainability:
- goto(("FOO", "BAR", "GLARCH")[$i]);
The goto-&NAME form is highly magical, and substitutes a call to the named subroutine for the currently running subroutine. This is used by AUTOLOAD() subroutines that wish to load another subroutine and then pretend that the other subroutine had been called in the first place (except that any modifications to @_ in the current subroutine are propagated to the other subroutine.) After the goto, not even caller() will be able to tell that this routine was called first.
In almost all cases like this, it's usually a far, far better idea to use the
structured control flow mechanisms of next, last, or redo instead of
resorting to a goto. For certain applications, the catch and throw pair of
eval{} and die() for exception processing can also be a prudent approach.
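A minimal sketch of that catch-and-throw pairing (the function name and error message are invented):

```perl
use strict;
use warnings;

sub risky {
    my ($n) = @_;
    die "negative input\n" if $n < 0;   # "throw"
    return sqrt($n);
}

my $root = eval { risky(-1) };          # "catch": $root stays undef on failure
if ($@) {
    print "caught: $@";                 # prints "caught: negative input"
} else {
    print "root is $root\n";
}
```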
Beginning in Perl 5.12, Perl accepts an ellipsis, "...", as a
placeholder for code that you haven't implemented yet. This form of
ellipsis, the unimplemented statement, should not be confused with the
binary flip-flop ...
operator. One is a statement and the other an
operator. (Perl rarely confuses them because it can usually tell
whether it wants an operator or a statement, but see below for exceptions.)
When Perl 5.12 or later encounters an ellipsis statement, it parses this
without error, but if and when you should actually try to execute it, Perl
throws an exception with the text Unimplemented:
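For example, a sketch (assuming Perl 5.12 or later; the subroutine name is invented) that traps the exception:

```perl
use v5.12;

sub not_done_yet { ... }    # compiles cleanly; dies only when called

eval { not_done_yet() };
say "caught: $@" if $@ =~ /^Unimplemented/;
```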
You can only use the elliptical statement to stand in for a complete statement. These examples show how the ellipsis works:
- { ... }
- sub foo { ... }
- ...;
- eval { ... };
The elliptical statement cannot stand in for an expression that
is part of a larger statement, since the ...
is also the three-dot
version of the flip-flop operator (see Range Operators in perlop).
These examples of attempts to use an ellipsis are syntax errors:
- print ...;
- open(my $fh, ">", "/dev/passwd") or ...;
- if ($condition && ... ) { say "Howdy" };
There are some cases where Perl can't immediately tell the difference
between an expression and a statement. For instance, the syntax for a
block and an anonymous hash reference constructor look the same unless
there's something in the braces to give Perl a hint. The ellipsis is a
syntax error if Perl doesn't guess that the { ... }
is a block. In that
case, it doesn't think the ...
is an ellipsis because it's expecting an
expression instead of a statement:
- @transformed = map { ... } @input; # syntax error
You can use a ; inside your block to denote that the { ... } is a block and not a hash reference constructor. Now the ellipsis works:
- @transformed = map {; ... } @input; # ';' disambiguates
Note: Some folks colloquially refer to this bit of punctuation as a
"yada-yada" or "triple-dot", but its true name
is actually an ellipsis. Perl does not yet
accept the Unicode version, U+2026 HORIZONTAL ELLIPSIS, as an alias for
...
, but someday it may.
Perl has a mechanism for intermixing documentation with source code. While it's expecting the beginning of a new statement, if the compiler encounters a line that begins with an equal sign and a word, like this
- =head1 Here There Be Pods!
then that text and all remaining text up through and including a line
beginning with =cut
will be ignored. The format of the intervening
text is described in perlpod.
This allows you to intermix your source code and your documentation text freely, as in
- =item snazzle($)
- The snazzle() function will behave in the most spectacular
- form that you can possibly imagine, not even excepting
- cybernetic pyrotechnics.
- =cut back to the compiler, nuff of this pod stuff!
- sub snazzle($) {
- my $thingie = shift;
- .........
- }
Note that pod translators should look at only paragraphs beginning with a pod directive (it makes parsing easier), whereas the compiler actually knows to look for pod escapes even in the middle of a paragraph. This means that the following secret stuff will be ignored by both the compiler and the translators.
- $a=3;
- =secret stuff
- warn "Neither POD nor CODE!?"
- =cut back
- print "got $a\n";
You probably shouldn't rely upon the warn() being podded out forever.
Not all pod translators are well-behaved in this regard, and perhaps
the compiler will become pickier.
One may also use pod directives to quickly comment out a section of code.
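For instance, a sketch of commenting out a statement with pod (any directive word will do; =pod is used here):

```perl
print "one\n";

=pod

print "this statement is never compiled\n";

=cut

print "two\n";   # prints "one" then "two"
```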
Perl can process line directives, much like the C preprocessor. Using
this, one can control Perl's idea of filenames and line numbers in
error or warning messages (especially for strings that are processed
with eval()). The syntax for this mechanism is almost the same as for
most C preprocessors: it matches the regular expression
- # example: '# line 42 "new_filename.plx"'
- /^\# \s*
- line \s+ (\d+) \s*
- (?:\s("?)([^"]+)\g2)? \s*
- $/x
with $1
being the line number for the next line, and $3
being
the optional filename (specified with or without quotes). Note that
no whitespace may precede the #
, unlike modern C preprocessors.
There is a fairly obvious gotcha included with the line directive: Debuggers and profilers will only show the last source line to appear at a particular line number in a given file. Care should be taken not to cause line number collisions in code you'd like to debug later.
Here are some examples that you should be able to type into your command shell:
- % perl
- # line 200 "bzzzt"
- # the '#' on the previous line must be the first char on line
- die 'foo';
- __END__
- foo at bzzzt line 201.
- % perl
- # line 200 "bzzzt"
- eval qq[\n#line 2001 ""\ndie 'foo']; print $@;
- __END__
- foo at - line 2001.
- % perl
- eval qq[\n#line 200 "foo bar"\ndie 'foo']; print $@;
- __END__
- foo at foo bar line 200.
- % perl
- # line 345 "goop"
- eval "\n#line " . __LINE__ . ' "' . __FILE__ ."\"\ndie 'foo'";
- print $@;
- __END__
- foo at goop line 345.
As previously mentioned, the "switch" feature is considered highly experimental; it is subject to change with little notice. In particular, when has tricky behaviours that are expected to change to become less tricky in the future. Do not rely upon its current (mis)implementation.
Before Perl 5.18, given
also had tricky behaviours that you should still
beware of if your code must run on older versions of Perl.
Here is a longer example of given:
- use feature ":5.10";
- given ($foo) {
- when (undef) {
- say '$foo is undefined';
- }
- when ("foo") {
- say '$foo is the string "foo"';
- }
- when ([1,3,5,7,9]) {
- say '$foo is an odd digit';
- continue; # Fall through
- }
- when ($_ < 100) {
- say '$foo is numerically less than 100';
- }
- when (\&complicated_check) {
- say 'a complicated check for $foo is true';
- }
- default {
- die q(I don't know what to do with $foo);
- }
- }
Before Perl 5.18, given(EXPR)
assigned the value of EXPR to
merely a lexically scoped copy (!) of $_
, not a dynamically
scoped alias the way foreach
does. That made it similar to
- do { my $_ = EXPR; ... }
except that the block was automatically broken out of by a successful
when
or an explicit break
. Because it was only a copy, and because
it was only lexically scoped, not dynamically scoped, you could not do the
things with it that you are used to in a foreach
loop. In particular,
it did not work for arbitrary function calls if those functions might try
to access $_. Best stick to foreach
for that.
Most of the power comes from the implicit smartmatching that can
sometimes apply. Most of the time, when(EXPR)
is treated as an
implicit smartmatch of $_
, that is, $_ ~~ EXPR
. (See
Smartmatch Operator in perlop for more information on smartmatching.)
But when EXPR is one of the 10 exceptional cases (or things like them)
listed below, it is used directly as a boolean.
A user-defined subroutine call or a method invocation.
A regular expression match in the form of /REGEX/
, $foo =~ /REGEX/
,
or $foo =~ EXPR
. Also, a negated regular expression match in
the form !/REGEX/
, $foo !~ /REGEX/
, or $foo !~ EXPR
.
A smart match that uses an explicit ~~
operator, such as EXPR ~~ EXPR
.
A boolean comparison operator such as $_ < 10
or $x eq "abc"
. The
relational operators that this applies to are the six numeric comparisons
(<
, >, <=
, >=
, ==
, and !=
), and
the six string comparisons (lt
, gt
, le
, ge
, eq
, and ne
).
NOTE: You will often have to use $c ~~ $_
because
the default case uses $_ ~~ $c
, which is frequently
the opposite of what you want.
At least the three builtin functions defined(...), exists(...), and
eof(...). We might someday add more of these later if we think of them.
A negated expression, whether !(EXPR)
or not(EXPR), or a logical
exclusive-or, (EXPR1) xor (EXPR2)
. The bitwise versions (~
and ^)
are not included.
A filetest operator, with exactly 4 exceptions: -s
, -M
, -A
, and
-C
, as these return numerical values, not boolean ones. The -z
filetest operator is not included in the exception list.
The ..
and ...
flip-flop operators. Note that the ...
flip-flop
operator is completely different from the ...
elliptical statement
just described.
In those 8 cases above, the value of EXPR is used directly as a boolean, so
no smartmatching is done. You may think of when
as a smartsmartmatch.
Furthermore, Perl inspects the operands of logical operators to decide whether to use smartmatching for each one by applying the above test to the operands:
If EXPR is EXPR1 && EXPR2
or EXPR1 and EXPR2
, the test is applied
recursively to both EXPR1 and EXPR2.
Only if both operands also pass the
test, recursively, will the expression be treated as boolean. Otherwise,
smartmatching is used.
If EXPR is EXPR1 || EXPR2
, EXPR1 // EXPR2
, or EXPR1 or EXPR2
, the
test is applied recursively to EXPR1 only (which might itself be a
higher-precedence AND operator, for example, and thus subject to the
previous rule), not to EXPR2. If EXPR1 is to use smartmatching, then EXPR2
also does so, no matter what EXPR2 contains. But if EXPR2 does not get to
use smartmatching, then the second argument will not be either. This is
quite different from the && case just described, so be careful.
These rules are complicated, but the goal is for them to do what you want (even if you don't quite understand why they are doing it). For example:
- when (/^\d+$/ && $_ < 75) { ... }
will be treated as a boolean match because the rules say both
a regex match and an explicit test on $_
will be treated
as boolean.
Also:
- when ([qw(foo bar)] && /baz/) { ... }
will use smartmatching because only one of the operands is a boolean: the other uses smartmatching, and that wins.
Further:
- when ([qw(foo bar)] || /^baz/) { ... }
will use smart matching (only the first operand is considered), whereas
- when (/^baz/ || [qw(foo bar)]) { ... }
will test only the regex, which causes both operands to be treated as boolean. Watch out for this one, then, because an arrayref is always a true value, which makes it effectively redundant. Not a good idea.
Tautologous boolean operators are still going to be optimized away. Don't be tempted to write
- when ("foo" or "bar") { ... }
This will optimize down to "foo"
, so "bar"
will never be considered (even
though the rules say to use a smartmatch
on "foo"
). For an alternation like
this, an array ref will work, because this will instigate smartmatching:
- when ([qw(foo bar)]) { ... }
This is somewhat equivalent to the C-style switch statement's fallthrough
functionality (not to be confused with Perl's fallthrough
functionality--see below), wherein the same block is used for several
case
statements.
Another useful shortcut is that, if you use a literal array or hash as the
argument to given
, it is turned into a reference. So given(@foo)
is
the same as given(\@foo)
, for example.
default
behaves exactly like when(1 == 1)
, which is
to say that it always matches.
You can use the break
keyword to break out of the enclosing
given
block. Every when
block is implicitly ended with
a break
.
You can use the continue keyword to fall through from one case to the next:
- given($foo) {
- when (/x/) { say '$foo contains an x'; continue }
- when (/y/) { say '$foo contains a y' }
- default { say '$foo does not contain a y' }
- }
When a given
statement is also a valid expression (for example,
when it's the last statement of a block), it evaluates to:
An empty list as soon as an explicit break
is encountered.
The value of the last evaluated expression of the successful
when
/default
clause, if there happens to be one.
The value of the last evaluated expression of the given
block if no
condition is true.
In both last cases, the last expression is evaluated in the context that
was applied to the given
block.
Note that, unlike if
and unless
, failed when
statements always
evaluate to an empty list.
Currently, given
blocks can't always
be used as proper expressions. This
may be addressed in a future version of Perl.
Instead of using given()
, you can use a foreach()
loop.
For example, here's one way to count how many times a particular string occurs in an array:
- use v5.10.1;
- my $count = 0;
- for (@array) {
- when ("foo") { ++$count }
- }
- print "\@array contains $count copies of 'foo'\n";
Or in a more recent version:
- use v5.14;
- my $count = 0;
- for (@array) {
- ++$count when "foo";
- }
- print "\@array contains $count copies of 'foo'\n";
At the end of all when
blocks, there is an implicit next.
You can override that with an explicit last if you're
interested in only the first match alone.
This doesn't work if you explicitly specify a loop variable, as
in for $item (@array)
. You have to use the default variable $_
.
The Perl 5 smartmatch and given
/when
constructs are not compatible
with their Perl 6 analogues. The most visible difference and least
important difference is that, in Perl 5, parentheses are required around
the argument to given()
and when()
(except when this last one is used
as a statement modifier). Parentheses in Perl 6 are always optional in a
control construct such as if()
, while()
, or when()
; they can't be
made optional in Perl 5 without a great deal of potential confusion,
because Perl 5 would parse the expression
- given $foo {
- ...
- }
as though the argument to given
were an element of the hash
%foo
, interpreting the braces as hash-element syntax.
However, there are many, many other differences. For example, this works in Perl 5:
- use v5.12;
- my @primary = ("red", "blue", "green");
- if (@primary ~~ "red") {
- say "primary smartmatches red";
- }
But it doesn't work at all in Perl 6. Instead, you should use the (parallelizable) any operator:
- if any(@primary) eq "red" {
- say "primary smartmatches red";
- }
The table of smartmatches in Smartmatch Operator in perlop is not identical to that proposed by the Perl 6 specification, mainly due to differences between Perl 6's and Perl 5's data models, but also because the Perl 6 spec has changed since Perl 5 rushed into early adoption.
In Perl 6, when()
will always do an implicit smartmatch with its
argument, while in Perl 5 it is convenient (albeit potentially confusing) to
suppress this implicit smartmatch in various rather loosely-defined
situations, as roughly outlined above. (The difference is largely because
Perl 5 does not have, even internally, a boolean type.)
perlbug - how to submit bug reports on Perl
perlbug
perlbug [ -v ] [ -a address ] [ -s subject ] [ -b body | -f inputfile ] [ -F outputfile ] [ -r returnaddress ] [ -e editor ] [ -c adminaddress | -C ] [ -S ] [ -t ] [ -d ] [ -A ] [ -h ] [ -T ]
perlbug [ -v ] [ -r returnaddress ] [ -A ] [ -ok | -okay | -nok | -nokay ]
perlthanks
This program is designed to help you generate and send bug reports (and thank-you notes) about perl5 and the modules which ship with it.
In most cases, you can just run it interactively from a command line without any special arguments and follow the prompts.
If you have found a bug with a non-standard port (one that was not part of the standard distribution), a binary distribution, or a non-core module (such as Tk, DBI, etc), then please see the documentation that came with that distribution to determine the correct place to report bugs.
If you are unable to send your report using perlbug (most likely because your system doesn't have a way to send mail that perlbug recognizes), you may be able to use this tool to compose your report and save it to a file which you can then send to perlbug@perl.org using your regular mail client.
In extreme cases, perlbug may not work well enough on your system to guide you through composing a bug report. In those cases, you may be able to use perlbug -d to get system configuration information to include in a manually composed bug report to perlbug@perl.org.
When reporting a bug, please run through this checklist:
Type perl -v
at the command line to find out.
Look at http://www.perl.org/ to find out. If you are not using the latest released version, please try to replicate your bug on the latest stable release.
Note that reports about bugs in old versions of Perl, especially those which indicate you haven't also tested the current stable release of Perl, are likely to receive less attention from the volunteers who build and maintain Perl than reports about bugs in the current release.
This tool isn't appropriate for reporting bugs in any version prior to Perl 5.0.
A significant number of the bug reports we get turn out to be documented features in Perl. Make sure the issue you've run into isn't intentional by glancing through the documentation that comes with the Perl distribution.
Given the sheer volume of Perl documentation, this isn't a trivial undertaking, but if you can point to documentation that suggests the behaviour you're seeing is wrong, your issue is likely to receive more attention. You may want to start with perldoc perltrap for pointers to common traps that new (and experienced) Perl programmers run into.
If you're unsure of the meaning of an error message you've run across, see perldoc perldiag for an explanation. If the message isn't in perldiag, it probably isn't generated by Perl. You may have luck consulting your operating system documentation instead.
If you are on a non-UNIX platform, see perldoc perlport, as some features may be unimplemented or work differently.
You may be able to figure out what's going wrong using the Perl debugger. For information about how to use the debugger, see perldoc perldebug.
The easier it is to reproduce your bug, the more likely it will be fixed -- if nobody can duplicate your problem, it probably won't be addressed.
A good test case has most of these attributes: short, simple code; few dependencies on external commands, modules, or libraries; no platform-dependent code (unless it's a platform-specific bug); clear, simple documentation.
A good test case is almost always a good candidate to be included in Perl's test suite. If you have the time, consider writing your test case so that it can be easily included into the standard test suite.
Be sure to include the exact error messages, if any. "Perl gave an error" is not an exact error message.
If you get a core dump (or equivalent), you may use a debugger (dbx, gdb, etc) to produce a stack trace to include in the bug report.
NOTE: unless your Perl has been compiled with debug info (often -g), the stack trace is likely to be somewhat hard to use because it will most probably contain only the function names and not their arguments. If possible, recompile your Perl with debug info and reproduce the crash and the stack trace.
The easier it is to understand a reproducible bug, the more likely it will be fixed. Any insight you can provide into the problem will help a great deal. In other words, try to analyze the problem (to the extent you can) and report your discoveries.
A bug report which includes a patch to fix it will almost
definitely be fixed. When sending a patch, please use the diff
program with the -u
option to generate "unified" diff files.
Bug reports with patches are likely to receive significantly more
attention and interest than those without patches.
Your patch may be returned with requests for changes, or requests for more detailed explanations about your fix.
Here are a few hints for creating high-quality patches:
Make sure the patch is not reversed (the first argument to diff is
typically the original file, the second argument your changed file).
Make sure you test your patch by applying it with the patch
program before you send it on its way. Try to follow the same style
as the code you are trying to patch. Make sure your patch really
does work (make test
, if the thing you're patching is covered
by Perl's test suite).
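The hints above might look like this in practice (file names are hypothetical):

```shell
# Original file first, changed file second, so the diff is not reversed:
diff -u perl.orig/toke.c perl.fixed/toke.c > fix.patch

# Apply the patch to a scratch copy to make sure it is well-formed:
cp perl.orig/toke.c /tmp/toke.c.check
patch /tmp/toke.c.check < fix.patch
```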
Can you use perlbug to submit the report?
perlbug will, amongst other things, ensure your report includes
crucial information about your version of perl. If perlbug is unable
to mail your report after you have typed it in, you may have to
compose the message yourself, add the output produced by perlbug -d,
and email it to perlbug@perl.org. If, for some reason, you
cannot run perlbug
at all on your system, be sure to include the
entire output produced by running perl -V
(note the uppercase V).
Whether you use perlbug
or send the email manually, please make
your Subject line informative. "a bug" is not informative. Neither
is "perl crashes" nor is "HELP!!!". These don't help. A compact
description of what's wrong is fine.
Can you use perlbug to submit a thank-you note?
Yes, you can do this by either using the -T
option, or by invoking
the program as perlthanks
. Thank-you notes are good. It makes people
smile.
Having done your bit, please be prepared to wait, to be told the bug is in your code, or possibly to get no reply at all. The volunteers who maintain Perl are busy folks, so if your problem is an obvious bug in your own code, is difficult to understand or is a duplicate of an existing report, you may not receive a personal reply.
If it is important to you that your bug be fixed, do monitor the perl5-porters@perl.org mailing list and the commit logs to development versions of Perl, and encourage the maintainers with kind words or offers of frosty beverages. (Please do be kind to the maintainers. Harassing or flaming them is likely to have the opposite effect of the one you want.)
Feel free to update the ticket about your bug on http://rt.perl.org if a new version of Perl is released and your bug is still present.
Address to send the report to. Defaults to perlbug@perl.org.
Don't send a bug received acknowledgement to the reply address. Generally it is only sensible to use this option if you are a perl maintainer actively watching perl porters for your message to arrive.
Body of the report. If not included on the command line, or in a file with -f, you will get a chance to edit the message.
Don't send copy to administrator.
Address to send copy of report to. Defaults to the address of the local perl administrator (recorded when perl was built).
Data mode (the default if you redirect or pipe output). This prints out your configuration data, without mailing anything. You can use this with -v to get more complete data.
Editor to use.
File containing the body of the report. Use this to quickly send a prepared message.
File to output the results to instead of sending as an email. Useful particularly when running perlbug on a machine with no direct internet connection.
Prints a brief summary of the options.
Report successful build on this system to perl porters. Forces -S and -C. Forces and supplies values for -s and -b. Only prompts for a return address if it cannot guess it (for use with make). Honors return address specified with -r. You can use this with -v to get more complete data. Only makes a report if this system is less than 60 days old.
As -ok except it will report on older systems.
Report unsuccessful build on this system. Forces -C. Forces and supplies a value for -s, then requires you to edit the report and say what went wrong. Alternatively, a prepared report may be supplied using -f. Only prompts for a return address if it cannot guess it (for use with make). Honors return address specified with -r. You can use this with -v to get more complete data. Only makes a report if this system is less than 60 days old.
As -nok except it will report on older systems.
Your return address. The program will ask you to confirm its default if you don't use this option.
Send without asking for confirmation.
Subject to include with the message. You will be prompted if you don't supply one on the command line.
Test mode. The target address defaults to perlbug-test@perl.org.
Send a thank-you note instead of a bug report.
Include verbose configuration data in the report.
Kenneth Albanowski (<kjahds@kjahds.com>), subsequently doctored by Gurusamy Sarathy (<gsar@activestate.com>), Tom Christiansen (<tchrist@perl.com>), Nathan Torkington (<gnat@frii.com>), Charles F. Randall (<cfr@pobox.com>), Mike Guy (<mjtg@cam.ac.uk>), Dominic Dunlop (<domo@computer.org>), Hugo van der Sanden (<hv@crypt.org>), Jarkko Hietaniemi (<jhi@iki.fi>), Chris Nandor (<pudge@pobox.com>), Jon Orwant (<orwant@media.mit.edu>), Richard Foley (<richard.foley@rfi.net>), and Jesse Vincent (<jesse@bestpractical.com>).
perl(1), perldebug(1), perldiag(1), perlport(1), perltrap(1), diff(1), patch(1), dbx(1), gdb(1)
None known (guess what must have been used to report them?)
perlthrtut - Tutorial on threads in Perl
This tutorial describes the use of Perl interpreter threads (sometimes referred to as ithreads). In this model, each thread runs in its own Perl interpreter, and any data sharing between threads must be explicit. The user-level interface for ithreads uses the threads class.
NOTE: There was another older Perl threading flavor called the 5.005 model that used the Thread class. This old model was known to have problems, was deprecated, and was removed in release 5.10. You are strongly encouraged to migrate any existing 5.005 threads code to the new model as soon as possible.
You can see which threading flavour you have (if any) by
running perl -V
and looking at the Platform
section.
If you have useithreads=define
you have ithreads, if you
have use5005threads=define
you have 5.005 threads.
If you have neither, you don't have any thread support built in.
If you have both, you are in trouble.
The threads and threads::shared modules are included in the core Perl distribution. Additionally, they are maintained as separate modules on CPAN, so you can check there for any updates.
A thread is a flow of control through a program with a single execution point.
Sounds an awful lot like a process, doesn't it? Well, it should. Threads are one of the pieces of a process. Every process has at least one thread and, up until now, every process running Perl had only one thread. With 5.8, though, you can create extra threads. We're going to show you how, when, and why.
There are three basic ways that you can structure a threaded program. Which model you choose depends on what you need your program to do. For many non-trivial threaded programs, you'll need to choose different models for different pieces of your program.
The boss/worker model usually has one boss thread and one or more worker threads. The boss thread gathers or generates tasks that need to be done, then parcels those tasks out to the appropriate worker thread.
This model is common in GUI and server programs, where a main thread waits for some event and then passes that event to the appropriate worker threads for processing. Once the event has been passed on, the boss thread goes back to waiting for another event.
The boss thread does relatively little work. While tasks aren't necessarily performed faster than with any other method, it tends to have the best user-response times.
In the work crew model, several threads are created that do essentially the same thing to different pieces of data. It closely mirrors classical parallel processing and vector processors, where a large array of processors do the exact same thing to many pieces of data.
This model is particularly useful if the system running the program will distribute multiple threads across different processors. It can also be useful in ray tracing or rendering engines, where the individual threads can pass on interim results to give the user visual feedback.
The pipeline model divides up a task into a series of steps, and passes the results of one step on to the thread processing the next. Each thread does one thing to each piece of data and passes the results to the next thread in line.
This model makes the most sense if you have multiple processors so two or more threads will be executing in parallel, though it can often make sense in other contexts as well. It tends to keep the individual tasks small and simple, as well as allowing some parts of the pipeline to block (on I/O or system calls, for example) while other parts keep going. If you're running different parts of the pipeline on different processors you may also take advantage of the caches on each processor.
This model is also handy for a form of recursive programming where, rather than having a subroutine call itself, it instead creates another thread. Prime and Fibonacci generators both map well to this form of the pipeline model. (A version of a prime number generator is presented later on.)
If you have experience with other thread implementations, you might find that things aren't quite what you expect. It's very important to remember when dealing with Perl threads that Perl Threads Are Not X Threads for all values of X. They aren't POSIX threads, or DecThreads, or Java's Green threads, or Win32 threads. There are similarities, and the broad concepts are the same, but if you start looking for implementation details you're going to be either disappointed or confused. Possibly both.
This is not to say that Perl threads are completely different from everything that's ever come before. They're not. Perl's threading model owes a lot to other thread models, especially POSIX. Just as Perl is not C, though, Perl threads are not POSIX threads. So if you find yourself looking for mutexes, or thread priorities, it's time to step back a bit and think about what you want to do and how Perl can do it.
However, it is important to remember that Perl threads cannot magically
do things unless your operating system's threads allow it. So if your
system blocks the entire process on sleep(), Perl usually will, as well.
Perl Threads Are Different.
The addition of threads has changed Perl's internals substantially. There are implications for people who write modules with XS code or external libraries. However, since Perl data is not shared among threads by default, Perl modules stand a high chance of being thread-safe or can be made thread-safe easily. Modules that are not tagged as thread-safe should be tested or code reviewed before being used in production code.
Not all modules that you might use are thread-safe, and you should always assume a module is unsafe unless the documentation says otherwise. This includes modules that are distributed as part of the core. Threads are a relatively new feature, and even some of the standard modules aren't thread-safe.
Even if a module is thread-safe, it doesn't mean that the module is optimized to work well with threads. A module could possibly be rewritten to utilize the new features in threaded Perl to increase performance in a threaded environment.
If you're using a module that's not thread-safe for some reason, you can protect yourself by using it from one, and only one, thread. If you need multiple threads to access such a module, you can use semaphores and lots of programming discipline to control access to it. Semaphores are covered in Basic semaphores.
See also Thread-Safety of System Libraries.
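A minimal sketch of that discipline using Thread::Semaphore (the unsafe_thing() routine is invented and stands in for a non-thread-safe module):

```perl
use strict;
use warnings;
use threads;
use Thread::Semaphore;

# One semaphore guards every call into the non-thread-safe code.
my $lock = Thread::Semaphore->new();   # counting semaphore, starts at 1

sub unsafe_thing {                     # stand-in for a non-thread-safe API
    my ($x) = @_;
    return $x * 2;
}

sub guarded_call {
    my ($x) = @_;
    $lock->down();                     # acquire; blocks if another thread holds it
    my $r = unsafe_thing($x);
    $lock->up();                       # release for the next thread
    return $r;
}

my @thr = map { threads->create(\&guarded_call, $_) } 1 .. 3;
print join(' ', map { $_->join() } @thr), "\n";   # prints "2 4 6"
```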
The threads module provides the basic functions you need to write threaded programs. In the following sections, we'll cover the basics, showing you what you need to do to create a threaded program. After that, we'll go over some of the features of the threads module that make threaded programming easier.
Thread support is a Perl compile-time option. It's something that's turned on or off when Perl is built at your site, rather than when your programs are compiled. If your Perl wasn't compiled with thread support enabled, then any attempt to use threads will fail.
Your programs can use the Config module to check whether threads are enabled. If your program can't run without them, you can say something like:
- use Config;
- $Config{useithreads} or die('Recompile Perl with threads to run this program.');
A possibly-threaded program using a possibly-threaded module might have code like this:
- use Config;
- use MyMod;
- BEGIN {
- if ($Config{useithreads}) {
- # We have threads
- require MyMod_threaded;
- import MyMod_threaded;
- } else {
- require MyMod_unthreaded;
- import MyMod_unthreaded;
- }
- }
Since code that runs both with and without threads is usually pretty
messy, it's best to isolate the thread-specific code in its own
module. In our example above, that's what MyMod_threaded
is, and it's
only imported if we're running on a threaded Perl.
In a real situation, care should be taken that all threads are finished executing before the program exits. That care has not been taken in these examples in the interest of simplicity. Running these examples as is will produce error messages, usually caused by the fact that there are still threads running when the program exits. You should not be alarmed by this.
The threads module provides the tools you need to create new
threads. Like any other module, you need to tell Perl that you want to use
it; use threads;
imports all the pieces you need to create basic
threads.
The simplest, most straightforward way to create a thread is with create():
- use threads;
- my $thr = threads->create(\&sub1);
- sub sub1 {
- print("In the thread\n");
- }
The create()
method takes a reference to a subroutine and creates a new
thread that starts executing in the referenced subroutine. Control
then passes both to the subroutine and the caller.
If you need to, your program can pass parameters to the subroutine as part of the thread startup. Just include the list of parameters as part of the threads->create() call, like this:
- use threads;
- my $Param3 = 'foo';
- my $thr1 = threads->create(\&sub1, 'Param 1', 'Param 2', $Param3);
- my @ParamList = (42, 'Hello', 3.14);
- my $thr2 = threads->create(\&sub1, @ParamList);
- my $thr3 = threads->create(\&sub1, qw(Param1 Param2 Param3));
- sub sub1 {
- my @InboundParameters = @_;
- print("In the thread\n");
- print('Got parameters >', join('<>', @InboundParameters), "<\n");
- }
The last example illustrates another feature of threads. You can spawn off several threads using the same subroutine. Each thread executes the same subroutine, but in a separate thread with a separate environment and potentially separate arguments.
new() is a synonym for create().
Since threads are also subroutines, they can return values. To wait for a thread to exit and extract any values it might return, you can use the join() method:
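A sketch based on the description that follows (the return values are made up for illustration; note the list context on the creation call, per the NOTE below):

```perl
use threads;

my ($thr) = threads->create(sub {
    return ('Fifty-six', 'foo', 2);   # Illustrative return values
});

my @ReturnData = $thr->join();        # Blocks until the thread is done
print("Thread returned @ReturnData\n");
```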
In the example above, the join() method returns as soon as the thread ends. In addition to waiting for a thread to finish and gathering up any values that the thread might have returned, join() also performs any OS cleanup necessary for the thread. That cleanup might be important, especially for long-running programs that spawn lots of threads. If you don't want the return values and don't want to wait for the thread to finish, you should call the detach() method instead, as described next.
NOTE: In the example above, the thread returns a list, thus necessitating that the thread creation call be made in list context (i.e., my ($thr)). See $thr->join() in threads and THREAD CONTEXT in threads for more details on thread context and return values.
join() does three things: it waits for a thread to exit, cleans up
after it, and returns any data the thread may have produced. But what
if you're not interested in the thread's return values, and you don't
really care when the thread finishes? All you want is for the thread
to get cleaned up after when it's done.
In this case, you use the detach() method. Once a thread is detached, it'll run until it's finished; then Perl will clean up after it automatically.
Once a thread is detached, it may not be joined, and any return data that it might have produced (if it was done and waiting for a join) is lost.
detach() can also be called as a class method to allow a thread to detach itself:
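For example (a sketch; the sleep() at the end is a crude way to give the detached thread time to finish before the program exits):

```perl
use threads;

threads->create(sub {
    threads->detach();   # Class-method form: detach the current thread
    print("Running detached\n");
});

sleep(1);   # Let the detached thread finish; nothing to join()
```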
With threads one must be careful to make sure they all have a chance to run to completion, assuming that is what you want.
An action that terminates a process will terminate all running threads. die() and exit() have this property, and perl does an exit when the main thread exits, perhaps implicitly by falling off the end of your code, even if that's not what you want.
As an example of this case, this code prints the message "Perl exited with active threads: 2 running and unjoined":
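A sketch of such a program (the sleep() calls keep both threads alive past the main thread's exit):

```perl
use threads;

my $thr1 = threads->create(sub { sleep(1); print("Thread 1 done\n"); });
my $thr2 = threads->create(sub { sleep(1); print("Thread 2 done\n"); });
# Falling off the end here exits while both threads are still running,
# so neither print() is ever reached
```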
But when the following lines are added at the end:
- $thr1->join();
- $thr2->join();
it prints two lines of output, a perhaps more useful outcome.
Now that we've covered the basics of threads, it's time for our next topic: Data. Threading introduces a couple of complications to data access that non-threaded programs never need to worry about.
The biggest difference between Perl ithreads and the old 5.005 style threading, or for that matter, to most other threading systems out there, is that by default, no data is shared. When a new Perl thread is created, all the data associated with the current thread is copied to the new thread, and is subsequently private to that new thread! This is similar in feel to what happens when a Unix process forks, except that in this case, the data is just copied to a different part of memory within the same process rather than a real fork taking place.
To make use of threading, however, one usually wants the threads to share at least some data between themselves. This is done with the threads::shared module and the :shared attribute:
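A minimal illustration of the difference between a shared and an unshared variable:

```perl
use threads;
use threads::shared;

my $foo :shared = 1;   # Visible to all threads
my $bar = 1;           # Each new thread gets its own private copy

threads->create(sub { $foo++; $bar++; })->join();

print("$foo\n");   # Prints 2: the increment happened to the shared variable
print("$bar\n");   # Prints 1: the thread incremented only its own copy
```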
In the case of a shared array, all the array's elements are shared, and for a shared hash, all the keys and values are shared. This places restrictions on what may be assigned to shared array and hash elements: only simple values or references to shared variables are allowed - this is so that a private variable can't accidentally become shared. A bad assignment will cause the thread to die. For example:
- use threads;
- use threads::shared;
- my $var = 1;
- my $svar :shared = 2;
- my %hash :shared;
- ... create some threads ...
- $hash{a} = 1; # All threads see exists($hash{a}) and $hash{a} == 1
- $hash{a} = $var; # okay - copy-by-value: same effect as previous
- $hash{a} = $svar; # okay - copy-by-value: same effect as previous
- $hash{a} = \$svar; # okay - a reference to a shared variable
- $hash{a} = \$var; # This will die
- delete($hash{a}); # okay - all threads will see !exists($hash{a})
Note that a shared variable guarantees that if two or more threads try to modify it at the same time, the internal state of the variable will not become corrupted. However, there are no guarantees beyond this, as explained in the next section.
While threads bring a new set of useful tools, they also bring a number of pitfalls. One pitfall is the race condition:
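A sketch of such a race, matching the sub1()/sub2() discussion that follows:

```perl
use threads;
use threads::shared;

my $a :shared = 1;

my $thr1 = threads->create(\&sub1);
my $thr2 = threads->create(\&sub2);
$thr1->join();
$thr2->join();
print("$a\n");   # 2 or 3, depending on how the threads interleave

sub sub1 { my $foo = $a; $a = $foo + 1; }
sub sub2 { my $bar = $a; $a = $bar + 1; }
```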
What do you think $a will be? The answer, unfortunately, is it depends. Both sub1() and sub2() access the global variable $a, once to read and once to write. Depending on factors ranging from your thread implementation's scheduling algorithm to the phase of the moon, $a can be 2 or 3.
Race conditions are caused by unsynchronized access to shared data. Without explicit synchronization, there's no way to be sure that nothing has happened to the shared data between the time you access it and the time you update it. Even this simple code fragment has the possibility of error:
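For instance (a sketch matching the $a, $b, and $c outcomes described next):

```perl
use threads;
use threads::shared;

my $a :shared = 2;
my $b :shared;
my $c :shared;

my $thr1 = threads->create(sub { $b = $a; $a = $b + 1; });
my $thr2 = threads->create(sub { $c = $a; $a = $c + 1; });
$thr1->join();
$thr2->join();
print("$a $b $c\n");   # $a is 3 or 4; $b and $c are each 2 or 3
```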
Two threads both access $a. Each thread can potentially be interrupted at any point, or be executed in any order. At the end, $a could be 3 or 4, and both $b and $c could be 2 or 3.

Even $a += 5 and $a++ are not guaranteed to be atomic.
Whenever your program accesses data or resources that can be accessed by other threads, you must take steps to coordinate access or risk data inconsistency and race conditions. Note that Perl will protect its internals from your race conditions, but it won't protect you from you.
Perl provides a number of mechanisms to coordinate the interactions between threads and their data, to avoid race conditions and the like. Some of these are designed to resemble the common techniques used in thread libraries such as pthreads; others are Perl-specific. Often, the standard techniques are clumsy and difficult to get right (such as condition waits). Where possible, it is usually easier to use Perlish techniques such as queues, which remove some of the hard work involved.
The lock() function takes a shared variable and puts a lock on it.
No other thread may lock the variable until the variable is unlocked
by the thread holding the lock. Unlocking happens automatically
when the locking thread exits the block that contains the call to the
lock() function. Using lock() is straightforward: This example has
several threads doing some calculations in parallel, and occasionally
updating a running total:
- use threads;
- use threads::shared;
- my $total :shared = 0;
- sub calc {
- while (1) {
- my $result;
- # (... do some calculations and set $result ...)
- {
- lock($total); # Block until we obtain the lock
- $total += $result;
- } # Lock implicitly released at end of scope
- last if $result == 0;
- }
- }
- my $thr1 = threads->create(\&calc);
- my $thr2 = threads->create(\&calc);
- my $thr3 = threads->create(\&calc);
- $thr1->join();
- $thr2->join();
- $thr3->join();
- print("total=$total\n");
lock() blocks the thread until the variable being locked is
available. When lock() returns, your thread can be sure that no other
thread can lock that variable until the block containing the
lock exits.
It's important to note that locks don't prevent access to the variable
in question, only lock attempts. This is in keeping with Perl's
longstanding tradition of courteous programming, and the advisory file
locking that flock() gives you.
You may lock arrays and hashes as well as scalars. Locking an array, though, will not block subsequent locks on array elements, just lock attempts on the array itself.
Locks are recursive, which means it's okay for a thread to
lock a variable more than once. The lock will last until the outermost
lock() on the variable goes out of scope. For example:
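A sketch (single-threaded here, just to show the nesting; lockit_some_more() is a made-up helper name):

```perl
use threads;
use threads::shared;

my $x :shared;

sub lockit_some_more {
    lock($x);   # No-op: this thread already holds the lock on $x
    print("Still inside the outermost lock\n");
}

sub doit {
    lock($x);           # Block until we obtain the lock
    lock($x);           # No-op: we already hold it
    lockit_some_more();
}   # The lock is released here, when the outermost lock() leaves scope

doit();
```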
Note that there is no unlock() function - the only way to unlock a variable is to allow it to go out of scope.
A lock can either be used to guard the data contained within the variable being locked, or it can be used to guard something else, like a section of code. In this latter case, the variable in question does not hold any useful data, and exists only for the purpose of being locked. In this respect, the variable behaves like the mutexes and basic semaphores of traditional thread libraries.
Locks are a handy tool to synchronize access to data, and using them properly is the key to safe shared data. Unfortunately, locks aren't without their dangers, especially when multiple locks are involved. Consider the following code:
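A sketch of the two-lock scenario (no join()s here, per the note near the start of this tutorial, so run as-is it exits with the "active threads" warning rather than hanging; add $thr1->join() and $thr2->join() at the end to watch it actually deadlock):

```perl
use threads;
use threads::shared;

my $a :shared = 4;
my $b :shared = 'foo';

my $thr1 = threads->create(sub {
    lock($a);
    sleep(20);
    lock($b);   # Blocks: the other thread holds $b and is waiting for $a
});
my $thr2 = threads->create(sub {
    lock($b);
    sleep(20);
    lock($a);   # Blocks: the other thread holds $a and is waiting for $b
});
```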
This program will probably hang until you kill it. The only way it won't hang is if one of the two threads acquires both locks first. A guaranteed-to-hang version is more complicated, but the principle is the same.
The first thread will grab a lock on $a, then, after a pause during which the second thread has probably had time to do some work, try to grab a lock on $b. Meanwhile, the second thread grabs a lock on $b, then later tries to grab a lock on $a. The second lock attempt for both threads will block, each waiting for the other to release its lock.
This condition is called a deadlock, and it occurs whenever two or more threads are trying to get locks on resources that the others own. Each thread will block, waiting for the other to release a lock on a resource. That never happens, though, since the thread with the resource is itself waiting for a lock to be released.
There are a number of ways to handle this sort of problem. The best way is to always have all threads acquire locks in the exact same order. If, for example, you lock variables $a, $b, and $c, always lock $a before $b, and $b before $c. It's also best to hold on to locks for as short a period of time as possible, to minimize the risks of deadlock.
The other synchronization primitives described below can suffer from similar problems.
A queue is a special thread-safe object that lets you put data in one end and take it out the other without having to worry about synchronization issues. Queues are pretty straightforward, and look like this:
- use threads;
- use Thread::Queue;
- my $DataQueue = Thread::Queue->new();
- my $thr = threads->create(sub {
- while (my $DataElement = $DataQueue->dequeue()) {
- print("Popped $DataElement off the queue\n");
- }
- });
- $DataQueue->enqueue(12);
- $DataQueue->enqueue("A", "B", "C");
- sleep(10);
- $DataQueue->enqueue(undef);
- $thr->join();
You create the queue with Thread::Queue->new(). Then you can add lists of scalars onto the end with enqueue(), and pop scalars off the front of it with dequeue(). A queue has no fixed size, and can grow as needed to hold everything pushed on to it.
If a queue is empty, dequeue() blocks until another thread enqueues something. This makes queues ideal for event loops and other communications between threads.
Semaphores are a kind of generic locking mechanism. In their most basic form, they behave very much like lockable scalars, except that they can't hold data, and that they must be explicitly unlocked. In their advanced form, they act like a kind of counter, and can allow multiple threads to have the lock at any one time.
Semaphores have two methods, down() and up(): down() decrements the resource count, while up() increments it. Calls to down() will block if the semaphore's current count would decrement below zero. This program gives a quick demonstration:
- use threads;
- use Thread::Semaphore;
- my $semaphore = Thread::Semaphore->new();
- my $GlobalVariable :shared = 0;
- $thr1 = threads->create(\&sample_sub, 1);
- $thr2 = threads->create(\&sample_sub, 2);
- $thr3 = threads->create(\&sample_sub, 3);
- sub sample_sub {
- my $SubNumber = shift(@_);
- my $TryCount = 10;
- my $LocalCopy;
- sleep(1);
- while ($TryCount--) {
- $semaphore->down();
- $LocalCopy = $GlobalVariable;
- print("$TryCount tries left for sub $SubNumber (\$GlobalVariable is $GlobalVariable)\n");
- sleep(2);
- $LocalCopy++;
- $GlobalVariable = $LocalCopy;
- $semaphore->up();
- }
- }
- $thr1->join();
- $thr2->join();
- $thr3->join();
The three invocations of the subroutine all operate in sync. The semaphore, though, makes sure that only one thread is accessing the global variable at once.
By default, semaphores behave like locks, letting only one thread down() them at a time. However, there are other uses for semaphores.

Each semaphore has a counter attached to it. By default, semaphores are created with the counter set to one, down() decrements the counter by one, and up() increments by one. However, we can override any or all of these defaults simply by passing in different values:
- use threads;
- use Thread::Semaphore;
- my $semaphore = Thread::Semaphore->new(5);
- # Creates a semaphore with the counter set to five
- my $thr1 = threads->create(\&sub1);
- my $thr2 = threads->create(\&sub1);
- sub sub1 {
- $semaphore->down(5); # Decrements the counter by five
- # Do stuff here
- $semaphore->up(5); # Increment the counter by five
- }
- $thr1->detach();
- $thr2->detach();
If down() attempts to decrement the counter below zero, it blocks until the counter is large enough. Note that while a semaphore can be created with a starting count of zero, any up() or down() always changes the counter by at least one, and so $semaphore->down(0) is the same as $semaphore->down(1).
The question, of course, is why would you do something like this? Why create a semaphore with a starting count that's not one, or why decrement or increment it by more than one? The answer is resource availability. Many resources that you want to manage access for can be safely used by more than one thread at once.
For example, let's take a GUI driven program. It has a semaphore that it uses to synchronize access to the display, so only one thread is ever drawing at once. Handy, but of course you don't want any thread to start drawing until things are properly set up. In this case, you can create a semaphore with a counter set to zero, and up it when things are ready for drawing.
Semaphores with counters greater than one are also useful for establishing quotas. Say, for example, that you have a number of threads that can do I/O at once. You don't want all the threads reading or writing at once though, since that can potentially swamp your I/O channels, or deplete your process's quota of filehandles. You can use a semaphore initialized to the number of concurrent I/O requests (or open files) that you want at any one time, and have your threads quietly block and unblock themselves.
Larger increments or decrements are handy in those cases where a thread needs to check out or return a number of resources at once.
The functions cond_wait() and cond_signal() can be used in conjunction with locks to notify co-operating threads that a resource has become available. They are very similar in use to the functions found in pthreads. However, for most purposes, queues are simpler to use and more intuitive. See threads::shared for more details.
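A small sketch of the idiom (the $ready flag and the one-second pause are illustrative; the until-loop guard protects against spurious wakeups and against the signal arriving before the wait):

```perl
use threads;
use threads::shared;

my $ready :shared = 0;

my $thr = threads->create(sub {
    lock($ready);
    # cond_wait() releases the lock while sleeping, reacquires it on wakeup
    cond_wait($ready) until $ready;
    print("Resource is now available\n");
});

sleep(1);   # Crude: let the other thread reach cond_wait() first
{
    lock($ready);
    $ready = 1;
    cond_signal($ready);   # Wake one thread waiting on $ready
}
$thr->join();
```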
There are times when you may find it useful to have a thread explicitly give up the CPU to another thread. You may be doing something processor-intensive and want to make sure that the user-interface thread gets called frequently. Regardless, there are times that you might want a thread to give up the processor.
Perl's threading package provides the yield() function that does this. yield() is pretty straightforward, and works like this:
- use threads;
- sub loop {
- my $thread = shift;
- my $foo = 50;
- while($foo--) { print("In thread $thread\n"); }
- threads->yield();
- $foo = 50;
- while($foo--) { print("In thread $thread\n"); }
- }
- my $thr1 = threads->create(\&loop, 'first');
- my $thr2 = threads->create(\&loop, 'second');
- my $thr3 = threads->create(\&loop, 'third');
It is important to remember that yield() is only a hint to give up the CPU; what actually happens depends on your hardware, OS, and threading libraries. On many operating systems, yield() is a no-op, so you should not build the scheduling of your threads around yield() calls. It might work on your platform, but it won't work on another.
We've covered the workhorse parts of Perl's threading package, and with these tools you should be well on your way to writing threaded code and packages. There are a few useful little pieces that didn't really fit in anyplace else.
The threads->self() class method provides your program with a way to get an object representing the thread it's currently in. You can use this object in the same way as the ones returned from thread creation.
tid() is a thread object method that returns the thread ID of the thread the object represents. Thread IDs are integers, with the main thread in a program being 0. Currently Perl assigns a unique TID to every thread ever created in your program, assigning the first thread to be created a TID of 1, and increasing the TID by 1 for each new thread that's created. When used as a class method, threads->tid() can be used by a thread to get its own TID.
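A quick illustration of self() and tid() (the TID values assume a fresh program: main is 0, the first spawned thread is 1):

```perl
use threads;

printf("Main thread TID: %d\n", threads->tid());   # Class-method form

my $thr = threads->create(sub {
    my $self = threads->self();                    # Object for this thread
    printf("Worker thread TID: %d\n", $self->tid());
});
$thr->join();
```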
The equal() method takes two thread objects and returns true if the objects represent the same thread, and false if they don't. Thread objects also have an overloaded == comparison so that you can do comparison on them as you would with normal objects.
threads->list() returns a list of thread objects, one for each thread that's currently running and not detached. Handy for a number of things, including cleaning up at the end of your program (from the main Perl thread, of course):
- # Loop through all the threads
- foreach my $thr (threads->list()) {
- $thr->join();
- }
If some threads have not finished running when the main Perl thread ends, Perl will warn you about it and die, since it is impossible for Perl to clean up itself while other threads are running.
NOTE: The main Perl thread (thread 0) is in a detached state, and so does not appear in the list returned by threads->list().
Confused yet? It's time for an example program to show some of the things we've covered. This program finds prime numbers using threads.
- 1 #!/usr/bin/perl
- 2 # prime-pthread, courtesy of Tom Christiansen
- 3
- 4 use strict;
- 5 use warnings;
- 6
- 7 use threads;
- 8 use Thread::Queue;
- 9
- 10 sub check_num {
- 11 my ($upstream, $cur_prime) = @_;
- 12 my $kid;
- 13 my $downstream = Thread::Queue->new();
- 14 while (my $num = $upstream->dequeue()) {
- 15 next unless ($num % $cur_prime);
- 16 if ($kid) {
- 17 $downstream->enqueue($num);
- 18 } else {
- 19 print("Found prime: $num\n");
- 20 $kid = threads->create(\&check_num, $downstream, $num);
- 21 if (! $kid) {
- 22 warn("Sorry. Ran out of threads.\n");
- 23 last;
- 24 }
- 25 }
- 26 }
- 27 if ($kid) {
- 28 $downstream->enqueue(undef);
- 29 $kid->join();
- 30 }
- 31 }
- 32
- 33 my $stream = Thread::Queue->new(3..1000, undef);
- 34 check_num($stream, 2);
This program uses the pipeline model to generate prime numbers. Each thread in the pipeline has an input queue that feeds numbers to be checked, a prime number that it's responsible for, and an output queue into which it funnels numbers that have failed the check. If the thread has a number that's failed its check and there's no child thread, then the thread must have found a new prime number. In that case, a new child thread is created for that prime and stuck on the end of the pipeline.
This probably sounds a bit more confusing than it really is, so let's go through this program piece by piece and see what it does. (For those of you who might be trying to remember exactly what a prime number is, it's a number that's only evenly divisible by itself and 1.)
The bulk of the work is done by the check_num() subroutine, which takes a reference to its input queue and a prime number that it's responsible for. After pulling in the input queue and the prime that the subroutine is checking (line 11), we create a new queue (line 13) and reserve a scalar for the thread that we're likely to create later (line 12).
The while loop from line 14 to line 26 grabs a scalar off the input queue and checks against the prime this thread is responsible for. Line 15 checks to see if there's a remainder when we divide the number to be checked by our prime. If there is one, the number must not be evenly divisible by our prime, so we need to either pass it on to the next thread if we've created one (line 17) or create a new thread if we haven't.
The new thread creation is line 20. We pass on to it a reference to the queue we've created, and the prime number we've found. In lines 21 through 24, we check to make sure that our new thread got created, and if not, we stop checking any remaining numbers in the queue.
Finally, once the loop terminates (because we got a 0 or undef in the
queue, which serves as a note to terminate), we pass on the notice to our
child, and wait for it to exit if we've created a child (lines 27 and
30).
Meanwhile, back in the main thread, we first create a queue (line 33) and queue up all the numbers from 3 to 1000 for checking, plus a termination notice. Then all we have to do to get the ball rolling is pass the queue and the first prime to the check_num() subroutine (line 34).
That's how it works. It's pretty simple; as with many Perl programs, the explanation is much longer than the program.
Some background on thread implementations from the operating system viewpoint. There are three basic categories of threads: user-mode threads, kernel threads, and multiprocessor kernel threads.
User-mode threads are threads that live entirely within a program and its libraries. In this model, the OS knows nothing about threads. As far as it's concerned, your process is just a process.
This is the easiest way to implement threads, and the way most OSes
start. The big disadvantage is that, since the OS knows nothing about
threads, if one thread blocks they all do. Typical blocking activities
include most system calls, most I/O, and things like sleep().
Kernel threads are the next step in thread evolution. The OS knows about kernel threads, and makes allowances for them. The main difference between a kernel thread and a user-mode thread is blocking. With kernel threads, things that block a single thread don't block other threads. This is not the case with user-mode threads, where the kernel blocks at the process level and not the thread level.
This is a big step forward, and can give a threaded program quite a performance boost over non-threaded programs. Threads that block performing I/O, for example, won't block threads that are doing other things. Each process still has only one thread running at once, though, regardless of how many CPUs a system might have.
Since kernel threading can interrupt a thread at any time, it will uncover some of the implicit locking assumptions you may make in your program. For example, something as simple as $a = $a + 2 can behave unpredictably with kernel threads if $a is visible to other threads, as another thread may have changed $a between the time it was fetched on the right-hand side and the time the new value is stored.
Multiprocessor kernel threads are the final step in thread support. With multiprocessor kernel threads on a machine with multiple CPUs, the OS may schedule two or more threads to run simultaneously on different CPUs.
This can give a serious performance boost to your threaded program, since more than one thread will be executing at the same time. As a tradeoff, though, any of those nagging synchronization issues that might not have shown with basic kernel threads will appear with a vengeance.
In addition to the different levels of OS involvement in threads, different OSes (and different thread implementations for a particular OS) allocate CPU cycles to threads in different ways.
Cooperative multitasking systems have running threads give up control if one of two things happen. If a thread calls a yield function, it gives up control. It also gives up control if the thread does something that would cause it to block, such as perform I/O. In a cooperative multitasking implementation, one thread can starve all the others for CPU time if it so chooses.
Preemptive multitasking systems interrupt threads at regular intervals while the system decides which thread should run next. In a preemptive multitasking system, one thread usually won't monopolize the CPU.
On some systems, there can be cooperative and preemptive threads running simultaneously. (Threads running with realtime priorities often behave cooperatively, for example, while threads running at normal priorities behave preemptively.)
Most modern operating systems support preemptive multitasking nowadays.
The main thing to bear in mind when comparing Perl's ithreads to other threading models is the fact that for each new thread created, a complete copy of all the variables and data of the parent thread has to be taken. Thus, thread creation can be quite expensive, both in terms of memory usage and time spent in creation. The ideal way to reduce these costs is to have a relatively small number of long-lived threads, all created fairly early on (before the base thread has accumulated too much data). Of course, this may not always be possible, so compromises have to be made. However, after a thread has been created, its performance and extra memory usage should be little different than ordinary code.
Also note that under the current implementation, shared variables use a little more memory and are a little slower than ordinary variables.
Note that while threads themselves are separate execution threads and Perl data is thread-private unless explicitly shared, the threads can affect process-scope state, affecting all the threads.
The most common example of this is changing the current working
directory using chdir(). One thread calls chdir(), and the working
directory of all the threads changes.
An even more drastic example of a process-scope change is chroot(): the root directory of all the threads changes, and no thread can undo it (as opposed to chdir()).
Further examples of process-scope changes include umask() and
changing uids and gids.
Thinking of mixing fork() and threads? Please lie down and wait
until the feeling passes. Be aware that the semantics of fork() vary
between platforms. For example, some Unix systems copy all the current
threads into the child process, while others only copy the thread that
called fork(). You have been warned!
Similarly, mixing signals and threads may be problematic. Implementations are platform-dependent, and even the POSIX semantics may not be what you expect (and Perl doesn't even give you the full POSIX API). For example, there is no way to guarantee that a signal sent to a multi-threaded Perl application will get intercepted by any particular thread. (However, a recently added feature does provide the capability to send signals between threads. See THREAD SIGNALLING in threads for more details.)
Whether various library calls are thread-safe is outside the control of Perl. Calls often suffering from not being thread-safe include: localtime(), gmtime(), functions fetching user, group, and network information (such as getgrent(), gethostent(), getnetent(), and so on), readdir(), rand(), and srand(). In general, be wary of any call that depends on some global external state.
If the system that Perl was compiled on has thread-safe variants of such calls, they will be used. Beyond that, Perl is at the mercy of the thread-safety or -unsafety of the calls. Please consult your C library call documentation.
On some platforms the thread-safe library interfaces may fail if the result buffer is too small (for example, the user group databases may be rather large, and the reentrant interfaces may have to carry around a full snapshot of those databases). Perl will start with a small buffer, but keep retrying and growing the result buffer until the result fits. If this limitless growing sounds bad for security or memory consumption reasons, you can recompile Perl with PERL_REENTRANT_MAXSIZE defined to the maximum number of bytes you will allow.
A complete thread tutorial could fill a book (and has, many times), but with what we've covered in this introduction, you should be well on your way to becoming a threaded Perl expert.
Annotated POD for threads: http://annocpan.org/?mode=search&field=Module&name=threads
Latest version of threads on CPAN: http://search.cpan.org/search?module=threads
Annotated POD for threads::shared: http://annocpan.org/?mode=search&field=Module&name=threads%3A%3Ashared
Latest version of threads::shared on CPAN: http://search.cpan.org/search?module=threads%3A%3Ashared
Perl threads mailing list: http://lists.perl.org/list/ithreads.html
Here's a short bibliography courtesy of Jürgen Christoffel:
Birrell, Andrew D. An Introduction to Programming with Threads. Digital Equipment Corporation, 1989, DEC-SRC Research Report #35 online as ftp://ftp.dec.com/pub/DEC/SRC/research-reports/SRC-035.pdf (highly recommended)
Robbins, Kay A., and Steven Robbins. Practical Unix Programming: A Guide to Concurrency, Communication, and Multithreading. Prentice-Hall, 1996.
Lewis, Bill, and Daniel J. Berg. Multithreaded Programming with Pthreads. Prentice Hall, 1997, ISBN 0-13-443698-9 (a well-written introduction to threads).
Nelson, Greg (editor). Systems Programming with Modula-3. Prentice Hall, 1991, ISBN 0-13-590464-1.
Nichols, Bradford, Dick Buttlar, and Jacqueline Proulx Farrell. Pthreads Programming. O'Reilly & Associates, 1996, ISBN 1-56592-115-1 (covers POSIX threads).
Boykin, Joseph, David Kirschen, Alan Langerman, and Susan LoVerso. Programming under Mach. Addison-Wesley, 1994, ISBN 0-201-52739-1.
Tanenbaum, Andrew S. Distributed Operating Systems. Prentice Hall, 1995, ISBN 0-13-219908-4 (great textbook).
Silberschatz, Abraham, and Peter B. Galvin. Operating System Concepts, 4th ed. Addison-Wesley, 1995, ISBN 0-201-59292-4
Arnold, Ken and James Gosling. The Java Programming Language, 2nd ed. Addison-Wesley, 1998, ISBN 0-201-31006-6.
comp.programming.threads FAQ, http://www.serpentine.com/~bos/threads-faq/
Le Sergent, T. and B. Berthomieu. "Incremental MultiThreaded Garbage Collection on Virtually Shared Memory Architectures" in Memory Management: Proc. of the International Workshop IWMM 92, St. Malo, France, September 1992, Yves Bekkers and Jacques Cohen, eds. Springer, 1992, ISBN 3-540-55940-X (real-life thread applications).
Artur Bergman, "Where Wizards Fear To Tread", June 11, 2002, http://www.perl.com/pub/a/2002/06/11/threads.html
Thanks (in no particular order) to Chaim Frenkel, Steve Fink, Gurusamy Sarathy, Ilya Zakharevich, Benjamin Sugars, Jürgen Christoffel, Joshua Pritikin, and Alan Burlison, for their help in reality-checking and polishing this article. Big thanks to Tom Christiansen for his rewrite of the prime number generator.
Dan Sugalski <dan@sidhe.org>
Slightly modified by Arthur Bergman to fit the new thread model/module.
Reworked slightly by Jörg Walter <jwalt@cpan.org> to be more concise about thread-safety of Perl code.
Rearranged slightly by Elizabeth Mattijsen <liz@dijkmat.nl> to put less emphasis on yield().
The original version of this article appeared in The Perl Journal #10, and is copyright 1998 The Perl Journal. It appears courtesy of Jon Orwant and The Perl Journal. This document may be distributed under the same terms as Perl itself.
perltie - how to hide an object class in a simple variable
Prior to release 5.0 of Perl, a programmer could use dbmopen() to connect an on-disk database in the standard Unix dbm(3x) format magically to a %HASH in their program. However, their Perl was either built with one particular dbm library or another, but not both, and you couldn't extend this mechanism to other packages or types of variables.
Now you can.
The tie() function binds a variable to a class (package) that will provide the implementation for access methods for that variable. Once this magic has been performed, accessing a tied variable automatically triggers method calls in the proper class. The complexity of the class is hidden behind magic method calls. The method names are in ALL CAPS, which is a convention that Perl uses to indicate that they're called implicitly rather than explicitly--just like the BEGIN() and END() functions.
In the tie() call, VARIABLE is the name of the variable to be enchanted. CLASSNAME is the name of a class implementing objects of the correct type. Any additional arguments in the LIST are passed to the appropriate constructor method for that class--meaning TIESCALAR(), TIEARRAY(), TIEHASH(), or TIEHANDLE(). (Typically these are arguments such as might be passed to the dbminit() function of C.) The object returned by the "new" method is also returned by the tie() function, which would be useful if you wanted to access other methods in CLASSNAME. (You don't actually have to return a reference of the right "type" (e.g., HASH or CLASSNAME) so long as it's a properly blessed object.) You can also retrieve a reference to the underlying object using the tied() function.
Unlike dbmopen(), the tie() function will not use or require a module
for you--you need to do that explicitly yourself.
A class implementing a tied scalar should define the following methods: TIESCALAR, FETCH, STORE, and possibly UNTIE and/or DESTROY.
Let's look at each in turn, using as an example a tie class for scalars that allows the user to do something like:
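The usage looks like the following. So the two-line tie at the bottom runs on its own, this sketch bundles in a minimal stand-in for the Nice::Tie::Scalar class developed below, using Perl's built-in getpriority()/setpriority() with PRIO_PROCESS written as its usual Unix value of 0 rather than the BSD::Resource constants:

```perl
use strict;
use warnings;

# Minimal stand-in for the Nice::Tie::Scalar class developed below.
# The real example uses BSD::Resource; here PRIO_PROCESS is written
# as 0 (its usual Unix value) and the built-in priority calls suffice.
package Nice::Tie::Scalar;
sub TIESCALAR { my ($class, $pid) = @_; bless \($pid ||= $$), $class }
sub FETCH     { my $self = shift; getpriority(0, $$self) }
sub STORE     { my ($self, $new) = @_; setpriority(0, $$self, $new) }

package main;
tie our $his_speed, 'Nice::Tie::Scalar', getppid();
tie our $my_speed,  'Nice::Tie::Scalar', $$;
print "parent: $his_speed, me: $my_speed\n";
```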
And now whenever either of those variables is accessed, its current system priority is retrieved and returned. If those variables are set, then the process's priority is changed!
We'll use Jarkko Hietaniemi <jhi@iki.fi>'s BSD::Resource class (not included) to access the PRIO_PROCESS, PRIO_MIN, and PRIO_MAX constants from your system, as well as the getpriority() and setpriority() system calls. Here's the preamble of the class.
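A preamble along those lines might look like this. Since BSD::Resource is a CPAN module, this sketch substitutes the constants with their usual Unix values (an assumption made here so the snippet stands alone):

```perl
package Nice::Tie::Scalar;
use strict;
use Carp;

# The original preamble pulls PRIO_PROCESS, PRIO_MIN, PRIO_MAX,
# getpriority() and setpriority() from the CPAN module BSD::Resource:
#     use BSD::Resource;
# Perl's built-in getpriority()/setpriority() plus the usual Unix
# constant values let this sketch stand alone.
use constant { PRIO_PROCESS => 0, PRIO_MIN => -20, PRIO_MAX => 20 };

our $DEBUG = 0;   # set to true for optional debugging chatter
```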
This is the constructor for the class. That means it is expected to return a blessed reference to a new scalar (probably anonymous) that it's creating. For example:
- sub TIESCALAR {
- my $class = shift;
- my $pid = shift || $$; # 0 means me
- if ($pid !~ /^\d+$/) {
- carp "Nice::Tie::Scalar got non-numeric pid $pid" if $^W;
- return undef;
- }
- unless (kill 0, $pid) { # EPERM or ESRCH, no doubt
- carp "Nice::Tie::Scalar got bad pid $pid: $!" if $^W;
- return undef;
- }
- return bless \$pid, $class;
- }
This tie class has chosen to return an error rather than raising an exception if its constructor should fail. While this is how dbmopen() works, other classes may well not wish to be so forgiving. It checks the global variable $^W to see whether to emit a bit of noise anyway.
This method will be triggered every time the tied variable is accessed (read). It takes no arguments beyond its self reference, which is the object representing the scalar we're dealing with. Because in this case we're using just a SCALAR ref for the tied scalar object, a simple $$self allows the method to get at the real value stored there. In our example below, that real value is the process ID to which we've tied our variable.
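A FETCH consistent with that description might look like this (a sketch; PRIO_PROCESS is again written as 0 rather than imported from BSD::Resource):

```perl
package Nice::Tie::Scalar;
use Carp;

# FETCH: triggered on every read of the tied scalar.  $$self is the
# process ID stored by TIESCALAR; since getpriority() can legitimately
# return -1, failure is detected through $! instead.
sub FETCH {
    my $self = shift;
    confess "wrong type" unless ref $self;
    croak "usage error" if @_;
    local $! = 0;
    my $nicety = getpriority(0, $$self);   # 0 == PRIO_PROCESS
    croak "getpriority failed: $!" if $!;
    return $nicety;
}
```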
This time we've decided to blow up (raise an exception) if the renice fails--there's no place for us to return an error otherwise, and it's probably the right thing to do.
This method will be triggered every time the tied variable is set (assigned). Beyond its self reference, it also expects one (and only one) argument: the new value the user is trying to assign. Don't worry about returning a value from STORE; the semantic of assignment returning the assigned value is implemented with FETCH.
- sub STORE {
- my $self = shift;
- confess "wrong type" unless ref $self;
- my $new_nicety = shift;
- croak "usage error" if @_;
- if ($new_nicety < PRIO_MIN) {
- carp sprintf
- "WARNING: priority %d less than minimum system priority %d",
- $new_nicety, PRIO_MIN if $^W;
- $new_nicety = PRIO_MIN;
- }
- if ($new_nicety > PRIO_MAX) {
- carp sprintf
- "WARNING: priority %d greater than maximum system priority %d",
- $new_nicety, PRIO_MAX if $^W;
- $new_nicety = PRIO_MAX;
- }
- unless (defined setpriority(PRIO_PROCESS, $$self, $new_nicety)) {
- confess "setpriority failed: $!";
- }
- }
This method will be triggered when the untie occurs. This can be useful if the class needs to know when no further calls will be made (except DESTROY, of course). See The untie Gotcha below for more details.
This method will be triggered when the tied variable needs to be destructed. As with other object classes, such a method is seldom necessary, because Perl deallocates its moribund object's memory for you automatically--this isn't C++, you know. We'll use a DESTROY method here for debugging purposes only.
That's about all there is to it. Actually, it's more than all there is to it, because we've done a few nice things here for the sake of completeness, robustness, and general aesthetics. Simpler TIESCALAR classes are certainly possible.
A class implementing a tied ordinary array should define the following methods: TIEARRAY, FETCH, STORE, FETCHSIZE, STORESIZE and perhaps UNTIE and/or DESTROY.
FETCHSIZE and STORESIZE are used to provide $#array and equivalent scalar(@array) access.
The methods POP, PUSH, SHIFT, UNSHIFT, SPLICE, DELETE, and EXISTS are required if the perl operator with the corresponding (but lowercase) name is to operate on the tied array. The Tie::Array class can be used as a base class to implement the first five of these in terms of the basic methods above. The default implementations of DELETE and EXISTS in Tie::Array simply croak.
In addition EXTEND will be called when perl would have pre-extended allocation in a real array.
For this discussion, we'll implement an array whose elements are a fixed size at creation. If you try to create an element larger than the fixed size, you'll take an exception: storing the four-character string 'dogs' into a three-character element, for instance, will croak.
The preamble code for the class is as follows:
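A sketch of the preamble just sets up the package:

```perl
package FixedElem_Array;
use strict;
use warnings;
use Carp;
```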
This is the constructor for the class. That means it is expected to return a blessed reference through which the new array (probably an anonymous ARRAY ref) will be accessed.
In our example, just to show you that you don't really have to return an
ARRAY reference, we'll choose a HASH reference to represent our object.
A HASH works out well as a generic record type: the {ELEMSIZE}
field will
store the maximum element size allowed, and the {ARRAY}
field will hold the
true ARRAY ref. If someone outside the class tries to dereference the
object returned (doubtless thinking it an ARRAY ref), they'll blow up.
This just goes to show you that you should respect an object's privacy.
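A constructor along those lines (a sketch; the element size is taken as the one extra tie() argument):

```perl
package FixedElem_Array;
use Carp;

# TIEARRAY: the object is a HASH ref; {ELEMSIZE} records the fixed
# element width and {ARRAY} holds the true array.
sub TIEARRAY {
    my ($class, $elemsize) = @_;
    croak "usage: tie ARRAY, '$class', POSITIVE_ELEMSIZE"
        unless defined $elemsize && $elemsize =~ /^\d+$/ && $elemsize > 0;
    return bless { ELEMSIZE => $elemsize, ARRAY => [] }, $class;
}
```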
This method will be triggered every time an individual element of the tied array is accessed (read). It takes one argument beyond its self reference: the index whose value we're trying to fetch.
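In our example, FETCH need be no more than a lookup in the underlying array:

```perl
package FixedElem_Array;

# FETCH: one argument beyond the self reference, the index being read.
sub FETCH {
    my ($self, $index) = @_;
    return $self->{ARRAY}->[$index];
}
```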
If a negative array index is used to read from an array, the index will be translated to a positive one internally by calling FETCHSIZE before being passed to FETCH. You may disable this feature by assigning a true value to the variable $NEGATIVE_INDICES in the tied array class.
As you may have noticed, the name of the FETCH method (et al.) is the same for all accesses, even though the constructors differ in names (TIESCALAR vs TIEARRAY). While in theory you could have the same class servicing several tied types, in practice this becomes cumbersome, and it's easiest to keep them at simply one tie type per class.
This method will be triggered every time an element in the tied array is set (written). It takes two arguments beyond its self reference: the index at which we're trying to store something and the value we're trying to put there.
In our example, undef is really $self->{ELEMSIZE} number of spaces, so we have a little more work to do here:
- sub STORE {
- my $self = shift;
- my( $index, $value ) = @_;
- if ( length $value > $self->{ELEMSIZE} ) {
- croak "length of $value is greater than $self->{ELEMSIZE}";
- }
- # fill in the blanks
- $self->EXTEND( $index ) if $index > $self->FETCHSIZE();
- # right justify to keep element size for smaller elements
- $self->{ARRAY}->[$index] = sprintf "%$self->{ELEMSIZE}s", $value;
- }
Negative indexes are treated the same as with FETCH.
Returns the total number of items in the tied array associated with object this (equivalent to scalar(@array)). For example:
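For our example, the count of the underlying array is all that's needed (a sketch):

```perl
package FixedElem_Array;

# FETCHSIZE: what scalar(@array) and $#array + 1 report.
sub FETCHSIZE {
    my $self = shift;
    return scalar @{ $self->{ARRAY} };
}
```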
Sets the total number of items in the tied array associated with object this to be count. If this makes the array larger, the class's mapping of undef should be returned for new positions. If the array becomes smaller, entries beyond count should be deleted.
In our example, 'undef' is really an element containing $self->{ELEMSIZE} number of spaces. Observe:
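A sketch that grows with padded blanks and shrinks by discarding trailing entries:

```perl
package FixedElem_Array;

# STORESIZE: grow by filling new slots with ELEMSIZE spaces (our
# class's notion of undef); shrink by discarding entries past count.
sub STORESIZE {
    my ($self, $count) = @_;
    my $blank = ' ' x $self->{ELEMSIZE};
    push @{ $self->{ARRAY} }, $blank
        while @{ $self->{ARRAY} } < $count;
    splice @{ $self->{ARRAY} }, $count;
}
```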
Informative call that array is likely to grow to have count entries. Can be used to optimize allocation. This method need do nothing.
In our example, we want to make sure there are no blank (undef) entries, so EXTEND will make use of STORESIZE to fill elements as needed:
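A sketch of that delegation (a minimal STORESIZE is repeated here so the snippet stands alone; see its full discussion above):

```perl
package FixedElem_Array;

# EXTEND: advisory pre-extension.  We reuse STORESIZE so new slots
# are filled with ELEMSIZE spaces rather than left undef.
sub EXTEND {
    my ($self, $count) = @_;
    $self->STORESIZE($count);
}

# Minimal STORESIZE so this sketch runs on its own.
sub STORESIZE {
    my ($self, $count) = @_;
    my $blank = ' ' x $self->{ELEMSIZE};
    push @{ $self->{ARRAY} }, $blank
        while @{ $self->{ARRAY} } < $count;
    splice @{ $self->{ARRAY} }, $count;
}
```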
Verify that the element at index key exists in the tied array this.
In our example, we will determine that if an element consists of
$self->{ELEMSIZE}
spaces only, it does not exist:
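A sketch of that test:

```perl
package FixedElem_Array;

# EXISTS: an element made of ELEMSIZE spaces counts as nonexistent.
sub EXISTS {
    my ($self, $index) = @_;
    return 0 if !defined $self->{ARRAY}->[$index]
             || $self->{ARRAY}->[$index] eq ' ' x $self->{ELEMSIZE};
    return 1;
}
```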
Delete the element at index key from the tied array this.
In our example, a deleted item is $self->{ELEMSIZE}
spaces:
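A sketch that writes the padded blank directly (storing an empty string through STORE would produce the same value):

```perl
package FixedElem_Array;

# DELETE: a deleted item becomes ELEMSIZE spaces.
sub DELETE {
    my ($self, $index) = @_;
    return $self->{ARRAY}->[$index] = ' ' x $self->{ELEMSIZE};
}
```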
Clear (remove, delete, ...) all values from the tied array associated with object this. For example:
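In our example this just means starting over with an empty inner array:

```perl
package FixedElem_Array;

# CLEAR: forget every element at once.
sub CLEAR {
    my $self = shift;
    return $self->{ARRAY} = [];
}
```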
Append elements of LIST to the array. For example:
Remove last element of the array and return it. For example:
Remove the first element of the array (shifting other elements down) and return it. For example:
Insert LIST elements at the beginning of the array, moving existing elements up to make room. For example:
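Sketches of all four follow; the _pad helper is hypothetical (not part of the original), introduced so PUSH and UNSHIFT apply the same length check and right justification STORE does:

```perl
package FixedElem_Array;
use Carp;

# _pad (hypothetical helper): validate and right-justify new elements
# just as STORE would.
sub _pad {
    my ($self, @values) = @_;
    for (@values) {
        croak "length of $_ is greater than $self->{ELEMSIZE}"
            if length() > $self->{ELEMSIZE};
    }
    return map { sprintf "%$self->{ELEMSIZE}s", $_ } @values;
}

sub PUSH    { my $self = shift; push    @{ $self->{ARRAY} }, $self->_pad(@_) }
sub UNSHIFT { my $self = shift; unshift @{ $self->{ARRAY} }, $self->_pad(@_) }
sub POP     { my $self = shift; pop     @{ $self->{ARRAY} } }
sub SHIFT   { my $self = shift; shift   @{ $self->{ARRAY} } }
```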
Perform the equivalent of splice on the array.
offset is optional and defaults to zero, negative values count back from the end of the array.
length is optional and defaults to rest of the array.
LIST may be empty.
Returns a list of the original length elements at offset.
In our example, we'll use a little shortcut if there is a LIST:
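One way to keep the fixed element size honored is to pad the replacement LIST the same way STORE does, then delegate to the built-in splice (a sketch):

```perl
package FixedElem_Array;
use Carp;

# SPLICE: validate and pad the replacement LIST, then let the real
# splice do the shifting on the underlying array.
sub SPLICE {
    my $self   = shift;
    my $offset = shift || 0;
    my $length = @_ ? shift : scalar(@{ $self->{ARRAY} }) - $offset;
    my @list   = map {
        croak "length of $_ is greater than $self->{ELEMSIZE}"
            if length() > $self->{ELEMSIZE};
        sprintf "%$self->{ELEMSIZE}s", $_;
    } @_;
    return splice @{ $self->{ARRAY} }, $offset, $length, @list;
}
```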
Will be called when untie happens. (See The untie Gotcha below.)
This method will be triggered when the tied variable needs to be destructed. As with the scalar tie class, this is almost never needed in a language that does its own garbage collection, so this time we'll just leave it out.
Hashes were the first Perl data type to be tied (see dbmopen()). A class
implementing a tied hash should define the following methods: TIEHASH is
the constructor. FETCH and STORE access the key and value pairs. EXISTS
reports whether a key is present in the hash, and DELETE deletes one.
CLEAR empties the hash by deleting all the key and value pairs. FIRSTKEY
and NEXTKEY implement the keys() and each() functions to iterate over all
the keys. SCALAR is triggered when the tied hash is evaluated in scalar
context. UNTIE is called when untie happens, and DESTROY is called when
the tied variable is garbage collected.
If this seems like a lot, then feel free to inherit from merely the standard Tie::StdHash module for most of your methods, redefining only the interesting ones. See Tie::Hash for details.
Remember that Perl distinguishes between a key not existing in the hash,
and the key existing in the hash but having a corresponding value of
undef. The two possibilities can be tested with the exists() and
defined() functions.
Here's an example of a somewhat interesting tied hash class: it gives you a hash representing a particular user's dot files. You index into the hash with the name of the file (minus the dot) and you get back that dot file's contents. For example, $dot{profile} would hold the contents of ~/.profile.
Or here's another sample of using our tied class:
In our tied hash DotFiles example, we use a regular hash for the object containing several important fields, of which only the {LIST} field will be what the user thinks of as the real hash.
USER: whose dot files this object represents
HOME: where those dot files live
CLOBBER: whether we should try to change or remove those dot files
LIST: the hash of dot file names and content mappings
Here's the start of Dotfiles.pm:
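A sketch of that start, consistent with the debug/whowasi description that follows:

```perl
package DotFiles;
use Carp;

our $DEBUG = 0;
sub debug { $DEBUG = @_ ? shift : 1 }

# whowasi: the fully qualified name of the function that called the
# function that called whowasi -- handy inside warnings.
sub whowasi { (caller(1))[3] . '()' }
```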
For our example, we want to be able to emit debugging info to help in tracing during development. We keep also one convenience function around internally to help print out warnings; whowasi() returns the function name that calls it.
Here are the methods for the DotFiles tied hash.
This is the constructor for the class. That means it is expected to return a blessed reference through which the new object (probably but not necessarily an anonymous hash) will be accessed.
Here's the constructor:
- sub TIEHASH {
- my $self = shift;
- my $user = shift || $>;
- my $dotdir = shift || '';
- croak "usage: @{[&whowasi]} [USER [DOTDIR]]" if @_;
- $user = getpwuid($user) if $user =~ /^\d+$/;
- my $dir = (getpwnam($user))[7]
- || croak "@{[&whowasi]}: no user $user";
- $dir .= "/$dotdir" if $dotdir;
- my $node = {
- USER => $user,
- HOME => $dir,
- LIST => {},
- CLOBBER => 0,
- };
- opendir(DIR, $dir)
- || croak "@{[&whowasi]}: can't opendir $dir: $!";
- foreach my $dot ( grep /^\./ && -f "$dir/$_", readdir(DIR)) {
- $dot =~ s/^\.//;
- $node->{LIST}{$dot} = undef;
- }
- closedir DIR;
- return bless $node, $self;
- }
It's probably worth mentioning that if you're going to filetest the return values out of a readdir, you'd better prepend the directory in question. Otherwise, because we didn't chdir() there, it would have been testing the wrong file.
This method will be triggered every time an element in the tied hash is accessed (read). It takes one argument beyond its self reference: the key whose value we're trying to fetch.
Here's the fetch for our DotFiles example.
- sub FETCH {
- carp &whowasi if $DEBUG;
- my $self = shift;
- my $dot = shift;
- my $dir = $self->{HOME};
- my $file = "$dir/.$dot";
- unless (exists $self->{LIST}->{$dot} || -f $file) {
- carp "@{[&whowasi]}: no $dot file" if $DEBUG;
- return undef;
- }
- if (defined $self->{LIST}->{$dot}) {
- return $self->{LIST}->{$dot};
- } else {
- return $self->{LIST}->{$dot} = `cat $dir/.$dot`;
- }
- }
It was easy to write by having it call the Unix cat(1) command, but it would probably be more portable to open the file manually (and somewhat more efficient). Of course, because dot files are a Unixy concept, we're not that concerned.
This method will be triggered every time an element in the tied hash is set (written). It takes two arguments beyond its self reference: the index at which we're trying to store something, and the value we're trying to put there.
Here in our DotFiles example, we'll be careful not to let them try to overwrite the file unless they've called the clobber() method on the original object reference returned by tie().
- sub STORE {
- carp &whowasi if $DEBUG;
- my $self = shift;
- my $dot = shift;
- my $value = shift;
- my $file = $self->{HOME} . "/.$dot";
- my $user = $self->{USER};
- croak "@{[&whowasi]}: $file not clobberable"
- unless $self->{CLOBBER};
- open(my $f, '>', $file) || croak "can't open $file: $!";
- print $f $value;
- close($f);
- }
If they wanted to clobber something, they might say:
- $ob = tie %daemon_dots, 'DotFiles', 'daemon';
- $ob->clobber(1);
- $daemon_dots{signature} = "A true daemon\n";
Another way to lay hands on a reference to the underlying object is to use the tied() function, so they might alternately have set clobber using:
The clobber method is simply:
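A sketch of it:

```perl
package DotFiles;

# clobber: called on the object returned by tie() (or via tied()) to
# permit writes and deletes; with no argument it turns clobbering on.
sub clobber {
    my $self = shift;
    $self->{CLOBBER} = @_ ? shift : 1;
}
```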
This method is triggered when we remove an element from the hash, typically by using the delete() function. Again, we'll be careful to check whether they really want to clobber files.
- sub DELETE {
- carp &whowasi if $DEBUG;
- my $self = shift;
- my $dot = shift;
- my $file = $self->{HOME} . "/.$dot";
- croak "@{[&whowasi]}: won't remove file $file"
- unless $self->{CLOBBER};
- delete $self->{LIST}->{$dot};
- my $success = unlink($file);
- carp "@{[&whowasi]}: can't unlink $file: $!" unless $success;
- $success;
- }
The value returned by DELETE becomes the return value of the call to delete(). If you want to emulate the normal behavior of delete(), you should return whatever FETCH would have returned for this key. In this example, we have chosen instead to return a value which tells the caller whether the file was successfully deleted.
This method is triggered when the whole hash is to be cleared, usually by assigning the empty list to it.
In our example, that would remove all the user's dot files! It's such a dangerous thing that they'll have to set CLOBBER to something higher than 1 to make it happen.
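A CLEAR along those lines might look like this (a sketch; it relies on the DELETE method discussed nearby):

```perl
package DotFiles;
use Carp;

# CLEAR: removing every dot file is dangerous enough that CLOBBER
# must be set above 1 before we'll do it.
sub CLEAR {
    my $self = shift;
    croak "won't remove all dot files for $self->{USER}"
        unless $self->{CLOBBER} > 1;
    $self->DELETE($_) for keys %{ $self->{LIST} };
}
```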
This method is triggered when the user uses the exists() function on a particular hash. In our example, we'll look at the {LIST} hash element for this:
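A sketch of that lookup:

```perl
package DotFiles;

# EXISTS: consult the inner {LIST} hash, which holds one key per
# dot file.
sub EXISTS {
    my ($self, $dot) = @_;
    return exists $self->{LIST}->{$dot};
}
```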
This method will be triggered when the user is going to iterate through the hash, such as via a keys() or each() call.
This method gets triggered during a keys() or each() iteration. It has a second argument which is the last key that had been accessed. This is useful if you're caring about ordering, calling the iterator from more than one sequence, or not really storing things in a hash anywhere.
For our example, we're using a real hash so we'll do just the simple thing, but we'll have to go through the LIST field indirectly.
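Sketches of both: FIRSTKEY resets the each() iterator on the inner hash and returns the first key, and NEXTKEY simply keeps that iteration going (its "last key" argument is unneeded here because the inner hash remembers its own position):

```perl
package DotFiles;

sub FIRSTKEY {
    my $self = shift;
    my $reset = keys %{ $self->{LIST} };   # reset each() iterator
    return each %{ $self->{LIST} };
}

sub NEXTKEY {
    my $self = shift;
    return each %{ $self->{LIST} };
}
```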
This is called when the hash is evaluated in scalar context. In order to mimic the behaviour of untied hashes, this method should return a false value when the tied hash is considered empty. If this method does not exist, perl will make some educated guesses and return true when the hash is inside an iteration. If this isn't the case, FIRSTKEY is called, and the result will be a false value if FIRSTKEY returns the empty list, true otherwise.
However, you should not blindly rely on perl always doing the right thing. Particularly, perl will mistakenly return true when you clear the hash by repeatedly calling DELETE until it is empty. You are therefore advised to supply your own SCALAR method when you want to be absolutely sure that your hash behaves nicely in scalar context.
In our example we can just call scalar on the underlying hash referenced by $self->{LIST}:
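A sketch of that delegation:

```perl
package DotFiles;

# SCALAR: defer to the inner hash, so the tied hash is true exactly
# when it holds any dot files.
sub SCALAR {
    my $self = shift;
    return scalar %{ $self->{LIST} };
}
```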
This is called when untie occurs. See The untie Gotcha below.
This method is triggered when a tied hash is about to go out of scope. You don't really need it unless you're trying to add debugging or have auxiliary state to clean up. Here's a very simple function:
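A sketch of such a function (whowasi is repeated so the snippet stands alone):

```perl
package DotFiles;
use Carp;
our $DEBUG;

sub whowasi { (caller(1))[3] . '()' }   # as defined in the preamble

# DESTROY: nothing needs cleaning up; just emit a trace when debugging.
sub DESTROY {
    carp &whowasi if $DEBUG;
}
```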
Note that functions such as keys() and values() may return huge lists when used on large objects, like DBM files. You may prefer to use the each() function to iterate over such. Example:
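Iterating with each() fetches one key/value pair per call instead of building the whole key list in memory the way keys() would. An ordinary hash stands in for the tied DBM hash here:

```perl
# Fetch one pair at a time rather than materializing every key.
my %big_hash = ( one => 1, two => 2, three => 3 );  # stand-in for a tied hash
while ( my ($key, $value) = each %big_hash ) {
    print "$key = $value\n";
}
```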
Tying filehandles is partially implemented now.
A class implementing a tied filehandle should define the following methods: TIEHANDLE, at least one of PRINT, PRINTF, WRITE, READLINE, GETC, READ, and possibly CLOSE, UNTIE and DESTROY. The class can also provide: BINMODE, OPEN, EOF, FILENO, SEEK, TELL - if the corresponding perl operators are used on the handle.
When STDERR is tied, its PRINT method will be called to issue warnings and error messages. This feature is temporarily disabled during the call, which means you can use warn() inside PRINT without starting a recursive loop. And just like __WARN__ and __DIE__ handlers, STDERR's PRINT method may be called to report parser errors, so the caveats mentioned under %SIG in perlvar apply.
All of this is especially useful when perl is embedded in some other program, where output to STDOUT and STDERR may have to be redirected in some special way. See nvi and the Apache module for examples.
When tying a handle, the first argument to tie should begin with an asterisk. So, if you are tying STDOUT, use *STDOUT. If you have assigned it to a scalar variable, say $handle, use *$handle. tie $handle ties the scalar variable $handle, not the handle inside it.
In our example we're going to create a shouting handle.
This is the constructor for the class. That means it is expected to return a blessed reference of some sort. The reference can be used to hold some internal information.
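A sketch for the shouting handle; the "<shout>" announcement pairs with the "</shout>" printed by the DESTROY shown later:

```perl
package Shout;

# TIEHANDLE: bless something; the hash could carry per-handle state
# if later methods needed it.
sub TIEHANDLE {
    print "<shout>\n";    # matched by the "</shout>" in DESTROY
    my $class = shift;
    return bless {}, $class;
}
```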
This method will be called when the handle is written to via the
syswrite function.
This method will be triggered every time the tied handle is printed to
with the print() or say() functions. Beyond its self reference
it also expects the list that was passed to the print function.
say() acts just like print() except $\ will be localized to \n, so you need do nothing special to handle say() in PRINT().
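For the shouting handle, PRINT can simply uppercase its arguments and hand them to the real print, which applies $, and $\ as usual (a sketch):

```perl
package Shout;

# PRINT: everything printed to the tied handle comes out uppercased
# on the currently selected output handle.
sub PRINT {
    my $self = shift;
    print map uc, @_;
}
```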
This method will be triggered every time the tied handle is printed to
with the printf() function.
Beyond its self reference it also expects the format and list that was
passed to the printf function.
This method will be called when the handle is read from via the read
or sysread functions.
This method is called when the handle is read via <HANDLE> or readline HANDLE.
As per readline, in scalar context it should return
the next line, or undef for no more data. In list context it should
return all remaining lines, or an empty list for no more data. The strings
returned should include the input record separator $/
(see perlvar),
unless it is undef (which means "slurp" mode).
This method will be called when the getc function is called.
This method will be called when the eof function is called.
Starting with Perl 5.12, an additional integer parameter will be passed. It will be zero if eof is called without a parameter; 1 if eof is given a filehandle as a parameter, e.g. eof(FH); and 2 in the very special case that the tied filehandle is ARGV and eof is called with an empty parameter list, e.g. eof().
- sub EOF { not length $stringbuf }
This method will be called when the handle is closed via the close
function.
- sub CLOSE { print "CLOSE called.\n" }
As with the other types of ties, this method will be called when untie happens.
It may be appropriate to "auto CLOSE" when this occurs. See
The untie Gotcha below.
As with the other types of ties, this method will be called when the tied handle is about to be destroyed. This is useful for debugging and possibly cleaning up.
- sub DESTROY { print "</shout>\n" }
Here's how to use our little example:
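A self-contained sketch, with a minimal version of the Shout class bundled in so the usage at the bottom runs on its own:

```perl
use strict;
use warnings;

# Minimal shouting handle: whatever is printed to the tied handle
# comes out uppercased.
package Shout;
sub TIEHANDLE { my $class = shift; bless {}, $class }
sub PRINT     { my $self  = shift; print map uc, @_ }

package main;
tie *FOO, 'Shout';
print FOO "hello\n";
my ($left, $right) = (4, 6);
print FOO $left, " plus ", $right, " equals ", $left + $right, "\n";
```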
You can define for all tie types an UNTIE method that will be called at untie(). See The untie Gotcha below.
The untie Gotcha
If you intend making use of the object returned from either tie() or tied(), and if the tie's target class defines a destructor, there is a subtle gotcha you must guard against.
As setup, consider this (admittedly rather contrived) example of a tie; all it does is use a file to keep a log of the values assigned to a scalar.
- package Remember;
- use strict;
- use warnings;
- use IO::File;
- sub TIESCALAR {
- my $class = shift;
- my $filename = shift;
- my $handle = IO::File->new( "> $filename" )
- or die "Cannot open $filename: $!\n";
- print $handle "The Start\n";
- bless {FH => $handle, Value => 0}, $class;
- }
- sub FETCH {
- my $self = shift;
- return $self->{Value};
- }
- sub STORE {
- my $self = shift;
- my $value = shift;
- my $handle = $self->{FH};
- print $handle "$value\n";
- $self->{Value} = $value;
- }
- sub DESTROY {
- my $self = shift;
- my $handle = $self->{FH};
- print $handle "The End\n";
- close $handle;
- }
- 1;
Here is an example that makes use of this tie:
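The usage looks like this; the Remember class from the listing above is restated compactly (and a temporary file substituted for a fixed filename) so the snippet runs on its own:

```perl
use strict;
use warnings;
use File::Temp qw(tempfile);

# The Remember class from the listing above, restated compactly.
package Remember;
use IO::File;
sub TIESCALAR {
    my ($class, $filename) = @_;
    my $handle = IO::File->new("> $filename")
        or die "Cannot open $filename: $!\n";
    print $handle "The Start\n";
    bless { FH => $handle, Value => 0 }, $class;
}
sub FETCH { my $self = shift; return $self->{Value} }
sub STORE {
    my ($self, $value) = @_;
    my $handle = $self->{FH};
    print $handle "$value\n";
    $self->{Value} = $value;
}
sub DESTROY {
    my $self = shift;
    my $handle = $self->{FH};
    print $handle "The End\n";
    close $handle;
}

package main;
my ($tmpfh, $filename) = tempfile(UNLINK => 0);
close $tmpfh;

tie my $fred, 'Remember', $filename;
$fred = 1;
$fred = 4;
$fred = 5;
untie $fred;      # drops the only reference; DESTROY flushes the file

open my $in, '<', $filename or die "Cannot reopen $filename: $!";
our @lines = <$in>;
close $in;
unlink $filename;
print @lines;
```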
This is the output when it is executed:
- The Start
- 1
- 4
- 5
- The End
So far so good. Those of you who have been paying attention will have spotted that the tied object hasn't been used so far. So lets add an extra method to the Remember class to allow comments to be included in the file; say, something like this:
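A sketch of such a method:

```perl
package Remember;

# comment: write a line to the log through the same handle the tie
# methods use; callable only via the object returned by tie()/tied().
sub comment {
    my ($self, $text) = @_;
    my $handle = $self->{FH};
    print $handle $text, "\n";
}
```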
And here is the previous example modified to use the comment
method
(which requires the tied object):
When this code is executed there is no output. Here's why:
When a variable is tied, it is associated with the object which is the return value of the TIESCALAR, TIEARRAY, or TIEHASH function. This object normally has only one reference, namely, the implicit reference from the tied variable. When untie() is called, that reference is destroyed. Then, as in the first example above, the object's destructor (DESTROY) is called, which is normal for objects that have no more valid references; and thus the file is closed.
In the second example, however, we have stored another reference to the tied object in $x. That means that when untie() gets called there will still be a valid reference to the object in existence, so the destructor is not called at that time, and thus the file is not closed. The reason there is no output is because the file buffers have not been flushed to disk.
Now that you know what the problem is, what can you do to avoid it?
Prior to the introduction of the optional UNTIE method, the only way to spot the problem was the good old -w flag, which reports any instance where you call untie() while there are still valid references to the tied object. If the second script above had use warnings 'untie' near the top, or was run with the -w flag, Perl prints this warning message:
- untie attempted while 1 inner references still exist
To get the script to work properly and silence the warning make sure there are no valid references to the tied object before untie() is called:
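That is, drop the extra reference before untying. A self-contained sketch using a throwaway tie class (Noisy is hypothetical, standing in for Remember):

```perl
use strict;
use warnings 'untie';

# Noisy (hypothetical): the smallest possible tie class; the point is
# the reference juggling in main below.
package Noisy;
sub TIESCALAR { bless {}, shift }
sub FETCH     { 0 }
sub STORE     { }

package main;
our @warnings;
$SIG{__WARN__} = sub { push @warnings, @_ };

tie my $fred, 'Noisy';
my $x = tied $fred;   # extra reference to the underlying object

undef $x;             # drop the extra reference first...
untie $fred;          # ...so untie() has nothing to warn about
```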
Now that UNTIE exists the class designer can decide which parts of the
class functionality are really associated with untie and which with
the object being destroyed. What makes sense for a given class depends
on whether the inner references are being kept so that non-tie-related
methods can be called on the object. But in most cases it probably makes
sense to move the functionality that would have been in DESTROY to the UNTIE
method.
If the UNTIE method exists then the warning above does not occur. Instead the UNTIE method is passed the count of "extra" references and can issue its own warning if appropriate. For example, to replicate the behavior without UNTIE, this method can be used:
- sub UNTIE
- {
- my ($obj,$count) = @_;
- carp "untie attempted while $count inner references still exist" if $count;
- }
See DB_File or Config for some interesting tie() implementations. A good starting point for many tie() implementations is with one of the modules Tie::Scalar, Tie::Array, Tie::Hash, or Tie::Handle.
The bucket usage information provided by scalar(%hash) is not available. What this means is that using %tied_hash in boolean context doesn't work right (currently this always tests false, regardless of whether the hash is empty or has elements).
Localizing tied arrays or hashes does not work. After exiting the scope the arrays or the hashes are not restored.
Counting the number of entries in a hash via scalar(keys(%hash))
or scalar(values(%hash)) is inefficient since it needs to iterate
through all the entries with FIRSTKEY/NEXTKEY.
Tied hash/array slices cause multiple FETCH/STORE pairs, there are no tie methods for slice operations.
You cannot easily tie a multilevel data structure (such as a hash of hashes) to a dbm file. The first problem is that all but GDBM and Berkeley DB have size limitations, but beyond that, you also have problems with how references are to be represented on disk. One module that does attempt to address this need is DBM::Deep. Check your nearest CPAN site as described in perlmodlib for source code. Note that despite its name, DBM::Deep does not use dbm. Another earlier attempt at solving the problem is MLDBM, which is also available on the CPAN, but which has some fairly serious limitations.
Tied filehandles are still incomplete. sysopen(), truncate(), flock(), fcntl(), stat() and -X can't currently be trapped.
Tom Christiansen
TIEHANDLE by Sven Verdoolaege <skimo@dns.ufsia.ac.be> and Doug MacEachern <dougm@osf.org>
UNTIE by Nick Ing-Simmons <nick@ing-simmons.net>
SCALAR by Tassilo von Parseval <tassilo.von.parseval@rwth-aachen.de>
Tying Arrays by Casey West <casey@geeknest.com>
perltodo - Perl TO-DO List
We no longer install the Perl 5 to-do list as a manpage, as installing a snapshot that becomes increasingly out of date isn't useful to anyone. The current Perl 5 to-do list is maintained in the git repository, and can be viewed at http://perl5.git.perl.org/perl.git/blob/HEAD:/Porting/todo.pod
perltrap - Perl traps for the unwary
The biggest trap of all is forgetting to use warnings
or use the -w
switch; see perllexwarn and perlrun. The second biggest trap is not
making your entire program runnable under use strict
. The third biggest
trap is not reading the list of changes in this version of Perl; see
perldelta.
Accustomed awk users should take special note of the following:
A Perl program executes only once, not once for each input line. You can
do an implicit loop with -n
or -p
.
The English module, loaded via
- use English;
allows you to refer to special variables (like $/
) with names (like
$RS), as though they were in awk; see perlvar for details.
Semicolons are required after all simple statements in Perl (except at the end of a block). Newline is not a statement delimiter.
Curly brackets are required on "if"s and "while"s.
Variables begin with "$", "@" or "%" in Perl.
Arrays index from 0. Likewise string positions in substr() and index().
You have to decide whether your array has numeric or string indices.
Hash values do not spring into existence upon mere reference.
You have to decide whether you want to use string or numeric comparisons.
Reading an input line does not split it for you. You get to split it to an array yourself. And the split() operator has different arguments than awk's.
The current input line is normally in $_, not $0. It generally does not have the newline stripped. ($0 is the name of the program executed.) See perlvar.
$<digit> does not refer to fields--it refers to substrings matched by the last match pattern.
The print() statement does not add field and record separators unless
you set $,
and $\
. You can set $OFS and $ORS if you're using
the English module.
You must open your files before you print to them.
The range operator is "..", not comma. The comma operator works as in C.
The match operator is "=~", not "~". ("~" is the one's complement operator, as in C.)
The exponentiation operator is "**", not "^". "^" is the XOR operator, as in C. (You know, one could get the feeling that awk is basically incompatible with C.)
The concatenation operator is ".", not the null string. (Using the
null string would render /pat/ /pat/
unparsable, because the third slash
would be interpreted as a division operator--the tokenizer is in fact
slightly context sensitive for operators like "/", "?", and ">".
And in fact, "." itself can be the beginning of a number.)
The following variables work differently:
- Awk Perl
- ARGC scalar @ARGV (compare with $#ARGV)
- ARGV[0] $0
- FILENAME $ARGV
- FNR $. - something
- FS (whatever you like)
- NF $#Fld, or some such
- NR $.
- OFMT $#
- OFS $,
- ORS $\
- RLENGTH length($&)
- RS $/
- RSTART length($`)
- SUBSEP $;
You cannot set $RS to a pattern, only a string.
When in doubt, run the awk construct through a2p and see what it gives you.
Cerebral C and C++ programmers should take note of the following:
Curly brackets are required on "if"s and "while"s.
You must use elsif rather than else if.
The break
and continue keywords from C become in Perl last
and next, respectively. Unlike in C, these do not work within a
do { } while
construct. See Loop Control in perlsyn.
The switch statement is called given/when and is only available in perl 5.10 or newer. See Switch Statements in perlsyn.
Variables begin with "$", "@" or "%" in Perl.
Comments begin with "#", not "/*" or "//". Perl may interpret C/C++ comments as division operators, unterminated regular expressions or the defined-or operator.
You can't take the address of anything, although a similar operator in Perl is the backslash, which creates a reference.
ARGV must be capitalized. $ARGV[0] is C's argv[1], and argv[0] ends up in $0.
System calls such as link(), unlink(), rename(), etc. return nonzero for success, not 0. (system(), however, returns zero for success.)
Signal handlers deal with signal names, not numbers. Use kill -l to find their names on your system.
Seasoned sed programmers should take note of the following:
A Perl program executes only once, not once for each input line. You can do an implicit loop with -n or -p.
Backreferences in substitutions use "$" rather than "\".
The pattern matching metacharacters "(", ")", and "|" do not have backslashes in front.
The range operator is "...", rather than comma.
Sharp shell programmers should take note of the following:
The backtick operator does variable interpolation without regard to the presence of single quotes in the command.
The backtick operator does no translation of the return value, unlike csh.
Shells (especially csh) do several levels of substitution on each command line. Perl does substitution in only certain constructs such as double quotes, backticks, angle brackets, and search patterns.
Shells interpret scripts a little bit at a time. Perl compiles the entire program before executing it (except for BEGIN blocks, which execute at compile time).
The arguments are available via @ARGV, not $1, $2, etc.
The environment is not automatically made available as separate scalar variables.
The shell's test uses "=", "!=", "<" etc for string comparisons and "-eq", "-ne", "-lt" etc for numeric comparisons. This is the reverse of Perl, which uses eq, ne, lt for string comparisons, and ==, !=, < etc for numeric comparisons.
Practicing Perl Programmers should take note of the following:
Remember that many operations behave differently in a list context than they do in a scalar one. See perldata for details.
Avoid barewords if you can, especially all lowercase ones. You can't tell by just looking at it whether a bareword is a function or a string. By using quotes on strings and parentheses on function calls, you won't ever get them confused.
You cannot discern from mere inspection which builtins are unary operators (like chop() and chdir()) and which are list operators (like print() and unlink()). (Unless prototyped, user-defined subroutines can only be list operators, never unary ones.) See perlop and perlsub.
People have a hard time remembering that some functions default to $_, or @ARGV, or whatever, but that others which you might expect to do not.
The <FH> construct is not the name of the filehandle, it is a readline operation on that handle. The data read is assigned to $_ only if the file read is the sole condition in a while loop:
- while (<FH>) { }
- while (defined($_ = <FH>)) { }
- <FH>; # data discarded!
Remember not to use = when you need =~; these two constructs are quite different:
- $x = /foo/;
- $x =~ /foo/;
The do {} construct isn't a real loop that you can use loop control on.
Use my() for local variables whenever you can get away with it (but see perlform for where you can't).
Using local() actually gives a local value to a global
variable, which leaves you open to unforeseen side-effects
of dynamic scoping.
If you localize an exported variable in a module, its exported value will not change. The local name becomes an alias to a new value but the external name is still an alias for the original.
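A sketch of the my()/local() difference (the variable and sub names here are illustrative only):

```perl
our $setting = "global";

sub peek { return $setting }      # looks up the package variable dynamically

sub with_local {
    local $setting = "temporary"; # temporarily replaces the global value
    return peek();                # sees "temporary" -- dynamic scoping
}

sub with_my {
    my $setting = "lexical";      # brand-new lexical, invisible to peek()
    return peek();                # still sees "global"
}
```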
As always, if any of these are ever officially declared as bugs, they'll be fixed and removed.
perltru64 - Perl version 5 on Tru64 (formerly known as Digital UNIX formerly known as DEC OSF/1) systems
This document describes various features of HP's (formerly Compaq's, formerly Digital's) Unix operating system (Tru64) that will affect how Perl version 5 (hereafter just Perl) is configured, compiled and/or runs.
The recommended compiler to use on Tru64 is the native C compiler. The native compiler produces much faster code (the speed difference is noticeable: several dozen percent) and also more correct code: if you are considering using the GNU C compiler, you should use at least release 2.95.3, since all older gcc releases are known to produce broken code when compiling Perl. One manifestation of this brokenness is the lib/sdbm test dumping core; another is many of the op/regexp and op/pat, or ext/Storable, tests dumping core (the exact pattern of failures depends on the gcc release and optimization flags).
gcc 3.2.1 is known to work okay with Perl 5.8.0. However, when optimizing toke.c, gcc likes to have a lot of memory; 256 megabytes seems to be enough. The default setting of the process data section in Tru64 should be one gigabyte, but some sites/setups might have lowered that. The configuration process of Perl checks for too-low process limits, lowers the optimization for toke.c if necessary, and also gives advice on how to raise the process limits.
Also, Configure might abort with
- Build a threading Perl? [n]
- Configure[2437]: Syntax error at line 1 : 'config.sh' is not expected.
This indicates that Configure is being run with a broken Korn shell (even though you think you are using a Bourne shell by using "sh Configure" or "./Configure"). The Korn shell bug has been reported to Compaq as of February 1999 but in the meanwhile, the reason ksh is being used is that you have the environment variable BIN_SH set to 'xpg4'. This causes /bin/sh to delegate its duties to /bin/posix/sh (a ksh). Unset the environment variable and rerun Configure.
In Tru64 Perl is automatically able to use large files, that is, files larger than 2 gigabytes; there is no need to use the Configure -Duselargefiles option as described in INSTALL (though using the option is harmless).
If you want to use threads, you should primarily use the Perl 5.8.0 threads model by running Configure with -Duseithreads.
Perl threading works only in Tru64 4.0 and newer releases; older releases like 3.2 probably won't work properly with threads.
In Tru64 V5 (at least V5.1A, V5.1B) you cannot build threaded Perl with gcc because the system header <pthread.h> explicitly checks for supported C compilers, gcc (at least 3.2.2) not being one of them. But the system C compiler should work just fine.
You cannot Configure Perl to use long doubles unless you have at least Tru64 V5.0; the long double support simply wasn't functional enough before that. Perl's Configure will override attempts to use the long doubles (you can notice this by Configure finding out that the modfl() function does not work as it should).
At the time of this writing (June 2002), there is a known bug in the Tru64 libc printing of long doubles when not using "e" notation. The values are correct and usable, but you only get a limited number of digits displayed unless you force the issue by using printf "%.33e", $num or the like. For Tru64 versions V5.0A through V5.1A, a patch is expected sometime after perl 5.8.0 is released. If your libc has not yet been patched, you'll get a warning from Configure when selecting long doubles.
The DB_File tests (db-btree.t, db-hash.t, db-recno.t) may fail if you have installed a newer version of Berkeley DB into the system and the -I and -L compiler and linker flags introduce version conflicts with the DB 1.85 headers and libraries that came with Tru64. For example, mixing a DB v2 library with the DB v1 headers is a bad idea. Watch out for the Configure options -Dlocincpth and -Dloclibpth, and check your /usr/local/include and /usr/local/lib since they are included by default.
The second option is to explicitly instruct Configure to detect the newer Berkeley DB installation, by supplying the right directories with -Dlocincpth=/some/include and -Dloclibpth=/some/lib, and before running "make test" setting your LD_LIBRARY_PATH to /some/lib.
The third option is to work around the problem by disabling DB_File completely when building Perl by specifying -Ui_db to Configure, and then using the BerkeleyDB module from CPAN instead of DB_File. The BerkeleyDB module works with Berkeley DB versions 2.* or greater.
Berkeley DB 4.1.25 has been tested with Tru64 V5.1A and found to work. The latest Berkeley DB can be found at http://www.sleepycat.com.
In Tru64 Perl's integers are automatically 64 bits wide; there is no need to use the Configure -Duse64bitint option as described in INSTALL. Similarly, there is no need for -Duse64bitall, since pointers are automatically 64 bits wide.
When compiling Perl in Tru64 you may (depending on the compiler release) see two warnings like this
- cc: Warning: numeric.c, line 104: In this statement, floating-point overflow occurs in evaluating the expression "1.8e308". (floatoverfl)
- return HUGE_VAL;
- -----------^
and when compiling the POSIX extension
- cc: Warning: const-c.inc, line 2007: In this statement, floating-point overflow occurs in evaluating the expression "1.8e308". (floatoverfl)
- return HUGE_VAL;
- -------------------^
The exact line numbers may vary between Perl releases. The warnings are benign and can be ignored: in later C compiler releases the warnings should be gone.
When the file pp_sys.c is being compiled you may (depending on the operating system release) see an additional compiler flag being used: -DNO_EFF_ONLY_OK. This is normal and refers to a feature that is relevant only if you use the filetest pragma. In older releases of the operating system the feature was broken, and NO_EFF_ONLY_OK instructs Perl not to use the feature.
During "make test" the comp/cpp test will be skipped, because on Tru64 it cannot be run before Perl has been installed. The test refers to the use of the -P option of Perl.
The ext/ODBM_File/odbm test is known to fail with static builds (Configure -Uusedl) due to a known bug in Tru64's static libdbm library. The good news is that you very probably never need to use the ODBM_File extension, since the more advanced NDBM_File works fine, not to mention the even more advanced DB_File.
If you get an error like
- Can't load '.../OSF1/lib/perl5/5.8.0/alpha-dec_osf/auto/IO/IO.so' for module IO: Unresolved symbol in .../lib/perl5/5.8.0/alpha-dec_osf/auto/IO/IO.so: sockatmark at .../lib/perl5/5.8.0/alpha-dec_osf/XSLoader.pm line 75.
you need to either recompile your Perl on Tru64 4.0D or upgrade your Tru64 4.0D to at least 4.0F: the sockatmark() system call was added in Tru64 4.0F, and the IO extension refers to that symbol.
Jarkko Hietaniemi <jhi@iki.fi>
perlunicode - Unicode support in Perl
Unicode support is an extensive requirement. While Perl does not implement the Unicode standard or the accompanying technical reports from cover to cover, Perl does support many Unicode features.
People who want to learn to use Unicode in Perl should probably read the Perl Unicode tutorial, perlunitut, and perluniintro before reading this reference document.
Also, the use of Unicode may present security issues that aren't obvious. Read Unicode Security Considerations.
In order to preserve backward compatibility, Perl does not turn on full internal Unicode support unless the pragma use feature 'unicode_strings' is specified. (This is automatically selected if you use 5.012 or higher.) Failure to do this can trigger unexpected surprises. See The Unicode Bug below.
This pragma doesn't affect I/O. Nor does it change the internal representation of strings, only their interpretation. There are still several places where Unicode isn't fully supported, such as in filenames.
Perl knows when a filehandle uses Perl's internal Unicode encodings (UTF-8, or UTF-EBCDIC if in EBCDIC) if the filehandle is opened with the ":encoding(utf8)" layer. Other encodings can be converted to Perl's encoding on input or from Perl's encoding on output by use of the ":encoding(...)" layer. See open.
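A minimal sketch of the :encoding layer, using in-memory filehandles so it is self-contained:

```perl
# Write characters out through a UTF-8 encoding layer...
my $bytes;
open my $out, '>:encoding(UTF-8)', \$bytes or die $!;
print {$out} "caf\x{E9}\n";       # U+00E9 LATIN SMALL LETTER E WITH ACUTE
close $out;

# ...and read them back in as characters.
open my $in, '<:encoding(UTF-8)', \$bytes or die $!;
my $line = <$in>;
close $in;
# length($line) is 5 characters, while length($bytes) is 6 bytes,
# because the e-acute occupies two bytes in UTF-8.
```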
To indicate that Perl source itself is in UTF-8, use use utf8;.
use utf8 still needed to enable UTF-8/UTF-EBCDIC in scripts
As a compatibility measure, the use utf8 pragma must be explicitly included to enable recognition of UTF-8 in the Perl scripts themselves (in string or regular expression literals, or in identifier names) on ASCII-based machines, or to recognize UTF-EBCDIC on EBCDIC-based machines. These are the only times when an explicit use utf8 is needed. See utf8.
If a Perl script begins with the Unicode BOM (UTF-16LE, UTF-16BE, or UTF-8), or if the script looks like non-BOM-marked UTF-16 of either endianness, Perl will correctly read in the script as Unicode. (BOMless UTF-8 cannot be effectively recognized or differentiated from ISO 8859-1 or other eight-bit encodings.)
use encoding needed to upgrade non-Latin-1 byte strings
By default, there is a fundamental asymmetry in Perl's Unicode model: implicit upgrading from byte strings to Unicode strings assumes that they were encoded in ISO 8859-1 (Latin-1), but Unicode strings are downgraded with UTF-8 encoding. This happens because the first 256 code points in Unicode happen to agree with Latin-1.
See Byte and Character Semantics for more details.
Perl uses logically-wide characters to represent strings internally. Starting in Perl 5.14, Perl-level operations work with characters rather than bytes within the scope of a use feature 'unicode_strings' (or equivalently use 5.012 or higher). (This is not true if bytes have been explicitly requested by use bytes, nor necessarily true for interactions with the platform's operating system.)
For earlier Perls, and when unicode_strings
is not in effect, Perl
provides a fairly safe environment that can handle both types of
semantics in programs. For operations where Perl can unambiguously
decide that the input data are characters, Perl switches to character
semantics. For operations where this determination cannot be made
without additional information from the user, Perl decides in favor of
compatibility and chooses to use byte semantics.
When use locale (but not use locale ':not_characters') is in effect, Perl uses the semantics associated with the current locale. (use locale overrides use feature 'unicode_strings' in the same scope, while use locale ':not_characters' effectively also selects use feature 'unicode_strings' in its scope; see perllocale.)
Otherwise, Perl uses the platform's native byte semantics for characters whose code points are less than 256, and Unicode semantics for those greater than 255. That means that non-ASCII characters are undefined except for their ordinal numbers. This means that none have case (upper and lower), nor are any a member of character classes, like [:alpha:] or \w. (But all do belong to the \W class or the Perl regular expression extension [:^alpha:].)
This behavior preserves compatibility with earlier versions of Perl, which allowed byte semantics in Perl operations only if none of the program's inputs were marked as being a source of Unicode character data. Such data may come from filehandles, from calls to external programs, from information provided by the system (such as %ENV), or from literals and constants in the source text.
The utf8 pragma is primarily a compatibility device that enables recognition of UTF-(8|EBCDIC) in literals encountered by the parser. Note that this pragma is only required while Perl defaults to byte semantics; when character semantics become the default, this pragma may become a no-op. See utf8.
If strings operating under byte semantics and strings with Unicode character data are concatenated, the new string will have character semantics. This can cause surprises: See BUGS, below. You can choose to be warned when this happens. See encoding::warnings.
Under character semantics, many operations that formerly operated on bytes now operate on characters. A character in Perl is logically just a number ranging from 0 to 2**31 or so. Larger characters may encode into longer sequences of bytes internally, but this internal detail is mostly hidden for Perl code. See perluniintro for more.
Character semantics have the following effects:
Strings--including hash keys--and regular expression patterns may contain characters that have an ordinal value larger than 255.
If you use a Unicode editor to edit your program, Unicode characters may occur directly within the literal strings in UTF-8 encoding, or UTF-16. (The former requires a BOM or use utf8, the latter requires a BOM.)
Unicode characters can also be added to a string by using the \N{U+...} notation. The Unicode code for the desired character, in hexadecimal, should be placed in the braces, after the U. For instance, a smiley face is \N{U+263A}.
Alternatively, you can use the \x{...} notation for characters 0x100 and above. For characters below 0x100 you may get byte semantics instead of character semantics; see The Unicode Bug. On EBCDIC machines there is the additional problem that the value for such characters gives the EBCDIC character rather than the Unicode one, so it is more portable to use \N{U+...} instead.
Additionally, you can use the \N{...} notation and put the official Unicode character name within the braces, such as \N{WHITE SMILING FACE}. This automatically loads the charnames module with the :full and :short options. If you prefer different options for this module, you can instead, before the \N{...}, explicitly load it with your desired options; for example,
- use charnames ':loose';
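A small sketch (the explicit :full import matches the behavior described above):

```perl
use charnames ':full';

my $smiley = "\N{WHITE SMILING FACE}";   # the same character as "\N{U+263A}"
# ord($smiley) gives the code point back, 0x263A
```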
If an appropriate encoding is specified, identifiers within the Perl script may contain Unicode alphanumeric characters, including ideographs. Perl does not currently attempt to canonicalize variable names.
Regular expressions match characters instead of bytes. "." matches a character instead of a byte.
Bracketed character classes in regular expressions match characters instead of bytes and match against the character properties specified in the Unicode properties database. \w can be used to match a Japanese ideograph, for instance.
Named Unicode properties, scripts, and block ranges may be used (like bracketed character classes) by using the \p{} "matches property" construct and the \P{} negation, "doesn't match property".
See Unicode Character Properties for more details.
You can define your own character properties and use them in the regular expression with the \p{} or \P{} construct.
See User-Defined Character Properties for more details.
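A sketch of a user-defined property: a subroutine whose name begins with In or Is and which returns the code point ranges, one tab-separated hexadecimal pair per line (InKana covering the two Japanese kana blocks is the classic example):

```perl
# Matches both the Hiragana (U+3040..U+309F) and Katakana
# (U+30A0..U+30FF) blocks.
sub InKana {
    return "3040\t309F\n" .
           "30A0\t30FF\n";
}

my $is_kana = ("\x{30A2}" =~ /\p{InKana}/);   # KATAKANA LETTER A: true
```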
The special pattern \X matches a logical character, an "extended grapheme cluster" in Standardese. In Unicode, what appears to the user to be a single character, for example an accented G, may in fact be composed of a sequence of characters, in this case a G followed by an accent character. \X will match the entire sequence.
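A brief sketch:

```perl
# "G" followed by U+0301 COMBINING ACUTE ACCENT: two code points that
# display as one accented character.
my $str      = "G\x{301}";
my @clusters = $str =~ /(\X)/g;
# @clusters has one element, and that element contains both code points
```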
The tr/// operator translates characters instead of bytes. Note
that the tr///CU functionality has been removed. For similar
functionality see pack('U0', ...) and pack('C0', ...).
Case translation operators use the Unicode case translation tables when character input is provided. Note that uc(), or \U in interpolated strings, translates to uppercase, while ucfirst, or \u in interpolated strings, translates to titlecase in languages that make the distinction (which is equivalent to uppercase in languages without the distinction).
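The distinction shows up with characters such as U+01C6 (the small "dz" digraph with caron), which has distinct uppercase and titlecase forms:

```perl
my $dz    = "\x{01C6}";     # LATIN SMALL LETTER DZ WITH CARON
my $upper = uc $dz;          # U+01C4, the fully uppercase "DZ" form
my $title = ucfirst $dz;     # U+01C5, the titlecase "Dz" form
```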
Most operators that deal with positions or lengths in a string will
automatically switch to using character positions, including
chop(), chomp(), substr(), pos(), index(), rindex(),
sprintf(), write(), and length(). An operator that
specifically does not switch is vec(). Operators that really don't
care include operators that treat strings as a bucket of bits such as
sort(), and operators dealing with filenames.
The pack()/unpack() letter C does not change, since it is often used for byte-oriented formats. Again, think char in the C language. There is a new U specifier that converts between Unicode characters and code points. There is also a W specifier that is the equivalent of chr/ord and properly handles character values even if they are above 255.
The chr() and ord() functions work on characters, similar to
pack("W") and unpack("W"), not pack("C") and
unpack("C"). pack("C") and unpack("C") are methods for
emulating byte-oriented chr() and ord() on Unicode strings.
While these methods reveal the internal encoding of Unicode strings,
that is not something one normally needs to care about at all.
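A short sketch of the difference:

```perl
my $smiley = chr(0x263A);          # one character, code point U+263A
my $cp     = ord $smiley;          # ord() gives the full code point back

# pack "W" handles full character values; pack "C" is byte-oriented
# and keeps only the low 8 bits of the value:
my $w = pack("W", 0x263A);         # the same one character, U+263A
my $c = pack("C", 0x263A);         # one byte: 0x263A & 0xFF == 0x3A
```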
The bit string operators, & | ^ ~, can operate on character data. However, for backward compatibility, such as when using bit string operations when characters are all less than 256 in ordinal value, one should not use ~ (the bit complement) with characters of both values less than 256 and values greater than 256. Most importantly, DeMorgan's laws (~($x|$y) eq ~$x&~$y and ~($x&$y) eq ~$x|~$y) will not hold. The reason for this mathematical faux pas is that the complement cannot return both the 8-bit (byte-wide) bit complement and the full character-wide bit complement.
There is a CPAN module, Unicode::Casing, which allows you to define your own mappings to be used in lc(), lcfirst(), uc(), ucfirst(), and fc() (or their double-quoted string inlined versions such as \U).
(Prior to Perl 5.16, this functionality was partially provided
in the Perl core, but suffered from a number of insurmountable
drawbacks, so the CPAN module was written instead.)
(The only time that Perl considers a sequence of individual code points as a single logical character is in the \X construct, already mentioned above. Therefore "character" in this discussion means a single Unicode code point.)
Very nearly all Unicode character properties are accessible through regular expressions by using the \p{} "matches property" construct and the \P{} "doesn't match property" construct for its negation. For instance, \p{Uppercase} matches any single character with the Unicode "Uppercase" property, while \p{L} matches any character with a General_Category of "L" (letter). Brackets are not required for single-letter property names, so \p{L} is equivalent to \pL.
More formally, \p{Uppercase} matches any single character whose Unicode Uppercase property value is True, and \P{Uppercase} matches any character whose Uppercase property value is False; they could have been written as \p{Uppercase=True} and \p{Uppercase=False}, respectively.
This formality is needed when properties are not binary; that is, if they can take on more values than just True and False. For example, the Bidi_Class property (see Bidirectional Character Types below) can take on several different values, such as Left, Right, Whitespace, and others. To match these, one needs to specify both the property name (Bidi_Class) AND the value being matched against (Left, Right, etc.). This is done, as in the examples above, by having the two components separated by an equal sign (or interchangeably, a colon), like \p{Bidi_Class: Left}.
All Unicode-defined character properties may be written in these compound forms of \p{property=value} or \p{property:value}, but Perl provides some additional properties that are written only in the single form, as well as single-form shortcuts for all binary properties and certain others described below, in which you may omit the property name and the equals or colon separator.
Most Unicode character properties have at least two synonyms (or aliases if you prefer): a short one that is easier to type and a longer one that is more descriptive and hence easier to understand. Thus the "L" and "Letter" properties above are equivalent and can be used interchangeably. Likewise, "Upper" is a synonym for "Uppercase", and we could have written \p{Uppercase} equivalently as \p{Upper}. Also, there are typically various synonyms for the values a property can take. For binary properties, "True" has 3 synonyms: "T", "Yes", and "Y"; and "False" correspondingly has "F", "No", and "N". But be careful. A short form of a value for one property may not mean the same thing as the same short form for another. Thus, for the General_Category property, "L" means "Letter", but for the Bidi_Class property, "L" means "Left". A complete list of properties and synonyms is in perluniprops.
Upper/lower case differences in property names and values are irrelevant; thus \p{Upper} means the same thing as \p{upper} or even \p{UpPeR}. Similarly, you can add or subtract underscores anywhere in the middle of a word, so that these are also equivalent to \p{U_p_p_e_r}. And white space is irrelevant adjacent to non-word characters, such as the braces and the equals or colon separators, so \p{ Upper } and \p{ Upper_case : Y } are equivalent to these as well. In fact, white space and even hyphens can usually be added or deleted anywhere. So even \p{ Up-per case = Yes} is equivalent. All this is called "loose matching" by Unicode. The few places where stricter matching is used are in the middle of numbers, and in the Perl extension properties that begin or end with an underscore. Stricter matching cares about white space (except adjacent to non-word characters), hyphens, and non-interior underscores.
You can also use negation in both \p{} and \P{} by introducing a caret (^) between the first brace and the property name: \p{^Tamil} is equal to \P{Tamil}.
Almost all properties are immune to case-insensitive matching. That is, adding a /i regular expression modifier does not change what they match. There are two sets that are affected. The first set is Uppercase_Letter, Lowercase_Letter, and Titlecase_Letter, all of which match Cased_Letter under /i matching. The second set is Uppercase, Lowercase, and Titlecase, all of which match Cased under /i matching. This set also includes its subsets PosixUpper and PosixLower, both of which under /i matching match PosixAlpha. (The difference between these sets is that some things, such as Roman numerals, come in both upper and lower case, so they are Cased but aren't considered letters, so they aren't Cased_Letters.)
The result is undefined if you try to match a non-Unicode code point (that is, one above 0x10FFFF) against a Unicode property. Currently, a warning is raised, and the match will fail. In some cases, this is counterintuitive, as matching such a code point against a property and against its negation both fail.
Every Unicode character is assigned a general category, which is the "most usual categorization of a character" (from http://www.unicode.org/reports/tr44).
The compound way of writing these is like \p{General_Category=Number} (short: \p{gc:n}). But Perl furnishes shortcuts in which everything up through the equal or colon separator is omitted. So you can instead just write \pN.
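A brief sketch showing that the single-letter, long, and compound forms all match the same characters:

```perl
# These patterns are interchangeable ways to match a Number character:
"5" =~ /\pN/;                          # matches
"5" =~ /\p{N}/;                        # matches
"5" =~ /\p{Number}/;                   # matches
"5" =~ /\p{General_Category=Number}/;  # matches
"x" =~ /\pN/;                          # does not match
```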
Here are the short and long forms of the General Category properties:
- Short Long
- L Letter
- LC, L& Cased_Letter (that is: [\p{Ll}\p{Lu}\p{Lt}])
- Lu Uppercase_Letter
- Ll Lowercase_Letter
- Lt Titlecase_Letter
- Lm Modifier_Letter
- Lo Other_Letter
- M Mark
- Mn Nonspacing_Mark
- Mc Spacing_Mark
- Me Enclosing_Mark
- N Number
- Nd Decimal_Number (also Digit)
- Nl Letter_Number
- No Other_Number
- P Punctuation (also Punct)
- Pc Connector_Punctuation
- Pd Dash_Punctuation
- Ps Open_Punctuation
- Pe Close_Punctuation
- Pi Initial_Punctuation
- (may behave like Ps or Pe depending on usage)
- Pf Final_Punctuation
- (may behave like Ps or Pe depending on usage)
- Po Other_Punctuation
- S Symbol
- Sm Math_Symbol
- Sc Currency_Symbol
- Sk Modifier_Symbol
- So Other_Symbol
- Z Separator
- Zs Space_Separator
- Zl Line_Separator
- Zp Paragraph_Separator
- C Other
- Cc Control (also Cntrl)
- Cf Format
- Cs Surrogate
- Co Private_Use
- Cn Unassigned
Single-letter properties match all characters in any of the
two-letter sub-properties starting with the same letter.
LC and L& are special: both are aliases for the set consisting of everything matched by Ll, Lu, and Lt.
Because scripts differ in their directionality (Hebrew and Arabic are written right to left, for example) Unicode supplies these properties in the Bidi_Class class:
- Property Meaning
- L Left-to-Right
- LRE Left-to-Right Embedding
- LRO Left-to-Right Override
- R Right-to-Left
- AL Arabic Letter
- RLE Right-to-Left Embedding
- RLO Right-to-Left Override
- PDF Pop Directional Format
- EN European Number
- ES European Separator
- ET European Terminator
- AN Arabic Number
- CS Common Separator
- NSM Non-Spacing Mark
- BN Boundary Neutral
- B Paragraph Separator
- S Segment Separator
- WS Whitespace
- ON Other Neutrals
This property is always written in the compound form. For example, \p{Bidi_Class:R} matches characters that are normally written right to left.
The world's languages are written in many different scripts. This sentence (unless you're reading it in translation) is written in Latin, while Russian is written in Cyrillic, and Greek is written in, well, Greek; Japanese mainly in Hiragana or Katakana. There are many more.
The Unicode Script and Script_Extensions properties give the script a given character is in. Either property can be specified with the compound form like \p{Script=Hebrew} (short: \p{sc=hebr}) or \p{Script_Extensions=Javanese} (short: \p{scx=java}).
In addition, Perl furnishes shortcuts for all Script property names. You can omit everything up through the equals (or colon) and simply write \p{Latin} or \P{Cyrillic}. (This is not true for Script_Extensions, which is required to be written in the compound form.)
The difference between these two properties involves characters that are used in multiple scripts. For example, the digits '0' through '9' are used in many parts of the world. These are placed in a script named Common. Other characters are used in just a few scripts. For example, the "KATAKANA-HIRAGANA DOUBLE HYPHEN" is used in both Japanese scripts, Katakana and Hiragana, but nowhere else. The Script property places all characters that are used in multiple scripts in the Common script, while the Script_Extensions property places those that are used in only a few scripts into each of those scripts, while still using Common for those used in many scripts. Thus both these match:
- "0" =~ /\p{sc=Common}/ # Matches
- "0" =~ /\p{scx=Common}/ # Matches
and only the first of these matches:
- "\N{KATAKANA-HIRAGANA DOUBLE HYPHEN}" =~ /\p{sc=Common}/ # Matches
- "\N{KATAKANA-HIRAGANA DOUBLE HYPHEN}" =~ /\p{scx=Common}/ # No match
And only the last two of these match:
- "\N{KATAKANA-HIRAGANA DOUBLE HYPHEN}" =~ /\p{sc=Hiragana}/ # No match
- "\N{KATAKANA-HIRAGANA DOUBLE HYPHEN}" =~ /\p{sc=Katakana}/ # No match
- "\N{KATAKANA-HIRAGANA DOUBLE HYPHEN}" =~ /\p{scx=Hiragana}/ # Matches
- "\N{KATAKANA-HIRAGANA DOUBLE HYPHEN}" =~ /\p{scx=Katakana}/ # Matches
Script_Extensions is thus an improved Script, in which there are fewer characters in the Common script, and correspondingly more in other scripts. It is new in Unicode version 6.0, and its data are likely to change significantly in later releases, as things get sorted out.
(Actually, besides Common, the Inherited script contains characters that are used in multiple scripts. These are modifier characters which modify other characters, and inherit the script value of the controlling character. Some of these are used in many scripts, and so go into Inherited in both Script and Script_Extensions. Others are used in just a few scripts, so are in Inherited in Script, but not in Script_Extensions.)
It is worth stressing that there are several different sets of digits in Unicode that are equivalent to 0-9 and are matchable by \d in a regular expression. If they are used in a single language only, they are in that language's Script and Script_Extensions. If they are used in more than one script, they will be in sc=Common, but only if they are used in many scripts should they be in scx=Common.
A complete list of scripts and their shortcuts is in perluniprops.
For backward compatibility (with Perl 5.6), all properties mentioned so far may have Is or Is_ prepended to their name, so \P{Is_Lu}, for example, is equal to \P{Lu}, and \p{IsScript:Arabic} is equal to \p{Arabic}.
In addition to scripts, Unicode also defines blocks of
characters. The difference between scripts and blocks is that the
concept of scripts is closer to natural languages, while the concept
of blocks is more of an artificial grouping based on groups of Unicode
characters with consecutive ordinal values. For example, the "Basic Latin"
block is all characters whose ordinals are between 0 and 127, inclusive; in
other words, the ASCII characters. The "Latin" script contains some letters
from this as well as several other blocks, like "Latin-1 Supplement",
"Latin Extended-A", etc., but it does not contain all the characters from
those blocks. It does not, for example, contain the digits 0-9, because
those digits are shared across many scripts, and hence are in the Common script.
For more about scripts versus blocks, see UAX#24 "Unicode Script Property": http://www.unicode.org/reports/tr24
The Script or Script_Extensions properties are likely to be the ones you want to use when processing natural language; the Block property may occasionally be useful in working with the nuts and bolts of Unicode.
Block names are matched in the compound form, like \p{Block: Arrows} or \p{Blk=Hebrew}. Unlike most other properties, only a few block names have a Unicode-defined short name. But Perl does provide a (slight) shortcut: you can say, for example, \p{In_Arrows} or \p{In_Hebrew}. For backwards compatibility, the In prefix may be omitted if there is no naming conflict with a script or any other property, and you can even use an Is prefix instead in those cases. But it is not a good idea to do this, for a couple of reasons:
It is confusing. There are many naming conflicts, and you may forget some.
For example, \p{Hebrew} means the script Hebrew, and NOT the block Hebrew. But would you remember that 6 months from now?
It is unstable. A new version of Unicode may pre-empt the current meaning by
creating a property with the same name. There was a time in very early Unicode
releases when \p{Hebrew} would have matched the block Hebrew; now it doesn't.
Some people prefer to always use \p{Block: foo} and \p{Script: bar} instead of the shortcuts, whether for clarity, because they can't remember the
difference between 'In' and 'Is' anyway, or they aren't confident that those who
eventually will read their code will know that difference.
A complete list of blocks and their shortcuts is in perluniprops.
There are many more properties than the very basic ones described here. A complete list is in perluniprops.
Unicode defines all its properties in the compound form, so all single-form properties are Perl extensions. Most of these are just synonyms for the Unicode ones, but some are genuine extensions, including several that are in the compound form. And quite a few of these are actually recommended by Unicode (in http://www.unicode.org/reports/tr18).
This section gives some details on all extensions that aren't just synonyms for compound-form Unicode properties (for those properties, you'll have to refer to the Unicode Standard).
\p{All}
This matches any of the 1_114_112 Unicode code points. It is a synonym for \p{Any}.
\p{Alnum}
This matches any \p{Alphabetic} or \p{Decimal_Number} character.
\p{Any}
This matches any of the 1_114_112 Unicode code points. It is a synonym for \p{All}.
\p{ASCII}
This matches any of the 128 characters in the US-ASCII character set, which is a subset of Unicode.
\p{Assigned}
This matches any assigned code point; that is, any code point whose general category is not Unassigned (or equivalently, not Cn).
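The same assigned-versus-unassigned distinction can be cross-checked outside Perl. This sketch uses Python's unicodedata module as a rough analogue of \p{Assigned} (it is an illustration of the concept, not a Perl API):

```python
import unicodedata

def is_assigned(cp):
    # Rough analogue of \p{Assigned}: any General Category except Cn.
    return unicodedata.category(chr(cp)) != "Cn"

print(is_assigned(0x0041))  # LATIN CAPITAL LETTER A: assigned
print(is_assigned(0xFDD0))  # a permanently unassigned noncharacter
```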
\p{Blank}
This is the same as \h and \p{HorizSpace}: a character that changes the spacing horizontally.
\p{Decomposition_Type: Non_Canonical} (Short: \p{Dt=NonCanon})
Matches a character that has a non-canonical decomposition.
To understand the use of this rarely used property=value combination, it is
necessary to know some basics about decomposition.
Consider a character, say H. It could appear with various marks around it,
such as an acute accent, or a circumflex, or various hooks, circles, arrows,
etc., above, below, to one side or the other, etc. There are many
possibilities among the world's languages. The number of combinations is
astronomical, and if there were a character for each combination, it would
soon exhaust Unicode's more than a million possible characters. So Unicode
took a different approach: there is a character for the base H, and a
character for each of the possible marks, and these can be variously combined
to get a final logical character. So a logical character--what appears to be a single character--can be a sequence of more than one individual character.
This is called an "extended grapheme cluster"; Perl furnishes the \X regular expression construct to match such sequences.
But Unicode's intent is to unify the existing character set standards and practices, and several pre-existing standards have single characters that mean the same thing as some of these combinations. An example is ISO-8859-1, which has quite a few of these in the Latin-1 range, an example being "LATIN CAPITAL LETTER E WITH ACUTE". Because this character was in this pre-existing standard, Unicode added it to its repertoire. But this character is considered by Unicode to be equivalent to the sequence consisting of the character "LATIN CAPITAL LETTER E" followed by the character "COMBINING ACUTE ACCENT".
"LATIN CAPITAL LETTER E WITH ACUTE" is called a "pre-composed" character, and its equivalence with the sequence is called canonical equivalence. All pre-composed characters are said to have a decomposition (into the equivalent sequence), and the decomposition type is also called canonical.
However, many more characters have a different type of decomposition, a "compatible" or "non-canonical" decomposition. The sequences that form these decompositions are not considered canonically equivalent to the pre-composed character. An example, again in the Latin-1 range, is the "SUPERSCRIPT ONE". It is somewhat like a regular digit 1, but not exactly; its decomposition into the digit 1 is called a "compatible" decomposition, specifically a "super" decomposition. There are several such compatibility decompositions (see http://www.unicode.org/reports/tr44), including one called "compat", which means some miscellaneous type of decomposition that doesn't fit into the decomposition categories that Unicode has chosen.
Note that most Unicode characters don't have a decomposition, so their decomposition type is "None".
For your convenience, Perl has added the Non_Canonical decomposition type to mean any of the several compatibility decompositions.
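The three cases described above (canonical decomposition, compatibility decomposition, no decomposition) can be inspected directly via Python's unicodedata, offered here only as a cross-check of the terminology; compatibility decompositions carry an angle-bracketed tag such as "<super>":

```python
import unicodedata

# Canonical decomposition: an untagged sequence of code points.
print(unicodedata.decomposition("\u00C9"))  # LATIN CAPITAL LETTER E WITH ACUTE
# Compatibility ("non-canonical") decomposition: tagged, here "<super>".
print(unicodedata.decomposition("\u00B9"))  # SUPERSCRIPT ONE
# Most characters have no decomposition at all: the empty string.
print(unicodedata.decomposition("A"))
```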
\p{Graph}
Matches any character that is graphic. Theoretically, this means a character that on a printer would cause ink to be used.
\p{HorizSpace}
This is the same as \h and \p{Blank}: a character that changes the spacing horizontally.
\p{In=*}
This is a synonym for \p{Present_In=*}.
\p{PerlSpace}
This is the same as \s, restricted to ASCII, namely [ \f\n\r\t], and starting in Perl v5.18, experimentally, a vertical tab.
Mnemonic: Perl's (original) space
\p{PerlWord}
This is the same as \w, restricted to ASCII, namely [A-Za-z0-9_].
Mnemonic: Perl's (original) word.
\p{Posix...}
There are several of these, which are equivalents using the \p notation for Posix classes and are described in POSIX Character Classes in perlrecharclass.
\p{Present_In: *} (Short: \p{In=*})
This property is used when you need to know in what Unicode version(s) a character is.
The "*" above stands for some two-digit Unicode version number, such as 1.1 or 4.0; or the "*" can also be Unassigned. This property will match the code points whose final disposition has been settled as of the Unicode release given by the version number; \p{Present_In: Unassigned} will match those code points whose meaning has yet to be assigned.
For example, U+0041 "LATIN CAPITAL LETTER A" was present in the very first Unicode release available, which is 1.1, so this property is true for all valid "*" versions. On the other hand, U+1EFF was not assigned until version 5.1, when it became "LATIN SMALL LETTER Y WITH LOOP", so the only "*" versions that would match it are 5.1, 5.2, and later.
Unicode furnishes the Age property from which this is derived. The problem with Age is that a strict interpretation of it (which Perl takes) has it matching the precise release in which a code point's meaning is introduced. Thus U+0041 would match only 1.1, and U+1EFF only 5.1. This is not usually what you want.
Some non-Perl implementations of the Age property may change its meaning to be the same as the Perl Present_In property; just be aware of that.
Another confusion with both these properties is that the definition is not
that the code point has been assigned, but that the meaning of the code point
has been determined. This is because 66 code points will always be
unassigned, and so the Age for them is the Unicode version in which the decision
to make them so was made. For example, U+FDD0 is to be permanently unassigned to a character, and the decision to do that was made in version 3.1, so \p{Age=3.1} matches this character, as also does \p{Present_In: 3.1} and up.
\p{Print}
This matches any character that is graphical or blank, except controls.
\p{SpacePerl}
This is the same as \s, including beyond ASCII.
Mnemonic: Space, as modified by Perl. (It doesn't include the vertical tab which both the Posix standard and Unicode consider white space.)
\p{Title} and \p{Titlecase}
Under case-sensitive matching, these both match the same code points as \p{General Category=Titlecase_Letter} (\p{gc=lt}). The difference is that under /i caseless matching, these match the same as \p{Cased}, whereas \p{gc=lt} matches \p{Cased_Letter}.
\p{VertSpace}
This is the same as \v: a character that changes the spacing vertically.
\p{Word}
This is the same as \w, including over 100_000 characters beyond ASCII.
\p{XPosix...}
There are several of these, which are the standard Posix classes extended to the full Unicode range. They are described in POSIX Character Classes in perlrecharclass.
You can define your own binary character properties by defining subroutines
whose names begin with "In" or "Is". (The experimental feature
(?[ ]) in perlre provides an alternative which allows more complex
definitions.) The subroutines can be defined in any
package. The user-defined properties can be used in the regular expression
\p and \P constructs; if you are using a user-defined property from a package other than the one you are in, you must specify its package in the \p or \P construct.
Note that the effect is compile-time and immutable once defined. However, the subroutines are passed a single parameter, which is 0 if case-sensitive matching is in effect and non-zero if caseless matching is in effect. The subroutine may return different values depending on the value of the flag, and one set of values will immutably be in effect for all case-sensitive matches, and the other set for all case-insensitive matches.
Note that if the regular expression is tainted, then Perl will die rather than calling the subroutine, where the name of the subroutine is determined by the tainted data.
The subroutines must return a specially-formatted string, with one or more newline-separated lines. Each line must be one of the following:
A single hexadecimal number denoting a Unicode code point to include.
Two hexadecimal numbers separated by horizontal whitespace (space or tabular characters) denoting a range of Unicode code points to include.
Something to include, prefixed by "+": a built-in character property (prefixed by "utf8::") or a fully qualified (including package name) user-defined character property, to represent all the characters in that property; two hexadecimal code points for a range; or a single hexadecimal code point.
Something to exclude, prefixed by "-": an existing character property (prefixed by "utf8::") or a fully qualified (including package name) user-defined character property, to represent all the characters in that property; two hexadecimal code points for a range; or a single hexadecimal code point.
Something to negate, prefixed "!": an existing character property (prefixed by "utf8::") or a fully qualified (including package name) user-defined character property, to represent all the characters in that property; two hexadecimal code points for a range; or a single hexadecimal code point.
Something to intersect with, prefixed by "&": an existing character property (prefixed by "utf8::") or a fully qualified (including package name) user-defined character property, for all the characters except the characters in the property; two hexadecimal code points for a range; or a single hexadecimal code point.
For example, to define a property that covers both the Japanese syllabaries (hiragana and katakana), you can define
- sub InKana {
- return <<END;
- 3040\t309F
- 30A0\t30FF
- END
- }
Imagine that the here-doc end marker is at the beginning of the line.
Now you can use \p{InKana} and \P{InKana}.
You could also have used the existing block property names:
- sub InKana {
- return <<'END';
- +utf8::InHiragana
- +utf8::InKatakana
- END
- }
Suppose you wanted to match only the allocated characters, not the raw block ranges: in other words, you want to remove the non-characters:
- sub InKana {
- return <<'END';
- +utf8::InHiragana
- +utf8::InKatakana
- -utf8::IsCn
- END
- }
The negation is useful for defining (surprise!) negated classes.
- sub InNotKana {
- return <<'END';
- !utf8::InHiragana
- -utf8::InKatakana
- +utf8::IsCn
- END
- }
This will match all non-Unicode code points, since every one of them is not in Kana. You can use intersection to exclude these, if desired, as this modified example shows:
- sub InNotKana {
- return <<'END';
- !utf8::InHiragana
- -utf8::InKatakana
- +utf8::IsCn
- &utf8::Any
- END
- }
&utf8::Any must be the last line in the definition.
Intersection is used generally for getting the common characters matched by two (or more) classes. It's important to remember not to use "&" for the first set; that would be intersecting with nothing, resulting in an empty set.
(Note that official Unicode properties differ from these in that they automatically exclude non-Unicode code points and a warning is raised if a match is attempted on one of those.)
This feature has been removed as of Perl 5.16. The CPAN module Unicode::Casing provides better functionality without the drawbacks that this feature had. If you are using a Perl earlier than 5.16, this feature was most fully documented in the 5.14 version of this pod: http://perldoc.perl.org/5.14.0/perlunicode.html#User-Defined-Case-Mappings-%28for-serious-hackers-only%29
See Encode.
The following list of Unicode supported features for regular expressions describes all features currently directly supported by core Perl. The references to "Level N" and the section numbers refer to the Unicode Technical Standard #18, "Unicode Regular Expressions", version 13, from August 2008.
Level 1 - Basic Unicode Support
- RL1.1 Hex Notation - done [1]
- RL1.2 Properties - done [2][3]
- RL1.2a Compatibility Properties - done [4]
- RL1.3 Subtraction and Intersection - experimental [5]
- RL1.4 Simple Word Boundaries - done [6]
- RL1.5 Simple Loose Matches - done [7]
- RL1.6 Line Boundaries - MISSING [8][9]
- RL1.7 Supplementary Code Points - done [10]
\x{...}
\p{...} \P{...}
supports not only minimal list, but all Unicode character properties (see Unicode Character Properties above)
\d \D \s \S \w \W \X [:prop:] [:^prop:]
The experimental feature in v5.18 "(?[...])" accomplishes this. See (?[ ]) in perlre. If you don't want to use an experimental feature, you can use one of the following:
You can mimic class subtraction using lookahead. For example, what UTS#18 might write as
- [{Block=Greek}-[{UNASSIGNED}]]
in Perl can be written as:
- (?!\p{Unassigned})\p{Block=Greek}
- (?=\p{Assigned})\p{Block=Greek}
But in this particular example, you probably really want
- \p{Greek}
which will match assigned characters known to be part of the Greek script.
It does implement the full UTS#18 grouping, intersection, union, and removal (subtraction) syntax.
'+' for union, '-' for removal (set-difference), '&' for intersection
\b \B
Note that Perl does Full case-folding in matching (but with bugs), not Simple: for example U+1F88 is equivalent to U+1F00 U+03B9, instead of just U+1F80. This difference matters mainly for certain Greek capital letters with certain modifiers: the Full case-folding decomposes the letter, while the Simple case-folding would map it to a single character.
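The U+1F88 example can be reproduced outside Perl; Python's str.casefold() implements the same Full case folding, while str.lower() applies the one-to-one lowercase mapping (a cross-check of the data, not a Perl API):

```python
s = "\u1F88"  # GREEK CAPITAL LETTER ALPHA WITH PSILI AND PROSGEGRAMMENI
# Full case folding decomposes the letter into two code points...
print(s.casefold() == "\u1F00\u03B9")
# ...while plain lowercasing maps it to the single character U+1F80.
print(s.lower() == "\u1F80")
```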
Should do ^ and $ also on U+000B (\v in C), FF (\f), CR (\r), CRLF (\r\n), NEL (U+0085), LS (U+2028), and PS (U+2029); should also affect <>, $., and script line numbers; should not split lines within CRLF (i.e. there is no empty line between \r and \n). For CRLF, try the :crlf layer (see PerlIO).
Linebreaking conformant with UAX#14 "Unicode Line Breaking Algorithm" is available through the Unicode::LineBreaking module.
UTF-8/UTF-EBCDIC used in Perl allows not only U+10000 to U+10FFFF but also beyond U+10FFFF
Level 2 - Extended Unicode Support
- RL2.1 Canonical Equivalents - MISSING [10][11]
- RL2.2 Default Grapheme Clusters - MISSING [12]
- RL2.3 Default Word Boundaries - MISSING [14]
- RL2.4 Default Loose Matches - MISSING [15]
- RL2.5 Name Properties - DONE
- RL2.6 Wildcard Properties - MISSING
- [10] see UAX#15 "Unicode Normalization Forms"
- [11] have Unicode::Normalize but not integrated to regexes
- [12] have \X but we don't have a "Grapheme Cluster Mode"
- [14] see UAX#29, Word Boundaries
- [15] This is covered in Chapter 3.13 (in Unicode 6.0)
Level 3 - Tailored Support
- RL3.1 Tailored Punctuation - MISSING
- RL3.2 Tailored Grapheme Clusters - MISSING [17][18]
- RL3.3 Tailored Word Boundaries - MISSING
- RL3.4 Tailored Loose Matches - MISSING
- RL3.5 Tailored Ranges - MISSING
- RL3.6 Context Matching - MISSING [19]
- RL3.7 Incremental Matches - MISSING
- ( RL3.8 Unicode Set Sharing )
- RL3.9 Possible Match Sets - MISSING
- RL3.10 Folded Matching - MISSING [20]
- RL3.11 Submatchers - MISSING
- [17] see UAX#10 "Unicode Collation Algorithms"
- [18] have Unicode::Collate but not integrated to regexes
- [19] have (?<=x) and (?=x), but look-aheads or look-behinds
- should see outside of the target substring
- [20] need insensitive matching for linguistic features other
- than case; for example, hiragana to katakana, wide and
- narrow, simplified Han to traditional Han (see UTR#30
- "Character Foldings")
Unicode characters are assigned to code points, which are abstract numbers. To use these numbers, various encodings are needed.
UTF-8
UTF-8 is a variable-length (1 to 4 bytes), byte-order independent encoding. For ASCII (and we really do mean 7-bit ASCII, not another 8-bit encoding), UTF-8 is transparent.
The following table is from Unicode 3.2.
- Code Points 1st Byte 2nd Byte 3rd Byte 4th Byte
- U+0000..U+007F 00..7F
- U+0080..U+07FF * C2..DF 80..BF
- U+0800..U+0FFF E0 * A0..BF 80..BF
- U+1000..U+CFFF E1..EC 80..BF 80..BF
- U+D000..U+D7FF ED 80..9F 80..BF
- U+D800..U+DFFF +++++ utf16 surrogates, not legal utf8 +++++
- U+E000..U+FFFF EE..EF 80..BF 80..BF
- U+10000..U+3FFFF F0 * 90..BF 80..BF 80..BF
- U+40000..U+FFFFF F1..F3 80..BF 80..BF 80..BF
- U+100000..U+10FFFF F4 80..8F 80..BF 80..BF
Note the gaps marked by "*" before several of the byte entries above. These are caused by legal UTF-8 avoiding non-shortest encodings: it is technically possible to UTF-8-encode a single code point in different ways, but that is explicitly forbidden, and the shortest possible encoding should always be used (and that is what Perl does).
Another way to look at it is via bits:
- Code Points 1st Byte 2nd Byte 3rd Byte 4th Byte
- 0aaaaaaa 0aaaaaaa
- 00000bbbbbaaaaaa 110bbbbb 10aaaaaa
- ccccbbbbbbaaaaaa 1110cccc 10bbbbbb 10aaaaaa
- 00000dddccccccbbbbbbaaaaaa 11110ddd 10cccccc 10bbbbbb 10aaaaaa
As you can see, the continuation bytes all begin with "10", and the leading bits of the start byte tell how many bytes there are in the encoded character.
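Both tables can be spot-checked with any conforming UTF-8 encoder; here is a small Python sketch (the sample code points, one per table row, are chosen only for illustration):

```python
# One sample code point from each multi-byte row of the table.
for cp in (0x41, 0x3B1, 0x20AC, 0x10348):
    encoded = chr(cp).encode("utf-8")
    print(f"U+{cp:04X} -> {' '.join(f'{b:02X}' for b in encoded)}")

# A conformant decoder also rejects non-shortest ("overlong") forms:
try:
    b"\xC0\xAF".decode("utf-8")  # would-be overlong encoding of U+002F
except UnicodeDecodeError:
    print("overlong sequence rejected")
```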
The original UTF-8 specification allowed up to 6 bytes, to allow encoding of numbers up to 0x7FFF_FFFF. Perl continues to allow those, and has extended that up to 13 bytes to encode code points up to what can fit in a 64-bit word. However, Perl will warn if you output any of these as being non-portable; and under strict UTF-8 input protocols, they are forbidden.
The Unicode non-character code points are also disallowed in UTF-8 in "open interchange". See Non-character code points.
UTF-EBCDIC
Like UTF-8 but EBCDIC-safe, in the way that UTF-8 is ASCII-safe.
UTF-16, UTF-16BE, UTF-16LE, Surrogates, and BOMs (Byte Order Marks)
The following items are mostly for reference and general Unicode knowledge; Perl doesn't use these constructs internally.
Like UTF-8, UTF-16 is a variable-width encoding, but where UTF-8 uses 8-bit code units, UTF-16 uses 16-bit code units. All code points occupy either 2 or 4 bytes in UTF-16: code points U+0000..U+FFFF are stored in a single 16-bit unit, and code points U+10000..U+10FFFF in two 16-bit units. The latter case uses surrogates, the first 16-bit unit being the high surrogate and the second being the low surrogate.
Surrogates are code points set aside to encode the U+10000..U+10FFFF range of Unicode code points in pairs of 16-bit units. The high surrogates are the range U+D800..U+DBFF and the low surrogates are the range U+DC00..U+DFFF. The surrogate encoding is
- $hi = ($uni - 0x10000) / 0x400 + 0xD800;
- $lo = ($uni - 0x10000) % 0x400 + 0xDC00;
and the decoding is
- $uni = 0x10000 + ($hi - 0xD800) * 0x400 + ($lo - 0xDC00);
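The same arithmetic, translated to Python and checked against a real UTF-16 encoder (U+10437 is an arbitrary sample code point; the helper names are for illustration only):

```python
def to_surrogates(uni):
    # Mirrors the encoding formulas above.
    hi = (uni - 0x10000) // 0x400 + 0xD800
    lo = (uni - 0x10000) % 0x400 + 0xDC00
    return hi, lo

def from_surrogates(hi, lo):
    # Mirrors the decoding formula above.
    return 0x10000 + (hi - 0xD800) * 0x400 + (lo - 0xDC00)

hi, lo = to_surrogates(0x10437)
print(hex(hi), hex(lo))  # 0xd801 0xdc37
# Cross-check: big-endian UTF-16 output is exactly these two units.
assert chr(0x10437).encode("utf-16-be") == b"\xD8\x01\xDC\x37"
assert from_surrogates(hi, lo) == 0x10437
```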
Because of the 16-bitness, UTF-16 is byte-order dependent. UTF-16 itself can be used for in-memory computations, but if storage or transfer is required either UTF-16BE (big-endian) or UTF-16LE (little-endian) encodings must be chosen.
This introduces another problem: what if you just know that your data
is UTF-16, but you don't know which endianness? Byte Order Marks, or
BOMs, are a solution to this. A special character has been reserved
in Unicode to function as a byte order marker: the character with the
code point U+FEFF is the BOM.
The trick is that if you read a BOM, you will know the byte order,
since if it was written on a big-endian platform, you will read the
bytes 0xFE 0xFF, but if it was written on a little-endian platform,
you will read the bytes 0xFF 0xFE. (And if the originating platform
was writing in UTF-8, you will read the bytes 0xEF 0xBB 0xBF.)
The way this trick works is that the character with the code point U+FFFE is not supposed to be in input streams, so the sequence of bytes 0xFF 0xFE is unambiguously "BOM, represented in little-endian format" and cannot be "U+FFFE, represented in big-endian format".
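BOM sniffing can be sketched in a few lines; guess_utf16_endianness below is a hypothetical helper name, not a standard API:

```python
def guess_utf16_endianness(data):
    # Inspect the first two bytes for a BOM.
    if data.startswith(b"\xFE\xFF"):
        return "UTF-16BE"
    if data.startswith(b"\xFF\xFE"):
        return "UTF-16LE"
    return "unknown"

text = "\uFEFF" + "hi"  # prepend the BOM character explicitly
print(guess_utf16_endianness(text.encode("utf-16-be")))
print(guess_utf16_endianness(text.encode("utf-16-le")))
```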
Surrogates have no meaning in Unicode outside their use in pairs to
represent other code points. However, Perl allows them to be
represented individually internally, for example by saying chr(0xD801), so that all code points, not just those valid for open interchange, are representable. Unicode does define semantics for them, such as their General Category being "Cs". But because their use is somewhat dangerous,
Perl will warn (using the warning category "surrogate", which is a
sub-category of "utf8") if an attempt is made
to do things like take the lower case of one, or match
case-insensitively, or to output them. (But don't try this on Perls
before 5.14.)
UTF-32, UTF-32BE, UTF-32LE
The UTF-32 family is pretty much like the UTF-16 family, except that
the units are 32-bit, and therefore the surrogate scheme is not
needed. UTF-32 is a fixed-width encoding. The BOM signatures are
0x00 0x00 0xFE 0xFF for BE and 0xFF 0xFE 0x00 0x00 for LE.
UCS-2, UCS-4
Legacy, fixed-width encodings defined by the ISO 10646 standard. UCS-2 is a 16-bit encoding. Unlike UTF-16, UCS-2 is not extensible beyond U+FFFF, because it does not use surrogates. UCS-4 is a 32-bit encoding, functionally identical to UTF-32 (the difference being that UCS-4 forbids neither surrogates nor code points larger than 0x10_FFFF).
UTF-7
A seven-bit safe (non-eight-bit) encoding, which is useful if the transport or storage is not eight-bit safe. Defined by RFC 2152.
66 code points are set aside in Unicode as "non-character code points".
These all have the Unassigned (Cn) General Category, and they never will
be assigned. These are never supposed to be in legal Unicode input
streams, so that code can use them as sentinels that can be mixed in
with character data, and they always will be distinguishable from that data.
To keep them out of Perl input streams, strict UTF-8 should be specified, such as by using the layer :encoding('UTF-8'). The non-character code points are the 32 between U+FDD0 and U+FDEF, and the 34 code points U+FFFE, U+FFFF, U+1FFFE, U+1FFFF, ... U+10FFFE, U+10FFFF.
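The count of 66 follows mechanically from the two groups just listed: 32 in the contiguous run plus 2 in each of the 17 planes. A Python sketch enumerating them:

```python
# The 32 noncharacters in the contiguous U+FDD0..U+FDEF run...
nonchars = set(range(0xFDD0, 0xFDF0))
# ...plus the last two code points of each of the 17 planes.
for plane in range(17):
    base = plane * 0x10000
    nonchars.update({base + 0xFFFE, base + 0xFFFF})
print(len(nonchars))  # 32 + 17*2 = 66
```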
Some people are under the mistaken impression that these are "illegal",
but that is not true. An application or cooperating set of applications
can legally use them at will internally; but these code points are
"illegal for open interchange". Therefore, Perl will not accept these
from input streams unless lax rules are being used, and will warn
(using the warning category "nonchar", which is a sub-category of "utf8") if
an attempt is made to output them.
The maximum Unicode code point is U+10FFFF. But Perl accepts code
points up to the maximum permissible unsigned number available on the
platform. However, Perl will not accept these from input streams unless
lax rules are being used, and will warn (using the warning category
"non_unicode", which is a sub-category of "utf8") if an attempt is made to
operate on or output them. For example, uc(chr(0x11_0000)) will generate this warning, returning the input parameter as its result, as the upper case of every non-Unicode code point is the code point itself.
Read Unicode Security Considerations. Also, note the following:
Malformed UTF-8
Unfortunately, the original specification of UTF-8 leaves some room for interpretation of how many bytes of encoded output one should generate from one input Unicode character. Strictly speaking, the shortest possible sequence of UTF-8 bytes should be generated, because otherwise there is potential for an input buffer overflow at the receiving end of a UTF-8 connection. Perl always generates the shortest length UTF-8, and with warnings on, Perl will warn about non-shortest length UTF-8 along with other malformations, such as the surrogates, which are not Unicode code points valid for interchange.
Regular expression pattern matching may surprise you if you're not accustomed to Unicode. Starting in Perl 5.14, several pattern modifiers are available to control this, called the character set modifiers. Details are given in Character set modifiers in perlre.
As discussed elsewhere, Perl has one foot (two hooves?) planted in
each of two worlds: the old world of bytes and the new world of
characters, upgrading from bytes to characters when necessary.
If your legacy code does not explicitly use Unicode, no automatic
switch-over to characters should happen. Characters shouldn't get
downgraded to bytes, either. It is possible to accidentally mix bytes
and characters, however (see perluniintro), in which case \w
in
regular expressions might start behaving differently (unless the /a
modifier is in effect). Review your code. Use warnings and the strict
pragma.
The way Unicode is handled on EBCDIC platforms is still
experimental. On such platforms, references to UTF-8 encoding in this
document and elsewhere should be read as meaning the UTF-EBCDIC
specified in Unicode Technical Report 16, unless ASCII vs. EBCDIC issues
are specifically discussed. There is no utfebcdic pragma or ":utfebcdic" layer; rather, "utf8" and ":utf8" are reused to mean
the platform's "natural" 8-bit encoding of Unicode. See perlebcdic
for more discussion of the issues.
See Unicode and UTF-8 in perllocale.
While Perl does have extensive ways to input and output in Unicode, and a few other "entry points" like the @ARGV array (which can sometimes be interpreted as UTF-8), there are still many places where Unicode (in some encoding or another) could be given as arguments or received as results, or both, but it is not.
The following are such interfaces. Also, see The Unicode Bug.
For all of these interfaces Perl currently (as of v5.16.0) simply assumes byte strings both as arguments and results, or UTF-8 strings if the (problematic) encoding pragma has been used.
One reason that Perl does not attempt to resolve the role of Unicode in
these situations is that the answers are highly dependent on the operating
system and the file system(s). For example, whether filenames can be
in Unicode and in exactly what kind of encoding, is not exactly a
portable concept. Similarly for qx and system: how well will the
"command-line interface" (and which of them?) handle Unicode?
chdir, chmod, chown, chroot, exec, link, lstat, mkdir, rename, rmdir, stat, symlink, truncate, unlink, utime, -X
%ENV
glob (aka the <*>)
open, opendir, sysopen
qx (aka the backtick operator), system
readdir, readlink
The term "Unicode bug" has been applied to an inconsistency on ASCII platforms with the Unicode code points in the Latin-1 Supplement block, that is, between 128 and 255. Without a locale specified, unlike all other characters or code points, these characters have very different semantics under byte semantics versus character semantics, unless use feature 'unicode_strings' is specified, directly or indirectly. (It is indirectly specified by a use v5.12 or higher.)
In character semantics these upper-Latin1 characters are interpreted as Unicode code points, which means they have the same semantics as Latin-1 (ISO-8859-1).
In byte semantics (without unicode_strings), they are considered to be unassigned characters, meaning that the only semantics they have are their ordinal numbers, and that they are not members of various character classes. None are considered to match \w, for example, but all match \W.
Perl 5.12.0 added unicode_strings to force character semantics on these code points in some circumstances, which fixed portions of the bug; Perl 5.14.0 fixed almost all of it; and Perl 5.16.0 fixed the remainder (so far as we know, anyway). The lesson here is to enable unicode_strings to avoid the headaches described below.
The old, problematic behavior affects these areas:
Changing the case of a scalar, that is, using uc(), ucfirst(), lc(), and lcfirst(), or \L, \U, \u, and \l in double-quotish contexts, such as regular expression substitutions.
Under unicode_strings, starting in Perl 5.12.0, character semantics are generally used. See lc for details on how this works in combination with various other pragmas.
Using caseless (/i) regular expression matching.
Starting in Perl 5.14.0, regular expressions compiled within the scope of unicode_strings use character semantics even when executed or compiled into larger regular expressions outside the scope.
Matching any of several properties in regular expressions, namely \b, \B, \s, \S, \w, \W, and all the Posix character classes except [[:ascii:]].
Starting in Perl 5.14.0, regular expressions compiled within the scope of unicode_strings use character semantics even when executed or compiled into larger regular expressions outside the scope.
In quotemeta or its inline equivalent \Q, no code points above 127 are quoted in UTF-8 encoded strings, but in byte encoded strings, code points between 128-255 are always quoted.
Starting in Perl 5.16.0, consistent quoting rules are used within the scope of unicode_strings, as described in quotemeta.
This behavior can lead to unexpected results in which a string's semantics suddenly change if a code point above 255 is appended to or removed from it, which changes the string's semantics from byte to character or vice versa. As an example, consider the following program and its output:
- $ perl -le'
- no feature "unicode_strings";
- $s1 = "\xC2";
- $s2 = "\x{2660}";
- for ($s1, $s2, $s1.$s2) {
- print /\w/ || 0;
- }
- '
- 0
- 0
- 1
If there's no \w
in $s1
or in $s2
, why does their concatenation have one?
This anomaly stems from Perl's attempt to not disturb older programs that didn't use Unicode, and hence had no semantics for characters outside of the ASCII range (except in a locale), along with Perl's desire to add Unicode support seamlessly. The result wasn't seamless: these characters were orphaned.
For Perls earlier than those described above, or when a string is passed
to a function outside the subpragma's scope, a workaround is to always
call utf8::upgrade($string)
,
or to use the standard module Encode. Also, a scalar that has any characters
whose ordinal is above 0x100, or which were specified using either of the
\N{...}
notations, will automatically have character semantics.
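For instance, the "\xC2" byte string from the example above starts matching \w once it is upgraded; a minimal sketch:

```perl
my $s1 = "\xC2";                  # byte string: no \w match under byte semantics
print $s1 =~ /\w/ ? 1 : 0, "\n";  # 0 without unicode_strings in scope
utf8::upgrade($s1);               # force character (Unicode) semantics
print $s1 =~ /\w/ ? 1 : 0, "\n";  # 1: U+00C2 is a word character
```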
Sometimes (see When Unicode Does Not Happen or The Unicode Bug) there are situations where you simply need to force a byte string into UTF-8, or vice versa. The low-level calls utf8::upgrade($bytestring) and utf8::downgrade($utf8string[, FAIL_OK]) are the answers.
Note that utf8::downgrade() can fail if the string contains characters that don't fit into a byte.
Calling either function on a string that already is in the desired state is a no-op.
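A sketch of both calls, including the optional FAIL_OK flag:

```perl
my $str = "caf\x{E9}";      # all code points <= 0xFF
utf8::upgrade($str);        # internally UTF-8 now; contents unchanged
utf8::downgrade($str);      # back to bytes; dies only if impossible
my $wide = "\x{2660}";      # BLACK SPADE SUIT, above 0xFF
utf8::downgrade($wide, 1)   # FAIL_OK: return false instead of dying
    or warn "string has characters that do not fit into bytes\n";
```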
If you want to handle Perl Unicode in XS extensions, you may find the following C APIs useful. See also Unicode Support in perlguts for an explanation about Unicode at the XS level, and perlapi for the API details.
DO_UTF8(sv)
returns true if the UTF8
flag is on and the bytes
pragma is not in effect. SvUTF8(sv)
returns true if the UTF8
flag is on; the bytes pragma is ignored. The UTF8
flag being on
does not mean that there are any characters of code points greater
than 255 (or 127) in the scalar or that there are even any characters
in the scalar. What the UTF8
flag means is that the sequence of
octets in the representation of the scalar is the sequence of UTF-8
encoded code points of the characters of a string. The UTF8
flag
being off means that each octet in this representation encodes a
single character with code point 0..255 within the string. Perl's
Unicode model is not to use UTF-8 until it is absolutely necessary.
uvchr_to_utf8(buf, chr)
writes a Unicode character code point into
a buffer encoding the code point as UTF-8, and returns a pointer
pointing after the UTF-8 bytes. It works appropriately on EBCDIC machines.
utf8_to_uvchr_buf(buf, bufend, lenp)
reads UTF-8 encoded bytes from a
buffer and
returns the Unicode character code point and, optionally, the length of
the UTF-8 byte sequence. It works appropriately on EBCDIC machines.
utf8_length(start, end)
returns the length of the UTF-8 encoded buffer
in characters. sv_len_utf8(sv)
returns the length of the UTF-8 encoded
scalar.
sv_utf8_upgrade(sv)
converts the string of the scalar to its UTF-8
encoded form. sv_utf8_downgrade(sv)
does the opposite, if
possible. sv_utf8_encode(sv)
is like sv_utf8_upgrade except that
it does not set the UTF8
flag. sv_utf8_decode()
does the
opposite of sv_utf8_encode()
. Note that none of these are to be
used as general-purpose encoding or decoding interfaces: use Encode
for that. sv_utf8_upgrade()
is affected by the encoding pragma
but sv_utf8_downgrade()
is not (since the encoding pragma is
designed to be a one-way street).
is_utf8_string(buf, len)
returns true if len
bytes of the buffer
are valid UTF-8.
is_utf8_char_buf(buf, buf_end)
returns true if the pointer points to
a valid UTF-8 character.
UTF8SKIP(buf)
will return the number of bytes in the UTF-8 encoded
character in the buffer. UNISKIP(chr)
will return the number of bytes
required to UTF-8-encode the Unicode character code point. UTF8SKIP()
is useful for example for iterating over the characters of a UTF-8
encoded buffer; UNISKIP()
is useful, for example, in computing
the size required for a UTF-8 encoded buffer.
utf8_distance(a, b)
will tell the distance in characters between the
two pointers pointing to the same UTF-8 encoded buffer.
utf8_hop(s, off) will return a pointer to a UTF-8 encoded buffer
that is off
(positive or negative) Unicode characters displaced
from the UTF-8 buffer s. Be careful not to overstep the buffer:
utf8_hop()
will merrily run off the end or the beginning of the
buffer if told to do so.
pv_uni_display(dsv, spv, len, pvlim, flags)
and
sv_uni_display(dsv, ssv, pvlim, flags)
are useful for debugging the
output of Unicode strings and scalars. By default they are useful
only for debugging--they display all characters as hexadecimal code
points--but with the flags UNI_DISPLAY_ISPRINT
,
UNI_DISPLAY_BACKSLASH
, and UNI_DISPLAY_QQ
you can make the
output more readable.
foldEQ_utf8(s1, pe1, l1, u1, s2, pe2, l2, u2)
can be used to
compare two strings case-insensitively in Unicode. For case-sensitive
comparisons you can just use memEQ()
and memNE()
as usual, except
if one string is in utf8 and the other isn't.
For more information, see perlapi, and utf8.c and utf8.h in the Perl source code distribution.
Perl by default comes with the latest supported Unicode version built in, but you can change to use any earlier one.
Download the files in the desired version of Unicode from the Unicode web site (http://www.unicode.org). These should replace the existing files in lib/unicore in the Perl source tree. Follow the instructions in README.perl in that directory to change some of their names, and then build perl (see INSTALL).
See Unicode and UTF-8 in perllocale
See The Unicode Bug
When Perl exchanges data with an extension, the extension should be able to understand the UTF8 flag and act accordingly. If the extension doesn't recognize that flag, it's likely that the extension will return incorrectly-flagged data.
So if you're working with Unicode data, consult the documentation of every module you're using if there are any issues with Unicode data exchange. If the documentation does not talk about Unicode at all, suspect the worst and probably look at the source to learn how the module is implemented. Modules written completely in Perl shouldn't cause problems. Modules that directly or indirectly access code written in other programming languages are at risk.
For affected functions, the simple strategy to avoid data corruption is to always make the encoding of the exchanged data explicit. Choose an encoding that you know the extension can handle. Convert arguments passed to the extensions to that encoding and convert results back from that encoding. Write wrapper functions that do the conversions for you, so you can later change the functions when the extension catches up.
To provide an example, let's say the popular Foo::Bar::escape_html function doesn't deal with Unicode data yet. The wrapper function would convert the argument to raw UTF-8 and convert the result back to Perl's internal representation like so:
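A sketch of such a wrapper, assuming the hypothetical Foo::Bar::escape_html expects and returns raw UTF-8 octets:

```perl
use Encode ();

sub my_escape_html {
    my ($text) = @_;
    return undef unless defined $text;
    # encode to raw UTF-8, call the byte-oriented function, decode the result
    return Encode::decode_utf8(
        Foo::Bar::escape_html( Encode::encode_utf8($text) )
    );
}
```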
Sometimes, when the extension does not convert data but just stores
and retrieves them, you will be able to use the otherwise
dangerous Encode::_utf8_on() function. Let's say the popular
Foo::Bar
extension, written in C, provides a param
method that
lets you store and retrieve data according to these prototypes:
- $self->param($name, $value); # set a scalar
- $value = $self->param($name); # retrieve a scalar
If it does not yet provide support for any encoding, one could write a
derived class with such a param
method:
- sub param {
- my($self,$name,$value) = @_;
- utf8::upgrade($name); # make sure it is UTF-8 encoded
- if (defined $value) {
- utf8::upgrade($value); # make sure it is UTF-8 encoded
- return $self->SUPER::param($name,$value);
- } else {
- my $ret = $self->SUPER::param($name);
- Encode::_utf8_on($ret); # we know it is UTF-8 encoded
- return $ret;
- }
- }
Some extensions provide filters on data entry/exit points, such as DB_File::filter_store_key and family. Look out for such filters in the documentation of your extensions, they can make the transition to Unicode data much easier.
Some functions are slower when working on UTF-8 encoded strings than on byte encoded strings. All functions that need to hop over characters such as length(), substr() or index(), or matching regular expressions can work much faster when the underlying data are byte-encoded.
In Perl 5.8.0 the slowness was often quite spectacular; in Perl 5.8.1
a caching scheme was introduced which will hopefully make the slowness
somewhat less spectacular, at least for some operations. In general,
operations with UTF-8 encoded strings are still slower. As an example,
the Unicode properties (character classes) like \p{Nd}
are known to
be quite a bit slower (5-20 times) than their simpler counterparts
like \d
(then again, there are hundreds of Unicode characters matching Nd
compared with the 10 ASCII characters matching \d
).
There are several known problems with Perl on EBCDIC platforms. If you want to use Perl there, send email to perlbug@perl.org.
In earlier versions, when byte and character data were concatenated, the new string was sometimes created by decoding the byte strings as ISO 8859-1 (Latin-1), even if the old Unicode string used EBCDIC.
If you find any of these, please report them as bugs.
Perl 5.8 has a different Unicode model from 5.6. In 5.6 the programmer
was required to use the utf8
pragma to declare that a given scope
expected to deal with Unicode data and had to make sure that only
Unicode data were reaching that scope. If you have code that is
working with 5.6, you will need some of the following adjustments to
your code. The examples are written such that the code will continue
to work under 5.6, so you should be safe to try them out.
A filehandle that should read or write UTF-8
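The usual recipe, guarded so that it still compiles on pre-5.8 Perls:

```perl
if ($] > 5.007) {
    binmode $fh, ":encoding(UTF-8)";
}
```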
A scalar that is going to be passed to some extension
Be it Compress::Zlib, Apache::Request or any extension that has no mention of Unicode in the manpage, you need to make sure that the UTF8 flag is stripped off. Note that at the time of this writing (January 2012) the mentioned modules are not UTF-8-aware. Please check the documentation to verify if this is still true.
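A sketch: encode the scalar to raw UTF-8 octets before handing it over, which also strips the UTF8 flag:

```perl
if ($] > 5.007) {
    require Encode;
    $val = Encode::encode_utf8($val);   # make octets
}
```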
A scalar we got back from an extension
If you believe the scalar comes back as UTF-8, you will most likely want the UTF8 flag restored:
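A sketch using the safe decoding route, which validates the bytes:

```perl
if ($] > 5.007) {
    require Encode;
    $val = Encode::decode_utf8($val);   # may fail on invalid UTF-8
}
```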
Same thing, if you are really sure it is UTF-8
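In that case the dangerous-but-fast flag flip can be sketched as:

```perl
if ($] > 5.007) {
    require Encode;
    Encode::_utf8_on($val);   # no validation: only if you trust the source
}
```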
A wrapper for fetchrow_array and fetchrow_hashref
When the database contains only UTF-8, a wrapper function or method is a convenient way to replace all your fetchrow_array and fetchrow_hashref calls. A wrapper function will also make it easier to adapt to future enhancements in your database driver. Note that at the time of this writing (January 2012), the DBI has no standardized way to deal with UTF-8 data. Please check the documentation to verify if that is still true.
- sub fetchrow {
- # $what is one of fetchrow_{array,hashref}
- my($self, $sth, $what) = @_;
- if ($] < 5.008) {
- return $sth->$what;
- } else {
- require Encode;
- if (wantarray) {
- my @arr = $sth->$what;
- for (@arr) {
- defined && /[^\000-\177]/ && Encode::_utf8_on($_);
- }
- return @arr;
- } else {
- my $ret = $sth->$what;
- if (ref $ret) {
- for my $k (keys %$ret) {
- defined
- && /[^\000-\177]/
- && Encode::_utf8_on($_) for $ret->{$k};
- }
- return $ret;
- } else {
- defined && /[^\000-\177]/ && Encode::_utf8_on($_) for $ret;
- return $ret;
- }
- }
- }
- }
A large scalar that you know can only contain ASCII
Scalars that contain only ASCII and are marked as UTF-8 are sometimes a drag to your program. If you recognize such a situation, just remove the UTF8 flag:
- utf8::downgrade($val) if $] >= 5.008;
perlunitut, perluniintro, perluniprops, Encode, open, utf8, bytes, perlretut, ${^UNICODE} in perlvar, http://www.unicode.org/reports/tr44
perlunifaq - Perl Unicode FAQ
This is a list of questions and answers about Unicode in Perl, intended to be read after perlunitut.
No, and this isn't really a Unicode FAQ.
Perl has an abstracted interface for all supported character encodings, so this
is actually a generic Encode
tutorial and Encode
FAQ. But many people
think that Unicode is special and magical, and I didn't want to disappoint
them, so I decided to call the document a Unicode tutorial.
To find out which character encodings your Perl supports, run:
- perl -MEncode -le "print for Encode->encodings(':all')"
Well, if you can, upgrade to the most recent, but certainly 5.8.1
or newer.
The tutorial and FAQ assume the latest release.
You should also check your modules, and upgrade them if necessary. For example, HTML::Entities requires version >= 1.32 to function correctly, even though the changelog is silent about this.
Well, apart from a bare binmode $fh
, you shouldn't treat them specially.
(The binmode is needed because otherwise Perl may convert line endings on Win32
systems.)
Be careful, though, to never combine text strings with binary strings. If you need text in a binary stream, encode your text strings first using the appropriate encoding, then join them with binary strings. See also: "What if I don't encode?".
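A minimal sketch of mixing text into a binary stream, with a hypothetical record layout:

```perl
use Encode qw(encode);
my $title  = "caf\x{E9}";                          # text string
my $octets = encode("UTF-8", $title);              # encode first...
my $record = pack("N", length $octets) . $octets;  # ...then join with binary
```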
Whenever you're communicating text with anything that is external to your perl process, like a database, a text file, a socket, or another program. Even if the thing you're communicating with is also written in Perl.
Whenever your encoded, binary string is used together with a text string, Perl
will assume that your binary string was encoded with ISO-8859-1, also known as
latin-1. If it wasn't latin-1, then your data is unpleasantly converted. For
example, if it was UTF-8, the individual bytes of multibyte characters are seen
as separate characters, and then again converted to UTF-8. Such double encoding
can be compared to double HTML encoding (&amp;gt;
), or double URI encoding
(%253E
).
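Double encoding can be reproduced in a few lines; a sketch:

```perl
use Encode qw(encode);
my $text   = "caf\x{E9}";
my $bytes  = encode("UTF-8", $text);    # "caf\xC3\xA9"
# Mistake: the byte string is later treated as latin-1 text and re-encoded
my $double = encode("UTF-8", $bytes);   # "caf\xC3\x83\xC2\xA9" - mojibake
```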
This silent implicit decoding is known as "upgrading". That may sound positive, but it's best to avoid it.
Your text string will be sent using the bytes in Perl's internal format. In some cases, Perl will warn you that you're doing something wrong, with a friendly warning:
- Wide character in print at example.pl line 2.
Because the internal format is often UTF-8, these bugs are hard to spot, because UTF-8 is usually the encoding you wanted! But don't be lazy, and don't use the fact that Perl's internal format is UTF-8 to your advantage. Encode explicitly to avoid weird bugs, and to show to maintenance programmers that you thought this through.
If all data that comes from a certain handle is encoded in exactly the same
way, you can tell the PerlIO system to automatically decode everything, with
the encoding
layer. If you do this, you can't accidentally forget to decode
or encode anymore, on things that use the layered handle.
You can provide this layer when opening the file:
- # Read: automatically decoded
- open my $fh, '<:encoding(UTF-8)', $filename or die "$filename: $!";
- # Write: automatically encoded
- open my $fh, '>:encoding(UTF-8)', $filename or die "$filename: $!";
Or if you already have an open filehandle:
- binmode $fh, ':encoding(UTF-8)';
Some database drivers for DBI can also automatically encode and decode, but that is sometimes limited to the UTF-8 encoding.
Do whatever you can to find out, and if you have to: guess. (Don't forget to document your guess with a comment.)
You could open the document in a web browser, and change the character set or character encoding until you can visually confirm that all characters look the way they should.
There is no way to reliably detect the encoding automatically, so if people keep sending you data without charset indication, you may have to educate them.
Yes, you can! If your sources are UTF-8 encoded, you can indicate that with the
use utf8
pragma.
- use utf8;
This doesn't do anything to your input, or to your output. It only influences
the way your sources are read. You can use Unicode in string literals, in
identifiers (but they still have to be "word characters" according to \w
),
and even in custom delimiters.
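A sketch (the source file itself must be saved as UTF-8):

```perl
use utf8;
binmode STDOUT, ":encoding(UTF-8)";   # use utf8 does not affect output
my $naïve = "café";                   # Unicode identifier and literal
print "$naïve\n";
```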
No, Data::Dumper's Unicode abilities are as they should be. There have been
some complaints that it should restore the UTF8 flag when the data is read
again with eval. However, you should really not look at the flag, and
nothing indicates that Data::Dumper should break this rule.
Here's what happens: when Perl reads in a string literal, it sticks to 8 bit encoding as long as it can. (But perhaps originally it was internally encoded as UTF-8, when you dumped it.) When it has to give that up because other characters are added to the text string, it silently upgrades the string to UTF-8.
If you properly encode your strings for output, none of this is of your
concern, and you can just eval dumped data as always.
Starting in Perl 5.14 (and partially in Perl 5.12), just put a
use feature 'unicode_strings'
near the beginning of your program.
Within its lexical scope you shouldn't have this problem. It also is
automatically enabled under use feature ':5.12'
or use v5.12
or
using -E
on the command line for Perl 5.12 or higher.
The rationale for requiring this is to not break older programs that
rely on the way things worked before Unicode came along. Those older
programs knew only about the ASCII character set, and so may not work
properly for additional characters. When a string is encoded in UTF-8,
Perl assumes that the program is prepared to deal with Unicode, but when
the string isn't, Perl assumes that only ASCII
is wanted, and so those characters that are not ASCII
characters aren't recognized as to what they would be in Unicode.
use feature 'unicode_strings'
tells Perl to treat all characters as
Unicode, whether the string is encoded in UTF-8 or not, thus avoiding
the problem.
However, on earlier Perls, or if you pass strings to subroutines outside
the feature's scope, you can force Unicode semantics by changing the
encoding to UTF-8 by doing utf8::upgrade($string)
. This can be used
safely on any string, as it checks and does not change strings that have
already been upgraded.
For a more detailed discussion, see Unicode::Semantics on CPAN.
You can't. Some use the UTF8 flag for this, but that's misuse, and makes well behaved modules like Data::Dumper look bad. The flag is useless for this purpose, because it's off when an 8 bit encoding (by default ISO-8859-1) is used to store the string.
This is something you, the programmer, have to keep track of; sorry. You could consider adopting a kind of "Hungarian notation" to help with this.
By first converting the FOO-encoded byte string to a text string, and then the text string to a BAR-encoded byte string:
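A sketch, where FOO and BAR stand in for real encoding names such as 'iso-8859-5' or 'cp1252':

```perl
use Encode qw(decode encode);
my $text_string = decode('FOO', $foo_string);   # bytes -> text
my $bar_string  = encode('BAR', $text_string);  # text  -> bytes
```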
or by skipping the text string part, and going directly from one binary encoding to the other:
- use Encode qw(from_to);
- from_to($string, 'FOO', 'BAR'); # changes contents of $string
or by letting automatic decoding and encoding do all the work:
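A sketch with hypothetical file names, again with FOO and BAR standing in for real encoding names:

```perl
open my $foofh, '<:encoding(FOO)', 'example.foo.txt' or die $!;
open my $barfh, '>:encoding(BAR)', 'example.bar.txt' or die $!;
print {$barfh} $_ while <$foofh>;   # decode on read, encode on write
```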
What are decode_utf8 and encode_utf8?
These are alternate syntaxes for decode('utf8', ...)
and encode('utf8',
...)
.
This is a term used variously for characters with an ordinal value greater than 127, for characters with an ordinal value greater than 255, or for any character occupying more than one byte, depending on the context.
The Perl warning "Wide character in ..." is caused by a character with an ordinal value greater than 255. With no specified encoding layer, Perl tries to fit things in ISO-8859-1 for backward compatibility reasons. When it can't, it emits this warning (if warnings are enabled), and outputs UTF-8 encoded data instead.
To avoid this warning and to avoid having different output encodings in a single stream, always specify an encoding explicitly, for example with a PerlIO layer:
- binmode STDOUT, ":encoding(UTF-8)";
Please, unless you're hacking the internals, or debugging weirdness, don't
think about the UTF8 flag at all. That means that you very probably shouldn't
use is_utf8
, _utf8_on
or _utf8_off
at all.
The UTF8 flag, also called SvUTF8, is an internal flag that indicates that the current internal representation is UTF-8. Without the flag, it is assumed to be ISO-8859-1. Perl converts between these automatically. (Actually Perl usually assumes the representation is ASCII; see Why do regex character classes sometimes match only in the ASCII range? above.)
One of Perl's internal formats happens to be UTF-8. Unfortunately, Perl can't keep a secret, so everyone knows about this. That is the source of much confusion. It's better to pretend that the internal format is some unknown encoding, and that you always have to encode and decode explicitly.
What about the use bytes pragma?
Don't use it. It makes no sense to deal with bytes in a text string, and it makes no sense to deal with characters in a byte string. Do the proper conversions (by decoding/encoding), and things will work out well: you get character counts for decoded data, and byte counts for encoded data.
use bytes
is usually a failed attempt to do something useful. Just forget
about it.
What about the use encoding pragma?
Don't use it. Unfortunately, it assumes that the programmer's environment and that of the user will use the same encoding. It will use the same encoding for the source code and for STDIN and STDOUT. When a program is copied to another machine, the source code does not change, but the STDIO environment might.
If you need non-ASCII characters in your source code, make it a UTF-8 encoded
file and use utf8
.
If you need to set the encoding for STDIN, STDOUT, and STDERR, for example
based on the user's locale, use open
.
What is the difference between :encoding and :utf8?
Because UTF-8 is one of Perl's internal formats, you can often just skip the encoding or decoding step, and manipulate the UTF8 flag directly.
Instead of :encoding(UTF-8)
, you can simply use :utf8
, which skips the
encoding step if the data was already represented as UTF8 internally. This is
widely accepted as good behavior when you're writing, but it can be dangerous
when reading, because it causes internal inconsistency when you have invalid
byte sequences. Using :utf8
for input can sometimes result in security
breaches, so please use :encoding(UTF-8)
instead.
Instead of decode
and encode
, you could use _utf8_on
and _utf8_off
,
but this is considered bad style. Especially _utf8_on
can be dangerous, for
the same reason that :utf8
can.
There are some shortcuts for oneliners; see -C in perlrun.
What's the difference between UTF-8 and utf8?
UTF-8
is the official standard. utf8
is Perl's way of being liberal in
what it accepts. If you have to communicate with things that aren't so liberal,
you may want to consider using UTF-8
. If you have to communicate with things
that are too liberal, you may have to use utf8
. The full explanation is in
Encode.
UTF-8
is internally known as utf-8-strict
. The tutorial uses UTF-8
consistently, even where utf8 is actually used internally, because the
distinction can be hard to make, and is mostly irrelevant.
For example, utf8 can be used for code points that don't exist in Unicode, like 9999999, but if you encode that to UTF-8, you get a substitution character (by default; see Handling Malformed Data in Encode for more ways of dealing with this.)
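A sketch of the difference:

```perl
use Encode qw(encode);
my $ch     = chr(9999999);            # beyond Unicode's 0x10FFFF
my $lax    = encode("utf8",  $ch);    # lax: encodes the code point anyway
my $strict = encode("UTF-8", $ch);    # strict: substitution character instead
```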
Okay, if you insist: the "internal format" is utf8, not UTF-8. (When it's not some other encoding.)
It's good that you lost track, because you shouldn't depend on the internal format being any specific encoding. But since you asked: by default, the internal format is either ISO-8859-1 (latin-1) or utf8, depending on the history of the string. On EBCDIC platforms this may even be different.
Perl knows how it stored the string internally, and will use that knowledge
when you encode
. In other words: don't try to find out what the internal
encoding for a certain string is, but instead just encode it into the encoding
that you want.
Juerd Waalboer <#####@juerd.nl>
perlunicode, perluniintro, Encode
perluniintro - Perl Unicode introduction
This document gives a general idea of Unicode and how to use Unicode in Perl. See Further Resources for references to more in-depth treatments of Unicode.
Unicode is a character set standard which plans to codify all of the writing systems of the world, plus many other symbols.
Unicode and ISO/IEC 10646 are coordinated standards that unify almost all other modern character set standards, covering more than 80 writing systems and hundreds of languages, including all commercially-important modern languages. All characters in the largest Chinese, Japanese, and Korean dictionaries are also encoded. The standards will eventually cover almost all characters in more than 250 writing systems and thousands of languages. Unicode 1.0 was released in October 1991, and 6.0 in October 2010.
A Unicode character is an abstract entity. It is not bound to any
particular integer width, especially not to the C language char
.
Unicode is language-neutral and display-neutral: it does not encode the
language of the text, and it does not generally define fonts or other graphical
layout details. Unicode operates on characters and on text built from
those characters.
Unicode defines characters like LATIN CAPITAL LETTER A
or GREEK
SMALL LETTER ALPHA
and unique numbers for the characters, in this
case 0x0041 and 0x03B1, respectively. These unique numbers are called
code points. A code point is essentially the position of the
character within the set of all possible Unicode characters, and thus in
Perl, the term ordinal is often used interchangeably with it.
The Unicode standard prefers using hexadecimal notation for the code
points. If numbers like 0x0041
are unfamiliar to you, take a peek
at a later section, Hexadecimal Notation. The Unicode standard
uses the notation U+0041 LATIN CAPITAL LETTER A, to give the
hexadecimal code point and the normative name of the character.
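In Perl, a code point and a normative name are interchangeable ways to produce the same character; a sketch:

```perl
use charnames ':full';                 # enables \N{...} by name on older Perls
my $a     = "\N{LATIN CAPITAL LETTER A}";     # same as "\x{0041}"
my $alpha = "\x{3B1}";                        # GREEK SMALL LETTER ALPHA
printf "U+%04X U+%04X\n", ord $a, ord $alpha; # U+0041 U+03B1
```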
Unicode also defines various properties for the characters, like "uppercase" or "lowercase", "decimal digit", or "punctuation"; these properties are independent of the names of the characters. Furthermore, various operations on the characters like uppercasing, lowercasing, and collating (sorting) are defined.
A Unicode logical "character" can actually consist of more than one internal
actual "character" or code point. For Western languages, this is adequately
modelled by a base character (like LATIN CAPITAL LETTER A
) followed
by one or more modifiers (like COMBINING ACUTE ACCENT
). This sequence of
base character and modifiers is called a combining character
sequence. Some non-western languages require more complicated
models, so Unicode created the grapheme cluster concept, which was
later further refined into the extended grapheme cluster. For
example, a Korean Hangul syllable is considered a single logical
character, but most often consists of three actual
Unicode characters: a leading consonant followed by an interior vowel followed
by a trailing consonant.
Whether to call these extended grapheme clusters "characters" depends on your point of view. If you are a programmer, you probably would tend towards seeing each element in the sequences as one unit, or "character". However from the user's point of view, the whole sequence could be seen as one "character" since that's probably what it looks like in the context of the user's language. In this document, we take the programmer's point of view: one "character" is one Unicode code point.
For some combinations of base character and modifiers, there are
precomposed characters. There is a single character equivalent, for
example, to the sequence LATIN CAPITAL LETTER A
followed by
COMBINING ACUTE ACCENT
. It is called LATIN CAPITAL LETTER A WITH
ACUTE
. These precomposed characters are, however, only available for
some combinations, and are mainly meant to support round-trip
conversions between Unicode and legacy standards (like ISO 8859). Using
sequences, as Unicode does, allows for needing fewer basic building blocks
(code points) to express many more potential grapheme clusters. To
support conversion between equivalent forms, various normalization
forms are also defined. Thus, LATIN CAPITAL LETTER A WITH ACUTE
is
in Normalization Form Composed, (abbreviated NFC), and the sequence
LATIN CAPITAL LETTER A
followed by COMBINING ACUTE ACCENT
represents the same character in Normalization Form Decomposed (NFD).
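The core Unicode::Normalize module converts between these forms; a sketch:

```perl
use Unicode::Normalize qw(NFC NFD);
use charnames ':full';
my $nfd = "A\N{COMBINING ACUTE ACCENT}";  # two code points (NFD)
my $nfc = NFC($nfd);                      # one: LATIN CAPITAL LETTER A WITH ACUTE
print length($nfd), " ", length($nfc), "\n";   # 2 1
```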
Because of backward compatibility with legacy encodings, the "a unique number for every character" idea breaks down a bit: instead, there is "at least one number for every character". The same character could be represented differently in several legacy encodings. Also, the converse is not true: some code points do not have an assigned character. Firstly, there are unallocated code points within otherwise used blocks. Secondly, there are special Unicode control characters that do not represent true characters.
When Unicode was first conceived, it was thought that all the world's
characters could be represented using a 16-bit word; that is a maximum of
0x10000
(or 65536) characters from 0x0000
to 0xFFFF
would be
needed. This soon proved to be false, and since Unicode 2.0 (July
1996), Unicode has been defined all the way up to 21 bits (0x10FFFF
),
and Unicode 3.1 (March 2001) defined the first characters above 0xFFFF
.
The first 0x10000
characters are called the Plane 0, or the
Basic Multilingual Plane (BMP). With Unicode 3.1, 17 (yes,
seventeen) planes in all were defined--but they are nowhere near full of
defined characters, yet.
When a new language is being encoded, Unicode generally will choose a
block
of consecutive unallocated code points for its characters. So
far, the number of code points in these blocks has always been evenly
divisible by 16. Extras in a block, not currently needed, are left
unallocated, for future growth. But there have been occasions when
a later release needed more code points than the available extras, and a
new block had to be allocated somewhere else, not contiguous to the initial
one, to handle the overflow. Thus, it became apparent early on that
"block" wasn't an adequate organizing principle, and so the Script
property was created. (Later an improved script property was added as
well, the Script_Extensions
property.) Those code points that are in
overflow blocks can still
have the same script as the original ones. The script concept fits more
closely with natural language: there is Latin
script, Greek
script, and so on; and there are several artificial scripts, like
Common
for characters that are used in multiple scripts, such as
mathematical symbols. Scripts usually span varied parts of several
blocks. For more information about scripts, see Scripts in perlunicode.
The division into blocks exists, but it is almost completely
accidental--an artifact of how the characters have been and still are
allocated. (Note that this paragraph has oversimplified things for the
sake of this being an introduction. Unicode doesn't really encode
languages, but the writing systems for them--their scripts; and one
script can be used by many languages. Unicode also encodes things that
aren't really about languages, such as symbols like BAGGAGE CLAIM
.)
The Unicode code points are just abstract numbers. To input and output these abstract numbers, the numbers must be encoded or serialised somehow. Unicode defines several character encoding forms, of which UTF-8 is perhaps the most popular. UTF-8 is a variable length encoding that encodes Unicode characters as 1 to 4 bytes. Other encodings include UTF-16 and UTF-32 and their big- and little-endian variants (UTF-8 is byte-order independent). ISO/IEC 10646 defines the UCS-2 and UCS-4 encoding forms.
For more information about encodings--for instance, to learn what surrogates and byte order marks (BOMs) are--see perlunicode.
Starting from Perl v5.6.0, Perl has had the capacity to handle Unicode
natively. Perl v5.8.0, however, is the first recommended release for
serious Unicode work. The maintenance release 5.6.1 fixed many of the
problems of the initial Unicode implementation, but for example
regular expressions still do not work with Unicode in 5.6.1.
Perl v5.14.0 is the first release where Unicode support is
(almost) seamlessly integrable without some gotchas (the exception being
some differences in quotemeta, which is fixed
starting in Perl 5.16.0). To enable this
seamless support, you should use feature 'unicode_strings'
(which is
automatically selected if you use 5.012
or higher). See feature.
(5.14 also fixes a number of bugs and departures from the Unicode
standard.)
Before Perl v5.8.0, use utf8
was used to declare
that operations in the current block or file would be Unicode-aware.
This model was found to be wrong, or at least clumsy: the "Unicodeness"
is now carried with the data, instead of being attached to the
operations.
Starting with Perl v5.8.0, only one case remains where an explicit use
utf8
is needed: if your Perl script itself is encoded in UTF-8, you can
use UTF-8 in your identifier names, and in string and regular expression
literals, by saying use utf8
. This is not the default because
scripts with legacy 8-bit data in them would break. See utf8.
Perl supports both pre-5.6 strings of eight-bit native bytes, and
strings of Unicode characters. The general principle is that Perl tries
to keep its data as eight-bit bytes for as long as possible, but as soon
as Unicodeness cannot be avoided, the data is transparently upgraded
to Unicode. Prior to Perl v5.14.0, the upgrade was not completely
transparent (see The Unicode Bug in perlunicode), and for backwards
compatibility, full transparency is not gained unless use feature
'unicode_strings'
(see feature) or use 5.012
(or higher) is
selected.
Internally, Perl currently uses either the native eight-bit
character set of the platform (for example Latin-1) or UTF-8 to encode
Unicode strings. Specifically, if all code points in
the string are 0xFF
or less, Perl uses the native eight-bit
character set. Otherwise, it uses UTF-8.
A user of Perl does not normally need to know nor care how Perl happens to encode its internal strings, but it becomes relevant when outputting Unicode strings to a stream without a PerlIO layer (one with the "default" encoding). In such a case, the raw bytes used internally (the native character set or UTF-8, as appropriate for each string) will be used, and a "Wide character" warning will be issued if those strings contain a character beyond 0x00FF.
For example,
- perl -e 'print "\x{DF}\n", "\x{0100}\x{DF}\n"'
produces a fairly useless mixture of native bytes and UTF-8, as well as a warning:
- Wide character in print at ...
To output UTF-8, use the :encoding
or :utf8
output layer. Prepending
- binmode(STDOUT, ":utf8");
to this sample program ensures that the output is completely UTF-8, and removes the program's warning.
You can enable automatic UTF-8-ification of your standard file
handles, default open() layer, and @ARGV
by using either
the -C
command line switch or the PERL_UNICODE
environment
variable, see perlrun for the documentation of the -C
switch.
Note that this means that Perl expects other software to work the same way: if Perl has been led to believe that STDIN should be UTF-8, but then STDIN coming in from another command is not UTF-8, Perl will likely complain about the malformed UTF-8.
All features that combine Unicode and I/O also require using the new
PerlIO feature. Almost all Perl 5.8 platforms do use PerlIO, though:
you can see whether yours does by running "perl -V" and looking for
useperlio=define
.
Perl 5.8.0 also supports Unicode on EBCDIC platforms. There, Unicode support is somewhat more complex to implement since additional conversions are needed at every step.
Later Perl releases have added code that will not work on EBCDIC platforms, and no one has complained, so the divergence has continued. If you want to run Perl on an EBCDIC platform, send email to perlbug@perl.org.
On EBCDIC platforms, the internal Unicode encoding form is UTF-EBCDIC instead of UTF-8. The difference is that UTF-8 is "ASCII-safe", in that ASCII characters encode to UTF-8 as-is, while UTF-EBCDIC is "EBCDIC-safe".
To create Unicode characters in literals for code points above 0xFF
,
use the \x{...}
notation in double-quoted strings:
- my $smiley = "\x{263a}";
Similarly, it can be used in regular expression literals
- $smiley =~ /\x{263a}/;
At run-time you can use chr():
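A minimal sketch (the choice of character is an assumption; 0x05D0 is HEBREW LETTER ALEF):

```perl
# chr() builds a character from a code point at run time;
# 0x05D0 is HEBREW LETTER ALEF.
my $hebrew_alef = chr(0x05D0);
```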
See Further Resources for how to find all these numeric codes.
Naturally, ord() will do the reverse: it turns a character into
a code point.
Note that \x..
(no {}
and only two hexadecimal digits), \x{...}
,
and chr(...) for arguments less than 0x100
(decimal 256)
generate an eight-bit character for backward compatibility with older
Perls. For arguments of 0x100
or more, Unicode characters are
always produced. If you want to force the production of Unicode
characters regardless of the numeric value, use pack("U", ...)
instead of \x..
, \x{...}
, or chr().
You can invoke characters by name in double-quoted strings:
- my $arabic_alef = "\N{ARABIC LETTER ALEF}";
And, as mentioned above, you can also pack() numbers into Unicode
characters:
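A sketch (the sample code point is an assumption; 0x10A0 is GEORGIAN CAPITAL LETTER AN):

```perl
# pack() with the "U" template turns numbers into Unicode characters;
# 0x10A0 is GEORGIAN CAPITAL LETTER AN.
my $georgian_an = pack("U", 0x10A0);
```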
Note that both \x{...}
and \N{...}
are compile-time string
constants: you cannot use variables in them. If you want similar
run-time functionality, use chr() and charnames::string_vianame()
.
If you want to force the result to Unicode characters, use the special
"U0"
prefix. It consumes no arguments but causes the following bytes
to be interpreted as the UTF-8 encoding of Unicode characters:
Likewise, you can stop such UTF-8 interpretation by using the special
"C0"
prefix.
Handling Unicode is for the most part transparent: just use the
strings as usual. Functions like index(), length(), and
substr() will work on the Unicode characters; regular expressions
will work on the Unicode characters (see perlunicode and perlretut).
Note that Perl considers grapheme clusters to be separate characters, so for example
- print length("\N{LATIN CAPITAL LETTER A}\N{COMBINING ACUTE ACCENT}"), "\n";
will print 2, not 1. The only exception is that regular expressions
have \X
for matching an extended grapheme cluster. (Thus \X
in a
regular expression would match the entire sequence of both the example
characters.)
Life is not quite so transparent, however, when working with legacy encodings, I/O, and certain special cases:
When you combine legacy data and Unicode, the legacy data needs to be upgraded to Unicode. Normally the legacy data is assumed to be ISO 8859-1 (or EBCDIC, if applicable).
The Encode
module knows about many encodings and has interfaces
for doing conversions between those encodings:
- use Encode 'decode';
- $data = decode("iso-8859-3", $data); # convert from legacy to utf-8
Normally, writing out Unicode data
- print FH $some_string_with_unicode, "\n";
produces raw bytes that Perl happens to use to internally encode the
Unicode string. Perl's internal encoding depends on the system as
well as what characters happen to be in the string at the time. If
any of the characters are at code points 0x100
or above, you will get
a warning. To ensure that the output is explicitly rendered in the
encoding you desire--and to avoid the warning--open the stream with
the desired encoding. Some examples:
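A sketch of such open() calls (the filenames and the particular encodings are placeholders):

```perl
# Each layer converts Perl characters to the named encoding on output;
# the filenames here are placeholders.
open(my $utf8_out,   '>:utf8',                 'output.utf8');     # raw UTF-8
open(my $utf16_out,  '>:encoding(UTF-16LE)',   'output.utf16le');
open(my $latin1_out, '>:encoding(iso-8859-1)', 'output.latin1');
```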
and on already open streams, use binmode():
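A sketch using STDOUT (any already-open handle works the same way):

```perl
# Push an encoding layer onto an already open stream; everything printed
# afterwards is converted to UTF-8 on the way out.
binmode(STDOUT, ':encoding(UTF-8)');
print chr(0x263A), "\n";    # WHITE SMILING FACE, encoded as three bytes
```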
The matching of encoding names is loose: case does not matter, and
many encodings have several aliases. Note that the :utf8
layer
must always be specified exactly like that; it is not subject to
the loose matching of encoding names. Also note that currently :utf8
is unsafe for
input, because it accepts the data without validating that it is indeed valid
UTF-8; you should instead use :encoding(utf-8)
(with or without a
hyphen).
See PerlIO for the :utf8
layer, PerlIO::encoding and
Encode::PerlIO for the :encoding()
layer, and
Encode::Supported for many encodings supported by the Encode
module.
Reading in a file that you know happens to be encoded in one of the Unicode or legacy encodings does not magically turn the data into Unicode in Perl's eyes. To do that, specify the appropriate layer when opening files
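For example, a sketch (the filename and encoding are placeholders):

```perl
# The layer decodes the file's bytes into Perl characters as they are read.
if (open(my $in, '<:encoding(UTF-8)', 'input.utf8')) {
    my $line = <$in>;   # $line now holds characters, not raw bytes
    close $in;
}
```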
The I/O layers can also be specified more flexibly with
the open pragma. See open, or look at the following example.
With the open pragma you can use the :locale
layer
- BEGIN { $ENV{LC_ALL} = $ENV{LANG} = 'ru_RU.KOI8-R' }
- # the :locale will probe the locale environment variables like
- # LC_ALL
- use open OUT => ':locale'; # russki parusski
- open(O, ">koi8");
- print O chr(0x430); # Unicode CYRILLIC SMALL LETTER A = KOI8-R 0xc1
- close O;
- open(I, "<koi8");
- printf "%#x\n", ord(<I>); # this should print 0xc1
- close I;
These methods install a transparent filter on the I/O stream that converts data from the specified encoding when it is read in from the stream. The result is always Unicode.
The open pragma affects all the open() calls after the pragma by
setting default layers. If you want to affect only certain
streams, use explicit layers directly in the open() call.
You can switch encodings on an already opened stream by using
binmode(); see binmode.
The :locale
does not currently work with
open() and binmode(), only with the open pragma. The
:utf8
and :encoding(...)
methods do work with all of open(),
binmode(), and the open pragma.
Similarly, you may use these I/O layers on output streams to automatically convert Unicode to the specified encoding when it is written to the stream. For example, the following snippet copies the contents of the file "text.jis" (encoded as ISO-2022-JP, aka JIS) to the file "text.utf8", encoded as UTF-8:
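A sketch of such a copy, using the two filenames from the text:

```perl
# ISO-2022-JP bytes are decoded on input and re-encoded as UTF-8 on output.
if (open(my $nihongo, '<:encoding(iso-2022-jp)', 'text.jis')) {
    open(my $unicode, '>:encoding(utf-8)', 'text.utf8')
        or die "text.utf8: $!";
    print {$unicode} $_ while <$nihongo>;
    close $unicode;
}
```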
The naming of encodings, both by open() and by the open
pragma, allows for flexible names: koi8-r
and KOI8R
will both be
understood.
Common encodings recognized by ISO, MIME, IANA, and various other standardisation organisations are recognised; for a more detailed list see Encode::Supported.
read() reads characters and returns the number of characters.
seek() and tell() operate on byte counts, as do sysread()
and sysseek().
Notice that because of the default behaviour of not doing any conversion upon input if there is no default layer, it is easy to mistakenly write code that keeps on expanding a file by repeatedly encoding the data:
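A sketch of the mistake (the filename is a placeholder):

```perl
# BAD CODE WARNING: each run treats the file's bytes as characters and
# then UTF-8 encodes them again on output, so the file keeps growing.
if (open(my $in, '<', 'file')) {
    my $t = do { local $/; <$in> };   # slurp the whole file as bytes
    close $in;
    open(my $out, '>:encoding(utf8)', 'file') or die;
    print {$out} $t;
    close $out;
}
```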
If you run this code twice, the contents of the file will be twice
UTF-8 encoded. A use open ':encoding(utf8)'
would have avoided the
bug, as would explicitly opening the file for input as UTF-8.
NOTE: the :utf8
and :encoding
features work only if your
Perl has been built with the new PerlIO feature (which is the default
on most systems).
Sometimes you might want to display Perl scalars containing Unicode as
simple ASCII (or EBCDIC) text. The following subroutine converts
its argument so that Unicode characters with code points greater than
255 are displayed as \x{...}
, control characters (like \n
) are
displayed as \x..
, and the rest of the characters as themselves:
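A sketch of such a subroutine (the "W*" unpack template requires Perl 5.10 or later; older Perls used "U*" here):

```perl
sub nice_string {
    join("",
      map { $_ > 255                       # if wide character...
            ? sprintf("\\x{%04X}", $_)     # \x{...}
            : chr($_) =~ /[[:cntrl:]]/     # else if control character...
              ? sprintf("\\x%02X", $_)     # \x..
              : quotemeta(chr($_))         # else quoted or as themselves
          } unpack("W*", $_[0]));          # unpack Unicode characters
}
```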
For example,
- nice_string("foo\x{100}bar\n")
returns the string
- 'foo\x{0100}bar\x0A'
which is ready to be printed.
Bit Complement Operator ~ And vec()
The bit complement operator ~
may produce surprising results if
used on strings containing characters with ordinal values above
255. In such a case, the results are consistent with the internal
encoding of the characters, but not with much else. So don't do
that. Similarly for vec(): you will be operating on the
internally-encoded bit patterns of the Unicode characters, not on
the code point values, which is very probably not what you want.
Peeking At Perl's Internal Encoding
Normal users of Perl should never care how Perl encodes any particular Unicode string (because the normal ways to get at the contents of a string with Unicode--via input and output--should always be via explicitly-defined I/O layers). But if you must, there are two ways of looking behind the scenes.
One way of peeking inside the internal encoding of Unicode characters
is to use unpack("C*", ...) to get the bytes of whatever the string
encoding happens to be, or unpack("U0..", ...)
to get the bytes of the
UTF-8 encoding:
Yet another way would be to use the Devel::Peek module:
- perl -MDevel::Peek -e 'Dump(chr(0x100))'
That shows the UTF8
flag in FLAGS and both the UTF-8 bytes
and Unicode characters in PV
. See also later in this document
the discussion about the utf8::is_utf8()
function.
String Equivalence
The question of string equivalence turns somewhat complicated in Unicode: what do you mean by "equal"?
(Is LATIN CAPITAL LETTER A WITH ACUTE
equal to
LATIN CAPITAL LETTER A
?)
The short answer is that by default Perl compares equivalence (eq
,
ne
) based only on code points of the characters. In the above
case, the answer is no (because 0x00C1 != 0x0041). But sometimes, any
CAPITAL LETTER A's should be considered equal, or even A's of any case.
The long answer is that you need to consider character normalization and casing issues: see Unicode::Normalize, Unicode Technical Report #15, Unicode Normalization Forms and sections on case mapping in the Unicode Standard.
As of Perl 5.8.0, the "Full" case-folding of Case
Mappings/SpecialCasing is implemented, but bugs remained in qr//i with them;
these were mostly fixed by Perl 5.14.
String Collation
People like to see their strings nicely sorted--or as Unicode parlance goes, collated. But again, what do you mean by collate?
(Does LATIN CAPITAL LETTER A WITH ACUTE
come before or after
LATIN CAPITAL LETTER A WITH GRAVE
?)
The short answer is that by default, Perl compares strings (lt
,
le
, cmp
, ge
, gt
) based only on the code points of the
characters. In the above case, the answer is "after", since
0x00C1
> 0x00C0
.
The long answer is that "it depends", and a good answer cannot be given without knowing (at the very least) the language context. See Unicode::Collate, and Unicode Collation Algorithm http://www.unicode.org/unicode/reports/tr10/
Character Ranges and Classes
Character ranges in regular expression bracketed character classes ( e.g.,
/[a-z]/
) and in the tr/// (also known as y///) operator are not
magically Unicode-aware. What this means is that [A-Za-z]
will not
magically start to mean "all alphabetic letters" (not that it does mean that
even for 8-bit characters; for those, if you are using locales (perllocale),
use /[[:alpha:]]/
; and if not, use the 8-bit-aware property \p{alpha}
).
All the properties that begin with \p
(and its inverse \P
) are actually
character classes that are Unicode-aware. There are dozens of them, see
perluniprops.
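A sketch (the sample character is an assumption; U+03B1, GREEK SMALL LETTER ALPHA, is in the Greek script):

```perl
# \p{...} classes are Unicode-aware; the single form \p{Greek} and its
# compound equivalent \p{Script=Greek} match the same characters.
print "alpha is Greek\n"       if chr(0x3B1) =~ /\p{Greek}/;
print "alpha is a word char\n" if chr(0x3B1) =~ /\p{Word}/;
```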
You can use Unicode code points as the end points of character ranges, and the range will include all Unicode code points that lie between those end points.
String-To-Number Conversions
Unicode does define several other decimal--and numeric--characters besides the familiar 0 to 9, such as the Arabic and Indic digits. Perl does not support string-to-number conversion for digits other than ASCII 0 to 9 (and ASCII a to f for hexadecimal). To get safe conversions from any Unicode string, use num() in Unicode::UCD.
Will My Old Scripts Break?
Very probably not. Unless you are generating Unicode characters
somehow, old behaviour should be preserved. About the only behaviour
that has changed and which could start generating Unicode is the old
behaviour of chr() where supplying an argument more than 255
produced a character modulo 255. chr(300), for example, was equal
to chr(45) or "-" (in ASCII), now it is LATIN CAPITAL LETTER I WITH
BREVE.
How Do I Make My Scripts Work With Unicode?
Very little work should be needed since nothing changes until you
generate Unicode data. The most important thing is getting input as
Unicode; for that, see the earlier I/O discussion.
To get full seamless Unicode support, add
use feature 'unicode_strings'
(or use 5.012
or higher) to your
script.
How Do I Know Whether My String Is In Unicode?
You shouldn't have to care. But you may if your Perl is before 5.14.0
or you haven't specified use feature 'unicode_strings'
or use
5.012
(or higher) because otherwise the semantics of the code points
in the range 128 to 255 are different depending on
whether the string they are contained within is in Unicode or not.
(See When Unicode Does Not Happen in perlunicode.)
To determine if a string is in Unicode, use:
- print utf8::is_utf8($string) ? 1 : 0, "\n";
But note that this doesn't mean that any of the characters in the
string are necessarily UTF-8 encoded, or that any of the characters have
code points greater than 0xFF (255) or even 0x80 (128), or that the
string has any characters at all. All the is_utf8()
does is to
return the value of the internal "utf8ness" flag attached to the
$string
. If the flag is off, the bytes in the scalar are interpreted
as a single byte encoding. If the flag is on, the bytes in the scalar
are interpreted as the (variable-length, potentially multi-byte) UTF-8 encoded
code points of the characters. Bytes added to a UTF-8 encoded string are
automatically upgraded to UTF-8. If mixed non-UTF-8 and UTF-8 scalars
are merged (double-quoted interpolation, explicit concatenation, or
printf/sprintf parameter substitution), the result will be UTF-8 encoded
as if copies of the byte strings were upgraded to UTF-8: for example,
- $a = "ab\x80c";
- $b = "\x{100}";
- print "$a = $b\n";
the output string will be UTF-8-encoded ab\x80c = \x{100}\n
, but
$a
will stay byte-encoded.
Sometimes you might really need to know the byte length of a string
instead of the character length. For that use either the
Encode::encode_utf8()
function or the bytes
pragma
and the length() function:
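A sketch, using a sample one-character string above 0xFF:

```perl
my $unicode = chr(0x100);
print length($unicode), "\n";                         # 1 -- one character
require Encode;
print length(Encode::encode_utf8($unicode)), "\n";    # 2 -- two bytes
{
    use bytes;   # lexically scoped byte semantics
    print length($unicode), "\n";    # also 2 (the 0xC4 0x80 of the UTF-8)
}
```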
How Do I Find Out What Encoding a File Has?
You might try Encode::Guess, but it has a number of limitations.
How Do I Detect Data That's Not Valid In a Particular Encoding?
Use the Encode
package to try converting it.
For example,
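A sketch using decode_utf8() with the FB_CROAK check flag (the sample bytes are an assumption):

```perl
use Encode 'decode_utf8';

my $string = "\xC4\x80";   # assumed sample bytes to check
if (eval { decode_utf8($string, Encode::FB_CROAK); 1 }) {
    print "valid utf8\n";
} else {
    print "not valid utf8\n";
}
```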
Or use unpack to try decoding it:
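A sketch (the byte string is an assumption; use warnings is needed to see the malformed-character warning):

```perl
use warnings;   # needed to see any "Malformed UTF-8 character" warning
my $string_of_bytes_that_I_think_is_utf8 = "\xC4\x80";   # assumed input
my @chars = unpack("C0U*", $string_of_bytes_that_I_think_is_utf8);
```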
If invalid, a Malformed UTF-8 character warning is produced. The "C0" means
"process the string character per character". Without that, the
unpack("U*", ...)
would work in U0
mode (the default if the format
string starts with U
) and it would return the bytes making up the UTF-8
encoding of the target string, something that will always work.
How Do I Convert Binary Data Into a Particular Encoding, Or Vice Versa?
This probably isn't as useful as you might think. Normally, you shouldn't need to.
In one sense, what you are asking doesn't make much sense: encodings are for characters, and binary data are not "characters", so converting "data" into some encoding isn't meaningful unless you know in what character set and encoding the binary data is in, in which case it's not just binary data, now is it?
If you have a raw sequence of bytes that you know should be
interpreted via a particular encoding, you can use Encode
:
- use Encode 'from_to';
- from_to($data, "iso-8859-1", "utf-8"); # from latin-1 to utf-8
The call to from_to()
changes the bytes in $data
, but nothing
material about the nature of the string has changed as far as Perl is
concerned. Both before and after the call, the string $data
contains just a bunch of 8-bit bytes. As far as Perl is concerned,
the encoding of the string remains as "system-native 8-bit bytes".
You might relate this to a fictional 'Translate' module:
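For illustration only; Translate is the fictional module from the text, not a real CPAN module, so this is pseudocode:

```
use Translate;                      # fictional module, for illustration only
my $phrase = "Yes";
Translate::from_to($phrase, 'english', 'deutsch');
## phrase now contains "Ja"
```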
The contents of the string changes, but not the nature of the string. Perl doesn't know any more after the call than before that the contents of the string indicates the affirmative.
Back to converting data. If you have (or want) data in your system's native 8-bit encoding (e.g. Latin-1, EBCDIC, etc.), you can use pack/unpack to convert to/from Unicode.
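A sketch (the sample character is an assumption; the "W" pack template requires Perl 5.10 or later):

```perl
my $Unicode_string = chr(0x00E9);   # assumed sample: LATIN SMALL LETTER E
                                    # WITH ACUTE
my $native_string  = pack("W*", unpack("U*", $Unicode_string));  # to native
my $round_trip     = pack("U*", unpack("W*", $native_string));   # and back
```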
If you have a sequence of bytes you know is valid UTF-8, but Perl doesn't know it yet, you can make Perl a believer, too:
- use Encode 'decode_utf8';
- $Unicode = decode_utf8($bytes);
or:
- $Unicode = pack("U0a*", $bytes);
You can find the bytes that make up a UTF-8 sequence with
- @bytes = unpack("C*", $Unicode_string)
and you can create well-formed Unicode with
- $Unicode_string = pack("U*", 0xff, ...)
How Do I Display Unicode? How Do I Input Unicode?
See http://www.alanwood.net/unicode/ and http://www.cl.cam.ac.uk/~mgk25/unicode.html
How Does Unicode Work With Traditional Locales?
Starting in Perl 5.16, you can specify
- use locale ':not_characters';
to get Perl to work well with traditional locales. The catch is that you have to translate from the locale character set to/from Unicode yourself. See Unicode I/O above for how
to accomplish this, but full details are in Unicode and UTF-8 in perllocale, including gotchas that happen if you don't specify
:not_characters
.
The Unicode standard prefers using hexadecimal notation because
that more clearly shows the division of Unicode into blocks of 256 characters.
Hexadecimal is also simply shorter than decimal. You can use decimal
notation, too, but learning to use hexadecimal just makes life easier
with the Unicode standard. The U+HHHH
notation uses hexadecimal,
for example.
The 0x
prefix means a hexadecimal number, the digits are 0-9 and
a-f (or A-F, case doesn't matter). Each hexadecimal digit represents
four bits, or half a byte. print 0x..., "\n"
will show a
hexadecimal number in decimal, and printf "%x\n", $decimal
will
show a decimal number in hexadecimal. If you have just the
"hex digits" of a hexadecimal number, you can use the hex() function.
Unicode Consortium
Unicode FAQ
Unicode Glossary
Unicode Recommended Reading List
The Unicode Consortium has a list of articles and books, some of which give a much more in depth treatment of Unicode: http://unicode.org/resources/readinglist.html
Unicode Useful Resources
Unicode and Multilingual Support in HTML, Fonts, Web Browsers and Other Applications
UTF-8 and Unicode FAQ for Unix/Linux
Legacy Character Sets
You can explore various information from the Unicode data files using
the Unicode::UCD
module.
If you cannot upgrade your Perl to 5.8.0 or later, you can still
do some Unicode processing by using the modules Unicode::String
,
Unicode::Map8
, and Unicode::Map
, available from CPAN.
If you have the GNU recode installed, you can also use the
Perl front-end Convert::Recode
for character conversions.
The following are fast conversions from ISO 8859-1 (Latin-1) bytes to UTF-8 bytes and back; the code works even with older Perl 5 versions.
- # ISO 8859-1 to UTF-8
- s/([\x80-\xFF])/chr(0xC0|ord($1)>>6).chr(0x80|ord($1)&0x3F)/eg;
- # UTF-8 to ISO 8859-1
- s/([\xC2\xC3])([\x80-\xBF])/chr(ord($1)<<6&0xC0|ord($2)&0x3F)/eg;
perlunitut, perlunicode, Encode, open, utf8, bytes, perlretut, perlrun, Unicode::Collate, Unicode::Normalize, Unicode::UCD
Thanks to the kind readers of the perl5-porters@perl.org, perl-unicode@perl.org, linux-utf8@nl.linux.org, and unicore@unicode.org mailing lists for their valuable feedback.
Copyright 2001-2011 Jarkko Hietaniemi <jhi@iki.fi>
This document may be distributed under the same terms as Perl itself.
perluniprops - Index of Unicode Version 6.2.0 character properties in Perl
This document provides information about the portion of the Unicode database that deals with character properties, that is the portion that is defined on single code points. (Other information in the Unicode data base below briefly mentions other data that Unicode provides.)
Perl can provide access to all non-provisional Unicode character properties, though not all are enabled by default. The omitted ones are the Unihan properties (accessible via the CPAN module Unicode::Unihan) and certain deprecated or Unicode-internal properties. (An installation may choose to recompile Perl's tables to change this. See Unicode character properties that are NOT accepted by Perl.)
For most purposes, access to Unicode properties from the Perl core is through regular expression matches, as described in the next section. For some special purposes, and to access the properties that are not suitable for regular expression matching, all the Unicode character properties that Perl handles are accessible via the standard Unicode::UCD module, as described in the section Properties accessible through Unicode::UCD.
Perl also provides some additional extensions and short-cut synonyms for Unicode properties.
This document merely lists all available properties and does not attempt to explain what each property really means. There is a brief description of each Perl extension; see Other Properties in perlunicode for more information on these. There is some detail about Blocks, Scripts, General_Category, and Bidi_Class in perlunicode, but to find out about the intricacies of the official Unicode properties, refer to the Unicode standard. A good starting place is http://www.unicode.org/reports/tr44/.
Note that you can define your own properties; see User-Defined Character Properties in perlunicode.
\p{}
and \P{}
The Perl regular expression \p{}
and \P{}
constructs give access to
most of the Unicode character properties. The table below shows all these
constructs, both single and compound forms.
Compound forms consist of two components, separated by an equals sign or a
colon. The first component is the property name, and the second component is
the particular value of the property to match against, for example,
\p{Script: Greek}
and \p{Script=Greek}
both mean to match characters
whose Script property is Greek.
Single forms, like \p{Greek}
, are mostly Perl-defined shortcuts for
their equivalent compound forms. The table shows these equivalences. (In our
example, \p{Greek}
is just a shortcut for \p{Script=Greek}
.)
There are also a few Perl-defined single forms that are not shortcuts for a
compound form. One such is \p{Word}
. These are also listed in the table.
In parsing these constructs, Perl always ignores Upper/lower case differences
everywhere within the {braces}. Thus \p{Greek}
means the same thing as
\p{greek}
. But note that changing the case of the "p"
or "P"
before
the left brace completely changes the meaning of the construct, from "match"
(for \p{}
) to "doesn't match" (for \P{}
). Casing in this document is
for improved legibility.
Also, white space, hyphens, and underscores are normally ignored
everywhere between the {braces}, and hence can be freely added or removed
even if the /x modifier hasn't been specified on the regular expression.
But a 'T' at the beginning of an entry in the table below
means that tighter (stricter) rules are used for that entry:
Single form (\p{name}
) tighter rules:
White space, hyphens, and underscores ARE significant except for:
That means, for example, that you can freely add or remove white space adjacent to (but within) the braces without affecting the meaning.
Compound form (\p{name=value}
or \p{name:value}
) tighter rules:
The tighter rules given above for the single form apply to everything to the right of the colon or equals; the looser rules still apply to everything to the left.
That means, for example, that you can freely add or remove white space adjacent to (but within) the braces and the colon or equal sign.
Some properties are considered obsolete by Unicode, but still available. There are several varieties of obsolescence:
A property may be stabilized. Such a determination does not indicate that the property should or should not be used; instead it is a declaration that the property will not be maintained nor extended for newly encoded characters. Such properties are marked with an 'S' in the table.
A property may be deprecated, perhaps because its original intent
has been replaced by another property, or because its specification was
somehow defective. This means that its use is strongly
discouraged, so much so that a warning will be issued if used, unless the
regular expression is in the scope of a no warnings 'deprecated'
statement. A 'D' flags each such entry in the table, and
the entry there for the longest, most descriptive version of the property will
give the reason it is deprecated, and perhaps advice. Perl may issue such a
warning, even for properties that aren't officially deprecated by Unicode,
when there used to be characters or code points that were matched by them, but
no longer. This is to warn you that your program may not work like it did on
earlier Unicode releases.
A deprecated property may be made unavailable in a future Perl version, so it is best to move away from them.
A deprecated property may also be stabilized, but this fact is not shown.
Properties marked with an 'O' in the table are considered (plain) obsolete. Generally this designation is given to properties that Unicode once used for internal purposes (but not any longer).
Some Perl extensions are present for backwards compatibility and are discouraged from being used, but are not obsolete. An 'X' flags each such entry in the table. Future Unicode versions may force some of these extensions to be removed without warning, replaced by another property with the same name that means something different. Use the equivalent shown instead.
Matches in the Block property have shortcuts that begin with "In_". For
example, \p{Block=Latin1}
can be written as \p{In_Latin1}
. For
backward compatibility, if there is no conflict with another shortcut, these
may also be written as \p{Latin1}
or \p{Is_Latin1}
. But, N.B., there
are numerous such conflicting shortcuts. Use of these forms for Block is
discouraged, and are flagged as such, not only because of the potential
confusion as to what is meant, but also because a later release of Unicode may
preempt the shortcut, and your program would no longer be correct. Use the
"In_" form instead to avoid this, or even more clearly, use the compound form,
e.g., \p{blk:latin1}
. See Blocks in perlunicode for more information
about this.
The table below has two columns. The left column contains the \p{}
constructs to look up, possibly preceded by the flags mentioned above; and
the right column contains information about them, like a description, or
synonyms. It shows both the single and compound forms for each property that
has them. If the left column is a short name for a property, the right column
will give its longer, more descriptive name; and if the left column is the
longest name, the right column will show any equivalent shortest name, in both
single and compound forms if applicable.
The right column will also caution you if a property means something different than what might normally be expected.
All single forms are Perl extensions; a few compound forms are as well, and are noted as such.
Numbers in (parentheses) indicate the total number of code points matched by the property. For emphasis, those properties that match no code points at all are listed as well in a separate section following the table.
Most properties match the same code points regardless of whether "/i"
case-insensitive matching is specified or not. But a few properties are
affected. These are shown with the notation
- (/i= other_property)
in the second column. Under case-insensitive matching they match the same code points as the property "other_property".
There is no description given for most non-Perl defined properties (See http://www.unicode.org/reports/tr44/ for that).
For compactness, '*' is used as a wildcard instead of showing all possible combinations. For example, entries like:
- \p{Gc: *} \p{General_Category: *}
mean that 'Gc' is a synonym for 'General_Category', and anything that is valid for the latter is also valid for the former. Similarly,
- \p{Is_*} \p{*}
means that if and only if, for example, \p{Foo}
exists, then
\p{Is_Foo}
and \p{IsFoo}
are also valid and all mean the same thing.
And similarly, \p{Foo=Bar}
means the same as \p{Is_Foo=Bar}
and
\p{IsFoo=Bar}
. "*" here is restricted to something not beginning with an
underscore.
Also, in binary properties, 'Yes', 'T', and 'True' are all synonyms for 'Y'.
And 'No', 'F', and 'False' are all synonyms for 'N'. The table shows 'Y*' and
'N*' to indicate this, and doesn't have separate entries for the other
possibilities. Note that not all properties which have values 'Yes' and 'No'
are binary, and they have all their values spelled out without using this wild
card, and a NOT
clause in their description that highlights their not being
binary. These also require the compound form to match them, whereas true
binary properties have both single and compound forms available.
Note that all non-essential underscores are removed in the display of the short names below.
Legend summary:
- NAME INFO
- X \p{Aegean_Numbers} \p{Block=Aegean_Numbers} (64)
- T \p{Age: 1.1} \p{Age=V1_1} (33_979)
- T \p{Age: 2.0} \p{Age=V2_0} (144_521)
- T \p{Age: 2.1} \p{Age=V2_1} (2)
- T \p{Age: 3.0} \p{Age=V3_0} (10_307)
- T \p{Age: 3.1} \p{Age=V3_1} (44_978)
- T \p{Age: 3.2} \p{Age=V3_2} (1016)
- T \p{Age: 4.0} \p{Age=V4_0} (1226)
- T \p{Age: 4.1} \p{Age=V4_1} (1273)
- T \p{Age: 5.0} \p{Age=V5_0} (1369)
- T \p{Age: 5.1} \p{Age=V5_1} (1624)
- T \p{Age: 5.2} \p{Age=V5_2} (6648)
- T \p{Age: 6.0} \p{Age=V6_0} (2088)
- T \p{Age: 6.1} \p{Age=V6_1} (732)
- T \p{Age: 6.2} \p{Age=V6_2} (1)
- \p{Age: NA} \p{Age=Unassigned} (864_348)
- \p{Age: Unassigned} Code point's usage has not been assigned
- in any Unicode release thus far. (Short:
- \p{Age=NA}) (864_348)
- \p{Age: V1_1} Code point's usage introduced in version
- 1.1 (33_979)
- \p{Age: V2_0} Code point's usage was introduced in
- version 2.0; See also Property
- 'Present_In' (144_521)
- \p{Age: V2_1} Code point's usage was introduced in
- version 2.1; See also Property
- 'Present_In' (2)
- \p{Age: V3_0} Code point's usage was introduced in
- version 3.0; See also Property
- 'Present_In' (10_307)
- \p{Age: V3_1} Code point's usage was introduced in
- version 3.1; See also Property
- 'Present_In' (44_978)
- \p{Age: V3_2} Code point's usage was introduced in
- version 3.2; See also Property
- 'Present_In' (1016)
- \p{Age: V4_0} Code point's usage was introduced in
- version 4.0; See also Property
- 'Present_In' (1226)
- \p{Age: V4_1} Code point's usage was introduced in
- version 4.1; See also Property
- 'Present_In' (1273)
- \p{Age: V5_0} Code point's usage was introduced in
- version 5.0; See also Property
- 'Present_In' (1369)
- \p{Age: V5_1} Code point's usage was introduced in
- version 5.1; See also Property
- 'Present_In' (1624)
- \p{Age: V5_2} Code point's usage was introduced in
- version 5.2; See also Property
- 'Present_In' (6648)
- \p{Age: V6_0} Code point's usage was introduced in
- version 6.0; See also Property
- 'Present_In' (2088)
- \p{Age: V6_1} Code point's usage was introduced in
- version 6.1; See also Property
- 'Present_In' (732)
- \p{Age: V6_2} Code point's usage was introduced in
- version 6.2; See also Property
- 'Present_In' (1)
- \p{AHex} \p{PosixXDigit} (= \p{ASCII_Hex_Digit=Y})
- (22)
- \p{AHex: *} \p{ASCII_Hex_Digit: *}
- X \p{Alchemical} \p{Alchemical_Symbols} (= \p{Block=
- Alchemical_Symbols}) (128)
- X \p{Alchemical_Symbols} \p{Block=Alchemical_Symbols} (Short:
- \p{InAlchemical}) (128)
- \p{All} \p{Any} (1_114_112)
- \p{Alnum} Alphabetic and (decimal) Numeric (102_619)
- \p{Alpha} \p{Alphabetic=Y} (102_159)
- \p{Alpha: *} \p{Alphabetic: *}
- \p{Alphabetic} \p{Alpha} (= \p{Alphabetic=Y}) (102_159)
- \p{Alphabetic: N*} (Short: \p{Alpha=N}, \P{Alpha}) (1_011_953)
- \p{Alphabetic: Y*} (Short: \p{Alpha=Y}, \p{Alpha}) (102_159)
- X \p{Alphabetic_PF} \p{Alphabetic_Presentation_Forms} (=
- \p{Block=Alphabetic_Presentation_Forms})
- (80)
- X \p{Alphabetic_Presentation_Forms} \p{Block=
- Alphabetic_Presentation_Forms} (Short:
- \p{InAlphabeticPF}) (80)
- X \p{Ancient_Greek_Music} \p{Ancient_Greek_Musical_Notation} (=
- \p{Block=
- Ancient_Greek_Musical_Notation}) (80)
- X \p{Ancient_Greek_Musical_Notation} \p{Block=
- Ancient_Greek_Musical_Notation} (Short:
- \p{InAncientGreekMusic}) (80)
- X \p{Ancient_Greek_Numbers} \p{Block=Ancient_Greek_Numbers} (80)
- X \p{Ancient_Symbols} \p{Block=Ancient_Symbols} (64)
- \p{Any} [\x{0000}-\x{10FFFF}] (1_114_112)
- \p{Arab} \p{Arabic} (= \p{Script=Arabic}) (NOT
- \p{Block=Arabic}) (1235)
- \p{Arabic} \p{Script=Arabic} (Short: \p{Arab}; NOT
- \p{Block=Arabic}) (1235)
- X \p{Arabic_Ext_A} \p{Arabic_Extended_A} (= \p{Block=
- Arabic_Extended_A}) (96)
- X \p{Arabic_Extended_A} \p{Block=Arabic_Extended_A} (Short:
- \p{InArabicExtA}) (96)
- X \p{Arabic_Math} \p{Arabic_Mathematical_Alphabetic_Symbols}
- (= \p{Block=
- Arabic_Mathematical_Alphabetic_Symbols})
- (256)
- X \p{Arabic_Mathematical_Alphabetic_Symbols} \p{Block=
- Arabic_Mathematical_Alphabetic_Symbols}
- (Short: \p{InArabicMath}) (256)
- X \p{Arabic_PF_A} \p{Arabic_Presentation_Forms_A} (=
- \p{Block=Arabic_Presentation_Forms_A})
- (688)
- X \p{Arabic_PF_B} \p{Arabic_Presentation_Forms_B} (=
- \p{Block=Arabic_Presentation_Forms_B})
- (144)
- X \p{Arabic_Presentation_Forms_A} \p{Block=
- Arabic_Presentation_Forms_A} (Short:
- \p{InArabicPFA}) (688)
- X \p{Arabic_Presentation_Forms_B} \p{Block=
- Arabic_Presentation_Forms_B} (Short:
- \p{InArabicPFB}) (144)
- X \p{Arabic_Sup} \p{Arabic_Supplement} (= \p{Block=
- Arabic_Supplement}) (48)
- X \p{Arabic_Supplement} \p{Block=Arabic_Supplement} (Short:
- \p{InArabicSup}) (48)
- \p{Armenian} \p{Script=Armenian} (Short: \p{Armn}; NOT
- \p{Block=Armenian}) (91)
- \p{Armi} \p{Imperial_Aramaic} (= \p{Script=
- Imperial_Aramaic}) (NOT \p{Block=
- Imperial_Aramaic}) (31)
- \p{Armn} \p{Armenian} (= \p{Script=Armenian}) (NOT
- \p{Block=Armenian}) (91)
- X \p{Arrows} \p{Block=Arrows} (112)
- \p{ASCII} \p{Block=Basic_Latin} [[:ASCII:]] (128)
- \p{ASCII_Hex_Digit} \p{PosixXDigit} (= \p{ASCII_Hex_Digit=Y})
- (22)
- \p{ASCII_Hex_Digit: N*} (Short: \p{AHex=N}, \P{AHex}) (1_114_090)
- \p{ASCII_Hex_Digit: Y*} (Short: \p{AHex=Y}, \p{AHex}) (22)
- \p{Assigned} All assigned code points (249_698)
- \p{Avestan} \p{Script=Avestan} (Short: \p{Avst}; NOT
- \p{Block=Avestan}) (61)
- \p{Avst} \p{Avestan} (= \p{Script=Avestan}) (NOT
- \p{Block=Avestan}) (61)
- \p{Bali} \p{Balinese} (= \p{Script=Balinese}) (NOT
- \p{Block=Balinese}) (121)
- \p{Balinese} \p{Script=Balinese} (Short: \p{Bali}; NOT
- \p{Block=Balinese}) (121)
- \p{Bamu} \p{Bamum} (= \p{Script=Bamum}) (NOT
- \p{Block=Bamum}) (657)
- \p{Bamum} \p{Script=Bamum} (Short: \p{Bamu}; NOT
- \p{Block=Bamum}) (657)
- X \p{Bamum_Sup} \p{Bamum_Supplement} (= \p{Block=
- Bamum_Supplement}) (576)
- X \p{Bamum_Supplement} \p{Block=Bamum_Supplement} (Short:
- \p{InBamumSup}) (576)
- X \p{Basic_Latin} \p{ASCII} (= \p{Block=Basic_Latin}) (128)
- \p{Batak} \p{Script=Batak} (Short: \p{Batk}; NOT
- \p{Block=Batak}) (56)
- \p{Batk} \p{Batak} (= \p{Script=Batak}) (NOT
- \p{Block=Batak}) (56)
- \p{Bc: *} \p{Bidi_Class: *}
- \p{Beng} \p{Bengali} (= \p{Script=Bengali}) (NOT
- \p{Block=Bengali}) (92)
- \p{Bengali} \p{Script=Bengali} (Short: \p{Beng}; NOT
- \p{Block=Bengali}) (92)
- \p{Bidi_C} \p{Bidi_Control} (= \p{Bidi_Control=Y}) (7)
- \p{Bidi_C: *} \p{Bidi_Control: *}
- \p{Bidi_Class: AL} \p{Bidi_Class=Arabic_Letter} (1438)
- \p{Bidi_Class: AN} \p{Bidi_Class=Arabic_Number} (49)
- \p{Bidi_Class: Arabic_Letter} (Short: \p{Bc=AL}) (1438)
- \p{Bidi_Class: Arabic_Number} (Short: \p{Bc=AN}) (49)
- \p{Bidi_Class: B} \p{Bidi_Class=Paragraph_Separator} (7)
- \p{Bidi_Class: BN} \p{Bidi_Class=Boundary_Neutral} (4015)
- \p{Bidi_Class: Boundary_Neutral} (Short: \p{Bc=BN}) (4015)
- \p{Bidi_Class: Common_Separator} (Short: \p{Bc=CS}) (15)
- \p{Bidi_Class: CS} \p{Bidi_Class=Common_Separator} (15)
- \p{Bidi_Class: EN} \p{Bidi_Class=European_Number} (131)
- \p{Bidi_Class: ES} \p{Bidi_Class=European_Separator} (12)
- \p{Bidi_Class: ET} \p{Bidi_Class=European_Terminator} (66)
- \p{Bidi_Class: European_Number} (Short: \p{Bc=EN}) (131)
- \p{Bidi_Class: European_Separator} (Short: \p{Bc=ES}) (12)
- \p{Bidi_Class: European_Terminator} (Short: \p{Bc=ET}) (66)
- \p{Bidi_Class: L} \p{Bidi_Class=Left_To_Right} (1_098_530)
- \p{Bidi_Class: Left_To_Right} (Short: \p{Bc=L}) (1_098_530)
- \p{Bidi_Class: Left_To_Right_Embedding} (Short: \p{Bc=LRE}) (1)
- \p{Bidi_Class: Left_To_Right_Override} (Short: \p{Bc=LRO}) (1)
- \p{Bidi_Class: LRE} \p{Bidi_Class=Left_To_Right_Embedding} (1)
- \p{Bidi_Class: LRO} \p{Bidi_Class=Left_To_Right_Override} (1)
- \p{Bidi_Class: Nonspacing_Mark} (Short: \p{Bc=NSM}) (1290)
- \p{Bidi_Class: NSM} \p{Bidi_Class=Nonspacing_Mark} (1290)
- \p{Bidi_Class: ON} \p{Bidi_Class=Other_Neutral} (4447)
- \p{Bidi_Class: Other_Neutral} (Short: \p{Bc=ON}) (4447)
- \p{Bidi_Class: Paragraph_Separator} (Short: \p{Bc=B}) (7)
- \p{Bidi_Class: PDF} \p{Bidi_Class=Pop_Directional_Format} (1)
- \p{Bidi_Class: Pop_Directional_Format} (Short: \p{Bc=PDF}) (1)
- \p{Bidi_Class: R} \p{Bidi_Class=Right_To_Left} (4086)
- \p{Bidi_Class: Right_To_Left} (Short: \p{Bc=R}) (4086)
- \p{Bidi_Class: Right_To_Left_Embedding} (Short: \p{Bc=RLE}) (1)
- \p{Bidi_Class: Right_To_Left_Override} (Short: \p{Bc=RLO}) (1)
- \p{Bidi_Class: RLE} \p{Bidi_Class=Right_To_Left_Embedding} (1)
- \p{Bidi_Class: RLO} \p{Bidi_Class=Right_To_Left_Override} (1)
- \p{Bidi_Class: S} \p{Bidi_Class=Segment_Separator} (3)
- \p{Bidi_Class: Segment_Separator} (Short: \p{Bc=S}) (3)
- \p{Bidi_Class: White_Space} (Short: \p{Bc=WS}) (18)
- \p{Bidi_Class: WS} \p{Bidi_Class=White_Space} (18)
- \p{Bidi_Control} \p{Bidi_Control=Y} (Short: \p{BidiC}) (7)
- \p{Bidi_Control: N*} (Short: \p{BidiC=N}, \P{BidiC}) (1_114_105)
- \p{Bidi_Control: Y*} (Short: \p{BidiC=Y}, \p{BidiC}) (7)
- \p{Bidi_M} \p{Bidi_Mirrored} (= \p{Bidi_Mirrored=Y})
- (545)
- \p{Bidi_M: *} \p{Bidi_Mirrored: *}
- \p{Bidi_Mirrored} \p{Bidi_Mirrored=Y} (Short: \p{BidiM})
- (545)
- \p{Bidi_Mirrored: N*} (Short: \p{BidiM=N}, \P{BidiM}) (1_113_567)
- \p{Bidi_Mirrored: Y*} (Short: \p{BidiM=Y}, \p{BidiM}) (545)
- \p{Blank} \h, Horizontal white space (19)
- \p{Blk: *} \p{Block: *}
- \p{Block: Aegean_Numbers} (Single: \p{InAegeanNumbers}) (64)
- \p{Block: Alchemical} \p{Block=Alchemical_Symbols} (128)
- \p{Block: Alchemical_Symbols} (Short: \p{Blk=Alchemical},
- \p{InAlchemical}) (128)
- \p{Block: Alphabetic_PF} \p{Block=Alphabetic_Presentation_Forms}
- (80)
- \p{Block: Alphabetic_Presentation_Forms} (Short: \p{Blk=
- AlphabeticPF}, \p{InAlphabeticPF}) (80)
- \p{Block: Ancient_Greek_Music} \p{Block=
- Ancient_Greek_Musical_Notation} (80)
- \p{Block: Ancient_Greek_Musical_Notation} (Short: \p{Blk=
- AncientGreekMusic},
- \p{InAncientGreekMusic}) (80)
- \p{Block: Ancient_Greek_Numbers} (Single:
- \p{InAncientGreekNumbers}) (80)
- \p{Block: Ancient_Symbols} (Single: \p{InAncientSymbols}) (64)
- \p{Block: Arabic} (Single: \p{InArabic}; NOT \p{Arabic} NOR
- \p{Is_Arabic}) (256)
- \p{Block: Arabic_Ext_A} \p{Block=Arabic_Extended_A} (96)
- \p{Block: Arabic_Extended_A} (Short: \p{Blk=ArabicExtA},
- \p{InArabicExtA}) (96)
- \p{Block: Arabic_Math} \p{Block=
- Arabic_Mathematical_Alphabetic_Symbols}
- (256)
- \p{Block: Arabic_Mathematical_Alphabetic_Symbols} (Short: \p{Blk=
- ArabicMath}, \p{InArabicMath}) (256)
- \p{Block: Arabic_PF_A} \p{Block=Arabic_Presentation_Forms_A} (688)
- \p{Block: Arabic_PF_B} \p{Block=Arabic_Presentation_Forms_B} (144)
- \p{Block: Arabic_Presentation_Forms_A} (Short: \p{Blk=ArabicPFA},
- \p{InArabicPFA}) (688)
- \p{Block: Arabic_Presentation_Forms_B} (Short: \p{Blk=ArabicPFB},
- \p{InArabicPFB}) (144)
- \p{Block: Arabic_Sup} \p{Block=Arabic_Supplement} (48)
- \p{Block: Arabic_Supplement} (Short: \p{Blk=ArabicSup},
- \p{InArabicSup}) (48)
- \p{Block: Armenian} (Single: \p{InArmenian}; NOT \p{Armenian}
- NOR \p{Is_Armenian}) (96)
- \p{Block: Arrows} (Single: \p{InArrows}) (112)
- \p{Block: ASCII} \p{Block=Basic_Latin} (128)
- \p{Block: Avestan} (Single: \p{InAvestan}; NOT \p{Avestan}
- NOR \p{Is_Avestan}) (64)
- \p{Block: Balinese} (Single: \p{InBalinese}; NOT \p{Balinese}
- NOR \p{Is_Balinese}) (128)
- \p{Block: Bamum} (Single: \p{InBamum}; NOT \p{Bamum} NOR
- \p{Is_Bamum}) (96)
- \p{Block: Bamum_Sup} \p{Block=Bamum_Supplement} (576)
- \p{Block: Bamum_Supplement} (Short: \p{Blk=BamumSup},
- \p{InBamumSup}) (576)
- \p{Block: Basic_Latin} (Short: \p{Blk=ASCII}, \p{ASCII}) (128)
- \p{Block: Batak} (Single: \p{InBatak}; NOT \p{Batak} NOR
- \p{Is_Batak}) (64)
- \p{Block: Bengali} (Single: \p{InBengali}; NOT \p{Bengali}
- NOR \p{Is_Bengali}) (128)
- \p{Block: Block_Elements} (Single: \p{InBlockElements}) (32)
- \p{Block: Bopomofo} (Single: \p{InBopomofo}; NOT \p{Bopomofo}
- NOR \p{Is_Bopomofo}) (48)
- \p{Block: Bopomofo_Ext} \p{Block=Bopomofo_Extended} (32)
- \p{Block: Bopomofo_Extended} (Short: \p{Blk=BopomofoExt},
- \p{InBopomofoExt}) (32)
- \p{Block: Box_Drawing} (Single: \p{InBoxDrawing}) (128)
- \p{Block: Brahmi} (Single: \p{InBrahmi}; NOT \p{Brahmi} NOR
- \p{Is_Brahmi}) (128)
- \p{Block: Braille} \p{Block=Braille_Patterns} (256)
- \p{Block: Braille_Patterns} (Short: \p{Blk=Braille},
- \p{InBraille}) (256)
- \p{Block: Buginese} (Single: \p{InBuginese}; NOT \p{Buginese}
- NOR \p{Is_Buginese}) (32)
- \p{Block: Buhid} (Single: \p{InBuhid}; NOT \p{Buhid} NOR
- \p{Is_Buhid}) (32)
- \p{Block: Byzantine_Music} \p{Block=Byzantine_Musical_Symbols}
- (256)
- \p{Block: Byzantine_Musical_Symbols} (Short: \p{Blk=
- ByzantineMusic}, \p{InByzantineMusic})
- (256)
- \p{Block: Canadian_Syllabics} \p{Block=
- Unified_Canadian_Aboriginal_Syllabics}
- (640)
- \p{Block: Carian} (Single: \p{InCarian}; NOT \p{Carian} NOR
- \p{Is_Carian}) (64)
- \p{Block: Chakma} (Single: \p{InChakma}; NOT \p{Chakma} NOR
- \p{Is_Chakma}) (80)
- \p{Block: Cham} (Single: \p{InCham}; NOT \p{Cham} NOR
- \p{Is_Cham}) (96)
- \p{Block: Cherokee} (Single: \p{InCherokee}; NOT \p{Cherokee}
- NOR \p{Is_Cherokee}) (96)
- \p{Block: CJK} \p{Block=CJK_Unified_Ideographs} (20_992)
- \p{Block: CJK_Compat} \p{Block=CJK_Compatibility} (256)
- \p{Block: CJK_Compat_Forms} \p{Block=CJK_Compatibility_Forms} (32)
- \p{Block: CJK_Compat_Ideographs} \p{Block=
- CJK_Compatibility_Ideographs} (512)
- \p{Block: CJK_Compat_Ideographs_Sup} \p{Block=
- CJK_Compatibility_Ideographs_Supplement}
- (544)
- \p{Block: CJK_Compatibility} (Short: \p{Blk=CJKCompat},
- \p{InCJKCompat}) (256)
- \p{Block: CJK_Compatibility_Forms} (Short: \p{Blk=CJKCompatForms},
- \p{InCJKCompatForms}) (32)
- \p{Block: CJK_Compatibility_Ideographs} (Short: \p{Blk=
- CJKCompatIdeographs},
- \p{InCJKCompatIdeographs}) (512)
- \p{Block: CJK_Compatibility_Ideographs_Supplement} (Short: \p{Blk=
- CJKCompatIdeographsSup},
- \p{InCJKCompatIdeographsSup}) (544)
- \p{Block: CJK_Ext_A} \p{Block=
- CJK_Unified_Ideographs_Extension_A}
- (6592)
- \p{Block: CJK_Ext_B} \p{Block=
- CJK_Unified_Ideographs_Extension_B}
- (42_720)
- \p{Block: CJK_Ext_C} \p{Block=
- CJK_Unified_Ideographs_Extension_C}
- (4160)
- \p{Block: CJK_Ext_D} \p{Block=
- CJK_Unified_Ideographs_Extension_D} (224)
- \p{Block: CJK_Radicals_Sup} \p{Block=CJK_Radicals_Supplement} (128)
- \p{Block: CJK_Radicals_Supplement} (Short: \p{Blk=CJKRadicalsSup},
- \p{InCJKRadicalsSup}) (128)
- \p{Block: CJK_Strokes} (Single: \p{InCJKStrokes}) (48)
- \p{Block: CJK_Symbols} \p{Block=CJK_Symbols_And_Punctuation} (64)
- \p{Block: CJK_Symbols_And_Punctuation} (Short: \p{Blk=CJKSymbols},
- \p{InCJKSymbols}) (64)
- \p{Block: CJK_Unified_Ideographs} (Short: \p{Blk=CJK}, \p{InCJK})
- (20_992)
- \p{Block: CJK_Unified_Ideographs_Extension_A} (Short: \p{Blk=
- CJKExtA}, \p{InCJKExtA}) (6592)
- \p{Block: CJK_Unified_Ideographs_Extension_B} (Short: \p{Blk=
- CJKExtB}, \p{InCJKExtB}) (42_720)
- \p{Block: CJK_Unified_Ideographs_Extension_C} (Short: \p{Blk=
- CJKExtC}, \p{InCJKExtC}) (4160)
- \p{Block: CJK_Unified_Ideographs_Extension_D} (Short: \p{Blk=
- CJKExtD}, \p{InCJKExtD}) (224)
- \p{Block: Combining_Diacritical_Marks} (Short: \p{Blk=
- Diacriticals}, \p{InDiacriticals}) (112)
- \p{Block: Combining_Diacritical_Marks_For_Symbols} (Short: \p{Blk=
- DiacriticalsForSymbols},
- \p{InDiacriticalsForSymbols}) (48)
- \p{Block: Combining_Diacritical_Marks_Supplement} (Short: \p{Blk=
- DiacriticalsSup}, \p{InDiacriticalsSup})
- (64)
- \p{Block: Combining_Half_Marks} (Short: \p{Blk=HalfMarks},
- \p{InHalfMarks}) (16)
- \p{Block: Combining_Marks_For_Symbols} \p{Block=
- Combining_Diacritical_Marks_For_Symbols}
- (48)
- \p{Block: Common_Indic_Number_Forms} (Short: \p{Blk=
- IndicNumberForms},
- \p{InIndicNumberForms}) (16)
- \p{Block: Compat_Jamo} \p{Block=Hangul_Compatibility_Jamo} (96)
- \p{Block: Control_Pictures} (Single: \p{InControlPictures}) (64)
- \p{Block: Coptic} (Single: \p{InCoptic}; NOT \p{Coptic} NOR
- \p{Is_Coptic}) (128)
- \p{Block: Counting_Rod} \p{Block=Counting_Rod_Numerals} (32)
- \p{Block: Counting_Rod_Numerals} (Short: \p{Blk=CountingRod},
- \p{InCountingRod}) (32)
- \p{Block: Cuneiform} (Single: \p{InCuneiform}; NOT
- \p{Cuneiform} NOR \p{Is_Cuneiform})
- (1024)
- \p{Block: Cuneiform_Numbers} \p{Block=
- Cuneiform_Numbers_And_Punctuation} (128)
- \p{Block: Cuneiform_Numbers_And_Punctuation} (Short: \p{Blk=
- CuneiformNumbers},
- \p{InCuneiformNumbers}) (128)
- \p{Block: Currency_Symbols} (Single: \p{InCurrencySymbols}) (48)
- \p{Block: Cypriot_Syllabary} (Single: \p{InCypriotSyllabary}) (64)
- \p{Block: Cyrillic} (Single: \p{InCyrillic}; NOT \p{Cyrillic}
- NOR \p{Is_Cyrillic}) (256)
- \p{Block: Cyrillic_Ext_A} \p{Block=Cyrillic_Extended_A} (32)
- \p{Block: Cyrillic_Ext_B} \p{Block=Cyrillic_Extended_B} (96)
- \p{Block: Cyrillic_Extended_A} (Short: \p{Blk=CyrillicExtA},
- \p{InCyrillicExtA}) (32)
- \p{Block: Cyrillic_Extended_B} (Short: \p{Blk=CyrillicExtB},
- \p{InCyrillicExtB}) (96)
- \p{Block: Cyrillic_Sup} \p{Block=Cyrillic_Supplement} (48)
- \p{Block: Cyrillic_Supplement} (Short: \p{Blk=CyrillicSup},
- \p{InCyrillicSup}) (48)
- \p{Block: Cyrillic_Supplementary} \p{Block=Cyrillic_Supplement}
- (48)
- \p{Block: Deseret} (Single: \p{InDeseret}) (80)
- \p{Block: Devanagari} (Single: \p{InDevanagari}; NOT
- \p{Devanagari} NOR \p{Is_Devanagari})
- (128)
- \p{Block: Devanagari_Ext} \p{Block=Devanagari_Extended} (32)
- \p{Block: Devanagari_Extended} (Short: \p{Blk=DevanagariExt},
- \p{InDevanagariExt}) (32)
- \p{Block: Diacriticals} \p{Block=Combining_Diacritical_Marks} (112)
- \p{Block: Diacriticals_For_Symbols} \p{Block=
- Combining_Diacritical_Marks_For_Symbols}
- (48)
- \p{Block: Diacriticals_Sup} \p{Block=
- Combining_Diacritical_Marks_Supplement}
- (64)
- \p{Block: Dingbats} (Single: \p{InDingbats}) (192)
- \p{Block: Domino} \p{Block=Domino_Tiles} (112)
- \p{Block: Domino_Tiles} (Short: \p{Blk=Domino}, \p{InDomino}) (112)
- \p{Block: Egyptian_Hieroglyphs} (Single:
- \p{InEgyptianHieroglyphs}; NOT
- \p{Egyptian_Hieroglyphs} NOR
- \p{Is_Egyptian_Hieroglyphs}) (1072)
- \p{Block: Emoticons} (Single: \p{InEmoticons}) (80)
- \p{Block: Enclosed_Alphanum} \p{Block=Enclosed_Alphanumerics} (160)
- \p{Block: Enclosed_Alphanum_Sup} \p{Block=
- Enclosed_Alphanumeric_Supplement} (256)
- \p{Block: Enclosed_Alphanumeric_Supplement} (Short: \p{Blk=
- EnclosedAlphanumSup},
- \p{InEnclosedAlphanumSup}) (256)
- \p{Block: Enclosed_Alphanumerics} (Short: \p{Blk=
- EnclosedAlphanum},
- \p{InEnclosedAlphanum}) (160)
- \p{Block: Enclosed_CJK} \p{Block=Enclosed_CJK_Letters_And_Months}
- (256)
- \p{Block: Enclosed_CJK_Letters_And_Months} (Short: \p{Blk=
- EnclosedCJK}, \p{InEnclosedCJK}) (256)
- \p{Block: Enclosed_Ideographic_Sup} \p{Block=
- Enclosed_Ideographic_Supplement} (256)
- \p{Block: Enclosed_Ideographic_Supplement} (Short: \p{Blk=
- EnclosedIdeographicSup},
- \p{InEnclosedIdeographicSup}) (256)
- \p{Block: Ethiopic} (Single: \p{InEthiopic}; NOT \p{Ethiopic}
- NOR \p{Is_Ethiopic}) (384)
- \p{Block: Ethiopic_Ext} \p{Block=Ethiopic_Extended} (96)
- \p{Block: Ethiopic_Ext_A} \p{Block=Ethiopic_Extended_A} (48)
- \p{Block: Ethiopic_Extended} (Short: \p{Blk=EthiopicExt},
- \p{InEthiopicExt}) (96)
- \p{Block: Ethiopic_Extended_A} (Short: \p{Blk=EthiopicExtA},
- \p{InEthiopicExtA}) (48)
- \p{Block: Ethiopic_Sup} \p{Block=Ethiopic_Supplement} (32)
- \p{Block: Ethiopic_Supplement} (Short: \p{Blk=EthiopicSup},
- \p{InEthiopicSup}) (32)
- \p{Block: General_Punctuation} (Short: \p{Blk=Punctuation},
- \p{InPunctuation}; NOT \p{Punct} NOR
- \p{Is_Punctuation}) (112)
- \p{Block: Geometric_Shapes} (Single: \p{InGeometricShapes}) (96)
- \p{Block: Georgian} (Single: \p{InGeorgian}; NOT \p{Georgian}
- NOR \p{Is_Georgian}) (96)
- \p{Block: Georgian_Sup} \p{Block=Georgian_Supplement} (48)
- \p{Block: Georgian_Supplement} (Short: \p{Blk=GeorgianSup},
- \p{InGeorgianSup}) (48)
- \p{Block: Glagolitic} (Single: \p{InGlagolitic}; NOT
- \p{Glagolitic} NOR \p{Is_Glagolitic})
- (96)
- \p{Block: Gothic} (Single: \p{InGothic}; NOT \p{Gothic} NOR
- \p{Is_Gothic}) (32)
- \p{Block: Greek} \p{Block=Greek_And_Coptic} (NOT \p{Greek}
- NOR \p{Is_Greek}) (144)
- \p{Block: Greek_And_Coptic} (Short: \p{Blk=Greek}, \p{InGreek};
- NOT \p{Greek} NOR \p{Is_Greek}) (144)
- \p{Block: Greek_Ext} \p{Block=Greek_Extended} (256)
- \p{Block: Greek_Extended} (Short: \p{Blk=GreekExt},
- \p{InGreekExt}) (256)
- \p{Block: Gujarati} (Single: \p{InGujarati}; NOT \p{Gujarati}
- NOR \p{Is_Gujarati}) (128)
- \p{Block: Gurmukhi} (Single: \p{InGurmukhi}; NOT \p{Gurmukhi}
- NOR \p{Is_Gurmukhi}) (128)
- \p{Block: Half_And_Full_Forms} \p{Block=
- Halfwidth_And_Fullwidth_Forms} (240)
- \p{Block: Half_Marks} \p{Block=Combining_Half_Marks} (16)
- \p{Block: Halfwidth_And_Fullwidth_Forms} (Short: \p{Blk=
- HalfAndFullForms},
- \p{InHalfAndFullForms}) (240)
- \p{Block: Hangul} \p{Block=Hangul_Syllables} (NOT \p{Hangul}
- NOR \p{Is_Hangul}) (11_184)
- \p{Block: Hangul_Compatibility_Jamo} (Short: \p{Blk=CompatJamo},
- \p{InCompatJamo}) (96)
- \p{Block: Hangul_Jamo} (Short: \p{Blk=Jamo}, \p{InJamo}) (256)
- \p{Block: Hangul_Jamo_Extended_A} (Short: \p{Blk=JamoExtA},
- \p{InJamoExtA}) (32)
- \p{Block: Hangul_Jamo_Extended_B} (Short: \p{Blk=JamoExtB},
- \p{InJamoExtB}) (80)
- \p{Block: Hangul_Syllables} (Short: \p{Blk=Hangul}, \p{InHangul};
- NOT \p{Hangul} NOR \p{Is_Hangul})
- (11_184)
- \p{Block: Hanunoo} (Single: \p{InHanunoo}; NOT \p{Hanunoo}
- NOR \p{Is_Hanunoo}) (32)
- \p{Block: Hebrew} (Single: \p{InHebrew}; NOT \p{Hebrew} NOR
- \p{Is_Hebrew}) (112)
- \p{Block: High_Private_Use_Surrogates} (Short: \p{Blk=
- HighPUSurrogates},
- \p{InHighPUSurrogates}) (128)
- \p{Block: High_PU_Surrogates} \p{Block=
- High_Private_Use_Surrogates} (128)
- \p{Block: High_Surrogates} (Single: \p{InHighSurrogates}) (896)
- \p{Block: Hiragana} (Single: \p{InHiragana}; NOT \p{Hiragana}
- NOR \p{Is_Hiragana}) (96)
- \p{Block: IDC} \p{Block=
- Ideographic_Description_Characters} (NOT
- \p{ID_Continue} NOR \p{Is_IDC}) (16)
- \p{Block: Ideographic_Description_Characters} (Short: \p{Blk=IDC},
- \p{InIDC}; NOT \p{ID_Continue} NOR
- \p{Is_IDC}) (16)
- \p{Block: Imperial_Aramaic} (Single: \p{InImperialAramaic}; NOT
- \p{Imperial_Aramaic} NOR
- \p{Is_Imperial_Aramaic}) (32)
- \p{Block: Indic_Number_Forms} \p{Block=Common_Indic_Number_Forms}
- (16)
- \p{Block: Inscriptional_Pahlavi} (Single:
- \p{InInscriptionalPahlavi}; NOT
- \p{Inscriptional_Pahlavi} NOR
- \p{Is_Inscriptional_Pahlavi}) (32)
- \p{Block: Inscriptional_Parthian} (Single:
- \p{InInscriptionalParthian}; NOT
- \p{Inscriptional_Parthian} NOR
- \p{Is_Inscriptional_Parthian}) (32)
- \p{Block: IPA_Ext} \p{Block=IPA_Extensions} (96)
- \p{Block: IPA_Extensions} (Short: \p{Blk=IPAExt}, \p{InIPAExt})
- (96)
- \p{Block: Jamo} \p{Block=Hangul_Jamo} (256)
- \p{Block: Jamo_Ext_A} \p{Block=Hangul_Jamo_Extended_A} (32)
- \p{Block: Jamo_Ext_B} \p{Block=Hangul_Jamo_Extended_B} (80)
- \p{Block: Javanese} (Single: \p{InJavanese}; NOT \p{Javanese}
- NOR \p{Is_Javanese}) (96)
- \p{Block: Kaithi} (Single: \p{InKaithi}; NOT \p{Kaithi} NOR
- \p{Is_Kaithi}) (80)
- \p{Block: Kana_Sup} \p{Block=Kana_Supplement} (256)
- \p{Block: Kana_Supplement} (Short: \p{Blk=KanaSup}, \p{InKanaSup})
- (256)
- \p{Block: Kanbun} (Single: \p{InKanbun}) (16)
- \p{Block: Kangxi} \p{Block=Kangxi_Radicals} (224)
- \p{Block: Kangxi_Radicals} (Short: \p{Blk=Kangxi}, \p{InKangxi})
- (224)
- \p{Block: Kannada} (Single: \p{InKannada}; NOT \p{Kannada}
- NOR \p{Is_Kannada}) (128)
- \p{Block: Katakana} (Single: \p{InKatakana}; NOT \p{Katakana}
- NOR \p{Is_Katakana}) (96)
- \p{Block: Katakana_Ext} \p{Block=Katakana_Phonetic_Extensions} (16)
- \p{Block: Katakana_Phonetic_Extensions} (Short: \p{Blk=
- KatakanaExt}, \p{InKatakanaExt}) (16)
- \p{Block: Kayah_Li} (Single: \p{InKayahLi}) (48)
- \p{Block: Kharoshthi} (Single: \p{InKharoshthi}; NOT
- \p{Kharoshthi} NOR \p{Is_Kharoshthi})
- (96)
- \p{Block: Khmer} (Single: \p{InKhmer}; NOT \p{Khmer} NOR
- \p{Is_Khmer}) (128)
- \p{Block: Khmer_Symbols} (Single: \p{InKhmerSymbols}) (32)
- \p{Block: Lao} (Single: \p{InLao}; NOT \p{Lao} NOR
- \p{Is_Lao}) (128)
- \p{Block: Latin_1} \p{Block=Latin_1_Supplement} (128)
- \p{Block: Latin_1_Sup} \p{Block=Latin_1_Supplement} (128)
- \p{Block: Latin_1_Supplement} (Short: \p{Blk=Latin1},
- \p{InLatin1}) (128)
- \p{Block: Latin_Ext_A} \p{Block=Latin_Extended_A} (128)
- \p{Block: Latin_Ext_Additional} \p{Block=
- Latin_Extended_Additional} (256)
- \p{Block: Latin_Ext_B} \p{Block=Latin_Extended_B} (208)
- \p{Block: Latin_Ext_C} \p{Block=Latin_Extended_C} (32)
- \p{Block: Latin_Ext_D} \p{Block=Latin_Extended_D} (224)
- \p{Block: Latin_Extended_A} (Short: \p{Blk=LatinExtA},
- \p{InLatinExtA}) (128)
- \p{Block: Latin_Extended_Additional} (Short: \p{Blk=
- LatinExtAdditional},
- \p{InLatinExtAdditional}) (256)
- \p{Block: Latin_Extended_B} (Short: \p{Blk=LatinExtB},
- \p{InLatinExtB}) (208)
- \p{Block: Latin_Extended_C} (Short: \p{Blk=LatinExtC},
- \p{InLatinExtC}) (32)
- \p{Block: Latin_Extended_D} (Short: \p{Blk=LatinExtD},
- \p{InLatinExtD}) (224)
- \p{Block: Lepcha} (Single: \p{InLepcha}; NOT \p{Lepcha} NOR
- \p{Is_Lepcha}) (80)
- \p{Block: Letterlike_Symbols} (Single: \p{InLetterlikeSymbols})
- (80)
- \p{Block: Limbu} (Single: \p{InLimbu}; NOT \p{Limbu} NOR
- \p{Is_Limbu}) (80)
- \p{Block: Linear_B_Ideograms} (Single: \p{InLinearBIdeograms})
- (128)
- \p{Block: Linear_B_Syllabary} (Single: \p{InLinearBSyllabary})
- (128)
- \p{Block: Lisu} (Single: \p{InLisu}) (48)
- \p{Block: Low_Surrogates} (Single: \p{InLowSurrogates}) (1024)
- \p{Block: Lycian} (Single: \p{InLycian}; NOT \p{Lycian} NOR
- \p{Is_Lycian}) (32)
- \p{Block: Lydian} (Single: \p{InLydian}; NOT \p{Lydian} NOR
- \p{Is_Lydian}) (32)
- \p{Block: Mahjong} \p{Block=Mahjong_Tiles} (48)
- \p{Block: Mahjong_Tiles} (Short: \p{Blk=Mahjong}, \p{InMahjong})
- (48)
- \p{Block: Malayalam} (Single: \p{InMalayalam}; NOT
- \p{Malayalam} NOR \p{Is_Malayalam}) (128)
- \p{Block: Mandaic} (Single: \p{InMandaic}; NOT \p{Mandaic}
- NOR \p{Is_Mandaic}) (32)
- \p{Block: Math_Alphanum} \p{Block=
- Mathematical_Alphanumeric_Symbols} (1024)
- \p{Block: Math_Operators} \p{Block=Mathematical_Operators} (256)
- \p{Block: Mathematical_Alphanumeric_Symbols} (Short: \p{Blk=
- MathAlphanum}, \p{InMathAlphanum}) (1024)
- \p{Block: Mathematical_Operators} (Short: \p{Blk=MathOperators},
- \p{InMathOperators}) (256)
- \p{Block: Meetei_Mayek} (Single: \p{InMeeteiMayek}; NOT
- \p{Meetei_Mayek} NOR
- \p{Is_Meetei_Mayek}) (64)
- \p{Block: Meetei_Mayek_Ext} \p{Block=Meetei_Mayek_Extensions} (32)
- \p{Block: Meetei_Mayek_Extensions} (Short: \p{Blk=MeeteiMayekExt},
- \p{InMeeteiMayekExt}) (32)
- \p{Block: Meroitic_Cursive} (Single: \p{InMeroiticCursive}; NOT
- \p{Meroitic_Cursive} NOR
- \p{Is_Meroitic_Cursive}) (96)
- \p{Block: Meroitic_Hieroglyphs} (Single:
- \p{InMeroiticHieroglyphs}) (32)
- \p{Block: Miao} (Single: \p{InMiao}; NOT \p{Miao} NOR
- \p{Is_Miao}) (160)
- \p{Block: Misc_Arrows} \p{Block=Miscellaneous_Symbols_And_Arrows}
- (256)
- \p{Block: Misc_Math_Symbols_A} \p{Block=
- Miscellaneous_Mathematical_Symbols_A}
- (48)
- \p{Block: Misc_Math_Symbols_B} \p{Block=
- Miscellaneous_Mathematical_Symbols_B}
- (128)
- \p{Block: Misc_Pictographs} \p{Block=
- Miscellaneous_Symbols_And_Pictographs}
- (768)
- \p{Block: Misc_Symbols} \p{Block=Miscellaneous_Symbols} (256)
- \p{Block: Misc_Technical} \p{Block=Miscellaneous_Technical} (256)
- \p{Block: Miscellaneous_Mathematical_Symbols_A} (Short: \p{Blk=
- MiscMathSymbolsA},
- \p{InMiscMathSymbolsA}) (48)
- \p{Block: Miscellaneous_Mathematical_Symbols_B} (Short: \p{Blk=
- MiscMathSymbolsB},
- \p{InMiscMathSymbolsB}) (128)
- \p{Block: Miscellaneous_Symbols} (Short: \p{Blk=MiscSymbols},
- \p{InMiscSymbols}) (256)
- \p{Block: Miscellaneous_Symbols_And_Arrows} (Short: \p{Blk=
- MiscArrows}, \p{InMiscArrows}) (256)
- \p{Block: Miscellaneous_Symbols_And_Pictographs} (Short: \p{Blk=
- MiscPictographs}, \p{InMiscPictographs})
- (768)
- \p{Block: Miscellaneous_Technical} (Short: \p{Blk=MiscTechnical},
- \p{InMiscTechnical}) (256)
- \p{Block: Modifier_Letters} \p{Block=Spacing_Modifier_Letters} (80)
- \p{Block: Modifier_Tone_Letters} (Single:
- \p{InModifierToneLetters}) (32)
- \p{Block: Mongolian} (Single: \p{InMongolian}; NOT
- \p{Mongolian} NOR \p{Is_Mongolian}) (176)
- \p{Block: Music} \p{Block=Musical_Symbols} (256)
- \p{Block: Musical_Symbols} (Short: \p{Blk=Music}, \p{InMusic})
- (256)
- \p{Block: Myanmar} (Single: \p{InMyanmar}; NOT \p{Myanmar}
- NOR \p{Is_Myanmar}) (160)
- \p{Block: Myanmar_Ext_A} \p{Block=Myanmar_Extended_A} (32)
- \p{Block: Myanmar_Extended_A} (Short: \p{Blk=MyanmarExtA},
- \p{InMyanmarExtA}) (32)
- \p{Block: NB} \p{Block=No_Block} (860_672)
- \p{Block: New_Tai_Lue} (Single: \p{InNewTaiLue}; NOT
- \p{New_Tai_Lue} NOR \p{Is_New_Tai_Lue})
- (96)
- \p{Block: NKo} (Single: \p{InNKo}; NOT \p{Nko} NOR
- \p{Is_NKo}) (64)
- \p{Block: No_Block} (Short: \p{Blk=NB}, \p{InNB}) (860_672)
- \p{Block: Number_Forms} (Single: \p{InNumberForms}) (64)
- \p{Block: OCR} \p{Block=Optical_Character_Recognition}
- (32)
- \p{Block: Ogham} (Single: \p{InOgham}; NOT \p{Ogham} NOR
- \p{Is_Ogham}) (32)
- \p{Block: Ol_Chiki} (Single: \p{InOlChiki}) (48)
- \p{Block: Old_Italic} (Single: \p{InOldItalic}; NOT
- \p{Old_Italic} NOR \p{Is_Old_Italic})
- (48)
- \p{Block: Old_Persian} (Single: \p{InOldPersian}; NOT
- \p{Old_Persian} NOR \p{Is_Old_Persian})
- (64)
- \p{Block: Old_South_Arabian} (Single: \p{InOldSouthArabian}) (32)
- \p{Block: Old_Turkic} (Single: \p{InOldTurkic}; NOT
- \p{Old_Turkic} NOR \p{Is_Old_Turkic})
- (80)
- \p{Block: Optical_Character_Recognition} (Short: \p{Blk=OCR},
- \p{InOCR}) (32)
- \p{Block: Oriya} (Single: \p{InOriya}; NOT \p{Oriya} NOR
- \p{Is_Oriya}) (128)
- \p{Block: Osmanya} (Single: \p{InOsmanya}; NOT \p{Osmanya}
- NOR \p{Is_Osmanya}) (48)
- \p{Block: Phags_Pa} (Single: \p{InPhagsPa}; NOT \p{Phags_Pa}
- NOR \p{Is_Phags_Pa}) (64)
- \p{Block: Phaistos} \p{Block=Phaistos_Disc} (48)
- \p{Block: Phaistos_Disc} (Short: \p{Blk=Phaistos}, \p{InPhaistos})
- (48)
- \p{Block: Phoenician} (Single: \p{InPhoenician}; NOT
- \p{Phoenician} NOR \p{Is_Phoenician})
- (32)
- \p{Block: Phonetic_Ext} \p{Block=Phonetic_Extensions} (128)
- \p{Block: Phonetic_Ext_Sup} \p{Block=
- Phonetic_Extensions_Supplement} (64)
- \p{Block: Phonetic_Extensions} (Short: \p{Blk=PhoneticExt},
- \p{InPhoneticExt}) (128)
- \p{Block: Phonetic_Extensions_Supplement} (Short: \p{Blk=
- PhoneticExtSup}, \p{InPhoneticExtSup})
- (64)
- \p{Block: Playing_Cards} (Single: \p{InPlayingCards}) (96)
- \p{Block: Private_Use} \p{Block=Private_Use_Area} (NOT
- \p{Private_Use} NOR \p{Is_Private_Use})
- (6400)
- \p{Block: Private_Use_Area} (Short: \p{Blk=PUA}, \p{InPUA}; NOT
- \p{Private_Use} NOR \p{Is_Private_Use})
- (6400)
- \p{Block: PUA} \p{Block=Private_Use_Area} (NOT
- \p{Private_Use} NOR \p{Is_Private_Use})
- (6400)
- \p{Block: Punctuation} \p{Block=General_Punctuation} (NOT
- \p{Punct} NOR \p{Is_Punctuation}) (112)
- \p{Block: Rejang} (Single: \p{InRejang}; NOT \p{Rejang} NOR
- \p{Is_Rejang}) (48)
- \p{Block: Rumi} \p{Block=Rumi_Numeral_Symbols} (32)
- \p{Block: Rumi_Numeral_Symbols} (Short: \p{Blk=Rumi}, \p{InRumi})
- (32)
- \p{Block: Runic} (Single: \p{InRunic}; NOT \p{Runic} NOR
- \p{Is_Runic}) (96)
- \p{Block: Samaritan} (Single: \p{InSamaritan}; NOT
- \p{Samaritan} NOR \p{Is_Samaritan}) (64)
- \p{Block: Saurashtra} (Single: \p{InSaurashtra}; NOT
- \p{Saurashtra} NOR \p{Is_Saurashtra})
- (96)
- \p{Block: Sharada} (Single: \p{InSharada}; NOT \p{Sharada}
- NOR \p{Is_Sharada}) (96)
- \p{Block: Shavian} (Single: \p{InShavian}) (48)
- \p{Block: Sinhala} (Single: \p{InSinhala}; NOT \p{Sinhala}
- NOR \p{Is_Sinhala}) (128)
- \p{Block: Small_Form_Variants} (Short: \p{Blk=SmallForms},
- \p{InSmallForms}) (32)
- \p{Block: Small_Forms} \p{Block=Small_Form_Variants} (32)
- \p{Block: Sora_Sompeng} (Single: \p{InSoraSompeng}; NOT
- \p{Sora_Sompeng} NOR
- \p{Is_Sora_Sompeng}) (48)
- \p{Block: Spacing_Modifier_Letters} (Short: \p{Blk=
- ModifierLetters}, \p{InModifierLetters})
- (80)
- \p{Block: Specials} (Single: \p{InSpecials}) (16)
- \p{Block: Sundanese} (Single: \p{InSundanese}; NOT
- \p{Sundanese} NOR \p{Is_Sundanese}) (64)
- \p{Block: Sundanese_Sup} \p{Block=Sundanese_Supplement} (16)
- \p{Block: Sundanese_Supplement} (Short: \p{Blk=SundaneseSup},
- \p{InSundaneseSup}) (16)
- \p{Block: Sup_Arrows_A} \p{Block=Supplemental_Arrows_A} (16)
- \p{Block: Sup_Arrows_B} \p{Block=Supplemental_Arrows_B} (128)
- \p{Block: Sup_Math_Operators} \p{Block=
- Supplemental_Mathematical_Operators}
- (256)
- \p{Block: Sup_PUA_A} \p{Block=Supplementary_Private_Use_Area_A}
- (65_536)
- \p{Block: Sup_PUA_B} \p{Block=Supplementary_Private_Use_Area_B}
- (65_536)
- \p{Block: Sup_Punctuation} \p{Block=Supplemental_Punctuation} (128)
- \p{Block: Super_And_Sub} \p{Block=Superscripts_And_Subscripts} (48)
- \p{Block: Superscripts_And_Subscripts} (Short: \p{Blk=
- SuperAndSub}, \p{InSuperAndSub}) (48)
- \p{Block: Supplemental_Arrows_A} (Short: \p{Blk=SupArrowsA},
- \p{InSupArrowsA}) (16)
- \p{Block: Supplemental_Arrows_B} (Short: \p{Blk=SupArrowsB},
- \p{InSupArrowsB}) (128)
- \p{Block: Supplemental_Mathematical_Operators} (Short: \p{Blk=
- SupMathOperators},
- \p{InSupMathOperators}) (256)
- \p{Block: Supplemental_Punctuation} (Short: \p{Blk=
- SupPunctuation}, \p{InSupPunctuation})
- (128)
- \p{Block: Supplementary_Private_Use_Area_A} (Short: \p{Blk=
- SupPUAA}, \p{InSupPUAA}) (65_536)
- \p{Block: Supplementary_Private_Use_Area_B} (Short: \p{Blk=
- SupPUAB}, \p{InSupPUAB}) (65_536)
- \p{Block: Syloti_Nagri} (Single: \p{InSylotiNagri}; NOT
- \p{Syloti_Nagri} NOR
- \p{Is_Syloti_Nagri}) (48)
- \p{Block: Syriac} (Single: \p{InSyriac}; NOT \p{Syriac} NOR
- \p{Is_Syriac}) (80)
- \p{Block: Tagalog} (Single: \p{InTagalog}; NOT \p{Tagalog}
- NOR \p{Is_Tagalog}) (32)
- \p{Block: Tagbanwa} (Single: \p{InTagbanwa}; NOT \p{Tagbanwa}
- NOR \p{Is_Tagbanwa}) (32)
- \p{Block: Tags} (Single: \p{InTags}) (128)
- \p{Block: Tai_Le} (Single: \p{InTaiLe}; NOT \p{Tai_Le} NOR
- \p{Is_Tai_Le}) (48)
- \p{Block: Tai_Tham} (Single: \p{InTaiTham}; NOT \p{Tai_Tham}
- NOR \p{Is_Tai_Tham}) (144)
- \p{Block: Tai_Viet} (Single: \p{InTaiViet}; NOT \p{Tai_Viet}
- NOR \p{Is_Tai_Viet}) (96)
- \p{Block: Tai_Xuan_Jing} \p{Block=Tai_Xuan_Jing_Symbols} (96)
- \p{Block: Tai_Xuan_Jing_Symbols} (Short: \p{Blk=TaiXuanJing},
- \p{InTaiXuanJing}) (96)
- \p{Block: Takri} (Single: \p{InTakri}; NOT \p{Takri} NOR
- \p{Is_Takri}) (80)
- \p{Block: Tamil} (Single: \p{InTamil}; NOT \p{Tamil} NOR
- \p{Is_Tamil}) (128)
- \p{Block: Telugu} (Single: \p{InTelugu}; NOT \p{Telugu} NOR
- \p{Is_Telugu}) (128)
- \p{Block: Thaana} (Single: \p{InThaana}; NOT \p{Thaana} NOR
- \p{Is_Thaana}) (64)
- \p{Block: Thai} (Single: \p{InThai}; NOT \p{Thai} NOR
- \p{Is_Thai}) (128)
- \p{Block: Tibetan} (Single: \p{InTibetan}; NOT \p{Tibetan}
- NOR \p{Is_Tibetan}) (256)
- \p{Block: Tifinagh} (Single: \p{InTifinagh}; NOT \p{Tifinagh}
- NOR \p{Is_Tifinagh}) (80)
- \p{Block: Transport_And_Map} \p{Block=Transport_And_Map_Symbols}
- (128)
- \p{Block: Transport_And_Map_Symbols} (Short: \p{Blk=
- TransportAndMap}, \p{InTransportAndMap})
- (128)
- \p{Block: UCAS} \p{Block=
- Unified_Canadian_Aboriginal_Syllabics}
- (640)
- \p{Block: UCAS_Ext} \p{Block=
- Unified_Canadian_Aboriginal_Syllabics_-
- Extended} (80)
- \p{Block: Ugaritic} (Single: \p{InUgaritic}; NOT \p{Ugaritic}
- NOR \p{Is_Ugaritic}) (32)
- \p{Block: Unified_Canadian_Aboriginal_Syllabics} (Short: \p{Blk=
- UCAS}, \p{InUCAS}) (640)
- \p{Block: Unified_Canadian_Aboriginal_Syllabics_Extended} (Short:
- \p{Blk=UCASExt}, \p{InUCASExt}) (80)
- \p{Block: Vai} (Single: \p{InVai}; NOT \p{Vai} NOR
- \p{Is_Vai}) (320)
- \p{Block: Variation_Selectors} (Short: \p{Blk=VS}, \p{InVS}; NOT
- \p{Variation_Selector} NOR \p{Is_VS})
- (16)
- \p{Block: Variation_Selectors_Supplement} (Short: \p{Blk=VSSup},
- \p{InVSSup}) (240)
- \p{Block: Vedic_Ext} \p{Block=Vedic_Extensions} (48)
- \p{Block: Vedic_Extensions} (Short: \p{Blk=VedicExt},
- \p{InVedicExt}) (48)
- \p{Block: Vertical_Forms} (Single: \p{InVerticalForms}) (16)
- \p{Block: VS} \p{Block=Variation_Selectors} (NOT
- \p{Variation_Selector} NOR \p{Is_VS})
- (16)
- \p{Block: VS_Sup} \p{Block=Variation_Selectors_Supplement}
- (240)
- \p{Block: Yi_Radicals} (Single: \p{InYiRadicals}) (64)
- \p{Block: Yi_Syllables} (Single: \p{InYiSyllables}) (1168)
- \p{Block: Yijing} \p{Block=Yijing_Hexagram_Symbols} (64)
- \p{Block: Yijing_Hexagram_Symbols} (Short: \p{Blk=Yijing},
- \p{InYijing}) (64)
- X \p{Block_Elements} \p{Block=Block_Elements} (32)
- \p{Bopo} \p{Bopomofo} (= \p{Script=Bopomofo}) (NOT
- \p{Block=Bopomofo}) (70)
- \p{Bopomofo} \p{Script=Bopomofo} (Short: \p{Bopo}; NOT
- \p{Block=Bopomofo}) (70)
- X \p{Bopomofo_Ext} \p{Bopomofo_Extended} (= \p{Block=
- Bopomofo_Extended}) (32)
- X \p{Bopomofo_Extended} \p{Block=Bopomofo_Extended} (Short:
- \p{InBopomofoExt}) (32)
- X \p{Box_Drawing} \p{Block=Box_Drawing} (128)
- \p{Brah} \p{Brahmi} (= \p{Script=Brahmi}) (NOT
- \p{Block=Brahmi}) (108)
- \p{Brahmi} \p{Script=Brahmi} (Short: \p{Brah}; NOT
- \p{Block=Brahmi}) (108)
- \p{Brai} \p{Braille} (= \p{Script=Braille}) (256)
- \p{Braille} \p{Script=Braille} (Short: \p{Brai}) (256)
- X \p{Braille_Patterns} \p{Block=Braille_Patterns} (Short:
- \p{InBraille}) (256)
- \p{Bugi} \p{Buginese} (= \p{Script=Buginese}) (NOT
- \p{Block=Buginese}) (30)
- \p{Buginese} \p{Script=Buginese} (Short: \p{Bugi}; NOT
- \p{Block=Buginese}) (30)
- \p{Buhd} \p{Buhid} (= \p{Script=Buhid}) (NOT
- \p{Block=Buhid}) (20)
- \p{Buhid} \p{Script=Buhid} (Short: \p{Buhd}; NOT
- \p{Block=Buhid}) (20)
- X \p{Byzantine_Music} \p{Byzantine_Musical_Symbols} (= \p{Block=
- Byzantine_Musical_Symbols}) (256)
- X \p{Byzantine_Musical_Symbols} \p{Block=Byzantine_Musical_Symbols}
- (Short: \p{InByzantineMusic}) (256)
- \p{C} \p{Other} (= \p{General_Category=Other})
- (1_004_134)
- \p{Cakm} \p{Chakma} (= \p{Script=Chakma}) (NOT
- \p{Block=Chakma}) (67)
- \p{Canadian_Aboriginal} \p{Script=Canadian_Aboriginal} (Short:
- \p{Cans}) (710)
- X \p{Canadian_Syllabics} \p{Unified_Canadian_Aboriginal_Syllabics}
- (= \p{Block=
- Unified_Canadian_Aboriginal_Syllabics})
- (640)
- T \p{Canonical_Combining_Class: 0} \p{Canonical_Combining_Class=
- Not_Reordered} (1_113_459)
- T \p{Canonical_Combining_Class: 1} \p{Canonical_Combining_Class=
- Overlay} (26)
- T \p{Canonical_Combining_Class: 7} \p{Canonical_Combining_Class=
- Nukta} (13)
- T \p{Canonical_Combining_Class: 8} \p{Canonical_Combining_Class=
- Kana_Voicing} (2)
- T \p{Canonical_Combining_Class: 9} \p{Canonical_Combining_Class=
- Virama} (37)
- T \p{Canonical_Combining_Class: 10} \p{Canonical_Combining_Class=
- CCC10} (1)
- T \p{Canonical_Combining_Class: 11} \p{Canonical_Combining_Class=
- CCC11} (1)
- T \p{Canonical_Combining_Class: 12} \p{Canonical_Combining_Class=
- CCC12} (1)
- T \p{Canonical_Combining_Class: 13} \p{Canonical_Combining_Class=
- CCC13} (1)
- T \p{Canonical_Combining_Class: 14} \p{Canonical_Combining_Class=
- CCC14} (1)
- T \p{Canonical_Combining_Class: 15} \p{Canonical_Combining_Class=
- CCC15} (1)
- T \p{Canonical_Combining_Class: 16} \p{Canonical_Combining_Class=
- CCC16} (1)
- T \p{Canonical_Combining_Class: 17} \p{Canonical_Combining_Class=
- CCC17} (1)
- T \p{Canonical_Combining_Class: 18} \p{Canonical_Combining_Class=
- CCC18} (2)
- T \p{Canonical_Combining_Class: 19} \p{Canonical_Combining_Class=
- CCC19} (2)
- T \p{Canonical_Combining_Class: 20} \p{Canonical_Combining_Class=
- CCC20} (1)
- T \p{Canonical_Combining_Class: 21} \p{Canonical_Combining_Class=
- CCC21} (1)
- T \p{Canonical_Combining_Class: 22} \p{Canonical_Combining_Class=
- CCC22} (1)
- T \p{Canonical_Combining_Class: 23} \p{Canonical_Combining_Class=
- CCC23} (1)
- T \p{Canonical_Combining_Class: 24} \p{Canonical_Combining_Class=
- CCC24} (1)
- T \p{Canonical_Combining_Class: 25} \p{Canonical_Combining_Class=
- CCC25} (1)
- T \p{Canonical_Combining_Class: 26} \p{Canonical_Combining_Class=
- CCC26} (1)
- T \p{Canonical_Combining_Class: 27} \p{Canonical_Combining_Class=
- CCC27} (2)
- T \p{Canonical_Combining_Class: 28} \p{Canonical_Combining_Class=
- CCC28} (2)
- T \p{Canonical_Combining_Class: 29} \p{Canonical_Combining_Class=
- CCC29} (2)
- T \p{Canonical_Combining_Class: 30} \p{Canonical_Combining_Class=
- CCC30} (2)
- T \p{Canonical_Combining_Class: 31} \p{Canonical_Combining_Class=
- CCC31} (2)
- T \p{Canonical_Combining_Class: 32} \p{Canonical_Combining_Class=
- CCC32} (2)
- T \p{Canonical_Combining_Class: 33} \p{Canonical_Combining_Class=
- CCC33} (1)
- T \p{Canonical_Combining_Class: 34} \p{Canonical_Combining_Class=
- CCC34} (1)
- T \p{Canonical_Combining_Class: 35} \p{Canonical_Combining_Class=
- CCC35} (1)
- T \p{Canonical_Combining_Class: 36} \p{Canonical_Combining_Class=
- CCC36} (1)
- T \p{Canonical_Combining_Class: 84} \p{Canonical_Combining_Class=
- CCC84} (1)
- T \p{Canonical_Combining_Class: 91} \p{Canonical_Combining_Class=
- CCC91} (1)
- T \p{Canonical_Combining_Class: 103} \p{Canonical_Combining_Class=
- CCC103} (2)
- T \p{Canonical_Combining_Class: 107} \p{Canonical_Combining_Class=
- CCC107} (4)
- T \p{Canonical_Combining_Class: 118} \p{Canonical_Combining_Class=
- CCC118} (2)
- T \p{Canonical_Combining_Class: 122} \p{Canonical_Combining_Class=
- CCC122} (4)
- T \p{Canonical_Combining_Class: 129} \p{Canonical_Combining_Class=
- CCC129} (1)
- T \p{Canonical_Combining_Class: 130} \p{Canonical_Combining_Class=
- CCC130} (6)
- T \p{Canonical_Combining_Class: 132} \p{Canonical_Combining_Class=
- CCC132} (1)
- T \p{Canonical_Combining_Class: 133} \p{Canonical_Combining_Class=
- CCC133} (0)
- T \p{Canonical_Combining_Class: 200} \p{Canonical_Combining_Class=
- Attached_Below_Left} (0)
- T \p{Canonical_Combining_Class: 202} \p{Canonical_Combining_Class=
- Attached_Below} (5)
- T \p{Canonical_Combining_Class: 214} \p{Canonical_Combining_Class=
- Attached_Above} (1)
- T \p{Canonical_Combining_Class: 216} \p{Canonical_Combining_Class=
- Attached_Above_Right} (9)
- T \p{Canonical_Combining_Class: 218} \p{Canonical_Combining_Class=
- Below_Left} (1)
- T \p{Canonical_Combining_Class: 220} \p{Canonical_Combining_Class=
- Below} (129)
- T \p{Canonical_Combining_Class: 222} \p{Canonical_Combining_Class=
- Below_Right} (4)
- T \p{Canonical_Combining_Class: 224} \p{Canonical_Combining_Class=
- Left} (2)
- T \p{Canonical_Combining_Class: 226} \p{Canonical_Combining_Class=
- Right} (1)
- T \p{Canonical_Combining_Class: 228} \p{Canonical_Combining_Class=
- Above_Left} (3)
- T \p{Canonical_Combining_Class: 230} \p{Canonical_Combining_Class=
- Above} (349)
- T \p{Canonical_Combining_Class: 232} \p{Canonical_Combining_Class=
- Above_Right} (4)
- T \p{Canonical_Combining_Class: 233} \p{Canonical_Combining_Class=
- Double_Below} (4)
- T \p{Canonical_Combining_Class: 234} \p{Canonical_Combining_Class=
- Double_Above} (5)
- T \p{Canonical_Combining_Class: 240} \p{Canonical_Combining_Class=
- Iota_Subscript} (1)
- \p{Canonical_Combining_Class: A} \p{Canonical_Combining_Class=
- Above} (349)
- \p{Canonical_Combining_Class: Above} (Short: \p{Ccc=A}) (349)
- \p{Canonical_Combining_Class: Above_Left} (Short: \p{Ccc=AL}) (3)
- \p{Canonical_Combining_Class: Above_Right} (Short: \p{Ccc=AR}) (4)
- \p{Canonical_Combining_Class: AL} \p{Canonical_Combining_Class=
- Above_Left} (3)
- \p{Canonical_Combining_Class: AR} \p{Canonical_Combining_Class=
- Above_Right} (4)
- \p{Canonical_Combining_Class: ATA} \p{Canonical_Combining_Class=
- Attached_Above} (1)
- \p{Canonical_Combining_Class: ATAR} \p{Canonical_Combining_Class=
- Attached_Above_Right} (9)
- \p{Canonical_Combining_Class: ATB} \p{Canonical_Combining_Class=
- Attached_Below} (5)
- \p{Canonical_Combining_Class: ATBL} \p{Canonical_Combining_Class=
- Attached_Below_Left} (0)
- \p{Canonical_Combining_Class: Attached_Above} (Short: \p{Ccc=ATA})
- (1)
- \p{Canonical_Combining_Class: Attached_Above_Right} (Short:
- \p{Ccc=ATAR}) (9)
- \p{Canonical_Combining_Class: Attached_Below} (Short: \p{Ccc=ATB})
- (5)
- \p{Canonical_Combining_Class: Attached_Below_Left} (Short: \p{Ccc=
- ATBL}) (0)
- \p{Canonical_Combining_Class: B} \p{Canonical_Combining_Class=
- Below} (129)
- \p{Canonical_Combining_Class: Below} (Short: \p{Ccc=B}) (129)
- \p{Canonical_Combining_Class: Below_Left} (Short: \p{Ccc=BL}) (1)
- \p{Canonical_Combining_Class: Below_Right} (Short: \p{Ccc=BR}) (4)
- \p{Canonical_Combining_Class: BL} \p{Canonical_Combining_Class=
- Below_Left} (1)
- \p{Canonical_Combining_Class: BR} \p{Canonical_Combining_Class=
- Below_Right} (4)
- \p{Canonical_Combining_Class: CCC10} (Short: \p{Ccc=CCC10}) (1)
- \p{Canonical_Combining_Class: CCC103} (Short: \p{Ccc=CCC103}) (2)
- \p{Canonical_Combining_Class: CCC107} (Short: \p{Ccc=CCC107}) (4)
- \p{Canonical_Combining_Class: CCC11} (Short: \p{Ccc=CCC11}) (1)
- \p{Canonical_Combining_Class: CCC118} (Short: \p{Ccc=CCC118}) (2)
- \p{Canonical_Combining_Class: CCC12} (Short: \p{Ccc=CCC12}) (1)
- \p{Canonical_Combining_Class: CCC122} (Short: \p{Ccc=CCC122}) (4)
- \p{Canonical_Combining_Class: CCC129} (Short: \p{Ccc=CCC129}) (1)
- \p{Canonical_Combining_Class: CCC13} (Short: \p{Ccc=CCC13}) (1)
- \p{Canonical_Combining_Class: CCC130} (Short: \p{Ccc=CCC130}) (6)
- \p{Canonical_Combining_Class: CCC132} (Short: \p{Ccc=CCC132}) (1)
- \p{Canonical_Combining_Class: CCC133} (Short: \p{Ccc=CCC133}) (0)
- \p{Canonical_Combining_Class: CCC14} (Short: \p{Ccc=CCC14}) (1)
- \p{Canonical_Combining_Class: CCC15} (Short: \p{Ccc=CCC15}) (1)
- \p{Canonical_Combining_Class: CCC16} (Short: \p{Ccc=CCC16}) (1)
- \p{Canonical_Combining_Class: CCC17} (Short: \p{Ccc=CCC17}) (1)
- \p{Canonical_Combining_Class: CCC18} (Short: \p{Ccc=CCC18}) (2)
- \p{Canonical_Combining_Class: CCC19} (Short: \p{Ccc=CCC19}) (2)
- \p{Canonical_Combining_Class: CCC20} (Short: \p{Ccc=CCC20}) (1)
- \p{Canonical_Combining_Class: CCC21} (Short: \p{Ccc=CCC21}) (1)
- \p{Canonical_Combining_Class: CCC22} (Short: \p{Ccc=CCC22}) (1)
- \p{Canonical_Combining_Class: CCC23} (Short: \p{Ccc=CCC23}) (1)
- \p{Canonical_Combining_Class: CCC24} (Short: \p{Ccc=CCC24}) (1)
- \p{Canonical_Combining_Class: CCC25} (Short: \p{Ccc=CCC25}) (1)
- \p{Canonical_Combining_Class: CCC26} (Short: \p{Ccc=CCC26}) (1)
- \p{Canonical_Combining_Class: CCC27} (Short: \p{Ccc=CCC27}) (2)
- \p{Canonical_Combining_Class: CCC28} (Short: \p{Ccc=CCC28}) (2)
- \p{Canonical_Combining_Class: CCC29} (Short: \p{Ccc=CCC29}) (2)
- \p{Canonical_Combining_Class: CCC30} (Short: \p{Ccc=CCC30}) (2)
- \p{Canonical_Combining_Class: CCC31} (Short: \p{Ccc=CCC31}) (2)
- \p{Canonical_Combining_Class: CCC32} (Short: \p{Ccc=CCC32}) (2)
- \p{Canonical_Combining_Class: CCC33} (Short: \p{Ccc=CCC33}) (1)
- \p{Canonical_Combining_Class: CCC34} (Short: \p{Ccc=CCC34}) (1)
- \p{Canonical_Combining_Class: CCC35} (Short: \p{Ccc=CCC35}) (1)
- \p{Canonical_Combining_Class: CCC36} (Short: \p{Ccc=CCC36}) (1)
- \p{Canonical_Combining_Class: CCC84} (Short: \p{Ccc=CCC84}) (1)
- \p{Canonical_Combining_Class: CCC91} (Short: \p{Ccc=CCC91}) (1)
- \p{Canonical_Combining_Class: DA} \p{Canonical_Combining_Class=
- Double_Above} (5)
- \p{Canonical_Combining_Class: DB} \p{Canonical_Combining_Class=
- Double_Below} (4)
- \p{Canonical_Combining_Class: Double_Above} (Short: \p{Ccc=DA}) (5)
- \p{Canonical_Combining_Class: Double_Below} (Short: \p{Ccc=DB}) (4)
- \p{Canonical_Combining_Class: Iota_Subscript} (Short: \p{Ccc=IS})
- (1)
- \p{Canonical_Combining_Class: IS} \p{Canonical_Combining_Class=
- Iota_Subscript} (1)
- \p{Canonical_Combining_Class: Kana_Voicing} (Short: \p{Ccc=KV}) (2)
- \p{Canonical_Combining_Class: KV} \p{Canonical_Combining_Class=
- Kana_Voicing} (2)
- \p{Canonical_Combining_Class: L} \p{Canonical_Combining_Class=
- Left} (2)
- \p{Canonical_Combining_Class: Left} (Short: \p{Ccc=L}) (2)
- \p{Canonical_Combining_Class: NK} \p{Canonical_Combining_Class=
- Nukta} (13)
- \p{Canonical_Combining_Class: Not_Reordered} (Short: \p{Ccc=NR})
- (1_113_459)
- \p{Canonical_Combining_Class: NR} \p{Canonical_Combining_Class=
- Not_Reordered} (1_113_459)
- \p{Canonical_Combining_Class: Nukta} (Short: \p{Ccc=NK}) (13)
- \p{Canonical_Combining_Class: OV} \p{Canonical_Combining_Class=
- Overlay} (26)
- \p{Canonical_Combining_Class: Overlay} (Short: \p{Ccc=OV}) (26)
- \p{Canonical_Combining_Class: R} \p{Canonical_Combining_Class=
- Right} (1)
- \p{Canonical_Combining_Class: Right} (Short: \p{Ccc=R}) (1)
- \p{Canonical_Combining_Class: Virama} (Short: \p{Ccc=VR}) (37)
- \p{Canonical_Combining_Class: VR} \p{Canonical_Combining_Class=
- Virama} (37)
- \p{Cans} \p{Canadian_Aboriginal} (= \p{Script=
- Canadian_Aboriginal}) (710)
- \p{Cari} \p{Carian} (= \p{Script=Carian}) (NOT
- \p{Block=Carian}) (49)
- \p{Carian} \p{Script=Carian} (Short: \p{Cari}; NOT
- \p{Block=Carian}) (49)
- \p{Case_Ignorable} \p{Case_Ignorable=Y} (Short: \p{CI}) (1799)
- \p{Case_Ignorable: N*} (Short: \p{CI=N}, \P{CI}) (1_112_313)
- \p{Case_Ignorable: Y*} (Short: \p{CI=Y}, \p{CI}) (1799)
- \p{Cased} \p{Cased=Y} (3448)
- \p{Cased: N*} (Single: \P{Cased}) (1_110_664)
- \p{Cased: Y*} (Single: \p{Cased}) (3448)
- \p{Cased_Letter} \p{General_Category=Cased_Letter} (Short:
- \p{LC}) (3223)
- \p{Category: *} \p{General_Category: *}
- \p{Cc} \p{Cntrl} (= \p{General_Category=Control})
- (65)
- \p{Ccc: *} \p{Canonical_Combining_Class: *}
- \p{CE} \p{Composition_Exclusion} (=
- \p{Composition_Exclusion=Y}) (81)
- \p{CE: *} \p{Composition_Exclusion: *}
- \p{Cf} \p{Format} (= \p{General_Category=Format})
- (139)
- \p{Chakma} \p{Script=Chakma} (Short: \p{Cakm}; NOT
- \p{Block=Chakma}) (67)
- \p{Cham} \p{Script=Cham} (NOT \p{Block=Cham}) (83)
- \p{Changes_When_Casefolded} \p{Changes_When_Casefolded=Y} (Short:
- \p{CWCF}) (1107)
- \p{Changes_When_Casefolded: N*} (Short: \p{CWCF=N}, \P{CWCF})
- (1_113_005)
- \p{Changes_When_Casefolded: Y*} (Short: \p{CWCF=Y}, \p{CWCF})
- (1107)
- \p{Changes_When_Casemapped} \p{Changes_When_Casemapped=Y} (Short:
- \p{CWCM}) (2138)
- \p{Changes_When_Casemapped: N*} (Short: \p{CWCM=N}, \P{CWCM})
- (1_111_974)
- \p{Changes_When_Casemapped: Y*} (Short: \p{CWCM=Y}, \p{CWCM})
- (2138)
- \p{Changes_When_Lowercased} \p{Changes_When_Lowercased=Y} (Short:
- \p{CWL}) (1043)
- \p{Changes_When_Lowercased: N*} (Short: \p{CWL=N}, \P{CWL})
- (1_113_069)
- \p{Changes_When_Lowercased: Y*} (Short: \p{CWL=Y}, \p{CWL}) (1043)
- \p{Changes_When_NFKC_Casefolded} \p{Changes_When_NFKC_Casefolded=
- Y} (Short: \p{CWKCF}) (9944)
- \p{Changes_When_NFKC_Casefolded: N*} (Short: \p{CWKCF=N},
- \P{CWKCF}) (1_104_168)
- \p{Changes_When_NFKC_Casefolded: Y*} (Short: \p{CWKCF=Y},
- \p{CWKCF}) (9944)
- \p{Changes_When_Titlecased} \p{Changes_When_Titlecased=Y} (Short:
- \p{CWT}) (1099)
- \p{Changes_When_Titlecased: N*} (Short: \p{CWT=N}, \P{CWT})
- (1_113_013)
- \p{Changes_When_Titlecased: Y*} (Short: \p{CWT=Y}, \p{CWT}) (1099)
- \p{Changes_When_Uppercased} \p{Changes_When_Uppercased=Y} (Short:
- \p{CWU}) (1126)
- \p{Changes_When_Uppercased: N*} (Short: \p{CWU=N}, \P{CWU})
- (1_112_986)
- \p{Changes_When_Uppercased: Y*} (Short: \p{CWU=Y}, \p{CWU}) (1126)
- \p{Cher} \p{Cherokee} (= \p{Script=Cherokee}) (NOT
- \p{Block=Cherokee}) (85)
- \p{Cherokee} \p{Script=Cherokee} (Short: \p{Cher}; NOT
- \p{Block=Cherokee}) (85)
- \p{CI} \p{Case_Ignorable} (= \p{Case_Ignorable=
- Y}) (1799)
- \p{CI: *} \p{Case_Ignorable: *}
- X \p{CJK} \p{CJK_Unified_Ideographs} (= \p{Block=
- CJK_Unified_Ideographs}) (20_992)
- X \p{CJK_Compat} \p{CJK_Compatibility} (= \p{Block=
- CJK_Compatibility}) (256)
- X \p{CJK_Compat_Forms} \p{CJK_Compatibility_Forms} (= \p{Block=
- CJK_Compatibility_Forms}) (32)
- X \p{CJK_Compat_Ideographs} \p{CJK_Compatibility_Ideographs} (=
- \p{Block=CJK_Compatibility_Ideographs})
- (512)
- X \p{CJK_Compat_Ideographs_Sup}
- \p{CJK_Compatibility_Ideographs_-
- Supplement} (= \p{Block=
- CJK_Compatibility_Ideographs_-
- Supplement}) (544)
- X \p{CJK_Compatibility} \p{Block=CJK_Compatibility} (Short:
- \p{InCJKCompat}) (256)
- X \p{CJK_Compatibility_Forms} \p{Block=CJK_Compatibility_Forms}
- (Short: \p{InCJKCompatForms}) (32)
- X \p{CJK_Compatibility_Ideographs} \p{Block=
- CJK_Compatibility_Ideographs} (Short:
- \p{InCJKCompatIdeographs}) (512)
- X \p{CJK_Compatibility_Ideographs_Supplement} \p{Block=
- CJK_Compatibility_Ideographs_Supplement}
- (Short: \p{InCJKCompatIdeographsSup})
- (544)
- X \p{CJK_Ext_A} \p{CJK_Unified_Ideographs_Extension_A} (=
- \p{Block=
- CJK_Unified_Ideographs_Extension_A})
- (6592)
- X \p{CJK_Ext_B} \p{CJK_Unified_Ideographs_Extension_B} (=
- \p{Block=
- CJK_Unified_Ideographs_Extension_B})
- (42_720)
- X \p{CJK_Ext_C} \p{CJK_Unified_Ideographs_Extension_C} (=
- \p{Block=
- CJK_Unified_Ideographs_Extension_C})
- (4160)
- X \p{CJK_Ext_D} \p{CJK_Unified_Ideographs_Extension_D} (=
- \p{Block=
- CJK_Unified_Ideographs_Extension_D})
- (224)
- X \p{CJK_Radicals_Sup} \p{CJK_Radicals_Supplement} (= \p{Block=
- CJK_Radicals_Supplement}) (128)
- X \p{CJK_Radicals_Supplement} \p{Block=CJK_Radicals_Supplement}
- (Short: \p{InCJKRadicalsSup}) (128)
- X \p{CJK_Strokes} \p{Block=CJK_Strokes} (48)
- X \p{CJK_Symbols} \p{CJK_Symbols_And_Punctuation} (=
- \p{Block=CJK_Symbols_And_Punctuation})
- (64)
- X \p{CJK_Symbols_And_Punctuation} \p{Block=
- CJK_Symbols_And_Punctuation} (Short:
- \p{InCJKSymbols}) (64)
- X \p{CJK_Unified_Ideographs} \p{Block=CJK_Unified_Ideographs}
- (Short: \p{InCJK}) (20_992)
- X \p{CJK_Unified_Ideographs_Extension_A} \p{Block=
- CJK_Unified_Ideographs_Extension_A}
- (Short: \p{InCJKExtA}) (6592)
- X \p{CJK_Unified_Ideographs_Extension_B} \p{Block=
- CJK_Unified_Ideographs_Extension_B}
- (Short: \p{InCJKExtB}) (42_720)
- X \p{CJK_Unified_Ideographs_Extension_C} \p{Block=
- CJK_Unified_Ideographs_Extension_C}
- (Short: \p{InCJKExtC}) (4160)
- X \p{CJK_Unified_Ideographs_Extension_D} \p{Block=
- CJK_Unified_Ideographs_Extension_D}
- (Short: \p{InCJKExtD}) (224)
- \p{Close_Punctuation} \p{General_Category=Close_Punctuation}
- (Short: \p{Pe}) (71)
- \p{Cn} \p{Unassigned} (= \p{General_Category=
- Unassigned}) (864_414)
- \p{Cntrl} \p{General_Category=Control} Control
- characters (Short: \p{Cc}) (65)
- \p{Co} \p{Private_Use} (= \p{General_Category=
- Private_Use}) (NOT \p{Private_Use_Area})
- (137_468)
- X \p{Combining_Diacritical_Marks} \p{Block=
- Combining_Diacritical_Marks} (Short:
- \p{InDiacriticals}) (112)
- X \p{Combining_Diacritical_Marks_For_Symbols} \p{Block=
- Combining_Diacritical_Marks_For_Symbols}
- (Short: \p{InDiacriticalsForSymbols})
- (48)
- X \p{Combining_Diacritical_Marks_Supplement} \p{Block=
- Combining_Diacritical_Marks_Supplement}
- (Short: \p{InDiacriticalsSup}) (64)
- X \p{Combining_Half_Marks} \p{Block=Combining_Half_Marks} (Short:
- \p{InHalfMarks}) (16)
- \p{Combining_Mark} \p{Mark} (= \p{General_Category=Mark})
- (1645)
- X \p{Combining_Marks_For_Symbols}
- \p{Combining_Diacritical_Marks_For_-
- Symbols} (= \p{Block=
- Combining_Diacritical_Marks_For_-
- Symbols}) (48)
- \p{Common} \p{Script=Common} (Short: \p{Zyyy}) (6413)
- X \p{Common_Indic_Number_Forms} \p{Block=Common_Indic_Number_Forms}
- (Short: \p{InIndicNumberForms}) (16)
- \p{Comp_Ex} \p{Full_Composition_Exclusion} (=
- \p{Full_Composition_Exclusion=Y}) (1120)
- \p{Comp_Ex: *} \p{Full_Composition_Exclusion: *}
- X \p{Compat_Jamo} \p{Hangul_Compatibility_Jamo} (= \p{Block=
- Hangul_Compatibility_Jamo}) (96)
- \p{Composition_Exclusion} \p{Composition_Exclusion=Y} (Short:
- \p{CE}) (81)
- \p{Composition_Exclusion: N*} (Short: \p{CE=N}, \P{CE}) (1_114_031)
- \p{Composition_Exclusion: Y*} (Short: \p{CE=Y}, \p{CE}) (81)
- \p{Connector_Punctuation} \p{General_Category=
- Connector_Punctuation} (Short: \p{Pc})
- (10)
- \p{Control} \p{Cntrl} (= \p{General_Category=Control})
- (65)
- X \p{Control_Pictures} \p{Block=Control_Pictures} (64)
- \p{Copt} \p{Coptic} (= \p{Script=Coptic}) (NOT
- \p{Block=Coptic}) (137)
- \p{Coptic} \p{Script=Coptic} (Short: \p{Copt}; NOT
- \p{Block=Coptic}) (137)
- X \p{Counting_Rod} \p{Counting_Rod_Numerals} (= \p{Block=
- Counting_Rod_Numerals}) (32)
- X \p{Counting_Rod_Numerals} \p{Block=Counting_Rod_Numerals} (Short:
- \p{InCountingRod}) (32)
- \p{Cprt} \p{Cypriot} (= \p{Script=Cypriot}) (55)
- \p{Cs} \p{Surrogate} (= \p{General_Category=
- Surrogate}) (2048)
- \p{Cuneiform} \p{Script=Cuneiform} (Short: \p{Xsux}; NOT
- \p{Block=Cuneiform}) (982)
- X \p{Cuneiform_Numbers} \p{Cuneiform_Numbers_And_Punctuation} (=
- \p{Block=
- Cuneiform_Numbers_And_Punctuation}) (128)
- X \p{Cuneiform_Numbers_And_Punctuation} \p{Block=
- Cuneiform_Numbers_And_Punctuation}
- (Short: \p{InCuneiformNumbers}) (128)
- \p{Currency_Symbol} \p{General_Category=Currency_Symbol}
- (Short: \p{Sc}) (49)
- X \p{Currency_Symbols} \p{Block=Currency_Symbols} (48)
- \p{CWCF} \p{Changes_When_Casefolded} (=
- \p{Changes_When_Casefolded=Y}) (1107)
- \p{CWCF: *} \p{Changes_When_Casefolded: *}
- \p{CWCM} \p{Changes_When_Casemapped} (=
- \p{Changes_When_Casemapped=Y}) (2138)
- \p{CWCM: *} \p{Changes_When_Casemapped: *}
- \p{CWKCF} \p{Changes_When_NFKC_Casefolded} (=
- \p{Changes_When_NFKC_Casefolded=Y})
- (9944)
- \p{CWKCF: *} \p{Changes_When_NFKC_Casefolded: *}
- \p{CWL} \p{Changes_When_Lowercased} (=
- \p{Changes_When_Lowercased=Y}) (1043)
- \p{CWL: *} \p{Changes_When_Lowercased: *}
- \p{CWT} \p{Changes_When_Titlecased} (=
- \p{Changes_When_Titlecased=Y}) (1099)
- \p{CWT: *} \p{Changes_When_Titlecased: *}
- \p{CWU} \p{Changes_When_Uppercased} (=
- \p{Changes_When_Uppercased=Y}) (1126)
- \p{CWU: *} \p{Changes_When_Uppercased: *}
- \p{Cypriot} \p{Script=Cypriot} (Short: \p{Cprt}) (55)
- X \p{Cypriot_Syllabary} \p{Block=Cypriot_Syllabary} (64)
- \p{Cyrillic} \p{Script=Cyrillic} (Short: \p{Cyrl}; NOT
- \p{Block=Cyrillic}) (417)
- X \p{Cyrillic_Ext_A} \p{Cyrillic_Extended_A} (= \p{Block=
- Cyrillic_Extended_A}) (32)
- X \p{Cyrillic_Ext_B} \p{Cyrillic_Extended_B} (= \p{Block=
- Cyrillic_Extended_B}) (96)
- X \p{Cyrillic_Extended_A} \p{Block=Cyrillic_Extended_A} (Short:
- \p{InCyrillicExtA}) (32)
- X \p{Cyrillic_Extended_B} \p{Block=Cyrillic_Extended_B} (Short:
- \p{InCyrillicExtB}) (96)
- X \p{Cyrillic_Sup} \p{Cyrillic_Supplement} (= \p{Block=
- Cyrillic_Supplement}) (48)
- X \p{Cyrillic_Supplement} \p{Block=Cyrillic_Supplement} (Short:
- \p{InCyrillicSup}) (48)
- X \p{Cyrillic_Supplementary} \p{Cyrillic_Supplement} (= \p{Block=
- Cyrillic_Supplement}) (48)
- \p{Cyrl} \p{Cyrillic} (= \p{Script=Cyrillic}) (NOT
- \p{Block=Cyrillic}) (417)
- \p{Dash} \p{Dash=Y} (27)
- \p{Dash: N*} (Single: \P{Dash}) (1_114_085)
- \p{Dash: Y*} (Single: \p{Dash}) (27)
- \p{Dash_Punctuation} \p{General_Category=Dash_Punctuation}
- (Short: \p{Pd}) (23)
- \p{Decimal_Number} \p{Digit} (= \p{General_Category=
- Decimal_Number}) (460)
- \p{Decomposition_Type: Can} \p{Decomposition_Type=Canonical}
- (13_225)
- \p{Decomposition_Type: Canonical} (Short: \p{Dt=Can}) (13_225)
- \p{Decomposition_Type: Circle} (Short: \p{Dt=Enc}) (240)
- \p{Decomposition_Type: Com} \p{Decomposition_Type=Compat} (720)
- \p{Decomposition_Type: Compat} (Short: \p{Dt=Com}) (720)
- \p{Decomposition_Type: Enc} \p{Decomposition_Type=Circle} (240)
- \p{Decomposition_Type: Fin} \p{Decomposition_Type=Final} (240)
- \p{Decomposition_Type: Final} (Short: \p{Dt=Fin}) (240)
- \p{Decomposition_Type: Font} (Short: \p{Dt=Font}) (1184)
- \p{Decomposition_Type: Fra} \p{Decomposition_Type=Fraction} (20)
- \p{Decomposition_Type: Fraction} (Short: \p{Dt=Fra}) (20)
- \p{Decomposition_Type: Init} \p{Decomposition_Type=Initial} (171)
- \p{Decomposition_Type: Initial} (Short: \p{Dt=Init}) (171)
- \p{Decomposition_Type: Iso} \p{Decomposition_Type=Isolated} (238)
- \p{Decomposition_Type: Isolated} (Short: \p{Dt=Iso}) (238)
- \p{Decomposition_Type: Med} \p{Decomposition_Type=Medial} (82)
- \p{Decomposition_Type: Medial} (Short: \p{Dt=Med}) (82)
- \p{Decomposition_Type: Nar} \p{Decomposition_Type=Narrow} (122)
- \p{Decomposition_Type: Narrow} (Short: \p{Dt=Nar}) (122)
- \p{Decomposition_Type: Nb} \p{Decomposition_Type=Nobreak} (5)
- \p{Decomposition_Type: Nobreak} (Short: \p{Dt=Nb}) (5)
- \p{Decomposition_Type: Non_Canon} \p{Decomposition_Type=
- Non_Canonical} (Perl extension) (3655)
- \p{Decomposition_Type: Non_Canonical} Union of all non-canonical
- decompositions (Short: \p{Dt=NonCanon})
- (Perl extension) (3655)
- \p{Decomposition_Type: None} (Short: \p{Dt=None}) (1_097_232)
- \p{Decomposition_Type: Small} (Short: \p{Dt=Sml}) (26)
- \p{Decomposition_Type: Sml} \p{Decomposition_Type=Small} (26)
- \p{Decomposition_Type: Sqr} \p{Decomposition_Type=Square} (284)
- \p{Decomposition_Type: Square} (Short: \p{Dt=Sqr}) (284)
- \p{Decomposition_Type: Sub} (Short: \p{Dt=Sub}) (38)
- \p{Decomposition_Type: Sup} \p{Decomposition_Type=Super} (146)
- \p{Decomposition_Type: Super} (Short: \p{Dt=Sup}) (146)
- \p{Decomposition_Type: Vert} \p{Decomposition_Type=Vertical} (35)
- \p{Decomposition_Type: Vertical} (Short: \p{Dt=Vert}) (35)
- \p{Decomposition_Type: Wide} (Short: \p{Dt=Wide}) (104)
- \p{Default_Ignorable_Code_Point} \p{Default_Ignorable_Code_Point=
- Y} (Short: \p{DI}) (4167)
- \p{Default_Ignorable_Code_Point: N*} (Short: \p{DI=N}, \P{DI})
- (1_109_945)
- \p{Default_Ignorable_Code_Point: Y*} (Short: \p{DI=Y}, \p{DI})
- (4167)
- \p{Dep} \p{Deprecated} (= \p{Deprecated=Y}) (111)
- \p{Dep: *} \p{Deprecated: *}
- \p{Deprecated} \p{Deprecated=Y} (Short: \p{Dep}) (111)
- \p{Deprecated: N*} (Short: \p{Dep=N}, \P{Dep}) (1_114_001)
- \p{Deprecated: Y*} (Short: \p{Dep=Y}, \p{Dep}) (111)
- \p{Deseret} \p{Script=Deseret} (Short: \p{Dsrt}) (80)
- \p{Deva} \p{Devanagari} (= \p{Script=Devanagari})
- (NOT \p{Block=Devanagari}) (151)
- \p{Devanagari} \p{Script=Devanagari} (Short: \p{Deva};
- NOT \p{Block=Devanagari}) (151)
- X \p{Devanagari_Ext} \p{Devanagari_Extended} (= \p{Block=
- Devanagari_Extended}) (32)
- X \p{Devanagari_Extended} \p{Block=Devanagari_Extended} (Short:
- \p{InDevanagariExt}) (32)
- \p{DI} \p{Default_Ignorable_Code_Point} (=
- \p{Default_Ignorable_Code_Point=Y})
- (4167)
- \p{DI: *} \p{Default_Ignorable_Code_Point: *}
- \p{Dia} \p{Diacritic} (= \p{Diacritic=Y}) (693)
- \p{Dia: *} \p{Diacritic: *}
- \p{Diacritic} \p{Diacritic=Y} (Short: \p{Dia}) (693)
- \p{Diacritic: N*} (Short: \p{Dia=N}, \P{Dia}) (1_113_419)
- \p{Diacritic: Y*} (Short: \p{Dia=Y}, \p{Dia}) (693)
- X \p{Diacriticals} \p{Combining_Diacritical_Marks} (=
- \p{Block=Combining_Diacritical_Marks})
- (112)
- X \p{Diacriticals_For_Symbols}
- \p{Combining_Diacritical_Marks_For_-
- Symbols} (= \p{Block=
- Combining_Diacritical_Marks_For_-
- Symbols}) (48)
- X \p{Diacriticals_Sup} \p{Combining_Diacritical_Marks_Supplement}
- (= \p{Block=
- Combining_Diacritical_Marks_Supplement})
- (64)
- \p{Digit} \p{General_Category=Decimal_Number} [0-9]
- + all other decimal digits (Short:
- \p{Nd}) (460)
- X \p{Dingbats} \p{Block=Dingbats} (192)
- X \p{Domino} \p{Domino_Tiles} (= \p{Block=
- Domino_Tiles}) (112)
- X \p{Domino_Tiles} \p{Block=Domino_Tiles} (Short:
- \p{InDomino}) (112)
- \p{Dsrt} \p{Deseret} (= \p{Script=Deseret}) (80)
- \p{Dt: *} \p{Decomposition_Type: *}
- \p{Ea: *} \p{East_Asian_Width: *}
- \p{East_Asian_Width: A} \p{East_Asian_Width=Ambiguous} (138_746)
- \p{East_Asian_Width: Ambiguous} (Short: \p{Ea=A}) (138_746)
- \p{East_Asian_Width: F} \p{East_Asian_Width=Fullwidth} (104)
- \p{East_Asian_Width: Fullwidth} (Short: \p{Ea=F}) (104)
- \p{East_Asian_Width: H} \p{East_Asian_Width=Halfwidth} (123)
- \p{East_Asian_Width: Halfwidth} (Short: \p{Ea=H}) (123)
- \p{East_Asian_Width: N} \p{East_Asian_Width=Neutral} (801_894)
- \p{East_Asian_Width: Na} \p{East_Asian_Width=Narrow} (111)
- \p{East_Asian_Width: Narrow} (Short: \p{Ea=Na}) (111)
- \p{East_Asian_Width: Neutral} (Short: \p{Ea=N}) (801_894)
- \p{East_Asian_Width: W} \p{East_Asian_Width=Wide} (173_134)
- \p{East_Asian_Width: Wide} (Short: \p{Ea=W}) (173_134)
- \p{Egyp} \p{Egyptian_Hieroglyphs} (= \p{Script=
- Egyptian_Hieroglyphs}) (NOT \p{Block=
- Egyptian_Hieroglyphs}) (1071)
- \p{Egyptian_Hieroglyphs} \p{Script=Egyptian_Hieroglyphs} (Short:
- \p{Egyp}; NOT \p{Block=
- Egyptian_Hieroglyphs}) (1071)
- X \p{Emoticons} \p{Block=Emoticons} (80)
- X \p{Enclosed_Alphanum} \p{Enclosed_Alphanumerics} (= \p{Block=
- Enclosed_Alphanumerics}) (160)
- X \p{Enclosed_Alphanum_Sup} \p{Enclosed_Alphanumeric_Supplement} (=
- \p{Block=
- Enclosed_Alphanumeric_Supplement}) (256)
- X \p{Enclosed_Alphanumeric_Supplement} \p{Block=
- Enclosed_Alphanumeric_Supplement}
- (Short: \p{InEnclosedAlphanumSup}) (256)
- X \p{Enclosed_Alphanumerics} \p{Block=Enclosed_Alphanumerics}
- (Short: \p{InEnclosedAlphanum}) (160)
- X \p{Enclosed_CJK} \p{Enclosed_CJK_Letters_And_Months} (=
- \p{Block=
- Enclosed_CJK_Letters_And_Months}) (256)
- X \p{Enclosed_CJK_Letters_And_Months} \p{Block=
- Enclosed_CJK_Letters_And_Months} (Short:
- \p{InEnclosedCJK}) (256)
- X \p{Enclosed_Ideographic_Sup} \p{Enclosed_Ideographic_Supplement}
- (= \p{Block=
- Enclosed_Ideographic_Supplement}) (256)
- X \p{Enclosed_Ideographic_Supplement} \p{Block=
- Enclosed_Ideographic_Supplement} (Short:
- \p{InEnclosedIdeographicSup}) (256)
- \p{Enclosing_Mark} \p{General_Category=Enclosing_Mark}
- (Short: \p{Me}) (12)
- \p{Ethi} \p{Ethiopic} (= \p{Script=Ethiopic}) (NOT
- \p{Block=Ethiopic}) (495)
- \p{Ethiopic} \p{Script=Ethiopic} (Short: \p{Ethi}; NOT
- \p{Block=Ethiopic}) (495)
- X \p{Ethiopic_Ext} \p{Ethiopic_Extended} (= \p{Block=
- Ethiopic_Extended}) (96)
- X \p{Ethiopic_Ext_A} \p{Ethiopic_Extended_A} (= \p{Block=
- Ethiopic_Extended_A}) (48)
- X \p{Ethiopic_Extended} \p{Block=Ethiopic_Extended} (Short:
- \p{InEthiopicExt}) (96)
- X \p{Ethiopic_Extended_A} \p{Block=Ethiopic_Extended_A} (Short:
- \p{InEthiopicExtA}) (48)
- X \p{Ethiopic_Sup} \p{Ethiopic_Supplement} (= \p{Block=
- Ethiopic_Supplement}) (32)
- X \p{Ethiopic_Supplement} \p{Block=Ethiopic_Supplement} (Short:
- \p{InEthiopicSup}) (32)
- \p{Ext} \p{Extender} (= \p{Extender=Y}) (31)
- \p{Ext: *} \p{Extender: *}
- \p{Extender} \p{Extender=Y} (Short: \p{Ext}) (31)
- \p{Extender: N*} (Short: \p{Ext=N}, \P{Ext}) (1_114_081)
- \p{Extender: Y*} (Short: \p{Ext=Y}, \p{Ext}) (31)
- \p{Final_Punctuation} \p{General_Category=Final_Punctuation}
- (Short: \p{Pf}) (10)
- \p{Format} \p{General_Category=Format} (Short:
- \p{Cf}) (139)
- \p{Full_Composition_Exclusion} \p{Full_Composition_Exclusion=Y}
- (Short: \p{CompEx}) (1120)
- \p{Full_Composition_Exclusion: N*} (Short: \p{CompEx=N},
- \P{CompEx}) (1_112_992)
- \p{Full_Composition_Exclusion: Y*} (Short: \p{CompEx=Y},
- \p{CompEx}) (1120)
- \p{Gc: *} \p{General_Category: *}
- \p{GCB: *} \p{Grapheme_Cluster_Break: *}
- \p{General_Category: C} \p{General_Category=Other} (1_004_134)
- \p{General_Category: Cased_Letter} [\p{Ll}\p{Lu}\p{Lt}] (Short:
- \p{Gc=LC}, \p{LC}) (3223)
- \p{General_Category: Cc} \p{General_Category=Control} (65)
- \p{General_Category: Cf} \p{General_Category=Format} (139)
- \p{General_Category: Close_Punctuation} (Short: \p{Gc=Pe}, \p{Pe})
- (71)
- \p{General_Category: Cn} \p{General_Category=Unassigned} (864_414)
- \p{General_Category: Cntrl} \p{General_Category=Control} (65)
- \p{General_Category: Co} \p{General_Category=Private_Use} (137_468)
- \p{General_Category: Combining_Mark} \p{General_Category=Mark}
- (1645)
- \p{General_Category: Connector_Punctuation} (Short: \p{Gc=Pc},
- \p{Pc}) (10)
- \p{General_Category: Control} (Short: \p{Gc=Cc}, \p{Cc}) (65)
- \p{General_Category: Cs} \p{General_Category=Surrogate} (2048)
- \p{General_Category: Currency_Symbol} (Short: \p{Gc=Sc}, \p{Sc})
- (49)
- \p{General_Category: Dash_Punctuation} (Short: \p{Gc=Pd}, \p{Pd})
- (23)
- \p{General_Category: Decimal_Number} (Short: \p{Gc=Nd}, \p{Nd})
- (460)
- \p{General_Category: Digit} \p{General_Category=Decimal_Number}
- (460)
- \p{General_Category: Enclosing_Mark} (Short: \p{Gc=Me}, \p{Me})
- (12)
- \p{General_Category: Final_Punctuation} (Short: \p{Gc=Pf}, \p{Pf})
- (10)
- \p{General_Category: Format} (Short: \p{Gc=Cf}, \p{Cf}) (139)
- \p{General_Category: Initial_Punctuation} (Short: \p{Gc=Pi},
- \p{Pi}) (12)
- \p{General_Category: L} \p{General_Category=Letter} (101_013)
- X \p{General_Category: L&} \p{General_Category=Cased_Letter} (3223)
- X \p{General_Category: L_} \p{General_Category=Cased_Letter} Note
- the trailing '_' matters in spite of
- loose matching rules. (3223)
- \p{General_Category: LC} \p{General_Category=Cased_Letter} (3223)
- \p{General_Category: Letter} (Short: \p{Gc=L}, \p{L}) (101_013)
- \p{General_Category: Letter_Number} (Short: \p{Gc=Nl}, \p{Nl})
- (224)
- \p{General_Category: Line_Separator} (Short: \p{Gc=Zl}, \p{Zl}) (1)
- \p{General_Category: Ll} \p{General_Category=Lowercase_Letter}
- (/i= General_Category=Cased_Letter)
- (1751)
- \p{General_Category: Lm} \p{General_Category=Modifier_Letter} (237)
- \p{General_Category: Lo} \p{General_Category=Other_Letter} (97_553)
- \p{General_Category: Lowercase_Letter} (Short: \p{Gc=Ll}, \p{Ll};
- /i= General_Category=Cased_Letter) (1751)
- \p{General_Category: Lt} \p{General_Category=Titlecase_Letter}
- (/i= General_Category=Cased_Letter) (31)
- \p{General_Category: Lu} \p{General_Category=Uppercase_Letter}
- (/i= General_Category=Cased_Letter)
- (1441)
- \p{General_Category: M} \p{General_Category=Mark} (1645)
- \p{General_Category: Mark} (Short: \p{Gc=M}, \p{M}) (1645)
- \p{General_Category: Math_Symbol} (Short: \p{Gc=Sm}, \p{Sm}) (952)
- \p{General_Category: Mc} \p{General_Category=Spacing_Mark} (353)
- \p{General_Category: Me} \p{General_Category=Enclosing_Mark} (12)
- \p{General_Category: Mn} \p{General_Category=Nonspacing_Mark}
- (1280)
- \p{General_Category: Modifier_Letter} (Short: \p{Gc=Lm}, \p{Lm})
- (237)
- \p{General_Category: Modifier_Symbol} (Short: \p{Gc=Sk}, \p{Sk})
- (115)
- \p{General_Category: N} \p{General_Category=Number} (1148)
- \p{General_Category: Nd} \p{General_Category=Decimal_Number} (460)
- \p{General_Category: Nl} \p{General_Category=Letter_Number} (224)
- \p{General_Category: No} \p{General_Category=Other_Number} (464)
- \p{General_Category: Nonspacing_Mark} (Short: \p{Gc=Mn}, \p{Mn})
- (1280)
- \p{General_Category: Number} (Short: \p{Gc=N}, \p{N}) (1148)
- \p{General_Category: Open_Punctuation} (Short: \p{Gc=Ps}, \p{Ps})
- (72)
- \p{General_Category: Other} (Short: \p{Gc=C}, \p{C}) (1_004_134)
- \p{General_Category: Other_Letter} (Short: \p{Gc=Lo}, \p{Lo})
- (97_553)
- \p{General_Category: Other_Number} (Short: \p{Gc=No}, \p{No}) (464)
- \p{General_Category: Other_Punctuation} (Short: \p{Gc=Po}, \p{Po})
- (434)
- \p{General_Category: Other_Symbol} (Short: \p{Gc=So}, \p{So})
- (4404)
- \p{General_Category: P} \p{General_Category=Punctuation} (632)
- \p{General_Category: Paragraph_Separator} (Short: \p{Gc=Zp},
- \p{Zp}) (1)
- \p{General_Category: Pc} \p{General_Category=
- Connector_Punctuation} (10)
- \p{General_Category: Pd} \p{General_Category=Dash_Punctuation} (23)
- \p{General_Category: Pe} \p{General_Category=Close_Punctuation}
- (71)
- \p{General_Category: Pf} \p{General_Category=Final_Punctuation}
- (10)
- \p{General_Category: Pi} \p{General_Category=Initial_Punctuation}
- (12)
- \p{General_Category: Po} \p{General_Category=Other_Punctuation}
- (434)
- \p{General_Category: Private_Use} (Short: \p{Gc=Co}, \p{Co})
- (137_468)
- \p{General_Category: Ps} \p{General_Category=Open_Punctuation} (72)
- \p{General_Category: Punct} \p{General_Category=Punctuation} (632)
- \p{General_Category: Punctuation} (Short: \p{Gc=P}, \p{P}) (632)
- \p{General_Category: S} \p{General_Category=Symbol} (5520)
- \p{General_Category: Sc} \p{General_Category=Currency_Symbol} (49)
- \p{General_Category: Separator} (Short: \p{Gc=Z}, \p{Z}) (20)
- \p{General_Category: Sk} \p{General_Category=Modifier_Symbol} (115)
- \p{General_Category: Sm} \p{General_Category=Math_Symbol} (952)
- \p{General_Category: So} \p{General_Category=Other_Symbol} (4404)
- \p{General_Category: Space_Separator} (Short: \p{Gc=Zs}, \p{Zs})
- (18)
- \p{General_Category: Spacing_Mark} (Short: \p{Gc=Mc}, \p{Mc}) (353)
- \p{General_Category: Surrogate} (Short: \p{Gc=Cs}, \p{Cs}) (2048)
- \p{General_Category: Symbol} (Short: \p{Gc=S}, \p{S}) (5520)
- \p{General_Category: Titlecase_Letter} (Short: \p{Gc=Lt}, \p{Lt};
- /i= General_Category=Cased_Letter) (31)
- \p{General_Category: Unassigned} (Short: \p{Gc=Cn}, \p{Cn})
- (864_414)
- \p{General_Category: Uppercase_Letter} (Short: \p{Gc=Lu}, \p{Lu};
- /i= General_Category=Cased_Letter) (1441)
- \p{General_Category: Z} \p{General_Category=Separator} (20)
- \p{General_Category: Zl} \p{General_Category=Line_Separator} (1)
- \p{General_Category: Zp} \p{General_Category=Paragraph_Separator}
- (1)
- \p{General_Category: Zs} \p{General_Category=Space_Separator} (18)
- X \p{General_Punctuation} \p{Block=General_Punctuation} (Short:
- \p{InPunctuation}) (112)
- X \p{Geometric_Shapes} \p{Block=Geometric_Shapes} (96)
- \p{Geor} \p{Georgian} (= \p{Script=Georgian}) (NOT
- \p{Block=Georgian}) (127)
- \p{Georgian} \p{Script=Georgian} (Short: \p{Geor}; NOT
- \p{Block=Georgian}) (127)
- X \p{Georgian_Sup} \p{Georgian_Supplement} (= \p{Block=
- Georgian_Supplement}) (48)
- X \p{Georgian_Supplement} \p{Block=Georgian_Supplement} (Short:
- \p{InGeorgianSup}) (48)
- \p{Glag} \p{Glagolitic} (= \p{Script=Glagolitic})
- (NOT \p{Block=Glagolitic}) (94)
- \p{Glagolitic} \p{Script=Glagolitic} (Short: \p{Glag};
- NOT \p{Block=Glagolitic}) (94)
- \p{Goth} \p{Gothic} (= \p{Script=Gothic}) (NOT
- \p{Block=Gothic}) (27)
- \p{Gothic} \p{Script=Gothic} (Short: \p{Goth}; NOT
- \p{Block=Gothic}) (27)
- \p{Gr_Base} \p{Grapheme_Base} (= \p{Grapheme_Base=Y})
- (108_661)
- \p{Gr_Base: *} \p{Grapheme_Base: *}
- \p{Gr_Ext} \p{Grapheme_Extend} (= \p{Grapheme_Extend=
- Y}) (1317)
- \p{Gr_Ext: *} \p{Grapheme_Extend: *}
- \p{Graph} Characters that are graphical (247_565)
- \p{Grapheme_Base} \p{Grapheme_Base=Y} (Short: \p{GrBase})
- (108_661)
- \p{Grapheme_Base: N*} (Short: \p{GrBase=N}, \P{GrBase})
- (1_005_451)
- \p{Grapheme_Base: Y*} (Short: \p{GrBase=Y}, \p{GrBase}) (108_661)
- \p{Grapheme_Cluster_Break: CN} \p{Grapheme_Cluster_Break=Control}
- (6023)
- \p{Grapheme_Cluster_Break: Control} (Short: \p{GCB=CN}) (6023)
- \p{Grapheme_Cluster_Break: CR} (Short: \p{GCB=CR}) (1)
- \p{Grapheme_Cluster_Break: EX} \p{Grapheme_Cluster_Break=Extend}
- (1317)
- \p{Grapheme_Cluster_Break: Extend} (Short: \p{GCB=EX}) (1317)
- \p{Grapheme_Cluster_Break: L} (Short: \p{GCB=L}) (125)
- \p{Grapheme_Cluster_Break: LF} (Short: \p{GCB=LF}) (1)
- \p{Grapheme_Cluster_Break: LV} (Short: \p{GCB=LV}) (399)
- \p{Grapheme_Cluster_Break: LVT} (Short: \p{GCB=LVT}) (10_773)
- \p{Grapheme_Cluster_Break: Other} (Short: \p{GCB=XX}) (1_094_924)
- \p{Grapheme_Cluster_Break: PP} \p{Grapheme_Cluster_Break=Prepend}
- (0)
- \p{Grapheme_Cluster_Break: Prepend} (Short: \p{GCB=PP}) (0)
- \p{Grapheme_Cluster_Break: Regional_Indicator} (Short: \p{GCB=RI})
- (26)
- \p{Grapheme_Cluster_Break: RI} \p{Grapheme_Cluster_Break=
- Regional_Indicator} (26)
- \p{Grapheme_Cluster_Break: SM} \p{Grapheme_Cluster_Break=
- SpacingMark} (291)
- \p{Grapheme_Cluster_Break: SpacingMark} (Short: \p{GCB=SM}) (291)
- \p{Grapheme_Cluster_Break: T} (Short: \p{GCB=T}) (137)
- \p{Grapheme_Cluster_Break: V} (Short: \p{GCB=V}) (95)
- \p{Grapheme_Cluster_Break: XX} \p{Grapheme_Cluster_Break=Other}
- (1_094_924)
- \p{Grapheme_Extend} \p{Grapheme_Extend=Y} (Short: \p{GrExt})
- (1317)
- \p{Grapheme_Extend: N*} (Short: \p{GrExt=N}, \P{GrExt}) (1_112_795)
- \p{Grapheme_Extend: Y*} (Short: \p{GrExt=Y}, \p{GrExt}) (1317)
- \p{Greek} \p{Script=Greek} (Short: \p{Grek}; NOT
- \p{Greek_And_Coptic}) (511)
- X \p{Greek_And_Coptic} \p{Block=Greek_And_Coptic} (Short:
- \p{InGreek}) (144)
- X \p{Greek_Ext} \p{Greek_Extended} (= \p{Block=
- Greek_Extended}) (256)
- X \p{Greek_Extended} \p{Block=Greek_Extended} (Short:
- \p{InGreekExt}) (256)
- \p{Grek} \p{Greek} (= \p{Script=Greek}) (NOT
- \p{Greek_And_Coptic}) (511)
- \p{Gujarati} \p{Script=Gujarati} (Short: \p{Gujr}; NOT
- \p{Block=Gujarati}) (84)
- \p{Gujr} \p{Gujarati} (= \p{Script=Gujarati}) (NOT
- \p{Block=Gujarati}) (84)
- \p{Gurmukhi} \p{Script=Gurmukhi} (Short: \p{Guru}; NOT
- \p{Block=Gurmukhi}) (79)
- \p{Guru} \p{Gurmukhi} (= \p{Script=Gurmukhi}) (NOT
- \p{Block=Gurmukhi}) (79)
- X \p{Half_And_Full_Forms} \p{Halfwidth_And_Fullwidth_Forms} (=
- \p{Block=Halfwidth_And_Fullwidth_Forms})
- (240)
- X \p{Half_Marks} \p{Combining_Half_Marks} (= \p{Block=
- Combining_Half_Marks}) (16)
- X \p{Halfwidth_And_Fullwidth_Forms} \p{Block=
- Halfwidth_And_Fullwidth_Forms} (Short:
- \p{InHalfAndFullForms}) (240)
- \p{Han} \p{Script=Han} (75_963)
- \p{Hang} \p{Hangul} (= \p{Script=Hangul}) (NOT
- \p{Hangul_Syllables}) (11_739)
- \p{Hangul} \p{Script=Hangul} (Short: \p{Hang}; NOT
- \p{Hangul_Syllables}) (11_739)
- X \p{Hangul_Compatibility_Jamo} \p{Block=Hangul_Compatibility_Jamo}
- (Short: \p{InCompatJamo}) (96)
- X \p{Hangul_Jamo} \p{Block=Hangul_Jamo} (Short: \p{InJamo})
- (256)
- X \p{Hangul_Jamo_Extended_A} \p{Block=Hangul_Jamo_Extended_A}
- (Short: \p{InJamoExtA}) (32)
- X \p{Hangul_Jamo_Extended_B} \p{Block=Hangul_Jamo_Extended_B}
- (Short: \p{InJamoExtB}) (80)
- \p{Hangul_Syllable_Type: L} \p{Hangul_Syllable_Type=Leading_Jamo}
- (125)
- \p{Hangul_Syllable_Type: Leading_Jamo} (Short: \p{Hst=L}) (125)
- \p{Hangul_Syllable_Type: LV} \p{Hangul_Syllable_Type=LV_Syllable}
- (399)
- \p{Hangul_Syllable_Type: LV_Syllable} (Short: \p{Hst=LV}) (399)
- \p{Hangul_Syllable_Type: LVT} \p{Hangul_Syllable_Type=
- LVT_Syllable} (10_773)
- \p{Hangul_Syllable_Type: LVT_Syllable} (Short: \p{Hst=LVT})
- (10_773)
- \p{Hangul_Syllable_Type: NA} \p{Hangul_Syllable_Type=
- Not_Applicable} (1_102_583)
- \p{Hangul_Syllable_Type: Not_Applicable} (Short: \p{Hst=NA})
- (1_102_583)
- \p{Hangul_Syllable_Type: T} \p{Hangul_Syllable_Type=Trailing_Jamo}
- (137)
- \p{Hangul_Syllable_Type: Trailing_Jamo} (Short: \p{Hst=T}) (137)
- \p{Hangul_Syllable_Type: V} \p{Hangul_Syllable_Type=Vowel_Jamo}
- (95)
- \p{Hangul_Syllable_Type: Vowel_Jamo} (Short: \p{Hst=V}) (95)
- X \p{Hangul_Syllables} \p{Block=Hangul_Syllables} (Short:
- \p{InHangul}) (11_184)
- \p{Hani} \p{Han} (= \p{Script=Han}) (75_963)
- \p{Hano} \p{Hanunoo} (= \p{Script=Hanunoo}) (NOT
- \p{Block=Hanunoo}) (21)
- \p{Hanunoo} \p{Script=Hanunoo} (Short: \p{Hano}; NOT
- \p{Block=Hanunoo}) (21)
- \p{Hebr} \p{Hebrew} (= \p{Script=Hebrew}) (NOT
- \p{Block=Hebrew}) (133)
- \p{Hebrew} \p{Script=Hebrew} (Short: \p{Hebr}; NOT
- \p{Block=Hebrew}) (133)
- \p{Hex} \p{XDigit} (= \p{Hex_Digit=Y}) (44)
- \p{Hex: *} \p{Hex_Digit: *}
- \p{Hex_Digit} \p{XDigit} (= \p{Hex_Digit=Y}) (44)
- \p{Hex_Digit: N*} (Short: \p{Hex=N}, \P{Hex}) (1_114_068)
- \p{Hex_Digit: Y*} (Short: \p{Hex=Y}, \p{Hex}) (44)
- X \p{High_Private_Use_Surrogates} \p{Block=
- High_Private_Use_Surrogates} (Short:
- \p{InHighPUSurrogates}) (128)
- X \p{High_PU_Surrogates} \p{High_Private_Use_Surrogates} (=
- \p{Block=High_Private_Use_Surrogates})
- (128)
- X \p{High_Surrogates} \p{Block=High_Surrogates} (896)
- \p{Hira} \p{Hiragana} (= \p{Script=Hiragana}) (NOT
- \p{Block=Hiragana}) (91)
- \p{Hiragana} \p{Script=Hiragana} (Short: \p{Hira}; NOT
- \p{Block=Hiragana}) (91)
- \p{HorizSpace} \p{Blank} (19)
- \p{Hst: *} \p{Hangul_Syllable_Type: *}
- D \p{Hyphen} \p{Hyphen=Y} (11)
- D \p{Hyphen: N*} Supplanted by Line_Break property values;
- see www.unicode.org/reports/tr14
- (Single: \P{Hyphen}) (1_114_101)
- D \p{Hyphen: Y*} Supplanted by Line_Break property values;
- see www.unicode.org/reports/tr14
- (Single: \p{Hyphen}) (11)
- \p{ID_Continue} \p{ID_Continue=Y} (Short: \p{IDC}; NOT
- \p{Ideographic_Description_Characters})
- (103_355)
- \p{ID_Continue: N*} (Short: \p{IDC=N}, \P{IDC}) (1_010_757)
- \p{ID_Continue: Y*} (Short: \p{IDC=Y}, \p{IDC}) (103_355)
- \p{ID_Start} \p{ID_Start=Y} (Short: \p{IDS}) (101_240)
- \p{ID_Start: N*} (Short: \p{IDS=N}, \P{IDS}) (1_012_872)
- \p{ID_Start: Y*} (Short: \p{IDS=Y}, \p{IDS}) (101_240)
- \p{IDC} \p{ID_Continue} (= \p{ID_Continue=Y}) (NOT
- \p{Ideographic_Description_Characters})
- (103_355)
- \p{IDC: *} \p{ID_Continue: *}
- \p{Ideo} \p{Ideographic} (= \p{Ideographic=Y})
- (75_633)
- \p{Ideo: *} \p{Ideographic: *}
- \p{Ideographic} \p{Ideographic=Y} (Short: \p{Ideo})
- (75_633)
- \p{Ideographic: N*} (Short: \p{Ideo=N}, \P{Ideo}) (1_038_479)
- \p{Ideographic: Y*} (Short: \p{Ideo=Y}, \p{Ideo}) (75_633)
- X \p{Ideographic_Description_Characters} \p{Block=
- Ideographic_Description_Characters}
- (Short: \p{InIDC}) (16)
- \p{IDS} \p{ID_Start} (= \p{ID_Start=Y}) (101_240)
- \p{IDS: *} \p{ID_Start: *}
- \p{IDS_Binary_Operator} \p{IDS_Binary_Operator=Y} (Short:
- \p{IDSB}) (10)
- \p{IDS_Binary_Operator: N*} (Short: \p{IDSB=N}, \P{IDSB})
- (1_114_102)
- \p{IDS_Binary_Operator: Y*} (Short: \p{IDSB=Y}, \p{IDSB}) (10)
- \p{IDS_Trinary_Operator} \p{IDS_Trinary_Operator=Y} (Short:
- \p{IDST}) (2)
- \p{IDS_Trinary_Operator: N*} (Short: \p{IDST=N}, \P{IDST})
- (1_114_110)
- \p{IDS_Trinary_Operator: Y*} (Short: \p{IDST=Y}, \p{IDST}) (2)
- \p{IDSB} \p{IDS_Binary_Operator} (=
- \p{IDS_Binary_Operator=Y}) (10)
- \p{IDSB: *} \p{IDS_Binary_Operator: *}
- \p{IDST} \p{IDS_Trinary_Operator} (=
- \p{IDS_Trinary_Operator=Y}) (2)
- \p{IDST: *} \p{IDS_Trinary_Operator: *}
- \p{Imperial_Aramaic} \p{Script=Imperial_Aramaic} (Short:
- \p{Armi}; NOT \p{Block=
- Imperial_Aramaic}) (31)
- \p{In: *} \p{Present_In: *} (Perl extension)
- \p{In_*} \p{Block: *}
- X \p{Indic_Number_Forms} \p{Common_Indic_Number_Forms} (= \p{Block=
- Common_Indic_Number_Forms}) (16)
- \p{Inherited} \p{Script=Inherited} (Short: \p{Zinh})
- (523)
- \p{Initial_Punctuation} \p{General_Category=Initial_Punctuation}
- (Short: \p{Pi}) (12)
- \p{Inscriptional_Pahlavi} \p{Script=Inscriptional_Pahlavi} (Short:
- \p{Phli}; NOT \p{Block=
- Inscriptional_Pahlavi}) (27)
- \p{Inscriptional_Parthian} \p{Script=Inscriptional_Parthian}
- (Short: \p{Prti}; NOT \p{Block=
- Inscriptional_Parthian}) (30)
- X \p{IPA_Ext} \p{IPA_Extensions} (= \p{Block=
- IPA_Extensions}) (96)
- X \p{IPA_Extensions} \p{Block=IPA_Extensions} (Short:
- \p{InIPAExt}) (96)
- \p{Is_*} \p{*} (Any exceptions are individually
- noted beginning with the word NOT.) If
- an entry has flag(s) at its beginning,
- like "D", the "Is_" form has the same
- flag(s)
- \p{Ital} \p{Old_Italic} (= \p{Script=Old_Italic})
- (NOT \p{Block=Old_Italic}) (35)
- X \p{Jamo} \p{Hangul_Jamo} (= \p{Block=Hangul_Jamo})
- (256)
- X \p{Jamo_Ext_A} \p{Hangul_Jamo_Extended_A} (= \p{Block=
- Hangul_Jamo_Extended_A}) (32)
- X \p{Jamo_Ext_B} \p{Hangul_Jamo_Extended_B} (= \p{Block=
- Hangul_Jamo_Extended_B}) (80)
- \p{Java} \p{Javanese} (= \p{Script=Javanese}) (NOT
- \p{Block=Javanese}) (91)
- \p{Javanese} \p{Script=Javanese} (Short: \p{Java}; NOT
- \p{Block=Javanese}) (91)
- \p{Jg: *} \p{Joining_Group: *}
- \p{Join_C} \p{Join_Control} (= \p{Join_Control=Y}) (2)
- \p{Join_C: *} \p{Join_Control: *}
- \p{Join_Control} \p{Join_Control=Y} (Short: \p{JoinC}) (2)
- \p{Join_Control: N*} (Short: \p{JoinC=N}, \P{JoinC}) (1_114_110)
- \p{Join_Control: Y*} (Short: \p{JoinC=Y}, \p{JoinC}) (2)
- \p{Joining_Group: Ain} (Short: \p{Jg=Ain}) (7)
- \p{Joining_Group: Alaph} (Short: \p{Jg=Alaph}) (1)
- \p{Joining_Group: Alef} (Short: \p{Jg=Alef}) (10)
- \p{Joining_Group: Beh} (Short: \p{Jg=Beh}) (20)
- \p{Joining_Group: Beth} (Short: \p{Jg=Beth}) (2)
- \p{Joining_Group: Burushaski_Yeh_Barree} (Short: \p{Jg=
- BurushaskiYehBarree}) (2)
- \p{Joining_Group: Dal} (Short: \p{Jg=Dal}) (14)
- \p{Joining_Group: Dalath_Rish} (Short: \p{Jg=DalathRish}) (4)
- \p{Joining_Group: E} (Short: \p{Jg=E}) (1)
- \p{Joining_Group: Farsi_Yeh} (Short: \p{Jg=FarsiYeh}) (7)
- \p{Joining_Group: Fe} (Short: \p{Jg=Fe}) (1)
- \p{Joining_Group: Feh} (Short: \p{Jg=Feh}) (10)
- \p{Joining_Group: Final_Semkath} (Short: \p{Jg=FinalSemkath}) (1)
- \p{Joining_Group: Gaf} (Short: \p{Jg=Gaf}) (13)
- \p{Joining_Group: Gamal} (Short: \p{Jg=Gamal}) (3)
- \p{Joining_Group: Hah} (Short: \p{Jg=Hah}) (18)
- \p{Joining_Group: Hamza_On_Heh_Goal} (Short: \p{Jg=
- HamzaOnHehGoal}) (1)
- \p{Joining_Group: He} (Short: \p{Jg=He}) (1)
- \p{Joining_Group: Heh} (Short: \p{Jg=Heh}) (1)
- \p{Joining_Group: Heh_Goal} (Short: \p{Jg=HehGoal}) (2)
- \p{Joining_Group: Heth} (Short: \p{Jg=Heth}) (1)
- \p{Joining_Group: Kaf} (Short: \p{Jg=Kaf}) (5)
- \p{Joining_Group: Kaph} (Short: \p{Jg=Kaph}) (1)
- \p{Joining_Group: Khaph} (Short: \p{Jg=Khaph}) (1)
- \p{Joining_Group: Knotted_Heh} (Short: \p{Jg=KnottedHeh}) (2)
- \p{Joining_Group: Lam} (Short: \p{Jg=Lam}) (7)
- \p{Joining_Group: Lamadh} (Short: \p{Jg=Lamadh}) (1)
- \p{Joining_Group: Meem} (Short: \p{Jg=Meem}) (4)
- \p{Joining_Group: Mim} (Short: \p{Jg=Mim}) (1)
- \p{Joining_Group: No_Joining_Group} (Short: \p{Jg=NoJoiningGroup})
- (1_113_870)
- \p{Joining_Group: Noon} (Short: \p{Jg=Noon}) (8)
- \p{Joining_Group: Nun} (Short: \p{Jg=Nun}) (1)
- \p{Joining_Group: Nya} (Short: \p{Jg=Nya}) (1)
- \p{Joining_Group: Pe} (Short: \p{Jg=Pe}) (1)
- \p{Joining_Group: Qaf} (Short: \p{Jg=Qaf}) (5)
- \p{Joining_Group: Qaph} (Short: \p{Jg=Qaph}) (1)
- \p{Joining_Group: Reh} (Short: \p{Jg=Reh}) (17)
- \p{Joining_Group: Reversed_Pe} (Short: \p{Jg=ReversedPe}) (1)
- \p{Joining_Group: Rohingya_Yeh} (Short: \p{Jg=RohingyaYeh}) (1)
- \p{Joining_Group: Sad} (Short: \p{Jg=Sad}) (5)
- \p{Joining_Group: Sadhe} (Short: \p{Jg=Sadhe}) (1)
- \p{Joining_Group: Seen} (Short: \p{Jg=Seen}) (11)
- \p{Joining_Group: Semkath} (Short: \p{Jg=Semkath}) (1)
- \p{Joining_Group: Shin} (Short: \p{Jg=Shin}) (1)
- \p{Joining_Group: Swash_Kaf} (Short: \p{Jg=SwashKaf}) (1)
- \p{Joining_Group: Syriac_Waw} (Short: \p{Jg=SyriacWaw}) (1)
- \p{Joining_Group: Tah} (Short: \p{Jg=Tah}) (4)
- \p{Joining_Group: Taw} (Short: \p{Jg=Taw}) (1)
- \p{Joining_Group: Teh_Marbuta} (Short: \p{Jg=TehMarbuta}) (3)
- \p{Joining_Group: Teh_Marbuta_Goal} \p{Joining_Group=
- Hamza_On_Heh_Goal} (1)
- \p{Joining_Group: Teth} (Short: \p{Jg=Teth}) (2)
- \p{Joining_Group: Waw} (Short: \p{Jg=Waw}) (16)
- \p{Joining_Group: Yeh} (Short: \p{Jg=Yeh}) (10)
- \p{Joining_Group: Yeh_Barree} (Short: \p{Jg=YehBarree}) (2)
- \p{Joining_Group: Yeh_With_Tail} (Short: \p{Jg=YehWithTail}) (1)
- \p{Joining_Group: Yudh} (Short: \p{Jg=Yudh}) (1)
- \p{Joining_Group: Yudh_He} (Short: \p{Jg=YudhHe}) (1)
- \p{Joining_Group: Zain} (Short: \p{Jg=Zain}) (1)
- \p{Joining_Group: Zhain} (Short: \p{Jg=Zhain}) (1)
- \p{Joining_Type: C} \p{Joining_Type=Join_Causing} (3)
- \p{Joining_Type: D} \p{Joining_Type=Dual_Joining} (215)
- \p{Joining_Type: Dual_Joining} (Short: \p{Jt=D}) (215)
- \p{Joining_Type: Join_Causing} (Short: \p{Jt=C}) (3)
- \p{Joining_Type: L} \p{Joining_Type=Left_Joining} (0)
- \p{Joining_Type: Left_Joining} (Short: \p{Jt=L}) (0)
- \p{Joining_Type: Non_Joining} (Short: \p{Jt=U}) (1_112_389)
- \p{Joining_Type: R} \p{Joining_Type=Right_Joining} (82)
- \p{Joining_Type: Right_Joining} (Short: \p{Jt=R}) (82)
- \p{Joining_Type: T} \p{Joining_Type=Transparent} (1423)
- \p{Joining_Type: Transparent} (Short: \p{Jt=T}) (1423)
- \p{Joining_Type: U} \p{Joining_Type=Non_Joining} (1_112_389)
- \p{Jt: *} \p{Joining_Type: *}
- \p{Kaithi} \p{Script=Kaithi} (Short: \p{Kthi}; NOT
- \p{Block=Kaithi}) (66)
- \p{Kali} \p{Kayah_Li} (= \p{Script=Kayah_Li}) (48)
- \p{Kana} \p{Katakana} (= \p{Script=Katakana}) (NOT
- \p{Block=Katakana}) (300)
- X \p{Kana_Sup} \p{Kana_Supplement} (= \p{Block=
- Kana_Supplement}) (256)
- X \p{Kana_Supplement} \p{Block=Kana_Supplement} (Short:
- \p{InKanaSup}) (256)
- X \p{Kanbun} \p{Block=Kanbun} (16)
- X \p{Kangxi} \p{Kangxi_Radicals} (= \p{Block=
- Kangxi_Radicals}) (224)
- X \p{Kangxi_Radicals} \p{Block=Kangxi_Radicals} (Short:
- \p{InKangxi}) (224)
- \p{Kannada} \p{Script=Kannada} (Short: \p{Knda}; NOT
- \p{Block=Kannada}) (86)
- \p{Katakana} \p{Script=Katakana} (Short: \p{Kana}; NOT
- \p{Block=Katakana}) (300)
- X \p{Katakana_Ext} \p{Katakana_Phonetic_Extensions} (=
- \p{Block=Katakana_Phonetic_Extensions})
- (16)
- X \p{Katakana_Phonetic_Extensions} \p{Block=
- Katakana_Phonetic_Extensions} (Short:
- \p{InKatakanaExt}) (16)
- \p{Kayah_Li} \p{Script=Kayah_Li} (Short: \p{Kali}) (48)
- \p{Khar} \p{Kharoshthi} (= \p{Script=Kharoshthi})
- (NOT \p{Block=Kharoshthi}) (65)
- \p{Kharoshthi} \p{Script=Kharoshthi} (Short: \p{Khar};
- NOT \p{Block=Kharoshthi}) (65)
- \p{Khmer} \p{Script=Khmer} (Short: \p{Khmr}; NOT
- \p{Block=Khmer}) (146)
- X \p{Khmer_Symbols} \p{Block=Khmer_Symbols} (32)
- \p{Khmr} \p{Khmer} (= \p{Script=Khmer}) (NOT
- \p{Block=Khmer}) (146)
- \p{Knda} \p{Kannada} (= \p{Script=Kannada}) (NOT
- \p{Block=Kannada}) (86)
- \p{Kthi} \p{Kaithi} (= \p{Script=Kaithi}) (NOT
- \p{Block=Kaithi}) (66)
- \p{L} \p{Letter} (= \p{General_Category=Letter})
- (101_013)
- X \p{L&} \p{Cased_Letter} (= \p{General_Category=
- Cased_Letter}) (3223)
- X \p{L_} \p{Cased_Letter} (= \p{General_Category=
- Cased_Letter}) Note the trailing '_'
- matters in spite of loose matching
- rules. (3223)
- \p{Lana} \p{Tai_Tham} (= \p{Script=Tai_Tham}) (NOT
- \p{Block=Tai_Tham}) (127)
- \p{Lao} \p{Script=Lao} (NOT \p{Block=Lao}) (67)
- \p{Laoo} \p{Lao} (= \p{Script=Lao}) (NOT \p{Block=
- Lao}) (67)
- \p{Latin} \p{Script=Latin} (Short: \p{Latn}) (1272)
- X \p{Latin_1} \p{Latin_1_Supplement} (= \p{Block=
- Latin_1_Supplement}) (128)
- X \p{Latin_1_Sup} \p{Latin_1_Supplement} (= \p{Block=
- Latin_1_Supplement}) (128)
- X \p{Latin_1_Supplement} \p{Block=Latin_1_Supplement} (Short:
- \p{InLatin1}) (128)
- X \p{Latin_Ext_A} \p{Latin_Extended_A} (= \p{Block=
- Latin_Extended_A}) (128)
- X \p{Latin_Ext_Additional} \p{Latin_Extended_Additional} (=
- \p{Block=Latin_Extended_Additional})
- (256)
- X \p{Latin_Ext_B} \p{Latin_Extended_B} (= \p{Block=
- Latin_Extended_B}) (208)
- X \p{Latin_Ext_C} \p{Latin_Extended_C} (= \p{Block=
- Latin_Extended_C}) (32)
- X \p{Latin_Ext_D} \p{Latin_Extended_D} (= \p{Block=
- Latin_Extended_D}) (224)
- X \p{Latin_Extended_A} \p{Block=Latin_Extended_A} (Short:
- \p{InLatinExtA}) (128)
- X \p{Latin_Extended_Additional} \p{Block=Latin_Extended_Additional}
- (Short: \p{InLatinExtAdditional}) (256)
- X \p{Latin_Extended_B} \p{Block=Latin_Extended_B} (Short:
- \p{InLatinExtB}) (208)
- X \p{Latin_Extended_C} \p{Block=Latin_Extended_C} (Short:
- \p{InLatinExtC}) (32)
- X \p{Latin_Extended_D} \p{Block=Latin_Extended_D} (Short:
- \p{InLatinExtD}) (224)
- \p{Latn} \p{Latin} (= \p{Script=Latin}) (1272)
- \p{Lb: *} \p{Line_Break: *}
- \p{LC} \p{Cased_Letter} (= \p{General_Category=
- Cased_Letter}) (3223)
- \p{Lepc} \p{Lepcha} (= \p{Script=Lepcha}) (NOT
- \p{Block=Lepcha}) (74)
- \p{Lepcha} \p{Script=Lepcha} (Short: \p{Lepc}; NOT
- \p{Block=Lepcha}) (74)
- \p{Letter} \p{General_Category=Letter} (Short: \p{L})
- (101_013)
- \p{Letter_Number} \p{General_Category=Letter_Number} (Short:
- \p{Nl}) (224)
- X \p{Letterlike_Symbols} \p{Block=Letterlike_Symbols} (80)
- \p{Limb} \p{Limbu} (= \p{Script=Limbu}) (NOT
- \p{Block=Limbu}) (66)
- \p{Limbu} \p{Script=Limbu} (Short: \p{Limb}; NOT
- \p{Block=Limbu}) (66)
- \p{Linb} \p{Linear_B} (= \p{Script=Linear_B}) (211)
- \p{Line_Break: AI} \p{Line_Break=Ambiguous} (687)
- \p{Line_Break: AL} \p{Line_Break=Alphabetic} (15_355)
- \p{Line_Break: Alphabetic} (Short: \p{Lb=AL}) (15_355)
- \p{Line_Break: Ambiguous} (Short: \p{Lb=AI}) (687)
- \p{Line_Break: B2} \p{Line_Break=Break_Both} (3)
- \p{Line_Break: BA} \p{Line_Break=Break_After} (151)
- \p{Line_Break: BB} \p{Line_Break=Break_Before} (19)
- \p{Line_Break: BK} \p{Line_Break=Mandatory_Break} (4)
- \p{Line_Break: Break_After} (Short: \p{Lb=BA}) (151)
- \p{Line_Break: Break_Before} (Short: \p{Lb=BB}) (19)
- \p{Line_Break: Break_Both} (Short: \p{Lb=B2}) (3)
- \p{Line_Break: Break_Symbols} (Short: \p{Lb=SY}) (1)
- \p{Line_Break: Carriage_Return} (Short: \p{Lb=CR}) (1)
- \p{Line_Break: CB} \p{Line_Break=Contingent_Break} (1)
- \p{Line_Break: CJ} \p{Line_Break=
- Conditional_Japanese_Starter} (51)
- \p{Line_Break: CL} \p{Line_Break=Close_Punctuation} (87)
- \p{Line_Break: Close_Parenthesis} (Short: \p{Lb=CP}) (2)
- \p{Line_Break: Close_Punctuation} (Short: \p{Lb=CL}) (87)
- \p{Line_Break: CM} \p{Line_Break=Combining_Mark} (1628)
- \p{Line_Break: Combining_Mark} (Short: \p{Lb=CM}) (1628)
- \p{Line_Break: Complex_Context} (Short: \p{Lb=SA}) (665)
- \p{Line_Break: Conditional_Japanese_Starter} (Short: \p{Lb=CJ})
- (51)
- \p{Line_Break: Contingent_Break} (Short: \p{Lb=CB}) (1)
- \p{Line_Break: CP} \p{Line_Break=Close_Parenthesis} (2)
- \p{Line_Break: CR} \p{Line_Break=Carriage_Return} (1)
- \p{Line_Break: EX} \p{Line_Break=Exclamation} (34)
- \p{Line_Break: Exclamation} (Short: \p{Lb=EX}) (34)
- \p{Line_Break: GL} \p{Line_Break=Glue} (18)
- \p{Line_Break: Glue} (Short: \p{Lb=GL}) (18)
- \p{Line_Break: H2} (Short: \p{Lb=H2}) (399)
- \p{Line_Break: H3} (Short: \p{Lb=H3}) (10_773)
- \p{Line_Break: Hebrew_Letter} (Short: \p{Lb=HL}) (74)
- \p{Line_Break: HL} \p{Line_Break=Hebrew_Letter} (74)
- \p{Line_Break: HY} \p{Line_Break=Hyphen} (1)
- \p{Line_Break: Hyphen} (Short: \p{Lb=HY}) (1)
- \p{Line_Break: ID} \p{Line_Break=Ideographic} (162_700)
- \p{Line_Break: Ideographic} (Short: \p{Lb=ID}) (162_700)
- \p{Line_Break: IN} \p{Line_Break=Inseparable} (4)
- \p{Line_Break: Infix_Numeric} (Short: \p{Lb=IS}) (13)
- \p{Line_Break: Inseparable} (Short: \p{Lb=IN}) (4)
- \p{Line_Break: Inseperable} \p{Line_Break=Inseparable} (4)
- \p{Line_Break: IS} \p{Line_Break=Infix_Numeric} (13)
- \p{Line_Break: JL} (Short: \p{Lb=JL}) (125)
- \p{Line_Break: JT} (Short: \p{Lb=JT}) (137)
- \p{Line_Break: JV} (Short: \p{Lb=JV}) (95)
- \p{Line_Break: LF} \p{Line_Break=Line_Feed} (1)
- \p{Line_Break: Line_Feed} (Short: \p{Lb=LF}) (1)
- \p{Line_Break: Mandatory_Break} (Short: \p{Lb=BK}) (4)
- \p{Line_Break: Next_Line} (Short: \p{Lb=NL}) (1)
- \p{Line_Break: NL} \p{Line_Break=Next_Line} (1)
- \p{Line_Break: Nonstarter} (Short: \p{Lb=NS}) (26)
- \p{Line_Break: NS} \p{Line_Break=Nonstarter} (26)
- \p{Line_Break: NU} \p{Line_Break=Numeric} (452)
- \p{Line_Break: Numeric} (Short: \p{Lb=NU}) (452)
- \p{Line_Break: OP} \p{Line_Break=Open_Punctuation} (81)
- \p{Line_Break: Open_Punctuation} (Short: \p{Lb=OP}) (81)
- \p{Line_Break: PO} \p{Line_Break=Postfix_Numeric} (28)
- \p{Line_Break: Postfix_Numeric} (Short: \p{Lb=PO}) (28)
- \p{Line_Break: PR} \p{Line_Break=Prefix_Numeric} (46)
- \p{Line_Break: Prefix_Numeric} (Short: \p{Lb=PR}) (46)
- \p{Line_Break: QU} \p{Line_Break=Quotation} (34)
- \p{Line_Break: Quotation} (Short: \p{Lb=QU}) (34)
- \p{Line_Break: Regional_Indicator} (Short: \p{Lb=RI}) (26)
- \p{Line_Break: RI} \p{Line_Break=Regional_Indicator} (26)
- \p{Line_Break: SA} \p{Line_Break=Complex_Context} (665)
- D \p{Line_Break: SG} \p{Line_Break=Surrogate} (2048)
- \p{Line_Break: SP} \p{Line_Break=Space} (1)
- \p{Line_Break: Space} (Short: \p{Lb=SP}) (1)
- D \p{Line_Break: Surrogate} Deprecated by Unicode because surrogates
- should never appear in well-formed text,
- and therefore shouldn't be the basis for
- line breaking (Short: \p{Lb=SG}) (2048)
- \p{Line_Break: SY} \p{Line_Break=Break_Symbols} (1)
- \p{Line_Break: Unknown} (Short: \p{Lb=XX}) (918_337)
- \p{Line_Break: WJ} \p{Line_Break=Word_Joiner} (2)
- \p{Line_Break: Word_Joiner} (Short: \p{Lb=WJ}) (2)
- \p{Line_Break: XX} \p{Line_Break=Unknown} (918_337)
- \p{Line_Break: ZW} \p{Line_Break=ZWSpace} (1)
- \p{Line_Break: ZWSpace} (Short: \p{Lb=ZW}) (1)
- \p{Line_Separator} \p{General_Category=Line_Separator}
- (Short: \p{Zl}) (1)
- \p{Linear_B} \p{Script=Linear_B} (Short: \p{Linb}) (211)
- X \p{Linear_B_Ideograms} \p{Block=Linear_B_Ideograms} (128)
- X \p{Linear_B_Syllabary} \p{Block=Linear_B_Syllabary} (128)
- \p{Lisu} \p{Script=Lisu} (48)
- \p{Ll} \p{Lowercase_Letter} (=
- \p{General_Category=Lowercase_Letter})
- (/i= General_Category=Cased_Letter)
- (1751)
- \p{Lm} \p{Modifier_Letter} (=
- \p{General_Category=Modifier_Letter})
- (237)
- \p{Lo} \p{Other_Letter} (= \p{General_Category=
- Other_Letter}) (97_553)
- \p{LOE} \p{Logical_Order_Exception} (=
- \p{Logical_Order_Exception=Y}) (15)
- \p{LOE: *} \p{Logical_Order_Exception: *}
- \p{Logical_Order_Exception} \p{Logical_Order_Exception=Y} (Short:
- \p{LOE}) (15)
- \p{Logical_Order_Exception: N*} (Short: \p{LOE=N}, \P{LOE})
- (1_114_097)
- \p{Logical_Order_Exception: Y*} (Short: \p{LOE=Y}, \p{LOE}) (15)
- X \p{Low_Surrogates} \p{Block=Low_Surrogates} (1024)
- \p{Lower} \p{Lowercase=Y} (/i= Cased=Yes) (1934)
- \p{Lower: *} \p{Lowercase: *}
- \p{Lowercase} \p{Lower} (= \p{Lowercase=Y}) (/i= Cased=
- Yes) (1934)
- \p{Lowercase: N*} (Short: \p{Lower=N}, \P{Lower}; /i= Cased=
- No) (1_112_178)
- \p{Lowercase: Y*} (Short: \p{Lower=Y}, \p{Lower}; /i= Cased=
- Yes) (1934)
- \p{Lowercase_Letter} \p{General_Category=Lowercase_Letter}
- (Short: \p{Ll}; /i= General_Category=
- Cased_Letter) (1751)
- \p{Lt} \p{Titlecase_Letter} (=
- \p{General_Category=Titlecase_Letter})
- (/i= General_Category=Cased_Letter) (31)
- \p{Lu} \p{Uppercase_Letter} (=
- \p{General_Category=Uppercase_Letter})
- (/i= General_Category=Cased_Letter)
- (1441)
- \p{Lyci} \p{Lycian} (= \p{Script=Lycian}) (NOT
- \p{Block=Lycian}) (29)
- \p{Lycian} \p{Script=Lycian} (Short: \p{Lyci}; NOT
- \p{Block=Lycian}) (29)
- \p{Lydi} \p{Lydian} (= \p{Script=Lydian}) (NOT
- \p{Block=Lydian}) (27)
- \p{Lydian} \p{Script=Lydian} (Short: \p{Lydi}; NOT
- \p{Block=Lydian}) (27)
- \p{M} \p{Mark} (= \p{General_Category=Mark})
- (1645)
- X \p{Mahjong} \p{Mahjong_Tiles} (= \p{Block=
- Mahjong_Tiles}) (48)
- X \p{Mahjong_Tiles} \p{Block=Mahjong_Tiles} (Short:
- \p{InMahjong}) (48)
- \p{Malayalam} \p{Script=Malayalam} (Short: \p{Mlym}; NOT
- \p{Block=Malayalam}) (98)
- \p{Mand} \p{Mandaic} (= \p{Script=Mandaic}) (NOT
- \p{Block=Mandaic}) (29)
- \p{Mandaic} \p{Script=Mandaic} (Short: \p{Mand}; NOT
- \p{Block=Mandaic}) (29)
- \p{Mark} \p{General_Category=Mark} (Short: \p{M})
- (1645)
- \p{Math} \p{Math=Y} (2310)
- \p{Math: N*} (Single: \P{Math}) (1_111_802)
- \p{Math: Y*} (Single: \p{Math}) (2310)
- X \p{Math_Alphanum} \p{Mathematical_Alphanumeric_Symbols} (=
- \p{Block=
- Mathematical_Alphanumeric_Symbols})
- (1024)
- X \p{Math_Operators} \p{Mathematical_Operators} (= \p{Block=
- Mathematical_Operators}) (256)
- \p{Math_Symbol} \p{General_Category=Math_Symbol} (Short:
- \p{Sm}) (952)
- X \p{Mathematical_Alphanumeric_Symbols} \p{Block=
- Mathematical_Alphanumeric_Symbols}
- (Short: \p{InMathAlphanum}) (1024)
- X \p{Mathematical_Operators} \p{Block=Mathematical_Operators}
- (Short: \p{InMathOperators}) (256)
- \p{Mc} \p{Spacing_Mark} (= \p{General_Category=
- Spacing_Mark}) (353)
- \p{Me} \p{Enclosing_Mark} (= \p{General_Category=
- Enclosing_Mark}) (12)
- \p{Meetei_Mayek} \p{Script=Meetei_Mayek} (Short: \p{Mtei};
- NOT \p{Block=Meetei_Mayek}) (79)
- X \p{Meetei_Mayek_Ext} \p{Meetei_Mayek_Extensions} (= \p{Block=
- Meetei_Mayek_Extensions}) (32)
- X \p{Meetei_Mayek_Extensions} \p{Block=Meetei_Mayek_Extensions}
- (Short: \p{InMeeteiMayekExt}) (32)
- \p{Merc} \p{Meroitic_Cursive} (= \p{Script=
- Meroitic_Cursive}) (NOT \p{Block=
- Meroitic_Cursive}) (26)
- \p{Mero} \p{Meroitic_Hieroglyphs} (= \p{Script=
- Meroitic_Hieroglyphs}) (32)
- \p{Meroitic_Cursive} \p{Script=Meroitic_Cursive} (Short:
- \p{Merc}; NOT \p{Block=
- Meroitic_Cursive}) (26)
- \p{Meroitic_Hieroglyphs} \p{Script=Meroitic_Hieroglyphs} (Short:
- \p{Mero}) (32)
- \p{Miao} \p{Script=Miao} (NOT \p{Block=Miao}) (133)
- X \p{Misc_Arrows} \p{Miscellaneous_Symbols_And_Arrows} (=
- \p{Block=
- Miscellaneous_Symbols_And_Arrows}) (256)
- X \p{Misc_Math_Symbols_A} \p{Miscellaneous_Mathematical_Symbols_A}
- (= \p{Block=
- Miscellaneous_Mathematical_Symbols_A})
- (48)
- X \p{Misc_Math_Symbols_B} \p{Miscellaneous_Mathematical_Symbols_B}
- (= \p{Block=
- Miscellaneous_Mathematical_Symbols_B})
- (128)
- X \p{Misc_Pictographs} \p{Miscellaneous_Symbols_And_Pictographs}
- (= \p{Block=
- Miscellaneous_Symbols_And_Pictographs})
- (768)
- X \p{Misc_Symbols} \p{Miscellaneous_Symbols} (= \p{Block=
- Miscellaneous_Symbols}) (256)
- X \p{Misc_Technical} \p{Miscellaneous_Technical} (= \p{Block=
- Miscellaneous_Technical}) (256)
- X \p{Miscellaneous_Mathematical_Symbols_A} \p{Block=
- Miscellaneous_Mathematical_Symbols_A}
- (Short: \p{InMiscMathSymbolsA}) (48)
- X \p{Miscellaneous_Mathematical_Symbols_B} \p{Block=
- Miscellaneous_Mathematical_Symbols_B}
- (Short: \p{InMiscMathSymbolsB}) (128)
- X \p{Miscellaneous_Symbols} \p{Block=Miscellaneous_Symbols} (Short:
- \p{InMiscSymbols}) (256)
- X \p{Miscellaneous_Symbols_And_Arrows} \p{Block=
- Miscellaneous_Symbols_And_Arrows}
- (Short: \p{InMiscArrows}) (256)
- X \p{Miscellaneous_Symbols_And_Pictographs} \p{Block=
- Miscellaneous_Symbols_And_Pictographs}
- (Short: \p{InMiscPictographs}) (768)
- X \p{Miscellaneous_Technical} \p{Block=Miscellaneous_Technical}
- (Short: \p{InMiscTechnical}) (256)
- \p{Mlym} \p{Malayalam} (= \p{Script=Malayalam})
- (NOT \p{Block=Malayalam}) (98)
- \p{Mn} \p{Nonspacing_Mark} (=
- \p{General_Category=Nonspacing_Mark})
- (1280)
- \p{Modifier_Letter} \p{General_Category=Modifier_Letter}
- (Short: \p{Lm}) (237)
- X \p{Modifier_Letters} \p{Spacing_Modifier_Letters} (= \p{Block=
- Spacing_Modifier_Letters}) (80)
- \p{Modifier_Symbol} \p{General_Category=Modifier_Symbol}
- (Short: \p{Sk}) (115)
- X \p{Modifier_Tone_Letters} \p{Block=Modifier_Tone_Letters} (32)
- \p{Mong} \p{Mongolian} (= \p{Script=Mongolian})
- (NOT \p{Block=Mongolian}) (153)
- \p{Mongolian} \p{Script=Mongolian} (Short: \p{Mong}; NOT
- \p{Block=Mongolian}) (153)
- \p{Mtei} \p{Meetei_Mayek} (= \p{Script=
- Meetei_Mayek}) (NOT \p{Block=
- Meetei_Mayek}) (79)
- X \p{Music} \p{Musical_Symbols} (= \p{Block=
- Musical_Symbols}) (256)
- X \p{Musical_Symbols} \p{Block=Musical_Symbols} (Short:
- \p{InMusic}) (256)
- \p{Myanmar} \p{Script=Myanmar} (Short: \p{Mymr}; NOT
- \p{Block=Myanmar}) (188)
- X \p{Myanmar_Ext_A} \p{Myanmar_Extended_A} (= \p{Block=
- Myanmar_Extended_A}) (32)
- X \p{Myanmar_Extended_A} \p{Block=Myanmar_Extended_A} (Short:
- \p{InMyanmarExtA}) (32)
- \p{Mymr} \p{Myanmar} (= \p{Script=Myanmar}) (NOT
- \p{Block=Myanmar}) (188)
- \p{N} \p{Number} (= \p{General_Category=Number})
- (1148)
- X \p{NB} \p{No_Block} (= \p{Block=No_Block})
- (860_672)
- \p{NChar} \p{Noncharacter_Code_Point} (=
- \p{Noncharacter_Code_Point=Y}) (66)
- \p{NChar: *} \p{Noncharacter_Code_Point: *}
- \p{Nd} \p{Digit} (= \p{General_Category=
- Decimal_Number}) (460)
- \p{New_Tai_Lue} \p{Script=New_Tai_Lue} (Short: \p{Talu};
- NOT \p{Block=New_Tai_Lue}) (83)
- \p{NFC_QC: *} \p{NFC_Quick_Check: *}
- \p{NFC_Quick_Check: M} \p{NFC_Quick_Check=Maybe} (104)
- \p{NFC_Quick_Check: Maybe} (Short: \p{NFCQC=M}) (104)
- \p{NFC_Quick_Check: N} \p{NFC_Quick_Check=No} (NOT
- \P{NFC_Quick_Check} NOR \P{NFC_QC})
- (1120)
- \p{NFC_Quick_Check: No} (Short: \p{NFCQC=N}; NOT
- \P{NFC_Quick_Check} NOR \P{NFC_QC})
- (1120)
- \p{NFC_Quick_Check: Y} \p{NFC_Quick_Check=Yes} (NOT
- \p{NFC_Quick_Check} NOR \p{NFC_QC})
- (1_112_888)
- \p{NFC_Quick_Check: Yes} (Short: \p{NFCQC=Y}; NOT
- \p{NFC_Quick_Check} NOR \p{NFC_QC})
- (1_112_888)
- \p{NFD_QC: *} \p{NFD_Quick_Check: *}
- \p{NFD_Quick_Check: N} \p{NFD_Quick_Check=No} (NOT
- \P{NFD_Quick_Check} NOR \P{NFD_QC})
- (13_225)
- \p{NFD_Quick_Check: No} (Short: \p{NFDQC=N}; NOT
- \P{NFD_Quick_Check} NOR \P{NFD_QC})
- (13_225)
- \p{NFD_Quick_Check: Y} \p{NFD_Quick_Check=Yes} (NOT
- \p{NFD_Quick_Check} NOR \p{NFD_QC})
- (1_100_887)
- \p{NFD_Quick_Check: Yes} (Short: \p{NFDQC=Y}; NOT
- \p{NFD_Quick_Check} NOR \p{NFD_QC})
- (1_100_887)
- \p{NFKC_QC: *} \p{NFKC_Quick_Check: *}
- \p{NFKC_Quick_Check: M} \p{NFKC_Quick_Check=Maybe} (104)
- \p{NFKC_Quick_Check: Maybe} (Short: \p{NFKCQC=M}) (104)
- \p{NFKC_Quick_Check: N} \p{NFKC_Quick_Check=No} (NOT
- \P{NFKC_Quick_Check} NOR \P{NFKC_QC})
- (4787)
- \p{NFKC_Quick_Check: No} (Short: \p{NFKCQC=N}; NOT
- \P{NFKC_Quick_Check} NOR \P{NFKC_QC})
- (4787)
- \p{NFKC_Quick_Check: Y} \p{NFKC_Quick_Check=Yes} (NOT
- \p{NFKC_Quick_Check} NOR \p{NFKC_QC})
- (1_109_221)
- \p{NFKC_Quick_Check: Yes} (Short: \p{NFKCQC=Y}; NOT
- \p{NFKC_Quick_Check} NOR \p{NFKC_QC})
- (1_109_221)
- \p{NFKD_QC: *} \p{NFKD_Quick_Check: *}
- \p{NFKD_Quick_Check: N} \p{NFKD_Quick_Check=No} (NOT
- \P{NFKD_Quick_Check} NOR \P{NFKD_QC})
- (16_880)
- \p{NFKD_Quick_Check: No} (Short: \p{NFKDQC=N}; NOT
- \P{NFKD_Quick_Check} NOR \P{NFKD_QC})
- (16_880)
- \p{NFKD_Quick_Check: Y} \p{NFKD_Quick_Check=Yes} (NOT
- \p{NFKD_Quick_Check} NOR \p{NFKD_QC})
- (1_097_232)
- \p{NFKD_Quick_Check: Yes} (Short: \p{NFKDQC=Y}; NOT
- \p{NFKD_Quick_Check} NOR \p{NFKD_QC})
- (1_097_232)
- \p{Nko} \p{Script=Nko} (NOT \p{NKo}) (59)
- \p{Nkoo} \p{Nko} (= \p{Script=Nko}) (NOT \p{NKo})
- (59)
- \p{Nl} \p{Letter_Number} (= \p{General_Category=
- Letter_Number}) (224)
- \p{No} \p{Other_Number} (= \p{General_Category=
- Other_Number}) (464)
- X \p{No_Block} \p{Block=No_Block} (Short: \p{InNB})
- (860_672)
- \p{Noncharacter_Code_Point} \p{Noncharacter_Code_Point=Y} (Short:
- \p{NChar}) (66)
- \p{Noncharacter_Code_Point: N*} (Short: \p{NChar=N}, \P{NChar})
- (1_114_046)
- \p{Noncharacter_Code_Point: Y*} (Short: \p{NChar=Y}, \p{NChar})
- (66)
- \p{Nonspacing_Mark} \p{General_Category=Nonspacing_Mark}
- (Short: \p{Mn}) (1280)
- \p{Nt: *} \p{Numeric_Type: *}
- \p{Number} \p{General_Category=Number} (Short: \p{N})
- (1148)
- X \p{Number_Forms} \p{Block=Number_Forms} (64)
- \p{Numeric_Type: De} \p{Numeric_Type=Decimal} (460)
- \p{Numeric_Type: Decimal} (Short: \p{Nt=De}) (460)
- \p{Numeric_Type: Di} \p{Numeric_Type=Digit} (128)
- \p{Numeric_Type: Digit} (Short: \p{Nt=Di}) (128)
- \p{Numeric_Type: None} (Short: \p{Nt=None}) (1_112_883)
- \p{Numeric_Type: Nu} \p{Numeric_Type=Numeric} (641)
- \p{Numeric_Type: Numeric} (Short: \p{Nt=Nu}) (641)
- T \p{Numeric_Value: -1} (Short: \p{Nv=-1}) (2)
- T \p{Numeric_Value: -1/2} (Short: \p{Nv=-1/2}) (1)
- T \p{Numeric_Value: 0} (Short: \p{Nv=0}) (60)
- T \p{Numeric_Value: 1/16} (Short: \p{Nv=1/16}) (3)
- T \p{Numeric_Value: 1/10} (Short: \p{Nv=1/10}) (1)
- T \p{Numeric_Value: 1/9} (Short: \p{Nv=1/9}) (1)
- T \p{Numeric_Value: 1/8} (Short: \p{Nv=1/8}) (5)
- T \p{Numeric_Value: 1/7} (Short: \p{Nv=1/7}) (1)
- T \p{Numeric_Value: 1/6} (Short: \p{Nv=1/6}) (2)
- T \p{Numeric_Value: 3/16} (Short: \p{Nv=3/16}) (3)
- T \p{Numeric_Value: 1/5} (Short: \p{Nv=1/5}) (1)
- T \p{Numeric_Value: 1/4} (Short: \p{Nv=1/4}) (9)
- T \p{Numeric_Value: 1/3} (Short: \p{Nv=1/3}) (4)
- T \p{Numeric_Value: 3/8} (Short: \p{Nv=3/8}) (1)
- T \p{Numeric_Value: 2/5} (Short: \p{Nv=2/5}) (1)
- T \p{Numeric_Value: 1/2} (Short: \p{Nv=1/2}) (10)
- T \p{Numeric_Value: 3/5} (Short: \p{Nv=3/5}) (1)
- T \p{Numeric_Value: 5/8} (Short: \p{Nv=5/8}) (1)
- T \p{Numeric_Value: 2/3} (Short: \p{Nv=2/3}) (5)
- T \p{Numeric_Value: 3/4} (Short: \p{Nv=3/4}) (6)
- T \p{Numeric_Value: 4/5} (Short: \p{Nv=4/5}) (1)
- T \p{Numeric_Value: 5/6} (Short: \p{Nv=5/6}) (2)
- T \p{Numeric_Value: 7/8} (Short: \p{Nv=7/8}) (1)
- T \p{Numeric_Value: 1} (Short: \p{Nv=1}) (97)
- T \p{Numeric_Value: 3/2} (Short: \p{Nv=3/2}) (1)
- T \p{Numeric_Value: 2} (Short: \p{Nv=2}) (100)
- T \p{Numeric_Value: 5/2} (Short: \p{Nv=5/2}) (1)
- T \p{Numeric_Value: 3} (Short: \p{Nv=3}) (102)
- T \p{Numeric_Value: 7/2} (Short: \p{Nv=7/2}) (1)
- T \p{Numeric_Value: 4} (Short: \p{Nv=4}) (93)
- T \p{Numeric_Value: 9/2} (Short: \p{Nv=9/2}) (1)
- T \p{Numeric_Value: 5} (Short: \p{Nv=5}) (90)
- T \p{Numeric_Value: 11/2} (Short: \p{Nv=11/2}) (1)
- T \p{Numeric_Value: 6} (Short: \p{Nv=6}) (82)
- T \p{Numeric_Value: 13/2} (Short: \p{Nv=13/2}) (1)
- T \p{Numeric_Value: 7} (Short: \p{Nv=7}) (81)
- T \p{Numeric_Value: 15/2} (Short: \p{Nv=15/2}) (1)
- T \p{Numeric_Value: 8} (Short: \p{Nv=8}) (77)
- T \p{Numeric_Value: 17/2} (Short: \p{Nv=17/2}) (1)
- T \p{Numeric_Value: 9} (Short: \p{Nv=9}) (81)
- T \p{Numeric_Value: 10} (Short: \p{Nv=10}) (40)
- T \p{Numeric_Value: 11} (Short: \p{Nv=11}) (6)
- T \p{Numeric_Value: 12} (Short: \p{Nv=12}) (6)
- T \p{Numeric_Value: 13} (Short: \p{Nv=13}) (4)
- T \p{Numeric_Value: 14} (Short: \p{Nv=14}) (4)
- T \p{Numeric_Value: 15} (Short: \p{Nv=15}) (4)
- T \p{Numeric_Value: 16} (Short: \p{Nv=16}) (5)
- T \p{Numeric_Value: 17} (Short: \p{Nv=17}) (5)
- T \p{Numeric_Value: 18} (Short: \p{Nv=18}) (5)
- T \p{Numeric_Value: 19} (Short: \p{Nv=19}) (5)
- T \p{Numeric_Value: 20} (Short: \p{Nv=20}) (19)
- T \p{Numeric_Value: 21} (Short: \p{Nv=21}) (1)
- T \p{Numeric_Value: 22} (Short: \p{Nv=22}) (1)
- T \p{Numeric_Value: 23} (Short: \p{Nv=23}) (1)
- T \p{Numeric_Value: 24} (Short: \p{Nv=24}) (1)
- T \p{Numeric_Value: 25} (Short: \p{Nv=25}) (1)
- T \p{Numeric_Value: 26} (Short: \p{Nv=26}) (1)
- T \p{Numeric_Value: 27} (Short: \p{Nv=27}) (1)
- T \p{Numeric_Value: 28} (Short: \p{Nv=28}) (1)
- T \p{Numeric_Value: 29} (Short: \p{Nv=29}) (1)
- T \p{Numeric_Value: 30} (Short: \p{Nv=30}) (11)
- T \p{Numeric_Value: 31} (Short: \p{Nv=31}) (1)
- T \p{Numeric_Value: 32} (Short: \p{Nv=32}) (1)
- T \p{Numeric_Value: 33} (Short: \p{Nv=33}) (1)
- T \p{Numeric_Value: 34} (Short: \p{Nv=34}) (1)
- T \p{Numeric_Value: 35} (Short: \p{Nv=35}) (1)
- T \p{Numeric_Value: 36} (Short: \p{Nv=36}) (1)
- T \p{Numeric_Value: 37} (Short: \p{Nv=37}) (1)
- T \p{Numeric_Value: 38} (Short: \p{Nv=38}) (1)
- T \p{Numeric_Value: 39} (Short: \p{Nv=39}) (1)
- T \p{Numeric_Value: 40} (Short: \p{Nv=40}) (10)
- T \p{Numeric_Value: 41} (Short: \p{Nv=41}) (1)
- T \p{Numeric_Value: 42} (Short: \p{Nv=42}) (1)
- T \p{Numeric_Value: 43} (Short: \p{Nv=43}) (1)
- T \p{Numeric_Value: 44} (Short: \p{Nv=44}) (1)
- T \p{Numeric_Value: 45} (Short: \p{Nv=45}) (1)
- T \p{Numeric_Value: 46} (Short: \p{Nv=46}) (1)
- T \p{Numeric_Value: 47} (Short: \p{Nv=47}) (1)
- T \p{Numeric_Value: 48} (Short: \p{Nv=48}) (1)
- T \p{Numeric_Value: 49} (Short: \p{Nv=49}) (1)
- T \p{Numeric_Value: 50} (Short: \p{Nv=50}) (20)
- T \p{Numeric_Value: 60} (Short: \p{Nv=60}) (6)
- T \p{Numeric_Value: 70} (Short: \p{Nv=70}) (6)
- T \p{Numeric_Value: 80} (Short: \p{Nv=80}) (6)
- T \p{Numeric_Value: 90} (Short: \p{Nv=90}) (6)
- T \p{Numeric_Value: 100} (Short: \p{Nv=100}) (20)
- T \p{Numeric_Value: 200} (Short: \p{Nv=200}) (2)
- T \p{Numeric_Value: 300} (Short: \p{Nv=300}) (3)
- T \p{Numeric_Value: 400} (Short: \p{Nv=400}) (2)
- T \p{Numeric_Value: 500} (Short: \p{Nv=500}) (12)
- T \p{Numeric_Value: 600} (Short: \p{Nv=600}) (2)
- T \p{Numeric_Value: 700} (Short: \p{Nv=700}) (2)
- T \p{Numeric_Value: 800} (Short: \p{Nv=800}) (2)
- T \p{Numeric_Value: 900} (Short: \p{Nv=900}) (3)
- T \p{Numeric_Value: 1000} (Short: \p{Nv=1000}) (17)
- T \p{Numeric_Value: 2000} (Short: \p{Nv=2000}) (1)
- T \p{Numeric_Value: 3000} (Short: \p{Nv=3000}) (1)
- T \p{Numeric_Value: 4000} (Short: \p{Nv=4000}) (1)
- T \p{Numeric_Value: 5000} (Short: \p{Nv=5000}) (5)
- T \p{Numeric_Value: 6000} (Short: \p{Nv=6000}) (1)
- T \p{Numeric_Value: 7000} (Short: \p{Nv=7000}) (1)
- T \p{Numeric_Value: 8000} (Short: \p{Nv=8000}) (1)
- T \p{Numeric_Value: 9000} (Short: \p{Nv=9000}) (1)
- T \p{Numeric_Value: 10000} (= 1.0e+04) (Short: \p{Nv=10000}) (7)
- T \p{Numeric_Value: 20000} (= 2.0e+04) (Short: \p{Nv=20000}) (1)
- T \p{Numeric_Value: 30000} (= 3.0e+04) (Short: \p{Nv=30000}) (1)
- T \p{Numeric_Value: 40000} (= 4.0e+04) (Short: \p{Nv=40000}) (1)
- T \p{Numeric_Value: 50000} (= 5.0e+04) (Short: \p{Nv=50000}) (4)
- T \p{Numeric_Value: 60000} (= 6.0e+04) (Short: \p{Nv=60000}) (1)
- T \p{Numeric_Value: 70000} (= 7.0e+04) (Short: \p{Nv=70000}) (1)
- T \p{Numeric_Value: 80000} (= 8.0e+04) (Short: \p{Nv=80000}) (1)
- T \p{Numeric_Value: 90000} (= 9.0e+04) (Short: \p{Nv=90000}) (1)
- T \p{Numeric_Value: 100000} (= 1.0e+05) (Short: \p{Nv=100000}) (1)
- T \p{Numeric_Value: 216000} (= 2.2e+05) (Short: \p{Nv=216000}) (1)
- T \p{Numeric_Value: 432000} (= 4.3e+05) (Short: \p{Nv=432000}) (1)
- T \p{Numeric_Value: 100000000} (= 1.0e+08) (Short: \p{Nv=100000000})
- (2)
- T \p{Numeric_Value: 1000000000000} (= 1.0e+12) (Short: \p{Nv=
- 1000000000000}) (1)
- \p{Numeric_Value: NaN} (Short: \p{Nv=NaN}) (1_112_883)
- \p{Nv: *} \p{Numeric_Value: *}
- X \p{OCR} \p{Optical_Character_Recognition} (=
- \p{Block=Optical_Character_Recognition})
- (32)
- \p{Ogam} \p{Ogham} (= \p{Script=Ogham}) (NOT
- \p{Block=Ogham}) (29)
- \p{Ogham} \p{Script=Ogham} (Short: \p{Ogam}; NOT
- \p{Block=Ogham}) (29)
- \p{Ol_Chiki} \p{Script=Ol_Chiki} (Short: \p{Olck}) (48)
- \p{Olck} \p{Ol_Chiki} (= \p{Script=Ol_Chiki}) (48)
- \p{Old_Italic} \p{Script=Old_Italic} (Short: \p{Ital};
- NOT \p{Block=Old_Italic}) (35)
- \p{Old_Persian} \p{Script=Old_Persian} (Short: \p{Xpeo};
- NOT \p{Block=Old_Persian}) (50)
- \p{Old_South_Arabian} \p{Script=Old_South_Arabian} (Short:
- \p{Sarb}) (32)
- \p{Old_Turkic} \p{Script=Old_Turkic} (Short: \p{Orkh};
- NOT \p{Block=Old_Turkic}) (73)
- \p{Open_Punctuation} \p{General_Category=Open_Punctuation}
- (Short: \p{Ps}) (72)
- X \p{Optical_Character_Recognition} \p{Block=
- Optical_Character_Recognition} (Short:
- \p{InOCR}) (32)
- \p{Oriya} \p{Script=Oriya} (Short: \p{Orya}; NOT
- \p{Block=Oriya}) (90)
- \p{Orkh} \p{Old_Turkic} (= \p{Script=Old_Turkic})
- (NOT \p{Block=Old_Turkic}) (73)
- \p{Orya} \p{Oriya} (= \p{Script=Oriya}) (NOT
- \p{Block=Oriya}) (90)
- \p{Osma} \p{Osmanya} (= \p{Script=Osmanya}) (NOT
- \p{Block=Osmanya}) (40)
- \p{Osmanya} \p{Script=Osmanya} (Short: \p{Osma}; NOT
- \p{Block=Osmanya}) (40)
- \p{Other} \p{General_Category=Other} (Short: \p{C})
- (1_004_134)
- \p{Other_Letter} \p{General_Category=Other_Letter} (Short:
- \p{Lo}) (97_553)
- \p{Other_Number} \p{General_Category=Other_Number} (Short:
- \p{No}) (464)
- \p{Other_Punctuation} \p{General_Category=Other_Punctuation}
- (Short: \p{Po}) (434)
- \p{Other_Symbol} \p{General_Category=Other_Symbol} (Short:
- \p{So}) (4404)
- \p{P} \p{Punct} (= \p{General_Category=
- Punctuation}) (NOT
- \p{General_Punctuation}) (632)
- \p{Paragraph_Separator} \p{General_Category=Paragraph_Separator}
- (Short: \p{Zp}) (1)
- \p{Pat_Syn} \p{Pattern_Syntax} (= \p{Pattern_Syntax=
- Y}) (2760)
- \p{Pat_Syn: *} \p{Pattern_Syntax: *}
- \p{Pat_WS} \p{Pattern_White_Space} (=
- \p{Pattern_White_Space=Y}) (11)
- \p{Pat_WS: *} \p{Pattern_White_Space: *}
- \p{Pattern_Syntax} \p{Pattern_Syntax=Y} (Short: \p{PatSyn})
- (2760)
- \p{Pattern_Syntax: N*} (Short: \p{PatSyn=N}, \P{PatSyn})
- (1_111_352)
- \p{Pattern_Syntax: Y*} (Short: \p{PatSyn=Y}, \p{PatSyn}) (2760)
- \p{Pattern_White_Space} \p{Pattern_White_Space=Y} (Short:
- \p{PatWS}) (11)
- \p{Pattern_White_Space: N*} (Short: \p{PatWS=N}, \P{PatWS})
- (1_114_101)
- \p{Pattern_White_Space: Y*} (Short: \p{PatWS=Y}, \p{PatWS}) (11)
- \p{Pc} \p{Connector_Punctuation} (=
- \p{General_Category=
- Connector_Punctuation}) (10)
- \p{Pd} \p{Dash_Punctuation} (=
- \p{General_Category=Dash_Punctuation})
- (23)
- \p{Pe} \p{Close_Punctuation} (=
- \p{General_Category=Close_Punctuation})
- (71)
- \p{PerlSpace} \s, restricted to ASCII = [ \f\n\r\t] plus
- vertical tab (6)
- \p{PerlWord} \w, restricted to ASCII = [A-Za-z0-9_] (63)
- \p{Pf} \p{Final_Punctuation} (=
- \p{General_Category=Final_Punctuation})
- (10)
- \p{Phag} \p{Phags_Pa} (= \p{Script=Phags_Pa}) (NOT
- \p{Block=Phags_Pa}) (56)
- \p{Phags_Pa} \p{Script=Phags_Pa} (Short: \p{Phag}; NOT
- \p{Block=Phags_Pa}) (56)
- X \p{Phaistos} \p{Phaistos_Disc} (= \p{Block=
- Phaistos_Disc}) (48)
- X \p{Phaistos_Disc} \p{Block=Phaistos_Disc} (Short:
- \p{InPhaistos}) (48)
- \p{Phli} \p{Inscriptional_Pahlavi} (= \p{Script=
- Inscriptional_Pahlavi}) (NOT \p{Block=
- Inscriptional_Pahlavi}) (27)
- \p{Phnx} \p{Phoenician} (= \p{Script=Phoenician})
- (NOT \p{Block=Phoenician}) (29)
- \p{Phoenician} \p{Script=Phoenician} (Short: \p{Phnx};
- NOT \p{Block=Phoenician}) (29)
- X \p{Phonetic_Ext} \p{Phonetic_Extensions} (= \p{Block=
- Phonetic_Extensions}) (128)
- X \p{Phonetic_Ext_Sup} \p{Phonetic_Extensions_Supplement} (=
- \p{Block=
- Phonetic_Extensions_Supplement}) (64)
- X \p{Phonetic_Extensions} \p{Block=Phonetic_Extensions} (Short:
- \p{InPhoneticExt}) (128)
- X \p{Phonetic_Extensions_Supplement} \p{Block=
- Phonetic_Extensions_Supplement} (Short:
- \p{InPhoneticExtSup}) (64)
- \p{Pi} \p{Initial_Punctuation} (=
- \p{General_Category=
- Initial_Punctuation}) (12)
- X \p{Playing_Cards} \p{Block=Playing_Cards} (96)
- \p{Plrd} \p{Miao} (= \p{Script=Miao}) (NOT
- \p{Block=Miao}) (133)
- \p{Po} \p{Other_Punctuation} (=
- \p{General_Category=Other_Punctuation})
- (434)
- \p{PosixAlnum} [A-Za-z0-9] (62)
- \p{PosixAlpha} [A-Za-z] (52)
- \p{PosixBlank} \t and ' ' (2)
- \p{PosixCntrl} ASCII control characters: NUL, SOH, STX,
- ETX, EOT, ENQ, ACK, BEL, BS, HT, LF, VT,
- FF, CR, SO, SI, DLE, DC1, DC2, DC3, DC4,
- NAK, SYN, ETB, CAN, EOM, SUB, ESC, FS,
- GS, RS, US, and DEL (33)
- \p{PosixDigit} [0-9] (10)
- \p{PosixGraph} [-!"#$%&'()*+,./:;<=>?@[\\]^_`{|}~0-9A-Za-z] (94)
- \p{PosixLower} [a-z] (/i= PosixAlpha) (26)
- \p{PosixPrint} [- 0-9A-Za-z!"#$%&'()*+,./:;<=>?@[\\]^_`{|}~] (95)
- \p{PosixPunct} [-!"#$%&'()*+,./:;<=>?@[\\]^_`{|}~] (32)
- \p{PosixSpace} \t, \n, \cK, \f, \r, and ' '. (\cK is
- vertical tab) (6)
- \p{PosixUpper} [A-Z] (/i= PosixAlpha) (26)
- \p{PosixWord} \p{PerlWord} (63)
- \p{PosixXDigit} \p{ASCII_Hex_Digit=Y} [0-9A-Fa-f] (Short:
- \p{AHex}) (22)
- T \p{Present_In: 1.1} \p{Age=V1_1} (Short: \p{In=1.1}) (Perl
- extension) (33_979)
- T \p{Present_In: 2.0} Code point's usage introduced in version
- 2.0 or earlier (Short: \p{In=2.0}) (Perl
- extension) (178_500)
- T \p{Present_In: 2.1} Code point's usage introduced in version
- 2.1 or earlier (Short: \p{In=2.1}) (Perl
- extension) (178_502)
- T \p{Present_In: 3.0} Code point's usage introduced in version
- 3.0 or earlier (Short: \p{In=3.0}) (Perl
- extension) (188_809)
- T \p{Present_In: 3.1} Code point's usage introduced in version
- 3.1 or earlier (Short: \p{In=3.1}) (Perl
- extension) (233_787)
- T \p{Present_In: 3.2} Code point's usage introduced in version
- 3.2 or earlier (Short: \p{In=3.2}) (Perl
- extension) (234_803)
- T \p{Present_In: 4.0} Code point's usage introduced in version
- 4.0 or earlier (Short: \p{In=4.0}) (Perl
- extension) (236_029)
- T \p{Present_In: 4.1} Code point's usage introduced in version
- 4.1 or earlier (Short: \p{In=4.1}) (Perl
- extension) (237_302)
- T \p{Present_In: 5.0} Code point's usage introduced in version
- 5.0 or earlier (Short: \p{In=5.0}) (Perl
- extension) (238_671)
- T \p{Present_In: 5.1} Code point's usage introduced in version
- 5.1 or earlier (Short: \p{In=5.1}) (Perl
- extension) (240_295)
- T \p{Present_In: 5.2} Code point's usage introduced in version
- 5.2 or earlier (Short: \p{In=5.2}) (Perl
- extension) (246_943)
- T \p{Present_In: 6.0} Code point's usage introduced in version
- 6.0 or earlier (Short: \p{In=6.0}) (Perl
- extension) (249_031)
- T \p{Present_In: 6.1} Code point's usage introduced in version
- 6.1 or earlier (Short: \p{In=6.1}) (Perl
- extension) (249_763)
- T \p{Present_In: 6.2} Code point's usage introduced in version
- 6.2 or earlier (Short: \p{In=6.2}) (Perl
- extension) (249_764)
- \p{Present_In: Unassigned} \p{Age=Unassigned} (Short: \p{In=
- Unassigned}) (Perl extension) (864_348)
- \p{Print} Characters that are graphical plus space
- characters (but no controls) (247_583)
- \p{Private_Use} \p{General_Category=Private_Use} (Short:
- \p{Co}; NOT \p{Private_Use_Area})
- (137_468)
- X \p{Private_Use_Area} \p{Block=Private_Use_Area} (Short:
- \p{InPUA}) (6400)
- \p{Prti} \p{Inscriptional_Parthian} (= \p{Script=
- Inscriptional_Parthian}) (NOT \p{Block=
- Inscriptional_Parthian}) (30)
- \p{Ps} \p{Open_Punctuation} (=
- \p{General_Category=Open_Punctuation})
- (72)
- X \p{PUA} \p{Private_Use_Area} (= \p{Block=
- Private_Use_Area}) (6400)
- \p{Punct} \p{General_Category=Punctuation} (Short:
- \p{P}; NOT \p{General_Punctuation}) (632)
- \p{Punctuation} \p{Punct} (= \p{General_Category=
- Punctuation}) (NOT
- \p{General_Punctuation}) (632)
- \p{Qaac} \p{Coptic} (= \p{Script=Coptic}) (NOT
- \p{Block=Coptic}) (137)
- \p{Qaai} \p{Inherited} (= \p{Script=Inherited})
- (523)
- \p{QMark} \p{Quotation_Mark} (= \p{Quotation_Mark=
- Y}) (29)
- \p{QMark: *} \p{Quotation_Mark: *}
- \p{Quotation_Mark} \p{Quotation_Mark=Y} (Short: \p{QMark})
- (29)
- \p{Quotation_Mark: N*} (Short: \p{QMark=N}, \P{QMark}) (1_114_083)
- \p{Quotation_Mark: Y*} (Short: \p{QMark=Y}, \p{QMark}) (29)
- \p{Radical} \p{Radical=Y} (329)
- \p{Radical: N*} (Single: \P{Radical}) (1_113_783)
- \p{Radical: Y*} (Single: \p{Radical}) (329)
- \p{Rejang} \p{Script=Rejang} (Short: \p{Rjng}; NOT
- \p{Block=Rejang}) (37)
- \p{Rjng} \p{Rejang} (= \p{Script=Rejang}) (NOT
- \p{Block=Rejang}) (37)
- X \p{Rumi} \p{Rumi_Numeral_Symbols} (= \p{Block=
- Rumi_Numeral_Symbols}) (32)
- X \p{Rumi_Numeral_Symbols} \p{Block=Rumi_Numeral_Symbols} (Short:
- \p{InRumi}) (32)
- \p{Runic} \p{Script=Runic} (Short: \p{Runr}; NOT
- \p{Block=Runic}) (78)
- \p{Runr} \p{Runic} (= \p{Script=Runic}) (NOT
- \p{Block=Runic}) (78)
- \p{S} \p{Symbol} (= \p{General_Category=Symbol})
- (5520)
- \p{Samaritan} \p{Script=Samaritan} (Short: \p{Samr}; NOT
- \p{Block=Samaritan}) (61)
- \p{Samr} \p{Samaritan} (= \p{Script=Samaritan})
- (NOT \p{Block=Samaritan}) (61)
- \p{Sarb} \p{Old_South_Arabian} (= \p{Script=
- Old_South_Arabian}) (32)
- \p{Saur} \p{Saurashtra} (= \p{Script=Saurashtra})
- (NOT \p{Block=Saurashtra}) (81)
- \p{Saurashtra} \p{Script=Saurashtra} (Short: \p{Saur};
- NOT \p{Block=Saurashtra}) (81)
- \p{SB: *} \p{Sentence_Break: *}
- \p{Sc} \p{Currency_Symbol} (=
- \p{General_Category=Currency_Symbol})
- (49)
- \p{Sc: *} \p{Script: *}
- \p{Script: Arab} \p{Script=Arabic} (1235)
- \p{Script: Arabic} (Short: \p{Sc=Arab}, \p{Arab}) (1235)
- \p{Script: Armenian} (Short: \p{Sc=Armn}, \p{Armn}) (91)
- \p{Script: Armi} \p{Script=Imperial_Aramaic} (31)
- \p{Script: Armn} \p{Script=Armenian} (91)
- \p{Script: Avestan} (Short: \p{Sc=Avst}, \p{Avst}) (61)
- \p{Script: Avst} \p{Script=Avestan} (61)
- \p{Script: Bali} \p{Script=Balinese} (121)
- \p{Script: Balinese} (Short: \p{Sc=Bali}, \p{Bali}) (121)
- \p{Script: Bamu} \p{Script=Bamum} (657)
- \p{Script: Bamum} (Short: \p{Sc=Bamu}, \p{Bamu}) (657)
- \p{Script: Batak} (Short: \p{Sc=Batk}, \p{Batk}) (56)
- \p{Script: Batk} \p{Script=Batak} (56)
- \p{Script: Beng} \p{Script=Bengali} (92)
- \p{Script: Bengali} (Short: \p{Sc=Beng}, \p{Beng}) (92)
- \p{Script: Bopo} \p{Script=Bopomofo} (70)
- \p{Script: Bopomofo} (Short: \p{Sc=Bopo}, \p{Bopo}) (70)
- \p{Script: Brah} \p{Script=Brahmi} (108)
- \p{Script: Brahmi} (Short: \p{Sc=Brah}, \p{Brah}) (108)
- \p{Script: Brai} \p{Script=Braille} (256)
- \p{Script: Braille} (Short: \p{Sc=Brai}, \p{Brai}) (256)
- \p{Script: Bugi} \p{Script=Buginese} (30)
- \p{Script: Buginese} (Short: \p{Sc=Bugi}, \p{Bugi}) (30)
- \p{Script: Buhd} \p{Script=Buhid} (20)
- \p{Script: Buhid} (Short: \p{Sc=Buhd}, \p{Buhd}) (20)
- \p{Script: Cakm} \p{Script=Chakma} (67)
- \p{Script: Canadian_Aboriginal} (Short: \p{Sc=Cans}, \p{Cans})
- (710)
- \p{Script: Cans} \p{Script=Canadian_Aboriginal} (710)
- \p{Script: Cari} \p{Script=Carian} (49)
- \p{Script: Carian} (Short: \p{Sc=Cari}, \p{Cari}) (49)
- \p{Script: Chakma} (Short: \p{Sc=Cakm}, \p{Cakm}) (67)
- \p{Script: Cham} (Short: \p{Sc=Cham}, \p{Cham}) (83)
- \p{Script: Cher} \p{Script=Cherokee} (85)
- \p{Script: Cherokee} (Short: \p{Sc=Cher}, \p{Cher}) (85)
- \p{Script: Common} (Short: \p{Sc=Zyyy}, \p{Zyyy}) (6413)
- \p{Script: Copt} \p{Script=Coptic} (137)
- \p{Script: Coptic} (Short: \p{Sc=Copt}, \p{Copt}) (137)
- \p{Script: Cprt} \p{Script=Cypriot} (55)
- \p{Script: Cuneiform} (Short: \p{Sc=Xsux}, \p{Xsux}) (982)
- \p{Script: Cypriot} (Short: \p{Sc=Cprt}, \p{Cprt}) (55)
- \p{Script: Cyrillic} (Short: \p{Sc=Cyrl}, \p{Cyrl}) (417)
- \p{Script: Cyrl} \p{Script=Cyrillic} (417)
- \p{Script: Deseret} (Short: \p{Sc=Dsrt}, \p{Dsrt}) (80)
- \p{Script: Deva} \p{Script=Devanagari} (151)
- \p{Script: Devanagari} (Short: \p{Sc=Deva}, \p{Deva}) (151)
- \p{Script: Dsrt} \p{Script=Deseret} (80)
- \p{Script: Egyp} \p{Script=Egyptian_Hieroglyphs} (1071)
- \p{Script: Egyptian_Hieroglyphs} (Short: \p{Sc=Egyp}, \p{Egyp})
- (1071)
- \p{Script: Ethi} \p{Script=Ethiopic} (495)
- \p{Script: Ethiopic} (Short: \p{Sc=Ethi}, \p{Ethi}) (495)
- \p{Script: Geor} \p{Script=Georgian} (127)
- \p{Script: Georgian} (Short: \p{Sc=Geor}, \p{Geor}) (127)
- \p{Script: Glag} \p{Script=Glagolitic} (94)
- \p{Script: Glagolitic} (Short: \p{Sc=Glag}, \p{Glag}) (94)
- \p{Script: Goth} \p{Script=Gothic} (27)
- \p{Script: Gothic} (Short: \p{Sc=Goth}, \p{Goth}) (27)
- \p{Script: Greek} (Short: \p{Sc=Grek}, \p{Grek}) (511)
- \p{Script: Grek} \p{Script=Greek} (511)
- \p{Script: Gujarati} (Short: \p{Sc=Gujr}, \p{Gujr}) (84)
- \p{Script: Gujr} \p{Script=Gujarati} (84)
- \p{Script: Gurmukhi} (Short: \p{Sc=Guru}, \p{Guru}) (79)
- \p{Script: Guru} \p{Script=Gurmukhi} (79)
- \p{Script: Han} (Short: \p{Sc=Han}, \p{Han}) (75_963)
- \p{Script: Hang} \p{Script=Hangul} (11_739)
- \p{Script: Hangul} (Short: \p{Sc=Hang}, \p{Hang}) (11_739)
- \p{Script: Hani} \p{Script=Han} (75_963)
- \p{Script: Hano} \p{Script=Hanunoo} (21)
- \p{Script: Hanunoo} (Short: \p{Sc=Hano}, \p{Hano}) (21)
- \p{Script: Hebr} \p{Script=Hebrew} (133)
- \p{Script: Hebrew} (Short: \p{Sc=Hebr}, \p{Hebr}) (133)
- \p{Script: Hira} \p{Script=Hiragana} (91)
- \p{Script: Hiragana} (Short: \p{Sc=Hira}, \p{Hira}) (91)
- \p{Script: Imperial_Aramaic} (Short: \p{Sc=Armi}, \p{Armi}) (31)
- \p{Script: Inherited} (Short: \p{Sc=Zinh}, \p{Zinh}) (523)
- \p{Script: Inscriptional_Pahlavi} (Short: \p{Sc=Phli}, \p{Phli})
- (27)
- \p{Script: Inscriptional_Parthian} (Short: \p{Sc=Prti}, \p{Prti})
- (30)
- \p{Script: Ital} \p{Script=Old_Italic} (35)
- \p{Script: Java} \p{Script=Javanese} (91)
- \p{Script: Javanese} (Short: \p{Sc=Java}, \p{Java}) (91)
- \p{Script: Kaithi} (Short: \p{Sc=Kthi}, \p{Kthi}) (66)
- \p{Script: Kali} \p{Script=Kayah_Li} (48)
- \p{Script: Kana} \p{Script=Katakana} (300)
- \p{Script: Kannada} (Short: \p{Sc=Knda}, \p{Knda}) (86)
- \p{Script: Katakana} (Short: \p{Sc=Kana}, \p{Kana}) (300)
- \p{Script: Kayah_Li} (Short: \p{Sc=Kali}, \p{Kali}) (48)
- \p{Script: Khar} \p{Script=Kharoshthi} (65)
- \p{Script: Kharoshthi} (Short: \p{Sc=Khar}, \p{Khar}) (65)
- \p{Script: Khmer} (Short: \p{Sc=Khmr}, \p{Khmr}) (146)
- \p{Script: Khmr} \p{Script=Khmer} (146)
- \p{Script: Knda} \p{Script=Kannada} (86)
- \p{Script: Kthi} \p{Script=Kaithi} (66)
- \p{Script: Lana} \p{Script=Tai_Tham} (127)
- \p{Script: Lao} (Short: \p{Sc=Lao}, \p{Lao}) (67)
- \p{Script: Laoo} \p{Script=Lao} (67)
- \p{Script: Latin} (Short: \p{Sc=Latn}, \p{Latn}) (1272)
- \p{Script: Latn} \p{Script=Latin} (1272)
- \p{Script: Lepc} \p{Script=Lepcha} (74)
- \p{Script: Lepcha} (Short: \p{Sc=Lepc}, \p{Lepc}) (74)
- \p{Script: Limb} \p{Script=Limbu} (66)
- \p{Script: Limbu} (Short: \p{Sc=Limb}, \p{Limb}) (66)
- \p{Script: Linb} \p{Script=Linear_B} (211)
- \p{Script: Linear_B} (Short: \p{Sc=Linb}, \p{Linb}) (211)
- \p{Script: Lisu} (Short: \p{Sc=Lisu}, \p{Lisu}) (48)
- \p{Script: Lyci} \p{Script=Lycian} (29)
- \p{Script: Lycian} (Short: \p{Sc=Lyci}, \p{Lyci}) (29)
- \p{Script: Lydi} \p{Script=Lydian} (27)
- \p{Script: Lydian} (Short: \p{Sc=Lydi}, \p{Lydi}) (27)
- \p{Script: Malayalam} (Short: \p{Sc=Mlym}, \p{Mlym}) (98)
- \p{Script: Mand} \p{Script=Mandaic} (29)
- \p{Script: Mandaic} (Short: \p{Sc=Mand}, \p{Mand}) (29)
- \p{Script: Meetei_Mayek} (Short: \p{Sc=Mtei}, \p{Mtei}) (79)
- \p{Script: Merc} \p{Script=Meroitic_Cursive} (26)
- \p{Script: Mero} \p{Script=Meroitic_Hieroglyphs} (32)
- \p{Script: Meroitic_Cursive} (Short: \p{Sc=Merc}, \p{Merc}) (26)
- \p{Script: Meroitic_Hieroglyphs} (Short: \p{Sc=Mero}, \p{Mero})
- (32)
- \p{Script: Miao} (Short: \p{Sc=Miao}, \p{Miao}) (133)
- \p{Script: Mlym} \p{Script=Malayalam} (98)
- \p{Script: Mong} \p{Script=Mongolian} (153)
- \p{Script: Mongolian} (Short: \p{Sc=Mong}, \p{Mong}) (153)
- \p{Script: Mtei} \p{Script=Meetei_Mayek} (79)
- \p{Script: Myanmar} (Short: \p{Sc=Mymr}, \p{Mymr}) (188)
- \p{Script: Mymr} \p{Script=Myanmar} (188)
- \p{Script: New_Tai_Lue} (Short: \p{Sc=Talu}, \p{Talu}) (83)
- \p{Script: Nko} (Short: \p{Sc=Nko}, \p{Nko}) (59)
- \p{Script: Nkoo} \p{Script=Nko} (59)
- \p{Script: Ogam} \p{Script=Ogham} (29)
- \p{Script: Ogham} (Short: \p{Sc=Ogam}, \p{Ogam}) (29)
- \p{Script: Ol_Chiki} (Short: \p{Sc=Olck}, \p{Olck}) (48)
- \p{Script: Olck} \p{Script=Ol_Chiki} (48)
- \p{Script: Old_Italic} (Short: \p{Sc=Ital}, \p{Ital}) (35)
- \p{Script: Old_Persian} (Short: \p{Sc=Xpeo}, \p{Xpeo}) (50)
- \p{Script: Old_South_Arabian} (Short: \p{Sc=Sarb}, \p{Sarb}) (32)
- \p{Script: Old_Turkic} (Short: \p{Sc=Orkh}, \p{Orkh}) (73)
- \p{Script: Oriya} (Short: \p{Sc=Orya}, \p{Orya}) (90)
- \p{Script: Orkh} \p{Script=Old_Turkic} (73)
- \p{Script: Orya} \p{Script=Oriya} (90)
- \p{Script: Osma} \p{Script=Osmanya} (40)
- \p{Script: Osmanya} (Short: \p{Sc=Osma}, \p{Osma}) (40)
- \p{Script: Phag} \p{Script=Phags_Pa} (56)
- \p{Script: Phags_Pa} (Short: \p{Sc=Phag}, \p{Phag}) (56)
- \p{Script: Phli} \p{Script=Inscriptional_Pahlavi} (27)
- \p{Script: Phnx} \p{Script=Phoenician} (29)
- \p{Script: Phoenician} (Short: \p{Sc=Phnx}, \p{Phnx}) (29)
- \p{Script: Plrd} \p{Script=Miao} (133)
- \p{Script: Prti} \p{Script=Inscriptional_Parthian} (30)
- \p{Script: Qaac} \p{Script=Coptic} (137)
- \p{Script: Qaai} \p{Script=Inherited} (523)
- \p{Script: Rejang} (Short: \p{Sc=Rjng}, \p{Rjng}) (37)
- \p{Script: Rjng} \p{Script=Rejang} (37)
- \p{Script: Runic} (Short: \p{Sc=Runr}, \p{Runr}) (78)
- \p{Script: Runr} \p{Script=Runic} (78)
- \p{Script: Samaritan} (Short: \p{Sc=Samr}, \p{Samr}) (61)
- \p{Script: Samr} \p{Script=Samaritan} (61)
- \p{Script: Sarb} \p{Script=Old_South_Arabian} (32)
- \p{Script: Saur} \p{Script=Saurashtra} (81)
- \p{Script: Saurashtra} (Short: \p{Sc=Saur}, \p{Saur}) (81)
- \p{Script: Sharada} (Short: \p{Sc=Shrd}, \p{Shrd}) (83)
- \p{Script: Shavian} (Short: \p{Sc=Shaw}, \p{Shaw}) (48)
- \p{Script: Shaw} \p{Script=Shavian} (48)
- \p{Script: Shrd} \p{Script=Sharada} (83)
- \p{Script: Sinh} \p{Script=Sinhala} (80)
- \p{Script: Sinhala} (Short: \p{Sc=Sinh}, \p{Sinh}) (80)
- \p{Script: Sora} \p{Script=Sora_Sompeng} (35)
- \p{Script: Sora_Sompeng} (Short: \p{Sc=Sora}, \p{Sora}) (35)
- \p{Script: Sund} \p{Script=Sundanese} (72)
- \p{Script: Sundanese} (Short: \p{Sc=Sund}, \p{Sund}) (72)
- \p{Script: Sylo} \p{Script=Syloti_Nagri} (44)
- \p{Script: Syloti_Nagri} (Short: \p{Sc=Sylo}, \p{Sylo}) (44)
- \p{Script: Syrc} \p{Script=Syriac} (77)
- \p{Script: Syriac} (Short: \p{Sc=Syrc}, \p{Syrc}) (77)
- \p{Script: Tagalog} (Short: \p{Sc=Tglg}, \p{Tglg}) (20)
- \p{Script: Tagb} \p{Script=Tagbanwa} (18)
- \p{Script: Tagbanwa} (Short: \p{Sc=Tagb}, \p{Tagb}) (18)
- \p{Script: Tai_Le} (Short: \p{Sc=Tale}, \p{Tale}) (35)
- \p{Script: Tai_Tham} (Short: \p{Sc=Lana}, \p{Lana}) (127)
- \p{Script: Tai_Viet} (Short: \p{Sc=Tavt}, \p{Tavt}) (72)
- \p{Script: Takr} \p{Script=Takri} (66)
- \p{Script: Takri} (Short: \p{Sc=Takr}, \p{Takr}) (66)
- \p{Script: Tale} \p{Script=Tai_Le} (35)
- \p{Script: Talu} \p{Script=New_Tai_Lue} (83)
- \p{Script: Tamil} (Short: \p{Sc=Taml}, \p{Taml}) (72)
- \p{Script: Taml} \p{Script=Tamil} (72)
- \p{Script: Tavt} \p{Script=Tai_Viet} (72)
- \p{Script: Telu} \p{Script=Telugu} (93)
- \p{Script: Telugu} (Short: \p{Sc=Telu}, \p{Telu}) (93)
- \p{Script: Tfng} \p{Script=Tifinagh} (59)
- \p{Script: Tglg} \p{Script=Tagalog} (20)
- \p{Script: Thaa} \p{Script=Thaana} (50)
- \p{Script: Thaana} (Short: \p{Sc=Thaa}, \p{Thaa}) (50)
- \p{Script: Thai} (Short: \p{Sc=Thai}, \p{Thai}) (86)
- \p{Script: Tibetan} (Short: \p{Sc=Tibt}, \p{Tibt}) (207)
- \p{Script: Tibt} \p{Script=Tibetan} (207)
- \p{Script: Tifinagh} (Short: \p{Sc=Tfng}, \p{Tfng}) (59)
- \p{Script: Ugar} \p{Script=Ugaritic} (31)
- \p{Script: Ugaritic} (Short: \p{Sc=Ugar}, \p{Ugar}) (31)
- \p{Script: Unknown} (Short: \p{Sc=Zzzz}, \p{Zzzz}) (1_003_930)
- \p{Script: Vai} (Short: \p{Sc=Vai}, \p{Vai}) (300)
- \p{Script: Vaii} \p{Script=Vai} (300)
- \p{Script: Xpeo} \p{Script=Old_Persian} (50)
- \p{Script: Xsux} \p{Script=Cuneiform} (982)
- \p{Script: Yi} (Short: \p{Sc=Yi}, \p{Yi}) (1220)
- \p{Script: Yiii} \p{Script=Yi} (1220)
- \p{Script: Zinh} \p{Script=Inherited} (523)
- \p{Script: Zyyy} \p{Script=Common} (6413)
- \p{Script: Zzzz} \p{Script=Unknown} (1_003_930)
- \p{Script_Extensions: Arab} \p{Script_Extensions=Arabic} (1262)
- \p{Script_Extensions: Arabic} (Short: \p{Scx=Arab}) (1262)
- \p{Script_Extensions: Armenian} (Short: \p{Scx=Armn}) (92)
- \p{Script_Extensions: Armi} \p{Script_Extensions=Imperial_Aramaic}
- (31)
- \p{Script_Extensions: Armn} \p{Script_Extensions=Armenian} (92)
- \p{Script_Extensions: Avestan} (Short: \p{Scx=Avst}) (61)
- \p{Script_Extensions: Avst} \p{Script_Extensions=Avestan} (61)
- \p{Script_Extensions: Bali} \p{Script_Extensions=Balinese} (121)
- \p{Script_Extensions: Balinese} (Short: \p{Scx=Bali}) (121)
- \p{Script_Extensions: Bamu} \p{Script_Extensions=Bamum} (657)
- \p{Script_Extensions: Bamum} (Short: \p{Scx=Bamu}) (657)
- \p{Script_Extensions: Batak} (Short: \p{Scx=Batk}) (56)
- \p{Script_Extensions: Batk} \p{Script_Extensions=Batak} (56)
- \p{Script_Extensions: Beng} \p{Script_Extensions=Bengali} (94)
- \p{Script_Extensions: Bengali} (Short: \p{Scx=Beng}) (94)
- \p{Script_Extensions: Bopo} \p{Script_Extensions=Bopomofo} (306)
- \p{Script_Extensions: Bopomofo} (Short: \p{Scx=Bopo}) (306)
- \p{Script_Extensions: Brah} \p{Script_Extensions=Brahmi} (108)
- \p{Script_Extensions: Brahmi} (Short: \p{Scx=Brah}) (108)
- \p{Script_Extensions: Brai} \p{Script_Extensions=Braille} (256)
- \p{Script_Extensions: Braille} (Short: \p{Scx=Brai}) (256)
- \p{Script_Extensions: Bugi} \p{Script_Extensions=Buginese} (30)
- \p{Script_Extensions: Buginese} (Short: \p{Scx=Bugi}) (30)
- \p{Script_Extensions: Buhd} \p{Script_Extensions=Buhid} (22)
- \p{Script_Extensions: Buhid} (Short: \p{Scx=Buhd}) (22)
- \p{Script_Extensions: Cakm} \p{Script_Extensions=Chakma} (67)
- \p{Script_Extensions: Canadian_Aboriginal} (Short: \p{Scx=Cans})
- (710)
- \p{Script_Extensions: Cans} \p{Script_Extensions=
- Canadian_Aboriginal} (710)
- \p{Script_Extensions: Cari} \p{Script_Extensions=Carian} (49)
- \p{Script_Extensions: Carian} (Short: \p{Scx=Cari}) (49)
- \p{Script_Extensions: Chakma} (Short: \p{Scx=Cakm}) (67)
- \p{Script_Extensions: Cham} (Short: \p{Scx=Cham}) (83)
- \p{Script_Extensions: Cher} \p{Script_Extensions=Cherokee} (85)
- \p{Script_Extensions: Cherokee} (Short: \p{Scx=Cher}) (85)
- \p{Script_Extensions: Common} (Short: \p{Scx=Zyyy}) (6057)
- \p{Script_Extensions: Copt} \p{Script_Extensions=Coptic} (137)
- \p{Script_Extensions: Coptic} (Short: \p{Scx=Copt}) (137)
- \p{Script_Extensions: Cprt} \p{Script_Extensions=Cypriot} (112)
- \p{Script_Extensions: Cuneiform} (Short: \p{Scx=Xsux}) (982)
- \p{Script_Extensions: Cypriot} (Short: \p{Scx=Cprt}) (112)
- \p{Script_Extensions: Cyrillic} (Short: \p{Scx=Cyrl}) (419)
- \p{Script_Extensions: Cyrl} \p{Script_Extensions=Cyrillic} (419)
- \p{Script_Extensions: Deseret} (Short: \p{Scx=Dsrt}) (80)
- \p{Script_Extensions: Deva} \p{Script_Extensions=Devanagari} (193)
- \p{Script_Extensions: Devanagari} (Short: \p{Scx=Deva}) (193)
- \p{Script_Extensions: Dsrt} \p{Script_Extensions=Deseret} (80)
- \p{Script_Extensions: Egyp} \p{Script_Extensions=
- Egyptian_Hieroglyphs} (1071)
- \p{Script_Extensions: Egyptian_Hieroglyphs} (Short: \p{Scx=Egyp})
- (1071)
- \p{Script_Extensions: Ethi} \p{Script_Extensions=Ethiopic} (495)
- \p{Script_Extensions: Ethiopic} (Short: \p{Scx=Ethi}) (495)
- \p{Script_Extensions: Geor} \p{Script_Extensions=Georgian} (128)
- \p{Script_Extensions: Georgian} (Short: \p{Scx=Geor}) (128)
- \p{Script_Extensions: Glag} \p{Script_Extensions=Glagolitic} (94)
- \p{Script_Extensions: Glagolitic} (Short: \p{Scx=Glag}) (94)
- \p{Script_Extensions: Goth} \p{Script_Extensions=Gothic} (27)
- \p{Script_Extensions: Gothic} (Short: \p{Scx=Goth}) (27)
- \p{Script_Extensions: Greek} (Short: \p{Scx=Grek}) (515)
- \p{Script_Extensions: Grek} \p{Script_Extensions=Greek} (515)
- \p{Script_Extensions: Gujarati} (Short: \p{Scx=Gujr}) (94)
- \p{Script_Extensions: Gujr} \p{Script_Extensions=Gujarati} (94)
- \p{Script_Extensions: Gurmukhi} (Short: \p{Scx=Guru}) (91)
- \p{Script_Extensions: Guru} \p{Script_Extensions=Gurmukhi} (91)
- \p{Script_Extensions: Han} (Short: \p{Scx=Han}) (76_218)
- \p{Script_Extensions: Hang} \p{Script_Extensions=Hangul} (11_971)
- \p{Script_Extensions: Hangul} (Short: \p{Scx=Hang}) (11_971)
- \p{Script_Extensions: Hani} \p{Script_Extensions=Han} (76_218)
- \p{Script_Extensions: Hano} \p{Script_Extensions=Hanunoo} (23)
- \p{Script_Extensions: Hanunoo} (Short: \p{Scx=Hano}) (23)
- \p{Script_Extensions: Hebr} \p{Script_Extensions=Hebrew} (133)
- \p{Script_Extensions: Hebrew} (Short: \p{Scx=Hebr}) (133)
- \p{Script_Extensions: Hira} \p{Script_Extensions=Hiragana} (356)
- \p{Script_Extensions: Hiragana} (Short: \p{Scx=Hira}) (356)
- \p{Script_Extensions: Imperial_Aramaic} (Short: \p{Scx=Armi}) (31)
- \p{Script_Extensions: Inherited} (Short: \p{Scx=Zinh}) (459)
- \p{Script_Extensions: Inscriptional_Pahlavi} (Short: \p{Scx=Phli})
- (27)
- \p{Script_Extensions: Inscriptional_Parthian} (Short: \p{Scx=
- Prti}) (30)
- \p{Script_Extensions: Ital} \p{Script_Extensions=Old_Italic} (35)
- \p{Script_Extensions: Java} \p{Script_Extensions=Javanese} (91)
- \p{Script_Extensions: Javanese} (Short: \p{Scx=Java}) (91)
- \p{Script_Extensions: Kaithi} (Short: \p{Scx=Kthi}) (76)
- \p{Script_Extensions: Kali} \p{Script_Extensions=Kayah_Li} (48)
- \p{Script_Extensions: Kana} \p{Script_Extensions=Katakana} (565)
- \p{Script_Extensions: Kannada} (Short: \p{Scx=Knda}) (86)
- \p{Script_Extensions: Katakana} (Short: \p{Scx=Kana}) (565)
- \p{Script_Extensions: Kayah_Li} (Short: \p{Scx=Kali}) (48)
- \p{Script_Extensions: Khar} \p{Script_Extensions=Kharoshthi} (65)
- \p{Script_Extensions: Kharoshthi} (Short: \p{Scx=Khar}) (65)
- \p{Script_Extensions: Khmer} (Short: \p{Scx=Khmr}) (146)
- \p{Script_Extensions: Khmr} \p{Script_Extensions=Khmer} (146)
- \p{Script_Extensions: Knda} \p{Script_Extensions=Kannada} (86)
- \p{Script_Extensions: Kthi} \p{Script_Extensions=Kaithi} (76)
- \p{Script_Extensions: Lana} \p{Script_Extensions=Tai_Tham} (127)
- \p{Script_Extensions: Lao} (Short: \p{Scx=Lao}) (67)
- \p{Script_Extensions: Laoo} \p{Script_Extensions=Lao} (67)
- \p{Script_Extensions: Latin} (Short: \p{Scx=Latn}) (1289)
- \p{Script_Extensions: Latn} \p{Script_Extensions=Latin} (1289)
- \p{Script_Extensions: Lepc} \p{Script_Extensions=Lepcha} (74)
- \p{Script_Extensions: Lepcha} (Short: \p{Scx=Lepc}) (74)
- \p{Script_Extensions: Limb} \p{Script_Extensions=Limbu} (66)
- \p{Script_Extensions: Limbu} (Short: \p{Scx=Limb}) (66)
- \p{Script_Extensions: Linb} \p{Script_Extensions=Linear_B} (268)
- \p{Script_Extensions: Linear_B} (Short: \p{Scx=Linb}) (268)
- \p{Script_Extensions: Lisu} (Short: \p{Scx=Lisu}) (48)
- \p{Script_Extensions: Lyci} \p{Script_Extensions=Lycian} (29)
- \p{Script_Extensions: Lycian} (Short: \p{Scx=Lyci}) (29)
- \p{Script_Extensions: Lydi} \p{Script_Extensions=Lydian} (27)
- \p{Script_Extensions: Lydian} (Short: \p{Scx=Lydi}) (27)
- \p{Script_Extensions: Malayalam} (Short: \p{Scx=Mlym}) (98)
- \p{Script_Extensions: Mand} \p{Script_Extensions=Mandaic} (30)
- \p{Script_Extensions: Mandaic} (Short: \p{Scx=Mand}) (30)
- \p{Script_Extensions: Meetei_Mayek} (Short: \p{Scx=Mtei}) (79)
- \p{Script_Extensions: Merc} \p{Script_Extensions=Meroitic_Cursive}
- (26)
- \p{Script_Extensions: Mero} \p{Script_Extensions=
- Meroitic_Hieroglyphs} (32)
- \p{Script_Extensions: Meroitic_Cursive} (Short: \p{Scx=Merc}) (26)
- \p{Script_Extensions: Meroitic_Hieroglyphs} (Short: \p{Scx=Mero})
- (32)
- \p{Script_Extensions: Miao} (Short: \p{Scx=Miao}) (133)
- \p{Script_Extensions: Mlym} \p{Script_Extensions=Malayalam} (98)
- \p{Script_Extensions: Mong} \p{Script_Extensions=Mongolian} (156)
- \p{Script_Extensions: Mongolian} (Short: \p{Scx=Mong}) (156)
- \p{Script_Extensions: Mtei} \p{Script_Extensions=Meetei_Mayek} (79)
- \p{Script_Extensions: Myanmar} (Short: \p{Scx=Mymr}) (188)
- \p{Script_Extensions: Mymr} \p{Script_Extensions=Myanmar} (188)
- \p{Script_Extensions: New_Tai_Lue} (Short: \p{Scx=Talu}) (83)
- \p{Script_Extensions: Nko} (Short: \p{Scx=Nko}) (59)
- \p{Script_Extensions: Nkoo} \p{Script_Extensions=Nko} (59)
- \p{Script_Extensions: Ogam} \p{Script_Extensions=Ogham} (29)
- \p{Script_Extensions: Ogham} (Short: \p{Scx=Ogam}) (29)
- \p{Script_Extensions: Ol_Chiki} (Short: \p{Scx=Olck}) (48)
- \p{Script_Extensions: Olck} \p{Script_Extensions=Ol_Chiki} (48)
- \p{Script_Extensions: Old_Italic} (Short: \p{Scx=Ital}) (35)
- \p{Script_Extensions: Old_Persian} (Short: \p{Scx=Xpeo}) (50)
- \p{Script_Extensions: Old_South_Arabian} (Short: \p{Scx=Sarb}) (32)
- \p{Script_Extensions: Old_Turkic} (Short: \p{Scx=Orkh}) (73)
- \p{Script_Extensions: Oriya} (Short: \p{Scx=Orya}) (92)
- \p{Script_Extensions: Orkh} \p{Script_Extensions=Old_Turkic} (73)
- \p{Script_Extensions: Orya} \p{Script_Extensions=Oriya} (92)
- \p{Script_Extensions: Osma} \p{Script_Extensions=Osmanya} (40)
- \p{Script_Extensions: Osmanya} (Short: \p{Scx=Osma}) (40)
- \p{Script_Extensions: Phag} \p{Script_Extensions=Phags_Pa} (59)
- \p{Script_Extensions: Phags_Pa} (Short: \p{Scx=Phag}) (59)
- \p{Script_Extensions: Phli} \p{Script_Extensions=
- Inscriptional_Pahlavi} (27)
- \p{Script_Extensions: Phnx} \p{Script_Extensions=Phoenician} (29)
- \p{Script_Extensions: Phoenician} (Short: \p{Scx=Phnx}) (29)
- \p{Script_Extensions: Plrd} \p{Script_Extensions=Miao} (133)
- \p{Script_Extensions: Prti} \p{Script_Extensions=
- Inscriptional_Parthian} (30)
- \p{Script_Extensions: Qaac} \p{Script_Extensions=Coptic} (137)
- \p{Script_Extensions: Qaai} \p{Script_Extensions=Inherited} (459)
- \p{Script_Extensions: Rejang} (Short: \p{Scx=Rjng}) (37)
- \p{Script_Extensions: Rjng} \p{Script_Extensions=Rejang} (37)
- \p{Script_Extensions: Runic} (Short: \p{Scx=Runr}) (78)
- \p{Script_Extensions: Runr} \p{Script_Extensions=Runic} (78)
- \p{Script_Extensions: Samaritan} (Short: \p{Scx=Samr}) (61)
- \p{Script_Extensions: Samr} \p{Script_Extensions=Samaritan} (61)
- \p{Script_Extensions: Sarb} \p{Script_Extensions=
- Old_South_Arabian} (32)
- \p{Script_Extensions: Saur} \p{Script_Extensions=Saurashtra} (81)
- \p{Script_Extensions: Saurashtra} (Short: \p{Scx=Saur}) (81)
- \p{Script_Extensions: Sharada} (Short: \p{Scx=Shrd}) (83)
- \p{Script_Extensions: Shavian} (Short: \p{Scx=Shaw}) (48)
- \p{Script_Extensions: Shaw} \p{Script_Extensions=Shavian} (48)
- \p{Script_Extensions: Shrd} \p{Script_Extensions=Sharada} (83)
- \p{Script_Extensions: Sinh} \p{Script_Extensions=Sinhala} (80)
- \p{Script_Extensions: Sinhala} (Short: \p{Scx=Sinh}) (80)
- \p{Script_Extensions: Sora} \p{Script_Extensions=Sora_Sompeng} (35)
- \p{Script_Extensions: Sora_Sompeng} (Short: \p{Scx=Sora}) (35)
- \p{Script_Extensions: Sund} \p{Script_Extensions=Sundanese} (72)
- \p{Script_Extensions: Sundanese} (Short: \p{Scx=Sund}) (72)
- \p{Script_Extensions: Sylo} \p{Script_Extensions=Syloti_Nagri} (44)
- \p{Script_Extensions: Syloti_Nagri} (Short: \p{Scx=Sylo}) (44)
- \p{Script_Extensions: Syrc} \p{Script_Extensions=Syriac} (93)
- \p{Script_Extensions: Syriac} (Short: \p{Scx=Syrc}) (93)
- \p{Script_Extensions: Tagalog} (Short: \p{Scx=Tglg}) (22)
- \p{Script_Extensions: Tagb} \p{Script_Extensions=Tagbanwa} (20)
- \p{Script_Extensions: Tagbanwa} (Short: \p{Scx=Tagb}) (20)
- \p{Script_Extensions: Tai_Le} (Short: \p{Scx=Tale}) (35)
- \p{Script_Extensions: Tai_Tham} (Short: \p{Scx=Lana}) (127)
- \p{Script_Extensions: Tai_Viet} (Short: \p{Scx=Tavt}) (72)
- \p{Script_Extensions: Takr} \p{Script_Extensions=Takri} (78)
- \p{Script_Extensions: Takri} (Short: \p{Scx=Takr}) (78)
- \p{Script_Extensions: Tale} \p{Script_Extensions=Tai_Le} (35)
- \p{Script_Extensions: Talu} \p{Script_Extensions=New_Tai_Lue} (83)
- \p{Script_Extensions: Tamil} (Short: \p{Scx=Taml}) (72)
- \p{Script_Extensions: Taml} \p{Script_Extensions=Tamil} (72)
- \p{Script_Extensions: Tavt} \p{Script_Extensions=Tai_Viet} (72)
- \p{Script_Extensions: Telu} \p{Script_Extensions=Telugu} (93)
- \p{Script_Extensions: Telugu} (Short: \p{Scx=Telu}) (93)
- \p{Script_Extensions: Tfng} \p{Script_Extensions=Tifinagh} (59)
- \p{Script_Extensions: Tglg} \p{Script_Extensions=Tagalog} (22)
- \p{Script_Extensions: Thaa} \p{Script_Extensions=Thaana} (65)
- \p{Script_Extensions: Thaana} (Short: \p{Scx=Thaa}) (65)
- \p{Script_Extensions: Thai} (Short: \p{Scx=Thai}) (86)
- \p{Script_Extensions: Tibetan} (Short: \p{Scx=Tibt}) (207)
- \p{Script_Extensions: Tibt} \p{Script_Extensions=Tibetan} (207)
- \p{Script_Extensions: Tifinagh} (Short: \p{Scx=Tfng}) (59)
- \p{Script_Extensions: Ugar} \p{Script_Extensions=Ugaritic} (31)
- \p{Script_Extensions: Ugaritic} (Short: \p{Scx=Ugar}) (31)
- \p{Script_Extensions: Unknown} (Short: \p{Scx=Zzzz}) (1_003_930)
- \p{Script_Extensions: Vai} (Short: \p{Scx=Vai}) (300)
- \p{Script_Extensions: Vaii} \p{Script_Extensions=Vai} (300)
- \p{Script_Extensions: Xpeo} \p{Script_Extensions=Old_Persian} (50)
- \p{Script_Extensions: Xsux} \p{Script_Extensions=Cuneiform} (982)
- \p{Script_Extensions: Yi} (Short: \p{Scx=Yi}) (1246)
- \p{Script_Extensions: Yiii} \p{Script_Extensions=Yi} (1246)
- \p{Script_Extensions: Zinh} \p{Script_Extensions=Inherited} (459)
- \p{Script_Extensions: Zyyy} \p{Script_Extensions=Common} (6057)
- \p{Script_Extensions: Zzzz} \p{Script_Extensions=Unknown}
- (1_003_930)
- \p{Scx: *} \p{Script_Extensions: *}
- \p{SD} \p{Soft_Dotted} (= \p{Soft_Dotted=Y}) (46)
- \p{SD: *} \p{Soft_Dotted: *}
- \p{Sentence_Break: AT} \p{Sentence_Break=ATerm} (4)
- \p{Sentence_Break: ATerm} (Short: \p{SB=AT}) (4)
- \p{Sentence_Break: CL} \p{Sentence_Break=Close} (177)
- \p{Sentence_Break: Close} (Short: \p{SB=CL}) (177)
- \p{Sentence_Break: CR} (Short: \p{SB=CR}) (1)
- \p{Sentence_Break: EX} \p{Sentence_Break=Extend} (1649)
- \p{Sentence_Break: Extend} (Short: \p{SB=EX}) (1649)
- \p{Sentence_Break: FO} \p{Sentence_Break=Format} (137)
- \p{Sentence_Break: Format} (Short: \p{SB=FO}) (137)
- \p{Sentence_Break: LE} \p{Sentence_Break=OLetter} (97_841)
- \p{Sentence_Break: LF} (Short: \p{SB=LF}) (1)
- \p{Sentence_Break: LO} \p{Sentence_Break=Lower} (1933)
- \p{Sentence_Break: Lower} (Short: \p{SB=LO}) (1933)
- \p{Sentence_Break: NU} \p{Sentence_Break=Numeric} (452)
- \p{Sentence_Break: Numeric} (Short: \p{SB=NU}) (452)
- \p{Sentence_Break: OLetter} (Short: \p{SB=LE}) (97_841)
- \p{Sentence_Break: Other} (Short: \p{SB=XX}) (1_010_273)
- \p{Sentence_Break: SC} \p{Sentence_Break=SContinue} (26)
- \p{Sentence_Break: SContinue} (Short: \p{SB=SC}) (26)
- \p{Sentence_Break: SE} \p{Sentence_Break=Sep} (3)
- \p{Sentence_Break: Sep} (Short: \p{SB=SE}) (3)
- \p{Sentence_Break: Sp} (Short: \p{SB=Sp}) (21)
- \p{Sentence_Break: ST} \p{Sentence_Break=STerm} (80)
- \p{Sentence_Break: STerm} (Short: \p{SB=ST}) (80)
- \p{Sentence_Break: UP} \p{Sentence_Break=Upper} (1514)
- \p{Sentence_Break: Upper} (Short: \p{SB=UP}) (1514)
- \p{Sentence_Break: XX} \p{Sentence_Break=Other} (1_010_273)
- \p{Separator} \p{General_Category=Separator} (Short:
- \p{Z}) (20)
- \p{Sharada} \p{Script=Sharada} (Short: \p{Shrd}; NOT
- \p{Block=Sharada}) (83)
- \p{Shavian} \p{Script=Shavian} (Short: \p{Shaw}) (48)
- \p{Shaw} \p{Shavian} (= \p{Script=Shavian}) (48)
- \p{Shrd} \p{Sharada} (= \p{Script=Sharada}) (NOT
- \p{Block=Sharada}) (83)
- \p{Sinh} \p{Sinhala} (= \p{Script=Sinhala}) (NOT
- \p{Block=Sinhala}) (80)
- \p{Sinhala} \p{Script=Sinhala} (Short: \p{Sinh}; NOT
- \p{Block=Sinhala}) (80)
- \p{Sk} \p{Modifier_Symbol} (=
- \p{General_Category=Modifier_Symbol})
- (115)
- \p{Sm} \p{Math_Symbol} (= \p{General_Category=
- Math_Symbol}) (952)
- X \p{Small_Form_Variants} \p{Block=Small_Form_Variants} (Short:
- \p{InSmallForms}) (32)
- X \p{Small_Forms} \p{Small_Form_Variants} (= \p{Block=
- Small_Form_Variants}) (32)
- \p{So} \p{Other_Symbol} (= \p{General_Category=
- Other_Symbol}) (4404)
- \p{Soft_Dotted} \p{Soft_Dotted=Y} (Short: \p{SD}) (46)
- \p{Soft_Dotted: N*} (Short: \p{SD=N}, \P{SD}) (1_114_066)
- \p{Soft_Dotted: Y*} (Short: \p{SD=Y}, \p{SD}) (46)
- \p{Sora} \p{Sora_Sompeng} (= \p{Script=
- Sora_Sompeng}) (NOT \p{Block=
- Sora_Sompeng}) (35)
- \p{Sora_Sompeng} \p{Script=Sora_Sompeng} (Short: \p{Sora};
- NOT \p{Block=Sora_Sompeng}) (35)
- \p{Space} \p{White_Space=Y} \s including beyond
- ASCII and vertical tab (26)
- \p{Space: *} \p{White_Space: *}
- \p{Space_Separator} \p{General_Category=Space_Separator}
- (Short: \p{Zs}) (18)
- \p{SpacePerl} \p{XPerlSpace} (26)
- \p{Spacing_Mark} \p{General_Category=Spacing_Mark} (Short:
- \p{Mc}) (353)
- X \p{Spacing_Modifier_Letters} \p{Block=Spacing_Modifier_Letters}
- (Short: \p{InModifierLetters}) (80)
- X \p{Specials} \p{Block=Specials} (16)
- \p{STerm} \p{STerm=Y} (83)
- \p{STerm: N*} (Single: \P{STerm}) (1_114_029)
- \p{STerm: Y*} (Single: \p{STerm}) (83)
- \p{Sund} \p{Sundanese} (= \p{Script=Sundanese})
- (NOT \p{Block=Sundanese}) (72)
- \p{Sundanese} \p{Script=Sundanese} (Short: \p{Sund}; NOT
- \p{Block=Sundanese}) (72)
- X \p{Sundanese_Sup} \p{Sundanese_Supplement} (= \p{Block=
- Sundanese_Supplement}) (16)
- X \p{Sundanese_Supplement} \p{Block=Sundanese_Supplement} (Short:
- \p{InSundaneseSup}) (16)
- X \p{Sup_Arrows_A} \p{Supplemental_Arrows_A} (= \p{Block=
- Supplemental_Arrows_A}) (16)
- X \p{Sup_Arrows_B} \p{Supplemental_Arrows_B} (= \p{Block=
- Supplemental_Arrows_B}) (128)
- X \p{Sup_Math_Operators} \p{Supplemental_Mathematical_Operators} (=
- \p{Block=
- Supplemental_Mathematical_Operators})
- (256)
- X \p{Sup_PUA_A} \p{Supplementary_Private_Use_Area_A} (=
- \p{Block=
- Supplementary_Private_Use_Area_A})
- (65_536)
- X \p{Sup_PUA_B} \p{Supplementary_Private_Use_Area_B} (=
- \p{Block=
- Supplementary_Private_Use_Area_B})
- (65_536)
- X \p{Sup_Punctuation} \p{Supplemental_Punctuation} (= \p{Block=
- Supplemental_Punctuation}) (128)
- X \p{Super_And_Sub} \p{Superscripts_And_Subscripts} (=
- \p{Block=Superscripts_And_Subscripts})
- (48)
- X \p{Superscripts_And_Subscripts} \p{Block=
- Superscripts_And_Subscripts} (Short:
- \p{InSuperAndSub}) (48)
- X \p{Supplemental_Arrows_A} \p{Block=Supplemental_Arrows_A} (Short:
- \p{InSupArrowsA}) (16)
- X \p{Supplemental_Arrows_B} \p{Block=Supplemental_Arrows_B} (Short:
- \p{InSupArrowsB}) (128)
- X \p{Supplemental_Mathematical_Operators} \p{Block=
- Supplemental_Mathematical_Operators}
- (Short: \p{InSupMathOperators}) (256)
- X \p{Supplemental_Punctuation} \p{Block=Supplemental_Punctuation}
- (Short: \p{InSupPunctuation}) (128)
- X \p{Supplementary_Private_Use_Area_A} \p{Block=
- Supplementary_Private_Use_Area_A}
- (Short: \p{InSupPUAA}) (65_536)
- X \p{Supplementary_Private_Use_Area_B} \p{Block=
- Supplementary_Private_Use_Area_B}
- (Short: \p{InSupPUAB}) (65_536)
- \p{Surrogate} \p{General_Category=Surrogate} (Short:
- \p{Cs}) (2048)
- \p{Sylo} \p{Syloti_Nagri} (= \p{Script=
- Syloti_Nagri}) (NOT \p{Block=
- Syloti_Nagri}) (44)
- \p{Syloti_Nagri} \p{Script=Syloti_Nagri} (Short: \p{Sylo};
- NOT \p{Block=Syloti_Nagri}) (44)
- \p{Symbol} \p{General_Category=Symbol} (Short: \p{S})
- (5520)
- \p{Syrc} \p{Syriac} (= \p{Script=Syriac}) (NOT
- \p{Block=Syriac}) (77)
- \p{Syriac} \p{Script=Syriac} (Short: \p{Syrc}; NOT
- \p{Block=Syriac}) (77)
- \p{Tagalog} \p{Script=Tagalog} (Short: \p{Tglg}; NOT
- \p{Block=Tagalog}) (20)
- \p{Tagb} \p{Tagbanwa} (= \p{Script=Tagbanwa}) (NOT
- \p{Block=Tagbanwa}) (18)
- \p{Tagbanwa} \p{Script=Tagbanwa} (Short: \p{Tagb}; NOT
- \p{Block=Tagbanwa}) (18)
- X \p{Tags} \p{Block=Tags} (128)
- \p{Tai_Le} \p{Script=Tai_Le} (Short: \p{Tale}; NOT
- \p{Block=Tai_Le}) (35)
- \p{Tai_Tham} \p{Script=Tai_Tham} (Short: \p{Lana}; NOT
- \p{Block=Tai_Tham}) (127)
- \p{Tai_Viet} \p{Script=Tai_Viet} (Short: \p{Tavt}; NOT
- \p{Block=Tai_Viet}) (72)
- X \p{Tai_Xuan_Jing} \p{Tai_Xuan_Jing_Symbols} (= \p{Block=
- Tai_Xuan_Jing_Symbols}) (96)
- X \p{Tai_Xuan_Jing_Symbols} \p{Block=Tai_Xuan_Jing_Symbols} (Short:
- \p{InTaiXuanJing}) (96)
- \p{Takr} \p{Takri} (= \p{Script=Takri}) (NOT
- \p{Block=Takri}) (66)
- \p{Takri} \p{Script=Takri} (Short: \p{Takr}; NOT
- \p{Block=Takri}) (66)
- \p{Tale} \p{Tai_Le} (= \p{Script=Tai_Le}) (NOT
- \p{Block=Tai_Le}) (35)
- \p{Talu} \p{New_Tai_Lue} (= \p{Script=New_Tai_Lue})
- (NOT \p{Block=New_Tai_Lue}) (83)
- \p{Tamil} \p{Script=Tamil} (Short: \p{Taml}; NOT
- \p{Block=Tamil}) (72)
- \p{Taml} \p{Tamil} (= \p{Script=Tamil}) (NOT
- \p{Block=Tamil}) (72)
- \p{Tavt} \p{Tai_Viet} (= \p{Script=Tai_Viet}) (NOT
- \p{Block=Tai_Viet}) (72)
- \p{Telu} \p{Telugu} (= \p{Script=Telugu}) (NOT
- \p{Block=Telugu}) (93)
- \p{Telugu} \p{Script=Telugu} (Short: \p{Telu}; NOT
- \p{Block=Telugu}) (93)
- \p{Term} \p{Terminal_Punctuation} (=
- \p{Terminal_Punctuation=Y}) (176)
- \p{Term: *} \p{Terminal_Punctuation: *}
- \p{Terminal_Punctuation} \p{Terminal_Punctuation=Y} (Short:
- \p{Term}) (176)
- \p{Terminal_Punctuation: N*} (Short: \p{Term=N}, \P{Term})
- (1_113_936)
- \p{Terminal_Punctuation: Y*} (Short: \p{Term=Y}, \p{Term}) (176)
- \p{Tfng} \p{Tifinagh} (= \p{Script=Tifinagh}) (NOT
- \p{Block=Tifinagh}) (59)
- \p{Tglg} \p{Tagalog} (= \p{Script=Tagalog}) (NOT
- \p{Block=Tagalog}) (20)
- \p{Thaa} \p{Thaana} (= \p{Script=Thaana}) (NOT
- \p{Block=Thaana}) (50)
- \p{Thaana} \p{Script=Thaana} (Short: \p{Thaa}; NOT
- \p{Block=Thaana}) (50)
- \p{Thai} \p{Script=Thai} (NOT \p{Block=Thai}) (86)
- \p{Tibetan} \p{Script=Tibetan} (Short: \p{Tibt}; NOT
- \p{Block=Tibetan}) (207)
- \p{Tibt} \p{Tibetan} (= \p{Script=Tibetan}) (NOT
- \p{Block=Tibetan}) (207)
- \p{Tifinagh} \p{Script=Tifinagh} (Short: \p{Tfng}; NOT
- \p{Block=Tifinagh}) (59)
- \p{Title} \p{Titlecase} (/i= Cased=Yes) (31)
- \p{Titlecase} (= \p{Gc=Lt}) (Short: \p{Title}; /i=
- Cased=Yes) (31)
- \p{Titlecase_Letter} \p{General_Category=Titlecase_Letter}
- (Short: \p{Lt}; /i= General_Category=
- Cased_Letter) (31)
- X \p{Transport_And_Map} \p{Transport_And_Map_Symbols} (= \p{Block=
- Transport_And_Map_Symbols}) (128)
- X \p{Transport_And_Map_Symbols} \p{Block=Transport_And_Map_Symbols}
- (Short: \p{InTransportAndMap}) (128)
- X \p{UCAS} \p{Unified_Canadian_Aboriginal_Syllabics}
- (= \p{Block=
- Unified_Canadian_Aboriginal_Syllabics})
- (640)
- X \p{UCAS_Ext} \p{Unified_Canadian_Aboriginal_Syllabics_-
- Extended} (= \p{Block=
- Unified_Canadian_Aboriginal_Syllabics_-
- Extended}) (80)
- \p{Ugar} \p{Ugaritic} (= \p{Script=Ugaritic}) (NOT
- \p{Block=Ugaritic}) (31)
- \p{Ugaritic} \p{Script=Ugaritic} (Short: \p{Ugar}; NOT
- \p{Block=Ugaritic}) (31)
- \p{UIdeo} \p{Unified_Ideograph} (=
- \p{Unified_Ideograph=Y}) (74_617)
- \p{UIdeo: *} \p{Unified_Ideograph: *}
- \p{Unassigned} \p{General_Category=Unassigned} (Short:
- \p{Cn}) (864_414)
- X \p{Unified_Canadian_Aboriginal_Syllabics} \p{Block=
- Unified_Canadian_Aboriginal_Syllabics}
- (Short: \p{InUCAS}) (640)
- X \p{Unified_Canadian_Aboriginal_Syllabics_Extended} \p{Block=
- Unified_Canadian_Aboriginal_Syllabics_-
- Extended} (Short: \p{InUCASExt}) (80)
- \p{Unified_Ideograph} \p{Unified_Ideograph=Y} (Short: \p{UIdeo})
- (74_617)
- \p{Unified_Ideograph: N*} (Short: \p{UIdeo=N}, \P{UIdeo})
- (1_039_495)
- \p{Unified_Ideograph: Y*} (Short: \p{UIdeo=Y}, \p{UIdeo}) (74_617)
- \p{Unknown} \p{Script=Unknown} (Short: \p{Zzzz})
- (1_003_930)
- \p{Upper} \p{Uppercase=Y} (/i= Cased=Yes) (1483)
- \p{Upper: *} \p{Uppercase: *}
- \p{Uppercase} \p{Upper} (= \p{Uppercase=Y}) (/i= Cased=
- Yes) (1483)
- \p{Uppercase: N*} (Short: \p{Upper=N}, \P{Upper}; /i= Cased=
- No) (1_112_629)
- \p{Uppercase: Y*} (Short: \p{Upper=Y}, \p{Upper}; /i= Cased=
- Yes) (1483)
- \p{Uppercase_Letter} \p{General_Category=Uppercase_Letter}
- (Short: \p{Lu}; /i= General_Category=
- Cased_Letter) (1441)
- \p{Vai} \p{Script=Vai} (NOT \p{Block=Vai}) (300)
- \p{Vaii} \p{Vai} (= \p{Script=Vai}) (NOT \p{Block=
- Vai}) (300)
- \p{Variation_Selector} \p{Variation_Selector=Y} (Short: \p{VS};
- NOT \p{Variation_Selectors}) (259)
- \p{Variation_Selector: N*} (Short: \p{VS=N}, \P{VS}) (1_113_853)
- \p{Variation_Selector: Y*} (Short: \p{VS=Y}, \p{VS}) (259)
- X \p{Variation_Selectors} \p{Block=Variation_Selectors} (Short:
- \p{InVS}) (16)
- X \p{Variation_Selectors_Supplement} \p{Block=
- Variation_Selectors_Supplement} (Short:
- \p{InVSSup}) (240)
- X \p{Vedic_Ext} \p{Vedic_Extensions} (= \p{Block=
- Vedic_Extensions}) (48)
- X \p{Vedic_Extensions} \p{Block=Vedic_Extensions} (Short:
- \p{InVedicExt}) (48)
- X \p{Vertical_Forms} \p{Block=Vertical_Forms} (16)
- \p{VertSpace} \v (7)
- \p{VS} \p{Variation_Selector} (=
- \p{Variation_Selector=Y}) (NOT
- \p{Variation_Selectors}) (259)
- \p{VS: *} \p{Variation_Selector: *}
- X \p{VS_Sup} \p{Variation_Selectors_Supplement} (=
- \p{Block=
- Variation_Selectors_Supplement}) (240)
- \p{WB: *} \p{Word_Break: *}
- \p{White_Space} \p{White_Space=Y} (Short: \p{WSpace}) (26)
- \p{White_Space: N*} (Short: \p{Space=N}, \P{WSpace})
- (1_114_086)
- \p{White_Space: Y*} (Short: \p{Space=Y}, \p{WSpace}) (26)
- \p{Word} \w, including beyond ASCII; = \p{Alnum} +
- \pM + \p{Pc} (103_406)
- \p{Word_Break: ALetter} (Short: \p{WB=LE}) (24_941)
- \p{Word_Break: CR} (Short: \p{WB=CR}) (1)
- \p{Word_Break: EX} \p{Word_Break=ExtendNumLet} (10)
- \p{Word_Break: Extend} (Short: \p{WB=Extend}) (1649)
- \p{Word_Break: ExtendNumLet} (Short: \p{WB=EX}) (10)
- \p{Word_Break: FO} \p{Word_Break=Format} (136)
- \p{Word_Break: Format} (Short: \p{WB=FO}) (136)
- \p{Word_Break: KA} \p{Word_Break=Katakana} (310)
- \p{Word_Break: Katakana} (Short: \p{WB=KA}) (310)
- \p{Word_Break: LE} \p{Word_Break=ALetter} (24_941)
- \p{Word_Break: LF} (Short: \p{WB=LF}) (1)
- \p{Word_Break: MB} \p{Word_Break=MidNumLet} (8)
- \p{Word_Break: MidLetter} (Short: \p{WB=ML}) (8)
- \p{Word_Break: MidNum} (Short: \p{WB=MN}) (15)
- \p{Word_Break: MidNumLet} (Short: \p{WB=MB}) (8)
- \p{Word_Break: ML} \p{Word_Break=MidLetter} (8)
- \p{Word_Break: MN} \p{Word_Break=MidNum} (15)
- \p{Word_Break: Newline} (Short: \p{WB=NL}) (5)
- \p{Word_Break: NL} \p{Word_Break=Newline} (5)
- \p{Word_Break: NU} \p{Word_Break=Numeric} (451)
- \p{Word_Break: Numeric} (Short: \p{WB=NU}) (451)
- \p{Word_Break: Other} (Short: \p{WB=XX}) (1_086_551)
- \p{Word_Break: Regional_Indicator} (Short: \p{WB=RI}) (26)
- \p{Word_Break: RI} \p{Word_Break=Regional_Indicator} (26)
- \p{Word_Break: XX} \p{Word_Break=Other} (1_086_551)
- \p{WSpace} \p{White_Space} (= \p{White_Space=Y}) (26)
- \p{WSpace: *} \p{White_Space: *}
- \p{XDigit} \p{Hex_Digit=Y} (Short: \p{Hex}) (44)
- \p{XID_Continue} \p{XID_Continue=Y} (Short: \p{XIDC})
- (103_336)
- \p{XID_Continue: N*} (Short: \p{XIDC=N}, \P{XIDC}) (1_010_776)
- \p{XID_Continue: Y*} (Short: \p{XIDC=Y}, \p{XIDC}) (103_336)
- \p{XID_Start} \p{XID_Start=Y} (Short: \p{XIDS}) (101_217)
- \p{XID_Start: N*} (Short: \p{XIDS=N}, \P{XIDS}) (1_012_895)
- \p{XID_Start: Y*} (Short: \p{XIDS=Y}, \p{XIDS}) (101_217)
- \p{XIDC} \p{XID_Continue} (= \p{XID_Continue=Y})
- (103_336)
- \p{XIDC: *} \p{XID_Continue: *}
- \p{XIDS} \p{XID_Start} (= \p{XID_Start=Y}) (101_217)
- \p{XIDS: *} \p{XID_Start: *}
- \p{Xpeo} \p{Old_Persian} (= \p{Script=Old_Persian})
- (NOT \p{Block=Old_Persian}) (50)
- \p{XPerlSpace} \s, including beyond ASCII (Short:
- \p{SpacePerl}) (26)
- \p{XPosixAlnum} \p{Alnum} (102_619)
- \p{XPosixAlpha} \p{Alpha} (= \p{Alphabetic=Y}) (102_159)
- \p{XPosixBlank} \p{Blank} (19)
- \p{XPosixCntrl} \p{Cntrl} (= \p{General_Category=Control})
- (65)
- \p{XPosixDigit} \p{Digit} (= \p{General_Category=
- Decimal_Number}) (460)
- \p{XPosixGraph} \p{Graph} (247_565)
- \p{XPosixLower} \p{Lower} (= \p{Lowercase=Y}) (/i= Cased=
- Yes) (1934)
- \p{XPosixPrint} \p{Print} (247_583)
- \p{XPosixPunct} \p{Punct} + ASCII-range \p{Symbol} (641)
- \p{XPosixSpace} \p{Space} (= \p{White_Space=Y}) (26)
- \p{XPosixUpper} \p{Upper} (= \p{Uppercase=Y}) (/i= Cased=
- Yes) (1483)
- \p{XPosixWord} \p{Word} (103_406)
- \p{XPosixXDigit} \p{XDigit} (= \p{Hex_Digit=Y}) (44)
- \p{Xsux} \p{Cuneiform} (= \p{Script=Cuneiform})
- (NOT \p{Block=Cuneiform}) (982)
- \p{Yi} \p{Script=Yi} (1220)
- X \p{Yi_Radicals} \p{Block=Yi_Radicals} (64)
- X \p{Yi_Syllables} \p{Block=Yi_Syllables} (1168)
- \p{Yiii} \p{Yi} (= \p{Script=Yi}) (1220)
- X \p{Yijing} \p{Yijing_Hexagram_Symbols} (= \p{Block=
- Yijing_Hexagram_Symbols}) (64)
- X \p{Yijing_Hexagram_Symbols} \p{Block=Yijing_Hexagram_Symbols}
- (Short: \p{InYijing}) (64)
- \p{Z} \p{Separator} (= \p{General_Category=
- Separator}) (20)
- \p{Zinh} \p{Inherited} (= \p{Script=Inherited})
- (523)
- \p{Zl} \p{Line_Separator} (= \p{General_Category=
- Line_Separator}) (1)
- \p{Zp} \p{Paragraph_Separator} (=
- \p{General_Category=
- Paragraph_Separator}) (1)
- \p{Zs} \p{Space_Separator} (=
- \p{General_Category=Space_Separator})
- (18)
- \p{Zyyy} \p{Common} (= \p{Script=Common}) (6413)
- \p{Zzzz} \p{Unknown} (= \p{Script=Unknown})
- (1_003_930)
- TX\p{_CanonDCIJ} (For internal use by Perl, not necessarily
- stable) (= \p{Soft_Dotted=Y}) (46)
- TX\p{_Case_Ignorable} (For internal use by Perl, not necessarily
- stable) (= \p{Case_Ignorable=Y}) (1799)
- TX\p{_CombAbove} (For internal use by Perl, not necessarily
- stable) (= \p{Canonical_Combining_Class=
- Above}) (349)
\p{} and \P{} constructs that match no characters
Unicode has some property-value pairs that currently don't match anything. This happens generally either because they are obsolete, or because they exist for symmetry with other forms, but no language has yet been encoded that uses them. In this version of Unicode, the following match zero code points:
All the Unicode character properties mentioned above (except for those marked as for internal use by Perl) are also accessible by prop_invlist() in Unicode::UCD.
Due to their nature, not all Unicode character properties are suitable for regular expression matches, nor for prop_invlist(). The remaining non-provisional, non-internal ones are accessible via prop_invmap() in Unicode::UCD (except for those that this Perl installation hasn't included; see below for which those are).
For compatibility with other parts of Perl, all the single forms given in the table in the section above are recognized. BUT, there are some ambiguities between some Perl extensions and the Unicode properties, all of which are silently resolved in favor of the official Unicode property. To avoid surprises, you should only use prop_invmap() for forms listed in the table below, which omits the non-recommended ones. The affected forms are the Perl single-form equivalents of Unicode properties, such as \p{sc} being a single-form equivalent of \p{gc=sc}, which is treated by prop_invmap() as the Script property, whose short name is sc. The table indicates the current ambiguities in the INFO column, beginning with the word "NOT".
The standard Unicode properties listed below are documented in http://www.unicode.org/reports/tr44/; Perl_Decimal_Digit is documented in prop_invmap() in Unicode::UCD. The other Perl extensions are in Other Properties in perlunicode.
The first column in the table is a name for the property; the second column is an alternative name, if any, plus possibly some annotations. The alternative name is the property's full name, unless that would simply repeat the first column, in which case the second column indicates the property's short name (if different). The annotations are given only in the entry for the full name. If a property is obsolete, etc, the entry will be flagged with the same characters used in the table in the section above, like D or S.
- NAME INFO
- Age
- AHex ASCII_Hex_Digit
- All Any. (Perl extension)
- Alnum (Perl extension). Alphabetic and
- (decimal) Numeric
- Alpha Alphabetic
- Alphabetic (Short: Alpha)
- Any (Perl extension). [\x{0000}-\x{10FFFF}]
- ASCII Block=ASCII. (Perl extension).
- [[:ASCII:]]
- ASCII_Hex_Digit (Short: AHex)
- Assigned (Perl extension). All assigned code points
- Bc Bidi_Class
- Bidi_C Bidi_Control
- Bidi_Class (Short: bc)
- Bidi_Control (Short: Bidi_C)
- Bidi_M Bidi_Mirrored
- Bidi_Mirrored (Short: Bidi_M)
- Bidi_Mirroring_Glyph (Short: bmg)
- Blank (Perl extension). \h, Horizontal white
- space
- Blk Block
- Block (Short: blk)
- Bmg Bidi_Mirroring_Glyph
- Canonical_Combining_Class (Short: ccc)
- Case_Folding (Short: cf)
- Case_Ignorable (Short: CI)
- Cased
- Category General_Category
- Ccc Canonical_Combining_Class
- CE Composition_Exclusion
- Cf Case_Folding; NOT 'cf' meaning
- 'General_Category=Format'
- Changes_When_Casefolded (Short: CWCF)
- Changes_When_Casemapped (Short: CWCM)
- Changes_When_Lowercased (Short: CWL)
- Changes_When_NFKC_Casefolded (Short: CWKCF)
- Changes_When_Titlecased (Short: CWT)
- Changes_When_Uppercased (Short: CWU)
- CI Case_Ignorable
- Cntrl General_Category=Cntrl. (Perl extension).
- Control characters
- Comp_Ex Full_Composition_Exclusion
- Composition_Exclusion (Short: CE)
- CWCF Changes_When_Casefolded
- CWCM Changes_When_Casemapped
- CWKCF Changes_When_NFKC_Casefolded
- CWL Changes_When_Lowercased
- CWT Changes_When_Titlecased
- CWU Changes_When_Uppercased
- Dash
- Decomposition_Mapping (Short: dm)
- Decomposition_Type (Short: dt)
- Default_Ignorable_Code_Point (Short: DI)
- Dep Deprecated
- Deprecated (Short: Dep)
- DI Default_Ignorable_Code_Point
- Dia Diacritic
- Diacritic (Short: Dia)
- Digit General_Category=Digit. (Perl extension).
- [0-9] + all other decimal digits
- Dm Decomposition_Mapping
- Dt Decomposition_Type
- Ea East_Asian_Width
- East_Asian_Width (Short: ea)
- Ext Extender
- Extender (Short: Ext)
- Full_Composition_Exclusion (Short: Comp_Ex)
- Gc General_Category
- GCB Grapheme_Cluster_Break
- General_Category (Short: gc)
- Gr_Base Grapheme_Base
- Gr_Ext Grapheme_Extend
- Graph (Perl extension). Characters that are
- graphical
- Grapheme_Base (Short: Gr_Base)
- Grapheme_Cluster_Break (Short: GCB)
- Grapheme_Extend (Short: Gr_Ext)
- Hangul_Syllable_Type (Short: hst)
- Hex Hex_Digit
- Hex_Digit (Short: Hex)
- HorizSpace Blank. (Perl extension)
- Hst Hangul_Syllable_Type
- D Hyphen Supplanted by Line_Break property values;
- see www.unicode.org/reports/tr14
- ID_Continue (Short: IDC)
- ID_Start (Short: IDS)
- IDC ID_Continue
- Ideo Ideographic
- Ideographic (Short: Ideo)
- IDS ID_Start
- IDS_Binary_Operator (Short: IDSB)
- IDS_Trinary_Operator (Short: IDST)
- IDSB IDS_Binary_Operator
- IDST IDS_Trinary_Operator
- In Present_In. (Perl extension)
- Isc ISO_Comment; NOT 'isc' meaning
- 'General_Category=Other'
- ISO_Comment (Short: isc)
- Jg Joining_Group
- Join_C Join_Control
- Join_Control (Short: Join_C)
- Joining_Group (Short: jg)
- Joining_Type (Short: jt)
- Jt Joining_Type
- Lb Line_Break
- Lc Lowercase_Mapping; NOT 'lc' meaning
- 'General_Category=Cased_Letter'
- Line_Break (Short: lb)
- LOE Logical_Order_Exception
- Logical_Order_Exception (Short: LOE)
- Lower Lowercase
- Lowercase (Short: Lower)
- Lowercase_Mapping (Short: lc)
- Math
- Na Name
- Na1 Unicode_1_Name
- Name (Short: na)
- Name_Alias
- NChar Noncharacter_Code_Point
- NFC_QC NFC_Quick_Check
- NFC_Quick_Check (Short: NFC_QC)
- NFD_QC NFD_Quick_Check
- NFD_Quick_Check (Short: NFD_QC)
- NFKC_Casefold (Short: NFKC_CF)
- NFKC_CF NFKC_Casefold
- NFKC_QC NFKC_Quick_Check
- NFKC_Quick_Check (Short: NFKC_QC)
- NFKD_QC NFKD_Quick_Check
- NFKD_Quick_Check (Short: NFKD_QC)
- Noncharacter_Code_Point (Short: NChar)
- Nt Numeric_Type
- Numeric_Type (Short: nt)
- Numeric_Value (Short: nv)
- Nv Numeric_Value
- Pat_Syn Pattern_Syntax
- Pat_WS Pattern_White_Space
- Pattern_Syntax (Short: Pat_Syn)
- Pattern_White_Space (Short: Pat_WS)
- Perl_Decimal_Digit (Perl extension)
- PerlSpace (Perl extension). \s, restricted to ASCII
- = [ \f\n\r\t] plus vertical tab
- PerlWord (Perl extension). \w, restricted to ASCII
- = [A-Za-z0-9_]
- PosixAlnum (Perl extension). [A-Za-z0-9]
- PosixAlpha (Perl extension). [A-Za-z]
- PosixBlank (Perl extension). \t and ' '
- PosixCntrl (Perl extension). ASCII control
- characters: NUL, SOH, STX, ETX, EOT, ENQ,
- ACK, BEL, BS, HT, LF, VT, FF, CR, SO, SI,
- DLE, DC1, DC2, DC3, DC4, NAK, SYN, ETB,
- CAN, EOM, SUB, ESC, FS, GS, RS, US, and DEL
- PosixDigit (Perl extension). [0-9]
- PosixGraph (Perl extension). [-
- !"#$%&'()*+,./:;<>?@[\\]^_`{|}~0-9A-Za-z]
- PosixLower (Perl extension). [a-z]
- PosixPrint (Perl extension). [- 0-9A-Za-
- z!"#$%&'()*+,./:;<>?@[\\]^_`{|}~]
- PosixPunct (Perl extension). [-
- !"#$%&'()*+,./:;<>?@[\\]^_`{|}~]
- PosixSpace (Perl extension). \t, \n, \cK, \f, \r,
- and ' '. (\cK is vertical tab)
- PosixUpper (Perl extension). [A-Z]
- PosixWord PerlWord. (Perl extension)
- PosixXDigit (Perl extension). [0-9A-Fa-f]
- Present_In (Short: In). (Perl extension)
- Print (Perl extension). Characters that are
- graphical plus space characters (but no
- controls)
- Punct General_Category=Punct. (Perl extension)
- QMark Quotation_Mark
- Quotation_Mark (Short: QMark)
- Radical
- SB Sentence_Break
- Sc Script; NOT 'sc' meaning
- 'General_Category=Currency_Symbol'
- Scf Simple_Case_Folding
- Script (Short: sc)
- Script_Extensions (Short: scx)
- Scx Script_Extensions
- SD Soft_Dotted
- Sentence_Break (Short: SB)
- Sfc Simple_Case_Folding
- Simple_Case_Folding (Short: scf)
- Simple_Lowercase_Mapping (Short: slc)
- Simple_Titlecase_Mapping (Short: stc)
- Simple_Uppercase_Mapping (Short: suc)
- Slc Simple_Lowercase_Mapping
- Soft_Dotted (Short: SD)
- Space White_Space
- SpacePerl XPerlSpace. (Perl extension)
- Stc Simple_Titlecase_Mapping
- STerm
- Suc Simple_Uppercase_Mapping
- Tc Titlecase_Mapping
- Term Terminal_Punctuation
- Terminal_Punctuation (Short: Term)
- Title Titlecase. (Perl extension)
- Titlecase (Short: Title). (Perl extension). (=
- \p{Gc=Lt})
- Titlecase_Mapping (Short: tc)
- Uc Uppercase_Mapping
- UIdeo Unified_Ideograph
- Unicode_1_Name (Short: na1)
- Unified_Ideograph (Short: UIdeo)
- Upper Uppercase
- Uppercase (Short: Upper)
- Uppercase_Mapping (Short: uc)
- Variation_Selector (Short: VS)
- VertSpace (Perl extension). \v
- VS Variation_Selector
- WB Word_Break
- White_Space (Short: WSpace)
- Word (Perl extension). \w, including beyond
- ASCII; = \p{Alnum} + \pM + \p{Pc}
- Word_Break (Short: WB)
- WSpace White_Space
- XDigit (Perl extension)
- XID_Continue (Short: XIDC)
- XID_Start (Short: XIDS)
- XIDC XID_Continue
- XIDS XID_Start
- XPerlSpace (Perl extension). \s, including beyond
- ASCII
- XPosixAlnum Alnum. (Perl extension)
- XPosixAlpha Alpha. (Perl extension)
- XPosixBlank Blank. (Perl extension)
- XPosixCntrl General_Category=Cntrl. (Perl extension)
- XPosixDigit General_Category=Digit. (Perl extension)
- XPosixGraph Graph. (Perl extension)
- XPosixLower Lower. (Perl extension)
- XPosixPrint Print. (Perl extension)
- XPosixPunct (Perl extension). \p{Punct} + ASCII-range
- \p{Symbol}
- XPosixSpace Space. (Perl extension)
- XPosixUpper Upper. (Perl extension)
- XPosixWord Word. (Perl extension)
- XPosixXDigit XDigit. (Perl extension)
Certain properties are accessible also via core function calls. These are:
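For illustration, a minimal sketch of reaching case-mapping properties through the built-in case functions (the property names in the comments are the standard Unicode ones; this is not the full list of core-accessible properties):

```perl
use v5.12;   # unicode_strings: full Unicode semantics for uc/lc/ucfirst

# lc()/lcfirst() apply Lowercase_Mapping, uc() applies
# Uppercase_Mapping, and ucfirst() applies Titlecase_Mapping
# to the first character.
my $s = "stra\x{DF}e";          # "strasse" spelled with U+00DF
print uc($s), "\n";             # full uppercase mapping: "STRASSE"
print ucfirst($s), "\n";        # titlecase only the first character
print lc("STRASSE"), "\n";      # "strasse"
```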
Also, Case_Folding is accessible through the /i modifier in regular expressions, the \F transliteration escape, and the fc operator.
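A brief sketch of Case_Folding in action (assumes Perl v5.16 or later, which added fc):

```perl
use v5.16;   # enables the fc() feature

# Case_Folding makes "caseless" comparison reliable beyond ASCII:
# the Greek final sigma folds to the same character as capital sigma.
my $cap = "\x{3A3}";    # GREEK CAPITAL LETTER SIGMA
my $fin = "\x{3C2}";    # GREEK SMALL LETTER FINAL SIGMA

print "fold-equal\n" if fc($cap) eq fc($fin);   # both fold to U+03C3
print "match\n"      if $cap =~ /^$fin$/i;      # /i uses Case_Folding too
```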
And, the Name and Name_Aliases properties are accessible through the \N{} interpolation in double-quoted strings and regular expressions; and the functions charnames::viacode(), charnames::vianame(), and charnames::string_vianame() (which require a use charnames (); to be specified).
Finally, most properties related to decomposition are accessible via Unicode::Normalize.
Perl will generate an error for a few character properties in Unicode when used in a regular expression. The non-Unihan ones are listed below, with the reasons they are not accepted, perhaps with work-arounds. The short names for the properties are listed enclosed in (parentheses). As described after the list, an installation can change the defaults and choose to accept any of these. The list is machine generated based on the choices made for the installation that generated this document.
Deprecated by Unicode. These are characters that expand to more than one character in the specified normalization form, but whether they actually take up more bytes or not depends on the encoding being used. For example, a UTF-8 encoded character may expand to a different number of bytes than a UTF-32 encoded character.
Deprecated by Unicode: Duplicates ccc=vr (Canonical_Combining_Class=Virama)
Provisional
Used by Unicode internally for generating other properties and not intended to be used stand-alone
Obsolete. All code points previously matched by this have been moved to "Script=Common". Consider instead using "Script_Extensions=Katakana" or "Script_Extensions=Hiragana" (or both)
All code points that would be matched by this are matched by either "Script_Extensions=Katakana" or "Script_Extensions=Hiragana"
An installation can choose to allow any of these to be matched by downloading
the Unicode database from http://www.unicode.org/Public/ to
$Config{privlib}
/unicore/ in the Perl source tree, changing the
controlling lists contained in the program
$Config{privlib}
/unicore/mktables and then re-compiling and installing.
(%Config
is available from the Config module).
The Unicode data base is delivered in two different formats. The XML version is valid for more modern Unicode releases. The other version is a collection of files. The two are intended to give equivalent information. Perl uses the older form; this allows you to recompile Perl to use early Unicode releases.
The only non-character property that Perl currently supports is Named Sequences, in which a sequence of code points is given a name and generally treated as a single entity. (Perl supports these via the \N{...} double-quotish construct, charnames::string_vianame(name) in charnames, and namedseq() in Unicode::UCD.)
Below is a list of the files in the Unicode data base that Perl doesn't currently use, along with very brief descriptions of their purposes. Some of the names of the files have been shortened from those that Unicode uses, in order to allow them to be distinguishable from similarly named files on file systems for which only the first 8 characters of a name are significant.
Documentation of validation tests
Validation Tests
Maps the kRSUnicode property values to corresponding code points
Maps certain Unicode code points to their legacy Japanese cell-phone values
Alphabetical index of Unicode characters
Provisional; for the analysis and processing of Indic scripts
Named sequences proposed for inclusion in a later version of the Unicode Standard; if you need them now, you can append this file to NamedSequences.txt and recompile perl
Annotated list of characters
Documentation of corrections already incorporated into the Unicode data base
Only in very early releases; is a subset of PropList.txt (which is used instead)
Documentation
Certain glyph variations for character display are standardized. This lists the non-Unihan ones; the Unihan ones are also not used by Perl, and are in a separate Unicode data base http://www.unicode.org/ivd
Documentation of status and cross reference of proposals for encoding by Unicode of Unihan characters
perlunitut - Perl Unicode Tutorial
The days of just flinging strings around are over. It's well established that modern programs need to be capable of communicating funny accented letters, and things like euro symbols. This means that programmers need new habits. It's easy to program Unicode capable software, but it does require discipline to do it right.
There's a lot to know about character sets, and text encodings. It's probably best to spend a full day learning all this, but the basics can be learned in minutes.
These are not the very basics, though. It is assumed that you already know the difference between bytes and characters, and realise (and accept!) that there are many different character sets and encodings, and that your program has to be explicit about them. Recommended reading is "The Absolute Minimum Every Software Developer Absolutely, Positively Must Know About Unicode and Character Sets (No Excuses!)" by Joel Spolsky, at http://joelonsoftware.com/articles/Unicode.html.
This tutorial speaks in rather absolute terms, and provides only a limited view of the wealth of character string related features that Perl has to offer. For most projects, this information will probably suffice.
It's important to set a few things straight first. This is the most important part of this tutorial. This view may conflict with other information that you may have found on the web, but that's mostly because many sources are wrong.
You may have to re-read this entire section a few times...
Unicode is a character set with room for lots of characters. The ordinal value of a character is called a code point. (But in practice, the distinction between code point and character is blurred, so the terms often are used interchangeably.)
There are many, many code points, but computers work with bytes, and a byte has room for only 256 values. Unicode has many more characters than that, so you need a method to make these accessible.
Unicode is encoded using several competing encodings, of which UTF-8 is the most used. In a Unicode encoding, multiple subsequent bytes can be used to store a single code point, or simply: character.
UTF-8 is a Unicode encoding. Many people think that Unicode and UTF-8 are the same thing, but they're not. There are more Unicode encodings, but much of the world has standardized on UTF-8.
UTF-8 treats the first 128 codepoints, 0..127, the same as ASCII. They take only one byte per character. All other characters are encoded as two or more (up to six) bytes using a complex scheme. Fortunately, Perl handles this for us, so we don't have to worry about this.
Text strings, or character strings are made of characters. Bytes are irrelevant here, and so are encodings. Each character is just that: the character.
On a text string, you would do things like:
The value of a character (ord, chr) is the corresponding Unicode code
point.
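A minimal sketch of such character-level operations:

```perl
binmode STDOUT, ':encoding(UTF-8)';   # so wide characters print cleanly

my $text = "caf\x{E9}";            # "cafe" with U+00E9: 4 characters
print length($text), "\n";         # 4 -- characters, not bytes
print ord(substr($text, 3)), "\n"; # 233: the code point of U+00E9
print chr(0x20AC), "\n";           # chr() maps a code point to a character
```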
Binary strings, or byte strings are made of bytes. Here, you don't have characters, just bytes. All communication with the outside world (anything outside of your current Perl process) is done in binary.
On a binary string, you would do things like:
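The example list did not survive this rendering; as a sketch, typical byte-level operations look like:

```perl
# The raw UTF-8 encoding of "cafe"-with-accent, built byte by byte:
my $bytes = "\x63\x61\x66\xC3\xA9";

print length($bytes), "\n";        # 5 -- bytes, not characters
print unpack("H*", $bytes), "\n";  # hex dump: 636166c3a9
syswrite STDOUT, $bytes;           # raw I/O deals in bytes
```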
Encoding (as a verb) is the conversion from text to binary. To encode, you have to supply the target encoding, for example iso-8859-1 or UTF-8. Some encodings, like the iso-8859 ("latin") range, do not support the full Unicode standard; characters that can't be represented are lost in the conversion.
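A short demonstration of that loss, using the Encode module (by default, Encode substitutes a ? for characters the target encoding can't represent):

```perl
use Encode qw(encode);

my $text  = "100 \x{20AC}";              # "100" plus a euro sign, as text
my $bytes = encode('iso-8859-1', $text); # latin-1 has no euro sign
print $bytes, "\n";                      # "100 ?" -- the euro is gone

# To die instead of silently substituting, pass a check argument:
# encode('iso-8859-1', $text, Encode::FB_CROAK);
```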
Decoding is the conversion from binary to text. To decode, you have to know what encoding was used during the encoding phase. And most of all, it must be something decodable. It doesn't make much sense to decode a PNG image into a text string.
Perl has an internal format, an encoding that it uses to encode text strings so it can store them in memory. All text strings are in this internal format. In fact, text strings are never in any other format!
You shouldn't worry about what this format is, because conversion is automatically done when you decode or encode.
Add to your standard heading the following line:
- use Encode qw(encode decode);
Or, if you're lazy, just:
- use Encode;
The typical input/output flow of a program is:
- 1. Receive and decode
- 2. Process
- 3. Encode and output
If your input is binary, and is supposed to remain binary, you shouldn't decode it to a text string, of course. But in all other cases, you should decode it.
Decoding can't happen reliably if you don't know how the data was encoded. If you get to choose, it's a good idea to standardize on UTF-8.
- my $foo = decode('UTF-8', get 'http://example.com/');
- my $bar = decode('ISO-8859-1', readline STDIN);
- my $xyzzy = decode('Windows-1251', $cgi->param('foo'));
Processing happens as you knew before. The only difference is that you're now
using characters instead of bytes. That's very useful if you use things like
substr, or length.
It's important to realize that there are no bytes in a text string. Of course, Perl has its internal encoding to store the string in memory, but ignore that. If you have to do anything with the number of bytes, it's probably best to move that part to step 3, just after you've encoded the string. Then you know exactly how many bytes it will be in the destination string.
The syntax for encoding text strings to binary strings is as simple as decoding:
- $body = encode('UTF-8', $body);
If you needed to know the length of the string in bytes, now's the perfect time for that. Because $body is now a byte string, length will report the number of bytes, instead of the number of characters. The number of characters is no longer known, because characters only exist in text strings.
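For example, a minimal sketch:

```perl
use Encode qw(encode);

my $body = "caf\x{E9}";                    # text string: 4 characters
printf "characters: %d\n", length($body);  # 4

$body = encode('UTF-8', $body);            # now a byte string
printf "bytes: %d\n", length($body);       # 5 -- U+00E9 takes two bytes
```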
And if the protocol you're using supports a way of letting the recipient know which character encoding you used, please help the receiving end by using that feature! For example, e-mail and HTTP support MIME headers, so you can use the Content-Type header. They can also have a Content-Length header to indicate the number of bytes, which is always a good idea to supply if the number is known.
- "Content-Type: text/plain; charset=UTF-8",
- "Content-Length: $byte_count"
Decode everything you receive, encode everything you send out. (If it's text data.)
After reading this document, you ought to read perlunifaq too.
Thanks to Johan Vromans from Squirrel Consultancy. His UTF-8 rants during the Amsterdam Perl Mongers meetings got me interested and determined to find out how to use character encodings in Perl in ways that don't break easily.
Thanks to Gerard Goossen from TTY. His presentation "UTF-8 in the wild" (Dutch Perl Workshop 2006) inspired me to publish my thoughts and write this tutorial.
Thanks to the people who asked about this kind of stuff in several Perl IRC channels, and have constantly reminded me that a simpler explanation was needed.
Thanks to the people who reviewed this document for me, before it went public. They are: Benjamin Smith, Jan-Pieter Cornet, Johan Vromans, Lukas Mai, Nathan Gray.
Juerd Waalboer <#####@juerd.nl>
perlutil - utilities packaged with the Perl distribution
Along with the Perl interpreter itself, the Perl distribution installs a range of utilities on your system. There are also several utilities which are used by the Perl distribution itself as part of the install process. This document exists to list all of these utilities, explain what they are for and provide pointers to each module's documentation, if appropriate.
The main interface to Perl's documentation is perldoc, although if you're reading this, it's more than likely that you've already found it. perldoc will extract and format the documentation from any file in the current directory, any Perl module installed on the system, or any of the standard documentation pages, such as this one. Use perldoc <name> to get information on any of the utilities described in this document.
If it's run from a terminal, perldoc will usually call pod2man to translate POD (Plain Old Documentation - see perlpod for an explanation) into a manpage, and then run man to display it; if man isn't available, pod2text will be used instead and the output piped through your favourite pager.
As well as these two, there are two other converters: pod2html, which produces HTML pages from POD, and pod2latex, which produces LaTeX files.
If you just want to know how to use the utilities described here, pod2usage will just extract the "USAGE" section; some of the utilities will automatically call pod2usage on themselves when you call them with -help.
pod2usage is a special case of podselect, a utility to extract named sections from documents written in POD. For instance, while utilities have "USAGE" sections, Perl modules usually have "SYNOPSIS" sections: podselect -s "SYNOPSIS" ... will extract this section for a given file.
If you're writing your own documentation in POD, the podchecker utility will look for errors in your markup.
splain is an interface to perldiag - paste in your error message to it, and it'll explain it for you.
roffitall
The roffitall utility is not installed on your system but lives in the pod/ directory of your Perl source kit; it converts all the documentation from the distribution to *roff format, and produces a typeset PostScript or text file of the whole lot.
To help you convert legacy programs to Perl, we've included three conversion filters:
a2p converts awk scripts to Perl programs; for example, a2p -F:
on the simple awk script {print $2}
will produce a Perl program
based around this code:
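The code listing is missing from this rendering; a2p's actual output differs in detail, but it amounts to roughly this (the variable names here are illustrative, not a2p's):

```perl
# Roughly: split each input line on ":" and print the second field,
# as the awk script {print $2} run with -F: would.
while (<>) {
    chomp;
    my @field = split /:/, $_, -1;
    print $field[1], "\n";
}
```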
Similarly, s2p converts sed scripts to Perl programs. s2p run
on s/foo/bar will produce a Perl program based around this:
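Again, the listing did not survive this rendering; the heart of the generated program is roughly:

```perl
# Apply the sed substitution to every input line and print it.
while (<>) {
    s/foo/bar/;
    print;
}
```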
When invoked as psed, it behaves as a sed implementation, written in Perl.
Finally, find2perl translates find commands to Perl equivalents which use the File::Find module. As an example, find2perl . -user root -perm 4000 -print produces the following callback subroutine for File::Find:
- sub wanted {
- my ($dev,$ino,$mode,$nlink,$uid,$gid);
- (($dev,$ino,$mode,$nlink,$uid,$gid) = lstat($_)) &&
- ($uid == $uid{'root'}) &&
- (($mode & 0777) == 04000);
- print("$name\n");
- }
As well as these filters for converting other languages, the pl2pm utility will help you convert old-style Perl 4 libraries to new-style Perl5 modules.
Query or change configuration of Perl modules that use Module::Build-based configuration files for features and config data.
To display and change the libnet configuration run the libnetcfg command.
The perlivp program is set up at Perl source code build time to test the Perl version it was built under. It can be used after running make install (or your platform's equivalent procedure) to verify that perl and its libraries have been installed correctly.
There are a set of utilities which help you in developing Perl programs, and in particular, extending Perl with C.
perlbug is the recommended way to report bugs in the perl interpreter itself or any of the standard library modules back to the developers; please read through the documentation for perlbug thoroughly before using it to submit a bug report.
This program provides an easy way to send a thank-you message back to the authors and maintainers of perl. It's just perlbug installed under another name.
Back before Perl had the XS system for connecting with C libraries,
programmers used to get library constants by reading through the C
header files. You may still see require 'syscall.ph'
or similar
around - the .ph file should be created by running h2ph on the
corresponding .h file. See the h2ph documentation for more on how
to convert a whole bunch of header files at once.
c2ph and pstruct, which are actually the same program but behave differently depending on how they are called, provide another way of getting at C with Perl - they'll convert C structures and union declarations to Perl code. This is deprecated in favour of h2xs these days.
h2xs converts C header files into XS modules, and will try and write as much glue between C libraries and Perl modules as it can. It's also very useful for creating skeletons of pure Perl modules.
enc2xs builds a Perl extension for use by Encode from either Unicode Character Mapping files (.ucm) or Tcl Encoding Files (.enc). Besides being used internally during the build process of the Encode module, you can use enc2xs to add your own encoding to perl. No knowledge of XS is necessary.
xsubpp is a compiler to convert Perl XS code into C code. It is typically run by the makefiles created by ExtUtils::MakeMaker.
xsubpp will compile XS code into C code by embedding the constructs necessary to let C functions manipulate Perl values and creates the glue necessary to let Perl access those functions.
prove is a command-line interface to the test-running functionality of Test::Harness. It's an alternative to make test.
A command-line front-end to Module::CoreList, to query what modules were shipped with given versions of perl.
A few general-purpose tools are shipped with perl, mostly because they came along with modules included in the perl distribution.
piconv is a Perl version of iconv, a character encoding converter widely available for various Unixen today. This script was primarily a technology demonstrator for Perl v5.8.0, but you can use piconv in the place of iconv for virtually any case.
ptar is a tar-like program, written in pure Perl.
ptardiff is a small utility that produces a diff between an extracted archive and an unextracted one. (Note that this utility requires the Text::Diff module to function properly; this module isn't distributed with perl, but is available from the CPAN.)
ptargrep is a utility to apply pattern matching to the contents of files in a tar archive.
This utility, which comes with the Digest::SHA module, is used to print or verify SHA checksums.
zipdetails displays information about the internal record structure of the zip file. It is not concerned with displaying any details of the compressed data stored in the zip file.
These utilities help manage extra Perl modules that don't come with the perl distribution.
cpan is a command-line interface to CPAN.pm. It allows you to install modules or distributions from CPAN, or just get information about them, and a lot more. It is similar to the command line mode of the CPAN module,
- perl -MCPAN -e shell
cpanp is, like cpan, a command-line interface to the CPAN, using the CPANPLUS module as a back-end. It can be used interactively or imperatively.
cpan2dist is a tool to create distributions (or packages) from CPAN modules, suitable for your package manager of choice. Support for specific formats is available from CPAN as CPANPLUS::Dist::* modules.
A little interface to ExtUtils::Installed to examine installed modules, validate your packlists and even create a tarball from an installed module.
perldoc, pod2man, perlpod,
pod2html, pod2usage, podselect,
podchecker, splain, perldiag,
roffitall, a2p, s2p, find2perl,
File::Find, pl2pm, perlbug,
h2ph, c2ph, h2xs, enc2xs, xsubpp,
cpan, cpanp, cpan2dist, instmodsh, piconv, prove,
corelist, ptar, ptardiff, shasum, zipdetails
perlvar - Perl predefined variables
Variable names in Perl can have several formats. Usually, they must begin with a letter or underscore, in which case they can be arbitrarily long (up to an internal limit of 251 characters) and may contain letters, digits, underscores, or the special sequence :: or '. In this case, the part before the last :: or ' is taken to be a package qualifier; see perlmod.
Perl variable names may also be a sequence of digits or a single punctuation or control character. These names are all reserved for special uses by Perl; for example, the all-digits names are used to hold data captured by backreferences after a regular expression match. Perl has a special syntax for the single-control-character names: it understands ^X (caret X) to mean the control-X character. For example, the notation $^W (dollar-sign caret W) is the scalar variable whose name is the single character control-W. This is better than typing a literal control-W into your program.
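For instance, a small sketch using $^W, the global warnings flag:

```perl
# $^W names the scalar variable whose one-character name is
# control-W; the caret notation is just a readable way to spell it.
print "warnings are ", ($^W ? "on" : "off"), "\n";

{
    local $^W = 0;   # dynamically scoped: run-time warnings off here
    my $x;
    my $y = $x + 0;  # would warn "uninitialized" under -w
}
```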
Since Perl v5.6.0, Perl variable names may be alphanumeric strings that begin with control characters (or better yet, a caret). These variables must be written in the form ${^Foo}; the braces are not optional. ${^Foo} denotes the scalar variable whose name is a control-F followed by two o's. These variables are reserved for future special uses by Perl, except for the ones that begin with ^_ (control-underscore or caret-underscore). No control-character name that begins with ^_ will acquire a special meaning in any future version of Perl; such names may therefore be used safely in programs. $^_ itself, however, is reserved.
Perl identifiers that begin with digits, control characters, or punctuation characters are exempt from the effects of the package declaration and are always forced to be in package main; they are also exempt from strict 'vars' errors. A few other names are also exempt in these ways:
- ENV STDIN
- INC STDOUT
- ARGV STDERR
- ARGVOUT
- SIG
In particular, the special ${^_XYZ} variables are always taken to be in package main, regardless of any package declarations presently in scope.
The following names have special meaning to Perl. Most punctuation names have reasonable mnemonics, or analogs in the shells. Nevertheless, if you wish to use long variable names, you need only say:
- use English;
at the top of your program. This aliases all the short names to the long names in the current package. Some even have medium names, generally borrowed from awk. To avoid a performance hit, if you don't need the $PREMATCH, $MATCH, or $POSTMATCH variables, it's best to use the English module without them:
- use English '-no_match_vars';
Before you continue, note the sort order for variables. In general, we first list the variables in case-insensitive, almost-lexicographical order (ignoring the { or ^ preceding words, as in ${^UNICODE} or $^T), although $_ and @_ move up to the top of the pile. For variables with the same identifier, we list it in order of scalar, array, hash, and bareword.
The default input and pattern-searching space. The following pairs are equivalent:
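The table of pairs was lost in this rendering; following perlvar, the pairs run along these lines:

```perl
while (<>) { }       # equivalent to: while (defined($_ = <>)) { }
/^Subject:/;         # equivalent to: $_ =~ /^Subject:/
tr/a-z/A-Z/;         # equivalent to: $_ =~ tr/a-z/A-Z/
chomp;               # equivalent to: chomp($_)
```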
Here are the places where Perl will assume $_
even if you don't use it:
The following functions use $_
as a default argument:
abs, alarm, chomp, chop, chr, chroot, cos, defined, eval, evalbytes, exp, fc, glob, hex, int, lc, lcfirst, length, log, lstat, mkdir, oct, ord, pos, print, printf, quotemeta, readlink, readpipe, ref, require, reverse (in scalar context only), rmdir, say, sin, split (for its second argument), sqrt, stat, study, uc, ucfirst, unlink, unpack.
All file tests (-f
, -d
) except for -t
, which defaults to STDIN.
See -X in perlfunc.
The pattern matching operations m//, s/// and tr/// (aka y///)
when used without an =~
operator.
The default iterator variable in a foreach
loop if no other
variable is supplied.
The implicit iterator variable in the grep() and map() functions.
The implicit variable of given()
.
The default place to put the next value or input record
when a <FH>
, readline, readdir or each
operation's result is tested by itself as the sole criterion of a while
test. Outside a while
test, this will not happen.
$_
is by default a global variable. However, as
of perl v5.10.0, you can use a lexical version of
$_
by declaring it in a file or in a block with my. Moreover,
declaring our $_
restores the global $_
in the current scope. Though
this seemed like a good idea at the time it was introduced, lexical $_
actually causes more problems than it solves. If you call a function that
expects to be passed information via $_
, it may or may not work,
depending on how the function is written, there not being any easy way to
solve this. Just avoid lexical $_
, unless you are feeling particularly
masochistic. For this reason lexical $_
is still experimental and will
produce a warning unless warnings have been disabled. As with other
experimental features, the behavior of lexical $_
is subject to change
without notice, including change into a fatal error.
Mnemonic: underline is understood in certain operations.
Within a subroutine the array @_
contains the parameters passed to
that subroutine. Inside a subroutine, @_
is the default array for
the array operators push, pop, shift, and unshift.
See perlsub.
When an array or an array slice is interpolated into a double-quoted
string or a similar context such as /.../
, its elements are
separated by this value. Default is a space. For example, this:
- print "The array is: @array\n";
is equivalent to this:
- print "The array is: " . join($", @array) . "\n";
Mnemonic: works in double-quoted context.
The process number of the Perl running this script. Though you can set
this variable, doing so is generally discouraged, although it can be
invaluable for some testing purposes. It will be reset automatically
across fork() calls.
Note for Linux and Debian GNU/kFreeBSD users: Before Perl v5.16.0 perl would emulate POSIX semantics on Linux systems using LinuxThreads, a partial implementation of POSIX Threads that has since been superseded by the Native POSIX Thread Library (NPTL).
LinuxThreads is now obsolete on Linux, and caching getpid()
like this made embedding perl unnecessarily complex (since you'd have
to manually update the value of $$), so now $$
and getppid()
will always return the same values as the underlying C library.
Debian GNU/kFreeBSD systems also used LinuxThreads up until and including the 6.0 release, but after that moved to FreeBSD thread semantics, which are POSIX-like.
To see if your system is affected by this discrepancy check if
getconf GNU_LIBPTHREAD_VERSION | grep -q NPTL
returns a false
value. NPTL threads preserve the POSIX semantics.
Mnemonic: same as shells.
Contains the name of the program being executed.
On some (but not all) operating systems assigning to $0
modifies
the argument area that the ps
program sees. On some platforms you
may have to use special ps
options or a different ps
to see the
changes. Modifying the $0
is more useful as a way of indicating the
current program state than it is for hiding the program you're
running.
Note that there are platform-specific limitations on the maximum
length of $0
. In the most extreme case it may be limited to the
space occupied by the original $0
.
In some platforms there may be arbitrary amount of padding, for
example space characters, after the modified name as shown by ps
.
In some platforms this padding may extend all the way to the original
length of the argument area, no matter what you do (this is the case
for example with Linux 2.2).
Note for BSD users: setting $0
does not completely remove "perl"
from the ps(1) output. For example, setting $0
to "foobar"
may
result in "perl: foobar (perl)"
(whether both the "perl: "
prefix
and the " (perl)" suffix are shown depends on your exact BSD variant
and version). This is an operating system feature, Perl cannot help it.
In multithreaded scripts Perl coordinates the threads so that any
thread may modify its copy of the $0
and the change becomes visible
to ps(1) (assuming the operating system plays along). Note that
the view of $0
the other threads have will not change since they
have their own copies of it.
If the program has been given to perl via the switches -e
or -E
,
$0
will contain the string "-e"
.
On Linux as of perl v5.14.0 the legacy process name will be set with
prctl(2)
, in addition to altering the POSIX name via argv[0]
as
perl has done since version 4.000. Now system utilities that read the
legacy process name such as ps, top and killall will recognize the
name you set when assigning to $0
. The string you supply will be
cut off at 16 bytes; this is a limitation imposed by Linux.
Mnemonic: same as sh and ksh.
The real gid of this process. If you are on a machine that supports
membership in multiple groups simultaneously, gives a space separated
list of groups you are in. The first number is the one returned by
getgid()
, and the subsequent ones by getgroups()
, one of which may be
the same as the first number.
However, a value assigned to $(
must be a single number used to
set the real gid. So the value given by $(
should not be assigned
back to $(
without being forced numeric, such as by adding zero. Note
that this is different to the effective gid ($)
) which does take a
list.
You can change both the real gid and the effective gid at the same
time by using POSIX::setgid()
. Changes
to $(
require a check to $!
to detect any possible errors after an attempted change.
Mnemonic: parentheses are used to group things. The real gid is the group you left, if you're running setgid.
The effective gid of this process. If you are on a machine that
supports membership in multiple groups simultaneously, gives a space
separated list of groups you are in. The first number is the one
returned by getegid()
, and the subsequent ones by getgroups()
,
one of which may be the same as the first number.
Similarly, a value assigned to $)
must also be a space-separated
list of numbers. The first number sets the effective gid, and
the rest (if any) are passed to setgroups()
. To get the effect of an
empty list for setgroups()
, just repeat the new effective gid; that is,
to force an effective gid of 5 and an effectively empty setgroups()
list, say $) = "5 5"
.
You can change both the effective gid and the real gid at the same
time by using POSIX::setgid()
(use only a single numeric argument).
Changes to $)
require a check to $!
to detect any possible errors
after an attempted change.
$<
, $>
, $(
and $)
can be set only on
machines that support the corresponding set[re][ug]id() routine. $(
and $)
can be swapped only on machines supporting setregid()
.
Mnemonic: parentheses are used to group things. The effective gid is the group that's right for you, if you're running setgid.
The real uid of this process. You can change both the real uid and the
effective uid at the same time by using POSIX::setuid()
. Since
changes to $<
require a system call, check $!
after a change
attempt to detect any possible errors.
Mnemonic: it's the uid you came from, if you're running setuid.
The effective uid of this process. For example:
- $< = $>; # set real to effective uid
- ($<,$>) = ($>,$<); # swap real and effective uids
You can change both the effective uid and the real uid at the same
time by using POSIX::setuid()
. Changes to $>
require a check
to $!
to detect any possible errors after an attempted change.
$<
and $>
can be swapped only on machines
supporting setreuid()
.
Mnemonic: it's the uid you went to, if you're running setuid.
The subscript separator for multidimensional array emulation. If you refer to a hash element as
- $foo{$a,$b,$c}
it really means
- $foo{join($;, $a, $b, $c)}
But don't put
- @foo{$a,$b,$c} # a slice--note the @
which means
- ($foo{$a},$foo{$b},$foo{$c})
Default is "\034", the same as SUBSEP in awk. If your keys contain
binary data there might not be any safe value for $;
.
Consider using "real" multidimensional arrays as described in perllol.
Mnemonic: comma (the syntactic subscript separator) is a semi-semicolon.
Special package variables when using sort(), see sort.
Because of this specialness $a
and $b
don't need to be declared
(using use vars
, or our()) even when using the strict 'vars'
pragma. Don't lexicalize them with my $a
or my $b
if you want to
be able to use them in the sort() comparison block or function.
The hash %ENV
contains your current environment. Setting a
value in ENV
changes the environment for any child processes
you subsequently fork() off.
The maximum system file descriptor, ordinarily 2. System file
descriptors are passed to exec()ed processes, while higher file
descriptors are not. Also, during an
open(), system file descriptors are
preserved even if the open() fails (ordinary file descriptors are
closed before the open() is attempted). The close-on-exec
status of a file descriptor will be decided according to the value of
$^F
when the corresponding file, pipe, or socket was opened, not the
time of the exec().
The array @F
contains the fields of each line read in when autosplit
mode is turned on. See perlrun for the -a switch. This array
is package-specific, and must be declared or given a full package name
if not in package main when running under strict 'vars'
.
The array @INC
contains the list of places that the do EXPR
,
require, or use constructs look for their library files. It
initially consists of the arguments to any -I command-line
switches, followed by the default Perl library, probably
/usr/local/lib/perl, followed by ".", to represent the current
directory. ("." will not be appended if taint checks are enabled,
either by -T
or by -t
.) If you need to modify this at runtime,
you should use the use lib
pragma to get the machine-dependent
library properly loaded also:
- use lib '/mypath/libdir/';
- use SomeMod;
You can also insert hooks into the file inclusion system by putting Perl
code directly into @INC
. Those hooks may be subroutine references,
array references or blessed objects. See require for details.
The hash %INC
contains entries for each filename included via the
do, require, or use operators. The key is the filename
you specified (with module names converted to pathnames), and the
value is the location of the file found. The require
operator uses this hash to determine whether a particular file has
already been included.
If the file was loaded via a hook (e.g. a subroutine reference, see
require for a description of these hooks), this hook is
by default inserted into %INC
in place of a filename. Note, however,
that the hook may have set the %INC
entry by itself to provide some more
specific info.
The current value of the inplace-edit extension. Use undef to disable
inplace editing.
Mnemonic: value of -i switch.
By default, running out of memory is an untrappable, fatal error.
However, if suitably built, Perl can use the contents of $^M
as an emergency memory pool after die()ing. Suppose that your Perl
were compiled with -DPERL_EMERGENCY_SBRK
and used Perl's malloc.
Then
- $^M = 'a' x (1 << 16);
would allocate a 64K buffer for use in an emergency. See the INSTALL file in the Perl distribution for information on how to add custom C compilation flags when compiling perl. To discourage casual use of this advanced feature, there is no English long name for this variable.
This variable was added in Perl 5.004.
The name of the operating system under which this copy of Perl was built, as determined during the configuration process. For examples see PLATFORMS in perlport.
The value is identical to $Config{'osname'}
. See also Config
and the -V command-line switch documented in perlrun.
In Windows platforms, $^O
is not very helpful: since it is always
MSWin32
, it doesn't tell the difference between
95/98/ME/NT/2000/XP/CE/.NET. Use Win32::GetOSName()
or
Win32::GetOSVersion() (see Win32 and perlport) to distinguish
between the variants.
This variable was added in Perl 5.003.
The hash %SIG
contains signal handlers for signals. For example:
- sub handler { # 1st argument is signal name
- my($sig) = @_;
- print "Caught a SIG$sig--shutting down\n";
- close(LOG);
- exit(0);
- }
- $SIG{'INT'} = \&handler;
- $SIG{'QUIT'} = \&handler;
- ...
- $SIG{'INT'} = 'DEFAULT'; # restore default action
- $SIG{'QUIT'} = 'IGNORE'; # ignore SIGQUIT
Using a value of 'IGNORE'
usually has the effect of ignoring the
signal, except for the CHLD
signal. See perlipc for more about
this special case.
Here are some other examples:
- $SIG{"PIPE"} = "Plumber"; # assumes main::Plumber (not
- # recommended)
- $SIG{"PIPE"} = \&Plumber; # just fine; assume current
- # Plumber
- $SIG{"PIPE"} = *Plumber; # somewhat esoteric
- $SIG{"PIPE"} = Plumber(); # oops, what did Plumber()
- # return??
Be sure not to use a bareword as the name of a signal handler, lest you inadvertently call it.
If your system has the sigaction()
function then signal handlers
are installed using it. This means you get reliable signal handling.
The default delivery policy of signals changed in Perl v5.8.0 from immediate (also known as "unsafe") to deferred, also known as "safe signals". See perlipc for more information.
Certain internal hooks can be also set using the %SIG
hash. The
routine indicated by $SIG{__WARN__}
is called when a warning
message is about to be printed. The warning message is passed as the
first argument. The presence of a __WARN__
hook causes the
ordinary printing of warnings to STDERR
to be suppressed. You can
use this to save warnings in a variable, or turn warnings into fatal
errors, like this:
- local $SIG{__WARN__} = sub { die $_[0] };
- eval $proggie;
As the 'IGNORE'
hook is not supported by __WARN__
, you can
disable warnings using the empty subroutine:
- local $SIG{__WARN__} = sub {};
The routine indicated by $SIG{__DIE__}
is called when a fatal
exception is about to be thrown. The error message is passed as the
first argument. When a __DIE__
hook routine returns, the exception
processing continues as it would have in the absence of the hook,
unless the hook routine itself exits via a goto &sub
, a loop exit,
or a die(). The __DIE__
handler is explicitly disabled during
the call, so that you can die from a __DIE__
handler. Similarly
for __WARN__
.
Due to an implementation glitch, the $SIG{__DIE__}
hook is called
even inside an eval(). Do not use this to rewrite a pending
exception in $@
, or as a bizarre substitute for overriding
CORE::GLOBAL::die()
. This strange action at a distance may be fixed
in a future release so that $SIG{__DIE__}
is only called if your
program is about to exit, as was the original intent. Any other use is
deprecated.
__DIE__
/__WARN__
handlers are very special in one respect: they
may be called to report (probable) errors found by the parser. In such
a case the parser may be in inconsistent state, so any attempt to
evaluate Perl code from such a handler will probably result in a
segfault. This means that warnings or errors that result from parsing
Perl should be used with extreme caution, like this:
- require Carp if defined $^S;
- Carp::confess("Something wrong") if defined &Carp::confess;
- die "Something wrong, but could not load Carp to give "
- . "backtrace...\n\t"
- . "To see backtrace try starting Perl with -MCarp switch";
Here the first line will load Carp
unless it is the parser who
called the handler. The second line will print backtrace and die if
Carp
was available. The third line will be executed only if Carp
was
not available.
Having to even think about the $^S
variable in your exception
handlers is simply wrong. $SIG{__DIE__}
as currently implemented
invites grievous and difficult to track down errors. Avoid it
and use an END{}
or CORE::GLOBAL::die override instead.
See die, warn, eval, and warnings for additional information.
The time at which the program began running, in seconds since the epoch (beginning of 1970). The values returned by the -M, -A, and -C filetests are based on this value.
The revision, version, and subversion of the Perl interpreter,
represented as a version
object.
This variable first appeared in perl v5.6.0; earlier versions of perl
will see an undefined value. Before perl v5.10.0 $^V
was represented
as a v-string.
$^V
can be used to determine whether the Perl interpreter executing
a script is in the right range of versions. For example:
- warn "Hashes not randomized!\n" if !$^V or $^V lt v5.8.1
To convert $^V
into its string representation use sprintf()'s
"%vd"
conversion:
- printf "version is v%vd\n", $^V; # Perl's version
See the documentation of use VERSION
and require VERSION
for a convenient way to fail if the running Perl interpreter is too old.
See also $]
for an older representation of the Perl version.
This variable was added in Perl v5.6.0.
Mnemonic: use ^V for Version Control.
If this variable is set to a true value, then stat() on Windows will
not try to open the file. This means that the link count cannot be
determined and file attributes may be out of date if additional
hardlinks to the file exist. On the other hand, not opening the file
is considerably faster, especially for files on network drives.
This variable could be set in the sitecustomize.pl file to
configure the local Perl installation to use "sloppy" stat() by
default. See the documentation for -f in
perlrun for more information about site
customization.
This variable was added in Perl v5.10.0.
The name used to execute the current copy of Perl, from C's
argv[0]
or (where supported) /proc/self/exe.
Depending on the host operating system, the value of $^X
may be
a relative or absolute pathname of the perl program file, or may
be the string used to invoke perl but not the pathname of the
perl program file. Also, most operating systems permit invoking
programs that are not in the PATH environment variable, so there
is no guarantee that the value of $^X
is in PATH. For VMS, the
value may or may not include a version number.
You usually can use the value of $^X
to re-invoke an independent
copy of the same perl that is currently running, e.g.,
- @first_run = `$^X -le "print int rand 100 for 1..100"`;
But recall that not all operating systems support forking or capturing of the output of commands, so this complex statement may not be portable.
It is not safe to use the value of $^X
as a path name of a file,
as some operating systems that have a mandatory suffix on
executable files do not require use of the suffix when invoking
a command. To convert the value of $^X
to a path name, use the
following statements:
- # Build up a set of file names (not command names).
- use Config;
- my $this_perl = $^X;
- if ($^O ne 'VMS') {
- $this_perl .= $Config{_exe}
- unless $this_perl =~ m/$Config{_exe}$/i;
- }
Because many operating systems permit anyone with read access to
the Perl program file to make a copy of it, patch the copy, and
then execute the copy, the security-conscious Perl programmer
should take care to invoke the installed copy of perl, not the
copy referenced by $^X
. The following statements accomplish
this goal, and produce a pathname that can be invoked as a
command or referenced as a file.
- use Config;
- my $secure_perl_path = $Config{perlpath};
- if ($^O ne 'VMS') {
- $secure_perl_path .= $Config{_exe}
- unless $secure_perl_path =~ m/$Config{_exe}$/i;
- }
Most of the special variables related to regular expressions are side effects. Perl sets these variables when it has a successful match, so you should check the match result before using them. For instance:
- if( /P(A)TT(ER)N/ ) {
- print "I found $1 and $2\n";
- }
These variables are read-only and dynamically-scoped, unless we note otherwise.
The dynamic nature of the regular expression variables means that their value is limited to the block that they are in, as demonstrated by this bit of code:
- my $outer = 'Wallace and Grommit';
- my $inner = 'Mutt and Jeff';
- my $pattern = qr/(\S+) and (\S+)/;
- sub show_n { print "\$1 is $1; \$2 is $2\n" }
- {
- show_n() if $outer =~ m/$pattern/;
- INNER: {
- show_n() if $inner =~ m/$pattern/;
- }
- show_n();
- }
The output shows that while in the OUTER
block, the values of $1
and $2
are from the match against $outer
. Inside the INNER
block, the values of $1
and $2
are from the match against
$inner
, but only until the end of the block (i.e. the dynamic
scope). After the INNER
block completes, the values of $1
and
$2
return to the values for the match against $outer
even though
we have not made another match:
- $1 is Wallace; $2 is Grommit
- $1 is Mutt; $2 is Jeff
- $1 is Wallace; $2 is Grommit
Due to an unfortunate accident of Perl's implementation, use
English
imposes a considerable performance penalty on all regular
expression matches in a program because it uses the $`
, $&
, and
$'
, regardless of whether they occur in the scope of use
English
. For that reason, saying use English
in libraries is
strongly discouraged unless you import it without the match variables:
- use English '-no_match_vars'
The Devel::NYTProf
and Devel::FindAmpersand
modules can help you find uses of these
problematic match variables in your code.
Since Perl v5.10.0, you can use the /p match operator flag and the
${^PREMATCH}
, ${^MATCH}
, and ${^POSTMATCH}
variables instead, so that you only pay the performance penalty
for the matches where you actually use /p.
Contains the subpattern from the corresponding set of capturing parentheses from the last successful pattern match, not counting patterns matched in nested blocks that have been exited already.
These variables are read-only and dynamically-scoped.
Mnemonic: like \digits.
The string matched by the last successful pattern match (not counting
any matches hidden within a BLOCK or eval() enclosed by the current
BLOCK).
The use of this variable anywhere in a program imposes a considerable
performance penalty on all regular expression matches. To avoid this
penalty, you can extract the same substring by using @-. Starting
with Perl v5.10.0, you can use the /p match flag and the ${^MATCH}
variable to do the same thing for particular match operations.
This variable is read-only and dynamically-scoped.
Mnemonic: like &
in some editors.
This is similar to $&
($MATCH
) except that it does not incur the
performance penalty associated with that variable, and is only guaranteed
to return a defined value when the pattern was compiled or executed with
the /p modifier.
This variable was added in Perl v5.10.0.
This variable is read-only and dynamically-scoped.
The string preceding whatever was matched by the last successful
pattern match, not counting any matches hidden within a BLOCK or eval
enclosed by the current BLOCK.
The use of this variable anywhere in a program imposes a considerable
performance penalty on all regular expression matches. To avoid this
penalty, you can extract the same substring by using @-. Starting
with Perl v5.10.0, you can use the /p match flag and the
${^PREMATCH}
variable to do the same thing for particular match
operations.
This variable is read-only and dynamically-scoped.
Mnemonic: ` often precedes a quoted string.
This is similar to $`
($PREMATCH) except that it does not incur the
performance penalty associated with that variable, and is only guaranteed
to return a defined value when the pattern was compiled or executed with
the /p modifier.
This variable was added in Perl v5.10.0
This variable is read-only and dynamically-scoped.
The string following whatever was matched by the last successful
pattern match (not counting any matches hidden within a BLOCK or eval()
enclosed by the current BLOCK). Example:
The use of this variable anywhere in a program imposes a considerable
performance penalty on all regular expression matches.
To avoid this penalty, you can extract the same substring by
using @-. Starting with Perl v5.10.0, you can use the /p match flag
and the ${^POSTMATCH}
variable to do the same thing for particular
match operations.
This variable is read-only and dynamically-scoped.
Mnemonic: ' often follows a quoted string.
This is similar to $'
($POSTMATCH
) except that it does not incur the
performance penalty associated with that variable, and is only guaranteed
to return a defined value when the pattern was compiled or executed with
the /p modifier.
This variable was added in Perl v5.10.0.
This variable is read-only and dynamically-scoped.
The text matched by the last bracket of the last successful search pattern. This is useful if you don't know which one of a set of alternative patterns matched. For example:
- /Version: (.*)|Revision: (.*)/ && ($rev = $+);
This variable is read-only and dynamically-scoped.
Mnemonic: be positive and forward looking.
The text matched by the used group most-recently closed (i.e. the group with the rightmost closing parenthesis) of the last successful search pattern.
This is primarily used inside (?{...}) blocks for examining text
recently matched. For example, to effectively capture text to a variable
(in addition to $1
, $2
, etc.), replace (...)
with
- (?:(...)(?{ $var = $^N }))
Setting and then using $var
in this way relieves you from having to
worry about exactly which numbered set of parentheses the text came from.
This variable was added in Perl v5.8.0.
Mnemonic: the (possibly) Nested parenthesis that most recently closed.
This array holds the offsets of the ends of the last successful
submatches in the currently active dynamic scope. $+[0]
is
the offset into the string of the end of the entire match. This
is the same value as what the pos function returns when called
on the variable that was matched against. The nth element
of this array holds the offset of the nth submatch, so
$+[1]
is the offset past where $1
ends, $+[2]
the offset
past where $2
ends, and so on. You can use $#+
to determine
how many subgroups were in the last successful match. See the
examples given for the @-
variable.
This variable was added in Perl v5.6.0.
Similar to @+
, the %+
hash allows access to the named capture
buffers, should they exist, in the last successful match in the
currently active dynamic scope.
For example, $+{foo}
is equivalent to $1
after the following match:
- 'foo' =~ /(?<foo>foo)/;
The keys of the %+
hash list only the names of buffers that have
captured (and that are thus associated to defined values).
The underlying behaviour of %+
is provided by the
Tie::Hash::NamedCapture module.
Note: %-
and %+
are tied views into a common internal hash
associated with the last successful regular expression. Therefore mixing
iterative access to them via each may have unpredictable results.
Likewise, if the last successful match changes, then the results may be
surprising.
This variable was added in Perl v5.10.0.
This variable is read-only and dynamically-scoped.
$-[0]
is the offset of the start of the last successful match.
$-[n] is the offset of the start of the substring matched by the
n-th subpattern, or undef if the subpattern did not match.
Thus, after a match against $_
, $&
coincides with substr $_, $-[0],
$+[0] - $-[0]
. Similarly, $n coincides with substr $_, $-[n],
$+[n] - $-[n]
if $-[n]
is defined, and $+ coincides with
substr $_, $-[$#-], $+[$#-] - $-[$#-]
. One can use $#-
to find the
last matched subgroup in the last successful match. Contrast with
$#+
, the number of subgroups in the regular expression. Compare
with @+
.
This array holds the offsets of the beginnings of the last
successful submatches in the currently active dynamic scope.
$-[0]
is the offset into the string of the beginning of the
entire match. The nth element of this array holds the offset
of the nth submatch, so $-[1]
is the offset where $1
begins, $-[2]
the offset where $2
begins, and so on.
After a match against some variable $var
:
$`
is the same as substr($var, 0, $-[0])
$&
is the same as substr($var, $-[0], $+[0] - $-[0])
$'
is the same as substr($var, $+[0])
$1
is the same as substr($var, $-[1], $+[1] - $-[1])
$2
is the same as substr($var, $-[2], $+[2] - $-[2])
$3
is the same as substr($var, $-[3], $+[3] - $-[3])
This variable was added in Perl v5.6.0.
Similar to %+
, this variable allows access to the named capture groups
in the last successful match in the currently active dynamic scope. To
each capture group name found in the regular expression, it associates a
reference to an array containing the list of values captured by all
buffers with that name (should there be several of them), in the order
where they appear.
Here's an example:
- if ('1234' =~ /(?<A>1)(?<B>2)(?<A>3)(?<B>4)/) {
- foreach my $bufname (sort keys %-) {
- my $ary = $-{$bufname};
- foreach my $idx (0..$#$ary) {
- print "\$-{$bufname}[$idx] : ",
- (defined($ary->[$idx]) ? "'$ary->[$idx]'" : "undef"), "\n";
- }
- }
- }
would print out:
- $-{A}[0] : '1'
- $-{A}[1] : '3'
- $-{B}[0] : '2'
- $-{B}[1] : '4'
The keys of the %-
hash correspond to all buffer names found in
the regular expression.
The behaviour of %-
is implemented via the
Tie::Hash::NamedCapture module.
Note: %-
and %+
are tied views into a common internal hash
associated with the last successful regular expression. Therefore mixing
iterative access to them via each may have unpredictable results.
Likewise, if the last successful match changes, then the results may be
surprising.
This variable was added in Perl v5.10.0.
This variable is read-only and dynamically-scoped.
The result of evaluation of the last successful (?{ code })
regular expression assertion (see perlre). May be written to.
This variable was added in Perl 5.005.
The current value of the regex debugging flags. Set to 0 for no debug output
even when the re 'debug'
module is loaded. See re for details.
This variable was added in Perl v5.10.0.
Controls how certain regex optimisations are applied and how much memory they utilize. This value by default is 65536 which corresponds to a 512kB temporary cache. Set this to a higher value to trade memory for speed when matching large alternations. Set it to a lower value if you want the optimisations to be as conservative of memory as possible but still occur, and set it to a negative value to prevent the optimisation and conserve the most memory. Under normal situations this variable should be of no interest to you.
This variable was added in Perl v5.10.0.
Variables that depend on the currently selected filehandle may be set
by calling an appropriate object method on the IO::Handle
object,
although this is less efficient than using the regular built-in
variables. (Summary lines below for this contain the word HANDLE.)
First you must say
- use IO::Handle;
after which you may use either
- method HANDLE EXPR
or more safely,
- HANDLE->method(EXPR)
Each method returns the old value of the IO::Handle
attribute. The
methods each take an optional EXPR, which, if supplied, specifies the
new value for the IO::Handle
attribute in question. If not
supplied, most methods do nothing to the current value--except for
autoflush()
, which will assume a 1 for you, just to be different.
Because loading in the IO::Handle
class is an expensive operation,
you should learn how to use the regular built-in variables.
A few of these variables are considered "read-only". This means that if you try to assign to this variable, either directly or indirectly through a reference, you'll raise a run-time exception.
You should be very careful when modifying the default values of most special variables described in this document. In most cases you want to localize these variables before changing them, since if you don't, the change may affect other modules which rely on the default values of the special variables that you have changed. This is one of the correct ways to read the whole file at once:
- open my $fh, "<", "foo" or die $!;
- local $/; # enable localized slurp mode
- my $content = <$fh>;
- close $fh;
But the following code is quite bad:
- open my $fh, "<", "foo" or die $!;
- undef $/; # enable slurp mode
- my $content = <$fh>;
- close $fh;
since some other module may want to read data from some file in the
default "line mode", so if the code we have just presented has been
executed, the global value of $/
is now changed for any other code
running inside the same Perl interpreter.
Usually when a variable is localized you want to make sure that this
change affects the shortest scope possible. So unless you are already
inside some short {}
block, you should create one yourself. For
example:
- my $content = '';
- open my $fh, "<", "foo" or die $!;
- {
- local $/;
- $content = <$fh>;
- }
- close $fh;
Here is an example of how your own code can go broken:
- for (1..3) {
- $\ = "\r\n";
- nasty_break();
- print "$_";
- }
- sub nasty_break {
- $\ = "\f";
- # do something with $_ here
- }
You probably expect this code to print the equivalent of
- "1\r\n2\r\n3\r\n"
but instead you get:
- "1\f2\f3\f"
Why? Because nasty_break()
modifies $\
without localizing it
first. The value you set in nasty_break()
is still there when you
return. The fix is to add local() so the value doesn't leak out of
nasty_break()
:
- local $\ = "\f";
It's easy to notice the problem in such a short example, but in more complicated code you are looking for trouble if you don't localize changes to the special variables.
Contains the name of the current file when reading from <>
.
The array @ARGV
contains the command-line arguments intended for
the script. $#ARGV
is generally the number of arguments minus
one, because $ARGV[0]
is the first argument, not the program's
command name itself. See $0 for the command name.
The special filehandle that iterates over command-line filenames in
@ARGV
. Usually written as the null filehandle in the angle operator
<>
. Note that currently ARGV
only has its magical effect
within the <>
operator; elsewhere it is just a plain filehandle
corresponding to the last file opened by <>
. In particular,
passing \*ARGV
as a parameter to a function that expects a filehandle
may not cause your function to automatically read the contents of all the
files in @ARGV
.
The special filehandle that points to the currently open output file
when doing edit-in-place processing with -i. Useful when you have
to do a lot of inserting and don't want to keep modifying $_
. See
perlrun for the -i switch.
The output field separator for the print operator. If defined, this
value is printed between each of print's arguments. Default is undef.
You cannot call output_field_separator()
on a handle, only as a
static method. See IO::Handle.
Mnemonic: what is printed when there is a "," in your print statement.
Current line number for the last filehandle accessed.
Each filehandle in Perl counts the number of lines that have been read
from it. (Depending on the value of $/
, Perl's idea of what
constitutes a line may not match yours.) When a line is read from a
filehandle (via readline() or <>
), or when tell() or
seek() is called on it, $.
becomes an alias to the line counter
for that filehandle.
You can adjust the counter by assigning to $.
, but this will not
actually move the seek pointer. Localizing $.
will not localize
the filehandle's line count. Instead, it will localize perl's notion
of which filehandle $.
is currently aliased to.
$.
is reset when the filehandle is closed, but not when an open
filehandle is reopened without an intervening close(). For more
details, see I/O Operators in perlop. Because <>
never does
an explicit close, line numbers increase across ARGV
files (but see
examples in eof).
You can also use HANDLE->input_line_number(EXPR)
to access the
line counter for a given filehandle without having to worry about
which handle you last accessed.
Mnemonic: many programs use "." to mean the current line number.
The input record separator, newline by default. This influences Perl's
idea of what a "line" is. Works like awk's RS variable, including
treating empty lines as a terminator if set to the null string (an
empty line cannot contain any spaces or tabs). You may set it to a
multi-character string to match a multi-character terminator, or to
undef to read through the end of file. Setting it to "\n\n"
means something slightly different than setting to ""
, if the file
contains consecutive empty lines. Setting to ""
will treat two or
more consecutive empty lines as a single empty line. Setting to
"\n\n"
will blindly assume that the next input character belongs to
the next paragraph, even if it's a newline.
Remember: the value of $/
is a string, not a regex. awk has to
be better for something. :-)
Setting $/
to a reference to an integer, scalar containing an
integer, or scalar that's convertible to an integer will attempt to
read records instead of lines, with the maximum record size being the
referenced integer number of characters. So this:
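The statement referenced by "So this" is missing; a runnable sketch (a demo file is created first, and the filename is illustrative):

```perl
# Demo setup so the sketch is runnable (filename illustrative).
open my $out, ">", "records.bin" or die "write: $!";
print $out "A" x 100_000;
close $out;

open my $fh, "<", "records.bin" or die "open: $!";
local $/ = \32768;          # a reference to an integer: fixed-size records
my $record = <$fh>;         # reads at most 32768 characters
```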
will read a record of no more than 32768 characters from $fh. If you're not reading from a record-oriented file (or your OS doesn't have record-oriented files), then you'll likely get a full chunk of data with every read. If a record is larger than the record size you've set, you'll get the record back in pieces. Trying to set the record size to zero or less will cause reading in the (rest of the) whole file.
On VMS only, record reads bypass PerlIO layers and any associated buffering, so you must not mix record and non-record reads on the same filehandle. Record mode mixes with line mode only when the same buffering layer is in use for both modes.
You cannot call input_record_separator()
on a handle, only as a
static method. See IO::Handle.
See also Newlines in perlport. Also see $..
Mnemonic: / delimits line boundaries when quoting poetry.
The output record separator for the print operator. If defined, this
value is printed after the last of print's arguments. Default is undef.
You cannot call output_record_separator()
on a handle, only as a
static method. See IO::Handle.
Mnemonic: you set $\
instead of adding "\n" at the end of the print.
Also, it's just like $/
, but it's what you get "back" from Perl.
If set to nonzero, forces a flush right away and after every write or
print on the currently selected output channel. Default is 0
(regardless of whether the channel is really buffered by the system or
not; $|
tells you only whether you've asked Perl explicitly to
flush after each write). STDOUT will typically be line buffered if
output is to the terminal and block buffered otherwise. Setting this
variable is useful primarily when you are outputting to a pipe or
socket, such as when you are running a Perl program under rsh and
want to see the output as it's happening. This has no effect on input
buffering. See getc for that. See select on
how to select the output channel. See also IO::Handle.
Mnemonic: when you want your pipes to be piping hot.
This read-only variable contains a reference to the last-read filehandle.
This is set by <HANDLE>
, readline, tell, eof and seek.
This is the same handle that $.
and tell and eof without arguments
use. It is also the handle used when Perl appends ", <STDIN> line 1" to
an error or warning message.
This variable was added in Perl v5.18.0.
The special variables for formats are a subset of those for filehandles. See perlform for more information about Perl's formats.
The current value of the write() accumulator for format() lines.
A format contains formline() calls that put their result into
$^A
. After calling its format, write() prints out the contents
of $^A
and then empties it. So you never really see the contents of $^A
unless you call formline() yourself and then look at it. See
perlform and formline PICTURE,LIST.
What formats output as a form feed. The default is \f
.
You cannot call format_formfeed()
on a handle, only as a static
method. See IO::Handle.
The current page number of the currently selected output channel.
Mnemonic: %
is page number in nroff.
The number of lines left on the page of the currently selected output channel.
Mnemonic: lines_on_page - lines_printed.
The current set of characters after which a string may be broken to
fill continuation fields (starting with ^) in a format. The default is
" \n-", to break on a space, newline, or a hyphen.
You cannot call format_line_break_characters()
on a handle, only as
a static method. See IO::Handle.
Mnemonic: a "colon" in poetry is a part of a line.
The current page length (printable lines) of the currently selected output channel. The default is 60.
Mnemonic: = has horizontal lines.
The name of the current top-of-page format for the currently selected
output channel. The default is the name of the filehandle with _TOP
appended. For example, the default format top name for the STDOUT
filehandle is STDOUT_TOP
.
Mnemonic: points to top of page.
The name of the current report format for the currently selected
output channel. The default format name is the same as the filehandle
name. For example, the default format name for the STDOUT
filehandle is just STDOUT
.
Mnemonic: brother to $^
.
The variables $@
, $!
, $^E
, and $?
contain information
about different types of error conditions that may appear during
execution of a Perl program. The variables are shown ordered by
the "distance" between the subsystem which reported the error and
the Perl process. They correspond to errors detected by the Perl
interpreter, C library, operating system, or an external program,
respectively.
To illustrate the differences between these variables, consider the following Perl expression, which uses a single-quoted string. After execution of this statement, perl may have set all four special error variables:
- eval q{
- open my $pipe, "/cdrom/install |" or die $!;
- my @res = <$pipe>;
- close $pipe or die "bad pipe: $?, $!";
- };
When perl executes the eval() expression, it translates the
open(), <PIPE>
, and close calls into calls to the C run-time library
and thence to the operating system kernel. perl sets $!
to
the C library's errno
if one of these calls fails.
$@
is set if the string to be eval-ed did not compile (this may
happen if open or close were imported with bad prototypes), or
if Perl code executed during evaluation die()d. In these cases the
value of $@
is the compile error, or the argument to die (which
will interpolate $!
and $?
). (See also Fatal, though.)
Under a few operating systems, $^E
may contain a more verbose error
indicator, such as in this case, "CDROM tray not closed." Systems that
do not support extended error messages leave $^E
the same as $!
.
Finally, $?
may be set to non-0 value if the external program
/cdrom/install fails. The upper eight bits reflect specific error
conditions encountered by the program (the program's exit() value).
The lower eight bits reflect mode of failure, like signal death and
core dump information. See wait(2) for details. In contrast to
$!
and $^E
, which are set only if an error condition is detected,
the variable $?
is set on each wait or pipe close,
overwriting the old value. This is more like $@
, which on every
eval() is always set on failure and cleared on success.
For more details, see the individual descriptions at $@
, $!
,
$^E
, and $?
.
The native status returned by the last pipe close, backtick (``
)
command, successful call to wait() or waitpid(), or from the
system() operator. On POSIX-like systems this value can be decoded
with the WIFEXITED, WEXITSTATUS, WIFSIGNALED, WTERMSIG, WIFSTOPPED,
WSTOPSIG and WIFCONTINUED functions provided by the POSIX module.
Under VMS this reflects the actual VMS exit status; i.e. it is the
same as $?
when the pragma use vmsish 'status'
is in effect.
This variable was added in Perl v5.10.0.
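A sketch of decoding this value on a POSIX-like system, using the functions named above (the child command is illustrative):

```perl
use POSIX qw(WIFEXITED WEXITSTATUS WIFSIGNALED WTERMSIG);

# Run a child perl that exits with status 3 (illustrative).
system($^X, "-e", "exit 3");
my $native = ${^CHILD_ERROR_NATIVE};

if (WIFEXITED($native)) {
    print "exited with ", WEXITSTATUS($native), "\n";
}
elsif (WIFSIGNALED($native)) {
    print "killed by signal ", WTERMSIG($native), "\n";
}
```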
Error information specific to the current operating system. At the
moment, this differs from $!
under only VMS, OS/2, and Win32 (and
for MacPerl). On all other platforms, $^E
is always just the same
as $!
.
Under VMS, $^E
provides the VMS status value from the last system
error. This is more specific information about the last system error
than that provided by $!
. This is particularly important when $!
is set to EVMSERR.
Under OS/2, $^E
is set to the error code of the last call to OS/2
API either via CRT, or directly from perl.
Under Win32, $^E
always returns the last error information reported
by the Win32 call GetLastError()
which describes the last error
from within the Win32 API. Most Win32-specific code will report errors
via $^E
. ANSI C and Unix-like calls set errno
and so most
portable Perl code will report errors via $!
.
Caveats mentioned in the description of $!
generally apply to
$^E
, also.
This variable was added in Perl 5.003.
Mnemonic: Extra error explanation.
Current state of the interpreter.
The first state may happen in $SIG{__DIE__}
and $SIG{__WARN__}
handlers.
The English name $EXCEPTIONS_BEING_CAUGHT is slightly misleading, because
the undef value does not indicate whether exceptions are being caught,
since compilation of the main program does not catch exceptions.
This variable was added in Perl 5.004.
The current value of the warning switch, initially true if -w was used, false otherwise, but directly modifiable.
See also warnings.
Mnemonic: related to the -w switch.
The current set of warning checks enabled by the use warnings
pragma.
It has the same scoping as the $^H
and %^H
variables. The exact
values are considered internal to the warnings pragma and may change
between versions of Perl.
This variable was added in Perl v5.6.0.
When referenced, $!
retrieves the current value
of the C errno
integer variable.
If $!
is assigned a numerical value, that value is stored in errno
.
When referenced as a string, $!
yields the system error string
corresponding to errno
.
Many system or library calls set errno
if they fail,
to indicate the cause of failure. They usually do not
set errno
to zero if they succeed. This means errno
,
hence $!
, is meaningful only immediately after a failure:
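The example block is missing here; a sketch of the point (the path is illustrative, and the error is captured into a variable immediately, while it is still meaningful):

```perl
my $err;
if (open my $fh, "<", "/definitely/no/such/file") {
    # Here $! is meaningless: the open succeeded.
}
else {
    # ONLY here is $! meaningful; capture it right away.
    $err = $!;
}
# By this point a later library call may already have overwritten $!.
```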
Here, meaningless means that $!
may be unrelated to the outcome
of the open() operator. Assignment to $!
is similarly ephemeral.
It can be used immediately before invoking the die() operator,
to set the exit value, or to inspect the system error string
corresponding to error n, or to restore $!
to a meaningful state.
Mnemonic: What just went bang?
Each element of %!
has a true value only if $!
is set to that
value. For example, $!{ENOENT}
is true if and only if the current
value of $!
is ENOENT
; that is, if the most recent error was "No
such file or directory" (or its moral equivalent: not all operating
systems give that exact error, and certainly not all languages). To
check if a particular key is meaningful on your system, use exists
$!{the_key}
; for a list of legal keys, use keys %!
. See Errno
for more information, and also see $!.
This variable was added in Perl 5.005.
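A short sketch of testing %! for a specific error (the path is illustrative):

```perl
use Errno;       # ties %! so its keys reflect the current errno

unless (open my $fh, "<", "/definitely/no/such/file") {
    if ($!{ENOENT}) {
        print "no such file or directory\n";
    }
}
```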
The status returned by the last pipe close, backtick (``
) command,
successful call to wait() or waitpid(), or from the system()
operator. This is just the 16-bit status word returned by the
traditional Unix wait() system call (or else is made up to look
like it). Thus, the exit value of the subprocess is really ($?>>
8
), and $? & 127
gives which signal, if any, the process died
from, and $? & 128
reports whether there was a core dump.
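The bit arithmetic above can be sketched as follows (the child command is illustrative):

```perl
# Run a child perl that exits with status 5 (illustrative).
system($^X, "-e", "exit 5");

my $exit   = $? >> 8;      # the child's exit() value
my $signal = $? & 127;     # the signal that killed it, if any
my $core   = $? & 128;     # non-zero if the child dumped core
```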
Additionally, if the h_errno
variable is supported in C, its value
is returned via $?
if any gethost*()
function fails.
If you have installed a signal handler for SIGCHLD
, the
value of $?
will usually be wrong outside that handler.
Inside an END
subroutine $?
contains the value that is going to be
given to exit(). You can modify $?
in an END
subroutine to
change the exit status of your program. For example:
- END {
- $? = 1 if $? == 255; # die would make it 255
- }
Under VMS, the pragma use vmsish 'status'
makes $?
reflect the
actual VMS exit status, instead of the default emulation of POSIX
status; see $? in perlvms for details.
Mnemonic: similar to sh and ksh.
The Perl syntax error message from the
last eval() operator. If $@
is
the null string, the last eval() parsed and executed correctly
(although the operations you invoked may have failed in the normal
fashion).
Warning messages are not collected in this variable. You can, however,
set up a routine to process warnings by setting $SIG{__WARN__}
as
described in %SIG.
Mnemonic: Where was the syntax error "at"?
These variables provide information about the current interpreter state.
The current value of the flag associated with the -c switch.
Mainly of use with -MO=... to allow code to alter its behavior
when being compiled, such as for example to AUTOLOAD
at compile
time rather than normal, deferred loading. Setting
$^C = 1
is similar to calling B::minus_c
.
This variable was added in Perl v5.6.0.
The current value of the debugging flags. May be read or set. Like its
command-line equivalent, you can use numeric or symbolic values, eg
$^D = 10
or $^D = "st"
.
Mnemonic: value of -D switch.
The object reference to the Encode
object that is used to convert
the source code to Unicode. Thanks to this variable your Perl script
does not have to be written in UTF-8. Default is undef. The direct
manipulation of this variable is highly discouraged.
This variable was added in Perl 5.8.2.
The current phase of the perl interpreter.
Possible values are:
The PerlInterpreter*
is being constructed via perl_construct
. This
value is mostly there for completeness and for use via the
underlying C variable PL_phase
. It's not really possible for Perl
code to be executed unless construction of the interpreter is
finished.
This is the global compile-time. That includes, basically, every
BEGIN
block executed directly or indirectly from during the
compile-time of the top-level program.
This phase is not called "BEGIN" to avoid confusion with
BEGIN
-blocks, as those are executed during compile-time of any
compilation unit, not just the top-level program. A new, localised
compile-time entered at run-time, for example by constructs such as
eval "use SomeModule"
, is not a global interpreter phase, and
therefore isn't reflected by ${^GLOBAL_PHASE}
.
Execution of any CHECK
blocks.
Similar to "CHECK", but for INIT
-blocks, not CHECK
blocks.
The main run-time, i.e. the execution of PL_main_root
.
Execution of any END
blocks.
Global destruction.
Also note that there's no value for UNITCHECK-blocks. That's because those are run for each compilation unit individually, and therefore do not constitute a global interpreter phase.
Not every program has to go through each of the possible phases, but transition from one phase to another can only happen in the order described in the above list.
An example of all of the phases Perl code can see:
- BEGIN { print "compile-time: ${^GLOBAL_PHASE}\n" }
- INIT { print "init-time: ${^GLOBAL_PHASE}\n" }
- CHECK { print "check-time: ${^GLOBAL_PHASE}\n" }
- {
- package Print::Phase;
- sub new {
- my ($class, $time) = @_;
- return bless \$time, $class;
- }
- sub DESTROY {
- my $self = shift;
- print "$$self: ${^GLOBAL_PHASE}\n";
- }
- }
- print "run-time: ${^GLOBAL_PHASE}\n";
- my $runtime = Print::Phase->new(
- "lexical variables are garbage collected before END"
- );
- END { print "end-time: ${^GLOBAL_PHASE}\n" }
- our $destruct = Print::Phase->new(
- "package variables are garbage collected after END"
- );
This will print out
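The output block did not survive extraction; based on the phase in which each print statement runs (BEGIN at compile time, then CHECK, then INIT, then the main run, then END, then global destruction), the program above prints:

```
compile-time: START
check-time: CHECK
init-time: INIT
run-time: RUN
lexical variables are garbage collected before END: RUN
end-time: END
package variables are garbage collected after END: DESTRUCT
```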
This variable was added in Perl 5.14.0.
WARNING: This variable is strictly for internal use only. Its availability, behavior, and contents are subject to change without notice.
This variable contains compile-time hints for the Perl interpreter. At the end of compilation of a BLOCK the value of this variable is restored to the value when the interpreter started to compile the BLOCK.
When perl begins to parse any block construct that provides a lexical scope
(e.g., eval body, required file, subroutine body, loop body, or conditional
block), the existing value of $^H
is saved, but its value is left unchanged.
When the compilation of the block is completed, it regains the saved value.
Between the points where its value is saved and restored, code that
executes within BEGIN blocks is free to change the value of $^H
.
This behavior provides the semantic of lexical scoping, and is used in,
for instance, the use strict
pragma.
The contents should be an integer; different bits of it are used for different pragmatic flags. Here's an example:
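The example referenced here is missing; a reconstruction along these lines (add_100, foo, and the 0x100 bit are illustrative, and the hint value is captured into a variable purely so the effect can be observed at run time):

```perl
our $hints_during_begin;

sub add_100 {
    $^H |= 0x100;                    # set a hint bit at compile time
    $hints_during_begin = $^H;       # captured for demonstration only
}

sub foo {
    BEGIN { add_100() }              # runs while foo() is being compiled
    bar->baz($boon);                 # never called here; just being compiled
}
```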
Consider what happens during execution of the BEGIN block. At this point
the BEGIN block has already been compiled, but the body of foo()
is still
being compiled. The new value of $^H
will therefore be visible only while
the body of foo()
is being compiled.
Substituting the BEGIN { add_100() }
block with:
- BEGIN { require strict; strict->import('vars') }
demonstrates how use strict 'vars'
is implemented. Here's a conditional
version of the same lexical pragma:
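The conditional version is missing above; a sketch (the flag variable is illustrative, set in an earlier BEGIN so it is defined by the time the pragma check runs):

```perl
our $condition;
BEGIN { $condition = 1 }        # illustrative flag, known at compile time

BEGIN {
    require strict;
    strict->import('vars') if $condition;   # pragma applied only if set
}

our $x = 42;                    # "our" keeps strict 'vars' satisfied
```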
This variable was added in Perl 5.003.
The %^H
hash provides the same scoping semantic as $^H
. This makes
it useful for implementation of lexically scoped pragmas. See
perlpragma.
When putting items into %^H
, in order to avoid conflicting with other
users of the hash there is a convention regarding which keys to use.
A module should use only keys that begin with the module's name (the
name of its main package) and a "/" character. For example, a module
Foo::Bar
should use keys such as Foo::Bar/baz
.
This variable was added in Perl v5.6.0.
An internal variable used by PerlIO. A string in two parts, separated
by a \0
byte, the first part describes the input layers, the second
part describes the output layers.
This variable was added in Perl v5.8.0.
The internal variable for debugging support. The meanings of the various bits are subject to change, but currently indicate:
Debug subroutine enter/exit.
Line-by-line debugging. Causes DB::DB()
subroutine to be called for
each statement executed. Also causes saving source code lines (like
0x400).
Switch off optimizations.
Preserve more data for future interactive inspections.
Keep info about source lines on which a subroutine is defined.
Start with single-step on.
Use subroutine address instead of name when reporting.
Report goto &subroutine
as well.
Provide informative "file" names for evals based on the place they were compiled.
Provide informative names to anonymous subroutines based on the place they were compiled.
Save source code lines into @{"_<$filename"}
.
Some bits may be relevant at compile-time only, some at run-time only. This is a new mechanism and the details may change. See also perldebguts.
Reflects if taint mode is on or off. 1 for on (the program was run with -T), 0 for off, -1 when only taint warnings are enabled (i.e. with -t or -TU).
This variable is read-only.
This variable was added in Perl v5.8.0.
Reflects certain Unicode settings of Perl. See perlrun
documentation for the -C
switch for more information about
the possible values.
This variable is set during Perl startup and is thereafter read-only.
This variable was added in Perl v5.8.2.
This variable controls the state of the internal UTF-8 offset caching code. 1 for on (the default), 0 for off, -1 to debug the caching code by checking all its results against linear scans, and panicking on any discrepancy.
This variable was added in Perl v5.8.9. It is subject to change or removal without notice, but is currently used to avoid recalculating the boundaries of multi-byte UTF-8-encoded characters.
This variable indicates whether a UTF-8 locale was detected by perl at
startup. This information is used by perl when it's in
adjust-utf8ness-to-locale mode (as when run with the -CL
command-line
switch); see perlrun for more info on this.
This variable was added in Perl v5.8.8.
Deprecating a variable announces the intent of the perl maintainers to eventually remove the variable from the language. It may still be available despite its status. Using a deprecated variable triggers a warning.
Once a variable is removed, its use triggers an error telling you the variable is unsupported.
See perldiag for details about error messages.
$#
was a variable that could be used to format printed numbers.
After a deprecation cycle, its magic was removed in Perl v5.10.0 and
using it now triggers a warning: $# is no longer supported.
This is not the sigil you use in front of an array name to get the
last index, like $#array
. That's still how you get the last index
of an array in Perl. The two have nothing to do with each other.
Deprecated in Perl 5.
Removed in Perl v5.10.0.
$*
was a variable that you could use to enable multiline matching.
After a deprecation cycle, its magic was removed in Perl v5.10.0.
Using it now triggers a warning: $* is no longer supported.
You should use the /s and /m regexp modifiers instead.
Deprecated in Perl 5.
Removed in Perl v5.10.0.
This variable stores the index of the first element in an array, and of the first character in a substring. The default is 0, but you could theoretically set it to 1 to make Perl behave more like awk (or Fortran) when subscripting and when evaluating the index() and substr() functions.
As of release 5 of Perl, assignment to $[
is treated as a compiler
directive, and cannot influence the behavior of any other file.
(That's why you can only assign compile-time constants to it.)
Its use is highly discouraged.
Prior to Perl v5.10.0, assignment to $[
could be seen from outer lexical
scopes in the same file, unlike other compile-time directives (such as
strict). Using local() on it would bind its value strictly to a lexical
block. Now it is always lexically scoped.
As of Perl v5.16.0, it is implemented by the arybase module. See arybase for more details on its behaviour.
Under use v5.16
, or no feature "array_base"
, $[
no longer has any
effect, and always contains 0. Assigning 0 to it is permitted, but any
other value will produce an error.
Mnemonic: [ begins subscripts.
Deprecated in Perl v5.12.0.
See $^V for a more modern representation of the Perl version that allows accurate string comparisons.
The version + patchlevel / 1000 of the Perl interpreter. This variable can be used to determine whether the Perl interpreter executing a script is in the right range of versions:
- warn "No checksumming!\n" if $] < 3.019;
The floating point representation can sometimes lead to inaccurate numeric comparisons.
See also the documentation of use VERSION
and require VERSION
for a convenient way to fail if the running Perl interpreter is too old.
Mnemonic: Is this version of perl in the right bracket?
perlvms - VMS-specific documentation for Perl
Gathered below are notes describing details of Perl 5's behavior on VMS. They are a supplement to the regular Perl 5 documentation, so we have focussed on the ways in which Perl 5 functions differently under VMS than it does under Unix, and on the interactions between Perl and the rest of the operating system. We haven't tried to duplicate complete descriptions of Perl features from the main Perl documentation, which can be found in the [.pod] subdirectory of the Perl distribution.
We hope these notes will save you from confusion and lost sleep when writing Perl scripts on VMS. If you find we've missed something you think should appear here, please don't hesitate to drop a line to vmsperl@perl.org.
Directions for building and installing Perl 5 can be found in the file README.vms in the main source directory of the Perl distribution.
During the installation process, three Perl images are produced. Miniperl.Exe is an executable image which contains all of the basic functionality of Perl, but cannot take advantage of Perl extensions. It is used to generate several files needed to build the complete Perl and various extensions. Once you've finished installing Perl, you can delete this image.
Most of the complete Perl resides in the shareable image PerlShr.Exe, which provides a core to which the Perl executable image and all Perl extensions are linked. You should place this image in Sys$Share, or define the logical name PerlShr to translate to the full file specification of this image. It should be world readable. (Remember that if a user has execute only access to PerlShr, VMS will treat it as if it were a privileged shareable image, and will therefore require all downstream shareable images to be INSTALLed, etc.)
Finally, Perl.Exe is an executable image containing the main entry point for Perl, as well as some initialization code. It should be placed in a public directory, and made world executable. In order to run Perl with command line arguments, you should define a foreign command to invoke this image.
Perl extensions are packages which provide both XS and Perl code
to add new functionality to perl. (XS is a meta-language which
simplifies writing C code which interacts with Perl, see
perlxs for more details.) The Perl code for an
extension is treated like any other library module - it's
made available in your script through the appropriate
use or require statement, and usually defines a Perl
package containing the extension.
The portion of the extension provided by the XS code may be
connected to the rest of Perl in either of two ways. In the
static configuration, the object code for the extension is
linked directly into PerlShr.Exe, and is initialized whenever
Perl is invoked. In the dynamic configuration, the extension's
machine code is placed into a separate shareable image, which is
mapped by Perl's DynaLoader when the extension is used or
required in your script. This allows you to maintain the
extension as a separate entity, at the cost of keeping track of the
additional shareable image. Most extensions can be set up as either
static or dynamic.
The source code for an extension usually resides in its own
directory. At least three files are generally provided:
Extshortname.xs (where Extshortname is the portion of
the extension's name following the last ::
), containing
the XS code, Extshortname.pm, the Perl library module
for the extension, and Makefile.PL, a Perl script which uses
the MakeMaker
library modules supplied with Perl to generate
a Descrip.MMS file for the extension.
Since static extensions are incorporated directly into
PerlShr.Exe, you'll have to rebuild Perl to incorporate a
new extension. You should edit the main Descrip.MMS or Makefile
you use to build Perl, adding the extension's name to the ext
macro, and the extension's object file to the extobj
macro.
You'll also need to build the extension's object file, either
by adding dependencies to the main Descrip.MMS, or using a
separate Descrip.MMS for the extension. Then, rebuild
PerlShr.Exe to incorporate the new code.
Finally, you'll need to copy the extension's Perl library
module to the [.Extname] subdirectory under one
of the directories in @INC
, where Extname is the name
of the extension, with all ::
replaced by . (e.g.
the library module for extension Foo::Bar would be copied
to a [.Foo.Bar] subdirectory).
In general, the distributed kit for a Perl extension includes a file named Makefile.PL, which is a Perl program which is used to create a Descrip.MMS file which can be used to build and install the files required by the extension. The kit should be unpacked into a directory tree not under the main Perl source directory, and the procedure for building the extension is simply
- $ perl Makefile.PL ! Create Descrip.MMS
- $ mmk ! Build necessary files
- $ mmk test ! Run test code, if supplied
- $ mmk install ! Install into public Perl tree
N.B. The procedure by which extensions are built and tested creates several levels (at least 4) under the directory in which the extension's source files live. For this reason if you are running a version of VMS prior to V7.1 you shouldn't nest the source directory too deeply in your directory structure lest you exceed RMS' maximum of 8 levels of subdirectory in a filespec. (You can use rooted logical names to get another 8 levels of nesting, if you can't place the files near the top of the physical directory structure.)
VMS support for this process in the current release of Perl
is sufficient to handle most extensions. However, it does
not yet recognize extra libraries required to build shareable
images which are part of an extension, so these must be added
to the linker options file for the extension by hand. For
instance, if the PGPLOT extension to Perl requires the
PGPLOTSHR.EXE shareable image in order to properly link
the Perl extension, then the line PGPLOTSHR/Share
must
be added to the linker options file PGPLOT.Opt produced
during the build process for the Perl extension.
By default, the shareable image for an extension is placed in
the [.lib.site_perl.autoArch.Extname] directory of the
installed Perl directory tree (where Arch is VMS_VAX or
VMS_AXP, and Extname is the name of the extension, with
each ::
translated to .). (See the MakeMaker documentation
for more details on installation options for extensions.)
However, it can be manually placed in any of several locations:
the [.Lib.Auto.Arch$PVersExtname] subdirectory
of one of the directories in @INC
(where PVers
is the version of Perl you're using, as supplied in $]
,
with '.' converted to '_'), or
one of the directories in @INC
, or
a directory which the extension's Perl library module passes to the DynaLoader when asking it to map the shareable image, or
Sys$Share or Sys$Library.
If the shareable image isn't in any of these places, you'll need
to define a logical name Extshortname, where Extshortname
is the portion of the extension's name after the last ::
, which
translates to the full file specification of the shareable image.
We have tried to make Perl aware of both VMS-style and Unix-style file
specifications wherever possible. You may use either style, or both,
on the command line and in scripts, but you may not combine the two
styles within a single file specification. VMS Perl interprets Unix
pathnames in much the same way as the CRTL (e.g. the first component
of an absolute path is read as the device name for the VMS file
specification). There is a set of functions provided in the VMS::Filespec package for explicit interconversion between VMS and Unix syntax; its documentation provides more details.
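VMS::Filespec ships with Perl and its conversion routines are usable on any platform, so you can experiment with the two syntaxes even off VMS. A minimal sketch (the outputs shown in comments are the conventional spellings, not guaranteed byte-for-byte):

```perl
#!/usr/bin/perl
use strict;
use warnings;
use VMS::Filespec qw(vmsify unixify);

# Convert a Unix-style relative path to VMS syntax, and the reverse.
my $vms  = vmsify('t/base/cond.t');     # typically "[.t.base]cond.t"
my $unix = unixify('[.t.base]cond.t');  # typically "t/base/cond.t"
print "$vms\n$unix\n";
```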
We've tried to minimize the dependence of Perl library modules on Unix syntax, but you may find that some of these, as well as some scripts written for Unix systems, will require that you use Unix syntax, since they will assume that '/' is the directory separator, etc. If you find instances of this in the Perl distribution itself, please let us know, so we can try to work around them.
Also, when working with Perl programs on VMS, if you need a file specification in a particular operating system format, you need either to check the appropriate DECC$ feature logical or to call a conversion routine to force it to that format.
The feature logical name DECC$FILENAME_UNIX_REPORT modifies traditional
Perl behavior in the conversion of file specifications from Unix to VMS
format in order to follow the extended character handling rules now
expected by the CRTL. Specifically, when this feature is in effect, the
./.../ in a Unix path is now translated to [.^.^.^.] instead of the traditional VMS [...]. To be compatible with what MakeMaker expects, if a VMS path cannot be translated to a Unix path, it is passed through unchanged, so unixify("[...]") will return [...].
The handling of extended characters is largely complete in the VMS-specific C infrastructure of Perl, but more work is still needed to fully support extended syntax filenames in several core modules. In particular, at this writing PathTools has only partial support for directories containing some extended characters.
There are several ambiguous cases where a conversion routine cannot determine whether an input filename is in Unix format or in VMS format, since now both VMS and Unix file specifications may have characters in them that could be mistaken for syntax delimiters of the other type. So some pathnames simply cannot be used in a mode that allows either type of pathname to be present. Perl will tend to assume that an ambiguous filename is in Unix format.
Allowing "." as a version delimiter is simply incompatible with determining whether a pathname is in VMS format or in Unix format with extended file syntax. There is no way to know whether "perl-5.8.6" is a Unix "perl-5.8.6" or a VMS "perl-5.8;6" when passing it to unixify() or vmsify().
The DECC$FILENAME_UNIX_REPORT logical name controls how Perl interprets filenames to the extent that Perl uses the CRTL internally for many purposes, and attempts to follow CRTL conventions for reporting filenames. The DECC$FILENAME_UNIX_ONLY feature differs in that it expects all filenames passed to the C run-time to be already in Unix format. This feature is not yet supported in Perl since Perl uses traditional OpenVMS file specifications internally and in the test harness, and it is not yet clear whether this mode will be useful or useable. The feature logical name DECC$POSIX_COMPLIANT_PATHNAMES is new with the RMS Symbolic Link SDK and included with OpenVMS v8.3, but is not yet supported in Perl.
Perl follows VMS defaults and override settings in preserving (or not preserving) filename case. Case is not preserved on ODS-2 formatted volumes on any architecture. On ODS-5 volumes, filenames may be case preserved depending on process and feature settings. Perl now honors DECC$EFS_CASE_PRESERVE and DECC$ARGV_PARSE_STYLE on those systems where the CRTL supports these features. When these features are not enabled or the CRTL does not support them, Perl follows the traditional CRTL behavior of downcasing command-line arguments and returning file specifications in lower case only.
N. B. It is very easy to get tripped up using a mixture of other programs, external utilities, and Perl scripts that are in varying states of being able to handle case preservation. For example, a file created by an older version of an archive utility or a build utility such as MMK or MMS may generate a filename in all upper case even on an ODS-5 volume. If this filename is later retrieved by a Perl script or module in a case preserving environment, that upper case name may not match the mixed-case or lower-case exceptions of the Perl code. Your best bet is to follow an all-or-nothing approach to case preservation: either don't use it at all, or make sure your entire toolchain and application environment support and use it.
OpenVMS Alpha v7.3-1 and later and all versions of OpenVMS I64 support case sensitivity as a process setting (see SET PROCESS /CASE_LOOKUP=SENSITIVE). Perl does not currently support case sensitivity on VMS, but it may in the future, so Perl programs should use the File::Spec->case_tolerant method to determine the state, and not the $^O variable.
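The check is portable: File::Spec->case_tolerant works on any platform, so the same code runs unchanged on VMS and Unix. A minimal sketch:

```perl
use strict;
use warnings;
use File::Spec;

# Ask the filesystem-semantics layer, not $^O, whether filename
# case is significant on this platform.
my $tolerant = File::Spec->case_tolerant;
print $tolerant ? "filenames are case-tolerant\n"
                : "filenames are case-sensitive\n";
```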
When built on an ODS-5 volume with symbolic links enabled, Perl by
default supports symbolic links when the requisite support is available
in the filesystem and CRTL (generally 64-bit OpenVMS v8.3 and later).
There are a number of limitations and caveats to be aware of when
working with symbolic links on VMS. Most notably, the target of a valid
symbolic link must be expressed as a Unix-style path and it must exist
on a volume visible from your POSIX root (see the SHOW ROOT command
in DCL help). For further details on symbolic link capabilities and
requirements, see chapter 12 of the CRTL manual that ships with OpenVMS
v8.3 or later.
File specifications containing wildcards are allowed both on
the command line and within Perl globs (e.g. <*.c>). If
the wildcard filespec uses VMS syntax, the resultant
filespecs will follow VMS syntax; if a Unix-style filespec is
passed in, Unix-style filespecs will be returned.
Similar to the behavior of wildcard globbing for a Unix shell, one can escape command-line wildcards with double quotation marks " around a Perl program's command-line argument. However, owing to the stripping of " characters carried out by the C handling of argv, you will need to escape a construct such as this one (in a directory containing the files PERL.C, PERL.EXE, PERL.H, and PERL.OBJ):
- $ perl -e "print join(' ',@ARGV)" perl.*
- perl.c perl.exe perl.h perl.obj
in the following triple quoted manner:
- $ perl -e "print join(' ',@ARGV)" """perl.*"""
- perl.*
In the case of both unquoted command-line arguments and calls to glob(), VMS wildcard expansion is performed. (csh-style wildcard expansion is available if you use File::Glob::glob.)
If the wildcard filespec contains a device or directory
specification, then the resultant filespecs will also contain
a device and directory; otherwise, device and directory
information are removed. VMS-style resultant filespecs will
contain a full device and directory, while Unix-style
resultant filespecs will contain only as much of a directory
path as was present in the input filespec. For example, if
your default directory is Perl_Root:[000000], the expansion
of [.t]*.* will yield filespecs like
"perl_root:[t]base.dir", while the expansion of t/*/* will
yield filespecs like "t/base.dir". (This is done to match
the behavior of glob expansion performed by Unix shells.)
Similarly, the resultant filespec will contain the file version only if one was present in the input filespec.
Input and output pipes to Perl filehandles are supported; the "file name" is passed to lib$spawn() for asynchronous execution. You should be careful to close any pipes you have opened in a Perl script, lest you leave any "orphaned" subprocesses around when Perl exits.
You may also use backticks to invoke a DCL subprocess, whose
output is used as the return value of the expression. The
string between the backticks is handled as if it were the
argument to the system operator (see below). In this case,
Perl will wait for the subprocess to complete before continuing.
The mailbox (MBX) that perl can create to communicate with a pipe
defaults to a buffer size of 8192 on 64-bit systems, 512 on VAX. The
default buffer size is adjustable via the logical name PERL_MBX_SIZE
provided that the value falls between 128 and the SYSGEN parameter
MAXBUF inclusive. For example, to set the mailbox size to 32767 use
- $ENV{'PERL_MBX_SIZE'} = 32767;
and then open and use pipe constructs.
An alternative would be to issue the command:
- $ Define PERL_MBX_SIZE 32767
before running your wide record pipe program. A larger value may improve performance at the expense of the BYTLM UAF quota.
The PERL5LIB and PERLLIB logical names work as documented in perl, except that the element separator is '|' instead of ':'. The directory specifications may use either VMS or Unix syntax.
The Perl forked debugger places the debugger commands and output in a separate X-11 terminal window so that commands and output from multiple processes are not mixed together.
Perl on VMS supports an emulation of the forked debugger when Perl is run on a VMS system that has X11 support installed.
To use the forked debugger, you need to have the default display set to an X-11 Server and some environment variables set that Unix expects.
The forked debugger requires the environment variable TERM to be xterm, and the environment variable DISPLAY to exist. xterm must be in lower case.
- $define TERM "xterm"
- $define DISPLAY "hostname:0.0"
Currently the value of DISPLAY is ignored. It is recommended that it be set to the hostname of the display, plus the server and screen in Unix notation (e.g. hostname:0.0). In the future the value of DISPLAY may be honored by Perl instead of using the default display.
It may be helpful to always use the forked debugger so that script I/O is separated from debugger I/O. You can force the debugger to be forked by assigning a value to the logical name PERLDB_PIDS that is not a process identification number.
- $define PERLDB_PIDS XXXX
Defining PERL_VMS_EXCEPTION_DEBUG as "ENABLE" will cause the VMS debugger to be invoked if a fatal exception that is not otherwise handled is raised. The purpose of this is to allow debugging of internal Perl problems that would cause such a condition.
This allows the programmer to look at the execution stack and variables to find out the cause of the exception. As the debugger is being invoked as the Perl interpreter is about to do a fatal exit, continuing the execution in debug mode is usually not practical.
Note that starting Perl in the VMS debugger may change the program execution profile in such a way that these problems are not reproduced.
The kill function can be used to test this functionality from within
a program.
In typical VMS style, only the first letter of this logical name's value is actually checked, in a case-insensitive manner; the feature is considered enabled if the value is "T", "1", or "E".
This logical name must be defined before Perl is started.
Perl for VMS supports redirection of input and output on the command line, using a subset of Bourne shell syntax:
- <file reads stdin from file,
- >file writes stdout to file,
- >>file appends stdout to file,
- 2>file writes stderr to file,
- 2>>file appends stderr to file, and
- 2>&1 redirects stderr to stdout.
In addition, output may be piped to a subprocess, using the character '|'. Anything after this character on the command line is passed to a subprocess for execution; the subprocess takes the output of Perl as its input.
Finally, if the command line ends with '&', the entire command is run in the background as an asynchronous subprocess.
The following command line switches behave differently under VMS than described in perlrun. Note also that in order to pass uppercase switches to Perl, you need to enclose them in double-quotes on the command line, since the CRTL downcases all unquoted strings.
On newer 64-bit versions of OpenVMS, a process setting now controls whether quoting is needed to preserve the case of command-line arguments.
If the -i switch is present but no extension for a backup copy is given, then inplace editing creates a new version of a file; the existing copy is not deleted. (Note that if an extension is given, an existing file is renamed to the backup file, as is the case under other operating systems, so it does not remain as a previous version under the original filename.)
If the "-S" or -"S" switch is present and the script name does not contain a directory, then Perl translates the logical name DCL$PATH as a searchlist, using each translation as a directory in which to look for the script. In addition, if no file type is specified, Perl looks in each directory for a file matching the name specified, with a blank type, a type of .pl, and a type of .com, in that order.
The -u switch causes the VMS debugger to be invoked after the Perl program is compiled, but before it has run. It does not create a core dump file.
As of the time this document was last revised, the following Perl functions were implemented in the VMS port of Perl (functions marked with * are discussed in more detail below):
- file tests*, abs, alarm, atan, backticks*, binmode*, bless,
- caller, chdir, chmod, chown, chomp, chop, chr,
- close, closedir, cos, crypt*, defined, delete, die, do, dump*,
- each, endgrent, endpwent, eof, eval, exec*, exists, exit, exp,
- fileno, flock, getc, getgrent*, getgrgid*, getgrnam, getlogin, getppid,
- getpwent*, getpwnam*, getpwuid*, glob, gmtime*, goto,
- grep, hex, ioctl, import, index, int, join, keys, kill*,
- last, lc, lcfirst, lchown*, length, link*, local, localtime, log, lstat, m//,
- map, mkdir, my, next, no, oct, open, opendir, ord, pack,
- pipe, pop, pos, print, printf, push, q//, qq//, qw//,
- qx//*, quotemeta, rand, read, readdir, readlink*, redo, ref, rename,
- require, reset, return, reverse, rewinddir, rindex,
- rmdir, s///, scalar, seek, seekdir, select(internal),
- select (system call)*, setgrent, setpwent, shift, sin, sleep,
- socketpair, sort, splice, split, sprintf, sqrt, srand, stat,
- study, substr, symlink*, sysread, system*, syswrite, tell,
- telldir, tie, time, times*, tr///, uc, ucfirst, umask,
- undef, unlink*, unpack, untie, unshift, use, utime*,
- values, vec, wait, waitpid*, wantarray, warn, write, y///
The following functions were not implemented in the VMS port, and calling them produces a fatal error (usually) or undefined behavior (rarely, we hope):
The following functions are available on Perls compiled with Dec C 5.2 or greater and running VMS 7.0 or greater:
The following functions are available on Perls built on VMS 7.2 or greater:
- fcntl (without locking)
The following functions may or may not be implemented, depending on what type of socket support you've built into your copy of Perl:
- accept, bind, connect, getpeername,
- gethostbyname, getnetbyname, getprotobyname,
- getservbyname, gethostbyaddr, getnetbyaddr,
- getprotobynumber, getservbyport, gethostent,
- getnetent, getprotoent, getservent, sethostent,
- setnetent, setprotoent, setservent, endhostent,
- endnetent, endprotoent, endservent, getsockname,
- getsockopt, listen, recv, select(system call)*,
- send, setsockopt, shutdown, socket
The following function is available on Perls built on 64 bit OpenVMS v8.2 with hard links enabled on an ODS-5 formatted build disk. CRTL support is in principle available as of OpenVMS v7.3-1, and better configuration support could detect this.
The following functions are available on Perls built on 64 bit OpenVMS v8.2 and later. CRTL support is in principle available as of OpenVMS v7.3-2, and better configuration support could detect this.
The following functions are available on Perls built on 64 bit OpenVMS v8.2 and later.
- statvfs, socketpair
The tests -b, -B, -c, -C, -d, -e, -f, -o, -M, -s, -S, -t, -T, and -z work as advertised. The return values for -r, -w, and -x tell you whether you can actually access the file; this may not reflect the UIC-based file protections. Since real and effective UIC don't differ under VMS, -O, -R, -W, and -X are equivalent to -o, -r, -w, and -x. Similarly, several other tests, including -A, -g, -k, -l, -p, and -u, aren't particularly meaningful under VMS, and the values returned by these tests reflect whatever your CRTL stat() routine does to the equivalent bits in the st_mode field. Finally, -d returns true if passed a device specification without an explicit directory (e.g. DUA1:), as well as if passed a directory.
There are DECC feature logical names and ODS-5 volume attributes that also control what values are returned for the date fields.
Note: Some sites have reported problems when using the file-access tests (-r, -w, and -x) on files accessed via DEC's DFS. Specifically, since DFS does not currently provide access to the extended file header of files on remote volumes, attempts to examine the ACL fail, and the file tests will return false, with $! indicating that the file does not exist. You can use stat on these files, since that checks UIC-based protection only, and then manually check the appropriate bits, as defined by your C compiler's stat.h, in the mode value it returns, if you need an approximation of the file's protections.
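That manual bit check is ordinary portable Perl; the Fcntl module exports the S_I* mode constants so you don't have to hard-code values from stat.h. A sketch, assuming the owner-access bits are what you need:

```perl
use strict;
use warnings;
use Fcntl qw(:mode);   # S_IRUSR, S_IWUSR, S_IXUSR, ...

my $file = $0;         # any existing file will do for illustration
my @st   = stat($file) or die "stat $file: $!";
my $mode = $st[2];

# Inspect the owner read/write bits directly instead of trusting -r/-w.
printf "owner can read:  %s\n", ($mode & S_IRUSR) ? "yes" : "no";
printf "owner can write: %s\n", ($mode & S_IWUSR) ? "yes" : "no";
```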
Backticks create a subprocess, and pass the enclosed string
to it for execution as a DCL command. Since the subprocess is
created directly via lib$spawn(), any valid DCL command string may be specified.
The binmode operator will attempt to ensure that no translation
of carriage control occurs on input from or output to this filehandle.
Since this involves reopening the file and then restoring its
file position indicator, if this function returns FALSE, the
underlying filehandle may no longer point to an open file, or may
point to a different position in the file than before binmode
was called.
Note that binmode is generally not necessary when using normal
filehandles; it is provided so that you can control I/O to existing
record-structured files when necessary. You can also use the vmsfopen function in the VMS::Stdio extension to gain finer control of I/O to files and devices with different record structures.
The crypt operator uses the sys$hash_password system service to generate the hashed representation of PLAINTEXT.
If USER is a valid username, the algorithm and salt values
are taken from that user's UAF record. If it is not, then
the preferred algorithm and a salt of 0 are used. The
quadword encrypted value is returned as an 8-character string.
The value returned by crypt may be compared against
the encrypted password from the UAF returned by the getpw*
functions, in order to authenticate users. If you're
going to do this, remember that the encrypted password in
the UAF was generated using uppercase username and
password strings; you'll have to upcase the arguments to crypt, e.g. crypt(uc($passwd), uc($username)), to ensure that you'll get the proper value.
die will force the native VMS exit status to be an SS$_ABORT code if neither the $! nor the $? status value is one that would cause the native status to be interpreted as having what VMS classifies as SEVERE_ERROR severity for DCL error handling.
When PERL_VMS_POSIX_EXIT is active (see $? below), the native VMS exit status value will have one of the $! or $? or $^E values, or the Unix value 255, encoded into it in such a way that the effective original value can be decoded by other programs written in C, including Perl and the GNV package. As per the normal non-VMS behavior of die, if either $! or $? is non-zero, one of those values will be encoded into a native VMS status value. If both of the Unix status values are 0, and the $^E value is set to one of ERROR or SEVERE_ERROR severity, then the $^E value will be used as the exit code as is. If none of the above apply, the Unix value of 255 will be encoded into a native VMS exit status value.
Please note a significant difference in the behavior of die in PERL_VMS_POSIX_EXIT mode: it does not force a VMS SEVERE_ERROR status on exit. The Unix exit values of 2 through 255 will be encoded in VMS status values with severity levels of SUCCESS. The Unix exit value of 1 will be encoded in a VMS status value with a severity level of ERROR. This is for compatibility with how the VMS C library encodes these values.
The minimum severity level set by die in PERL_VMS_POSIX_EXIT mode may be changed to ERROR or higher in the future, depending on the results of testing and further review.
See $? for a description of the encoding of the Unix value to produce a native VMS status containing it.
Rather than causing Perl to abort and dump core, the dump
operator invokes the VMS debugger. If you continue to
execute the Perl program under the debugger, control will
be transferred to the label specified as the argument to
dump, or, if no label was specified, back to the beginning of the program. All other state of the program (e.g. values of variables, open file handles) is unaffected by calling dump.
A call to exec will cause Perl to exit and invoke the command given as an argument to exec via lib$do_command. If the argument begins with '@' or '$' (other than as part of a filespec), then it is executed as a DCL command. Otherwise, the first token on the command line is treated as the filespec of an image to run, and an attempt is made to invoke it (using .Exe and the process defaults to expand the filespec) and pass the rest of exec's arguments to it as parameters. If the token has no file type, and matches a file with null type, then an attempt is made to determine whether the file is an executable image which should be invoked using MCR or a text file which should be passed to DCL as a command procedure.
While in principle the fork operator could be implemented via
(and with the same rather severe limitations as) the CRTL vfork()
routine, and while some internal support to do just that is in
place, the implementation has never been completed, making fork
currently unavailable. A true kernel fork() is expected in a
future version of VMS, and the pseudo-fork based on interpreter
threads may be available in a future version of Perl on VMS (see
perlfork). In the meantime, use system, backticks, or piped
filehandles to create subprocesses.
These operators obtain the information described in perlfunc, if you have the privileges necessary to retrieve the named user's UAF information via sys$getuai. If not, then only the $name, $uid, and $gid items are returned. The $dir item contains the login directory in VMS syntax, while the $comment item contains the login directory in Unix syntax. The $gcos item contains the owner field from the UAF record. The $quota item is not used.
The gmtime operator will function properly if you have a
working CRTL gmtime() routine, or if the logical name
SYS$TIMEZONE_DIFFERENTIAL is defined as the number of seconds
which must be added to UTC to yield local time. (This logical
name is defined automatically if you are running a version of
VMS with built-in UTC support.) If neither of these cases is
true, a warning message is printed, and undef is returned.
In most cases, kill is implemented via the undocumented system service $SIGPRC, which has the same calling sequence as $FORCEX, but throws an exception in the target process rather than forcing it to call $EXIT. Generally speaking, kill follows the behavior of the CRTL's kill() function, but unlike that function it can be called from within a signal handler. Also, unlike the kill in some versions of the CRTL, Perl's kill checks the validity of the signal passed in and returns an error rather than attempting to send an unrecognized signal.
Also, negative signal values don't do anything special under VMS; they're just converted to the corresponding positive value.
See the entry on backticks above.
If Perl was not built with socket support, the system call
version of select is not available at all. If socket
support is present, then the system call version of
select functions only for file descriptors attached
to sockets. It will not provide information about regular
files or pipes, since the CRTL select() routine does not
provide this functionality.
Since VMS keeps track of files according to a different scheme than Unix, it's not really possible to represent the file's ID in the st_dev and st_ino fields of a struct stat. Perl
tries its best, though, and the values it uses are pretty unlikely
to be the same for two different files. We can't guarantee this,
though, so caveat scriptor.
The system operator creates a subprocess, and passes its arguments to the subprocess for execution as a DCL command. Since the subprocess is created directly via lib$spawn(), any valid DCL command string may be specified. If the string begins with '@', it is treated as a DCL command unconditionally. Otherwise, if the first token contains a character used as a delimiter in file specifications (e.g. : or ]), an attempt is made to expand it using a default type of .Exe and the process defaults, and if successful, the resulting file is invoked via MCR. This allows you to invoke an image directly simply by passing the file specification to system, a common Unixish idiom. If the token has no file type, and matches a file with null type, then an attempt is made to determine whether the file is an executable image which should be invoked using MCR or a text file which should be passed to DCL as a command procedure.
If LIST consists of the empty string, system spawns an
interactive DCL subprocess, in the same fashion as typing
SPAWN at the DCL prompt.
Perl waits for the subprocess to complete before continuing execution in the current process. As described in perlfunc, the return value of system is a fake "status" which follows POSIX semantics unless the pragma use vmsish 'status' is in effect; see the description of $? in this document for more detail.
The value returned by time is the offset in seconds from 01-JAN-1970 00:00:00 (just like the CRTL's time() routine), in order to make life easier for code coming in from the POSIX/Unix world.
The array returned by the times operator is divided up according to the same rules as the CRTL times() routine.
Therefore, the "system time" elements will always be 0, since
there is no difference between "user time" and "system" time
under VMS, and the time accumulated by a subprocess may or may
not appear separately in the "child time" field, depending on
whether times() keeps track of subprocesses separately. Note
especially that the VAXCRTL (at least) keeps track only of
subprocesses spawned using fork() and exec(); it will not
accumulate the times of subprocesses spawned via pipes, system(),
or backticks.
unlink will delete the highest version of a file only; in
order to delete all versions, you need to say
- 1 while unlink LIST;
You may need to make this change to scripts written for a
Unix system which expect that after a call to unlink,
no files with the names passed to unlink will exist.
(Note: This can be changed at compile time; if you use Config and $Config{'d_unlink_all_versions'} is define, then unlink will delete all versions of a file on the first call.)
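On any platform you can check that build-time setting from Config and fall back to the loop shown above when it is absent; a sketch:

```perl
use strict;
use warnings;
use Config;
use File::Temp qw(tempdir);

# Create a scratch file so there is something to delete.
my $dir  = tempdir(CLEANUP => 1);
my $file = "$dir/scratch.txt";
open my $fh, '>', $file or die "create $file: $!";
close $fh;

if ($Config{d_unlink_all_versions}) {
    unlink $file or die "unlink: $!";   # one call removes every version
} else {
    1 while unlink $file;               # loop until no versions remain
}
print -e $file ? "still there\n" : "gone\n";
```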
unlink will delete a file if at all possible, even if it
requires changing file protection (though it won't try to
change the protection of the parent directory). You can tell
whether you've got explicit delete access to a file by using the
VMS::Filespec::candelete
operator. For instance, in order
to delete only files to which you have delete access, you could
say something like
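A sketch of such a filter (the sub name is hypothetical, and VMS::Filespec::candelete only does real work on VMS itself, so treat this as illustrative):

```perl
use strict;
use warnings;

# Delete only those files we have explicit delete access to,
# returning the number of files actually removed.
sub safe_unlink {
    my $num = 0;
    foreach my $file (@_) {
        next unless VMS::Filespec::candelete($file);
        $num += unlink $file;
    }
    return $num;
}
```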
(or you could just use VMS::Stdio::remove, if you've installed the VMS::Stdio extension distributed with Perl). If unlink has to change the file protection to delete the file, and you interrupt it in midstream, the file may be left intact, but with a changed ACL allowing you delete access.
This behavior of unlink is for compatibility with POSIX behavior rather than traditional VMS behavior.
This operator changes only the modification time of the file (VMS revision date) on ODS-2 volumes and ODS-5 volumes without access dates enabled. On ODS-5 volumes with access dates enabled, the true access time is modified.
If PID is a subprocess started by a piped open() (see open), waitpid will wait for that subprocess, and return its final status value in $?. If PID is a subprocess created in some other way (e.g. SPAWNed before Perl was invoked), waitpid will simply check once per second whether the process has completed, and return when it has. (If PID specifies a process that isn't a subprocess of the current process, and you invoked Perl with the -w switch, a warning will be issued.)
Returns PID on success, -1 on error. The FLAGS argument is ignored in all cases.
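The status-reporting side of this is portable Perl: after a piped open, closing the handle reaps the child and leaves its wait status in $?, just as waitpid would. A sketch using $^X (the running perl) as the child:

```perl
use strict;
use warnings;

# Spawn a child that prints a line and exits with status 3,
# then read its exit code back out of $?.
my $pid = open(my $pipe, '-|', $^X, '-e', 'print "hello\n"; exit 3')
    or die "piped open failed: $!";
my $line = <$pipe>;
close $pipe;            # waits for the child and sets $?
my $exit = $? >> 8;     # high byte of $? holds the exit code
print "child said: $line";
print "child exited with $exit\n";
```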
The following VMS-specific information applies to the indicated "special" Perl variables, in addition to the general information in perlvar. Where there is a conflict, this information takes precedence.
The operation of the %ENV array depends on the translation of the logical name PERL_ENV_TABLES. If defined, it should be a search list, each element of which specifies a location for %ENV elements. If you tell Perl to read or set the element $ENV{name}, then Perl uses the translations of PERL_ENV_TABLES as follows:
This string tells Perl to consult the CRTL's internal environ array of key-value pairs, using name as the key. In most cases, this contains only a few keys, but if Perl was invoked via the C exec[lv]e() function, as is the case for CGI processing by some HTTP servers, then the environ array may have been populated by the calling program.
A string beginning with CLISYM_ tells Perl to consult the CLI's symbol tables, using name as the name of the symbol. When reading an element of %ENV, the local symbol table is scanned first, followed by the global symbol table. The characters following CLISYM_ are significant when an element of %ENV is set or deleted: if the complete string is CLISYM_LOCAL, the change is made in the local symbol table; otherwise the global symbol table is changed.
If an element of PERL_ENV_TABLES translates to any other string, that string is used as the name of a logical name table, which is consulted using name as the logical name. The normal search order of access modes is used.
PERL_ENV_TABLES is translated once when Perl starts up; any changes you make while Perl is running do not affect the behavior of %ENV. If PERL_ENV_TABLES is not defined, then Perl defaults to consulting first the logical name tables specified by LNM$FILE_DEV, and then the CRTL environ array.
In all operations on %ENV, the key string is treated as if it were entirely uppercase, regardless of the case actually specified in the Perl expression.
When an element of %ENV is read, the locations to which PERL_ENV_TABLES points are checked in order, and the value obtained from the first successful lookup is returned. If the name of the %ENV element contains a semi-colon, it and any characters after it are removed. These are ignored when the CRTL environ array or a CLI symbol table is consulted. However, when the name is looked up in a logical name table, the suffix after the semi-colon is treated as the translation index to be used for the lookup. This lets you look up successive values for search list logical names. For instance, if you say
- $ Define STORY once,upon,a,time,there,was
- $ perl -e "for ($i = 0; $i <= 6; $i++) " -
- _$ -e "{ print $ENV{'story;'.$i},' '}"
Perl will print ONCE UPON A TIME THERE WAS, assuming, of course, that PERL_ENV_TABLES is set up so that the logical name story is found, rather than a CLI symbol or CRTL environ element with the same name.
When an element of %ENV is set to a defined string, the corresponding definition is made in the location to which the first translation of PERL_ENV_TABLES points. If this causes a logical name to be created, it is defined in supervisor mode. (The same is done if an existing logical name was defined in executive or kernel mode; an existing user or supervisor mode logical name is reset to the new value.) If the value is an empty string, the logical name's translation is defined as a single NUL (ASCII 00) character, since a logical name cannot translate to a zero-length string. (This restriction does not apply to CLI symbols or CRTL environ values; they are set to the empty string.)
An element of the CRTL environ array can be set only if your copy of Perl knows about the CRTL's setenv() function. (This is present only in some versions of the DECCRTL; check $Config{d_setenv} to see whether your copy of Perl was built with a CRTL that has this function.)
When an element of %ENV is set to undef, the element is looked up as if it were being read, and if it is found, it is deleted. (An item "deleted" from the CRTL environ array is set to the empty string; this can only be done if your copy of Perl knows about the CRTL setenv() function.) Using delete to remove an element from %ENV has a similar effect, but after the element is deleted, another attempt is made to look up the element, so an inner-mode logical name or a name in another location will replace the logical name just deleted.
In either case, only the first value found searching PERL_ENV_TABLES
is altered. It is not possible at present to define a search list
logical name via %ENV.
The element $ENV{DEFAULT} is special: when read, it returns Perl's current default device and directory, and when set, it resets them, regardless of the definition of PERL_ENV_TABLES. It cannot be cleared or deleted; attempts to do so are silently ignored.
Note that if you want to pass on any elements of the C-local environ array to a subprocess which isn't started by fork/exec, or isn't running a C program, you can "promote" them to logical names in the current process, which will then be inherited by all subprocesses, by saying
- foreach my $key (qw[keys you want promoted]) {
- my $temp = $ENV{$key}; # read from C-local array
- $ENV{$key} = $temp; # and define as logical name
- }
(You can't just say $ENV{$key} = $ENV{$key}, since the Perl optimizer is smart enough to elide the expression.)
Don't try to clear %ENV by saying %ENV = (); it will throw a fatal error, since this is equivalent to doing the following from DCL:
- DELETE/LOGICAL *
You can imagine how bad things would be if, for example, the SYS$MANAGER or SYS$SYSTEM logical names were deleted.
At present, the first time you iterate over %ENV using
keys, or values, you will incur a time penalty as all
logical names are read, in order to fully populate %ENV.
Subsequent iterations will not reread logical names, so they
won't be as slow, but they also won't reflect any changes
to logical name tables caused by other programs.
You do need to be careful with the logical names representing process-permanent files, such as SYS$INPUT and SYS$OUTPUT. The translations for these logical names are prepended with a two-byte binary value (0x1B 0x00) that needs to be stripped off if you want to use it. (In previous versions of Perl it wasn't possible to get the values of these logical names, as the null byte acted as an end-of-string marker.)
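The prefix-stripping described above is a simple byte operation. A minimal sketch in Python (the helper name strip_ppf_prefix and the sample translation are illustrative, not part of any VMS API):

```python
def strip_ppf_prefix(translation: bytes) -> bytes:
    """Strip the two-byte 0x1B 0x00 prefix that VMS prepends to the
    translations of process-permanent file logical names such as
    SYS$INPUT and SYS$OUTPUT."""
    if translation.startswith(b"\x1b\x00"):
        return translation[2:]
    return translation

# A translation as it might be returned for a terminal device:
raw = b"\x1b\x00_TTA0:"
print(strip_ppf_prefix(raw).decode("ascii"))  # -> _TTA0:
```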
The string value of $! is that returned by the CRTL's strerror() function, so it will include the VMS message for VMS-specific errors. The numeric value of $! is the value of errno, except if errno is EVMSERR, in which case $! contains the value of vaxc$errno. Setting $! always sets errno to the value specified. If this value is EVMSERR, it also sets vaxc$errno to 4 (NONAME-F-NOMSG), so that the string value of $! won't reflect the VMS error message from before $! was set.
This variable provides direct access to VMS status values in vaxc$errno, which are often more specific than the generic Unix-style error messages in $!. Its numeric value is the value of vaxc$errno, and its string value is the corresponding VMS message string, as retrieved by sys$getmsg(). Setting $^E sets vaxc$errno to the value specified.
While Perl attempts to keep the vaxc$errno value current, if errno is not EVMSERR, it may not reflect the current operation.
The "status value" returned in $? is synthesized from the actual exit status of the subprocess in a way that approximates POSIX wait(5) semantics, in order to allow Perl programs to portably test for successful completion of subprocesses. The low order 8 bits of $? are always 0 under VMS, since the termination status of a process may or may not have been generated by an exception.
The next 8 bits contain the termination status of the program.
If the child process follows the convention of C programs compiled with the _POSIX_EXIT macro set, the status value will contain the actual value of 0 to 255 returned by that program on a normal exit.
With the _POSIX_EXIT macro set, the Unix exit value of zero is represented as a VMS native status of 1, and the Unix values from 2 to 255 are encoded by the equation:
- VMS_status = 0x35a000 + (unix_value * 8) + 1.
And in the special case of Unix value 1 the encoding is:
- VMS_status = 0x35a000 + 8 + 2 + 0x10000000.
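The two encoding equations above can be collected into one function. A sketch of the arithmetic in Python (this only mirrors the formulas stated above; it does not call into any VMS or CRTL API, and the function name is illustrative):

```python
def posix_to_vms_status(unix_value: int) -> int:
    """Encode a Unix exit value (0-255) as a VMS native status, per the
    _POSIX_EXIT convention described above."""
    if not 0 <= unix_value <= 255:
        raise ValueError("Unix exit values are 0..255")
    if unix_value == 0:
        return 1  # Unix success maps to the VMS native success status 1
    if unix_value == 1:
        # special case: exit(1) gets error severity (low bits = 2)
        return 0x35A000 + 8 + 2 + 0x10000000
    # general case for 2..255: success severity (low bits = 1)
    return 0x35A000 + (unix_value * 8) + 1

print(hex(posix_to_vms_status(2)))  # -> 0x35a011
```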
For other termination statuses, the severity portion of the subprocess's exit status is used. If the severity was success or informational, these bits are all 0; if the severity was warning, they contain a value of 1; if the severity was error or fatal error, they contain the actual severity bits: a value of 2 for error and 4 for severe_error ("fatal" is another term for the severe_error status).
As a result, $? will always be zero if the subprocess's exit status indicated successful completion, and non-zero if a warning or error occurred, or if a program that encodes _POSIX_EXIT values was run and set a status.
How can you tell the difference between a non-zero status that is a VMS native error status and one that is an encoded Unix status? You cannot, unless you look at the ${^CHILD_ERROR_NATIVE} value, which returns the actual VMS status value, and check its severity bits. If the severity bits are equal to 1 (success), and the numeric value of $? is 0 or between 2 and 255, then $? accurately reflects a value passed back from a Unix application. If $? is 1 and the severity bits indicate a VMS error (2), then $? is from a Unix application's exit value of 1.
In practice, Perl scripts that call programs that return _POSIX_EXIT type status values will be expecting those values, and programs that call traditional VMS programs will either be expecting the previous behavior or just checking for a non-zero status.
In all cases, success is reported as the value 0.
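The decision procedure just described can be sketched as a small classifier. Assumptions: the first argument plays the role of ${^CHILD_ERROR_NATIVE}, the second is the child exit value recovered from $?, and the low three bits of a VMS condition value are taken to be its severity; the function name is illustrative:

```python
def classify_child_status(native_status: int, exit_value: int) -> str:
    """Decide whether the child's exit value carries an encoded Unix
    status ("unix") or a VMS native status ("vms"), per the rules above."""
    severity = native_status & 0b111  # low 3 bits of a VMS condition value
    if severity == 1 and (exit_value == 0 or 2 <= exit_value <= 255):
        return "unix"  # success severity: an encoded Unix exit value
    if severity == 2 and exit_value == 1:
        return "unix"  # the special error-severity encoding of exit(1)
    return "vms"       # otherwise treat it as a native VMS status

# exit(2) is encoded as 0x35A011 and surfaces as exit value 2:
print(classify_child_status(0x35A011, 2))  # -> unix
```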
When the actual VMS termination status of the child is an error, internally the $! value will be set to the closest Unix errno value to that error, so that Perl scripts that test for error messages will see the expected Unix-style error message instead of a VMS message.
Conversely, when setting $? in an END block, an attempt is made to convert the POSIX value into a native status intelligible to the operating system upon exiting Perl. What this boils down to is that setting $? to zero results in the generic success value SS$_NORMAL, and setting $? to a non-zero value results in the generic failure status SS$_ABORT. See also exit in perlport.
With the PERL_VMS_POSIX_EXIT logical name defined as "ENABLE", setting $? will cause the new value to be encoded into $^E so that either the original parent or child exit status values 0 to 255 can be automatically recovered by C programs expecting _POSIX_EXIT behavior. If both a parent and a child exit value are non-zero, then it will be assumed that this is actually a VMS native status value to be passed through. The special value of 0xFFFF is almost a NOOP, as it will cause the current native VMS status in the C library to become the current native Perl VMS status; it is handled this way because it is known not to be a valid native VMS status value. It is recommended that only values in the range of normal Unix parent or child status values, 0 to 255, be used.
The pragma use vmsish 'status' makes $? reflect the actual VMS exit status instead of the default emulation of POSIX status described above. This pragma also disables the conversion of non-zero values to SS$_ABORT when setting $? in an END block (but zero will still be converted to SS$_NORMAL).
Do not use the pragma use vmsish 'status' with PERL_VMS_POSIX_EXIT enabled, as they at times request conflicting actions; the consequences of ignoring this advice are left undefined to allow for future improvements in the POSIX exit handling.
In general, with PERL_VMS_POSIX_EXIT enabled, more detailed information will be available in the exit status for DCL scripts or other native VMS tools, and POSIX programs will receive the expected information. It has not been made the default in order to preserve backward compatibility.
N.B. Setting DECC$FILENAME_UNIX_REPORT implicitly enables PERL_VMS_POSIX_EXIT.
Setting $| for an I/O stream causes data to be flushed all the way to disk on each write (i.e. not just to the underlying RMS buffers for a file). In other words, it's equivalent to calling fflush() and fsync() from C.
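The fflush()-plus-fsync() pairing mentioned above has a direct analogue in most languages. A Python sketch of the same two-step flush (the helper name flush_to_disk is illustrative):

```python
import os
import tempfile

def flush_to_disk(f):
    """Flush user-space buffers, then ask the OS to push the data to
    disk -- the two steps that setting $| combines under VMS."""
    f.flush()              # like fflush(): drain the stdio-level buffer
    os.fsync(f.fileno())   # like fsync(): force the data out to disk

path = os.path.join(tempfile.mkdtemp(), "demo.txt")
with open(path, "w") as f:
    f.write("record\n")
    flush_to_disk(f)       # data reaches disk before the file is closed
print(open(path).read().strip())  # -> record
```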
SDBM_File works properly on VMS. It has, however, one minor difference. The database directory file created has a .sdbm_dir extension rather than a .dir extension. .dir files are VMS filesystem directory files, and using them for other purposes could cause unacceptable problems.
Please see the git repository for revision history.
Charles Bailey <bailey@cor.newman.upenn.edu>, Craig Berry <craigberry@mac.com>, Dan Sugalski <dan@sidhe.org>, John Malmberg <wb8tyw@qsl.net>
perlvos - Perl for Stratus OpenVOS
This file contains notes for building perl on the Stratus OpenVOS operating system. Perl is a scripting or macro language that is popular on many systems. See perlbook for a number of good books on Perl.
These are instructions for building Perl from source. This version of Perl requires the dynamic linking support that is found in OpenVOS Release 17.1 and thus is not supported on OpenVOS Release 17.0 or earlier releases.
If you are running VOS Release 14.4.1 or later, you can obtain a pre-compiled, supported copy of perl by purchasing the GNU Tools product from Stratus Technologies.
To build perl from its source code on the Stratus V Series platform you must have OpenVOS Release 17.1.0 or later, GNU Tools Release 3.5 or later, and the C/POSIX Runtime Libraries.
Follow the normal instructions for building perl; e.g., enter bash, run the Configure script, then use "gmake" to build perl.
After you have built perl using the Configure script, ensure that you
have modify and default write permission to >system>ported and
all subdirectories. Then type
- gmake install
While there are currently no architecture-specific extensions or modules distributed with perl, the following directories can be used to hold such files (replace the string VERSION by the appropriate version number):
- >system>ported>lib>perl5>VERSION>i786
Site-specific perl extensions and modules can be installed in one of two places. Put architecture-independent files into:
- >system>ported>lib>perl5>site_perl>VERSION
Put site-specific architecture-dependent files into one of the following directories:
- >system>ported>lib>perl5>site_perl>VERSION>i786
You can examine the @INC variable from within a perl program to see the order in which Perl searches these directories.
This port of Perl version 5 prefers Unix-style, slash-separated pathnames over OpenVOS-style greater-than-separated pathnames. OpenVOS-style pathnames should work in most contexts, but if you have trouble, replace all greater-than characters by slash characters. Because the slash character is used as a pathname delimiter, Perl cannot process OpenVOS pathnames containing a slash character in a directory or file name; these must be renamed.
This port of Perl also uses Unix-epoch date values internally. As long as you are dealing with ASCII character string representations of dates, this should not be an issue. The supported epoch is January 1, 1980 to January 17, 2038.
See the file pod/perlport.pod for more information about the OpenVOS port of Perl.
A number of the perl self-tests fail for various reasons; generally these are minor and due to subtle differences between common POSIX-based environments and the OpenVOS POSIX environment. Ensure that you conduct sufficient testing of your code to guarantee that it works properly in the OpenVOS environment.
I'm offering this port "as is". You can ask me questions, but I can't guarantee I'll be able to answer them. There are some excellent books available on the Perl language; consult a book seller.
If you want a supported version of perl for OpenVOS, purchase the OpenVOS GNU Tools product from Stratus Technologies, along with a support contract (or from anyone else who will sell you support).
Paul Green (Paul.Green@stratus.com)
February 28, 2013
perlwin32 - Perl under Windows
These are instructions for building Perl under Windows 2000 and later.
Before you start, you should glance through the README file found in the top-level directory to which the Perl distribution was extracted. Make sure you read and understand the terms under which this software is being distributed.
Also make sure you read BUGS AND CAVEATS below for the known limitations of this port.
The INSTALL file in the perl top-level has much information that is only relevant to people building Perl on Unix-like systems. In particular, you can safely ignore any information that talks about "Configure".
You may also want to look at one other option for building a perl that will work on Windows: the README.cygwin file, which gives a different set of rules to build a perl for Windows. This method will probably enable you to build a more Unix-compatible perl, but you will also need to download and use various other build-time and run-time support software described in that file.
This set of instructions is meant to describe a so-called "native" port of Perl to the Windows platform. This includes both 32-bit and 64-bit Windows operating systems. The resulting Perl requires no additional software to run (other than what came with your operating system). Currently, this port is capable of using one of the following compilers on the Intel x86 architecture:
Note that the last two of these are actually competing projects, both delivering a complete gcc toolchain for MS Windows:
Delivers gcc toolchain targeting 32-bit Windows platform.
Delivers gcc toolchain targeting both 64-bit Windows and 32-bit Windows platforms (despite the project name "mingw-w64" they are not only 64-bit oriented). They deliver the native gcc compilers and cross-compilers that are also supported by perl's makefile.
The Microsoft Visual C++ compilers are also now being given away free. They are available as "Visual C++ Toolkit 2003" or "Visual C++ 2005/2008/2010/2012 Express Edition" (and also as part of the ".NET Framework SDK") and are the same compilers that ship with "Visual C++ .NET 2003 Professional" or "Visual C++ 2005/2008/2010/2012 Professional" respectively.
This port can also be built on IA64/AMD64 using:
- Microsoft Platform SDK Nov 2001 (64-bit compiler and tools)
- MinGW64 compiler (gcc version 4.4.3 or later)
The Windows SDK can be downloaded from http://www.microsoft.com/. The MinGW64 compiler is available at http://sourceforge.net/projects/mingw-w64. The latter is actually a cross-compiler targeting Win64. There's also a trimmed down compiler (no java, or gfortran) suitable for building perl available at: http://strawberryperl.com/package/kmx/64_gcctoolchain/
NOTE: If you're using a 32-bit compiler to build perl on a 64-bit Windows operating system, then you should set the WIN64 environment variable to "undef". Also, the trimmed down compiler only passes tests when USE_ITHREADS *= define (as opposed to undef) and when the CFG *= Debug line is commented out.
This port fully supports MakeMaker (the set of modules that is used to build extensions to perl). Therefore, you should be able to build and install most extensions found in the CPAN sites. See Usage Hints for Perl on Windows below for general hints about this.
You need a "make" program to build the sources. If you are using Visual C++ or the Windows SDK tools, nmake will work. Builds using gcc need dmake.
dmake is a freely available make that has very nice macro features and parallelability.
A port of dmake for Windows is available from:
http://search.cpan.org/dist/dmake/
Fetch and install dmake somewhere on your path.
Use the default "cmd" shell that comes with Windows. Some versions of the popular 4DOS/NT shell have incompatibilities that may cause you trouble. If the build fails under that shell, try building again with the cmd shell.
Make sure the path to the build directory does not contain spaces. The build usually works in this circumstance, but some tests will fail.
The nmake that comes with Visual C++ will suffice for building. You will need to run the VCVARS32.BAT file, usually found somewhere like C:\Program Files\Microsoft Visual Studio\VC98\Bin. This will set your build environment.
You can also use dmake to build using Visual C++; provided, however, you set OSRELEASE to "microsft" (or whatever the directory name under which the Visual C dmake configuration lives) in your environment and edit win32/config.vc to change "make=nmake" into "make=dmake". The latter step is only essential if you want to use dmake as your default make for building extensions using MakeMaker.
These free versions of Visual C++ 2008/2010/2012 Professional contain the same compilers and linkers that ship with the full versions, and also contain everything necessary to build Perl, rather than requiring a separate download of the Windows SDK like previous versions did.
These packages can be downloaded by searching in the Download Center at http://www.microsoft.com/downloads/search.aspx?displaylang=en. (Providing exact links to these packages has proven a pointless task because the links keep on changing so often.)
Install Visual C++ 2008/2010/2012 Express, then setup your environment using, e.g.
- C:\Program Files\Microsoft Visual Studio 11.0\Common7\Tools\vsvars32.bat
(assuming the default installation location was chosen).
Perl should now build using the win32/Makefile. You will need to edit that file to set CCTYPE to MSVC90FREE or MSVC100FREE first.
This free version of Visual C++ 2005 Professional contains the same compiler and linker that ship with the full version, but doesn't contain everything necessary to build Perl.
You will also need to download the "Windows SDK" (the "Core SDK" and "MDAC SDK" components are required) for more header files and libraries.
These packages can both be downloaded by searching in the Download Center at http://www.microsoft.com/downloads/search.aspx?displaylang=en. (Providing exact links to these packages has proven a pointless task because the links keep on changing so often.)
Try to obtain the latest version of the Windows SDK. Sometimes these packages contain a particular Windows OS version in their name, but actually work on other OS versions too. For example, the "Windows Server 2003 R2 Platform SDK" also runs on Windows XP SP2 and Windows 2000.
Install Visual C++ 2005 first, then the Platform SDK. Setup your environment as follows (assuming default installation locations were chosen):
- SET PlatformSDKDir=C:\Program Files\Microsoft Platform SDK
- SET PATH=%SystemRoot%\system32;%SystemRoot%;C:\Program Files\Microsoft Visual Studio 8\Common7\IDE;C:\Program Files\Microsoft Visual Studio 8\VC\BIN;C:\Program Files\Microsoft Visual Studio 8\Common7\Tools;C:\Program Files\Microsoft Visual Studio 8\SDK\v2.0\bin;C:\WINDOWS\Microsoft.NET\Framework\v2.0.50727;C:\Program Files\Microsoft Visual Studio 8\VC\VCPackages;%PlatformSDKDir%\Bin
- SET INCLUDE=C:\Program Files\Microsoft Visual Studio 8\VC\INCLUDE;%PlatformSDKDir%\include
- SET LIB=C:\Program Files\Microsoft Visual Studio 8\VC\LIB;C:\Program Files\Microsoft Visual Studio 8\SDK\v2.0\lib;%PlatformSDKDir%\lib
- SET LIBPATH=C:\WINDOWS\Microsoft.NET\Framework\v2.0.50727
(The PlatformSDKDir might need to be set differently depending on which version you are using. Earlier versions installed into "C:\Program Files\Microsoft SDK", while the latest versions install into version-specific locations such as "C:\Program Files\Microsoft Platform SDK for Windows Server 2003 R2".)
Perl should now build using the win32/Makefile. You will need to edit that file to set
- CCTYPE = MSVC80FREE
and to set CCHOME, CCINCDIR and CCLIBDIR as per the environment setup above.
This free toolkit contains the same compiler and linker that ship with Visual C++ .NET 2003 Professional, but doesn't contain everything necessary to build Perl.
You will also need to download the "Platform SDK" (the "Core SDK" and "MDAC SDK" components are required) for header files, libraries and rc.exe, and ".NET Framework SDK" for more libraries and nmake.exe. Note that the latter (which also includes the free compiler and linker) requires the ".NET Framework Redistributable" to be installed first. This can be downloaded and installed separately, but is included in the "Visual C++ Toolkit 2003" anyway.
These packages can all be downloaded by searching in the Download Center at http://www.microsoft.com/downloads/search.aspx?displaylang=en. (Providing exact links to these packages has proven a pointless task because the links keep on changing so often.)
Try to obtain the latest version of the Windows SDK. Sometimes these packages contain a particular Windows OS version in their name, but actually work on other OS versions too. For example, the "Windows Server 2003 R2 Platform SDK" also runs on Windows XP SP2 and Windows 2000.
Install the Toolkit first, then the Platform SDK, then the .NET Framework SDK. Setup your environment as follows (assuming default installation locations were chosen):
- SET PlatformSDKDir=C:\Program Files\Microsoft Platform SDK
- SET PATH=%SystemRoot%\system32;%SystemRoot%;C:\Program Files\Microsoft Visual C++ Toolkit 2003\bin;%PlatformSDKDir%\Bin;C:\Program Files\Microsoft.NET\SDK\v1.1\Bin
- SET INCLUDE=C:\Program Files\Microsoft Visual C++ Toolkit 2003\include;%PlatformSDKDir%\include;C:\Program Files\Microsoft Visual Studio .NET 2003\Vc7\include
- SET LIB=C:\Program Files\Microsoft Visual C++ Toolkit 2003\lib;%PlatformSDKDir%\lib;C:\Program Files\Microsoft Visual Studio .NET 2003\Vc7\lib
(The PlatformSDKDir might need to be set differently depending on which version you are using. Earlier versions installed into "C:\Program Files\Microsoft SDK", while the latest versions install into version-specific locations such as "C:\Program Files\Microsoft Platform SDK for Windows Server 2003 R2".)
Several required files will still be missing:
cvtres.exe is required by link.exe when using a .res file. It is actually installed by the .NET Framework SDK, but into a location such as the following:
- C:\WINDOWS\Microsoft.NET\Framework\v1.1.4322
Copy it from there to %PlatformSDKDir%\Bin
lib.exe is normally used to build libraries, but link.exe with the /lib option also works, so change win32/config.vc to use it instead:
Change the line reading:
- ar='lib'
to:
- ar='link /lib'
It may also be useful to create a batch file called lib.bat in C:\Program Files\Microsoft Visual C++ Toolkit 2003\bin containing:
- @echo off
- link /lib %*
for the benefit of any naughty C extension modules that you might want to build later which explicitly reference "lib" rather than taking their value from $Config{ar}.
setargv.obj is required to build perlglob.exe (and perl.exe if the USE_SETARGV option is enabled). The Platform SDK supplies this object file in source form in %PlatformSDKDir%\src\crt. Copy setargv.c, cruntime.h and internal.h from there to some temporary location and build setargv.obj using
- cl.exe /c /I. /D_CRTBLD setargv.c
Then copy setargv.obj to %PlatformSDKDir%\lib
Alternatively, if you don't need perlglob.exe and don't need to enable the USE_SETARGV option then you can safely just remove all mention of $(GLOBEXE) from win32/Makefile and setargv.obj won't be required anyway.
Perl should now build using the win32/Makefile. You will need to edit that file to set
- CCTYPE = MSVC70FREE
and to set CCHOME, CCINCDIR and CCLIBDIR as per the environment setup above.
The nmake that comes with the Platform SDK will suffice for building Perl. Make sure you are building within one of the "Build Environment" shells available after you install the Platform SDK from the Start Menu.
Perl can be compiled with gcc from MinGW release 3 and later (using gcc 3.2.x and later). It can be downloaded here:
You also need dmake. See Make above on how to get it.
Make sure you are in the "win32" subdirectory under the perl toplevel. This directory contains a "Makefile" that will work with versions of nmake that come with Visual C++ or the Windows SDK, and a dmake "makefile.mk" that will work for all supported compilers. The defaults in the dmake makefile are setup to build using MinGW/gcc.
Edit the makefile.mk (or Makefile, if you're using nmake) and change the values of INST_DRV and INST_TOP. You can also enable various build flags. These are explained in the makefiles.
Note that it is generally not a good idea to try to build a perl with INST_DRV and INST_TOP set to a path that already exists from a previous build. In particular, this may cause problems with the lib/ExtUtils/t/Embed.t test, which attempts to build a test program and may end up building against the installed perl's lib/CORE directory rather than the one being tested.
You will have to make sure that CCTYPE is set correctly and that CCHOME points to wherever you installed your compiler.
If building with the cross-compiler provided by mingw-w64.sourceforge.net you'll need to uncomment the line that sets GCCCROSS in the makefile.mk. Do this only if it's the cross-compiler - ie only if the bin folder doesn't contain a gcc.exe. (The cross-compiler does not provide a gcc.exe, g++.exe, ar.exe, etc. Instead, all of these executables are prefixed with 'x86_64-w64-mingw32-'.)
The default value for CCHOME in the makefiles for Visual C++ may not be correct for some versions. Make sure the default exists and is valid.
You may also need to comment out the DELAYLOAD = ... line in the Makefile if you're using VC++ 6.0 without the latest service pack and the linker reports an internal error.
If you want to build some core extensions statically into perl's dll, specify them in the STATIC_EXT macro.
Be sure to read the instructions near the top of the makefiles carefully.
Type "dmake" (or "nmake" if you are using that make).
This should build everything. Specifically, it will create perl.exe, perl518.dll at the perl toplevel, and various other extension dll's under the lib\auto directory. If the build fails for any reason, make sure you have done the previous steps correctly.
Type "dmake test" (or "nmake test"). This will run most of the tests from the testsuite (many tests will be skipped).
There should be no test failures.
Some test failures may occur if you use a command shell other than the native "cmd.exe", or if you are building from a path that contains spaces. So don't do that.
If you are running the tests from an emacs shell window, you may see failures in op/stat.t. Run "dmake test-notty" in that case.
If you run the tests on a FAT partition, you may see some failures for link() related tests (op/write.t, op/stat.t, ...). Testing on NTFS avoids these errors.
Furthermore, you should make sure that during "make test" you do not have any GNU tool packages in your path: some toolkits like Unixutils include tools (type, for instance) which override the Windows ones and make tests fail. Remove them from your path while testing to avoid these errors.
Please report any other failures as described under BUGS AND CAVEATS.
Type "dmake install" (or "nmake install"). This will put the newly built perl and the libraries under whatever INST_TOP points to in the Makefile. It will also install the pod documentation under $INST_TOP\$INST_VER\lib\pod and HTML versions of the same under $INST_TOP\$INST_VER\lib\pod\html.
To use the Perl you just installed you will need to add a new entry to your PATH environment variable: $INST_TOP\bin, e.g.
- set PATH=c:\perl\bin;%PATH%
If you opted to uncomment INST_VER and INST_ARCH in the makefile then the installation structure is a little more complicated and you will need to add two new PATH components instead: $INST_TOP\$INST_VER\bin and $INST_TOP\$INST_VER\bin\$ARCHNAME, e.g.
- set PATH=c:\perl\5.6.0\bin;c:\perl\5.6.0\bin\MSWin32-x86;%PATH%
The installation paths that you set during the build get compiled into perl, so you don't have to do anything additional to start using that perl (except add its location to your PATH variable).
If you put extensions in unusual places, you can set PERL5LIB to a list of paths separated by semicolons where you want perl to look for libraries. Look for descriptions of other environment variables you can set in perlrun.
You can also control the shell that perl uses to run system() and backtick commands via PERL5SHELL. See perlrun.
Perl does not depend on the registry, but it can look up certain default values if you choose to put them there. Perl attempts to read entries from HKEY_CURRENT_USER\Software\Perl and HKEY_LOCAL_MACHINE\Software\Perl. Entries in the former override entries in the latter. One or more of the following entries (of type REG_SZ or REG_EXPAND_SZ) may be set:
- lib-$] version-specific standard library path to add to @INC
- lib standard library path to add to @INC
- sitelib-$] version-specific site library path to add to @INC
- sitelib site library path to add to @INC
- vendorlib-$] version-specific vendor library path to add to @INC
- vendorlib vendor library path to add to @INC
- PERL* fallback for all %ENV lookups that begin with "PERL"
Note the $] in the above is not literal. Substitute whatever version of perl you want to honor that entry, e.g. 5.6.0. Paths must be separated with semicolons, as usual on Windows.
By default, perl handles file globbing using the File::Glob extension, which provides portable globbing.
If you want perl to use globbing that emulates the quirks of DOS filename conventions, you might want to consider using File::DosGlob to override the internal glob() implementation. See File::DosGlob for details.
If you are accustomed to using perl from various command-line shells found in UNIX environments, you will be less than pleased with what Windows offers by way of a command shell.
The crucial thing to understand about the Windows environment is that the command line you type in is processed twice before Perl sees it. First, your command shell (usually CMD.EXE) preprocesses the command line, to handle redirection, environment variable expansion, and location of the executable to run. Then, the perl executable splits the remaining command line into individual arguments, using the C runtime library upon which Perl was built.
It is particularly important to note that neither the shell nor the C runtime do any wildcard expansions of command-line arguments (so wildcards need not be quoted). Also, the quoting behaviours of the shell and the C runtime are rudimentary at best (and may, if you are using a non-standard shell, be inconsistent). The only (useful) quote character is the double quote ("). It can be used to protect spaces and other special characters in arguments.
The Windows documentation describes the shell parsing rules here: http://www.microsoft.com/resources/documentation/windows/xp/all/proddocs/en-us/cmd.mspx?mfr=true and the C runtime parsing rules here: http://msdn.microsoft.com/en-us/library/17w5ykft%28v=VS.100%29.aspx.
Here are some further observations based on experiments: The C runtime breaks arguments at spaces and passes them to programs in argc/argv. Double quotes can be used to prevent arguments with spaces in them from being split up. You can put a double quote in an argument by escaping it with a backslash and enclosing the whole argument within double quotes. The backslash and the pair of double quotes surrounding the argument will be stripped by the C runtime.
The file redirection characters "<", ">", and "|" can be quoted by double quotes (although there are suggestions that this may not always be true). Single quotes are not treated as quotes by the shell or the C runtime, they don't get stripped by the shell (just to make this type of quoting completely useless). The caret "^" has also been observed to behave as a quoting character, but this appears to be a shell feature, and the caret is not stripped from the command line, so Perl still sees it (and the C runtime phase does not treat the caret as a quote character).
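The argument-splitting behaviour observed above can be sketched as a small function. This is an illustration of the described rules only (split at spaces, double quotes protect spaces, a backslash escapes a quote), not a reimplementation of the actual C runtime parser; the function name and the handling of corner cases are assumptions:

```python
def split_like_crt(cmdline: str) -> list:
    """Break a command line into arguments the way the observations
    above describe. Simplified sketch: empty quoted arguments and
    other corner cases are not handled."""
    args, current, in_quotes, i = [], "", False, 0
    while i < len(cmdline):
        ch = cmdline[i]
        if ch == "\\" and i + 1 < len(cmdline) and cmdline[i + 1] == '"':
            current += '"'               # escaped quote becomes a literal "
            i += 2
        elif ch == '"':
            in_quotes = not in_quotes    # quotes are stripped, not kept
            i += 1
        elif ch == " " and not in_quotes:
            if current:                  # space outside quotes ends an arg
                args.append(current)
                current = ""
            i += 1
        else:
            current += ch
            i += 1
    if current:
        args.append(current)
    return args

print(split_like_crt('perl -e "print \'foo\'"'))
```

For the example command line this yields three arguments: perl, -e, and print 'foo' (with the double quotes stripped).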
Here are some examples of usage of the "cmd" shell:
This prints two doublequotes:
- perl -e "print '\"\"' "
This does the same:
- perl -e "print \"\\\"\\\"\" "
This prints "bar" and writes "foo" to the file "blurch":
- perl -e "print 'foo'; print STDERR 'bar'" > blurch
This prints "foo" ("bar" disappears into nowhereland):
- perl -e "print 'foo'; print STDERR 'bar'" 2> nul
This prints "bar" and writes "foo" into the file "blurch":
- perl -e "print 'foo'; print STDERR 'bar'" 1> blurch
This pipes "foo" to the "less" pager and prints "bar" on the console:
- perl -e "print 'foo'; print STDERR 'bar'" | less
This pipes "foo\nbar\n" to the less pager:
- perl -le "print 'foo'; print STDERR 'bar'" 2>&1 | less
This pipes "foo" to the pager and writes "bar" in the file "blurch":
- perl -e "print 'foo'; print STDERR 'bar'" 2> blurch | less
Discovering the usefulness of the "command.com" shell on Windows 9x is left as an exercise to the reader :)
One particularly pernicious problem with the 4NT command shell for Windows is that it (nearly) always treats a % character as indicating that environment variable expansion is needed. Under this shell, it is therefore important to always double any % characters which you want Perl to see (for example, for hash variables), even when they are quoted.
The Comprehensive Perl Archive Network (CPAN) offers a wealth of extensions, some of which require a C compiler to build. Look in http://www.cpan.org/ for more information on CPAN.
Note that not all of the extensions available from CPAN may work in the Windows environment; you should check the information at http://testers.cpan.org/ before investing too much effort into porting modules that don't readily build.
Most extensions (whether they require a C compiler or not) can be built, tested and installed with the standard mantra:
- perl Makefile.PL
- $MAKE
- $MAKE test
- $MAKE install
where $MAKE is whatever 'make' program you have configured perl to use. Use "perl -V:make" to find out what this is. Some extensions may not provide a test suite (so "$MAKE test" may not do anything, or may fail), but most serious ones do.
It is important that you use a supported 'make' program, and ensure Config.pm knows about it. If you don't have nmake, you can either get dmake from the location mentioned earlier or get an old version of nmake reportedly available from:
http://download.microsoft.com/download/vc15/Patch/1.52/W95/EN-US/nmake15.exe
Another option is to use the make written in Perl, available from CPAN.
http://www.cpan.org/modules/by-module/Make/
You may also use dmake. See "Make" above for how to get it.
Note that MakeMaker actually emits makefiles with different syntax depending on what 'make' it thinks you are using. Therefore, it is important that one of the following values appears in Config.pm:
- make='nmake' # MakeMaker emits nmake syntax
- make='dmake' # MakeMaker emits dmake syntax
- any other value # MakeMaker emits generic make syntax
- (e.g. GNU make, or Perl make)
If the value doesn't match the 'make' program you want to use, edit Config.pm to fix it.
If a module implements XSUBs, you will need one of the supported C compilers. You must make sure you have set up the environment for the compiler for command-line compilation.
If a module does not build for some reason, look carefully for why it failed, and report problems to the module author. If it looks like the extension building support is at fault, report that with full details of how the build failed using the perlbug utility.
The default command shells on DOS descendant operating systems (such as they are) usually do not expand wildcard arguments supplied to programs. They consider it the application's job to handle that. This is commonly achieved by linking the application (in our case, perl) with startup code that the C runtime libraries usually provide. However, doing that results in incompatible perl versions (since the behavior of the argv expansion code differs depending on the compiler, and it is even buggy on some compilers). Besides, it may be a source of frustration if you use such a perl binary with an alternate shell that *does* expand wildcards.
Instead, the following solution works rather well. The nice things about it are 1) you can start using it right away; 2) it is more powerful, because it will do the right thing with a pattern like */*/*.c; 3) you can decide whether you do/don't want to use it; and 4) you can extend the method to add any customizations (or even entirely different kinds of wildcard expansion).
- C:\> copy con c:\perl\lib\Wild.pm
- # Wild.pm - emulate shell @ARGV expansion on shells that don't
- use File::DosGlob;
- @ARGV = map {
- my @g = /[*?]/ ? File::DosGlob::glob($_) : ();
- @g ? @g : $_;
- } @ARGV;
- 1;
- ^Z
- C:\> set PERL5OPT=-MWild
- C:\> perl -le "for (@ARGV) { print }" */*/perl*.c
- p4view/perl/perl.c
- p4view/perl/perlio.c
- p4view/perl/perly.c
- perl5.005/win32/perlglob.c
- perl5.005/win32/perllib.c
Note there are two distinct steps there: 1) You'll have to create Wild.pm and put it in your perl lib directory. 2) You'll need to set the PERL5OPT environment variable. If you want argv expansion to be the default, just set PERL5OPT in your default startup environment.
If you are using the Visual C compiler, you can get the C runtime's command-line wildcard expansion built into the perl binary. The resulting binary will always expand unquoted command lines, which may not be what you want if you use a shell that does that for you. The expansion done is also somewhat less powerful than the approach suggested above.
Windows .NET Server supports the LLP64 data model on the Intel Itanium architecture.
The LLP64 data model is different from the LP64 data model that is the norm on 64-bit Unix platforms. In the former, int and long are both 32-bit data types, while pointers are 64 bits wide. In addition, there is a separate 64-bit wide integral type, __int64. In contrast, the LP64 data model that is pervasive on Unix platforms provides int as the 32-bit type, while both the long type and pointers are of 64-bit precision. Note that both models provide for 64 bits of addressability.
64-bit Windows running on Itanium is capable of running 32-bit x86 binaries transparently. This means that you could use a 32-bit build of Perl on a 64-bit system. Given this, why would one want to build a 64-bit build of Perl? Here are some reasons why you would bother:
A 64-bit native application will run much more efficiently on Itanium hardware.
There is no 2GB limit on process size.
Perl automatically provides large file support when built under 64-bit Windows.
Embedding Perl inside a 64-bit application.
Perl scripts on UNIX use the "#!" (a.k.a "shebang") line to indicate to the OS that it should execute the file using perl. Windows has no comparable means to indicate arbitrary files are executables.
Instead, all available methods to execute plain text files on Windows rely on the file "extension". There are three methods to use this to execute perl scripts:
There is a facility called "file extension associations". This can be manipulated via the two commands "assoc" and "ftype" that come standard with Windows. Type "ftype /?" for a complete example of how to set this up for perl scripts (Say what? You thought Windows wasn't perl-ready? :).
Since file associations don't work everywhere, and there are reportedly bugs with file associations where they do work, the old method of wrapping the perl script to make it look like a regular batch file to the OS may be used. The install process makes available the "pl2bat.bat" script which can be used to wrap perl scripts into batch files. For example:
- pl2bat foo.pl
will create the file "FOO.BAT". Note "pl2bat" strips any .pl suffix and adds a .bat suffix to the generated file.
If you use the 4DOS/NT or similar command shell, note that "pl2bat" uses the "%*" variable in the generated batch file to refer to all the command line arguments, so you may need to make sure that construct works in batch files. As of this writing, 4DOS/NT users will need a "ParameterChar = *" statement in their 4NT.INI file or will need to execute "setdos /p*" in the 4DOS/NT startup file to enable this to work.
Using "pl2bat" has a few problems: the file name gets changed, so scripts that rely on $0 to find what they must do may not run properly; running "pl2bat" replicates the contents of the original script, and so this process can be maintenance intensive if the originals get updated often. A different approach that avoids both problems is possible.
A script called "runperl.bat" is available that can be copied to any filename (along with the .bat suffix). For example, if you call it "foo.bat", it will run the file "foo" when it is executed. Since you can run batch files on Windows platforms simply by typing the name (without the extension), this effectively runs the file "foo", when you type either "foo" or "foo.bat". With this method, "foo.bat" can even be in a different location than the file "foo", as long as "foo" is available somewhere on the PATH. If your scripts are on a filesystem that allows symbolic links, you can even avoid copying "runperl.bat".
Here's a diversion: copy "runperl.bat" to "runperl", and type "runperl". Explain the observed behavior, or lack thereof. :) Hint: .gnidnats llits er'uoy fi ,"lrepnur" eteled :tniH
A full set of HTML documentation is installed, so you should be able to use it if you have a web browser installed on your system.
perldoc is also a useful tool for browsing information contained in the documentation, especially in conjunction with a pager like less (recent versions of which have Windows support). You may have to set the PAGER environment variable to use a specific pager. "perldoc -f foo" will print information about the perl operator "foo".
One common mistake when using this port with a GUI library like Tk is assuming that Perl's normal behavior of opening a command-line window will go away. This isn't the case. If you want to start a copy of perl without opening a command-line window, use the wperl executable built during the installation process. Usage is exactly the same as normal perl on Windows, except that options like -h don't work (since they need a command-line window to print to).
If you find bugs in perl, you can run perlbug to create a bug report (you may have to send it manually if perlbug cannot find a mailer on your system).
Norton AntiVirus interferes with the build process, particularly if set to "AutoProtect, All Files, when Opened". Unlike large applications, the perl build process opens and modifies a lot of files. Having the AntiVirus scan each and every one slows the build process significantly. Worse, with PERLIO=stdio the build process fails with peculiar messages as the virus checker interacts badly with miniperl.exe writing configure files (it seems either to catch a file part-written and treat it as suspicious, or the virus checker may have it "locked" in a way which inhibits miniperl updating it). The build does complete with
- set PERLIO=perlio
but that may be just luck. Other AntiVirus software may have similar issues.
Some of the built-in functions do not act exactly as documented in perlfunc, and a few are not implemented at all. To avoid surprises, particularly if you have had prior exposure to Perl in other operating environments or if you intend to write code that will be portable to other environments, see perlport for a reasonably definitive list of these differences.
Not all extensions available from CPAN may build or work properly in the Windows environment. See Building Extensions.
Most socket() related calls are supported, but they may not behave as on Unix platforms. See perlport for the full list.
Signal handling may not behave as on Unix platforms (where it doesn't exactly "behave", either :). For instance, calling die() or exit() from signal handlers will cause an exception, since most implementations of signal() on Windows are severely crippled. Thus, signals may work only for simple things like setting a flag variable in the handler. Using signals under this port should currently be considered unsupported.
Please send detailed descriptions of any problems and solutions that you may find to <perlbug@perl.org>, along with the output produced by perl -V.
The use of a camel with the topic of Perl is a trademark of O'Reilly and Associates, Inc. Used with permission.
This document is maintained by Jan Dubois.
This port was originally contributed by Gary Ng around 5.003_24, and borrowed from the Hip Communications port that was available at the time. Various people have made numerous and sundry hacks since then.
GCC/mingw32 support was added in 5.005 (Nick Ing-Simmons).
Support for PERL_OBJECT was added in 5.005 (ActiveState Tool Corp).
Support for fork() emulation was added in 5.6 (ActiveState Tool Corp).
Win9x support was added in 5.6 (Benjamin Stuhl).
Support for 64-bit Windows added in 5.8 (ActiveState Corp).
Last updated: 02 January 2012
perlxs - XS language reference manual
XS is an interface description file format used to create an extension interface between Perl and C code (or a C library) which one wishes to use with Perl. The XS interface is combined with the library to create a new library which can then be either dynamically loaded or statically linked into perl. The XS interface description is written in the XS language and is the core component of the Perl extension interface.
An XSUB forms the basic unit of the XS interface. After compilation by the xsubpp compiler, each XSUB amounts to a C function definition which will provide the glue between Perl calling conventions and C calling conventions.
The glue code pulls the arguments from the Perl stack, converts these Perl values to the formats expected by a C function, calls this C function, and transfers the return values of the C function back to Perl. Return values here may be a conventional C return value or any C function arguments that may serve as output parameters. These return values may be passed back to Perl either by putting them on the Perl stack, or by modifying the arguments supplied from the Perl side.
The above is a somewhat simplified view of what really happens. Since Perl allows more flexible calling conventions than C, XSUBs may do much more in practice, such as checking input parameters for validity, throwing exceptions (or returning undef/empty list) if the return value from the C function indicates failure, calling different C functions based on numbers and types of the arguments, providing an object-oriented interface, etc.
Of course, one could write such glue code directly in C. However, this would be a tedious task, especially if one needs to write glue for multiple C functions, and/or one is not familiar enough with the Perl stack discipline and other such arcana. XS comes to the rescue here: instead of writing this glue C code in long-hand, one can write a more concise short-hand description of what should be done by the glue, and let the XS compiler xsubpp handle the rest.
The XS language allows one to describe the mapping between how the C routine is used and how the corresponding Perl routine is used. It also allows creation of Perl routines which are directly translated to C code and which are not related to a pre-existing C function. In cases when the C interface coincides with the Perl interface, the XSUB declaration is almost identical to a declaration of a C function (in K&R style). In such circumstances, there is another tool called h2xs that is able to translate an entire C header file into a corresponding XS file that will provide glue to the functions/macros described in the header file.
The XS compiler is called xsubpp. This compiler creates the constructs necessary to let an XSUB manipulate Perl values, and creates the glue necessary to let Perl call the XSUB. The compiler uses typemaps to determine how to map C function parameters and output values to Perl values and back. The default typemap (which comes with Perl) handles many common C types. A supplementary typemap may also be needed to handle any special structures and types for the library being linked. For more information on typemaps, see perlxstypemap.
A file in XS format starts with a C language section which goes until the first MODULE = directive. Other XS directives and XSUB definitions may follow this line. The "language" used in this part of the file is usually referred to as the XS language. xsubpp recognizes and skips POD (see perlpod) in both the C and XS language sections, which allows the XS file to contain embedded documentation.
See perlxstut for a tutorial on the whole extension creation process.
Note: For some extensions, Dave Beazley's SWIG system may provide a significantly more convenient mechanism for creating the extension glue code. See http://www.swig.org/ for more information.
Many of the examples which follow will concentrate on creating an interface between Perl and the ONC+ RPC bind library functions. The rpcb_gettime() function is used to demonstrate many features of the XS language. This function has two parameters; the first is an input parameter and the second is an output parameter. The function also returns a status value.
- bool_t rpcb_gettime(const char *host, time_t *timep);
From C this function will be called with the following statements.
- #include <rpc/rpc.h>
- bool_t status;
- time_t timep;
- status = rpcb_gettime( "localhost", &timep );
If an XSUB is created to offer a direct translation between this function and Perl, then this XSUB will be used from Perl with the following code. The $status and $timep variables will contain the output of the function.
- use RPC;
- $status = rpcb_gettime( "localhost", $timep );
The following XS file shows an XS subroutine, or XSUB, which demonstrates one possible interface to the rpcb_gettime() function. This XSUB represents a direct translation between C and Perl and so preserves the interface even from Perl. This XSUB will be invoked from Perl with the usage shown above. Note that the first three #include statements, for EXTERN.h, perl.h, and XSUB.h, will always be present at the beginning of an XS file. This approach and others will be expanded later in this document.
- #include "EXTERN.h"
- #include "perl.h"
- #include "XSUB.h"
- #include <rpc/rpc.h>
- MODULE = RPC PACKAGE = RPC
- bool_t
- rpcb_gettime(host,timep)
- char *host
- time_t &timep
- OUTPUT:
- timep
Any extension to Perl, including those containing XSUBs, should have a Perl module to serve as the bootstrap which pulls the extension into Perl. This module will export the extension's functions and variables to the Perl program and will cause the extension's XSUBs to be linked into Perl. The following module will be used for most of the examples in this document and should be used from Perl with the use command as shown earlier. Perl modules are explained in more detail later in this document.
Throughout this document a variety of interfaces to the rpcb_gettime() XSUB will be explored. The XSUBs will take their parameters in different orders or will take different numbers of parameters. In each case the XSUB is an abstraction between Perl and the real C rpcb_gettime() function, and the XSUB must always ensure that the real rpcb_gettime() function is called with the correct parameters. This abstraction will allow the programmer to create a more Perl-like interface to the C function.
The simplest XSUBs consist of 3 parts: a description of the return value, the name of the XSUB routine and the names of its arguments, and a description of types or formats of the arguments.
The following XSUB allows a Perl program to access a C library function called sin(). The XSUB will imitate the C function which takes a single argument and returns a single value.
- double
- sin(x)
- double x
Optionally, one can merge the description of types and the list of argument names, rewriting this as
- double
- sin(double x)
This makes this XSUB look similar to an ANSI C declaration. An optional semicolon is allowed after the argument list, as in
- double
- sin(double x);
Parameters with C pointer types can have different semantics: C functions with similar declarations
- bool string_looks_as_a_number(char *s);
- bool make_char_uppercase(char *c);
are used in absolutely incompatible ways. Parameters to these functions could be described to xsubpp like this:
- char * s
- char &c
Both these XS declarations correspond to the char* C type, but they have different semantics; see The & Unary Operator. It is convenient to think of the indirection operator * as part of the type, and the address operator & as part of the variable. See perlxstypemap for more info about handling qualifiers and unary operators in C types.
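To make the two calling conventions concrete, here are hypothetical C implementations of the two declarations above (the function bodies are illustrative assumptions, not from any real library): the first only reads through its pointer, which addresses a whole NUL-terminated string, while the second writes through its pointer, which addresses a single char used as an in-out parameter.

```c
#include <ctype.h>
#include <stdbool.h>
#include <stdlib.h>

/* Reads through the pointer: treats it as a NUL-terminated string. */
bool string_looks_as_a_number(char *s)
{
    char *end;
    (void)strtod(s, &end);           /* try to parse the whole string */
    return end != s && *end == '\0'; /* true only if all of it parsed */
}

/* Writes through the pointer: a single char is an in-out parameter. */
bool make_char_uppercase(char *c)
{
    if (c == NULL)
        return false;
    *c = (char)toupper((unsigned char)*c);
    return true;
}
```

On the XS side, the first parameter would be declared as char * s and the second as char &c, matching the descriptions shown above.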
The function name and the return type must be placed on separate lines and should be flush left-adjusted.
- INCORRECT CORRECT
- double sin(x) double
- double x sin(x)
- double x
The rest of the function description may be indented or left-adjusted. The following example shows a function with its body left-adjusted. Most examples in this document will indent the body for better readability.
- CORRECT
- double
- sin(x)
- double x
More complicated XSUBs may contain many other sections. Each section of an XSUB starts with the corresponding keyword, such as INIT: or CLEANUP:. However, the first two lines of an XSUB always contain the same data: descriptions of the return type and the names of the function and its parameters. Whatever immediately follows these is considered to be an INPUT: section unless explicitly marked with another keyword. (See The INPUT: Keyword.)
An XSUB section continues until another section-start keyword is found.
The Perl argument stack is used to store the values which are sent as parameters to the XSUB and to store the XSUB's return value(s). In reality all Perl functions (including non-XSUB ones) keep their values on this stack at the same time, each limited to its own range of positions on the stack. In this document the first position on that stack which belongs to the active function will be referred to as position 0 for that function.
XSUBs refer to their stack arguments with the macro ST(x), where x refers to a position in this XSUB's part of the stack. Position 0 for that function would be known to the XSUB as ST(0). The XSUB's incoming parameters and outgoing return values always begin at ST(0). For many simple cases the xsubpp compiler will generate the code necessary to handle the argument stack by embedding code fragments found in the typemaps. In more complex cases the programmer must supply the code.
The RETVAL variable is a special C variable that is declared automatically for you. The C type of RETVAL matches the return type of the C library function. The xsubpp compiler will declare this variable in each XSUB with non-void return type. By default the generated C function will use RETVAL to hold the return value of the C library function being called. In simple cases the value of RETVAL will be placed in ST(0) of the argument stack where it can be received by Perl as the return value of the XSUB.
If the XSUB has a return type of void then the compiler will not declare a RETVAL variable for that function. When using a PPCODE: section no manipulation of the RETVAL variable is required; the section may use direct stack manipulation to place output values on the stack. If the PPCODE: directive is not used, a void return value should be used only for subroutines which do not return a value, even if the CODE: directive is used and sets ST(0) explicitly.
Older versions of this document recommended using a void return value in such cases. It was discovered that this could lead to segfaults in cases when the XSUB was truly void. This practice is now deprecated, and may not be supported in some future version. Use the return value SV * in such cases. (Currently xsubpp contains some heuristic code which tries to disambiguate between "truly-void" and "old-practice-declared-as-void" functions. Hence your code is at the mercy of this heuristic unless you use SV * as the return value.)
When you're using RETVAL to return an SV *, there's some magic going on behind the scenes that should be mentioned. When you're manipulating the argument stack using the ST(x) macro, for example, you usually have to pay special attention to reference counts. (For more about reference counts, see perlguts.) To make your life easier, the typemap file automatically makes RETVAL mortal when you're returning an SV *. Thus, the following two XSUBs are more or less equivalent:
- void
- alpha()
- PPCODE:
- ST(0) = newSVpv("Hello World",0);
- sv_2mortal(ST(0));
- XSRETURN(1);
- SV *
- beta()
- CODE:
- RETVAL = newSVpv("Hello World",0);
- OUTPUT:
- RETVAL
This is quite useful as it usually improves readability. While this works fine for an SV *, it's unfortunately not as easy to have AV * or HV * as a return value. You should be able to write:
- AV *
- array()
- CODE:
- RETVAL = newAV();
- /* do something with RETVAL */
- OUTPUT:
- RETVAL
But due to an unfixable bug (fixing it would break lots of existing CPAN modules) in the typemap file, the reference count of the AV * is not properly decremented. Thus, the above XSUB would leak memory whenever it is being called. The same problem exists for HV *, CV *, and SVREF (which indicates a scalar reference, not a general SV *).
In XS code on perls starting with perl 5.16, you can override the typemaps for any of these types with a version that has proper handling of refcounts. In your TYPEMAP section, do
- AV* T_AVREF_REFCOUNT_FIXED
to get the repaired variant. For backward compatibility with older versions of perl, you can instead decrement the reference count manually when you're returning one of the aforementioned types using sv_2mortal:
- AV *
- array()
- CODE:
- RETVAL = newAV();
- sv_2mortal((SV*)RETVAL);
- /* do something with RETVAL */
- OUTPUT:
- RETVAL
Remember that you don't have to do this for an SV *. The reference documentation for all core typemaps can be found in perlxstypemap.
The MODULE keyword is used to start the XS code and to specify the package of the functions which are being defined. All text preceding the first MODULE keyword is considered C code and is passed through to the output with POD stripped, but otherwise untouched. Every XS module will have a bootstrap function which is used to hook the XSUBs into Perl. The package name of this bootstrap function will match the value of the last MODULE statement in the XS source files. The value of MODULE should always remain constant within the same XS file, though this is not required.
The following example will start the XS code and will place all functions in a package named RPC.
- MODULE = RPC
When functions within an XS source file must be separated into packages the PACKAGE keyword should be used. This keyword is used with the MODULE keyword and must follow immediately after it when used.
- MODULE = RPC PACKAGE = RPC
- [ XS code in package RPC ]
- MODULE = RPC PACKAGE = RPCB
- [ XS code in package RPCB ]
- MODULE = RPC PACKAGE = RPC
- [ XS code in package RPC ]
The same package name can be used more than once, allowing for non-contiguous code. This is useful if you have a stronger ordering principle than package names.
Although this keyword is optional and in some cases provides redundant information it should always be used. This keyword will ensure that the XSUBs appear in the desired package.
The PREFIX keyword designates prefixes which should be removed from the Perl function names. If the C function is rpcb_gettime() and the PREFIX value is rpcb_ then Perl will see this function as gettime().
This keyword should follow the PACKAGE keyword when used. If PACKAGE is not used then PREFIX should follow the MODULE keyword.
- MODULE = RPC PREFIX = rpc_
- MODULE = RPC PACKAGE = RPCB PREFIX = rpcb_
The OUTPUT: keyword indicates that certain function parameters should be updated (new values made visible to Perl) when the XSUB terminates or that certain values should be returned to the calling Perl function. For simple functions which have no CODE: or PPCODE: section, such as the sin() function above, the RETVAL variable is automatically designated as an output value. For more complex functions the xsubpp compiler will need help to determine which variables are output variables.
This keyword will normally be used to complement the CODE: keyword. The RETVAL variable is not recognized as an output variable when the CODE: keyword is present. The OUTPUT: keyword is used in this situation to tell the compiler that RETVAL really is an output variable.
The OUTPUT: keyword can also be used to indicate that function parameters are output variables. This may be necessary when a parameter has been modified within the function and the programmer would like the update to be seen by Perl.
- bool_t
- rpcb_gettime(host,timep)
- char *host
- time_t &timep
- OUTPUT:
- timep
The OUTPUT: keyword will also allow an output parameter to be mapped to a matching piece of code rather than to a typemap.
- bool_t
- rpcb_gettime(host,timep)
- char *host
- time_t &timep
- OUTPUT:
- timep sv_setnv(ST(1), (double)timep);
xsubpp emits an automatic SvSETMAGIC() for all parameters in the OUTPUT section of the XSUB, except RETVAL. This is usually the desired behavior, as it takes care of properly invoking 'set' magic on output parameters (needed for hash or array element parameters that must be created if they didn't exist). If for some reason this behavior is not desired, the OUTPUT section may contain a SETMAGIC: DISABLE line to disable it for the remainder of the parameters in the OUTPUT section. Likewise, SETMAGIC: ENABLE can be used to reenable it for the remainder of the OUTPUT section. See perlguts for more details about 'set' magic.
The NO_OUTPUT keyword can be placed as the first token of the XSUB. This keyword indicates that while the C subroutine we provide an interface to has a non-void return type, the return value of this C subroutine should not be returned from the generated Perl subroutine.
With this keyword present, the RETVAL variable is created, and in the generated call to the subroutine this variable is assigned to, but the value of this variable is not going to be used in the auto-generated code.
This keyword makes sense only if RETVAL is going to be accessed by the user-supplied code. It is especially useful to make a function interface more Perl-like, especially when the C return value is just an error condition indicator. For example,
- NO_OUTPUT int
- delete_file(char *name)
- POSTCALL:
- if (RETVAL != 0)
- croak("Error %d while deleting file '%s'", RETVAL, name);
Here the generated XS function returns nothing on success, and will die() with a meaningful error message on error.
This keyword is used in more complicated XSUBs which require special handling for the C function. The RETVAL variable is still declared, but it will not be returned unless it is specified in the OUTPUT: section.
The following XSUB is for a C function which requires special handling of its parameters. The Perl usage is given first.
- $status = rpcb_gettime( "localhost", $timep );
The XSUB follows.
- bool_t
- rpcb_gettime(host,timep)
- char *host
- time_t timep
- CODE:
- RETVAL = rpcb_gettime( host, &timep );
- OUTPUT:
- timep
- RETVAL
The INIT: keyword allows initialization to be inserted into the XSUB before the compiler generates the call to the C function. Unlike the CODE: keyword above, this keyword does not affect the way the compiler handles RETVAL.
- bool_t
- rpcb_gettime(host,timep)
- char *host
- time_t &timep
- INIT:
- printf("# Host is %s\n", host );
- OUTPUT:
- timep
Another use for the INIT: section is to check for preconditions before making a call to the C function:
- long long
- lldiv(a,b)
- long long a
- long long b
- INIT:
- if (a == 0 && b == 0)
- XSRETURN_UNDEF;
- if (b == 0)
- croak("lldiv: cannot divide by 0");
The NO_INIT keyword is used to indicate that a function parameter is being used only as an output value. The xsubpp compiler will normally generate code to read the values of all function parameters from the argument stack and assign them to C variables upon entry to the function. NO_INIT will tell the compiler that some parameters will be used for output rather than for input and that they will be handled before the function terminates.
The following example shows a variation of the rpcb_gettime() function. This function uses the timep variable only as an output variable and does not care about its initial contents.
- bool_t
- rpcb_gettime(host,timep)
- char *host
- time_t &timep = NO_INIT
- OUTPUT:
- timep
Starting with Perl 5.16, you can embed typemaps into your XS code instead of, or in addition to, typemaps in a separate file. Multiple such embedded typemaps are processed in order of appearance in the XS code and, like local typemap files, take precedence over the default typemap; embedded typemaps may overwrite previous definitions of TYPEMAP, INPUT, and OUTPUT stanzas. The syntax for embedded typemaps is
- TYPEMAP: <<HERE
- ... your typemap code here ...
- HERE
where the TYPEMAP keyword must appear in the first column of a new line.
Refer to perlxstypemap for details on writing typemaps.
C function parameters are normally initialized with their values from
the argument stack (which in turn contains the parameters that were
passed to the XSUB from Perl). The typemaps contain the
code segments which are used to translate the Perl values to
the C parameters. The programmer, however, is allowed to
override the typemaps and supply alternate (or additional)
initialization code. Initialization code starts with the first
=
, ;
or +
on a line in the INPUT: section. The only
exception happens if this ;
terminates the line, then this ;
is quietly ignored.
The following code demonstrates how to supply initialization code for function parameters. The initialization code is eval'ed within double quotes by the compiler before it is added to the output, so anything which should be interpreted literally [mainly $, @, or \\] must be protected with backslashes. The variables $var, $arg, and $type can be used as in typemaps.
- bool_t
- rpcb_gettime(host,timep)
- char *host = (char *)SvPV_nolen($arg);
- time_t &timep = 0;
- OUTPUT:
- timep
This should not be used to supply default values for parameters. One would normally use this when a function parameter must be processed by another library function before it can be used. Default parameters are covered in the next section.
If the initialization begins with =, then it is output in the declaration for the input variable, replacing the initialization supplied by the typemap. If the initialization begins with ; or +, then it is performed after all of the input variables have been declared. In the ; case the initialization normally supplied by the typemap is not performed. For the + case, the declaration for the variable will include the initialization from the typemap. A global variable, %v, is available for the truly rare case where information from one initialization is needed in another initialization.
Here's a truly obscure example:
- bool_t
- rpcb_gettime(host,timep)
- time_t &timep; /* \$v{timep}=@{[$v{timep}=$arg]} */
- char *host + SvOK($v{timep}) ? SvPV_nolen($arg) : NULL;
- OUTPUT:
- timep
The construct \$v{timep}=@{[$v{timep}=$arg]} used in the above example has a two-fold purpose: first, when this line is processed by xsubpp, the Perl snippet $v{timep}=$arg is evaluated. Second, the text of the evaluated snippet is output into the generated C file (inside a C comment)! During the processing of the char *host line, $arg will evaluate to ST(0), and $v{timep} will evaluate to ST(1).
Default values for XSUB arguments can be specified by placing an assignment statement in the parameter list. The default value may be a number, a string, or the special string NO_INIT. Defaults should always be used on the right-most parameters only.
To allow the XSUB for rpcb_gettime() to have a default host value the parameters to the XSUB could be rearranged. The XSUB will then call the real rpcb_gettime() function with the parameters in the correct order. This XSUB can be called from Perl with either of the following statements:
- $status = rpcb_gettime( $timep, $host );
- $status = rpcb_gettime( $timep );
The XSUB will look like the code which follows. A CODE: block is used to call the real rpcb_gettime() function with the parameters in the correct order for that function.
- bool_t
- rpcb_gettime(timep,host="localhost")
- char *host
- time_t timep = NO_INIT
- CODE:
- RETVAL = rpcb_gettime( host, &timep );
- OUTPUT:
- timep
- RETVAL
The PREINIT: keyword allows extra variables to be declared immediately before or after the declarations of the parameters from the INPUT: section are emitted.
If a variable is declared inside a CODE: section it will follow any typemap code that is emitted for the input parameters. This may result in the declaration ending up after C code, which is a C syntax error. Similar errors may happen when an explicit ;-type or +-type initialization of parameters is used (see Initializing Function Parameters). Declaring these variables in an INIT: section will not help.
In such cases, to force an additional variable to be declared together with declarations of other variables, place the declaration into a PREINIT: section. The PREINIT: keyword may be used one or more times within an XSUB.
The following examples are equivalent, but if the code is using complex typemaps then the first example is safer.
- bool_t
- rpcb_gettime(timep)
- time_t timep = NO_INIT
- PREINIT:
- char *host = "localhost";
- CODE:
- RETVAL = rpcb_gettime( host, &timep );
- OUTPUT:
- timep
- RETVAL
For this particular case an INIT: keyword would generate the same C code as the PREINIT: keyword. Another correct, but error-prone example:
- bool_t
- rpcb_gettime(timep)
- time_t timep = NO_INIT
- CODE:
- char *host = "localhost";
- RETVAL = rpcb_gettime( host, &timep );
- OUTPUT:
- timep
- RETVAL
Another way to declare host is to use a C block in the CODE: section:
- bool_t
- rpcb_gettime(timep)
- time_t timep = NO_INIT
- CODE:
- {
- char *host = "localhost";
- RETVAL = rpcb_gettime( host, &timep );
- }
- OUTPUT:
- timep
- RETVAL
The ability to put additional declarations before the typemap entries are processed is very handy in the cases when typemap conversions manipulate some global state:
- MyObject
- mutate(o)
- PREINIT:
- MyState st = global_state;
- INPUT:
- MyObject o;
- CLEANUP:
- reset_to(global_state, st);
Here we suppose that conversion to MyObject in the INPUT: section and from MyObject when processing RETVAL will modify a global variable global_state. After these conversions are performed, we restore the old value of global_state (to avoid memory leaks, for example).
There is another way to trade clarity for compactness: INPUT sections allow declaration of C variables which do not appear in the parameter list of a subroutine. Thus the above code for mutate() can be rewritten as
- MyObject
- mutate(o)
- MyState st = global_state;
- MyObject o;
- CLEANUP:
- reset_to(global_state, st);
and the code for rpcb_gettime() can be rewritten as
- bool_t
- rpcb_gettime(timep)
- time_t timep = NO_INIT
- char *host = "localhost";
- C_ARGS:
- host, &timep
- OUTPUT:
- timep
- RETVAL
The SCOPE: keyword allows scoping to be enabled for a particular XSUB. If enabled, the XSUB will invoke ENTER and LEAVE automatically.
To support potentially complex type mappings, if a typemap entry used by an XSUB contains a comment like /*scope*/ then scoping will be automatically enabled for that XSUB.
To enable scoping:
- SCOPE: ENABLE
To disable scoping:
- SCOPE: DISABLE
The XSUB's parameters are usually evaluated immediately after entering the XSUB. The INPUT: keyword can be used to force those parameters to be evaluated a little later. The INPUT: keyword can be used multiple times within an XSUB and can be used to list one or more input variables. This keyword is used with the PREINIT: keyword.
The following example shows how the input parameter timep can be evaluated late, after a PREINIT.
- bool_t
- rpcb_gettime(host,timep)
- char *host
- PREINIT:
- time_t tt;
- INPUT:
- time_t timep
- CODE:
- RETVAL = rpcb_gettime( host, &tt );
- timep = tt;
- OUTPUT:
- timep
- RETVAL
The next example shows each input parameter evaluated late.
- bool_t
- rpcb_gettime(host,timep)
- PREINIT:
- time_t tt;
- INPUT:
- char *host
- PREINIT:
- char *h;
- INPUT:
- time_t timep
- CODE:
- h = host;
- RETVAL = rpcb_gettime( h, &tt );
- timep = tt;
- OUTPUT:
- timep
- RETVAL
Since INPUT sections allow declaration of C variables which do not appear in the parameter list of a subroutine, this may be shortened to:
- bool_t
- rpcb_gettime(host,timep)
- time_t tt;
- char *host;
- char *h = host;
- time_t timep;
- CODE:
- RETVAL = rpcb_gettime( h, &tt );
- timep = tt;
- OUTPUT:
- timep
- RETVAL
(We used our knowledge that input conversion for char * is a "simple" one, thus host is initialized on the declaration line, and our assignment h = host is not performed too early. Otherwise one would need to have the assignment h = host in a CODE: or INIT: section.)
In the list of parameters for an XSUB, one can precede parameter names by the IN/OUTLIST/IN_OUTLIST/OUT/IN_OUT keywords. The IN keyword is the default; the other keywords indicate how the Perl interface should differ from the C interface.
Parameters preceded by the OUTLIST/IN_OUTLIST/OUT/IN_OUT keywords are considered to be used by the C subroutine via pointers. The OUTLIST/OUT keywords indicate that the C subroutine does not inspect the memory pointed to by this parameter, but will write through this pointer to provide additional return values.
Parameters preceded by the OUTLIST keyword do not appear in the usage signature of the generated Perl function.
Parameters preceded by IN_OUTLIST/IN_OUT/OUT do appear as parameters to the Perl function. With the exception of OUT-parameters, these parameters are converted to the corresponding C type, then pointers to these data are given as arguments to the C function. It is expected that the C function will write through these pointers.
The return list of the generated Perl function consists of the C return value from the function (unless the XSUB is of void return type or The NO_OUTPUT Keyword was used) followed by all the OUTLIST and IN_OUTLIST parameters (in the order of appearance). On return from the XSUB the IN_OUT/OUT Perl parameters will be modified to have the values written by the C function.
For example, an XSUB
- void
- day_month(OUTLIST day, IN unix_time, OUTLIST month)
- int day
- int unix_time
- int month
should be used from Perl as
- my ($day, $month) = day_month(time);
The C signature of the corresponding function should be
- void day_month(int *day, int unix_time, int *month);
The IN/OUTLIST/IN_OUTLIST/IN_OUT/OUT keywords can be mixed with ANSI-style declarations, as in
- void
- day_month(OUTLIST int day, int unix_time, OUTLIST int month)
(here the optional IN keyword is omitted).
The IN_OUT parameters are identical with parameters introduced with The & Unary Operator and put into the OUTPUT: section (see The OUTPUT: Keyword). The IN_OUTLIST parameters are very similar, the only difference being that the value the C function writes through the pointer does not modify the Perl parameter, but is put in the output list.
The OUTLIST/OUT parameters differ from IN_OUTLIST/IN_OUT parameters only by the initial value of the Perl parameter not being read (and not being given to the C function - which gets some garbage instead). For example, the same C function as above can be interfaced with as
- void day_month(OUT day, int unix_time, OUT month);
or
- void
- day_month(day, unix_time, month)
- int &day = NO_INIT
- int unix_time
- int &month = NO_INIT
- OUTPUT:
- day
- month
However, the generated Perl function is called in very C-ish style:
- my ($day, $month);
- day_month($day, time, $month);
If one of the input arguments to the C function is the length of a string argument NAME, one can substitute the name of the length-argument by length(NAME) in the XSUB declaration. This argument must be omitted when the generated Perl function is called. E.g.,
- void
- dump_chars(char *s, short l)
- {
- short n = 0;
- while (n < l) {
- printf("s[%d] = \"\\%#03o\"\n", n, (int)s[n]);
- n++;
- }
- }
- MODULE = x PACKAGE = x
- void dump_chars(char *s, short length(s))
should be called as dump_chars($string)
.
This directive is supported with ANSI-type function declarations only.
XSUBs can have variable-length parameter lists by specifying an ellipsis (...) in the parameter list. This use of the ellipsis is similar to that found in ANSI C. The programmer is able to determine the number of arguments passed to the XSUB by examining the items variable which the xsubpp compiler supplies for all XSUBs. By using this mechanism one can create an XSUB which accepts a list of parameters of unknown length.
The host parameter for the rpcb_gettime() XSUB can be optional so the ellipsis can be used to indicate that the XSUB will take a variable number of parameters. Perl should be able to call this XSUB with either of the following statements.
- $status = rpcb_gettime( $timep, $host );
- $status = rpcb_gettime( $timep );
The XS code, with ellipsis, follows.
- bool_t
- rpcb_gettime(timep, ...)
- time_t timep = NO_INIT
- PREINIT:
- char *host = "localhost";
- CODE:
- if( items > 1 )
- host = (char *)SvPV_nolen(ST(1));
- RETVAL = rpcb_gettime( host, &timep );
- OUTPUT:
- timep
- RETVAL
The C_ARGS: keyword allows creation of XSUBs which have a different calling sequence from Perl than from C, without the need to write a CODE: or PPCODE: section. The contents of the C_ARGS: paragraph are put as the argument to the called C function without any change.
For example, suppose that a C function is declared as
- symbolic nth_derivative(int n, symbolic function, int flags);
and that the default flags are kept in a global C variable default_flags. Suppose that you want to create an interface which is called as
- $second_deriv = $function->nth_derivative(2);
To do this, declare the XSUB as
- symbolic
- nth_derivative(function, n)
- symbolic function
- int n
- C_ARGS:
- n, function, default_flags
The PPCODE: keyword is an alternate form of the CODE: keyword and is used to tell the xsubpp compiler that the programmer is supplying the code to control the argument stack for the XSUBs return values. Occasionally one will want an XSUB to return a list of values rather than a single value. In these cases one must use PPCODE: and then explicitly push the list of values on the stack. The PPCODE: and CODE: keywords should not be used together within the same XSUB.
The actual difference between PPCODE: and CODE: sections is in the initialization of the SP macro (which stands for the current Perl stack pointer), and in the handling of data on the stack when returning from an XSUB. In CODE: sections SP preserves the value which was on entry to the XSUB: SP is on the function pointer (which follows the last parameter). In PPCODE: sections SP is moved backward to the beginning of the parameter list, which allows PUSH*() macros to place output values in the place Perl expects them to be when the XSUB returns back to Perl.
The generated trailer for a CODE: section ensures that the number of return values Perl will see is either 0 or 1 (depending on the voidness of the return value of the C function, and heuristics mentioned in The RETVAL Variable). The trailer generated for a PPCODE: section is based on the number of return values and on the number of times SP was updated by [X]PUSH*() macros.
Note that the macros ST(i), XST_m*() and XSRETURN*() work equally well in CODE: sections and PPCODE: sections.
The following XSUB will call the C rpcb_gettime() function and will return its two output values, timep and status, to Perl as a single list.
- void
- rpcb_gettime(host)
- char *host
- PREINIT:
- time_t timep;
- bool_t status;
- PPCODE:
- status = rpcb_gettime( host, &timep );
- EXTEND(SP, 2);
- PUSHs(sv_2mortal(newSViv(status)));
- PUSHs(sv_2mortal(newSViv(timep)));
Notice that the programmer must supply the C code necessary to have the real rpcb_gettime() function called and to have the return values properly placed on the argument stack.
The void return type for this function tells the xsubpp compiler that the RETVAL variable is not needed or used and that it should not be created. In most scenarios the void return type should be used with the PPCODE: directive.
The EXTEND() macro is used to make room on the argument stack for 2 return values. The PPCODE: directive causes the xsubpp compiler to create a stack pointer available as SP, and it is this pointer which is being used in the EXTEND() macro. The values are then pushed onto the stack with the PUSHs() macro.
Now the rpcb_gettime() function can be used from Perl with the following statement.
- ($status, $timep) = rpcb_gettime("localhost");
When handling output parameters with a PPCODE section, be sure to handle 'set' magic properly. See perlguts for details about 'set' magic.
Occasionally the programmer will want to return simply
undef or an empty list if a function fails rather than a
separate status value. The rpcb_gettime() function offers
just this situation. If the function succeeds we would like
to have it return the time and if it fails we would like to
have undef returned. In the following Perl code the value
of $timep will either be undef or it will be a valid time.
- $timep = rpcb_gettime( "localhost" );
The following XSUB uses the SV * return type as a mnemonic only, and uses a CODE: block to indicate to the compiler that the programmer has supplied all the necessary code. The sv_newmortal() call will initialize the return value to undef, making that the default return value.
- SV *
- rpcb_gettime(host)
- char * host
- PREINIT:
- time_t timep;
- bool_t x;
- CODE:
- ST(0) = sv_newmortal();
- if( rpcb_gettime( host, &timep ) )
- sv_setnv( ST(0), (double)timep);
The next example demonstrates how one would place an explicit undef in the return value, should the need arise.
- SV *
- rpcb_gettime(host)
- char * host
- PREINIT:
- time_t timep;
- bool_t x;
- CODE:
- if( rpcb_gettime( host, &timep ) ){
- ST(0) = sv_newmortal();
- sv_setnv( ST(0), (double)timep);
- }
- else{
- ST(0) = &PL_sv_undef;
- }
To return an empty list one must use a PPCODE: block and then not push return values on the stack.
- void
- rpcb_gettime(host)
- char *host
- PREINIT:
- time_t timep;
- PPCODE:
- if( rpcb_gettime( host, &timep ) )
- PUSHs(sv_2mortal(newSViv(timep)));
- else{
- /* Nothing pushed on stack, so an empty
- * list is implicitly returned. */
- }
Some people may be inclined to include an explicit return in the above XSUB, rather than letting control fall through to the end. In those situations XSRETURN_EMPTY should be used instead. This will ensure that the XSUB stack is properly adjusted. Consult perlapi for other XSRETURN macros.
Since XSRETURN_* macros can be used with CODE blocks as well, one can rewrite this example as:
- int
- rpcb_gettime(host)
- char *host
- PREINIT:
- time_t timep;
- CODE:
- RETVAL = rpcb_gettime( host, &timep );
- if (RETVAL == 0)
- XSRETURN_UNDEF;
- OUTPUT:
- RETVAL
In fact, one can put this check into a POSTCALL: section as well. Together with PREINIT: simplifications, this leads to:
- int
- rpcb_gettime(host)
- char *host
- time_t timep;
- POSTCALL:
- if (RETVAL == 0)
- XSRETURN_UNDEF;
The REQUIRE: keyword is used to indicate the minimum version of the xsubpp compiler needed to compile the XS module. An XS module which contains the following statement will compile with only xsubpp version 1.922 or greater:
- REQUIRE: 1.922
This keyword can be used when an XSUB requires special cleanup procedures before it terminates. When the CLEANUP: keyword is used it must follow any CODE:, PPCODE:, or OUTPUT: blocks which are present in the XSUB. The code specified for the cleanup block will be added as the last statements in the XSUB.
This keyword can be used when an XSUB requires special procedures executed after the C subroutine call is performed. When the POSTCALL: keyword is used it must precede OUTPUT: and CLEANUP: blocks which are present in the XSUB.
See examples in The NO_OUTPUT Keyword and Returning Undef And Empty Lists.
The POSTCALL: block does not make a lot of sense when the C subroutine call is supplied by the user by providing either a CODE: or PPCODE: section.
The BOOT: keyword is used to add code to the extension's bootstrap function. The bootstrap function is generated by the xsubpp compiler and normally holds the statements necessary to register any XSUBs with Perl. With the BOOT: keyword the programmer can tell the compiler to add extra statements to the bootstrap function.
This keyword may be used any time after the first MODULE keyword and should appear on a line by itself. The first blank line after the keyword will terminate the code block.
- BOOT:
- # The following message will be printed when the
- # bootstrap function executes.
- printf("Hello from the bootstrap!\n");
The VERSIONCHECK: keyword corresponds to xsubpp's -versioncheck and -noversioncheck options. This keyword overrides the command line options. Version checking is enabled by default. When version checking is enabled the XS module will attempt to verify that its version matches the version of the PM module.
To enable version checking:
- VERSIONCHECK: ENABLE
To disable version checking:
- VERSIONCHECK: DISABLE
Note that if the version of the PM module is an NV (a floating point number), it will be stringified with a possible loss of precision (currently chopping to nine decimal places) so that it may not match the version of the XS module anymore. Quoting the $VERSION declaration to make it a string is recommended if long version numbers are used.
The PROTOTYPES: keyword corresponds to xsubpp's -prototypes and -noprototypes options. This keyword overrides the command line options. Prototypes are enabled by default. When prototypes are enabled XSUBs will be given Perl prototypes. This keyword may be used multiple times in an XS module to enable and disable prototypes for different parts of the module.
To enable prototypes:
- PROTOTYPES: ENABLE
To disable prototypes:
- PROTOTYPES: DISABLE
This keyword is similar to the PROTOTYPES: keyword above but can be used to force xsubpp to use a specific prototype for the XSUB. This keyword overrides all other prototype options and keywords but affects only the current XSUB. Consult Prototypes in perlsub for information about Perl prototypes.
- bool_t
- rpcb_gettime(timep, ...)
- time_t timep = NO_INIT
- PROTOTYPE: $;$
- PREINIT:
- char *host = "localhost";
- CODE:
- if( items > 1 )
- host = (char *)SvPV_nolen(ST(1));
- RETVAL = rpcb_gettime( host, &timep );
- OUTPUT:
- timep
- RETVAL
If prototypes are enabled, you can disable them locally for a given XSUB as in the following example:
- void
- rpcb_gettime_noproto()
- PROTOTYPE: DISABLE
- ...
The ALIAS: keyword allows an XSUB to have two or more unique Perl names and to know which of those names was used when it was invoked. The Perl names may be fully-qualified with package names. Each alias is given an index. The compiler will set up a variable called ix which contains the index of the alias which was used. When the XSUB is called with its declared name, ix will be 0.
The following example will create aliases FOO::gettime() and BAR::getit() for this function.
- bool_t
- rpcb_gettime(host,timep)
- char *host
- time_t &timep
- ALIAS:
- FOO::gettime = 1
- BAR::getit = 2
- INIT:
- printf("# ix = %d\n", ix );
- OUTPUT:
- timep
Instead of writing an overloaded interface using pure Perl, you can also use the OVERLOAD keyword to define additional Perl names for your functions (like the ALIAS: keyword above). However, the overloaded functions must be defined with three parameters (except for the nomethod() function, which needs four parameters). If any function has the OVERLOAD: keyword, several additional lines will be defined in the C file generated by xsubpp in order to register with the overload magic.
Since blessed objects are actually stored as RV's, it is useful to use the typemap features to preprocess parameters and extract the actual SV stored within the blessed RV. See the sample for T_PTROBJ_SPECIAL below.
To use the OVERLOAD: keyword, create an XS function which takes three input parameters (or use the C-style '...' definition) like this:
- SV *
- cmp (lobj, robj, swap)
- My_Module_obj lobj
- My_Module_obj robj
- IV swap
- OVERLOAD: cmp <=>
- { /* function defined here */}
In this case, the function will overload both of the three-way comparison operators. For all overload operations using non-alpha characters, you must type the parameter without quoting, separating multiple overloads with whitespace. Note that "" (the stringify overload) should be entered as \"\" (i.e. escaped).
In addition to the OVERLOAD keyword, if you need to control how Perl autogenerates missing overloaded operators, you can set the FALLBACK keyword in the module header section, like this:
- MODULE = RPC PACKAGE = RPC
- FALLBACK: TRUE
- ...
where FALLBACK can take any of the three values TRUE, FALSE, or UNDEF. If you do not set any FALLBACK value when using OVERLOAD, it defaults to UNDEF. FALLBACK is not used except when one or more functions using OVERLOAD have been defined. Please see fallback in overload for more details.
The INTERFACE: keyword declares the current XSUB as a keeper of the given calling signature. If some text follows this keyword, it is considered a list of functions which have this signature and should be attached to the current XSUB.
For example, if you have 4 C functions multiply(), divide(), add(), and subtract(), all having the signature:
- symbolic f(symbolic, symbolic);
you can make them all use the same XSUB using this:
- symbolic
- interface_s_ss(arg1, arg2)
- symbolic arg1
- symbolic arg2
- INTERFACE:
- multiply divide
- add subtract
(This is the complete XSUB code for 4 Perl functions!) The four generated Perl functions share names with the corresponding C functions.
The advantage of this approach compared to the ALIAS: keyword is that there is no need to code a switch statement: each Perl function (which shares the same XSUB) knows which C function it should call. Additionally, one can attach an extra function remainder() at runtime by using
- CV *mycv = newXSproto("Symbolic::remainder",
- XS_Symbolic_interface_s_ss, __FILE__, "$$");
- XSINTERFACE_FUNC_SET(mycv, remainder);
say, from another XSUB. (This example supposes that there was no INTERFACE_MACRO: section, otherwise one needs to use something else instead of XSINTERFACE_FUNC_SET, see the next section.)
The INTERFACE_MACRO: keyword allows one to define an INTERFACE using a different way to extract a function pointer from an XSUB. The text which follows this keyword should give the names of macros which would extract/set a function pointer. The extractor macro is given the return type, CV*, and XSANY.any_dptr for this CV*. The setter macro is given the cv and the function pointer.
The default values are XSINTERFACE_FUNC and XSINTERFACE_FUNC_SET.
An INTERFACE keyword with an empty list of functions can be omitted if the INTERFACE_MACRO keyword is used.
Suppose that in the previous example the function pointers for multiply(), divide(), add() and subtract() are kept in a global C array fp[] with offsets being multiply_off, divide_off, add_off and subtract_off. Then one can use
- #define XSINTERFACE_FUNC_BYOFFSET(ret,cv,f) \
- ((XSINTERFACE_CVT_ANON(ret))fp[CvXSUBANY(cv).any_i32])
- #define XSINTERFACE_FUNC_BYOFFSET_set(cv,f) \
- CvXSUBANY(cv).any_i32 = CAT2( f, _off )
in the C section,
- symbolic
- interface_s_ss(arg1, arg2)
- symbolic arg1
- symbolic arg2
- INTERFACE_MACRO:
- XSINTERFACE_FUNC_BYOFFSET
- XSINTERFACE_FUNC_BYOFFSET_set
- INTERFACE:
- multiply divide
- add subtract
in the XSUB section.
This keyword can be used to pull other files into the XS module. The other files may have XS code. INCLUDE: can also be used to run a command to generate the XS code to be pulled into the module.
The file Rpcb1.xsh contains our rpcb_gettime() function:
- bool_t
- rpcb_gettime(host,timep)
- char *host
- time_t &timep
- OUTPUT:
- timep
The XS module can use INCLUDE: to pull that file into it.
- INCLUDE: Rpcb1.xsh
If the parameters to the INCLUDE: keyword are followed by a pipe (|) then the compiler will interpret the parameters as a command. This feature is mildly deprecated in favour of the INCLUDE_COMMAND: directive, as documented below.
- INCLUDE: cat Rpcb1.xsh |
Do not use this to run perl: INCLUDE: perl | will run the perl that happens to be the first in your path and not necessarily the same perl that is used to run xsubpp. See The INCLUDE_COMMAND: Keyword.
Runs the supplied command and includes its output into the current XS document. INCLUDE_COMMAND assigns special meaning to the $^X token in that it runs the same perl interpreter that is running xsubpp:
- INCLUDE_COMMAND: cat Rpcb1.xsh
- INCLUDE_COMMAND: $^X -e ...
The CASE: keyword allows an XSUB to have multiple distinct parts with each part acting as a virtual XSUB. CASE: is greedy and if it is used then all other XS keywords must be contained within a CASE:. This means nothing may precede the first CASE: in the XSUB and anything following the last CASE: is included in that case.
A CASE: might switch via a parameter of the XSUB, via the ix ALIAS: variable (see The ALIAS: Keyword), or maybe via the items variable (see Variable-length Parameter Lists). The last CASE: becomes the default case if it is not associated with a conditional. The following example shows CASE switched via ix with a function rpcb_gettime() having an alias x_gettime(). When the function is called as rpcb_gettime() its parameters are the usual (char *host, time_t *timep), but when the function is called as x_gettime() its parameters are reversed, (time_t *timep, char *host).
- long
- rpcb_gettime(a,b)
- CASE: ix == 1
- ALIAS:
- x_gettime = 1
- INPUT:
- # 'a' is timep, 'b' is host
- char *b
- time_t a = NO_INIT
- CODE:
- RETVAL = rpcb_gettime( b, &a );
- OUTPUT:
- a
- RETVAL
- CASE:
- # 'a' is host, 'b' is timep
- char *a
- time_t &b = NO_INIT
- OUTPUT:
- b
- RETVAL
That function can be called with either of the following statements. Note the different argument lists.
- $status = rpcb_gettime( $host, $timep );
- $status = x_gettime( $timep, $host );
The EXPORT_XSUB_SYMBOLS: keyword is likely something you will never need. In perl versions earlier than 5.16.0, this keyword does nothing. Starting with 5.16, XSUB symbols are no longer exported by default. That is, they are static functions. If you include
- EXPORT_XSUB_SYMBOLS: ENABLE
in your XS code, the XSUBs following this line will not be declared static. You can later disable this with
- EXPORT_XSUB_SYMBOLS: DISABLE
which, again, is the default that you should probably never change. You cannot use this keyword on versions of perl before 5.16 to make XSUBs static.
The & unary operator in the INPUT: section is used to tell xsubpp that it should convert a Perl value to/from C using the C type to the left of &, but provide a pointer to this value when the C function is called. This is useful to avoid a CODE: block for a C function which takes a parameter by reference. Typically, the parameter should not be a pointer type (an int or long but not an int* or long*).
The following XSUB will generate incorrect C code. The xsubpp compiler will turn this into code which calls rpcb_gettime() with parameters (char *host, time_t timep), but the real rpcb_gettime() wants the timep parameter to be of type time_t* rather than time_t.
- bool_t
- rpcb_gettime(host,timep)
- char *host
- time_t timep
- OUTPUT:
- timep
That problem is corrected by using the & operator. The xsubpp compiler will now turn this into code which calls rpcb_gettime() correctly with parameters (char *host, time_t *timep). It does this by carrying the & through, so the function call looks like rpcb_gettime(host, &timep).
- bool_t
- rpcb_gettime(host,timep)
- char *host
- time_t &timep
- OUTPUT:
- timep
C preprocessor directives are allowed within BOOT:, PREINIT:, INIT:, CODE:, PPCODE:, POSTCALL:, and CLEANUP: blocks, as well as outside the functions. Comments are allowed anywhere after the MODULE keyword. The compiler will pass the preprocessor directives through untouched and will remove the commented lines. POD documentation is allowed at any point, both in the C and XS language sections. POD must be terminated with a =cut command; xsubpp will exit with an error if it does not. It is very unlikely that human-generated C code will be mistaken for POD, as most indenting styles result in whitespace in front of any line starting with =. Machine-generated XS files may fall into this trap unless care is taken to ensure that a space breaks the sequence "\n=".
Comments can be added to XSUBs by placing a #
as the first
non-whitespace of a line. Care should be taken to avoid making the
comment look like a C preprocessor directive, lest it be interpreted as
such. The simplest way to prevent this is to put whitespace in front of
the #
.
If you use preprocessor directives to choose one of two versions of a function, use
- #if ... version1
- #else /* ... version2 */
- #endif
and not
- #if ... version1
- #endif
- #if ... version2
- #endif
because otherwise xsubpp will believe that you made a duplicate definition of the function. Also, put a blank line before the #else/#endif so it will not be seen as part of the function body.
If an XSUB name contains ::
, it is considered to be a C++ method.
The generated Perl function will assume that
its first argument is an object pointer. The object pointer
will be stored in a variable called THIS. The object should
have been created by C++ with the new() function and should
be blessed by Perl with the sv_setref_pv() macro. The
blessing of the object by Perl can be handled by a typemap. An example
typemap is shown at the end of this section.
If the return type of the XSUB includes static
, the method is considered
to be a static method. It will call the C++
function using the class::method() syntax. If the method is not static
the function will be called using the THIS->method() syntax.
The next examples will use the following C++ class.
The XSUBs for the blue() and set_blue() methods are defined with the class name but the parameter for the object (THIS, or "self") is implicit and is not listed.
- int
- color::blue()
- void
- color::set_blue( val )
- int val
Both Perl functions will expect an object as the first parameter. In the
generated C++ code the object is called THIS
, and the method call will
be performed on this object. So in the C++ code the blue() and set_blue()
methods will be called as this:
- RETVAL = THIS->blue();
- THIS->set_blue( val );
You could also write a single get/set method using an optional argument:
- int
- color::blue( val = NO_INIT )
- int val
- PROTOTYPE: $;$
- CODE:
- if (items > 1)
- THIS->set_blue( val );
- RETVAL = THIS->blue();
- OUTPUT:
- RETVAL
If the function's name is DESTROY then the C++ delete function will be
called and THIS
will be given as its parameter. The generated C++ code for
- void
- color::DESTROY()
will look like this:
- color *THIS = ...; // Initialized as in typemap
- delete THIS;
If the function's name is new then the C++ new
function will be called
to create a dynamic C++ object. The XSUB will expect the class name, which
will be kept in a variable called CLASS
, to be given as the first
argument.
- color *
- color::new()
The generated C++ code will call new
.
- RETVAL = new color();
The following is an example of a typemap that could be used for this C++ example.
- TYPEMAP
- color * O_OBJECT
- OUTPUT
- # The Perl object is blessed into 'CLASS', which should be a
- # char* having the name of the package for the blessing.
- O_OBJECT
- sv_setref_pv( $arg, CLASS, (void*)$var );
- INPUT
- O_OBJECT
- if( sv_isobject($arg) && (SvTYPE(SvRV($arg)) == SVt_PVMG) )
- $var = ($type)SvIV((SV*)SvRV( $arg ));
- else{
- warn( \"${Package}::$func_name() -- $var is not a blessed SV reference\" );
- XSRETURN_UNDEF;
- }
When designing an interface between Perl and a C library a straight
translation from C to XS (such as created by h2xs -x
) is often sufficient.
However, sometimes the interface will look
very C-like and occasionally nonintuitive, especially when the C function
modifies one of its parameters, or returns failure inband (as in "negative
return values mean failure"). In cases where the programmer wishes to
create a more Perl-like interface the following strategy may help to
identify the more critical parts of the interface.
Identify the C functions with input/output or output parameters. The XSUBs for these functions may be able to return lists to Perl.
Identify the C functions which use some inband info as an indication of failure. They may be candidates to return undef or an empty list in case of failure. If the failure may be detected without a call to the C function, you may want to use an INIT: section to report the failure. For failures detectable after the C function returns one may want to use a POSTCALL: section to process the failure. In more complicated cases use CODE: or PPCODE: sections.
If many functions use the same failure indication based on the return value, you may want to create a special typedef to handle this situation. Put
- typedef int negative_is_failure;
near the beginning of the XS file, and create an OUTPUT typemap entry for negative_is_failure which converts negative values to undef, or maybe croak()s. After this, a return value of type negative_is_failure will create a more Perl-like interface.
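A sketch of what such an OUTPUT typemap entry could look like (the T_NEG_FAIL name and the exact code are illustrative assumptions, not from the source; consult perlxstypemap for the real syntax):

```
TYPEMAP
negative_is_failure  T_NEG_FAIL

OUTPUT
T_NEG_FAIL
    if ($var < 0)
        XSRETURN_UNDEF;
    sv_setiv($arg, (IV)$var);
```

With something like this in place, an XSUB declared with return type negative_is_failure would hand undef back to Perl whenever the C function reports failure through a negative return value.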
Identify which values are used by only the C and XSUB functions themselves, say, when a parameter to a function should be the contents of a global variable. If Perl does not need to access the contents of the value, then it may not be necessary to provide a translation for that value from C to Perl.
Identify the pointers in the C function parameter lists and return values. Some pointers may be used to implement input/output or output parameters; these can be handled in XS with the & unary operator and, possibly, the NO_INIT keyword. Others will require handling of types like int *, and one needs to decide what a useful Perl translation will do in such a case. When the semantics are clear, it is advisable to put the translation into a typemap file.
Identify the structures used by the C functions. In many
cases it may be helpful to use the T_PTROBJ typemap for
these structures so they can be manipulated by Perl as
blessed objects. (This is handled automatically by h2xs -x
.)
If the same C type is used in several different contexts which require
different translations, typedef
several new types mapped to this C type,
and create separate typemap entries for these new types. Use these
types in declarations of return type and parameters to XSUBs.
When dealing with C structures one should select either T_PTROBJ or T_PTRREF for the XS type. Both types are designed to handle pointers to complex objects. The T_PTRREF type will allow the Perl object to be unblessed while the T_PTROBJ type requires that the object be blessed. By using T_PTROBJ one can achieve a form of type-checking because the XSUB will attempt to verify that the Perl object is of the expected type.
The following XS code shows the getnetconfigent() function which is used with ONC+ TIRPC. The getnetconfigent() function will return a pointer to a C structure and has the C prototype shown below. The example will demonstrate how the C pointer will become a Perl reference. Perl will consider this reference to be a pointer to a blessed object and will attempt to call a destructor for the object. A destructor will be provided in the XS source to free the memory used by getnetconfigent(). Destructors in XS can be created by specifying an XSUB function whose name ends with the word DESTROY. XS destructors can be used to free memory which may have been malloc'd by another XSUB.
- struct netconfig *getnetconfigent(const char *netid);
A typedef will be created for struct netconfig. The Perl object will be blessed in a class matching the name of the C type, with the tag Ptr appended, and the name should not have embedded spaces if it will be a Perl package name. The destructor will be placed in a class corresponding to the class of the object, and the PREFIX keyword will be used to trim the name to the word DESTROY, as Perl expects.
- typedef struct netconfig Netconfig;
- MODULE = RPC PACKAGE = RPC
- Netconfig *
- getnetconfigent(netid)
- char *netid
- MODULE = RPC PACKAGE = NetconfigPtr PREFIX = rpcb_
- void
- rpcb_DESTROY(netconf)
- Netconfig *netconf
- CODE:
- printf("Now in NetconfigPtr::DESTROY\n");
- free( netconf );
This example requires the following typemap entry. Consult perlxstypemap for more information about adding new typemaps for an extension.
- TYPEMAP
- Netconfig * T_PTROBJ
This example will be used with the following Perl statements.
- use RPC;
- $netconf = getnetconfigent("udp");
When Perl destroys the object referenced by $netconf it will send the object to the supplied XSUB DESTROY function. Perl cannot determine, and does not care, that this object is a C struct and not a Perl object. In this sense, there is no difference between the object created by the getnetconfigent() XSUB and an object created by a normal Perl subroutine.
Starting with Perl 5.8, a macro framework has been defined to allow static data to be safely stored in XS modules that will be accessed from a multi-threaded Perl.
Although primarily designed for use with multi-threaded Perl, the macros have been designed so that they will work with non-threaded Perl as well.
It is therefore strongly recommended that these macros be used by all XS modules that make use of static data.
The easiest way to get a template set of macros to use is by specifying
the -g
(--global
) option with h2xs (see h2xs).
Below is an example module that makes use of the macros.
- #include "EXTERN.h"
- #include "perl.h"
- #include "XSUB.h"
- /* Global Data */
- #define MY_CXT_KEY "BlindMice::_guts" XS_VERSION
- typedef struct {
- int count;
- char name[3][100];
- } my_cxt_t;
- START_MY_CXT
- MODULE = BlindMice PACKAGE = BlindMice
- BOOT:
- {
- MY_CXT_INIT;
- MY_CXT.count = 0;
- strcpy(MY_CXT.name[0], "None");
- strcpy(MY_CXT.name[1], "None");
- strcpy(MY_CXT.name[2], "None");
- }
- int
- newMouse(char * name)
- PREINIT:
- dMY_CXT;
- CODE:
- if (MY_CXT.count >= 3) {
- warn("Already have 3 blind mice");
- RETVAL = 0;
- }
- else {
- RETVAL = ++ MY_CXT.count;
- strcpy(MY_CXT.name[MY_CXT.count - 1], name);
- }
- OUTPUT:
- RETVAL
- SV *
- get_mouse_name(index)
- int index
- CODE:
- dMY_CXT;
- if (index > MY_CXT.count)
- croak("There are only 3 blind mice.");
- else
- RETVAL = newSVpv(MY_CXT.name[index - 1], 0);
- OUTPUT:
- RETVAL
- void
- CLONE(...)
- CODE:
- MY_CXT_CLONE;
REFERENCE
This macro is used to define a unique key to refer to the static data for an XS module. The suggested naming scheme, as used by h2xs, is to use a string that consists of the module name, the string "::_guts" and the module version number.
- #define MY_CXT_KEY "MyModule::_guts" XS_VERSION
Declare a typedef named my_cxt_t that is a structure containing all the data that needs to be interpreter-local. This struct typedef must always be called my_cxt_t; the other CXT* macros assume the existence of the my_cxt_t typedef name.
- typedef struct {
- int some_value;
- } my_cxt_t;
Always place the START_MY_CXT macro directly after the declaration
of my_cxt_t
.
The MY_CXT_INIT macro initialises storage for the my_cxt_t
struct.
It must be called exactly once, typically in a BOOT: section. If you are maintaining multiple interpreters, it should be called once in each interpreter instance, except for interpreters cloned from existing ones. (But see MY_CXT_CLONE below.)
Use the dMY_CXT macro (a declaration) in all the functions that access MY_CXT.
Use the MY_CXT macro to access members of the my_cxt_t struct. For example, if my_cxt_t is
- typedef struct {
- int index;
- ...
- } my_cxt_t;
then use this to access the index member:
- dMY_CXT;
- MY_CXT.index = 2;
dMY_CXT may be quite expensive to calculate, and to avoid the overhead of invoking it in each function it is possible to pass the declaration on to other functions using the aMY_CXT/pMY_CXT macros, e.g.:
- void sub1() {
- dMY_CXT;
- MY_CXT.index = 1;
- sub2(aMY_CXT);
- }
- void sub2(pMY_CXT) {
- MY_CXT.index = 2;
- }
Analogously to pTHX
, there are equivalent forms for when the macro is the
first or last in multiple arguments, where an underscore represents a
comma, i.e. _aMY_CXT
, aMY_CXT_
, _pMY_CXT
and pMY_CXT_
.
By default, when a new interpreter is created as a copy of an existing one
(eg via threads->create()
), both interpreters share the same physical
my_cxt_t structure. Calling MY_CXT_CLONE
(typically via the package's
CLONE()
function), causes a byte-for-byte copy of the structure to be
taken, and any future dMY_CXT will cause the copy to be accessed instead.
These are versions of the macros which take an explicit interpreter as an argument.
Note that these macros will only work together within the same source file; that is, a dMY_CXT in one source file will access a different structure than a dMY_CXT in another source file.
Starting from Perl 5.8, at the C/C++ level Perl knows how to wrap system/library interfaces that have thread-aware versions (e.g. getpwent_r()) into frontend macros (e.g. getpwent()) that correctly handle the multithreaded interaction with the Perl interpreter. This happens transparently; the only thing you need to do is to instantiate a Perl interpreter.
This wrapping always happens when compiling Perl core source (PERL_CORE is defined) or the Perl core extensions (PERL_EXT is defined). When compiling XS code outside of the Perl core, the wrapping does not take place. Note, however, that intermixing the _r-forms (as Perl compiled for multithreaded operation will do) and the _r-less forms is neither well-defined (inconsistent results, data corruption, or even crashes become more likely) nor very portable.
File RPC.xs
: Interface to some ONC+ RPC bind library functions.
- #include "EXTERN.h"
- #include "perl.h"
- #include "XSUB.h"
- #include <rpc/rpc.h>
- typedef struct netconfig Netconfig;
- MODULE = RPC PACKAGE = RPC
- SV *
- rpcb_gettime(host="localhost")
- char *host
- PREINIT:
- time_t timep;
- CODE:
- ST(0) = sv_newmortal();
- if( rpcb_gettime( host, &timep ) )
- sv_setnv( ST(0), (double)timep );
- Netconfig *
- getnetconfigent(netid="udp")
- char *netid
- MODULE = RPC PACKAGE = NetconfigPtr PREFIX = rpcb_
- void
- rpcb_DESTROY(netconf)
- Netconfig *netconf
- CODE:
- printf("NetconfigPtr::DESTROY\n");
- free( netconf );
File typemap
: Custom typemap for RPC.xs. (cf. perlxstypemap)
- TYPEMAP
- Netconfig * T_PTROBJ
File RPC.pm
: Perl module for the RPC extension.
File rpctest.pl
: Perl test program for the RPC extension.
This document covers features supported by ExtUtils::ParseXS
(also known as xsubpp
) 3.13_01.
Originally written by Dean Roehrich <roehrich@cray.com>.
Maintained since 1996 by The Perl Porters <perlbug@perl.org>.
perlxstut - Tutorial for writing XSUBs
This tutorial will educate the reader on the steps involved in creating a Perl extension. The reader is assumed to have access to perlguts, perlapi and perlxs.
This tutorial starts with very simple examples and becomes more complex, with each new example adding new features. Certain concepts may not be completely explained until later in the tutorial in order to slowly ease the reader into building extensions.
This tutorial was written from a Unix point of view. Where I know things to be different on other platforms (e.g. Win32), I will say so. If you find something that was missed, please let me know.
This tutorial assumes that the make program that Perl is configured to
use is called make
. Instead of running "make" in the examples that
follow, you may have to substitute whatever make program Perl has been
configured to use. Running perl -V:make should tell you what it is.
When writing a Perl extension for general consumption, one should expect that the extension will be used with versions of Perl different from the version available on your machine. Since you are reading this document, the version of Perl on your machine is probably 5.005 or later, but the users of your extension may have more ancient versions.
To understand what kinds of incompatibilities one may expect, and in the rare case that the version of Perl on your machine is older than this document, see the section on "Troubleshooting these Examples" for more information.
If your extension uses some features of Perl which are not available on older releases of Perl, your users would appreciate an early meaningful warning. You would probably put this information into the README file, but nowadays installation of extensions may be performed automatically, guided by CPAN.pm module or other tools.
In MakeMaker-based installations, Makefile.PL provides the earliest opportunity to perform version checks. One can put something like this in Makefile.PL for this purpose:
- eval { require 5.007 }
- or die <<EOD;
- ############
- ### This module uses frobnication framework which is not available before
- ### version 5.007 of Perl. Upgrade your Perl before installing Kara::Mba.
- ############
- EOD
It is commonly thought that if a system does not have the capability to dynamically load a library, you cannot build XSUBs. This is incorrect. You can build them, but you must link the XSUB subroutines with the rest of Perl, creating a new executable. This situation is similar to Perl 4.
This tutorial can still be used on such a system. The XSUB build mechanism will check the system and build a dynamically-loadable library if possible, or else a static library and then, optionally, a new statically-linked executable with that static library linked in.
Should you wish to build a statically-linked executable on a system which
can dynamically load libraries, you may, in all the following examples,
where the command "make
" with no arguments is executed, run the command
"make perl
" instead.
If you have generated such a statically-linked executable by choice, then
instead of saying "make test
", you should say "make test_static
".
On systems that cannot build dynamically-loadable libraries at all, simply
saying "make test
" is sufficient.
Now let's go on with the show!
Our first extension will be very simple. When we call the routine in the extension, it will print out a well-known message and return.
Run "h2xs -A -n Mytest
". This creates a directory named Mytest,
possibly under ext/ if that directory exists in the current working
directory. Several files will be created under the Mytest dir, including
MANIFEST, Makefile.PL, lib/Mytest.pm, Mytest.xs, t/Mytest.t, and Changes.
The MANIFEST file contains the names of all the files just created in the Mytest directory.
The file Makefile.PL should look something like this:
- use ExtUtils::MakeMaker;
- # See lib/ExtUtils/MakeMaker.pm for details of how to influence
- # the contents of the Makefile that is written.
- WriteMakefile(
- NAME => 'Mytest',
- VERSION_FROM => 'Mytest.pm', # finds $VERSION
- LIBS => [''], # e.g., '-lm'
- DEFINE => '', # e.g., '-DHAVE_SOMETHING'
- INC => '', # e.g., '-I/usr/include/other'
- );
The file Mytest.pm should start with something like this:
- package Mytest;
- use 5.008008;
- use strict;
- use warnings;
- require Exporter;
- our @ISA = qw(Exporter);
- our %EXPORT_TAGS = ( 'all' => [ qw(
- ) ] );
- our @EXPORT_OK = ( @{ $EXPORT_TAGS{'all'} } );
- our @EXPORT = qw(
- );
- our $VERSION = '0.01';
- require XSLoader;
- XSLoader::load('Mytest', $VERSION);
- # Preloaded methods go here.
- 1;
- __END__
- # Below is the stub of documentation for your module. You better edit it!
The rest of the .pm file contains sample code for providing documentation for the extension.
Finally, the Mytest.xs file should look something like this:
- #include "EXTERN.h"
- #include "perl.h"
- #include "XSUB.h"
- #include "ppport.h"
- MODULE = Mytest PACKAGE = Mytest
Let's edit the .xs file by adding this to the end of the file:
- void
- hello()
- CODE:
- printf("Hello, world!\n");
It is okay for the lines starting at the "CODE:" line to not be indented. However, for readability purposes, it is suggested that you indent CODE: one level and the lines following one more level.
Now we'll run "perl Makefile.PL
". This will create a real Makefile,
which make needs. Its output looks something like:
- % perl Makefile.PL
- Checking if your kit is complete...
- Looks good
- Writing Makefile for Mytest
- %
Now, running make will produce output that looks something like this (some long lines have been shortened for clarity and some extraneous lines have been deleted):
- % make
- cp lib/Mytest.pm blib/lib/Mytest.pm
- perl xsubpp -typemap typemap Mytest.xs > Mytest.xsc && mv Mytest.xsc Mytest.c
- Please specify prototyping behavior for Mytest.xs (see perlxs manual)
- cc -c Mytest.c
- Running Mkbootstrap for Mytest ()
- chmod 644 Mytest.bs
- rm -f blib/arch/auto/Mytest/Mytest.so
- cc -shared -L/usr/local/lib Mytest.o -o blib/arch/auto/Mytest/Mytest.so
- chmod 755 blib/arch/auto/Mytest/Mytest.so
- cp Mytest.bs blib/arch/auto/Mytest/Mytest.bs
- chmod 644 blib/arch/auto/Mytest/Mytest.bs
- Manifying blib/man3/Mytest.3pm
- %
You can safely ignore the line about "prototyping behavior" - it is explained in The PROTOTYPES: Keyword in perlxs.
Perl has its own special way of easily writing test scripts, but for this example only, we'll create our own test script. Create a file called hello that looks like this:
- #! /opt/perl5/bin/perl
- use ExtUtils::testlib;
- use Mytest;
- Mytest::hello();
Now we make the script executable (chmod +x hello
), run the script
and we should see the following output:
- % ./hello
- Hello, world!
- %
Now let's add to our extension a subroutine that will take a single numeric argument as input and return 1 if the number is even or 0 if the number is odd.
Add the following to the end of Mytest.xs:
- int
- is_even(input)
- int input
- CODE:
- RETVAL = (input % 2 == 0);
- OUTPUT:
- RETVAL
There does not need to be whitespace at the start of the "int input
"
line, but it is useful for improving readability. Placing a semi-colon at
the end of that line is also optional. Any amount and kind of whitespace
may be placed between the "int" and "input
".
Now re-run make to rebuild our new shared library.
Now perform the same steps as before, generating a Makefile from the Makefile.PL file, and running make.
In order to test that our extension works, we now need to look at the file Mytest.t. This file is set up to imitate the same kind of testing structure that Perl itself has. Within the test script, you perform a number of tests to confirm the behavior of the extension, printing "ok" when the test is correct, "not ok" when it is not.
- use Test::More tests => 4;
- BEGIN { use_ok('Mytest') };
- #########################
- # Insert your test code below, the Test::More module is use()ed here so read
- # its man page ( perldoc Test::More ) for help writing this test script.
- is(&Mytest::is_even(0), 1);
- is(&Mytest::is_even(1), 0);
- is(&Mytest::is_even(2), 1);
We will be calling the test script through the command "make test
". You
should see output that looks something like this:
- % make test
- PERL_DL_NONLAZY=1 /usr/bin/perl "-MExtUtils::Command::MM" "-e" "test_harness(0, 'blib/lib', 'blib/arch')" t/*.t
- t/Mytest....ok
- All tests successful.
- Files=1, Tests=4, 0 wallclock secs ( 0.03 cusr + 0.00 csys = 0.03 CPU)
- %
The program h2xs is the starting point for creating extensions. In later examples we'll see how we can use h2xs to read header files and generate templates to connect to C routines.
h2xs creates a number of files in the extension directory. The file Makefile.PL is a perl script which will generate a true Makefile to build the extension. We'll take a closer look at it later.
The .pm and .xs files contain the meat of the extension. The .xs file holds the C routines that make up the extension. The .pm file contains routines that tell Perl how to load your extension.
Generating the Makefile and running make
created a directory called blib
(which stands for "build library") in the current working directory. This
directory will contain the shared library that we will build. Once we have
tested it, we can install it into its final location.
Invoking the test script via "make test
" did something very important.
It invoked perl with all those -I
arguments so that it could find the
various files that are part of the extension. It is very important that
while you are still testing extensions that you use "make test
". If you
try to run the test script all by itself, you will get a fatal error.
Another reason it is important to use "make test
" to run your test
script is that if you are testing an upgrade to an already-existing version,
using "make test
" ensures that you will test your new extension, not the
already-existing version.
When Perl sees a use extension;
, it searches for a file with the same name
as the use'd extension that has a .pm suffix. If that file cannot be found,
Perl dies with a fatal error. The default search path is contained in the
@INC
array.
In our case, Mytest.pm tells perl that it will need the Exporter and Dynamic
Loader extensions. It then sets the @ISA
and @EXPORT
arrays and the
$VERSION
scalar; finally it tells perl to bootstrap the module. Perl
will call its dynamic loader routine (if there is one) and load the shared
library.
The two arrays @ISA
and @EXPORT
are very important. The @ISA
array contains a list of other packages in which to search for methods (or
subroutines) that do not exist in the current package. This is usually
only important for object-oriented extensions (which we will talk about
much later), and so usually doesn't need to be modified.
The @EXPORT
array tells Perl which of the extension's variables and
subroutines should be placed into the calling package's namespace. Because
you don't know if the user has already used your variable and subroutine
names, it's vitally important to carefully select what to export. Do not
export method or variable names by default without a good reason.
As a general rule, if the module is trying to be object-oriented then don't
export anything. If it's just a collection of functions and variables, then
you can export them via another array, called @EXPORT_OK
. This array
does not automatically place its subroutine and variable names into the
namespace unless the user specifically requests that this be done.
See perlmod for more information.
The $VERSION
variable is used to ensure that the .pm file and the shared
library are "in sync" with each other. Any time you make changes to
the .pm or .xs files, you should increment the value of this variable.
The importance of writing good test scripts cannot be over-emphasized. You should closely follow the "ok/not ok" style that Perl itself uses, so that it is very easy and unambiguous to determine the outcome of each test case. When you find and fix a bug, make sure you add a test case for it.
By running "make test
", you ensure that your Mytest.t script runs and uses
the correct version of your extension. If you have many test cases,
save your test files in the "t" directory and use the suffix ".t".
When you run "make test
", all of these test files will be executed.
Our third extension will take one argument as its input, round off that value, and set the argument to the rounded value.
Add the following to the end of Mytest.xs:
- void
- round(arg)
- double arg
- CODE:
- if (arg > 0.0) {
- arg = floor(arg + 0.5);
- } else if (arg < 0.0) {
- arg = ceil(arg - 0.5);
- } else {
- arg = 0.0;
- }
- OUTPUT:
- arg
Edit the Makefile.PL file so that the corresponding line looks like this:
- 'LIBS' => ['-lm'], # e.g., '-lm'
Generate the Makefile and run make. Change the test number in Mytest.t to "9" and add the following tests:
- $i = -1.5; &Mytest::round($i); is( $i, -2.0 );
- $i = -1.1; &Mytest::round($i); is( $i, -1.0 );
- $i = 0.0; &Mytest::round($i); is( $i, 0.0 );
- $i = 0.5; &Mytest::round($i); is( $i, 1.0 );
- $i = 1.2; &Mytest::round($i); is( $i, 1.0 );
Running "make test
" should now print out that all nine tests are okay.
Notice that in these new test cases, the argument passed to round was a scalar variable. You might be wondering if you can round a constant or literal. To see what happens, temporarily add the following line to Mytest.t:
- &Mytest::round(3);
Run "make test
" and notice that Perl dies with a fatal error. Perl won't
let you change the value of constants!
We've made some changes to Makefile.PL. In this case, we've specified an extra library to be linked into the extension's shared library, the math library libm in this case. We'll talk later about how to write XSUBs that can call every routine in a library.
The value of the function is not being passed back as the function's return value, but by changing the value of the variable that was passed into the function. You might have guessed that when you saw that the return value of round is of type "void".
You specify the parameters that will be passed into the XSUB on the line(s) after you declare the function's return value and name. Each input parameter line starts with optional whitespace, and may have an optional terminating semicolon.
The list of output parameters occurs at the very end of the function, just after the OUTPUT: directive. The use of RETVAL tells Perl that you wish to send this value back as the return value of the XSUB function. In Example 3, we wanted the "return value" placed in the original variable which we passed in, so we listed it (and not RETVAL) in the OUTPUT: section.
The xsubpp program takes the XS code in the .xs file and translates it into C code, placing it in a file whose suffix is .c. The C code created makes heavy use of the C functions within Perl.
The xsubpp program uses rules to convert from Perl's data types (scalar,
array, etc.) to C's data types (int, char, etc.). These rules are stored
in the typemap file ($PERLLIB/ExtUtils/typemap). There's a brief discussion
below, but all the nitty-gritty details can be found in perlxstypemap.
If you have a new-enough version of perl (5.16 and up) or an upgraded
XS compiler (ExtUtils::ParseXS
3.13_01 or better), then you can inline
typemaps in your XS instead of writing separate files.
Either way, this typemap thing is split into three parts:
The first section maps various C data types to a name, which corresponds somewhat with the various Perl types. The second section contains C code which xsubpp uses to handle input parameters. The third section contains C code which xsubpp uses to handle output parameters.
Let's take a look at a portion of the .c file created for our extension. The file name is Mytest.c:
- XS(XS_Mytest_round)
- {
- dXSARGS;
- if (items != 1)
- Perl_croak(aTHX_ "Usage: Mytest::round(arg)");
- PERL_UNUSED_VAR(cv); /* -W */
- {
- double arg = (double)SvNV(ST(0)); /* XXXXX */
- if (arg > 0.0) {
- arg = floor(arg + 0.5);
- } else if (arg < 0.0) {
- arg = ceil(arg - 0.5);
- } else {
- arg = 0.0;
- }
- sv_setnv(ST(0), (double)arg); /* XXXXX */
- SvSETMAGIC(ST(0));
- }
- XSRETURN_EMPTY;
- }
Notice the two lines commented with "XXXXX". If you check the first part of the typemap file (or section), you'll see that doubles are of type T_DOUBLE. In the INPUT part of the typemap, an argument of type T_DOUBLE is converted by calling the routine SvNV on it, casting the result to double, and assigning it to the variable arg. Similarly, in the OUTPUT section, once arg has its final value, it is passed to the sv_setnv function to be passed back to the calling subroutine. These two functions are explained in perlguts; we'll talk more later about what that "ST(0)" means in the section on the argument stack.
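For reference, the relevant entries in the standard typemap look roughly like this (paraphrased from memory, not quoted from the source; check $PERLLIB/ExtUtils/typemap on your system for the exact text):

```
double          T_DOUBLE

INPUT
T_DOUBLE
    $var = (double)SvNV($arg)

OUTPUT
T_DOUBLE
    sv_setnv($arg, (double)$var);
```

The three sections line up with the three parts described above: the type-to-name map, the INPUT conversion code, and the OUTPUT conversion code.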
In general, it's not a good idea to write extensions that modify their input parameters, as in Example 3. Instead, you should probably return multiple values in an array and let the caller handle them (we'll do this in a later example). However, in order to better accommodate calling pre-existing C routines, which often do modify their input parameters, this behavior is tolerated.
In this example, we'll now begin to write XSUBs that will interact with pre-defined C libraries. To begin with, we will build a small library of our own, then let h2xs write our .pm and .xs files for us.
Create a new directory called Mytest2 at the same level as the directory Mytest. In the Mytest2 directory, create another directory called mylib, and cd into that directory.
Here we'll create some files that will generate a test library. These will include a C source file and a header file. We'll also create a Makefile.PL in this directory. Then we'll make sure that running make at the Mytest2 level will automatically run this Makefile.PL file and the resulting Makefile.
In the mylib directory, create a file mylib.h that looks like this:
- #define TESTVAL 4
- extern double foo(int, long, const char*);
Also create a file mylib.c that looks like this:
- #include <stdlib.h>
- #include "./mylib.h"
- double
- foo(int a, long b, const char *c)
- {
- return (a + b + atof(c) + TESTVAL);
- }
And finally create a file Makefile.PL that looks like this:
- use ExtUtils::MakeMaker;
- $Verbose = 1;
- WriteMakefile(
- NAME => 'Mytest2::mylib',
- SKIP => [qw(all static static_lib dynamic dynamic_lib)],
- clean => {'FILES' => 'libmylib$(LIB_EXT)'},
- );
- sub MY::top_targets {
- '
- all :: static
- pure_all :: static
- static :: libmylib$(LIB_EXT)
- libmylib$(LIB_EXT): $(O_FILES)
- $(AR) cr libmylib$(LIB_EXT) $(O_FILES)
- $(RANLIB) libmylib$(LIB_EXT)
- ';
- }
Make sure you use a tab and not spaces on the lines beginning with "$(AR)" and "$(RANLIB)". Make will not function properly if you use spaces. It has also been reported that the "cr" argument to $(AR) is unnecessary on Win32 systems.
We will now create the main top-level Mytest2 files. Change to the directory above Mytest2 and run the following command:
- % h2xs -O -n Mytest2 ./Mytest2/mylib/mylib.h
This will print out a warning about overwriting Mytest2, but that's okay. Our files are stored in Mytest2/mylib, and will be untouched.
The normal Makefile.PL that h2xs generates doesn't know about the mylib directory. We need to tell it that there is a subdirectory and that we will be generating a library in it. Let's add the argument MYEXTLIB to the WriteMakefile call so that it looks like this:
- WriteMakefile(
- 'NAME' => 'Mytest2',
- 'VERSION_FROM' => 'Mytest2.pm', # finds $VERSION
- 'LIBS' => [''], # e.g., '-lm'
- 'DEFINE' => '', # e.g., '-DHAVE_SOMETHING'
- 'INC' => '', # e.g., '-I/usr/include/other'
- 'MYEXTLIB' => 'mylib/libmylib$(LIB_EXT)',
- );
and then at the end add a subroutine (which will override the pre-existing subroutine):
- sub MY::postamble {
- '
- $(MYEXTLIB): mylib/Makefile
- cd mylib && $(MAKE) $(PASSTHRU)
- ';
- }
Remember to use a tab character to indent the line beginning with "cd"!
Let's also fix the MANIFEST file so that it accurately reflects the contents of our extension. The single line that says "mylib" should be replaced by the following three lines:
- mylib/Makefile.PL
- mylib/mylib.c
- mylib/mylib.h
To keep our namespace nice and unpolluted, edit the .pm file and change the variable @EXPORT to @EXPORT_OK. Finally, in the .xs file, edit the #include line to read:
- #include "mylib/mylib.h"
And also add the following function definition to the end of the .xs file:
- double
- foo(a,b,c)
- int a
- long b
- const char * c
- OUTPUT:
- RETVAL
Now we also need to create a typemap, because the default Perl typemap doesn't currently support the const char * type. Include a new TYPEMAP section in your XS code before the above function:
- TYPEMAP: <<END;
- const char * T_PV
- END
Now run perl on the top-level Makefile.PL. Notice that it also created a Makefile in the mylib directory. Run make and watch that it does cd into the mylib directory and run make in there as well.
Now edit the Mytest2.t script and change the number of tests to "4", and add the following lines to the end of the script:
- is( &Mytest2::foo(1, 2, "Hello, world!"), 7 );
- is( &Mytest2::foo(1, 2, "0.0"), 7 );
- ok( abs(&Mytest2::foo(0, 0, "-3.4") - 0.6) <= 0.01 );
(When dealing with floating-point comparisons, it is best not to check for equality, but rather to check that the difference between the expected and actual results is below a certain amount, called epsilon; 0.01 in this case.)
Run "make test" and all should be well. There are some warnings about missing tests for the Mytest2::mylib extension, but you can ignore them.
Unlike previous examples, we've now run h2xs on a real include file. This has caused some extra goodies to appear in both the .pm and .xs files.
In the .xs file, there's now a #include directive with the absolute path to the mylib.h header file. We changed this to a relative path so that we could move the extension directory if we wanted to.
There's now some new C code that's been added to the .xs file. The purpose of the constant routine is to make the values that are #define'd in the header file accessible from the Perl script (by calling either TESTVAL or &Mytest2::TESTVAL). There's also some XS code to allow calls to the constant routine.
The .pm file originally exported the name TESTVAL in the @EXPORT array. This could lead to name clashes. A good rule of thumb is that if a #define is only going to be used by the C routines themselves, and not by the user, it should be removed from the @EXPORT array. Alternately, if you don't mind using the "fully qualified name" of a variable, you could move most or all of the items from the @EXPORT array into the @EXPORT_OK array.
If our include file had contained #include directives, these would not have been processed by h2xs. There is no good solution to this right now.
We've also told Perl about the library that we built in the mylib
subdirectory. That required only the addition of the MYEXTLIB
variable
to the WriteMakefile call and the replacement of the postamble subroutine
to cd into the subdirectory and run make. The Makefile.PL for the
library is a bit more complicated, but not excessively so. Again we
replaced the postamble subroutine to insert our own code. This code
simply specified that the library to be created here was a static archive
library (as opposed to a dynamically loadable library) and provided the
commands to build it.
The .xs file of EXAMPLE 4 contained some new elements. To understand the meaning of these elements, pay attention to the line which reads
- MODULE = Mytest2 PACKAGE = Mytest2
Anything before this line is plain C code which describes which headers to include and defines some convenience functions. No translation is performed on this part; apart from having embedded POD documentation skipped over (see perlpod), it goes into the generated C output file as is.
Anything after this line is a description of XSUB functions. These descriptions are translated by xsubpp into C code which implements the functions using Perl calling conventions, and which makes them visible to the Perl interpreter.
Pay special attention to the function constant. This name appears twice in the generated .xs file: once in the first part, as a static C function, and again in the second part, where an XSUB interface to this static C function is defined.
This is quite typical for .xs files: usually the .xs file provides an interface to an existing C function. That C function is defined somewhere (either in an external library or in the first part of the .xs file), and a Perl interface to it (i.e. "Perl glue") is described in the second part of the .xs file. The situation in EXAMPLE 1, EXAMPLE 2, and EXAMPLE 3, where all the work is done inside the "Perl glue", is the exception rather than the rule.
In EXAMPLE 4 the second part of .xs file contained the following description of an XSUB:
- double
- foo(a,b,c)
- int a
- long b
- const char * c
- OUTPUT:
- RETVAL
Note that in contrast with EXAMPLE 1, EXAMPLE 2 and EXAMPLE 3, this description does not contain the actual code for what is done during a call to Perl function foo(). To understand what is going on here, one can add a CODE section to this XSUB:
- double
- foo(a,b,c)
- int a
- long b
- const char * c
- CODE:
- RETVAL = foo(a,b,c);
- OUTPUT:
- RETVAL
However, these two XSUBs generate almost identical C code: the xsubpp compiler is smart enough to figure out the CODE: section from the first two lines of the XSUB description. What about the OUTPUT: section? It is exactly the same! The OUTPUT: section can be removed as well, as long as no CODE: or PPCODE: section is specified: xsubpp can see that it needs to generate a function call section, and will autogenerate the OUTPUT section too. Thus one can shortcut the XSUB to become:
- double
- foo(a,b,c)
- int a
- long b
- const char * c
Can we do the same with an XSUB
- int
- is_even(input)
- int input
- CODE:
- RETVAL = (input % 2 == 0);
- OUTPUT:
- RETVAL
of EXAMPLE 2? To do this, one needs to define a C function int is_even(int input). As we saw in Anatomy of .xs file, a proper place for this definition is in the first part of the .xs file. In fact a C function
- int
- is_even(int arg)
- {
- return (arg % 2 == 0);
- }
is probably overkill for this. Something as simple as a #define will do too:
- #define is_even(arg) ((arg) % 2 == 0)
After having this in the first part of the .xs file, the "Perl glue" part becomes as simple as
- int
- is_even(input)
- int input
This technique of separating the glue part from the workhorse part has obvious tradeoffs: if you want to change the Perl interface, you need to change two places in your code. However, it removes a lot of clutter, and makes the workhorse part independent of the idiosyncrasies of the Perl calling convention. (In fact, there is nothing Perl-specific in the above description; a different version of xsubpp might have translated this to Tcl glue or Python glue as well.)
With the completion of Example 4, we now have an easy way to simulate some real-life libraries whose interfaces may not be the cleanest in the world. We shall now continue with a discussion of the arguments passed to the xsubpp compiler.
When you specify arguments to routines in the .xs file, you are really passing three pieces of information for each argument listed. The first piece is the order of that argument relative to the others (first, second, etc). The second is the type of argument, and consists of the type declaration of the argument (e.g., int, char*, etc). The third piece is the calling convention for the argument in the call to the library function.
While Perl passes arguments to functions by reference, C passes arguments by value; to implement a C function which modifies data of one of the "arguments", the actual argument of this C function would be a pointer to the data. Thus two C functions with declarations
- int string_length(char *s);
- int upper_case_char(char *cp);
may have completely different semantics: the first one may inspect an array of chars pointed to by s, while the second one may immediately dereference cp and manipulate *cp only (using the return value as, say, a success indicator). From Perl, one would use these functions in completely different ways.
One conveys this info to xsubpp by replacing the * before the argument with &. & means that the argument should be passed to the library function by its address. The above two functions may be XSUB-ified as
- int
- string_length(s)
- char * s
- int
- upper_case_char(cp)
- char &cp
For example, consider:
- int
- foo(a,b)
- char &a
- char * b
The first Perl argument to this function would be treated as a char and assigned to the variable a, and its address would be passed into the function foo. The second Perl argument would be treated as a string pointer and assigned to the variable b. The value of b would be passed into the function foo. The actual call to the function foo that xsubpp generates would look like this:
- foo(&a, b);
xsubpp will parse the following function argument lists identically:
- char &a
- char&a
- char & a
However, to help ease understanding, it is suggested that you place the "&" next to the variable name and away from the variable type, and place the "*" near the variable type but away from the variable name (as in the call to foo above). By doing so, it is easy to see exactly what will be passed to the C function: it will be whatever is in the "last column".
You should take great pains to try to pass the function the type of variable it wants, when possible. It will save you a lot of trouble in the long run.
If you look at the C code generated for any of the examples except Example 1, you will notice a number of references to ST(n), where n is usually 0. "ST" is actually a macro that refers to the n'th argument on the argument stack. ST(0) is thus the first argument on the stack and therefore the first argument passed to the XSUB, ST(1) is the second argument, and so on.
When you list the arguments to the XSUB in the .xs file, you are telling xsubpp which argument corresponds to which position on the argument stack (i.e., the first one listed is the first argument, and so on). You invite disaster if you do not list them in the same order as the function expects them.
The actual values on the argument stack are pointers to the values passed in. When an argument is listed as being an OUTPUT value, its corresponding value on the stack (i.e., ST(0) if it was the first argument) is changed. You can verify this by looking at the C code generated for Example 3. The code for the round() XSUB routine contains lines that look like this:
- double arg = (double)SvNV(ST(0));
- /* Round the contents of the variable arg */
- sv_setnv(ST(0), (double)arg);
The arg variable is initially set by taking the value from ST(0), then is stored back into ST(0) at the end of the routine.
XSUBs are also allowed to return lists, not just scalars. This must be done by manipulating stack values ST(0), ST(1), etc, in a subtly different way. See perlxs for details.
XSUBs are also allowed to avoid automatic conversion of Perl function arguments
to C function arguments. See perlxs for details. Some people prefer
manual conversion by inspecting ST(i)
even in the cases when automatic
conversion will do, arguing that this makes the logic of an XSUB call clearer.
Compare with Getting the fat out of XSUBs for a similar tradeoff of
a complete separation of "Perl glue" and "workhorse" parts of an XSUB.
While experts may argue about these idioms, a novice to Perl guts may prefer a way which is as little Perl-guts-specific as possible, meaning automatic conversion and automatic call generation, as in Getting the fat out of XSUBs. This approach has the additional benefit of protecting the XSUB writer from future changes to the Perl API.
Sometimes you might want to provide some extra methods or subroutines to assist in making the interface between Perl and your extension simpler or easier to understand. These routines should live in the .pm file. Whether they are automatically loaded when the extension itself is loaded or only loaded when called depends on where in the .pm file the subroutine definition is placed. You can also consult AutoLoader for an alternate way to store and load your extra subroutines.
There is absolutely no excuse for not documenting your extension. Documentation belongs in the .pm file. This file will be fed to pod2man, and the embedded documentation will be converted to the manpage format, then placed in the blib directory. It will be copied to Perl's manpage directory when the extension is installed.
You may intersperse documentation and Perl code within the .pm file. In fact, if you want to use method autoloading, you must do this, as the comment inside the .pm file explains.
See perlpod for more information about the pod format.
Once your extension is complete and passes all its tests, installing it is quite simple: you simply run "make install". You will either need to have write permission into the directories where Perl is installed, or ask your system administrator to run the make for you.
Alternately, you can specify the exact directory to place the extension's files by placing a "PREFIX=/destination/directory" after the make install. (or in between the make and install if you have a brain-dead version of make). This can be very useful if you are building an extension that will eventually be distributed to multiple systems. You can then just archive the files in the destination directory and distribute them to your destination systems.
In this example, we'll do some more work with the argument stack. The previous examples have all returned only a single value. We'll now create an extension that returns an array.
This extension is very Unix-oriented (struct statfs and the statfs system call). If you are not running on a Unix system, you can substitute for statfs any other function that returns multiple values, you can hard-code the values to be returned to the caller (although this will make the error case a bit harder to test), or you can simply skip this example. If you change the XSUB, be sure to fix the test cases to match the changes.
Return to the Mytest directory and add the following code to the end of Mytest.xs:
- void
- statfs(path)
- char * path
- INIT:
- int i;
- struct statfs buf;
- PPCODE:
- i = statfs(path, &buf);
- if (i == 0) {
- XPUSHs(sv_2mortal(newSVnv(buf.f_bavail)));
- XPUSHs(sv_2mortal(newSVnv(buf.f_bfree)));
- XPUSHs(sv_2mortal(newSVnv(buf.f_blocks)));
- XPUSHs(sv_2mortal(newSVnv(buf.f_bsize)));
- XPUSHs(sv_2mortal(newSVnv(buf.f_ffree)));
- XPUSHs(sv_2mortal(newSVnv(buf.f_files)));
- XPUSHs(sv_2mortal(newSVnv(buf.f_type)));
- } else {
- XPUSHs(sv_2mortal(newSVnv(errno)));
- }
You'll also need to add the following code to the top of the .xs file, just after the include of "XSUB.h":
- #include <sys/vfs.h>
Also add the following code segment to Mytest.t while incrementing the "9" tests to "11":
This example added quite a few new concepts. We'll take them one at a time.
The INIT: directive contains code that will be placed immediately after the argument stack is decoded. C does not allow variable declarations at arbitrary locations inside a function, so this is usually the best way to declare local variables needed by the XSUB. (Alternatively, one could put the whole PPCODE: section into braces and put these declarations at the top.)
This routine also returns a different number of arguments depending on the success or failure of the call to statfs. If there is an error, the error number is returned as a single-element array. If the call is successful, then a 7-element array is returned. Since only one argument is passed into this function, we need room on the stack to hold the 7 values which may be returned.
We do this by using the PPCODE: directive, rather than the CODE: directive. This tells xsubpp that we will manage the return values to be put on the argument stack ourselves.
When we want to place values to be returned to the caller onto the stack, we use the series of macros that begin with "XPUSH". There are five different versions, for placing integers, unsigned integers, doubles, strings, and Perl scalars on the stack. In our example, we placed a Perl scalar onto the stack. (In fact this is the only macro which can be used to return multiple values.)
The XPUSH* macros will automatically extend the return stack to prevent it from being overrun. You push values onto the stack in the order you want them seen by the calling program.
The values pushed onto the return stack of the XSUB are actually mortal SV's. They are made mortal so that once the values are copied by the calling program, the SV's that held the returned values can be deallocated. If they were not mortal, then they would continue to exist after the XSUB routine returned, but would not be accessible. This is a memory leak.
If we were interested in performance rather than code compactness, in the success branch we would not use the XPUSHs macro but the PUSHs macro, and would pre-extend the stack before pushing the return values:
- EXTEND(SP, 7);
The tradeoff is that one needs to calculate the number of return values in advance (though overextending the stack will not typically hurt anything but memory consumption).
Similarly, in the failure branch we could use PUSHs without extending the stack: the Perl function reference comes to an XSUB on the stack, so the stack is always large enough to take one return value.
In this example, we will accept a reference to an array as an input parameter, and return a reference to an array of hashes. This will demonstrate manipulation of complex Perl data types from an XSUB.
This extension is somewhat contrived. It is based on the code in the previous example. It calls the statfs function multiple times, accepting a reference to an array of filenames as input, and returning a reference to an array of hashes containing the data for each of the filesystems.
Return to the Mytest directory and add the following code to the end of Mytest.xs:
- SV *
- multi_statfs(paths)
- SV * paths
- INIT:
- AV * results;
- I32 numpaths = 0;
- int i, n;
- struct statfs buf;
- SvGETMAGIC(paths);
- if ((!SvROK(paths))
- || (SvTYPE(SvRV(paths)) != SVt_PVAV)
- || ((numpaths = av_top_index((AV *)SvRV(paths))) < 0))
- {
- XSRETURN_UNDEF;
- }
- results = (AV *)sv_2mortal((SV *)newAV());
- CODE:
- for (n = 0; n <= numpaths; n++) {
- HV * rh;
- STRLEN l;
- char * fn = SvPV(*av_fetch((AV *)SvRV(paths), n, 0), l);
- i = statfs(fn, &buf);
- if (i != 0) {
- av_push(results, newSVnv(errno));
- continue;
- }
- rh = (HV *)sv_2mortal((SV *)newHV());
- hv_store(rh, "f_bavail", 8, newSVnv(buf.f_bavail), 0);
- hv_store(rh, "f_bfree", 7, newSVnv(buf.f_bfree), 0);
- hv_store(rh, "f_blocks", 8, newSVnv(buf.f_blocks), 0);
- hv_store(rh, "f_bsize", 7, newSVnv(buf.f_bsize), 0);
- hv_store(rh, "f_ffree", 7, newSVnv(buf.f_ffree), 0);
- hv_store(rh, "f_files", 7, newSVnv(buf.f_files), 0);
- hv_store(rh, "f_type", 6, newSVnv(buf.f_type), 0);
- av_push(results, newRV((SV *)rh));
- }
- RETVAL = newRV((SV *)results);
- OUTPUT:
- RETVAL
And add the following code to Mytest.t, while incrementing the "11" tests to "13":
There are a number of new concepts introduced here, described below:
This function does not use a typemap. Instead, we declare it as accepting one SV* (scalar) parameter and returning an SV* value, and we take care of populating these scalars within the code. Because we are only returning one value, we don't need a PPCODE: directive; instead, we use CODE: and OUTPUT: directives.
When dealing with references, it is important to handle them with caution. The INIT: block first calls SvGETMAGIC(paths) in case paths is a tied variable. Then it checks that SvROK returns true, which indicates that paths is a valid reference. (Simply checking SvROK won't trigger FETCH on a tied variable.) It then verifies that the object referenced by paths is an array, using SvRV to dereference paths and SvTYPE to discover its type. As an added test, it checks that the array referenced by paths is non-empty, using the av_top_index function (which returns -1 if the array is empty). The XSRETURN_UNDEF macro is used to abort the XSUB and return the undefined value if any of these three checks fails.
We manipulate several arrays in this XSUB. Note that an array is represented internally by an AV* pointer. The functions and macros for manipulating arrays are similar to the functions in Perl: av_top_index returns the highest index in an AV*, much like $#array; av_fetch fetches a single scalar value from an array, given its index; av_push pushes a scalar value onto the end of the array, automatically extending the array as necessary.
Specifically, we read pathnames one at a time from the input array, and store the results in an output array (results) in the same order. If statfs fails, the element pushed onto the return array is the value of errno after the failure. If statfs succeeds, though, the value pushed onto the return array is a reference to a hash containing some of the information in the statfs structure.
As with the return stack, it would be possible (and a small performance win) to pre-extend the return array before pushing data into it, since we know how many elements we will return:
- av_extend(results, numpaths);
We are performing only one hash operation in this function: storing a new scalar under a key using hv_store. A hash is represented by an HV* pointer. Like arrays, the functions for manipulating hashes from an XSUB mirror the functionality available from Perl. See perlguts and perlapi for details.
To create a reference, we use the newRV function. Note that you can cast an AV* or an HV* to type SV* in this case (and many others). This allows you to take references to arrays, hashes, and scalars with the same function. Conversely, the SvRV function always returns an SV*, which may need to be cast to the appropriate type if it is something other than a scalar (check with SvTYPE).
At this point, xsubpp is doing very little work - the differences between Mytest.xs and Mytest.c are minimal.
XPUSH args AND set RETVAL AND assign return value to array
Setting $!
You would think passing files to an XS is difficult, with all the typeglobs and stuff. Well, it isn't.
Suppose that for some strange reason we need a wrapper around the standard C library function fputs(). This is all we need:
- #define PERLIO_NOT_STDIO 0
- #include "EXTERN.h"
- #include "perl.h"
- #include "XSUB.h"
- #include <stdio.h>
- int
- fputs(s, stream)
- char * s
- FILE * stream
The real work is done in the standard typemap.
But you lose all the fine stuff done by the perlio layers. This calls the stdio function fputs(), which knows nothing about them.
The standard typemap offers three variants of PerlIO *: InputStream (T_IN), InOutStream (T_INOUT) and OutputStream (T_OUT). A bare PerlIO * is considered a T_INOUT. If it matters in your code (see below for why it might), #define or typedef one of the specific names and use that as the argument or result type in your XS file.
The standard typemap does not contain PerlIO * before perl 5.7, but it has the three stream variants. Using a PerlIO * directly is not backwards compatible unless you provide your own typemap.
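Such a private typemap entry could be as small as this (a sketch; T_INOUT matches how the modern standard typemap treats a bare PerlIO *):

```
TYPEMAP
PerlIO *        T_INOUT
```

Shipping this with your distribution keeps the XS code compiling against older perls whose standard typemap lacks the bare PerlIO * entry.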
For streams coming from perl, the main difference is that OutputStream will get the output PerlIO *, which may make a difference on a socket, as in our example...
For streams being handed to perl, a new file handle is created (i.e. a reference to a new glob) and associated with the PerlIO * provided. If the read/write state of the PerlIO * is not correct, then you may get errors or warnings when the file handle is used.
So if you opened the PerlIO * as "w", it should really be an OutputStream; if opened as "r", it should be an InputStream.
Now, suppose you want to use perlio layers in your XS. We'll use the perlio PerlIO_puts() function as an example.
In the C part of the XS file (above the first MODULE line) you have
- #define OutputStream PerlIO *
- or
- typedef PerlIO * OutputStream;
And this is the XS code:
- int
- perlioputs(s, stream)
- char * s
- OutputStream stream
- CODE:
- RETVAL = PerlIO_puts(stream, s);
- OUTPUT:
- RETVAL
We have to use a CODE section because PerlIO_puts() has its arguments reversed compared to fputs(), and we want to keep the argument order the same.
To explore this thoroughly, suppose we want to use the stdio fputs() on a PerlIO *. This means we have to ask the perlio system for a stdio FILE *:
- int
- perliofputs(s, stream)
- char * s
- OutputStream stream
- PREINIT:
- FILE *fp = PerlIO_findFILE(stream);
- CODE:
- if (fp != (FILE*) 0) {
- RETVAL = fputs(s, fp);
- } else {
- RETVAL = -1;
- }
- OUTPUT:
- RETVAL
Note: PerlIO_findFILE() will search the layers for a stdio layer. If it can't find one, it will call PerlIO_exportFILE() to generate a new stdio FILE. Please only call PerlIO_exportFILE() if you want a new FILE. It will generate one on each call and push a new stdio layer. So don't call it repeatedly on the same file. PerlIO_findFILE() will retrieve the stdio layer once it has been generated by PerlIO_exportFILE().
This applies to the perlio system only. For versions before 5.7, PerlIO_exportFILE() is equivalent to PerlIO_findFILE().
As mentioned at the top of this document, if you are having problems with these example extensions, you might see if any of these help you.
In versions of 5.002 prior to the gamma version, the test script in Example 1 will not function properly. You need to change the "use lib" line to read:
- use lib './blib';
In versions of 5.002 prior to version 5.002b1h, the test.pl file was not automatically created by h2xs. This means that you cannot say "make test" to run the test script. You will need to add the following line before the "use extension" statement:
- use lib './blib';
In versions 5.000 and 5.001, instead of using the above line, you will need to use the following line:
- BEGIN { unshift(@INC, "./blib") }
This document assumes that the executable named "perl" is Perl version 5. Some systems may have installed Perl version 5 as "perl5".
For more information, consult perlguts, perlapi, perlxs, perlmod, and perlpod.
Jeff Okamoto <okamoto@corp.hp.com>
Reviewed and assisted by Dean Roehrich, Ilya Zakharevich, Andreas Koenig, and Tim Bunce.
PerlIO material contributed by Lupe Christoph, with some clarification by Nick Ing-Simmons.
Changes for h2xs as of Perl 5.8.x by Renee Baecker
2012-01-20
perlxstypemap - Perl XS C/Perl type mapping
The more you think about interfacing between two languages, the more you'll realize that the majority of programmer effort has to go into converting between the data structures native to each of the languages involved. This trumps other matters such as differing calling conventions because the problem space is so much greater: there are simply more ways to shove data into memory than there are ways to implement a function call.
Perl XS' attempt at a solution to this is the concept of typemaps. At an abstract level, a Perl XS typemap is nothing but a recipe for converting from a certain Perl data structure to a certain C data structure and vice versa. Since there can be C types that are sufficiently similar to warrant converting with the same logic, XS typemaps are represented by a unique identifier, henceforth called an <XS type> in this document. You can then tell the XS compiler that multiple C types are to be mapped with the same XS typemap.
In your XS code, when you define an argument with a C type, or when you use a CODE: section together with an OUTPUT: section and a C return type for your XSUB, it is the typemapping mechanism that makes this easy.
In more practical terms, the typemap is a collection of code fragments which are used by the xsubpp compiler to map C function parameters and values to Perl values. The typemap file may consist of three sections labelled TYPEMAP, INPUT, and OUTPUT. An unlabelled initial section is assumed to be a TYPEMAP section.
The INPUT section tells the compiler how to translate Perl values into variables of certain C types. The OUTPUT section tells the compiler how to translate the values of certain C types into values Perl can understand. The TYPEMAP section tells the compiler which of the INPUT and OUTPUT code fragments should be used to map a given C type to a Perl value. The section labels TYPEMAP, INPUT, and OUTPUT must begin in the first column on a line by themselves, and must be in uppercase.
Each type of section can appear an arbitrary number of times and does not have to appear at all. For example, a typemap may commonly lack INPUT and OUTPUT sections if all it needs to do is associate additional C types with core XS types like T_PTROBJ.
Lines that start with a hash # are considered comments and ignored in the TYPEMAP section, but are considered significant in the INPUT and OUTPUT sections. Blank lines are generally ignored.
Traditionally, typemaps needed to be written to a separate file, conventionally called typemap in a CPAN distribution. With ExtUtils::ParseXS (the XS compiler) version 3.12 or better, which comes with perl 5.16, typemaps can also be embedded directly into XS code using a HERE-doc-like syntax:
- TYPEMAP: <<HERE
- ...
- HERE
where HERE
can be replaced by other identifiers like with normal
Perl HERE-docs. All details below about the typemap textual format
remain valid.
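For instance, a minimal embedded typemap that associates a hypothetical C type (the my_int_t name below is made up for this sketch) with the core T_IV XS type could look like:
- TYPEMAP: <<HERE
- my_int_t T_IV
- HERE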
The TYPEMAP section should contain one pair of C type and XS type per line, as follows. An example from the core typemap file:
- TYPEMAP
- # all variants of char* is handled by the T_PV typemap
- char * T_PV
- const char * T_PV
- unsigned char * T_PV
- ...
The INPUT and OUTPUT sections have identical formats: each unindented line starts a new input or output map respectively. A new input or output map must start with the name of the XS type to map on a line by itself, followed by the code that implements it, indented, on the following lines. Example:
- INPUT
- T_PV
- $var = ($type)SvPV_nolen($arg)
- T_PTR
- $var = INT2PTR($type,SvIV($arg))
We'll get to the meaning of those Perlish-looking variables in a little bit.
Finally, here's an example of the full typemap file for mapping C strings of the char * type to Perl scalars/strings:
- TYPEMAP
- char * T_PV
- INPUT
- T_PV
- $var = ($type)SvPV_nolen($arg)
- OUTPUT
- T_PV
- sv_setpv((SV*)$arg, $var);
Here's a more complicated example: suppose that you wanted struct netconfig to be blessed into the class Net::Config. One way to do this is to use underscores (_) to separate package names, as follows:
- typedef struct netconfig * Net_Config;
And then provide a typemap entry T_PTROBJ_SPECIAL that maps underscores to double-colons (::), and declare Net_Config to be of that type:
- TYPEMAP
- Net_Config T_PTROBJ_SPECIAL
- INPUT
- T_PTROBJ_SPECIAL
- if (sv_derived_from($arg, \"${(my $ntt=$ntype)=~s/_/::/g;\$ntt}\")){
- IV tmp = SvIV((SV*)SvRV($arg));
- $var = INT2PTR($type, tmp);
- }
- else
- croak(\"$var is not of type ${(my $ntt=$ntype)=~s/_/::/g;\$ntt}\")
- OUTPUT
- T_PTROBJ_SPECIAL
- sv_setref_pv($arg, \"${(my $ntt=$ntype)=~s/_/::/g;\$ntt}\",
- (void*)$var);
The INPUT and OUTPUT sections substitute underscores for double-colons on the fly, giving the desired effect. This example demonstrates some of the power and versatility of the typemap facility.
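With such a typemap in place, an XSUB can simply declare its parameters and return values as Net_Config and let the generated glue handle the blessing and unblessing. A sketch, where getnetconfig() stands in for whatever C function you are wrapping:
- Net_Config
- getnetconfig()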
The INT2PTR macro (defined in perl.h) casts an integer to a pointer of a given type, taking care of the possibly different sizes of integers and pointers. There are also PTR2IV, PTR2UV, and PTR2NV macros to map the other way, which may be useful in OUTPUT sections.
The default typemap in the lib/ExtUtils directory of the Perl source
contains many useful types which can be used by Perl extensions. Some
extensions define additional typemaps which they keep in their own directory.
These additional typemaps may reference INPUT and OUTPUT maps in the main
typemap. The xsubpp compiler will allow the extension's own typemap to override any mappings which are in the default typemap. Instead of using an additional typemap file, typemaps may be embedded verbatim in XS with a heredoc-like syntax. See the documentation on the TYPEMAP: XS keyword.
For CPAN distributions, you can assume that the XS types defined by
the perl core are already available. Additionally, the core typemap
has default XS types for a large number of C types. For example, if you simply return a char * from your XSUB, the core typemap will have this C type associated with the T_PV XS type. That means your C string will be copied into the PV (pointer value) slot of a new scalar that will be returned from your XSUB to Perl.
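As a sketch of that default in action, the following XSUB returns a C string with no custom typemap work at all (greeting() is a made-up name for this illustration):
- char *
- greeting()
- CODE:
- RETVAL = "Hello from XS";
- OUTPUT:
- RETVAL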
If you're developing a CPAN distribution using XS, you may add your own file called typemap to the distribution. That file may contain typemaps that either map types that are specific to your code or that override the core typemap file's mappings for common C types.
Starting with ExtUtils::ParseXS version 3.13_01 (which comes with perl 5.16 and better), it is rather easy to share typemap code between multiple CPAN distributions. The general idea is to share it as a module that offers a certain API, and have the dependent modules declare it as a build-time requirement and import the typemap into their XS. An example of such a typemap-sharing module on CPAN is ExtUtils::Typemaps::Basic. Two steps to getting that module's typemaps available in your code:
Declare ExtUtils::Typemaps::Basic as a build-time dependency in Makefile.PL (use BUILD_REQUIRES), or in your Build.PL (use build_requires).
Include the following line in the XS section of your XS file (don't break the line):
- INCLUDE_COMMAND: $^X -MExtUtils::Typemaps::Cmd
- -e "print embeddable_typemap(q{Basic})"
Each INPUT or OUTPUT typemap entry is a double-quoted Perl string that will be evaluated in the presence of certain variables to get the final C code for mapping a certain C type.
This means that you can embed Perl code in your typemap (C) code using constructs such as ${ perl code that evaluates to scalar reference here }. A common use case is to generate error messages that refer to the true function name even when using the ALIAS XS feature:
- ${ $ALIAS ? \q[GvNAME(CvGV(cv))] : \qq[\"$pname\"] }
For many typemap examples, refer to the core typemap file that can be found in the perl source tree at lib/ExtUtils/typemap.
The Perl variables that are available for interpolation into typemaps are the following:
$var - the name of the input or output variable, e.g. RETVAL for return values.
$type - the raw C type of the parameter, with any : replaced with _.
$ntype - the supplied type with * replaced with Ptr, e.g. for a type of Foo::Bar, $ntype is Foo::Bar.
$arg - the stack entry that the parameter is input from or output to, e.g. ST(0).
$argoff - the argument stack offset of the argument, i.e. 0 for the first argument, etc.
$pname - the full name of the XSUB, including the PACKAGE name, with any PREFIX stripped. This is the non-ALIAS name.
$Package - the package specified by the most recent PACKAGE keyword.
$ALIAS - non-zero if the current XSUB has any aliases declared with ALIAS.
Each C type is represented by an entry in the typemap file that is responsible for converting perl variables (SV, AV, HV, CV, etc.) to and from that type. The following sections list all XS types that come with perl by default.
This simply passes the C representation of the Perl variable (an SV*) in and out of the XS layer. This can be used if the C code wants to deal directly with the Perl variable.
Used to pass in and return a reference to an SV.
Note that this typemap does not decrement the reference count when returning the reference to an SV*. See also: T_SVREF_REFCOUNT_FIXED
Used to pass in and return a reference to an SV. This is a fixed variant of T_SVREF that decrements the refcount appropriately when returning a reference to an SV*. Introduced in perl 5.15.4.
From the perl level this is a reference to a perl array. From the C level this is a pointer to an AV.
Note that this typemap does not decrement the reference count when returning an AV*. See also: T_AVREF_REFCOUNT_FIXED
From the perl level this is a reference to a perl array. From the C level this is a pointer to an AV. This is a fixed variant of T_AVREF that decrements the refcount appropriately when returning an AV*. Introduced in perl 5.15.4.
From the perl level this is a reference to a perl hash. From the C level this is a pointer to an HV.
Note that this typemap does not decrement the reference count when returning an HV*. See also: T_HVREF_REFCOUNT_FIXED
From the perl level this is a reference to a perl hash. From the C level this is a pointer to an HV. This is a fixed variant of T_HVREF that decrements the refcount appropriately when returning an HV*. Introduced in perl 5.15.4.
From the perl level this is a reference to a perl subroutine (e.g. $sub = sub { 1 };). From the C level this is a pointer to a CV.
Note that this typemap does not decrement the reference count when returning a CV*. See also: T_CVREF_REFCOUNT_FIXED
From the perl level this is a reference to a perl subroutine (e.g. $sub = sub { 1 };). From the C level this is a pointer to a CV.
This is a fixed variant of T_CVREF that decrements the refcount appropriately when returning a CV*. Introduced in perl 5.15.4.
The T_SYSRET typemap is used to process return values from system calls. It is only meaningful when passing values from C to perl (there is no concept of passing a system return value from Perl to C).
System calls return -1 on error (setting ERRNO with the reason)
and (usually) 0 on success. If the return value is -1 this typemap
returns undef. If the return value is not -1, this typemap
translates a 0 (perl false) to "0 but true" (which
is perl true) or returns the value itself, to indicate that the
command succeeded.
The POSIX module makes extensive use of this type.
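A sketch of such a wrapper: the core typemap maps the SysRet C type to T_SYSRET, so a thin close(2) wrapper (my_close is a hypothetical name) gets the undef / "0 but true" behaviour for free:
- SysRet
- my_close(fd)
- int fd
- CODE:
- RETVAL = close(fd);
- OUTPUT:
- RETVAL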
An unsigned integer.
A signed integer. This is cast to the required integer type when passed to C and converted to an IV when passed back to Perl.
A signed integer. This typemap converts the Perl value to a native
integer type (the int type on the current platform). When returning
the value to perl it is processed in the same way as for T_IV.
Its behaviour is identical to using an int type in XS with T_IV.
An enum value. Used to transfer an enum component from C. There is no reason to pass an enum value to C since it is stored as an IV inside perl.
A boolean type. This can be used to pass true and false values to and from C.
This is for unsigned integers. It is equivalent to using T_UV but explicitly casts the variable to type unsigned int. The default typemap for unsigned int is T_UV.
Short integers. This is equivalent to T_IV but explicitly casts the return to type short. The default typemap for short is T_IV.
Unsigned short integers. This is equivalent to T_UV but explicitly casts the return to type unsigned short. The default typemap for unsigned short is T_UV. T_U_SHORT is used for type U16 in the standard typemap.
Long integers. This is equivalent to T_IV but explicitly casts the return to type long. The default typemap for long is T_IV.
Unsigned long integers. This is equivalent to T_UV but explicitly casts the return to type unsigned long. The default typemap for unsigned long is T_UV. T_U_LONG is used for type U32 in the standard typemap.
Single 8-bit characters.
An unsigned byte.
A floating point number. This typemap guarantees to return a variable cast to a float.
A Perl floating point number. Similar to T_IV and T_UV in that the return type is cast to the requested numeric type rather than to a specific type.
A double precision floating point number. This typemap guarantees to return a variable cast to a double.
A string (char *).
A memory address (pointer). Typically associated with a void * type.
Similar to T_PTR except that the pointer is stored in a scalar and the reference to that scalar is returned to the caller. This can be used to hide the actual pointer value from the programmer since it is usually not required directly from within perl.
The typemap checks that a scalar reference is passed from perl to XS.
Similar to T_PTRREF except that the reference is blessed into a class. This allows the pointer to be used as an object. Most commonly used to deal with C structs. The typemap checks that the perl object passed into the XS routine is of the correct class (or part of a subclass).
The pointer is blessed into a class that is derived from the name of type of the pointer but with all '*' in the name replaced with 'Ptr'.
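A typical T_PTROBJ setup, sketched here with a hypothetical my_handle struct: a typedef plus a TYPEMAP entry are enough, and the pointers then arrive in Perl blessed into the class my_handle_tPtr:
- typedef struct my_handle my_handle_t;
- 
- TYPEMAP
- my_handle_t * T_PTROBJ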
NOT YET
Similar to T_PTROBJ in that the pointer is blessed into a scalar object. The difference is that when the object is passed back into XS it must be of the correct type (inheritance is not supported).
The pointer is blessed into a class that is derived from the name of type of the pointer but with all '*' in the name replaced with 'Ptr'.
NOT YET
Similar to T_PTRREF, except the pointer stored in the referenced scalar is dereferenced and copied to the output variable. This means that T_REFREF is to T_PTRREF as T_OPAQUE is to T_OPAQUEPTR. All clear?
Only the INPUT part of this is implemented (Perl to XSUB) and there are no known users in core or on CPAN.
NOT YET
This can be used to store bytes in the string component of the SV. Here the representation of the data is irrelevant to perl and the bytes themselves are just stored in the SV. It is assumed that the C variable is a pointer (the bytes are copied from that memory location). If the pointer is pointing to something that is represented by 8 bytes then those 8 bytes are stored in the SV (and length() will report a value of 8). This entry is similar to T_OPAQUE.
In principle the unpack() command can be used to convert the bytes back to a number (if the underlying type is known to be a number).
This entry can be used to store a C structure (the number of bytes to be copied is calculated using the C sizeof function) and can be used as an alternative to T_PTRREF without having to worry about a memory leak (since Perl will clean up the SV).
This can be used to store data from non-pointer types in the string part of an SV. It is similar to T_OPAQUEPTR except that the typemap retrieves the pointer directly rather than assuming it is being supplied. For example, if an integer is imported into Perl using T_OPAQUE rather than T_IV the underlying bytes representing the integer will be stored in the SV but the actual integer value will not be available. i.e. The data is opaque to perl.
The data may be retrieved using the unpack function if the
underlying type of the byte stream is known.
T_OPAQUE supports input and output of simple types. T_OPAQUEPTR can be used to pass these bytes back into C if a pointer is acceptable.
xsubpp supports a special syntax for returning packed C arrays to perl. If the XS return type is given as
- array(type, nelem)
xsubpp will copy the contents of nelem * sizeof(type) bytes from RETVAL to an SV and push it onto the stack. This is only really useful if the number of items to be returned is known at compile time and you don't mind having a string of bytes in your SV. Use T_ARRAY to push a variable number of arguments onto the return stack (they won't be packed as a single string though).
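For example, assuming a fixed triple of ints to return (get_rgb() and its static buffer are made up for this sketch), the packed-array return type looks like:
- array(int, 3)
- get_rgb()
- CODE:
- static int rgb[3] = { 255, 128, 0 };
- RETVAL = rgb;
- OUTPUT:
- RETVAL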
This is similar to using T_OPAQUEPTR but can be used to process more than one element.
Calls user-supplied functions for conversion. For OUTPUT (XSUB to Perl), a function named XS_pack_$ntype is called with the output Perl scalar and the C variable to convert from. $ntype is the normalized C type that is to be mapped to Perl. Normalized means that all * are replaced by the string Ptr. The return value of the function is ignored.
Conversely, for INPUT (Perl to XSUB) mapping, the function named XS_unpack_$ntype is called with the input Perl scalar as argument, and the return value is cast to the mapped C type and assigned to the output C variable.
An example conversion function for a typemapped struct foo_t * might be:
- static void
- XS_pack_foo_tPtr(SV *out, foo_t *in)
- {
- dTHX; /* alas, signature does not include pTHX_ */
- HV* hash = newHV();
- hv_stores(hash, "int_member", newSViv(in->int_member));
- hv_stores(hash, "float_member", newSVnv(in->float_member));
- /* ... */
- /* mortalize as thy stack is not refcounted */
- sv_setsv(out, sv_2mortal(newRV_noinc((SV*)hash)));
- }
The conversion from Perl to C is left as an exercise to the reader, but the prototype would be:
- static foo_t *
- XS_unpack_foo_tPtr(SV *in);
Instead of an actual C function that has to fetch the thread context using dTHX, you can define macros of the same name and avoid the overhead. Also, keep in mind that you may need to free the memory allocated by XS_unpack_foo_tPtr.
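Such macros could be sketched as follows, delegating to hypothetical helpers (my_pack_foo and my_unpack_foo) that already accept a thread context:
- #define XS_pack_foo_tPtr(out, in) my_pack_foo(aTHX_ (out), (in))
- #define XS_unpack_foo_tPtr(in) my_unpack_foo(aTHX_ (in))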
T_PACKEDARRAY is similar to T_PACKED. In fact, the INPUT (Perl to XSUB) typemap is identical, but the OUTPUT typemap passes an additional argument to the XS_pack_$ntype function. This third parameter indicates the number of elements in the output so that the function can handle C arrays sanely. The variable needs to be declared by the user and must have the name count_$ntype, where $ntype is the normalized C type name as explained above. For the example above and foo_t **, the signature of the function would be:
- static void
- XS_pack_foo_tPtrPtr(SV *out, foo_t *in, UV count_foo_tPtrPtr);
The type of the third parameter is arbitrary as far as the typemap is concerned. It just has to be in line with the declared variable.
Of course, unless you know the number of elements in the sometype ** C array within your XSUB, the return value from foo_t ** XS_unpack_foo_tPtrPtr(...) will be hard to decipher. Since the details are all up to the XS author (the typemap user), there are several solutions, none of which is particularly elegant. The most commonly seen solution has been to allocate memory for N+1 pointers and assign NULL to the (N+1)th to facilitate iteration.
Alternatively, using a customized typemap for your purposes in the first place is probably preferable.
NOT YET
NOT YET
This is used to convert the perl argument list to a C array and for pushing the contents of a C array onto the perl argument stack.
The usual calling signature is
- @out = array_func( @in );
Any number of arguments can occur in the list before the array but the input and output arrays must be the last elements in the list.
When used to pass a perl list to C, the XS writer must provide a function (named after the array type but with 'Ptr' substituted for '*') to allocate the memory required to hold the list. A pointer should be returned. It is up to the XS writer to free the memory on exit from the function. The variable ix_$var is set to the number of elements in the new array.
When returning a C array to Perl, the XS writer must provide an integer variable called size_$var containing the number of elements in the array. This is used to determine how many elements should be pushed onto the return argument stack. This is not required on input since Perl knows how many arguments are on the stack when the routine is called. Ordinarily this variable would be called size_RETVAL.
Additionally, the type of each element is determined from the type of the array. If the array uses type intArray *, xsubpp will automatically work out that it contains variables of type int and use that typemap entry to perform the copy of each element. All pointer '*' and 'Array' tags are removed from the name to determine the subtype.
This is used for passing perl filehandles to and from C using FILE * structures.
This is used for passing perl filehandles to and from C using PerlIO * structures. The file handle can be used for reading and writing. This corresponds to the +< mode; see also T_IN and T_OUT.
See perliol for more information on the Perl IO abstraction layer. Perl must have been built with -Duseperlio.
There is no check to assert that the filehandle passed from Perl
to C was created with the right open() mode.
Hint: The perlxstut tutorial covers the T_INOUT, T_IN, and T_OUT XS types nicely.
Same as T_INOUT, but the filehandle that is returned from C to Perl can only be used for reading (mode <).
Same as T_INOUT, but the filehandle that is returned from C to Perl is set to use the open mode +>.
piconv -- iconv(1), reinvented in perl
- piconv [-f from_encoding] [-t to_encoding] [-s string] [files...]
- piconv -l
- piconv [-C N|-c|-p]
- piconv -S scheme ...
- piconv -r encoding
- piconv -D ...
- piconv -h
piconv is the perl version of iconv, a character encoding converter widely available for various Unixen today. This script was primarily a technology demonstrator for Perl 5.8.0, but you can use piconv in place of iconv for virtually any case.
piconv converts the character encoding of either STDIN or files specified in the argument and prints out to STDOUT.
Here is the list of options. Each option can be in short format (-f) or long (--from).
Specifies the encoding you are converting from. Unlike iconv, this option can be omitted. In such cases, the current locale is used.
Specifies the encoding you are converting to. Unlike iconv, this option can be omitted. In such cases, the current locale is used.
Therefore, when both -f and -t are omitted, piconv just acts like cat.
uses string instead of file for the source of text.
Lists all available encodings, one per line, in case-insensitive order. Note that only the canonical names are listed; many aliases exist. For example, the names are case-insensitive, and many standard and common aliases work, such as "latin1" for "ISO-8859-1", or "ibm850" instead of "cp850", or "winlatin1" for "cp1252". See Encode::Supported for a full discussion.
Check the validity of the stream if N = 1. When N = -1, something interesting happens when it encounters an invalid character.
Same as -C 1.
Applies PERLQQ, HTMLCREF, XMLCREF, respectively. Try
- piconv -f utf8 -t ascii --perlqq
to see what it does.
Show usage.
Invokes debugging mode. Primarily for Encode hackers.
Selects which scheme is to be used for conversion. Available schemes are as follows:
Uses Encode::from_to for conversion. This is the default.
Input strings are decode()d then encode()d. A straight two-step implementation.
The new perlIO layer is used. NI-S' favorite.
You should use this option if you are using UTF-16 or other encodings in which the linefeed is not $/.
Like the -D option, this is also for Encode hackers.
iconv(1) locale(3) Encode Encode::Supported Encode::Alias PerlIO
pod2html - convert .pod files to .html files
- pod2html --help --htmlroot=<name> --infile=<name> --outfile=<name>
- --podpath=<name>:...:<name> --podroot=<name>
- --recurse --norecurse --verbose
- --index --noindex --title=<name>
Converts files from pod format (see perlpod) to HTML format.
pod2html takes the following arguments:
- --help
Displays the usage message.
- --htmlroot=name
Sets the base URL for the HTML files. When cross-references are made, the HTML root is prepended to the URL.
- --infile=name
Specify the pod file to convert. Input is taken from STDIN if no infile is specified.
- --outfile=name
Specify the HTML file to create. Output goes to STDOUT if no outfile is specified.
- --podroot=name
Specify the base directory for finding library pods.
- --podpath=name:...:name
Specify which subdirectories of the podroot contain pod files whose HTML converted forms can be linked-to in cross-references.
- --index
Generate an index at the top of the HTML file (default behaviour).
- --noindex
Do not generate an index at the top of the HTML file.
- --recurse
Recurse into subdirectories specified in podpath (default behaviour).
- --norecurse
Do not recurse into subdirectories specified in podpath.
- --title=title
Specify the title of the resulting HTML file.
- --verbose
Display progress messages.
Tom Christiansen, <tchrist@perl.com>.
See Pod::Html for a list of known bugs in the translator.
This program is distributed under the Artistic License.
pod2latex - convert pod documentation to latex format
- pod2latex *.pm
- pod2latex -out mytex.tex *.pod
- pod2latex -full -sections 'DESCRIPTION|NAME' SomeDir
- pod2latex -prefile h.tex -postfile t.tex my.pod
pod2latex is a program to convert POD format documentation (perlpod) into latex. It can process multiple input documents at a time and either generate a latex file per input document or a single combined output file.
This section describes the supported command line options. Minimum matching is supported.
Name of the output file to be used. If there are multiple input pods
it is assumed that the intention is to write all translated output
into a single file. .tex is appended if not present. If the
argument is not supplied, a single document will be created for each
input file.
Creates a complete latex file that can be processed immediately (unless =for/=begin directives are used that rely on extra packages). Table of contents and index generation commands are included in the wrapper latex code.
Specify pod sections to include (or remove if negated) in the translation. See SECTION SPECIFICATIONS in Pod::Select for the format to use for section-spec. This option may be given multiple times on the command line. This is identical to the similar option in the podselect() command.
This option causes the output latex to be slightly modified from the input pod such that when a =head1 NAME is encountered, a section is created containing the actual pod name (rather than NAME) and all subsequent =head1 directives are treated as subsections. This has the advantage that the description of a module will be in its own section, which is helpful for including module descriptions in documentation. Also forces latex label and index entries to be prefixed by the name of the module.
Specifies the latex section that is equivalent to an H1 pod directive. This is an integer between 0 and 5, with 0 equivalent to a latex chapter, 1 equivalent to a latex section, etc. The default is 1 (H1 equivalent to a latex section).
Print a brief help message and exit.
Print the manual page and exit.
Print information messages as each document is processed.
A user-supplied preamble for the LaTeX code. Multiple values are supported and appended in order separated by "\n". See -prefile for reading the preamble from a file.
A user-supplied postamble for the LaTeX code. Multiple values are supported and appended in order separated by "\n". See -postfile for reading the postamble from a file.
A user-supplied preamble for the LaTeX code to be read from the named file. Multiple values are supported and appended in order. See -preamble.
A user-supplied postamble for the LaTeX code to be read from the named file. Multiple values are supported and appended in order. See -postamble.
Known bugs are:
Cross references between documents are not resolved when multiple pod documents are converted into a single output latex file.
Functions and variables are not automatically recognized and they will therefore not be marked up in any special way unless instructed by an explicit pod command.
Tim Jenness <tjenness@cpan.org>
This program is free software; you can redistribute it and/or modify it under the same terms as Perl itself.
Copyright (C) 2000, 2003, 2004 Tim Jenness. All Rights Reserved.
pod2man - Convert POD data to formatted *roff input
pod2man [--center=string] [--date=string] [--errors=style] [--fixed=font] [--fixedbold=font] [--fixeditalic=font] [--fixedbolditalic=font] [--name=name] [--nourls] [--official] [--quotes=quotes] [--release[=version]] [--section=manext] [--stderr] [--utf8] [--verbose] [input [output] ...]
pod2man --help
pod2man is a front-end for Pod::Man, using it to generate *roff input from POD source. The resulting *roff code is suitable for display on a terminal using nroff(1), normally via man(1), or printing using troff(1).
input is the file to read for POD source (the POD can be embedded in code). If input isn't given, it defaults to STDIN. output, if given, is the file to which to write the formatted output. If output isn't given, the formatted output is written to STDOUT. Several POD files can be processed in the same pod2man invocation (saving module load and compile times) by providing multiple pairs of input and output files on the command line.
--section, --release, --center, --date, and --official can be used to set the headers and footers to use; if not given, Pod::Man will assume various defaults. See below or Pod::Man for details.
pod2man assumes that your *roff formatters have a fixed-width font named CW. If yours is called something else (like CR), use --fixed to specify it. This generally only matters for troff output for printing. Similarly, you can set the fonts used for bold, italic, and bold italic fixed-width output.
Besides the obvious pod conversions, Pod::Man, and therefore pod2man, also takes care of formatting func(), func(n), and simple variable references like $foo or @bar so you don't have to use code escapes for them; complex expressions like $fred{'stuff'} will still need to be escaped, though. It also translates dashes that aren't used as hyphens into en dashes, makes long dashes--like this--into proper em dashes, fixes "paired quotes," and takes care of several other troff-specific tweaks. See Pod::Man for complete information.
Sets the centered page header to string. The default is "User Contributed Perl Documentation", but also see --official below.
Set the left-hand footer string to this value. By default, the modification date of the input file will be used, or the current date if input comes from STDIN.
Set the error handling style. die says to throw an exception on any POD formatting error. stderr says to report errors on standard error, but not to throw an exception. pod says to include a POD ERRORS section in the resulting documentation summarizing the errors. none ignores POD errors entirely, as much as possible.
The default is die.
The fixed-width font to use for verbatim text and code. Defaults to CW. Some systems may want CR instead. Only matters for troff(1) output.
Bold version of the fixed-width font. Defaults to CB. Only matters for troff(1) output.
Italic version of the fixed-width font (actually, something of a misnomer, since most fixed-width fonts only have an oblique version, not an italic version). Defaults to CI. Only matters for troff(1) output.
Bold italic (probably actually oblique) version of the fixed-width font. Pod::Man doesn't assume you have this, and defaults to CB. Some systems (such as Solaris) have this font available as CX. Only matters for troff(1) output.
Print out usage information.
No longer used. pod2man used to check its input for validity as a manual page, but this should now be done by podchecker(1) instead. Accepted for backward compatibility; this option no longer does anything.
Set the name of the manual page to name. Without this option, the manual name is set to the uppercased base name of the file being converted unless the manual section is 3, in which case the path is parsed to see if it is a Perl module path. If it is, a path like .../lib/Pod/Man.pm is converted into a name like Pod::Man. This option, if given, overrides any automatic determination of the name.
Note that this option is probably not useful when converting multiple POD files at once. The convention for Unix man pages for commands is for the man page title to be in all-uppercase even if the command isn't.
Normally, L<> formatting codes with a URL but anchor text are formatted to show both the anchor text and the URL. In other words:
- L<foo|http://example.com/>
is formatted as:
- foo <http://example.com/>
This flag, if given, suppresses the URL when anchor text is given, so this example would be formatted as just foo. This can produce less cluttered output in cases where the URLs are not particularly important.
Set the default header to indicate that this page is part of the standard Perl release, if --center is not also given.
Sets the quote marks used to surround C<> text to quotes. If quotes is a single character, it is used as both the left and right quote; if quotes is two characters, the first character is used as the left quote and the second as the right quote; and if quotes is four characters, the first two are used as the left quote and the second two as the right quote.
quotes may also be set to the special value none, in which case no quote marks are added around C<> text (but the font is still changed for troff output).
Set the centered footer. By default, this is the version of Perl you run pod2man under. Note that some system an macro sets assume that the centered footer will be a modification date and will prepend something like "Last modified: "; if this is the case, you may want to set --release to the last modified date and --date to the version number.
Set the section for the .TH macro. The standard section numbering
convention is to use 1 for user commands, 2 for system calls, 3 for
functions, 4 for devices, 5 for file formats, 6 for games, 7 for
miscellaneous information, and 8 for administrator commands. There is a lot
of variation here, however; some systems (like Solaris) use 4 for file
formats, 5 for miscellaneous information, and 7 for devices. Still others
use 1m instead of 8, or some mix of both. About the only section numbers
that are reliably consistent are 1, 2, and 3.
By default, section 1 will be used unless the file ends in .pm, in
which case section 3 will be selected.
By default, pod2man dies if any errors are detected in the POD input. If --stderr is given and no --errors flag is present, errors are sent to standard error, but pod2man does not abort. This is equivalent to --errors=stderr and is supported for backward compatibility.
By default, pod2man produces the most conservative possible *roff
output to try to ensure that it will work with as many different *roff
implementations as possible. Many *roff implementations cannot handle
non-ASCII characters, so this means all non-ASCII characters are converted
either to a *roff escape sequence that tries to create a properly accented
character (at least for troff output) or to X.
This option says to instead output literal UTF-8 characters. If your *roff implementation can handle it, this is the best output format to use and avoids corruption of documents containing non-ASCII characters. However, be warned that *roff source with literal UTF-8 characters is not supported by many implementations and may even result in segfaults and other bad behavior.
Be aware that, when using this option, the input encoding of your POD source must be properly declared unless it is US-ASCII or Latin-1. POD input without an =encoding command will be assumed to be in Latin-1, and if it's actually in UTF-8, the output will be double-encoded. See perlpod(1) for more information on the =encoding command.
Print out the name of each output file as it is being generated.
As long as all documents processed result in some output, even if that output includes errata (a POD ERRORS section generated with --errors=pod), pod2man will exit with status 0. If any of the documents being processed do not result in an output document, pod2man will exit with status 1. If there are syntax errors in a POD document being processed and the error handling style is set to the default of die, pod2man will abort immediately with exit status 255.
If pod2man fails with errors, see Pod::Man and Pod::Simple for information about what those errors might mean.
- pod2man program > program.1
- pod2man SomeModule.pm /usr/perl/man/man3/SomeModule.3
- pod2man --section=7 note.pod > note.7
If you would like to print out a lot of man pages continuously, you probably want to set the C and D registers to set contiguous page numbering and even/odd paging, at least on some versions of man(7).
- troff -man -rC1 -rD1 perl.1 perldata.1 perlsyn.1 ...
To get index entries on STDERR, turn on the F register, as in:
- troff -man -rF1 perl.1
The indexing merely outputs messages via .tm for each major page, section, subsection, item, and any X<> directives. See Pod::Man for more details.
Lots of this documentation is duplicated from Pod::Man.
Pod::Man, Pod::Simple, man(1), nroff(1), perlpod(1), podchecker(1), perlpodstyle(1), troff(1), man(7)
The man page documenting the an macro set may be man(5) instead of man(7) on your system.
The current version of this script is always available from its web site at http://www.eyrie.org/~eagle/software/podlators/. It is also part of the Perl core distribution as of 5.6.0.
Russ Allbery <rra@stanford.edu>, based very heavily on the original pod2man by Larry Wall and Tom Christiansen.
Copyright 1999, 2000, 2001, 2004, 2006, 2008, 2010, 2012, 2013 Russ Allbery <rra@stanford.edu>.
This program is free software; you may redistribute it and/or modify it under the same terms as Perl itself.
pod2text - Convert POD data to formatted ASCII text
pod2text [-aclostu] [--code] [--errors=style] [-i indent] [-q quotes] [--nourls] [--stderr] [-w width] [input [output ...]]
pod2text -h
pod2text is a front-end for Pod::Text and its subclasses. It uses them to generate formatted ASCII text from POD source. It can optionally use either termcap sequences or ANSI color escape sequences to format the text.
input is the file to read for POD source (the POD can be embedded in code). If input isn't given, it defaults to STDIN. output, if given, is the file to which to write the formatted output. If output isn't given, the formatted output is written to STDOUT. Several POD files can be processed in the same pod2text invocation (saving module load and compile times) by providing multiple pairs of input and output files on the command line.
Use an alternate output format that, among other things, uses a different heading style and marks =item entries with a colon in the left margin.
Include any non-POD text from the input file in the output as well. Useful for viewing code documented with POD blocks with the POD rendered and the code left intact.
Format the output with ANSI color escape sequences. Using this option requires that Term::ANSIColor be installed on your system.
Set the number of spaces to indent regular text, and the default indentation for =over blocks. Defaults to 4 spaces if this option isn't given.
Set the error handling style. die says to throw an exception on any POD formatting error. stderr says to report errors on standard error, but not to throw an exception. pod says to include a POD ERRORS section in the resulting documentation summarizing the errors. none ignores POD errors entirely, as much as possible. The default is die.
Print out usage information and exit.
Print a blank line after a =head1 heading. Normally, no blank line is printed after =head1, although one is still printed after =head2, because this is the expected formatting for manual pages; if you're formatting arbitrary text documents, using this option is recommended.
The width of the left margin in spaces. Defaults to 0. This is the margin for all text, including headings, not the amount by which regular text is indented; for the latter, see the -i option.
Normally, L<> formatting codes with both a URL and anchor text are formatted to show both the anchor text and the URL. In other words:
- L<foo|http://example.com/>
is formatted as:
- foo <http://example.com/>
This flag, if given, suppresses the URL when anchor text is given, so this example would be formatted as just foo. This can produce less cluttered output in cases where the URLs are not particularly important.
Format the output with overstrike printing. Bold text is rendered as character, backspace, character. Italics and file names are rendered as underscore, backspace, character. Many pagers, such as less, know how to convert this to bold or underlined text.
Sets the quote marks used to surround C<> text to quotes. If quotes is a single character, it is used as both the left and right quote; if quotes is two characters, the first character is used as the left quote and the second as the right quote; and if quotes is four characters, the first two are used as the left quote and the second two as the right quote.
quotes may also be set to the special value none, in which case no quote marks are added around C<> text.
Assume each sentence ends with two spaces and try to preserve that spacing. Without this option, all consecutive whitespace in non-verbatim paragraphs is compressed into a single space.
By default, pod2text dies if any errors are detected in the POD input. If --stderr is given and no --errors flag is present, errors are sent to standard error, but pod2text does not abort. This is equivalent to --errors=stderr and is supported for backward compatibility.
Try to determine the width of the screen and the bold and underline sequences for the terminal from termcap, and use that information in formatting the output. Output will be wrapped at two columns less than the width of your terminal device. Using this option requires that your system have a termcap file somewhere where Term::Cap can find it and requires that your system support termios. With this option, the output of pod2text will contain terminal control sequences for your current terminal type.
By default, pod2text tries to use the same output encoding as its input encoding (to be backward-compatible with older versions). This option says to instead force the output encoding to UTF-8.
Be aware that, when using this option, the input encoding of your POD source must be properly declared unless it is US-ASCII or Latin-1. POD input without an =encoding command will be assumed to be in Latin-1, and if it's actually in UTF-8, the output will be double-encoded. See perlpod(1) for more information on the =encoding command.
The column at which to wrap text on the right-hand side. Defaults to 76, unless -t is given, in which case it's two columns less than the width of your terminal device.
As long as all documents processed result in some output, even if that output includes errata (a POD ERRORS section generated with --errors=pod), pod2text will exit with status 0. If any of the documents being processed do not result in an output document, pod2text will exit with status 1. If there are syntax errors in a POD document being processed and the error handling style is set to the default of die, pod2text will abort immediately with exit status 255.
If pod2text fails with errors, see Pod::Text and Pod::Simple for information about what those errors might mean. Internally, it can also produce the following diagnostics:
(F) -c or --color were given, but Term::ANSIColor could not be loaded.
(F) An unknown command line option was given.
In addition, other Getopt::Long error messages may result from invalid command-line options.
If -t is given, pod2text will take the current width of your screen from this environment variable, if available. It overrides terminal width information in TERMCAP.
If -t is given, pod2text will use the contents of this environment variable if available to determine the correct formatting sequences for your current terminal device.
Pod::Text, Pod::Text::Color, Pod::Text::Overstrike, Pod::Text::Termcap, Pod::Simple, perlpod(1)
The current version of this script is always available from its web site at http://www.eyrie.org/~eagle/software/podlators/. It is also part of the Perl core distribution as of 5.6.0.
Russ Allbery <rra@stanford.edu>.
Copyright 1999, 2000, 2001, 2004, 2006, 2008, 2010, 2012, 2013 Russ Allbery <rra@stanford.edu>.
This program is free software; you may redistribute it and/or modify it under the same terms as Perl itself.
pod2usage - print usage messages from embedded pod docs in files
pod2usage [-help] [-man] [-exit exitval] [-output outfile] [-verbose level] [-pathlist dirlist] [-formatter module] file
Print a brief help message and exit.
Print this command's manual page and exit.
The exit status value to return.
The output file to print to. If the special names "-" or ">&1" or ">&STDOUT" are used then standard output is used. If ">&2" or ">&STDERR" is used then standard error is used.
The desired level of verbosity to use:
- 1 : print SYNOPSIS only
- 2 : print SYNOPSIS sections and any OPTIONS/ARGUMENTS sections
- 3 : print the entire manpage (similar to running pod2text)
Specifies one or more directories to search for the input file if it was not supplied with an absolute path. Each directory path in the given list should be separated by a ':' on Unix (';' on MSWin32 and DOS).
Which text formatter to use. Default is Pod::Text, or for very old Perl versions Pod::PlainText. An alternative would be e.g. Pod::Text::Termcap.
The pathname of a file containing pod documentation to be output in usage message format (defaults to standard input).
pod2usage will read the given input file looking for pod documentation and will print the corresponding usage message. If no input file is specified then standard input is read.
pod2usage invokes the pod2usage() function in the Pod::Usage module. Please see pod2usage() in Pod::Usage.
Pod::Usage, pod2text(1)
Please report bugs using http://rt.cpan.org.
Brad Appleton <bradapp@enteract.com>
Based on code for pod2text(1) written by Tom Christiansen <tchrist@mox.perl.com>
podchecker - check the syntax of POD format documentation files
podchecker [-help] [-man] [-(no)warnings] [file ...]
Print a brief help message and exit.
Print the manual page and exit.
Turn on/off printing of warnings. Repeating -warnings increases the warning level, i.e. more warnings are printed. Currently increasing to level two causes flagging of unescaped "<,>" characters.
The pathname of a POD file to syntax-check (defaults to standard input).
podchecker will read the given input files looking for POD syntax errors in the POD documentation and will print any errors it finds to STDERR. At the end, it will print a status message indicating the number of errors found.
Directories are ignored; an appropriate warning message is printed.
podchecker invokes the podchecker() function exported by Pod::Checker. Please see podchecker() in Pod::Checker for more details.
podchecker returns a 0 (zero) exit status if all specified POD files are ok.
podchecker returns the exit status 1 if at least one of the given POD files has syntax errors.
The status 2 indicates that at least one of the specified files does not contain any POD commands.
Status 1 overrides status 2. If you want unambiguous results, call podchecker with one single argument only.
Please report bugs using http://rt.cpan.org.
Brad Appleton <bradapp@enteract.com>, Marek Rouchal <marekr@cpan.org>
Based on code for Pod::Text::pod2text(1) written by Tom Christiansen <tchrist@mox.perl.com>
podselect - print selected sections of pod documentation on standard output
podselect [-help] [-man] [-section section-spec] [file ...]
Print a brief help message and exit.
Print the manual page and exit.
Specify a section to include in the output. See SECTION SPECIFICATIONS in Pod::Parser for the format to use for section-spec. This option may be given multiple times on the command line.
The pathname of a file from which to select sections of pod documentation (defaults to standard input).
podselect will read the given input files looking for pod documentation and will print out (in raw pod format) all sections that match one or more of the given section specifications. If no section specifications are given, then all pod sections encountered are output.
podselect invokes the podselect() function exported by Pod::Select. Please see podselect() in Pod::Select for more details.
Please report bugs using http://rt.cpan.org.
Brad Appleton <bradapp@enteract.com>
Based on code for Pod::Text::pod2text(1) written by Tom Christiansen <tchrist@mox.perl.com>
prove - Run tests through a TAP harness.
- prove [options] [files or directories]
Boolean options:
- -v, --verbose Print all test lines.
- -l, --lib Add 'lib' to the path for your tests (-Ilib).
- -b, --blib Add 'blib/lib' and 'blib/arch' to the path for
- your tests
- -s, --shuffle Run the tests in random order.
- -c, --color Colored test output (default).
- --nocolor Do not color test output.
- --count Show the X/Y test count when not verbose
- (default)
- --nocount Disable the X/Y test count.
- -D --dry Dry run. Show tests that would have run.
- --ext Set the extension for tests (default '.t')
- -f, --failures Show failed tests.
- -o, --comments Show comments.
- --ignore-exit Ignore exit status from test scripts.
- -m, --merge Merge test scripts' STDERR with their STDOUT.
- -r, --recurse Recursively descend into directories.
- --reverse Run the tests in reverse order.
- -q, --quiet Suppress some test output while running tests.
- -Q, --QUIET Only print summary results.
- -p, --parse Show full list of TAP parse errors, if any.
- --directives Only show results with TODO or SKIP directives.
- --timer Print elapsed time after each test.
- --trap Trap Ctrl-C and print summary on interrupt.
- --normalize Normalize TAP output in verbose output
- -T Enable tainting checks.
- -t Enable tainting warnings.
- -W Enable fatal warnings.
- -w Enable warnings.
- -h, --help Display this help
- -?, Display this help
- -H, --man Longer manpage for prove
- --norc Don't process default .proverc
Options that take arguments:
- -I Library paths to include.
- -P Load plugin (searches App::Prove::Plugin::*.)
- -M Load a module.
- -e, --exec Interpreter to run the tests ('' for compiled
- tests.)
- --harness Define test harness to use. See TAP::Harness.
- --formatter Result formatter to use. See FORMATTERS.
- --source Load and/or configure a SourceHandler. See
- SOURCE HANDLERS.
- -a, --archive out.tgz Store the resulting TAP in an archive file.
- -j, --jobs N Run N test jobs in parallel (try 9.)
- --state=opts Control prove's persistent state.
- --rc=rcfile Process options from rcfile
If ~/.proverc or ./.proverc exist they will be read and any options they contain processed before the command line options. Options in .proverc are specified in the same way as command line options:
- # .proverc
- --state=hot,fast,save
- -j9
Additional option files may be specified with the --rc option.
Default option file processing is disabled by the --norc option.
Under Windows and VMS the option file is named _proverc rather than .proverc and is sought only in the current directory.
STDIN
If you have a list of tests (or URLs, or anything else you want to test) in a file, you can add them to your tests by using a '-':
- prove - < my_list_of_things_to_test.txt
See the README in the examples directory of this distribution.
If no files or directories are supplied, prove looks for all files matching the pattern t/*.t.
Colored test output is the default, but if output is not to a terminal, color is disabled. You can override this by adding the --color switch.
Color support requires Term::ANSIColor on Unix-like platforms and Win32::Console on Windows. If the necessary module is not installed, colored output will not be available.
If the tests fail, prove will exit with a non-zero status.
It is possible to supply arguments to tests. To do so, separate them from prove's own arguments with the arisdottle, '::'. For example:
- prove -v t/mytest.t :: --url http://example.com
would run t/mytest.t with the options '--url http://example.com'. When running multiple tests they will each receive the same arguments.
--exec
Normally you can just pass a list of Perl tests and the harness will know how to execute them. However, if your tests are not written in Perl or if you want all tests invoked exactly the same way, use the -e or --exec switch:
- prove --exec '/usr/bin/ruby -w' t/
- prove --exec '/usr/bin/perl -Tw -mstrict -Ilib' t/
- prove --exec '/path/to/my/customer/exec'
--merge
If you need to make sure your diagnostics are displayed in the correct order relative to test results you can use the --merge option to merge the test scripts' STDERR into their STDOUT.
This guarantees that STDOUT (where the test results appear) and STDERR (where the diagnostics appear) will stay in sync. The harness will display any diagnostics your tests emit on STDERR.
Caveat: this is a bit of a kludge. In particular note that if anything that appears on STDERR looks like a test result the test harness will get confused. Use this option only if you understand the consequences and can live with the risk.
--trap
The --trap option will attempt to trap SIGINT (Ctrl-C) during a test run and display the test summary even if the run is interrupted.
--state
You can ask prove to remember the state of previous test runs and select and/or order the tests to be run based on that saved state. The --state switch requires an argument which must be a comma separated list of one or more of the following options.
last
Run the same tests as the last time the state was saved. This makes it possible, for example, to recreate the ordering of a shuffled test.
- # Run all tests in random order
- $ prove -b --state=save --shuffle
- # Run them again in the same order
- $ prove -b --state=last
failed
Run only the tests that failed on the last run.
- # Run all tests
- $ prove -b --state=save
- # Run failures
- $ prove -b --state=failed
If you also specify the save option, newly passing tests will be excluded from subsequent runs.
- # Repeat until no more failures
- $ prove -b --state=failed,save
passed
Run only the passed tests from last time. Useful to make sure that no new problems have been introduced.
all
Run all tests in normal order. Multiple options may be specified, so to run all tests with the failures from last time first:
- $ prove -b --state=failed,all,save
hot
Run the tests that most recently failed first. The last failure time of each test is stored. The hot option causes tests to be run in most-recent-failure order.
- $ prove -b --state=hot,save
Tests that have never failed will not be selected. To run all tests with the most recently failed first, use:
- $ prove -b --state=hot,all,save
This combination of options may also be specified thus:
- $ prove -b --state=adrian
todo
Run any tests with todos.
slow
Run the tests in slowest to fastest order. This is useful in conjunction with the -j parallel testing switch to ensure that your slowest tests start running first.
- $ prove -b --state=slow -j9
fast
Run the tests in fastest to slowest order.
new
Run the tests in newest to oldest order based on the modification times of the test scripts.
old
Run the tests in oldest to newest order.
fresh
Run those test scripts that have been modified since the last test run.
save
Save the state on exit. The state is stored in a file called .prove (_prove on Windows and VMS) in the current directory.
The --state switch may be used more than once.
- $ prove -b --state=hot --state=all,save
prove introduces a separation between "options passed to the perl which runs prove" and "options passed to the perl which runs tests"; this distinction is by design. Thus the perl which is running a test starts with the default @INC. Additional library directories can be added via the PERL5LIB environment variable, via -Ifoo in PERL5OPT or via the -Ilib option to prove.
Normally when a Perl program is run in taint mode the contents of the PERL5LIB environment variable do not appear in @INC. Because PERL5LIB is often used during testing to add build directories to @INC, prove passes the names of any directories found in PERL5LIB as -I switches. The net effect of this is that PERL5LIB is honoured even when prove is run in taint mode.
You can load a custom TAP::Parser::Formatter:
- prove --formatter MyFormatter
You can load custom TAP::Parser::SourceHandlers, to change the way the parser interprets particular sources of TAP.
- prove --source MyHandler --source YetAnother t
If you want to provide config to the source you can use:
- prove --source MyCustom \
- --source Perl --perl-option 'foo=bar baz' --perl-option avg=0.278 \
- --source File --file-option extensions=.txt --file-option extensions=.tmp t
- --source pgTAP --pgtap-option pset=format=html --pgtap-option pset=border=2
Each --$source-option option must specify a key/value pair separated by an =. If an option can take multiple values, just specify it multiple times, as with the extensions= examples above. If the option should be a hash reference, specify the value as a second pair separated by a =, as in the pset= examples above (escape = with a backslash).
All --sources are combined into a hash, and passed to new in TAP::Harness's sources parameter.
See TAP::Parser::IteratorFactory for more details on how configuration is passed to SourceHandlers.
Plugins can be loaded using the -Pplugin syntax, eg:
- prove -PMyPlugin
This will search for a module named App::Prove::Plugin::MyPlugin, or failing that, MyPlugin. If the plugin can't be found, prove will complain & exit.
You can pass arguments to your plugin by appending =arg1,arg2,etc to the plugin name:
- prove -PMyPlugin=fou,du,fafa
Please check individual plugin documentation for more details.
For an up-to-date list of plugins available, please check CPAN:
http://search.cpan.org/search?query=App%3A%3AProve+Plugin
Please see PLUGINS in App::Prove.
psed - a stream editor
- psed [-an] script [file ...]
- psed [-an] [-e script] [-f script-file] [file ...]
- s2p [-an] [-e script] [-f script-file]
A stream editor reads the input stream consisting of the specified files (or standard input, if none are given), processes it line by line by applying a script consisting of edit commands, and writes the resulting lines to standard output. The filename '-' may be used to read standard input.
The edit script is composed from arguments of -e options and script-files, in the given order. A single script argument may be specified as the first parameter.
If this program is invoked with the name s2p, it will act as a sed-to-Perl translator. See SED SCRIPT TRANSLATION.
sed returns an exit code of 0 on success or >0 if an error occurred.
A file specified as argument to the w edit command is by default opened before input processing starts. Using -a, opening of such files is delayed until the first line is actually written to the file.
The editing commands defined by script are appended to the script. Multiple commands must be separated by newlines.
Editing commands from the specified script-file are read and appended to the script.
By default, a line is written to standard output after the editing script has been applied to it. The -n option suppresses automatic printing.
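As a quick illustration (shown here with a POSIX-compatible sed; psed accepts the same script syntax), -n suppresses the automatic print, so only lines selected by an explicit p command appear:

```shell
# Without -n every input line would be printed once; with -n,
# only the line addressed by '2p' (the second line) is emitted.
out=$(printf 'alpha\nbeta\ngamma\n' | sed -n '2p')
echo "$out"   # -> beta
```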
sed command syntax is defined as
[address[,address]][!]function[argument]
with whitespace being permitted before or after addresses, and between the function character and the argument. The addresses and the address inverter (!) are used to restrict the application of a command to the selected line(s) of input.
Each command must be on a line of its own, except where noted in the synopses below.
The edit cycle performed on each input line consists of reading the line (without its trailing newline character) into the pattern space, applying the applicable commands of the edit script, and writing the final contents of the pattern space and a newline to the standard output. A hold space is provided for saving the contents of the pattern space for later use.
A sed address is either a line number or a pattern, which may be combined arbitrarily to construct ranges. Lines are numbered across all input files.
Any address may be followed by an exclamation mark ('!'), selecting all lines not matching that address.
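For example (POSIX-compatible sed shown; the addressing is the same for psed), appending ! to an address selects every line that does not match it:

```shell
# Print every line except line 2.
out=$(printf '1\n2\n3\n' | sed -n '2!p')
echo "$out"   # -> 1 and 3, each on its own line
```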
The line with the given number is selected.
A dollar sign ($) is the line number of the last line of the input stream.
A pattern address is a basic regular expression (see BASIC REGULAR EXPRESSIONS) enclosed between delimiting '/' characters. Any other character except \ or newline may be used to delimit a pattern address when the initial delimiter is prefixed with a backslash ('\').
If no address is given, the command selects every line.
If one address is given, it selects the line (or lines) matching the address.
Two addresses select a range that begins whenever the first address matches, and ends (including that line) when the second address matches. If the first (second) address is a matching pattern, the second address is not applied to the very same line to determine the end of the range. Likewise, if the second address is a matching pattern, the first address is not applied to the very same line to determine the beginning of another range. If both addresses are line numbers, and the second line number is less than the first line number, then only the first line is selected.
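A sketch of a two-address range (POSIX-compatible sed shown): the range opens on a line matching the first pattern and closes on the next line matching the second:

```shell
# Print from the line matching BEGIN through the line matching END.
out=$(printf 'a\nBEGIN\nb\nEND\nc\n' | sed -n '/BEGIN/,/END/p')
echo "$out"   # -> BEGIN, b, END
```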
The maximum permitted number of addresses is indicated with each function synopsis below.
The argument text consists of one or more lines following the command. Embedded newlines in text must be preceded with a backslash. Other backslashes in text are deleted and the following character is taken literally.
Write text (which must start on the line following the command) to standard output immediately before reading the next line of input, either by executing the N function or by beginning a new cycle.
Branch to the : function with the specified label. If no label is given, branch to the end of the script.
The line, or range of lines, selected by the address is deleted. The text (which must start on the line following the command) is written to standard output. With an address range, this occurs at the end of the range.
Deletes the pattern space and starts the next cycle.
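For instance (POSIX-compatible sed shown), deleting the pattern space with d means the addressed line never reaches the output:

```shell
# Delete the second line; the rest print normally.
out=$(printf 'one\ntwo\nthree\n' | sed '2d')
echo "$out"   # -> one and three
```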
Deletes the pattern space through the first embedded newline or to the end. If the pattern space becomes empty, a new cycle is started, otherwise execution of the script is restarted.
Replace the contents of the pattern space with the hold space.
Append a newline and the contents of the hold space to the pattern space.
Replace the contents of the hold space with the pattern space.
Append a newline and the contents of the pattern space to the hold space.
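The hold-space commands combine into useful idioms. A classic sketch (POSIX-compatible sed shown) reverses the input: G prepends the hold space to each line, h saves the result back, and $p prints only at the end:

```shell
# 1!G  on every line but the first, append the hold space to the pattern space
# h    copy the pattern space to the hold space
# $p   on the last line, print the accumulated (reversed) lines
out=$(printf 'a\nb\nc\n' | sed -n '1!G;h;$p')
echo "$out"   # -> c, b, a
```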
Write the text (which must start on the line following the command) to standard output.
Print the contents of the pattern space: non-printable characters are shown in C-style escaped form; long lines are split and have a trailing '\' at the point of the split; the true end of a line is marked with a '$'. Escapes are: '\a', '\t', '\n', '\f', '\r', '\e' for BEL, HT, LF, FF, CR, ESC, respectively, and '\' followed by a three-digit octal number for all other non-printable characters.
If automatic printing is enabled, write the pattern space to the standard output. Replace the pattern space with the next line of input. If there is no more input, processing is terminated.
Append a newline and the next line of input to the pattern space. If there is no more input, processing is terminated.
Print the pattern space to the standard output. (Use the -n option to suppress automatic printing at the end of a cycle if you want to avoid double printing of lines.)
Prints the pattern space through the first embedded newline or to the end.
Branch to the end of the script and quit without starting a new cycle.
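For example (POSIX-compatible sed shown), q prints the addressed line as usual and then stops reading input, making a cheap head(1):

```shell
# Print lines 1 and 2, then quit before reading line 3.
out=$(printf 'first\nsecond\nthird\n' | sed '2q')
echo "$out"   # -> first and second
```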
Copy the contents of the file to standard output immediately before the next attempt to read a line of input. Any error encountered while reading file is silently ignored.
Substitute the replacement string for the first substring in the pattern space that matches the regular expression. Any character other than backslash or newline can be used instead of a slash to delimit the regular expression and the replacement. To use the delimiter as a literal character within the regular expression and the replacement, precede the character by a backslash ('\').
Literal newlines may be embedded in the replacement string by preceding a newline with a backslash.
Within the replacement, an ampersand ('&') is replaced by the string matching the regular expression. The strings '\1' through '\9' are replaced by the corresponding subpattern (see BASIC REGULAR EXPRESSIONS). To get a literal '&' or '\' in the replacement text, precede it by a backslash.
The following flags modify the behaviour of the s command:
The replacement is performed for all matching, non-overlapping substrings of the pattern space.
Replace only the n-th matching substring of the pattern space.
If the substitution was made, print the new value of the pattern space.
If the substitution was made, write the new value of the pattern space to the specified file.
Branch to the : function with the specified label if any s substitutions have been made since the most recent reading of an input line or execution of a t function. If no label is given, branch to the end of the script.
The contents of the pattern space are written to the file.
Swap the contents of the pattern space and the hold space.
In the pattern space, replace all characters occurring in string1 by the character at the corresponding position in string2. It is possible to use any character (other than a backslash or newline) instead of a slash to delimit the strings. Within string1 and string2, a backslash followed by any character other than a newline is that literal character, and a backslash followed by an 'n' is replaced by a newline character.
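For example (POSIX-compatible sed shown), y maps each character of string1 to the character at the same position in string2:

```shell
# Transliterate a->x, b->y, c->z throughout the line.
out=$(echo 'abcabc' | sed 'y/abc/xyz/')
echo "$out"   # -> xyzxyz
```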
Prints the current line number on the standard output.
The command specifies the position of the label. It has no other effect.
These two commands begin and end a command list. The first command may be given on the same line as the opening { command. The commands within the list are jointly selected by the address(es) given on the { command (but may still have individual addresses).
The entire line is ignored (treated as a comment). If, however, the first
two characters in the script are '#n', automatic printing of output is
suppressed, as if the -n option were given on the command line.
A Basic Regular Expression (BRE), as defined in POSIX 1003.2, consists of atoms, for matching parts of a string, and bounds, specifying repetitions of a preceding atom.
The possible atoms of a BRE are: ., matching any single character; ^ and $, matching the null string at the beginning or end of a string, respectively; a bracket expression, enclosed in [ and ] (see below); and any single character with no other significance (matching that character). A \ before one of: ., ^, $, [, *, \, matches the character after the backslash. A sequence of atoms enclosed in \( and \) becomes an atom and establishes the target for a backreference, consisting of the substring that actually matches the enclosed atoms. Finally, \ followed by one of the digits 0 through 9 is a backreference.
A ^ that is not first, or a $ that is not last, has no special significance and need not be preceded by a backslash to be taken literally. The same is true for a ] that does not terminate a bracket expression.
An unescaped backslash cannot be last in a BRE.
The BRE bounds are: *, specifying 0 or more matches of the preceding atom; \{count\}, specifying exactly that many repetitions; \{minimum,\}, giving a lower limit; and \{minimum,maximum\}, defining both a lower and an upper bound.
A bound appearing as the first item in a BRE is taken literally.
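The bounds can be demonstrated with any POSIX sed (psed understands the same BRE syntax):

```shell
printf 'aaa\n'  | sed -n '/^a\{3\}$/p'    # exactly three repetitions: prints "aaa"
printf 'aaaa\n' | sed -n '/^a\{2,3\}$/p'  # four exceeds the upper bound: no output
printf 'aa\n'   | sed -n '/^a\{3,\}$/p'   # below the lower limit: no output
```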
A bracket expression is a list of characters, character ranges and character classes enclosed in [ and ] and matches any single character from the represented set of characters.
A character range is written as two characters separated by - and represents all characters (according to the character collating sequence) that are not less than the first and not greater than the second. (Ranges are very collating-sequence-dependent, and portable programs should avoid relying on them.)
A character class is one of the class names
- alnum digit punct
- alpha graph space
- blank lower upper
- cntrl print xdigit
enclosed in [: and :] and represents the set of characters as defined in ctype(3).
If the first character after [ is ^, the sense of matching is inverted.
To include a literal '^', place it anywhere but first. To include a
literal ']', place it first or immediately after an initial ^. To
include a literal '-', make it the first (or second after ^) or last
character, or the second endpoint of a range.
The special bracket expression constructs [[:<:]] and [[:>:]]
match the null string at the beginning and end of a word respectively.
(Note that neither is identical to Perl's '\b' atom.)
Since some sed implementations provide additional regular expression atoms (not defined in POSIX 1003.2), psed is capable of translating the following backslash escapes:
\> is translated to [[:>:]].
\< is translated to [[:<:]].
\w is translated to [[:alnum:]_].
\W is translated to [^[:alnum:]_].
To enable this feature, the environment variable PSEDEXTBRE must be set
to a string containing the requested characters, e.g.:
- PSEDEXTBRE='<>wW'
The environment variable PSEDEXTBRE may be set to extend BREs.
See Additional Atoms.
The indicated character appears twice, with different translations.
A '[' in a BRE indicates the beginning of a bracket expression.
A '\' in a BRE is used to make the subsequent character literal.
A '\' in a substitution string is used to make the subsequent character literal.
In an s command, either the 'g' flag and an n-th occurrence flag, or multiple n-th occurrence flags are specified. Note that only the digits '1' through '9' are permitted.
The command has more than the permitted number of addresses.
The BRE and substitution may not be delimited with '\' or newline.
The specified backreference number exceeds the number of backreferences in the BRE.
The repeat clause does not contain a valid integer value, or pair of values.
The first or second string of a y command is syntactically incorrect.
There must be at least one -e or one -f option specifying a script or script file.
The translation table strings in a y command must have equal lengths.
A } command without a preceding { command was encountered.
The end of the script was reached although a text line after an a, c or i command indicated another line.
A BRE contains an unterminated bracket expression.
A BRE contains an unterminated backreference.
A BRE contains an unterminated bounds specification.
The basic material for the preceding section was generated by running the sed script
- #no autoprint
- s/^.*Warn( *"\([^"]*\)".*$/\1/
- t process
- b
- :process
- s/$!/%s/g
- s/$[_[:alnum:]]\{1,\}/%s/g
- s/\\\\/\\/g
- s/^/=item /
- p
on the program's own text, and piping the output into sort -u.
If this program is invoked with the name s2p it will act as a sed-to-Perl translator. After option processing (all other arguments are ignored), a Perl program is printed on standard output, which will process the input stream (as read from all arguments) in the way defined by the sed script and the option setting used for the translation.
perl(1), re_format(7)
The l command will show escape characters (ESC) as '\e', but
a vertical tab (VT) in octal.
Trailing spaces are truncated from labels in :, t and b commands.
The meaning of an empty regular expression ('//'), as defined by sed,
is "the last pattern used, at run time". This deviates from the Perl
interpretation, which will re-use the "last successfully executed
regular expression". Since keeping track of pattern usage would create
terribly cluttered code, and differences would only appear in obscure
contexts (where other sed implementations appear to deviate, too),
the Perl semantics were adopted. Note that common usage of this feature,
such as in /abc/s//xyz/, will work as expected.
Collating elements (of bracket expressions in BREs) are not implemented.
This sed implementation conforms to the IEEE Std1003.2-1992 ("POSIX.2") definition of sed, and is compatible with the OpenBSD implementation, except where otherwise noted (see BUGS).
This Perl implementation of sed was written by Wolfgang Laun, Wolfgang.Laun@alcatel.at.
This program is free and open software. You may use, modify, distribute, and sell this program (and any modified variants) in any way you wish, provided you do not restrict others from doing the same.
c2ph, pstruct - Dump C structures as generated from cc -g -S stabs
- c2ph [-dpnP] [var=val] [files ...]
- Options:
- -w wide; short for: type_width=45 member_width=35 offset_width=8
- -x hex; short for: offset_fmt=x offset_width=08 size_fmt=x size_width=04
- -n do not generate perl code (default when invoked as pstruct)
- -p generate perl code (default when invoked as c2ph)
- -v generate perl code, with C decls as comments
- -i do NOT recompute sizes for intrinsic datatypes
- -a dump information on intrinsics also
- -t trace execution
- -d spew reams of debugging output
- -slist give comma-separated list of structures to dump
The following is the old c2ph.doc documentation by Tom Christiansen <tchrist@perl.com> Date: 25 Jul 91 08:10:21 GMT
Once upon a time, I wrote a program called pstruct. It was a perl program that tried to parse out C structures and display their member offsets for you. This was especially useful for people looking at binary dumps or poking around the kernel.
Pstruct was not a pretty program. Neither was it particularly robust. The problem, you see, was that the C compiler was much better at parsing C than I could ever hope to be.
So I got smart: I decided to be lazy and let the C compiler parse the C, which would spit out debugger stabs for me to read. These were much easier to parse. It's still not a pretty program, but at least it's more robust.
Pstruct takes any .c or .h files, or preferably .s ones, since that's the format it is going to massage them into anyway, and spits out listings like this:
- struct tty {
- int tty.t_locker 000 4
- int tty.t_mutex_index 004 4
- struct tty * tty.t_tp_virt 008 4
- struct clist tty.t_rawq 00c 20
- int tty.t_rawq.c_cc 00c 4
- int tty.t_rawq.c_cmax 010 4
- int tty.t_rawq.c_cfx 014 4
- int tty.t_rawq.c_clx 018 4
- struct tty * tty.t_rawq.c_tp_cpu 01c 4
- struct tty * tty.t_rawq.c_tp_iop 020 4
- unsigned char * tty.t_rawq.c_buf_cpu 024 4
- unsigned char * tty.t_rawq.c_buf_iop 028 4
- struct clist tty.t_canq 02c 20
- int tty.t_canq.c_cc 02c 4
- int tty.t_canq.c_cmax 030 4
- int tty.t_canq.c_cfx 034 4
- int tty.t_canq.c_clx 038 4
- struct tty * tty.t_canq.c_tp_cpu 03c 4
- struct tty * tty.t_canq.c_tp_iop 040 4
- unsigned char * tty.t_canq.c_buf_cpu 044 4
- unsigned char * tty.t_canq.c_buf_iop 048 4
- struct clist tty.t_outq 04c 20
- int tty.t_outq.c_cc 04c 4
- int tty.t_outq.c_cmax 050 4
- int tty.t_outq.c_cfx 054 4
- int tty.t_outq.c_clx 058 4
- struct tty * tty.t_outq.c_tp_cpu 05c 4
- struct tty * tty.t_outq.c_tp_iop 060 4
- unsigned char * tty.t_outq.c_buf_cpu 064 4
- unsigned char * tty.t_outq.c_buf_iop 068 4
- (*int)() tty.t_oproc_cpu 06c 4
- (*int)() tty.t_oproc_iop 070 4
- (*int)() tty.t_stopproc_cpu 074 4
- (*int)() tty.t_stopproc_iop 078 4
- struct thread * tty.t_rsel 07c 4
etc.
Actually, this was generated by a particular set of options. You can control the formatting of each column, whether you prefer wide or fat, hex or decimal, leading zeroes or whatever.
All you need to be able to use this is a C compiler that generates BSD/GCC-style stabs. The -g option on native BSD compilers and GCC should get this for you.
To learn more, just type a bogus option, like -\?, and a long usage message will be provided. There are a fair number of possibilities.
If you're only a C programmer, then this is the end of the message for you. You can quit right now, and if you care to, save off the source and run it when you feel like it. Or not.
But if you're a perl programmer, then for you I have something much more wondrous than just a structure offset printer.
You see, if you call pstruct by its other incybernation, c2ph, you have a code generator that translates C code into perl code! Well, structure and union declarations at least, but that's quite a bit.
Prior to this point, anyone programming in perl who wanted to interact with C programs, like the kernel, was forced to guess the layouts of the C structures, and then hardwire these into his program. Of course, when you took your wonderfully crafted program to a system where the sgtty structure was laid out differently, your program broke. Which is a shame.
We've had Larry's h2ph translator, which helped, but that only works on cpp symbols, not real C, which was also very much needed. What I offer you is a symbolic way of getting at all the C structures. I've couched them in terms of packages and functions. Consider the following program:
- #!/usr/local/bin/perl
- require 'syscall.ph';
- require 'sys/time.ph';
- require 'sys/resource.ph';
- $ru = "\0" x &rusage'sizeof();
- syscall(&SYS_getrusage, &RUSAGE_SELF, $ru) && die "getrusage: $!";
- @ru = unpack($t = &rusage'typedef(), $ru);
- $utime = $ru[ &rusage'ru_utime + &timeval'tv_sec ]
- + ($ru[ &rusage'ru_utime + &timeval'tv_usec ]) / 1e6;
- $stime = $ru[ &rusage'ru_stime + &timeval'tv_sec ]
- + ($ru[ &rusage'ru_stime + &timeval'tv_usec ]) / 1e6;
- printf "you have used %8.3fs+%8.3fu seconds.\n", $utime, $stime;
As you see, the name of the package is the name of the structure. Regular fields are just their own names. Plus the following accessor functions are provided for your convenience:
- struct This takes no arguments, and is merely the number of first-level
- elements in the structure. You would use this for indexing
- into arrays of structures, perhaps like this
- $usec = $u[ &user'u_utimer
- + (&ITIMER_VIRTUAL * &itimerval'struct)
- + &itimerval'it_value
- + &timeval'tv_usec
- ];
- sizeof Returns the bytes in the structure, or the member if
- you pass it an argument, such as
- &rusage'sizeof(&rusage'ru_utime)
- typedef This is the perl format definition for passing to pack and
- unpack. If you ask for the typedef of a nothing, you get
- the whole structure, otherwise you get that of the member
- you ask for. Padding is taken care of, as is the magic to
- guarantee that a union is unpacked into all its aliases.
- Bitfields are not quite yet supported however.
- offsetof This function is the byte offset into the array of that
- member. You may wish to use this for indexing directly
- into the packed structure with vec() if you're too lazy
- to unpack it.
- typeof Not to be confused with the typedef accessor function, this
- one returns the C type of that field. This would allow
- you to print out a nice structured pretty print of some
- structure without knowing anything about it beforehand.
- No args to this one is a noop. Someday I'll post such
- a thing to dump out your u structure for you.
The way I see this being used is like basically this:
- % h2ph <some_include_file.h > /usr/lib/perl/tmp.ph
- % c2ph some_include_file.h >> /usr/lib/perl/tmp.ph
- % install
It's a little trickier with c2ph because you have to get the includes right. I can't know this for your system, but it's not usually too terribly difficult.
The code isn't pretty as I mentioned -- I never thought it would be a 1000-line program when I started, or I might not have begun. :-) But I would have been less cavalier in how the parts of the program communicated with each other, etc. It might also have helped if I didn't have to divine the makeup of the stabs on the fly, and then account for micro differences between my compiler and gcc.
Anyway, here it is. Should run on perl v4 or greater. Maybe less.
- --tom
- ptar - a tar-like program written in perl
- ptar is a small, tar look-alike program that uses the perl module
- Archive::Tar to extract, create and list tar archives.
- ptar -c [-v] [-z] [-C] [-f ARCHIVE_FILE | -] FILE FILE ...
- ptar -c [-v] [-z] [-C] [-T index | -] [-f ARCHIVE_FILE | -]
- ptar -x [-v] [-z] [-f ARCHIVE_FILE | -]
- ptar -t [-z] [-f ARCHIVE_FILE | -]
- ptar -h
- c Create ARCHIVE_FILE or STDOUT (-) from FILE
- x Extract from ARCHIVE_FILE or STDIN (-)
- t List the contents of ARCHIVE_FILE or STDIN (-)
- f Name of the ARCHIVE_FILE to use. Default is './default.tar'
- z Read/Write zlib compressed ARCHIVE_FILE (not always available)
- v Print filenames as they are added or extracted from ARCHIVE_FILE
- h Prints this help message
- C CPAN mode - drop 022 from permissions
- T get names to create from file
- tar(1), L<Archive::Tar>.
ptardiff - program that diffs an extracted archive against an unextracted one
- ptardiff is a small program that diffs an extracted archive
- against an unextracted one, using the perl module Archive::Tar.
- This effectively lets you view changes made to an archive's contents.
- Provide the program with an ARCHIVE_FILE and it will look up all
- the files within the archive, scan the current working directory
- for a file with the same name and diff it against the contents of the
- archive.
- ptardiff ARCHIVE_FILE
- ptardiff -h
- $ tar -xzf Acme-Buffy-1.3.tar.gz
- $ vi Acme-Buffy-1.3/README
- [...]
- $ ptardiff Acme-Buffy-1.3.tar.gz > README.patch
- h Prints this help message
tar(1), Archive::Tar.
re - Perl pragma to alter regular expression behaviour
- use re 'taint';
- ($x) = ($^X =~ /^(.*)$/s); # $x is tainted here
- $pat = '(?{ $foo = 1 })';
- use re 'eval';
- /foo${pat}bar/; # won't fail (when not under -T
- # switch)
- {
- no re 'taint'; # the default
- ($x) = ($^X =~ /^(.*)$/s); # $x is not tainted here
- no re 'eval'; # the default
- /foo${pat}bar/; # disallowed (with or without -T
- # switch)
- }
- use re '/ix';
- "FOO" =~ / foo /; # /ix implied
- no re '/x';
- "FOO" =~ /foo/; # just /i implied
- use re 'debug'; # output debugging info during
- /^(.*)$/s; # compile and run time
- use re 'debugcolor'; # same as 'debug', but with colored
- # output
- ...
- use re qw(Debug All); # Same as "use re 'debug'", but you
- # can use "Debug" with things other
- # than 'All'
- use re qw(Debug More); # 'All' plus output more details
- no re qw(Debug ALL); # Turn off (almost) all re debugging
- # in this scope
- use re qw(is_regexp regexp_pattern); # import utility functions
- my ($pat,$mods)=regexp_pattern(qr/foo/i);
- if (is_regexp($obj)) {
- print "Got regexp: ",
- scalar regexp_pattern($obj); # just as perl would stringify
- } # it but no hassle with blessed
- # re's.
(We use $^X in these examples because it's tainted by default.)
When use re 'taint' is in effect, and a tainted string is the target
of a regexp, the regexp memories (or values returned by the m// operator
in list context) are tainted. This feature is useful when regexp operations
on tainted data aren't meant to extract safe substrings, but to perform
other transformations.
When use re 'eval' is in effect, a regexp is allowed to contain
(?{ ... }) zero-width assertions and (??{ ... }) postponed
subexpressions that are derived from variable interpolation, rather than
appearing literally within the regexp. That is normally disallowed, since
it is a potential security risk. Note that this pragma is ignored when
the regular expression is obtained from tainted data, i.e. evaluation
is always disallowed with tainted regular expressions. See (?{ code })
in perlre and (??{ code }) in perlre.
For the purpose of this pragma, interpolation of precompiled regular
expressions (i.e., the result of qr//) is not considered variable
interpolation. Thus:
- /foo${pat}bar/
is allowed if $pat is a precompiled regular expression, even
if $pat contains (?{ ... }) assertions or (??{ ... })
subexpressions.
When use re '/flags' is specified, the given flags are automatically
added to every regular expression till the end of the lexical scope.
no re '/flags' will turn off the effect of use re '/flags' for the
given flags.
For example, if you want all your regular expressions to have /msx on by default, simply put
- use re '/msx';
at the top of your code.
The character set /adul flags cancel each other out. So, in the following,
the second use re does an implicit no re '/u':
- use re '/u';
- use re '/d';
Turning on one of the character set flags with use re takes precedence
over the locale pragma and the 'unicode_strings' feature, for regular
expressions. Turning off one of these flags when it is active reverts to
the behaviour specified by whatever other pragmata are in scope. For
example:
- use feature 'unicode_strings';
- no re '/u'; # does nothing
- use re '/l';
- no re '/l'; # reverts to unicode_strings behaviour
When use re 'debug' is in effect, perl emits debugging messages when
compiling and using regular expressions. The output is the same as that
obtained by running a -DDEBUGGING-enabled perl interpreter with the
-Dr switch. It may be quite voluminous depending on the complexity
of the match. Using debugcolor instead of debug enables a
form of output that can be used to get a colorful display on terminals
that understand termcap color sequences. Set $ENV{PERL_RE_TC} to a
comma-separated list of termcap properties to use for highlighting
strings on/off, pre-point part on/off.
See Debugging Regular Expressions in perldebug for additional info.
As of 5.9.5 the directive use re 'debug' and its equivalents are
lexically scoped, as the other directives are. However they have both
compile-time and run-time effects.
See Pragmatic Modules in perlmodlib.
Similarly use re 'Debug' produces debugging output, the difference
being that it allows the fine tuning of what debugging output will be
emitted. Options are divided into three groups, those related to
compilation, those related to execution and those related to special
purposes. The options are as follows:
Turns on all compile related debug options.
Turns on debug output related to the process of parsing the pattern.
Enables output related to the optimisation phase of compilation.
Detailed info about trie compilation.
Dump the final program out after it is compiled and optimised.
Turns on all "extra" debugging options.
Enable debugging the capture group storage during match. Warning, this can potentially produce extremely large output.
Enable enhanced TRIE debugging. Enhances both TRIEE and TRIEC.
Enable debugging of states in the engine.
Enable debugging of the recursion stack in the engine. Enabling or disabling this option automatically does the same for debugging states as well. The output from this can be quite large.
Enable enhanced optimisation debugging and start-point optimisations. Probably not useful except when debugging the regexp engine itself.
Dump offset information. This can be used to see how regops correlate to the pattern. Output format is
- NODENUM:POSITION[LENGTH]
Where 1 is the position of the first char in the string. Note that position can be 0, or larger than the actual length of the pattern, likewise length can be zero.
Enable debugging of offsets information. This emits copious amounts of trace information and doesn't mesh well with other debug options.
Almost definitely only useful to people hacking on the offsets part of the debug engine.
These are useful shortcuts to save on the typing.
Enable all options at once except OFFSETS, OFFSETSDBG and BUFFERS. (To get every single option without exception, use both ALL and EXTRA.)
Enable DUMP and all execute options. Equivalent to:
- use re 'debug';
Enable the options enabled by "All", plus STATE, TRIEC, and TRIEM.
As of 5.9.5 the directive use re 'debug' and its equivalents are
lexically scoped, as are the other directives. However they have both
compile-time and run-time effects.
As of perl 5.9.5 're' debug contains a number of utility functions that may be optionally exported into the caller's namespace. They are listed below.
Returns true if the argument is a compiled regular expression as returned
by qr//, false if it is not.
This function will not be confused by overloading or blessing. In internals terms, this extracts the regexp pointer out of the PERL_MAGIC_qr structure so it cannot be fooled.
If the argument is a compiled regular expression as returned by qr//,
then this function returns the pattern.
In list context it returns a two element list, the first element containing the pattern and the second containing the modifiers used when the pattern was compiled.
- my ($pat, $mods) = regexp_pattern($ref);
In scalar context it returns the same as perl would when stringifying a raw
qr// with the same pattern inside. If the argument is not a compiled
reference then this routine returns false but defined in scalar context,
and the empty list in list context. Thus the following
- if (regexp_pattern($ref) eq '(?^i:foo)')
will be warning free regardless of what $ref actually is.
Like is_regexp this function will not be confused by overloading
or blessing of the object.
If the argument is a compiled regular expression as returned by qr//,
then this function returns what the optimiser considers to be the longest
anchored fixed string and longest floating fixed string in the pattern.
A fixed string is defined as being a substring that must appear for the pattern to match. An anchored fixed string is a fixed string that must appear at a particular offset from the beginning of the match. A floating fixed string is defined as a fixed string that can appear at any point in a range of positions relative to the start of the match. For example,
- my ($anchored, $floating) = regmust(qr/here .* there/x);
results in
- anchored:'here'
- floating:'there'
Because the here is before the .* in the pattern, its position
can be determined exactly. That's not true, however, for the there;
it could appear at any point after where the anchored string appeared.
Perl uses both for its optimisations, preferring the longer or, if they
are equal, the floating.
NOTE: This may not necessarily be the definitive longest anchored and floating string. This will be what the optimiser of the Perl that you are using thinks is the longest. If you believe that the result is wrong please report it via the perlbug utility.
Returns the contents of a named buffer of the last successful match. If $all is true, returns an array ref containing one entry per buffer; otherwise returns the first defined buffer.
Returns a list of all of the named buffers defined in the last successful match. If $all is true, it returns all names defined; if not, it returns only the names that were involved in the match.
Returns the number of distinct names defined in the pattern used for the last successful match.
Note: this result is always the actual number of distinct
named buffers defined; it may not actually match that which is
returned by regnames() and related routines when those routines
have not been called with the $all parameter set.
psed - a stream editor
- psed [-an] script [file ...]
- psed [-an] [-e script] [-f script-file] [file ...]
- s2p [-an] [-e script] [-f script-file]
A stream editor reads the input stream consisting of the specified files
(or standard input, if none are given), processes it line by line by
applying a script consisting of edit commands, and writes the resulting
lines to standard output. The filename '-' may be used to read standard
input.
The edit script is composed from arguments of -e options and script-files, in the given order. A single script argument may be specified as the first parameter.
If this program is invoked with the name s2p, it will act as a sed-to-Perl translator. See SED SCRIPT TRANSLATION.
sed returns an exit code of 0 on success or >0 if an error occurred.
A file specified as argument to the w edit command is by default opened before input processing starts. Using -a, opening of such files is delayed until the first line is actually written to the file.
The editing commands defined by script are appended to the script. Multiple commands must be separated by newlines.
Editing commands from the specified script-file are read and appended to the script.
By default, a line is written to standard output after the editing script has been applied to it. The -n option suppresses automatic printing.
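For example (shown with a standard sed; psed takes the same options), scripts given with multiple -e options are concatenated in order, and -n leaves printing to explicit p commands:

```shell
printf 'foo\n' | sed -e 's/f/b/' -e 's/oo/ar/'   # both scripts applied: "bar"
printf 'one\ntwo\n' | sed -n -e '/two/p'         # only the matched line: "two"
```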
sed command syntax is defined as
[address[,address]][!]function[argument]
with whitespace being permitted before or after addresses, and between
the function character and the argument. The addresses and the
address inverter (!) are used to restrict the application of a
command to the selected line(s) of input.
Each command must be on a line of its own, except where noted in the synopses below.
The edit cycle performed on each input line consists of reading the line (without its trailing newline character) into the pattern space, applying the applicable commands of the edit script, and writing the final contents of the pattern space and a newline to the standard output. A hold space is provided for saving the contents of the pattern space for later use.
A sed address is either a line number or a pattern, which may be combined arbitrarily to construct ranges. Lines are numbered across all input files.
Any address may be followed by an exclamation mark ('!'), selecting
all lines not matching that address.
The line with the given number is selected.
A dollar sign ('$') denotes the line number of the last line of the input stream.
A pattern address is a basic regular expression (see
BASIC REGULAR EXPRESSIONS), between the delimiting character /.
Any other character except \ or newline may be used to delimit a
pattern address when the initial delimiter is prefixed with a
backslash ('\').
If no address is given, the command selects every line.
If one address is given, it selects the line (or lines) matching the address.
Two addresses select a range that begins whenever the first address matches, and ends (including that line) when the second address matches. If the first (second) address is a matching pattern, the second address is not applied to the very same line to determine the end of the range. Likewise, if the second address is a matching pattern, the first address is not applied to the very same line to determine the beginning of another range. If both addresses are line numbers, and the second line number is less than the first line number, then only the first line is selected.
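These hypothetical one-liners (any POSIX sed) illustrate the address forms described above:

```shell
printf 'a\nb\nc\nd\n' | sed -n '2p'         # line number: "b"
printf 'a\nb\nc\nd\n' | sed -n '/b/,/c/p'   # pattern range: "b" and "c"
printf 'a\nb\nc\nd\n' | sed -n '3,1p'       # second < first: line 3 only
printf 'a\nb\nc\nd\n' | sed '/c/!d'         # inverter: keep only lines matching /c/
```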
The maximum permitted number of addresses is indicated with each function synopsis below.
The argument text consists of one or more lines following the command. Embedded newlines in text must be preceded with a backslash. Other backslashes in text are deleted and the following character is taken literally.
Write text (which must start on the line following the command) to standard output immediately before reading the next line of input, either by executing the N function or by beginning a new cycle.
Branch to the : function with the specified label. If no label is given, branch to the end of the script.
The line, or range of lines, selected by the address is deleted. The text (which must start on the line following the command) is written to standard output. With an address range, this occurs at the end of the range.
Deletes the pattern space and starts the next cycle.
Deletes the pattern space through the first embedded newline or to the end. If the pattern space becomes empty, a new cycle is started, otherwise execution of the script is restarted.
Replace the contents of the pattern space with the hold space.
Append a newline and the contents of the hold space to the pattern space.
Replace the contents of the hold space with the pattern space.
Append a newline and the contents of the pattern space to the hold space.
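The hold space enables multi-line tricks. A classic sketch reverses the input lines (like tac): G prepends the accumulated lines, h saves the result, and $p prints everything once at the last line:

```shell
# 1!G: on every line but the first, append hold space to pattern space
# h:   copy pattern space to hold space
# $p:  on the last line, print the accumulated (reversed) lines
printf '1\n2\n3\n' | sed -n -e '1!G' -e 'h' -e '$p'
```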
Write the text (which must start on the line following the command) to standard output.
Print the contents of the pattern space: non-printable characters are
shown in C-style escaped form; long lines are split and have a trailing
'\' at the point of the split; the true end of a line is marked with
a '$'. Escapes are: '\a', '\t', '\n', '\f', '\r', '\e' for
BEL, HT, LF, FF, CR, ESC, respectively, and '\' followed by a three-digit
octal number for all other non-printable characters.
If automatic printing is enabled, write the pattern space to the standard output. Replace the pattern space with the next line of input. If there is no more input, processing is terminated.
Append a newline and the next line of input to the pattern space. If there is no more input, processing is terminated.
Print the pattern space to the standard output. (Use the -n option to suppress automatic printing at the end of a cycle if you want to avoid double printing of lines.)
Prints the pattern space through the first embedded newline or to the end.
Branch to the end of the script and quit without starting a new cycle.
Copy the contents of the file to standard output immediately before the next attempt to read a line of input. Any error encountered while reading file is silently ignored.
Substitute the replacement string for the first substring in
the pattern space that matches the regular expression.
Any character other than backslash or newline can be used instead of a
slash to delimit the regular expression and the replacement.
To use the delimiter as a literal character within the regular expression
and the replacement, precede the character by a backslash ('\
').
Literal newlines may be embedded in the replacement string by preceding a newline with a backslash.
Within the replacement, an ampersand ('&
') is replaced by the string
matching the regular expression. The strings '\1
' through '\9
' are
replaced by the corresponding subpattern (see BASIC REGULAR EXPRESSIONS).
To get a literal '&
' or '\
' in the replacement text, precede it
by a backslash.
The following flags modify the behaviour of the s command:
The replacement is performed for all matching, non-overlapping substrings of the pattern space.
Replace only the n-th matching substring of the pattern space.
If the substitution was made, print the new value of the pattern space.
If the substitution was made, write the new value of the pattern space to the specified file.
Branch to the : function with the specified label if any s substitutions have been made since the most recent reading of an input line or execution of a t function. If no label is given, branch to the end of the script.
The contents of the pattern space are written to the file.
Swap the contents of the pattern space and the hold space.
In the pattern space, replace all characters occurring in string1 by the character at the corresponding position in string2. It is possible to use any character (other than a backslash or newline) instead of a slash to delimit the strings. Within string1 and string2, a backslash followed by any character other than a newline is that literal character, and a backslash followed by an 'n' is replaced by a newline character.
Prints the current line number on the standard output.
The command specifies the position of the label. It has no other effect.
These two commands begin and end a command list. The first command may be given on the same line as the opening { command. The commands within the list are jointly selected by the address(es) given on the { command (but may still have individual addresses).
The entire line is ignored (treated as a comment). If, however, the first two characters in the script are '#n', automatic printing of output is suppressed, as if the -n option were given on the command line.
A Basic Regular Expression (BRE), as defined in POSIX 1003.2, consists of atoms, for matching parts of a string, and bounds, specifying repetitions of a preceding atom.
The possible atoms of a BRE are: ., matching any single character; ^ and $, matching the null string at the beginning or end of a string, respectively; a bracket expression, enclosed in [ and ] (see below); and any single character with no other significance (matching that character). A \ before one of ., ^, $, [, *, \ is an atom matching the character after the backslash. A sequence of atoms enclosed in \( and \) becomes an atom and establishes the target for a backreference, consisting of the substring that actually matches the enclosed atoms. Finally, \ followed by one of the digits 1 through 9 is a backreference.
A ^ that is not first, or a $ that is not last, does not have a special significance and need not be preceded by a backslash to become literal. The same is true for a ] that does not terminate a bracket expression.
An unescaped backslash cannot be last in a BRE.
The BRE bounds are: *, specifying 0 or more matches of the preceding atom; \{count\}, specifying that many repetitions; \{minimum,\}, giving a lower limit; and \{minimum,maximum\} finally defines a lower and upper bound.
A bound appearing as the first item in a BRE is taken literally.
A bracket expression is a list of characters, character ranges and character classes enclosed in [ and ] and matches any single character from the represented set of characters.
A character range is written as two characters separated by - and represents all characters (according to the character collating sequence) that are not less than the first and not greater than the second. (Ranges are very collating-sequence-dependent, and portable programs should avoid relying on them.)
A character class is one of the class names
- alnum digit punct
- alpha graph space
- blank lower upper
- cntrl print xdigit
enclosed in [: and :] and represents the set of characters as defined in ctype(3).
If the first character after [ is ^, the sense of matching is inverted.
To include a literal '^', place it anywhere else but first. To include a literal ']', place it first or immediately after an initial ^. To include a literal '-', make it the first (or second after ^) or last character, or the second endpoint of a range.
The special bracket expression constructs [[:<:]] and [[:>:]]
match the null string at the beginning and end of a word respectively.
(Note that neither is identical to Perl's '\b' atom.)
Since some sed implementations provide additional regular expression atoms (not defined in POSIX 1003.2), psed is capable of translating the following backslash escapes:
\> is translated to [[:>:]].
\< is translated to [[:<:]].
\w is translated to [[:alnum:]_].
\W is translated to [^[:alnum:]_].
To enable this feature, the environment variable PSEDEXTBRE must be set to a string containing the requested characters, e.g.:
- PSEDEXTBRE='<>wW'
The environment variable PSEDEXTBRE may be set to extend BREs.
See Additional Atoms.
The indicated character appears twice, with different translations.
A '[' in a BRE indicates the beginning of a bracket expression.
A '\' in a BRE is used to make the subsequent character literal.
A '\' in a substitution string is used to make the subsequent character literal.
In an s command, either the 'g' flag and an n-th occurrence flag, or multiple n-th occurrence flags are specified. Note that only the digits '1' through '9' are permitted.
The command has more than the permitted number of addresses.
The BRE and substitution may not be delimited with '\' or newline.
The specified backreference number exceeds the number of backreferences in the BRE.
The repeat clause does not contain a valid integer value, or pair of values.
The first or second string of a y command is syntactically incorrect.
There must be at least one -e or one -f option specifying a script or script file.
The translation table strings in a y command must have equal lengths.
A } command without a preceding { command was encountered.
The end of the script was reached although a text line after an a, c or i command indicated another line.
A BRE contains an unterminated bracket expression.
A BRE contains an unterminated backreference.
A BRE contains an unterminated bounds specification.
The basic material for the preceding section was generated by running the sed script
- #no autoprint
- s/^.*Warn( *"\([^"]*\)".*$/\1/
- t process
- b
- :process
- s/$!/%s/g
- s/$[_[:alnum:]]\{1,\}/%s/g
- s/\\\\/\\/g
- s/^/=item /
- p
on the program's own text, and piping the output into sort -u.
If this program is invoked with the name s2p it will act as a sed-to-Perl translator. After option processing (all other arguments are ignored), a Perl program is printed on standard output, which will process the input stream (as read from all arguments) in the way defined by the sed script and the option setting used for the translation.
perl(1), re_format(7)
The l command will show escape characters (ESC) as '\e', but a vertical tab (VT) in octal.
Trailing spaces are truncated from labels in :, t and b commands.
The meaning of an empty regular expression ('//'), as defined by sed, is "the last pattern used, at run time". This deviates from the Perl interpretation, which will re-use the "last successfully executed regular expression". Since keeping track of pattern usage would create terribly cluttered code, and differences would only appear in obscure contexts (where other sed implementations appear to deviate, too), the Perl semantics were adopted. Note that common usage of this feature, such as in /abc/s//xyz/, will work as expected.
Collating elements (of bracket expressions in BREs) are not implemented.
This sed implementation conforms to the IEEE Std1003.2-1992 ("POSIX.2") definition of sed, and is compatible with the OpenBSD implementation, except where otherwise noted (see BUGS).
This Perl implementation of sed was written by Wolfgang Laun, Wolfgang.Laun@alcatel.at.
This program is free and open software. You may use, modify, distribute, and sell this program (and any modified variants) in any way you wish, provided you do not restrict others from doing the same.
shasum - Print or Check SHA Checksums
- Usage: shasum [OPTION]... [FILE]...
- Print or check SHA checksums.
- With no FILE, or when FILE is -, read standard input.
- -a, --algorithm 1 (default), 224, 256, 384, 512, 512224, 512256
- -b, --binary read in binary mode
- -c, --check read SHA sums from the FILEs and check them
- -t, --text read in text mode (default)
- -p, --portable read in portable mode
- produces same digest on Windows/Unix/Mac
- -0, --01 read in BITS mode
- ASCII '0' interpreted as 0-bit,
- ASCII '1' interpreted as 1-bit,
- all other characters ignored
- The following two options are useful only when verifying checksums:
- -s, --status don't output anything, status code shows success
- -w, --warn warn about improperly formatted checksum lines
- -h, --help display this help and exit
- -v, --version output version information and exit
- When verifying SHA-512/224 or SHA-512/256 checksums, indicate the
- algorithm explicitly using the -a option, e.g.
- shasum -a 512224 -c checksumfile
- The sums are computed as described in FIPS-180-4. When checking, the
- input should be a former output of this program. The default mode is to
- print a line with checksum, a character indicating type (`*' for binary,
- ` ' for text, `?' for portable, `^' for BITS), and name for each FILE.
- Report shasum bugs to mshelor@cpan.org
Running shasum is often the quickest way to compute SHA message digests. The user simply feeds data to the script through files or standard input, and then collects the results from standard output.
The following command shows how to compute digests for typical inputs such as the NIST test vector "abc":
- perl -e "print qq(abc)" | shasum
Or, if you want to use SHA-256 instead of the default SHA-1, simply say:
- perl -e "print qq(abc)" | shasum -a 256
Since shasum mimics the behavior of the combined GNU sha1sum, sha224sum, sha256sum, sha384sum, and sha512sum programs, you can install this script as a convenient drop-in replacement.
Unlike the GNU programs, shasum encompasses the full SHA standard by allowing partial-byte inputs. This is accomplished through the BITS option (-0). The following example computes the SHA-224 digest of the 7-bit message 0001100:
- perl -e "print qq(0001100)" | shasum -0 -a 224
Copyright (c) 2003-2013 Mark Shelor <mshelor@cpan.org>.
shasum is implemented using the Perl module Digest::SHA or Digest::SHA::PurePerl.
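Because of this, the same digests can be computed directly from Perl; a minimal sketch using Digest::SHA (part of the Perl core since 5.9.3), shown here with the well-known NIST "abc" test vectors:

```perl
use Digest::SHA qw(sha1_hex sha256_hex);

# SHA-1 and SHA-256 of the NIST test vector "abc"
print sha1_hex("abc"), "\n";     # a9993e364706816aba3e25717850c26c9cd0d89d
print sha256_hex("abc"), "\n";   # ba7816bf8f01cfea414140de5dae2223b00361a396177a9cb410ff61f20015ad
```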
sigtrap - Perl pragma to enable simple signal handling
- use sigtrap;
- use sigtrap qw(stack-trace old-interface-signals); # equivalent
- use sigtrap qw(BUS SEGV PIPE ABRT);
- use sigtrap qw(die INT QUIT);
- use sigtrap qw(die normal-signals);
- use sigtrap qw(die untrapped normal-signals);
- use sigtrap qw(die untrapped normal-signals
- stack-trace any error-signals);
- use sigtrap 'handler' => \&my_handler, 'normal-signals';
- use sigtrap qw(handler my_handler normal-signals
- stack-trace error-signals);
The sigtrap pragma is a simple interface to installing signal
handlers. You can have it install one of two handlers supplied by
sigtrap itself (one which provides a Perl stack trace and one which
simply die()s), or alternately you can supply your own handler for it
to install. It can be told only to install a handler for signals which
are either untrapped or ignored. It has a couple of lists of signals to
trap, plus you can supply your own list of signals.
The arguments passed to the use statement which invokes sigtrap are processed in order. When a signal name or the name of one of sigtrap's signal lists is encountered, a handler is immediately installed; when an option is encountered, it affects subsequently installed handlers.
These options affect which handler will be used for subsequently installed signals.
The handler used for subsequently installed signals outputs a Perl stack trace to STDERR and then tries to dump core. This is the default signal handler.
The handler used for subsequently installed signals calls die (actually croak) with a message indicating which signal was caught.
your-handler will be used as the handler for subsequently installed signals. your-handler can be any value which is valid as an assignment to an element of %SIG. See perlvar for examples of handler functions.
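For instance, a minimal sketch (my_handler is a made-up name, not part of sigtrap):

```perl
use sigtrap 'handler' => \&my_handler, 'normal-signals';

# Signal handlers receive the signal name as their first argument.
sub my_handler {
    my ($sig) = @_;
    die "Caught SIG$sig\n";
}
```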
sigtrap has a few built-in lists of signals to trap. They are:
These are the signals which a program might normally expect to encounter and which by default cause it to terminate. They are HUP, INT, PIPE and TERM.
These signals usually indicate a serious problem with the Perl interpreter or with your script. They are ABRT, BUS, EMT, FPE, ILL, QUIT, SEGV, SYS and TRAP.
These are the signals which were trapped by default by the old sigtrap interface; they are ABRT, BUS, EMT, FPE, ILL, PIPE, QUIT, SEGV, SYS, TERM, and TRAP. If no signals or signal lists are passed to sigtrap, this list is used.
For each of these three lists, the collection of signals set to be trapped is checked before trapping; if your architecture does not implement a particular signal, it will not be trapped but rather silently ignored.
This token tells sigtrap to install handlers only for subsequently listed signals which aren't already trapped or ignored.
This token tells sigtrap to install handlers for all subsequently listed signals. This is the default behavior.
Any argument which looks like a signal name (that is, /^[A-Z][A-Z0-9]*$/) indicates that sigtrap should install a handler for that name.
Require that at least version number of sigtrap is being used.
Provide a stack trace for the old-interface-signals:
- use sigtrap;
Ditto:
- use sigtrap qw(stack-trace old-interface-signals);
Provide a stack trace on the 4 listed signals only:
- use sigtrap qw(BUS SEGV PIPE ABRT);
Die on INT or QUIT:
- use sigtrap qw(die INT QUIT);
Die on HUP, INT, PIPE or TERM:
- use sigtrap qw(die normal-signals);
Die on HUP, INT, PIPE or TERM, except don't change the behavior for signals which are already trapped or ignored:
- use sigtrap qw(die untrapped normal-signals);
Die on receipt of any of the normal-signals which are currently untrapped; provide a stack trace on receipt of any of the error-signals:
- use sigtrap qw(die untrapped normal-signals
- stack-trace any error-signals);
Install my_handler() as the handler for the normal-signals:
- use sigtrap 'handler', \&my_handler, 'normal-signals';
Install my_handler() as the handler for the normal-signals, provide a Perl stack trace on receipt of one of the error-signals:
- use sigtrap qw(handler my_handler normal-signals
- stack-trace error-signals);
sort - perl pragma to control sort() behaviour
- use sort 'stable'; # guarantee stability
- use sort '_quicksort'; # use a quicksort algorithm
- use sort '_mergesort'; # use a mergesort algorithm
- use sort 'defaults'; # revert to default behavior
- no sort 'stable'; # stability not important
- use sort '_qsort'; # alias for quicksort
- my $current;
- BEGIN {
- $current = sort::current(); # identify prevailing algorithm
- }
With the sort pragma you can control the behaviour of the builtin
sort() function.
In Perl versions 5.6 and earlier the quicksort algorithm was used to
implement sort(), but in Perl 5.8 a mergesort algorithm was also made
available, mainly to guarantee worst case O(N log N) behaviour:
the worst case of quicksort is O(N**2). In Perl 5.8 and later,
quicksort defends against quadratic behaviour by shuffling large
arrays before sorting.
A stable sort means that for records that compare equal, the original input ordering is preserved. Mergesort is stable, quicksort is not. Stability will matter only if elements that compare equal can be distinguished in some other way. That means that simple numerical and lexical sorts do not profit from stability, since equal elements are indistinguishable. However, with a comparison such as substr($a, 0, 3) cmp substr($b, 0, 3), stability might matter because elements that compare equal on the first 3 characters may be distinguished based on subsequent characters. In Perl 5.8 and later, quicksort can be stabilized, but doing so will add overhead, so it should only be done if it matters.
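A short illustration of why stability matters here (a sketch, not from the original document):

```perl
use sort 'stable';

my @words  = qw(apple banana applet applesauce);
# All three "apple*" entries compare equal on their first 3 characters;
# a stable sort keeps them in their original relative order.
my @sorted = sort { substr($a, 0, 3) cmp substr($b, 0, 3) } @words;
# @sorted is (apple, applet, applesauce, banana)
```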
The best algorithm depends on many things. On average, mergesort
does fewer comparisons than quicksort, so it may be better when
complicated comparison routines are used. Mergesort also takes
advantage of pre-existing order, so it would be favored for using
sort() to merge several sorted arrays. On the other hand, quicksort
is often faster for small arrays, and on arrays of a few distinct
values, repeated many times. You can force the
choice of algorithm with this pragma, but this feels heavy-handed,
so the subpragmas beginning with a _ may not persist beyond Perl 5.8.
The default algorithm is mergesort, which will be stable even if
you do not explicitly demand it.
But the stability of the default sort is a side-effect that could change in later versions. If stability is important, be sure to say so with a
- use sort 'stable';
The no sort pragma doesn't forbid what follows, it just leaves the choice open. Thus, after
- no sort '_mergesort';
a mergesort, which happens to be stable, will be employed anyway. Note that
- no sort '_quicksort';
- no sort '_mergesort';
have exactly the same effect, leaving the choice of sort algorithm open.
As of Perl 5.10, this pragma is lexically scoped and takes effect
at compile time. In earlier versions its effect was global and took
effect at run-time; the documentation suggested using eval() to
change the behaviour:
- { eval 'use sort qw(defaults _quicksort)'; # force quicksort
- eval 'no sort "stable"'; # stability not wanted
- print sort::current . "\n";
- @a = sort @b;
- eval 'use sort "defaults"'; # clean up, for others
- }
- { eval 'use sort qw(defaults stable)'; # force stability
- print sort::current . "\n";
- @c = sort @d;
- eval 'use sort "defaults"'; # clean up, for others
- }
Such code no longer has the desired effect, for two reasons.
Firstly, the use of eval() means that the sorting algorithm
is not changed until runtime, by which time it's too late to
have any effect. Secondly, sort::current is also called at
run-time, when in fact the compile-time value of sort::current
is the one that matters.
So now this code would be written:
- { use sort qw(defaults _quicksort); # force quicksort
- no sort "stable"; # stability not wanted
- my $current;
- BEGIN { $current = sort::current; }
- print "$current\n";
- @a = sort @b;
- # Pragmas go out of scope at the end of the block
- }
- { use sort qw(defaults stable); # force stability
- my $current;
- BEGIN { $current = sort::current; }
- print "$current\n";
- @c = sort @d;
- }
diagnostics, splain - produce verbose warning diagnostics
Using the diagnostics pragma:
- use diagnostics;
Using the splain standalone filter program:
- perl program 2>diag.out
- splain [-v] [-p] diag.out
Using diagnostics to get stack traces from a misbehaving script:
- perl -Mdiagnostics=-traceonly my_script.pl
The diagnostics Pragma
This module extends the terse diagnostics normally emitted by both the perl compiler and the perl interpreter (from running perl with a -w switch or use warnings), augmenting them with the more explicative and endearing descriptions found in perldiag. Like the other pragmata, it affects the compilation phase of your program rather than merely the execution phase.
To use in your program as a pragma, merely invoke
- use diagnostics;
at the start (or near the start) of your program. (Note that this does enable perl's -w flag.) Your whole compilation will then be subject(ed :-) to the enhanced diagnostics. These still go out STDERR.
Due to the interaction between runtime and compiletime issues, and because it's probably not a very good idea anyway, you may not use no diagnostics to turn them off at compiletime. However, you may control their behaviour at runtime using the disable() and enable() methods to turn them off and on respectively.
The -verbose flag first prints out the perldiag introduction before any other diagnostics. The $diagnostics::PRETTY variable can generate nicer escape sequences for pagers.
Warnings dispatched from perl itself (or more accurately, those that match descriptions found in perldiag) are only displayed once (no duplicate descriptions). User code generated warnings a la warn() are unaffected, allowing duplicate user messages to be displayed.
This module also adds a stack trace to the error message when perl dies. This is useful for pinpointing what caused the death. The -traceonly (or just -t) flag turns off the explanations of warning messages, leaving just the stack traces. So if your script is dying, run it again with
- perl -Mdiagnostics=-traceonly my_bad_script
to see the call stack at the time of death. By supplying the -warntrace (or just -w) flag, any warnings emitted will also come with a stack trace.
While apparently a whole nuther program, splain is actually nothing more than a link to the (executable) diagnostics.pm module, as well as a link to the diagnostics.pod documentation. The -v flag is like the use diagnostics -verbose directive. The -p flag is like the $diagnostics::PRETTY variable. Since you're post-processing with splain, there's no sense in being able to enable() or disable() processing. Output from splain is directed to STDOUT, unlike the pragma.
The following file is certain to trigger a few errors at both runtime and compiletime:
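Something along these lines would do (a reconstructed illustration, not necessarily the exact sample file shipped with the module):

```perl
# test.pl
use diagnostics;                # also turns on -w
print NOWHERE "nothing\n";      # warns: print on unopened filehandle
my $c, $d = 3;                  # compile-time warning: parentheses missing around "my" list
my ($x, $y) = (1, 0);
print $x / $y;                  # runtime error: illegal division by zero
```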
If you prefer to run your program first and look at its problem afterwards, do this:
- perl -w test.pl 2>test.out
- ./splain < test.out
Note that this is not in general possible in shells of more dubious heritage, as the theoretical
- (perl -w test.pl >/dev/tty) >& test.out
- ./splain < test.out
does not work as hoped, because you just moved the existing stdout to somewhere else.
If you don't want to modify your source code, but still have on-the-fly warnings, do this:
- exec 3>&1; perl -w test.pl 2>&1 1>&3 3>&- | splain 1>&2 3>&-
Nifty, eh?
If you want to control warnings on the fly, do something like this.
Make sure you do the use first, or you won't be able to get
at the enable() or disable() methods.
- use diagnostics; # checks entire compilation phase
- print "\ntime for 1st bogus diags: SQUAWKINGS\n";
- print BOGUS1 'nada';
- print "done with 1st bogus\n";
- disable diagnostics; # only turns off runtime warnings
- print "\ntime for 2nd bogus: (squelched)\n";
- print BOGUS2 'nada';
- print "done with 2nd bogus\n";
- enable diagnostics; # turns back on runtime warnings
- print "\ntime for 3rd bogus: SQUAWKINGS\n";
- print BOGUS3 'nada';
- print "done with 3rd bogus\n";
- disable diagnostics;
- print "\ntime for 4th bogus: (squelched)\n";
- print BOGUS4 'nada';
- print "done with 4th bogus\n";
Diagnostic messages derive from the perldiag.pod file when available at runtime. Otherwise, they may be embedded in the file itself when the splain package is built. See the Makefile for details.
If an extant $SIG{__WARN__} handler is discovered, it will continue to be honored, but only after the diagnostics::splainthis() function (the module's $SIG{__WARN__} interceptor) has had its way with your warnings.
There is a $diagnostics::DEBUG variable you may set if you're desperately curious what sorts of things are being intercepted.
- BEGIN { $diagnostics::DEBUG = 1 }
Not being able to say "no diagnostics" is annoying, but may not be insurmountable.
The -pretty directive is called too late to affect matters. You have to do this instead, and before you load the module.
- BEGIN { $diagnostics::PRETTY = 1 }
I could start up faster by delaying compilation until it should be needed, but this gets a "panic: top_level" when using the pragma form in Perl 5.001e.
While it's true that this documentation is somewhat subserious, if you use a program named splain, you should expect a bit of whimsy.
Tom Christiansen <tchrist@mox.perl.com>, 25 June 1995.
strict - Perl pragma to restrict unsafe constructs
If no import list is supplied, all possible restrictions are assumed. (This is the safest mode to operate in, but is sometimes too strict for casual programming.) Currently, there are three possible things to be strict about: "subs", "vars", and "refs".
strict refs
This generates a runtime error if you use symbolic references (see perlref).
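For example (a sketch in the spirit of the module's own documentation):

```perl
use strict 'refs';

my $ref = \"foo";
print $$ref;        # ok, a real (hard) reference

my $name = "foo";
print $$name;       # runtime error: can't use string as a SCALAR ref
```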
There is one exception to this rule:
- $bar = \&{'foo'};
- &$bar;
is allowed so that goto &$AUTOLOAD would not break under stricture.
strict vars
This generates a compile-time error if you access a variable that was neither explicitly declared (using any of my, our, state, or use vars) nor fully qualified. (Because this is to avoid variable suicide problems and subtle dynamic scoping issues, a merely local variable isn't good enough.) See my, our, state, local, and vars.
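For example (a sketch in the spirit of the module's own documentation; $baz is an arbitrary name):

```perl
use strict 'vars';

$X::foo = 1;        # ok, fully qualified
my $foo = 10;       # ok, my() var
local $baz = 9;     # blows up at compile time
```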
The local() generated a compile-time error because you just touched a global name without fully qualifying it.
Because of their special use by sort(), the variables $a and $b are exempted from this check.
strict subs
This disables the poetry optimization, generating a compile-time error if you try to use a bareword identifier that's not a subroutine, unless it is a simple identifier (no colons) and it appears in curly braces or on the left hand side of the => symbol.
- use strict 'subs';
- $SIG{PIPE} = Plumber; # blows up
- $SIG{PIPE} = "Plumber"; # just fine: quoted string is always ok
- $SIG{PIPE} = \&Plumber; # preferred form
See Pragmatic Modules in perlmodlib.
With Perl 5.6.1, strict 'subs' erroneously permitted the use of an unquoted compound identifier (e.g. Foo::Bar) as a hash key (before => or inside curlies), without forcing it to be a literal string.
Starting with Perl 5.8.1 strict is strict about its restrictions: if unknown restrictions are used, the strict pragma will abort with
- Unknown 'strict' tag(s) '...'
As of version 1.04 (Perl 5.10), strict verifies that it is used as "strict" to avoid the dreaded Strict trap on case insensitive file systems.
subs - Perl pragma to predeclare sub names
- use subs qw(frob);
- frob 3..10;
This will predeclare all the subroutines whose names are in the list, allowing you to use them without parentheses even before they're declared.
Unlike pragmas that affect the $^H hints variable, the use vars and use subs declarations are not BLOCK-scoped. They are thus effective for the entire package in which they appear. You may not rescind such declarations with no vars or no subs.
See Pragmatic Modules in perlmodlib and strict subs in strict.
threads - Perl interpreter-based threads
This document describes threads version 1.86
- use threads ('yield',
- 'stack_size' => 64*4096,
- 'exit' => 'threads_only',
- 'stringify');
- sub start_thread {
- my @args = @_;
- print('Thread started: ', join(' ', @args), "\n");
- }
- my $thr = threads->create('start_thread', 'argument');
- $thr->join();
- threads->create(sub { print("I am a thread\n"); })->join();
- my $thr2 = async { foreach (@files) { ... } };
- $thr2->join();
- if (my $err = $thr2->error()) {
- warn("Thread error: $err\n");
- }
- # Invoke thread in list context (implicit) so it can return a list
- my ($thr) = threads->create(sub { return (qw/a b c/); });
- # or specify list context explicitly
- my $thr = threads->create({'context' => 'list'},
- sub { return (qw/a b c/); });
- my @results = $thr->join();
- $thr->detach();
- # Get a thread's object
- $thr = threads->self();
- $thr = threads->object($tid);
- # Get a thread's ID
- $tid = threads->tid();
- $tid = $thr->tid();
- $tid = "$thr";
- # Give other threads a chance to run
- threads->yield();
- yield();
- # Lists of non-detached threads
- my @threads = threads->list();
- my $thread_count = threads->list();
- my @running = threads->list(threads::running);
- my @joinable = threads->list(threads::joinable);
- # Test thread objects
- if ($thr1 == $thr2) {
- ...
- }
- # Manage thread stack size
- $stack_size = threads->get_stack_size();
- $old_size = threads->set_stack_size(32*4096);
- # Create a thread with a specific context and stack size
- my $thr = threads->create({ 'context' => 'list',
- 'stack_size' => 32*4096,
- 'exit' => 'thread_only' },
- \&foo);
- # Get thread's context
- my $wantarray = $thr->wantarray();
- # Check thread's state
- if ($thr->is_running()) {
- sleep(1);
- }
- if ($thr->is_joinable()) {
- $thr->join();
- }
- # Send a signal to a thread
- $thr->kill('SIGUSR1');
- # Exit a thread
- threads->exit();
Since Perl 5.8, thread programming has been available using a model called interpreter threads which provides a new Perl interpreter for each thread, and, by default, results in no data or state information being shared between threads.
(Prior to Perl 5.8, 5005threads was available through the Thread.pm API. This threading model has been deprecated, and was removed as of Perl 5.10.0.)
As just mentioned, all variables are, by default, thread local. To use shared variables, you need to also load threads::shared:
When loading threads::shared, you must use threads before you use threads::shared. (threads will emit a warning if you do it the other way around.)
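A minimal sketch of the required ordering:

```perl
use threads;                 # must come first
use threads::shared;         # then the sharing primitives

my $counter :shared = 0;     # a variable visible to all threads
```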
It is strongly recommended that you enable threads via use threads as early as possible in your script.
If needed, scripts can be written so as to run on both threaded and non-threaded Perls:
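One common idiom for this (a sketch; the variable name is illustrative):

```perl
# Probe for ithreads support at run time.
my $can_use_threads = eval 'use threads; 1';
if ($can_use_threads) {
    # Do processing using threads
}
else {
    # Do it without using threads
}
```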
This will create a new thread that will begin execution with the specified
entry point function, and give it the ARGS list as parameters. It will
return the corresponding threads object, or undef if thread creation failed.
FUNCTION may either be the name of a function, an anonymous subroutine, or a code ref.
The ->new() method is an alias for ->create().
This will wait for the corresponding thread to complete its execution. When the thread finishes, ->join() will return the return value(s) of the entry point function. The context (void, scalar or list) for the return value(s) of ->join() is determined at the time of thread creation.
- # Create thread in list context (implicit)
- my ($thr1) = threads->create(sub {
- my @results = qw(a b c);
- return (@results);
- });
- # or (explicit)
- my $thr1 = threads->create({'context' => 'list'},
- sub {
- my @results = qw(a b c);
- return (@results);
- });
- # Retrieve list results from thread
- my @res1 = $thr1->join();
- # Create thread in scalar context (implicit)
- my $thr2 = threads->create(sub {
- my $result = 42;
- return ($result);
- });
- # Retrieve scalar result from thread
- my $res2 = $thr2->join();
- # Create a thread in void context (explicit)
- my $thr3 = threads->create({'void' => 1},
- sub { print("Hello, world\n"); });
- # Join the thread in void context (i.e., no return value)
- $thr3->join();
See THREAD CONTEXT for more details.
If the program exits without all threads having either been joined or detached, then a warning will be issued.
Calling ->join() or ->detach() on an already joined thread will cause an error to be thrown.
Makes the thread unjoinable, and causes any eventual return value to be discarded. When the program exits, any detached threads that are still running are silently terminated.
If the program exits without all threads having either been joined or detached, then a warning will be issued.
Calling ->join() or ->detach() on an already detached thread will cause an error to be thrown.
Class method that allows a thread to detach itself.
Class method that allows a thread to obtain its own threads object.
Returns the ID of the thread. Thread IDs are unique integers with the main thread in a program being 0, and incrementing by 1 for every thread created.
Class method that allows a thread to obtain its own ID.
If you add the stringify import option to your use threads declaration, then using a threads object in a string or a string context (e.g., as a hash key) will cause its ID to be used as the value:
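For example (illustrative):

```perl
use threads qw(stringify);

my $thr = threads->create(sub { return 42; });
print "Thread $thr started\n";   # interpolates the thread's ID, e.g., "Thread 1 started"
$thr->join();
```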
This will return the threads object for the active thread associated with the specified thread ID. If $tid is the value for the current thread, then this call works the same as ->self(). Otherwise, it returns undef if there is no thread associated with the TID, if the thread is joined or detached, if no TID is specified, or if the specified TID is undef.
This is a suggestion to the OS to let this thread yield CPU time to other threads. What actually happens is highly dependent upon the underlying thread implementation.
You may do use threads qw(yield), and then just use yield() in your code.
With no arguments (or using threads::all) and in a list context, returns a list of all non-joined, non-detached threads objects. In a scalar context, returns a count of the same.
With a true argument (using threads::running), returns a list of all non-joined, non-detached threads objects that are still running.
With a false argument (using threads::joinable), returns a list of all non-joined, non-detached threads objects that have finished running (i.e., for which ->join() will not block).
Tests if two threads objects are the same thread or not. This is overloaded to the more natural forms:
(Thread comparison is based on thread IDs.)
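For instance (a sketch assuming a threads-enabled Perl):

```perl
use threads;

my $thr1 = threads->create(sub { 1 });
my $thr2 = threads->create(sub { 1 });

my $same      = ($thr1 == $thr1);   # true: same thread ID
my $different = ($thr1 != $thr2);   # true: different thread IDs

$_->join() for ($thr1, $thr2);
```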
async creates a thread to execute the block immediately following it. This block is treated as an anonymous subroutine, and so must have a semicolon after the closing brace. Like threads->create(), async returns a threads object.
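A short sketch (the computation in the block is, of course, arbitrary):

```perl
use threads;

# The block is treated as an anonymous subroutine, so the
# trailing semicolon after the closing brace is required.
my $thr = async {
    return 6 * 7;
};

my $result = $thr->join();   # 42 (scalar context at creation)
```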
Threads are executed in an eval context. This method will return undef if the thread terminates normally. Otherwise, it returns the value of $@ associated with the thread's execution status in its eval context.
This private method returns the memory location of the internal thread structure associated with a threads object. For Win32, this is a pointer to the HANDLE value returned by CreateThread (i.e., HANDLE *); for other platforms, it is a pointer to the pthread_t structure used in the pthread_create call (i.e., pthread_t *).
This method is of no use for general Perl threads programming. Its intent is to provide other (XS-based) thread modules with the capability to access, and possibly manipulate, the underlying thread structure associated with a Perl thread.
Class method that allows a thread to obtain its own handle.
The usual method for terminating a thread is to return from the entry point function with the appropriate return value(s).
If needed, a thread can be exited at any time by calling threads->exit(). This will cause the thread to return undef in a scalar context, or the empty list in a list context.
When called from the main thread, this behaves the same as exit(0).
When called from a thread, this behaves like threads->exit() (i.e., the exit status code is ignored).
When called from the main thread, this behaves the same as exit(status).
Calling die() in a thread indicates an abnormal exit for the thread. Any $SIG{__DIE__} handler in the thread will be called first, and then the thread will exit with a warning message that will contain any arguments passed in the die() call.
Calling exit() inside a thread causes the whole application to terminate. Because of this, the use of exit() inside threaded code, or in modules that might be used in threaded applications, is strongly discouraged.
If exit() really is needed, then consider using the following:
- threads->exit() if threads->can('exit'); # Thread friendly
- exit(status);
This globally overrides the default behavior of calling exit() inside a thread, and effectively causes such calls to behave the same as threads->exit(). In other words, with this setting, calling exit() causes only the thread to terminate.
Because of its global effect, this setting should not be used inside modules or the like.
The main thread is unaffected by this setting.
This overrides the default behavior of exit() inside the newly created
thread only.
This can be used to change the exit thread only behavior for a thread after
it has been created. With a true argument, exit() will cause only the
thread to exit. With a false argument, exit() will terminate the
application.
The main thread is unaffected by this call.
Class method for use inside a thread to change its own behavior for exit().
The main thread is unaffected by this call.
The following boolean methods are useful in determining the state of a thread.
Returns true if a thread is still running (i.e., if its entry point function has not yet finished or exited).
Returns true if the thread has finished running, is not detached, and has not yet been joined. In other words, the thread is ready to be joined, and a call to $thr->join() will not block.
Returns true if the thread has been detached.
Class method that allows a thread to determine whether or not it is detached.
As with subroutines, the type of value returned from a thread's entry point function may be determined by the thread's context: list, scalar or void. The thread's context is determined at thread creation. This is necessary so that the context is available to the entry point function via wantarray. The thread may then specify a value of the appropriate type to be returned from ->join().
Because thread creation and thread joining may occur in different contexts, it may be desirable to state the context explicitly to the thread's entry point function. This may be done by calling ->create() with a hash reference as the first argument:
- my $thr = threads->create({'context' => 'list'}, \&foo);
- ...
- my @results = $thr->join();
In the above, the threads object is returned to the parent thread in scalar context, and the thread's entry point function foo will be called in list (array) context such that the parent thread can receive a list (array) from the ->join() call. ('array' is synonymous with 'list'.)
Similarly, if you need the threads object, but your thread will not be returning a value (i.e., void context), you would do the following:
- my $thr = threads->create({'context' => 'void'}, \&foo);
- ...
- $thr->join();
The context type may also be used as the key in the hash reference followed by a true value:
- threads->create({'scalar' => 1}, \&foo);
If not explicitly stated, the thread's context is implied from the context of the ->create() call:
- my ($thr) = threads->create(...); # List context
- my $thr = threads->create(...); # Scalar context
- threads->create(...); # Void context
This returns the thread's context in the same manner as wantarray.
Class method to return the current thread's context. This returns the same value as running wantarray inside the current thread's entry point function.
The default per-thread stack size for different platforms varies significantly, and is almost always far more than is needed for most applications. On Win32, Perl's makefile explicitly sets the default stack to 16 MB; on most other platforms, the system default is used, which again may be much larger than is needed.
By tuning the stack size to more accurately reflect your application's needs, you may significantly reduce your application's memory usage, and increase the number of simultaneously running threads.
Note that on Windows, address space allocation granularity is 64 KB; therefore, setting the stack smaller than that on Win32 Perl will not save any more memory.
Returns the current default per-thread stack size. The default is zero, which means the system default stack size is currently in use.
Returns the stack size for a particular thread. A return value of zero indicates the system default stack size was used for the thread.
Sets a new default per-thread stack size, and returns the previous setting.
Some platforms have a minimum thread stack size. Trying to set the stack size below this value will result in a warning, and the minimum stack size will be used.
Some Linux platforms have a maximum stack size. Setting too large of a stack size will cause thread creation to fail.
If needed, $new_size will be rounded up to the next multiple of the memory page size (usually 4096 or 8192).
Threads created after the stack size is set will then either call pthread_attr_setstacksize() (for pthreads platforms), or supply the stack size to CreateThread() (for Win32 Perl).
(Obviously, this call does not affect any currently extant threads.)
This sets the default per-thread stack size at the start of the application.
The default per-thread stack size may be set at the start of the application through the use of the environment variable PERL5_ITHREADS_STACK_SIZE:
- PERL5_ITHREADS_STACK_SIZE=1048576
- export PERL5_ITHREADS_STACK_SIZE
- perl -e'use threads; print(threads->get_stack_size(), "\n")'
This value overrides any stack_size parameter given to use threads. Its primary purpose is to permit setting the per-thread stack size for legacy threaded applications.
To specify a particular stack size for any individual thread, call ->create() with a hash reference as the first argument:
- my $thr = threads->create({'stack_size' => 32*4096}, \&foo, @args);
- my $thr2 = $thr1->create(\&foo);
This creates a new thread ($thr2) that inherits the stack size from an existing thread ($thr1). It is shorthand for the following:
- my $stack_size = $thr1->get_stack_size();
- my $thr2 = threads->create({'stack_size' => $stack_size}, \&foo);
When safe signals is in effect (the default behavior - see Unsafe signals for more details), then signals may be sent and acted upon by individual threads.
Sends the specified signal to the thread. Signal names and (positive) signal numbers are the same as those supported by kill(). For example, 'SIGTERM', 'TERM' and (depending on the OS) 15 are all valid arguments to ->kill().
Returns the thread object to allow for method chaining:
- $thr->kill('SIG...')->join();
Signal handlers need to be set up in the threads for the signals they are expected to act upon. Here's an example for cancelling a thread:
- use threads;
- sub thr_func
- {
- # Thread 'cancellation' signal handler
- $SIG{'KILL'} = sub { threads->exit(); };
- ...
- }
- # Create a thread
- my $thr = threads->create('thr_func');
- ...
- # Signal the thread to terminate, and then detach
- # it so that it will get cleaned up automatically
- $thr->kill('KILL')->detach();
Here's another simplistic example that illustrates the use of thread signalling in conjunction with a semaphore to provide rudimentary suspend and resume capabilities:
- use threads;
- use Thread::Semaphore;
- sub thr_func
- {
- my $sema = shift;
- # Thread 'suspend/resume' signal handler
- $SIG{'STOP'} = sub {
- $sema->down(); # Thread suspended
- $sema->up(); # Thread resumes
- };
- ...
- }
- # Create a semaphore and pass it to a thread
- my $sema = Thread::Semaphore->new();
- my $thr = threads->create('thr_func', $sema);
- # Suspend the thread
- $sema->down();
- $thr->kill('STOP');
- ...
- # Allow the thread to continue
- $sema->up();
CAVEAT: The thread signalling capability provided by this module does not actually send signals via the OS. It emulates signals at the Perl level such that signal handlers are called in the appropriate thread. For example, sending $thr->kill('STOP') does not actually suspend a thread (or the whole process), but does cause a $SIG{'STOP'} handler to be called in that thread (as illustrated above).
As such, signals that would normally not be appropriate to use in the kill() command (e.g., kill('KILL', $$)) are okay to use with the ->kill() method (again, as illustrated above).
Correspondingly, sending a signal to a thread does not disrupt the operation the thread is currently working on: the signal will be acted upon after the current operation has completed. For instance, if the thread is stuck on an I/O call, sending it a signal will not cause the I/O call to be interrupted such that the signal is acted upon immediately.
Sending a signal to a terminated thread is ignored.
If the program exits without all threads having either been joined or detached, then this warning will be issued.
NOTE: If the main thread exits, then this warning cannot be suppressed using no warnings 'threads'; as suggested below.
See the appropriate man page for pthread_create to determine the actual cause for the failure.
A thread terminated in some manner other than just returning from its entry point function, or by using threads->exit(). For example, the thread may have terminated because of an error, or by using die.
Some platforms have a minimum thread stack size. Trying to set the stack size below this value will result in the above warning, and the stack size will be set to the minimum.
The specified SIZE exceeds the system's maximum stack size. Use a smaller value for the stack size.
If needed, thread warnings can be suppressed by using:
- no warnings 'threads';
in the appropriate scope.
The particular copy of Perl that you're trying to use was not built using the useithreads configuration option.
Having threads support requires all of Perl and all of the XS modules in the Perl installation to be rebuilt; it is not just a question of adding the threads module (i.e., threaded and non-threaded Perls are binary incompatible.)
The stack size of currently extant threads cannot be changed, therefore, the following results in the above error:
- $thr->set_stack_size($size);
Safe signals must be in effect to use the ->kill() signalling method.
See Unsafe signals for more details.
The particular copy of Perl that you're trying to use does not support the specified signal being used in a ->kill() call.
Before you consider posting a bug report, please consult, and possibly post a message to, the discussion forum to see if what you've encountered is a known problem.
See Making your module threadsafe in perlmod when creating modules that may be used in threaded applications, especially if those modules use non-Perl data, or XS code.
Unfortunately, you may encounter Perl modules that are not thread-safe. For example, they may crash the Perl interpreter during execution, or may dump core on termination. Depending on the module and the requirements of your application, it may be possible to work around such difficulties.
If the module will only be used inside a thread, you can try loading the module from inside the thread entry point function using require (and import if needed):
- sub thr_func
- {
- require Unsafe::Module;
- # Unsafe::Module->import(...);
- ...
- }
If the module is needed inside the main thread, try modifying your application so that the module is loaded (again using require and ->import()) after any threads are started, and in such a way that no other threads are started afterwards.
If the above does not work, or is not adequate for your application, then file a bug report on http://rt.cpan.org/Public/ against the problematic module.
On most systems, frequent and continual creation and destruction of threads can lead to ever-increasing growth in the memory footprint of the Perl interpreter. While it is simple to just launch threads and then ->join() or ->detach() them, for long-lived applications, it is better to maintain a pool of threads, and to reuse them for the work needed, using queues to notify threads of pending work. The CPAN distribution of this module contains a simple example (examples/pool_reuse.pl) illustrating the creation, use and monitoring of a pool of reusable threads.
On all platforms except MSWin32, the setting for the current working directory
is shared among all threads such that changing it in one thread (e.g., using
chdir()) will affect all the threads in the application.
On MSWin32, each thread maintains its own current working directory setting.
Currently, on all platforms except MSWin32, all system calls (e.g., using system() or back-ticks) made from threads use the environment variable settings from the main thread. In other words, changes made to %ENV in a thread will not be visible in system calls made by that thread.
To work around this, set environment variables as part of the system call. For example:
- my $msg = 'hello';
- system("FOO=$msg; echo \$FOO"); # Outputs 'hello' to STDOUT
On MSWin32, each thread maintains its own set of environment variables.
Signals are caught by the main thread (thread ID = 0) of a script. Therefore, setting up signal handlers in threads for purposes other than THREAD SIGNALLING as documented above will not accomplish what is intended.
This is especially true if trying to catch SIGALRM in a thread. To handle alarms in threads, set up a signal handler in the main thread, and then use THREAD SIGNALLING to relay the signal to the thread:
- # Create thread with a task that may time out
- my $thr = threads->create(sub {
- threads->yield();
- eval {
- $SIG{ALRM} = sub { die("Timeout\n"); };
- alarm(10);
- ... # Do work here
- alarm(0);
- };
- if ($@ =~ /Timeout/) {
- warn("Task in thread timed out\n");
- }
- });
- # Set signal handler to relay SIGALRM to thread
- $SIG{ALRM} = sub { $thr->kill('ALRM') };
- ... # Main thread continues working
On some platforms, it might not be possible to destroy parent threads while there are still existing child threads.
Creating threads inside BEGIN, CHECK or INIT blocks should not be relied upon. Depending on the Perl version and the application code, results may range from success, to (apparently harmless) warnings of leaked scalars, or all the way up to crashing of the Perl interpreter.
Since Perl 5.8.0, signals have been made safer in Perl by postponing their handling until the interpreter is in a safe state. See Safe Signals in perl58delta and Deferred Signals (Safe Signals) in perlipc for more details.
Safe signals is the default behavior, and the old, immediate, unsafe signalling behavior is only in effect in the following situations:
- Perl was built with PERL_OLD_SIGNALS (see perl -V).
- The environment variable PERL_SIGNALS is set to unsafe (see PERL_SIGNALS in perlrun).
If unsafe signals is in effect, then signal handling is not thread-safe, and the ->kill() signalling method cannot be used.
Returning closures from threads should not be relied upon. Depending on the Perl version and the application code, results may range from success, to (apparently harmless) warnings of leaked scalars, or all the way up to crashing of the Perl interpreter.
Returning objects from threads does not work. Depending on the classes involved, you may be able to work around this by returning a serialized version of the object (e.g., using Data::Dumper or Storable), and then reconstituting it in the joining thread. If you're using Perl 5.10.0 or later, and if the class supports shared objects, you can pass them via shared queues.
It is possible to add END blocks to threads by using require or eval with the appropriate code. These END blocks will then be executed when the thread's interpreter is destroyed (i.e., either during a ->join() call, or at program termination).
However, calling any threads methods in such an END block will most likely fail (e.g., the application may hang, or generate an error) due to mutexes that are needed to control functionality within the threads module. For this reason, the use of END blocks in threads is strongly discouraged.
In Perl 5.14 and higher, on systems other than Windows that do not support the fchdir C function, directory handles (see opendir) will not be copied to new threads. You can use the d_fchdir variable in Config.pm to determine whether your system supports it.
In prior perl versions, spawning threads with open directory handles would crash the interpreter. [perl #75154]
Support for threads extends beyond the code in this module (i.e., threads.pm and threads.xs), and into the Perl interpreter itself. Older versions of Perl contain bugs that may manifest themselves despite using the latest version of threads from CPAN. There is no workaround for this other than upgrading to the latest version of Perl.
Even with the latest version of Perl, it is known that certain constructs with threads may result in warning messages concerning leaked scalars or unreferenced scalars. However, such warnings are harmless, and may safely be ignored.
You can search for threads related bug reports at http://rt.cpan.org/Public/. If needed submit any new bugs, problems, patches, etc. to: http://rt.cpan.org/Public/Dist/Display.html?Name=threads
Perl 5.8.0 or later
threads Discussion Forum on CPAN: http://www.cpanforum.com/dist/threads
http://www.perl.com/pub/a/2002/06/11/threads.html and http://www.perl.com/pub/a/2002/09/04/threads.html
Perl threads mailing list: http://lists.perl.org/list/ithreads.html
Stack size discussion: http://www.perlmonks.org/?node_id=532956
Artur Bergman <sky AT crucially DOT net>
CPAN version produced by Jerry D. Hedden <jdhedden AT cpan DOT org>
threads is released under the same license as Perl.
Richard Soderberg <perl AT crystalflame DOT net> - Helping me out tons, trying to find reasons for races and other weird bugs!
Simon Cozens <simon AT brecon DOT co DOT uk> - Being there to answer zillions of annoying questions
Rocco Caputo <troc AT netrus DOT net>
Vipul Ved Prakash <mail AT vipul DOT net> - Helping with debugging
Dean Arnold <darnold AT presicient DOT com> - Stack size API
utf8 - Perl pragma to enable/disable UTF-8 (or UTF-EBCDIC) in source code
- use utf8;
- no utf8;
- # Convert the internal representation of a Perl scalar to/from UTF-8.
- $num_octets = utf8::upgrade($string);
- $success = utf8::downgrade($string[, FAIL_OK]);
- # Change each character of a Perl scalar to/from a series of
- # characters that represent the UTF-8 bytes of each original character.
- utf8::encode($string); # "\x{100}" becomes "\xc4\x80"
- utf8::decode($string); # "\xc4\x80" becomes "\x{100}"
- $flag = utf8::is_utf8(STRING); # since Perl 5.8.1
- $flag = utf8::valid(STRING);
The use utf8 pragma tells the Perl parser to allow UTF-8 in the program text in the current lexical scope (allow UTF-EBCDIC on EBCDIC-based platforms). The no utf8 pragma tells Perl to switch back to treating the source text as literal bytes in the current lexical scope.
Do not use this pragma for anything other than telling Perl that your script is written in UTF-8. The utility functions described below are directly usable without use utf8;.
Because it is not possible to reliably tell UTF-8 from native 8-bit encodings, you need either a Byte Order Mark at the beginning of your source code, or use utf8;, to instruct Perl.
When UTF-8 becomes the standard source format, this pragma will effectively become a no-op. For convenience in what follows the term UTF-X is used to refer to UTF-8 on ASCII and ISO Latin based platforms and UTF-EBCDIC on EBCDIC based platforms.
See also the effects of the -C switch and its cousin, the $ENV{PERL_UNICODE} environment variable, in perlrun.
Enabling the utf8 pragma has the following effect:
Bytes in the source text that have their high-bit set will be treated as being part of a literal UTF-X sequence. This includes most literals such as identifier names, string constants, and constant regular expression patterns.
On EBCDIC platforms characters in the Latin 1 character set are treated as being part of a literal UTF-EBCDIC character.
Note that if you have bytes with the eighth bit on in your script (for example embedded Latin-1 in your string literals), use utf8 will be unhappy since the bytes are most probably not well-formed UTF-X. If you want to have such bytes under use utf8, you can disable this pragma until the end of the block (or file, if at top level) with no utf8;.
The following functions are defined in the utf8:: package by the Perl core. You do not need to say use utf8 to use these, and in fact you should not say that unless you really want to have UTF-8 source code.
Converts in-place the internal representation of the string from an octet sequence in the native encoding (Latin-1 or EBCDIC) to UTF-X. The logical character sequence itself is unchanged. If $string is already stored as UTF-X, then this is a no-op. Returns the number of octets necessary to represent the string as UTF-X. Can be used to make sure that the UTF-8 flag is on, so that \w or lc() work as Unicode on strings containing characters in the range 0x80-0xFF (on ASCII and derivatives).
Note that this function does not handle arbitrary encodings; for general-purpose encoding conversions, the Encode module is recommended.
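For example (the particular code point is illustrative):

```perl
my $str = "\xDF";                  # LATIN SMALL LETTER SHARP S, one octet natively

my $octets = utf8::upgrade($str);  # internal representation is now UTF-8 (2 octets)

# The logical character sequence is unchanged: still one character.
my $chars = length($str);          # 1
```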
Converts in-place the internal representation of the string from UTF-X to the equivalent octet sequence in the native encoding (Latin-1 or EBCDIC). The logical character sequence itself is unchanged. If $string is already stored as native 8 bit, then this is a no-op. Can be used to make sure that the UTF-8 flag is off, e.g. when you want to make sure that the substr() or length() function works with the usually faster byte algorithm.
Fails if the original UTF-X sequence cannot be represented in the native 8-bit encoding. On failure dies or, if the value of FAIL_OK is true, returns false.
Returns true on success.
Note that this function does not handle arbitrary encodings; for general-purpose encoding conversions, the Encode module is recommended.
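A round-trip sketch (the sample strings are illustrative):

```perl
my $str = "caf\xe9";               # 4 characters, native 8-bit encoding
utf8::upgrade($str);               # internal representation becomes UTF-8
my $ok = utf8::downgrade($str);    # back to native 8 bits; logical string unchanged

# A character above 0xFF has no native 8-bit form, so downgrading fails;
# with FAIL_OK true, this returns false instead of dying.
my $wide      = "\x{100}";
my $failed_ok = utf8::downgrade($wide, 1);
```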
Converts in-place the character sequence to the corresponding octet sequence in UTF-X. That is, every (possibly wide) character gets replaced with a sequence of one or more characters that represent the individual UTF-X bytes of the character. The UTF8 flag is turned off. Returns nothing.
- my $a = "\x{100}"; # $a contains one character, with ord 0x100
- utf8::encode($a); # $a contains two characters, with ords 0xc4 and 0x80
Note that this function does not handle arbitrary encodings; for general-purpose encoding conversions, the Encode module is recommended.
Attempts to convert in-place the octet sequence in UTF-X to the corresponding character sequence. That is, it replaces each sequence of characters in the string whose ords represent a valid UTF-X byte sequence, with the corresponding single character. The UTF-8 flag is turned on only if the source string contains multiple-byte UTF-X characters. If $string is invalid as UTF-X, returns false; otherwise returns true.
- my $a = "\xc4\x80"; # $a contains two characters, with ords 0xc4 and 0x80
- utf8::decode($a); # $a contains one character, with ord 0x100
Note that this function does not handle arbitrary encodings; for general-purpose encoding conversions, the Encode module is recommended.
(Since Perl 5.8.1) Test whether STRING is encoded internally in UTF-8. Functionally the same as Encode::is_utf8().
[INTERNAL] Test whether STRING is in a consistent state regarding UTF-8. Will return true if it is well-formed UTF-8 and has the UTF-8 flag on or if STRING is held as bytes (both these states are 'consistent'). Main reason for this routine is to allow Perl's testsuite to check that operations have left strings in a consistent state. You most probably want to use utf8::is_utf8() instead.
utf8::encode is like utf8::upgrade, but the UTF8 flag is cleared. See perlunicode for more on the UTF8 flag and the C API functions sv_utf8_upgrade, sv_utf8_downgrade, sv_utf8_encode, and sv_utf8_decode, which are wrapped by the Perl functions utf8::upgrade, utf8::downgrade, utf8::encode and utf8::decode. Also, the functions utf8::is_utf8, utf8::valid, utf8::encode, utf8::decode, utf8::upgrade, and utf8::downgrade are actually internal, and thus always available, without a require utf8 statement.
One can have Unicode in identifier names, but not in package/class or subroutine names. While some limited functionality towards this does exist as of Perl 5.8.0, that is more accidental than designed; use of Unicode for the said purposes is unsupported.
One reason for this incompleteness is its (currently) inherent unportability: since both package names and subroutine names may need to be mapped to file and directory names, the Unicode capability of the filesystem becomes important, and there unfortunately are no portable answers.
vars - Perl pragma to predeclare global variable names
- use vars qw($frob @mung %seen);
NOTE: For use with variables in the current package for a single scope, the functionality provided by this pragma has been superseded by our declarations, available in Perl v5.6.0 or later, and use of this pragma is discouraged. See our.
This will predeclare all the variables whose names are in the list, allowing you to use them under "use strict", and disabling any typo warnings.
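For instance (variable names taken from the synopsis above):

```perl
use strict;
use vars qw($frob @mung %seen);

# Under 'use strict', undeclared package globals are compile-time
# errors; 'use vars' predeclares them so they can be used freely.
$frob = 'value';
push(@mung, 'a', 'b');
$seen{'key'}++;
```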
Unlike pragmas that affect the $^H hints variable, the use vars and use subs declarations are not BLOCK-scoped. They are thus effective for the entire file in which they appear. You may not rescind such declarations with no vars or no subs.
Packages such as the AutoLoader and SelfLoader that delay
loading of subroutines within packages can create problems with
package lexicals defined using my(). While the vars pragma
cannot duplicate the effect of package lexicals (total transparency
outside of the package), it can act as an acceptable substitute by
pre-declaring global symbols, ensuring their availability to the
later-loaded routines.
See Pragmatic Modules in perlmodlib.
vmsish - Perl pragma to control VMS-specific language features
If no import list is supplied, all possible VMS-specific features are assumed. Currently, there are four VMS-specific features available: 'status' (a.k.a '$?'), 'exit', 'time' and 'hushed'.
If you're not running VMS, this module does nothing.
vmsish status
This makes $? and system return the native VMS exit status instead of emulating the POSIX exit status.
vmsish exit
This makes exit 1 produce a successful exit (with status SS$_NORMAL), instead of emulating UNIX exit(), which considers exit 1 to indicate an error. As with the CRTL's exit() function, exit 0 is also mapped to an exit status of SS$_NORMAL, and any other argument to exit() is used directly as Perl's exit status.
vmsish time
This makes all times relative to the local time zone, instead of the default of Universal Time (a.k.a Greenwich Mean Time, or GMT).
vmsish hushed
This suppresses printing of VMS status messages to SYS$OUTPUT and SYS$ERROR if Perl terminates with an error status, and allows programs that are expecting "unix-style" Perl to avoid having to parse VMS error messages. It does not suppress any messages from Perl itself, just the messages generated by DCL after Perl exits. The DCL symbol $STATUS will still have the termination status, but with a high-order bit set:
EXAMPLE:
- $ perl -e"exit 44;" Non-hushed error exit
- %SYSTEM-F-ABORT, abort DCL message
- $ show sym $STATUS
- $STATUS == "%X0000002C"
- $ perl -e"use vmsish qw(hushed); exit 44;" Hushed error exit
- $ show sym $STATUS
- $STATUS == "%X1000002C"
The 'hushed' flag has a global scope during compilation: the exit() or die() commands that are compiled after 'vmsish hushed' will be hushed when they are executed. Doing a "no vmsish 'hushed'" turns off the hushed flag.
The status of the hushed flag also affects output of VMS error messages from compilation errors. Again, you still get the Perl error message (and the code in $STATUS)
EXAMPLE:
- use vmsish 'hushed'; # turn on hushed flag
- use Carp; # Carp compiled hushed
- exit 44; # will be hushed
- croak('I die'); # will be hushed
- no vmsish 'hushed'; # turn off hushed flag
- exit 44; # will not be hushed
- croak('I die2'); # WILL be hushed, croak was compiled hushed
You can also control the 'hushed' flag at run-time, using the built-in routine vmsish::hushed(). Without argument, it returns the hushed status. Since vmsish::hushed is built-in, you do not need to "use vmsish" to call it.
EXAMPLE:
- if ($quiet_exit) {
- vmsish::hushed(1);
- }
- print "Sssshhhh...I'm hushed...\n" if vmsish::hushed();
- exit 44;
Note that an exit() or die() that is compiled 'hushed' because of "use vmsish" is not un-hushed by calling vmsish::hushed(0) at runtime.
The messages from error exits from inside the Perl core are generally more serious, and are not suppressed.
warnings - Perl pragma to control optional warnings
- use warnings;
- no warnings;
- use warnings "all";
- no warnings "all";
- use warnings::register;
- if (warnings::enabled()) {
- warnings::warn("some warning");
- }
- if (warnings::enabled("void")) {
- warnings::warn("void", "some warning");
- }
- if (warnings::enabled($object)) {
- warnings::warn($object, "some warning");
- }
- warnings::warnif("some warning");
- warnings::warnif("void", "some warning");
- warnings::warnif($object, "some warning");
The warnings pragma is a replacement for the command line flag -w, but the pragma is limited to the enclosing block, while the flag is global. See perllexwarn for more information and the list of built-in warning categories.
If no import list is supplied, all possible warnings are either enabled or disabled.
A number of functions are provided to assist module authors.
Creates a new warnings category with the same name as the package where the call to the pragma is used.
Use the warnings category with the same name as the current package.
Return TRUE if that warnings category is enabled in the calling module. Otherwise returns FALSE.
Return TRUE if the warnings category, $category, is enabled in the calling module. Otherwise returns FALSE.
Use the name of the class for the object reference, $object, as the warnings category.
Return TRUE if that warnings category is enabled in the first scope where the object is used. Otherwise returns FALSE.
Return TRUE if the warnings category with the same name as the current package has been set to FATAL in the calling module. Otherwise returns FALSE.
Return TRUE if the warnings category $category has been set to FATAL in the calling module. Otherwise returns FALSE.
Use the name of the class for the object reference, $object
, as the
warnings category.
Return TRUE if that warnings category has been set to FATAL in the first scope where the object is used. Otherwise returns FALSE.
Print $message to STDERR. Use the warnings category with the same name as the current package. If that warnings category has been set to "FATAL" in the calling module then die. Otherwise return.
Print $message to STDERR. If the warnings category, $category, has been set to "FATAL" in the calling module then die. Otherwise return.
Print $message to STDERR. Use the name of the class for the object reference, $object, as the warnings category. If that warnings category has been set to "FATAL" in the scope where $object is first used then die. Otherwise return.
Equivalent to:
- if (warnings::enabled())
- { warnings::warn($message) }
Equivalent to:
- if (warnings::enabled($category))
- { warnings::warn($category, $message) }
Equivalent to:
- if (warnings::enabled($object))
- { warnings::warn($object, $message) }
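Taken together, these functions let a module author emit warnings that callers can enable or disable by category. A hedged sketch (the package name MyModule is illustrative):
- package MyModule;
- use warnings::register;
- sub frobnicate {
-     my $arg = shift;
-     warnings::warnif("frobnicate() called with undefined argument")
-         unless defined $arg;
- }
- # A caller can then enable the category by package name:
- # use warnings 'MyModule';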
This registers warning categories for the given names and is primarily for use by the warnings::register pragma, for which see perllexwarn.
See Pragmatic Modules in perlmodlib and perllexwarn.
xsubpp - compiler to convert Perl XS code into C code
xsubpp [-v] [-except] [-s pattern] [-prototypes] [-noversioncheck] [-nolinenumbers] [-nooptimize] [-typemap typemap] [-output filename]... file.xs
This compiler is typically run by the makefiles created by ExtUtils::MakeMaker or by Module::Build or other Perl module build tools.
xsubpp will compile XS code into C code by embedding the constructs necessary to let C functions manipulate Perl values and creates the glue necessary to let Perl access those functions. The compiler uses typemaps to determine how to map C function parameters and variables to Perl values.
The compiler will search for typemap files called typemap. It will use the following search path to find default typemaps, with the rightmost typemap taking precedence.
- ../../../typemap:../../typemap:../typemap:typemap
It will also use a default typemap installed as ExtUtils::typemap.
Note that the XSOPT MakeMaker option may be used to add these options to any makefiles generated by MakeMaker.
Retains '::' in type names so that C++ hierarchical types can be mapped.
Adds exception handling stubs to the C code.
Indicates that a user-supplied typemap should take precedence over the default typemaps. This option may be used multiple times, with the last typemap having the highest precedence.
Specifies the name of the output file to generate. If no file is specified, output will be written to standard output.
Prints the xsubpp version number to standard output, then exits.
By default xsubpp will not automatically generate prototype code for all xsubs. This flag will enable prototypes.
Disables the run time test that determines if the object file (derived from the .xs file) and the .pm files have the same version number.
Prevents the inclusion of '#line' directives in the output.
Disables certain optimizations. The only optimization that is currently affected is the use of targets by the output C code (see perlguts). This may significantly slow down the generated code, but this is the way xsubpp of 5.005 and earlier operated.
Disable recognition of IN, OUT_LIST and INOUT_LIST declarations.
Disable recognition of ANSI-like descriptions of function signatures.
Currently doesn't do anything at all. This flag has been a no-op for many versions of perl, at least as far back as perl5.003_07. It's allowed here for backwards compatibility.
No environment variables are used.
Originally by Larry Wall. Turned into the ExtUtils::ParseXS module by Ken Williams.
See the file Changes.
perl(1), perlxs(1), perlxstut(1), ExtUtils::ParseXS
warnings::register - warnings import function
- use warnings::register;
Creates a warnings category with the same name as the current package.
See warnings and perllexwarn for more information on this module's usage.
threads::shared - Perl extension for sharing data structures between threads
This document describes threads::shared version 1.43
- use threads;
- use threads::shared;
- my $var :shared;
- my %hsh :shared;
- my @ary :shared;
- my ($scalar, @array, %hash);
- share($scalar);
- share(@array);
- share(%hash);
- $var = $scalar_value;
- $var = $shared_ref_value;
- $var = shared_clone($non_shared_ref_value);
- $var = shared_clone({'foo' => [qw/foo bar baz/]});
- $hsh{'foo'} = $scalar_value;
- $hsh{'bar'} = $shared_ref_value;
- $hsh{'baz'} = shared_clone($non_shared_ref_value);
- $hsh{'quz'} = shared_clone([1..3]);
- $ary[0] = $scalar_value;
- $ary[1] = $shared_ref_value;
- $ary[2] = shared_clone($non_shared_ref_value);
- $ary[3] = shared_clone([ {}, [] ]);
- { lock(%hash); ... }
- cond_wait($scalar);
- cond_timedwait($scalar, time() + 30);
- cond_broadcast(@array);
- cond_signal(%hash);
- my $lockvar :shared;
- # condition var != lock var
- cond_wait($var, $lockvar);
- cond_timedwait($var, time()+30, $lockvar);
By default, variables are private to each thread, and each newly created thread gets a private copy of each existing variable. This module allows you to share variables across different threads (and pseudo-forks on Win32). It is used together with the threads module.
This module supports the sharing of the following data types only: scalars and scalar refs, arrays and array refs, and hashes and hash refs.
The following functions are exported by this module: share, shared_clone, is_shared, cond_wait, cond_timedwait, cond_signal and cond_broadcast.
Note that if this module is imported when threads has not yet been loaded, then these functions all become no-ops. This makes it possible to write modules that will work in both threaded and non-threaded environments.
share takes a variable and marks it as shared:
- my ($scalar, @array, %hash);
- share($scalar);
- share(@array);
- share(%hash);
share will return the shared rvalue, but always as a reference.
Variables can also be marked as shared at compile time by using the :shared attribute:
- my ($var, %hash, @array) :shared;
Shared variables can only store scalars, refs of shared variables, or refs of shared data (discussed in next section):
- my ($var, %hash, @array) :shared;
- my $bork;
- # Storing scalars
- $var = 1;
- $hash{'foo'} = 'bar';
- $array[0] = 1.5;
- # Storing shared refs
- $var = \%hash;
- $hash{'ary'} = \@array;
- $array[1] = \$var;
- # The following are errors:
- # $var = \$bork; # ref of non-shared variable
- # $hash{'bork'} = []; # non-shared array ref
- # push(@array, { 'x' => 1 }); # non-shared hash ref
shared_clone takes a reference, and returns a shared version of its argument, performing a deep copy on any non-shared elements. Any shared elements in the argument are used as is (i.e., they are not cloned).
- my $cpy = shared_clone({'foo' => [qw/foo bar baz/]});
Object status (i.e., the class an object is blessed into) is also cloned.
For cloning empty array or hash refs, the following may also be used:
- $var = &share([]); # Same as $var = shared_clone([]);
- $var = &share({}); # Same as $var = shared_clone({});
Not all Perl data types can be cloned (e.g., globs, code refs). By default, shared_clone will croak if it encounters such items. To change this behaviour to a warning, set the following:
- $threads::shared::clone_warn = 1;
In this case, undef will be substituted for the item to be cloned. If set to zero:
- $threads::shared::clone_warn = 0;
then the undef substitution will be performed silently.
is_shared checks if the specified variable is shared or not. If shared, returns the variable's internal ID (similar to refaddr()). Otherwise, returns undef.
When used on an element of an array or hash, is_shared checks if the specified element belongs to a shared array or hash. (It does not check the contents of that element.)
lock places an advisory lock on a variable until the lock goes out of scope. If the variable is locked by another thread, the lock call will block until it's available. Multiple calls to lock by the same thread from within dynamically nested scopes are safe -- the variable will remain locked until the outermost lock on the variable goes out of scope.
lock follows references exactly one level:
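A minimal illustration:
- my $var :shared;
- my $ref = \$var;
- lock($ref);    # Equivalent to lock($var)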
Note that you cannot explicitly unlock a variable; you can only wait for the lock to go out of scope. This is most easily accomplished by locking the variable inside a block.
As locks are advisory, they do not prevent data access or modification by another thread that does not itself attempt to obtain a lock on the variable.
You cannot lock the individual elements of a container variable:
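Only the container itself can be locked; a hedged sketch:
- my %hash :shared;
- $hash{'foo'} = 'bar';
- # lock($hash{'foo'});    # Error: elements cannot be locked
- lock(%hash);             # Locks the entire hash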
If you need more fine-grained control over shared variable access, see Thread::Semaphore.
The cond_wait function takes a locked variable as a parameter, unlocks the variable, and blocks until another thread does a cond_signal or cond_broadcast for that same locked variable. The variable that cond_wait blocked on is re-locked after the cond_wait is satisfied. If there are multiple threads cond_wait-ing on the same variable, all but one will re-block waiting to reacquire the lock on the variable. (So if you're only using cond_wait for synchronization, give up the lock as soon as possible.)
The two actions of unlocking the variable and entering the blocked wait state are atomic; the two actions of exiting from the blocked wait state and re-locking the variable are not.
In its second form, cond_wait takes a shared, unlocked variable followed by a shared, locked variable. The second variable is unlocked and thread execution suspended until another thread signals the first variable.
It is important to note that the variable can be notified even if no thread has called cond_signal or cond_broadcast on it. It is therefore important to check the value of the variable and go back to waiting if the requirement is not fulfilled. For example, to pause until a shared counter drops to zero:
- { lock($counter); cond_wait($counter) until $counter == 0; }
In its two-argument form, cond_timedwait takes a locked variable and an absolute timeout in epoch seconds (see time for more) as parameters, unlocks the variable, and blocks until the timeout is reached or another thread signals the variable. A false value is returned if the timeout is reached, and a true value otherwise. In either case, the variable is re-locked upon return.
Like cond_wait, this function may take a shared, locked variable as an additional parameter; in this case the first parameter is an unlocked condition variable protected by a distinct lock variable.
Again like cond_wait, waking up and reacquiring the lock are not atomic, and you should always check your desired condition after this function returns. Since the timeout is an absolute value, however, it does not have to be recalculated with each pass:
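A hedged sketch of such a loop, assuming a shared flag $done:
- my $timeout = time() + 30;
- {
-     lock($done);
-     until ($done) {
-         # Absolute deadline: spurious wakeups re-wait with the same timeout
-         last if !cond_timedwait($done, $timeout);
-     }
- }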
The cond_signal function takes a locked variable as a parameter and unblocks one thread that's cond_wait-ing on that variable. If more than one thread is blocked in a cond_wait on that variable, only one (and which one is indeterminate) will be unblocked.
If there are no threads blocked in a cond_wait on the variable, the signal is discarded. By always locking before signaling, you can (with care) avoid signaling before another thread has entered cond_wait().
cond_signal will normally generate a warning if you attempt to use it on an unlocked variable. On the rare occasions where doing this may be sensible, you can suppress the warning with:
- { no warnings 'threads'; cond_signal($foo); }
The cond_broadcast function works similarly to cond_signal. cond_broadcast, though, will unblock all the threads that are blocked in a cond_wait on the locked variable, rather than only one.
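Putting signaling and waiting together, a minimal sketch of one thread waking another (assumes threads is loaded first):
- use threads;
- use threads::shared;
- my $ready :shared = 0;
- my $thr = threads->create(sub {
-     lock($ready);
-     cond_wait($ready) until $ready;    # re-check after each wakeup
- });
- {
-     lock($ready);    # blocks until the new thread is inside cond_wait
-     $ready = 1;
-     cond_signal($ready);
- }
- $thr->join();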
threads::shared exports a version of bless REF that works on shared objects such that blessings propagate across threads.
- # Create a shared 'Foo' object
- my $foo :shared = shared_clone({});
- bless($foo, 'Foo');
- # Create a shared 'Bar' object
- my $bar :shared = shared_clone({});
- bless($bar, 'Bar');
- # Put 'bar' inside 'foo'
- $foo->{'bar'} = $bar;
- # Rebless the objects via a thread
- threads->create(sub {
- # Rebless the outer object
- bless($foo, 'Yin');
- # Cannot directly rebless the inner object
- #bless($foo->{'bar'}, 'Yang');
- # Retrieve and rebless the inner object
- my $obj = $foo->{'bar'};
- bless($obj, 'Yang');
- $foo->{'bar'} = $obj;
- })->join();
- print(ref($foo), "\n"); # Prints 'Yin'
- print(ref($foo->{'bar'}), "\n"); # Prints 'Yang'
- print(ref($bar), "\n"); # Also prints 'Yang'
threads::shared is designed to disable itself silently if threads are not available. This allows you to write modules and packages that can be used in both threaded and non-threaded applications.
If you want access to threads, you must use threads before you use threads::shared. threads will emit a warning if you use it after threads::shared.
When share is used on arrays, hashes, array refs or hash refs, any data they contain will be lost. Therefore, populate such variables after declaring them as shared. (Scalars and scalar refs are not affected by this problem.)
It is often not wise to share an object unless the class itself has been written to support sharing. For example, an object's destructor may get called multiple times, once for each thread's scope exit. Another danger is that the contents of hash-based objects will be lost due to the above mentioned limitation. See examples/class.pl (in the CPAN distribution of this module) for how to create a class that supports object sharing.
Destructors may not be called on objects if those objects still exist at global destruction time. If the destructors must be called, make sure there are no circular references and that nothing is referencing the objects, before the program ends.
Does not support splice on arrays. Does not support explicitly changing array lengths via $#array -- use push and pop instead.
Taking references to the elements of shared arrays and hashes does not autovivify the elements, and neither does slicing a shared array/hash over non-existent indices/keys autovivify the elements.
share() allows you to share($hashref->{key}) and share($arrayref->[idx]) without giving any error message, but $hashref->{key} or $arrayref->[idx] is not actually shared. This causes the error "lock can only be used on shared values" when you attempt to lock($hashref->{key}) or lock($arrayref->[idx]) in another thread.
Using refaddr() is unreliable for testing whether or not two shared references are equivalent (e.g., when testing for circular references). Use is_shared() instead:
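A hedged sketch of the comparison ($ref1 and $ref2 stand for references to shared data):
- if (is_shared($ref1) && is_shared($ref1) == is_shared($ref2)) {
-     # Both refer to the same shared data
- }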
each HASH does not work properly on shared references embedded in shared structures. For example:
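A hedged sketch of the problematic pattern (names illustrative):
- my %outer :shared;
- $outer{'inner'} = shared_clone({ 'a' => 1, 'b' => 2 });
- # May misbehave:
- # while (my ($k, $v) = each(%{$outer{'inner'}})) { ... }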
Either of the following will work instead:
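Two hedged sketches, assuming a shared hash %outer whose element $outer{'inner'} holds a shared hash ref (names illustrative): copy the inner reference to an intermediate variable before calling each, or iterate via keys:
- my $ref = $outer{'inner'};
- while (my ($k, $v) = each(%{$ref})) { ... }
or:
- foreach my $k (keys(%{$outer{'inner'}})) {
-     my $v = $outer{'inner'}{$k};
- }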
This module supports dual-valued variables created using dualvar() from Scalar::Util. However, while $! acts like a dualvar, it is implemented as a tied SV. To propagate its value, use the following construct, if needed:
- my $errno :shared = dualvar($!,$!);
View existing bug reports at, and submit any new bugs, problems, patches, etc. to: http://rt.cpan.org/Public/Dist/Display.html?Name=threads-shared
threads::shared Discussion Forum on CPAN: http://www.cpanforum.com/dist/threads-shared
http://www.perl.com/pub/a/2002/06/11/threads.html and http://www.perl.com/pub/a/2002/09/04/threads.html
Perl threads mailing list: http://lists.perl.org/list/ithreads.html
Artur Bergman <sky AT crucially DOT net>
Documentation borrowed from the old Thread.pm.
CPAN version produced by Jerry D. Hedden <jdhedden AT cpan DOT org>.
threads::shared is released under the same license as Perl.